Background/aims: To compare the performance of generative versus retrieval-based chatbots in answering patient inquiries regarding age-related macular degeneration (AMD) and diabetic retinopathy (DR).
Methods: We evaluated four chatbots: generative models (ChatGPT-4, ChatGPT-3.5 and Google Bard) and a retrieval-based model (OcularBERT) in a cross-sectional study. Their response accuracy to 45 questions (15 AMD, 15 DR and 15 others) was evaluated and compared. Three masked retinal specialists graded the responses using a three-point Likert scale: either 2 (good, error-free), 1 (borderline) or 0 (poor with significant inaccuracies). The scores were aggregated, ranging from 0 to 6. Based on majority consensus among the graders, the responses were also classified as 'Good', 'Borderline' or 'Poor' quality.
Results: Overall, ChatGPT-4 and ChatGPT-3.5 outperformed the other chatbots, both achieving median scores (IQR) of 6 (1), compared with 4.5 (2) in Google Bard, and 2 (1) in OcularBERT (all p ≤ 8.4×10⁻³). Based on the consensus approach, 83.3% of ChatGPT-4's responses and 86.7% of ChatGPT-3.5's were rated as 'Good', surpassing Google Bard (50%) and OcularBERT (10%) (all p ≤ 1.4×10⁻²). ChatGPT-4 and ChatGPT-3.5 had no 'Poor' rated responses. Google Bard produced 6.7% Poor responses, and OcularBERT produced 20%. Across question types, ChatGPT-4 outperformed Google Bard only for AMD, and ChatGPT-3.5 outperformed Google Bard for DR and others.
Conclusion: ChatGPT-4 and ChatGPT-3.5 demonstrated superior performance, followed by Google Bard and OcularBERT. Generative chatbots are potentially capable of answering domain-specific questions outside their original training. Further validation studies are still required prior to real-world implementation.
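The grading scheme described in the Methods can be illustrated with a minimal Python sketch. It assumes each response receives exactly three grades of 0, 1 or 2, sums them into the 0 to 6 aggregate, and assigns the consensus label that at least two graders share. The function names and the example grades are hypothetical, and the abstract does not say how a three-way split (2, 1, 0) was resolved, so the sketch returns `None` in that case rather than guessing.

```python
from collections import Counter

# Mapping from the three-point Likert grade to the quality label in the abstract.
LABELS = {2: "Good", 1: "Borderline", 0: "Poor"}


def aggregate_score(grades):
    """Sum three per-grader scores (each 0, 1 or 2) into a 0-6 total."""
    if len(grades) != 3 or any(g not in LABELS for g in grades):
        raise ValueError("expected three grades, each 0, 1 or 2")
    return sum(grades)


def consensus_label(grades):
    """Return the label shared by at least two graders.

    Assumption: a three-way split has no majority, so None is returned;
    the source does not describe how such cases were handled.
    """
    grade, count = Counter(grades).most_common(1)[0]
    return LABELS[grade] if count >= 2 else None


# Hypothetical grades for a single response (not data from the study).
example = (2, 2, 1)
total = aggregate_score(example)
label = consensus_label(example)
```

Here `total` is the aggregated score for one response and `label` is its majority-consensus category; the per-chatbot medians and percentages reported in the Results would be computed over all graded responses.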
Funding:
This work was supported in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub).
First author affiliation: [1] Singapore Eye Res Inst, Singapore Natl Eye Ctr, Singapore, Singapore
Corresponding author:
Corresponding author affiliations: [1] Singapore Eye Res Inst, Singapore Natl Eye Ctr, Singapore, Singapore; [4] Duke NUS Med Sch, Ophthalmol & Visual Sci Acad Clin Program Eye ACP, Singapore, Singapore; [6] Natl Univ Singapore, Ctr Innovat & Precis Eye Hlth, Yong Loo Lin Sch Med, Singapore, Singapore; [7] Natl Univ Singapore, Yong Loo Lin Sch Med, Dept Ophthalmol, Singapore, Singapore; [16] Natl Univ Singapore, Yong Loo Lin Sch Med, Ophthalmol, Singapore, Singapore
Recommended citation (GB/T 7714):
Cheong Kai Xiong, Zhang Chenxi, Tan Tien-En, et al. Comparing generative and retrieval-based chatbots in answering patient questions regarding age-related macular degeneration and diabetic retinopathy[J]. BRITISH JOURNAL OF OPHTHALMOLOGY. 2024, 108(10): 1443-1449. doi:10.1136/bjo-2023-324533.
APA:
Cheong, Kai Xiong, Zhang, Chenxi, Tan, Tien-En, Fenner, Beau J., Wong, Wendy Meihua ... & Tham, Yih Chung. (2024). Comparing generative and retrieval-based chatbots in answering patient questions regarding age-related macular degeneration and diabetic retinopathy. BRITISH JOURNAL OF OPHTHALMOLOGY, 108(10)
MLA:
Cheong, Kai Xiong, et al. "Comparing generative and retrieval-based chatbots in answering patient questions regarding age-related macular degeneration and diabetic retinopathy." BRITISH JOURNAL OF OPHTHALMOLOGY 108.10 (2024): 1443-1449