
Comparing generative and retrieval-based chatbots in answering patient questions regarding age-related macular degeneration and diabetic retinopathy

Document Details

Resource type:
WOS system:
Pubmed system:

Indexed in: ◇ SCIE

Affiliations: [1]Singapore Eye Res Inst, Singapore Natl Eye Ctr, Singapore, Singapore [2]Chinese Acad Med Sci, Beijing, Peoples R China [3]Peking Union Med Coll Hosp, Beijing, Peoples R China [4]Duke NUS Med Sch, Ophthalmol & Visual Sci Acad Clin Program Eye ACP, Singapore, Singapore [5]Natl Univ Singapore Hosp, Dept Ophthalmol, Singapore, Singapore [6]Natl Univ Singapore, Ctr Innovat & Precis Eye Hlth, Yong Loo Lin Sch Med, Singapore, Singapore [7]Natl Univ Singapore, Yong Loo Lin Sch Med, Dept Ophthalmol, Singapore, Singapore [8]Capital Univ Med Sci, Beijing Tongren Hosp, Beijing Inst Ophthalmol, Beijing, Peoples R China [9]Moorfields Eye Hosp NHS Fdn Trust, London, England [10]Moorfields Eye Hosp NHS Fdn Trust, Med Retina, London, England [11]Univ Washington, Dept Ophthalmol, Seattle, WA USA [12]Tsinghua Univ, Tsinghua Med, Beijing, Peoples R China [13]Beijing Tsinghua Changgung Hosp, Sch Clin Med, Beijing, Peoples R China [14]Sungkyunkwan Univ, Seoul, South Korea [15]Kangbuk Samsung Hosp, Seoul, South Korea [16]Natl Univ Singapore, Yong Loo Lin Sch Med, Ophthalmol, Singapore, Singapore
Source:
ISSN:

Keywords: Macula; Public health; Retina

Abstract:
Background/aims: To compare the performance of generative versus retrieval-based chatbots in answering patient inquiries regarding age-related macular degeneration (AMD) and diabetic retinopathy (DR).

Methods: We evaluated four chatbots in a cross-sectional study: three generative models (ChatGPT-4, ChatGPT-3.5 and Google Bard) and one retrieval-based model (OcularBERT). Their response accuracy to 45 questions (15 AMD, 15 DR and 15 others) was evaluated and compared. Three masked retinal specialists graded the responses on a three-point Likert scale: 2 (good, error-free), 1 (borderline) or 0 (poor, with significant inaccuracies). The three grades were summed into an aggregate score ranging from 0 to 6. Based on majority consensus among the graders, responses were also classified as 'Good', 'Borderline' or 'Poor' quality.

Results: Overall, ChatGPT-4 and ChatGPT-3.5 outperformed the other chatbots, both achieving median scores (IQR) of 6 (1), compared with 4.5 (2) for Google Bard and 2 (1) for OcularBERT (all p ≤ 8.4×10⁻³). Under the consensus approach, 83.3% of ChatGPT-4's responses and 86.7% of ChatGPT-3.5's were rated 'Good', surpassing Google Bard (50%) and OcularBERT (10%) (all p ≤ 1.4×10⁻²). ChatGPT-4 and ChatGPT-3.5 had no responses rated 'Poor', whereas Google Bard produced 6.7% 'Poor' responses and OcularBERT 20%. Across question types, ChatGPT-4 outperformed Google Bard only for AMD questions, and ChatGPT-3.5 outperformed Google Bard for DR and other questions.

Conclusion: ChatGPT-4 and ChatGPT-3.5 demonstrated superior performance, followed by Google Bard and OcularBERT. Generative chatbots are potentially capable of answering domain-specific questions outside their original training. Further validation studies are still required prior to real-world implementation.
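As a hypothetical illustration of the grading scheme described in the abstract, the sketch below sums three graders' Likert grades into the 0–6 aggregate score and derives a majority-consensus quality label. Function names and the handling of three-way disagreement are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter

# Likert grades used by the three masked graders (per the abstract):
# 2 = good (error-free), 1 = borderline, 0 = poor (significant inaccuracies).
LABELS = {2: "Good", 1: "Borderline", 0: "Poor"}

def aggregate(grades):
    """Sum three Likert grades into a single 0-6 score."""
    assert len(grades) == 3 and all(g in (0, 1, 2) for g in grades)
    return sum(grades)

def consensus(grades):
    """Return the majority-consensus label, or None if all three graders disagree."""
    grade, count = Counter(grades).most_common(1)[0]
    return LABELS[grade] if count >= 2 else None

print(aggregate([2, 2, 2]))   # 6
print(consensus([2, 2, 1]))   # Good
```

A response graded 2 by at least two of the three specialists is labeled 'Good' under this reading; what happens when all three graders disagree is not specified in the abstract, so the sketch returns None in that case.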

Funding:
Language:
Times cited:
WOS:
PubmedID:
CAS Journal Ranking:
Year of publication [2023] edition:
Major category | Zone 2 Medicine
Subcategory | Zone 2 Ophthalmology
Latest [2025] edition:
Major category | Zone 2 Medicine
Subcategory | Zone 2 Ophthalmology
JCR Quartile:
Year of publication [2022] edition:
Q1 OPHTHALMOLOGY
Latest [2023] edition:
Q1 OPHTHALMOLOGY

Impact factor: Latest [2023 edition] | Latest five-year average | Year of publication [2022 edition] | Five-year average at publication | Year before publication [2021 edition] | Year after publication [2023 edition]

First author:
First author affiliation: [1]Singapore Eye Res Inst, Singapore Natl Eye Ctr, Singapore, Singapore
Corresponding author:
Corresponding author affiliations: [1]Singapore Eye Res Inst, Singapore Natl Eye Ctr, Singapore, Singapore [4]Duke NUS Med Sch, Ophthalmol & Visual Sci Acad Clin Program Eye ACP, Singapore, Singapore [6]Natl Univ Singapore, Ctr Innovat & Precis Eye Hlth, Yong Loo Lin Sch Med, Singapore, Singapore [7]Natl Univ Singapore, Yong Loo Lin Sch Med, Dept Ophthalmol, Singapore, Singapore [16]Natl Univ Singapore, Yong Loo Lin Sch Med, Ophthalmol, Singapore, Singapore
Recommended citation (GB/T 7714):
APA:
MLA:

