
A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis

Document details


Indexed in: SCIE

Affiliations: [1]Capital Med Univ, Beijing Tongren Hosp, Beijing Tongren Eye Ctr, Beijing Ophthalmol & Visual Sci Key Lab, Beijing 100730, Peoples R China [2]Univ Tradit Chinese Med, Affiliated Eye Hosp Shandong, Jinan 250001, Shandong, Peoples R China [3]Shanxi Med Univ, Dept Ophthalmol, Shanxi Eye Hosp Affiliated, Taiyuan 030002, Shanxi, Peoples R China [4]Yunnan Univ, Yunnan Eye Hosp, Dept Ophthalmol, Affiliated Hosp, Kunming 650021, Yunnan, Peoples R China [5]Hulunbuir Peoples Hosp, Dept Ophthalmol, Inner Mongolia Autonomous Reg, Hulunbuir, Peoples R China

Abstract:
Artificial intelligence (AI), particularly large language models such as GPT-4o, holds promise for enhancing diagnostic accuracy in healthcare. This study evaluates the diagnostic performance of GPT-4o compared with that of human ophthalmologists in glaucoma cases. A prospective, observational study was conducted at a tertiary care ophthalmology center. Twenty-six glaucoma cases, including both primary and secondary types, were selected from publicly available databases and institutional records. The cases were analyzed by GPT-4o and three ophthalmologists with varying levels of experience. The accuracy and completeness of primary and differential diagnoses were assessed using 10-point and 6-point Likert scales, respectively. Statistical analyses were performed using nonparametric methods, including the Kruskal-Wallis and Mann-Whitney U tests. GPT-4o was significantly less accurate in primary diagnosis than the human ophthalmologists: it achieved a mean score of 5.500 (p < 0.001), whereas Doctor C had the highest score, 8.038 (p < 0.001). The completeness score for GPT-4o, 3.077 (p < 0.001), was also lower than that of Doctor B, who had the lowest score among the human ophthalmologists, 3.615 (p < 0.001). However, for differential diagnosis, GPT-4o (7.577) showed accuracy comparable to that of Doctor A (7.615) and Doctor C (7.673) (p < 0.0001) while achieving the highest completeness score (4.096), outperforming Doctor C (3.846), Doctor A (2.923), and Doctor B (2.808) (p < 0.0001). AI, including GPT-4o, is currently not an acceptable standalone method for diagnosing glaucoma because of its lower accuracy compared with human clinicians. These findings suggest that GPT-4o could serve as a valuable adjunct in clinical practice, particularly in complex cases, but should not replace human expertise, especially for initial diagnoses. Future improvements in AI models could enhance their utility in ophthalmology.

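Chinese Academy of Sciences (CAS) journal ranking:
Publication-year edition [2023]:
Major category | Zone 2, Multidisciplinary journals
Subcategory | Zone 2, Multidisciplinary journals
Latest edition [2025]:
Major category | Zone 3, Multidisciplinary journals
Subcategory | Zone 3, Multidisciplinary journals
JCR quartile:
Publication-year edition [2022]:
Q2 MULTIDISCIPLINARY SCIENCES
Latest edition [2024]:
Q1 MULTIDISCIPLINARY SCIENCES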


First author affiliation: [1]Capital Med Univ, Beijing Tongren Hosp, Beijing Tongren Eye Ctr, Beijing Ophthalmol & Visual Sci Key Lab, Beijing 100730, Peoples R China
