Details Page - Knowledge Base of Beijing Tongren Hospital, Capital Medical University


A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis


Publication details

Resource type:

WOS classification:

PubMed classification:

Indexed in: SCIE

Authors:

Affiliations: [1]Capital Med Univ, Beijing Tongren Hosp, Beijing Tongren Eye Ctr, Beijing Ophthalmol & Visual Sci Key Lab, Beijing 100730, Peoples R China [2]Univ Tradit Chinese Med, Affiliated Eye Hosp Shandong, Jinan 250001, Shandong, Peoples R China [3]Shanxi Med Univ, Dept Ophthalmol, Shanxi Eye Hosp Affiliated, Taiyuan 030002, Shanxi, Peoples R China [4]Yunnan Univ, Yunnan Eye Hosp, Dept Ophthalmol, Affiliated Hosp, Kunming 650021, Yunnan, Peoples R China [5]Hulunbuir Peoples Hosp, Dept Ophthalmol, Inner Mongolia Autonomous Reg, Hulunbuir, Peoples R China

Source:

DOI:

ISSN:

Abstract:

Artificial intelligence (AI), particularly large language models like GPT-4o, holds promise for enhancing diagnostic accuracy in healthcare. This study evaluates the diagnostic performance of GPT-4o compared to human ophthalmologists in glaucoma cases. A prospective, observational study was conducted at a tertiary care ophthalmology center. Twenty-six glaucoma cases, including both primary and secondary types, were selected from publicly available databases and institutional records. The cases were analyzed by GPT-4o and three ophthalmologists with varying levels of experience. The accuracy and completeness of primary and differential diagnoses were assessed using 10-point and 6-point Likert scales, respectively. Statistical analyses were performed using nonparametric methods, including the Kruskal-Wallis and Mann-Whitney U tests. GPT-4o was significantly less accurate in primary diagnosis than the human ophthalmologists. Specifically, GPT-4o achieved a mean score of 5.500 (p < 0.001), compared to Doctor C, who had the highest score of 8.038 (p < 0.001). The completeness score for GPT-4o (3.077, p < 0.001) was also lower than that of Doctor B, who had the lowest score among the human ophthalmologists (3.615, p < 0.001). However, for differential diagnosis, GPT-4o (7.577) showed accuracy comparable to that of Doctor A (7.615) and Doctor C (7.673) (p < 0.0001) while achieving the highest completeness score (4.096), outperforming Doctor C (3.846), Doctor A (2.923), and Doctor B (2.808) (p < 0.0001). AI, including GPT-4o, is currently not an acceptable standalone method for diagnosing glaucoma due to its lower accuracy compared to human clinicians. These findings suggest that GPT-4o could serve as a valuable adjunct in clinical practice, particularly in complex cases, but should not replace human expertise, especially for initial diagnoses. Future improvements in AI models could enhance their utility in ophthalmology.
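The abstract names the Kruskal-Wallis and Mann-Whitney U tests but does not describe how they were computed. As a rough illustration only, the test statistics for such rank-based comparisons can be sketched in plain Python as below. The function names, the omission of a tie correction in H, and the example scores are all assumptions made for illustration; they are not the study's code or data, and no p-values are derived here.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x relative to sample y."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


if __name__ == "__main__":
    # Hypothetical Likert scores, NOT taken from the study.
    model_scores = [5, 6, 4, 7, 5]
    doctor_scores = [8, 9, 7, 8, 9]
    print("H =", kruskal_wallis_h(model_scores, doctor_scores))
    print("U =", mann_whitney_u(model_scores, doctor_scores))
```

In practice a statistics package would normally be used, since it also applies the tie correction and returns p-values; the sketch only shows how the ranks enter each statistic.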

Language:

Times cited:

WOS:

PubMed ID:

Chinese Academy of Sciences (CAS) journal ranking:

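Year of publication [2023] edition:

Major category | Tier 2, Multidisciplinary Journals

Subcategory | Tier 2, Multidisciplinary Journals

Latest [2025] edition:

Major category | Tier 3, Multidisciplinary Journals

Subcategory | Tier 3, Multidisciplinary Journals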

JCR quartile:

Year of publication [2022] edition:

Q2 MULTIDISCIPLINARY SCIENCES

Latest [2024] edition:

Q1 MULTIDISCIPLINARY SCIENCES

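Impact factor: 3.9 (latest [2024 edition]); 4.3 (latest five-year average); 4.6 (year of publication [2022 edition]); 4.9 (five-year average in year of publication); 4.997 (year before publication [2021 edition]); 3.8 (year after publication [2023 edition])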

First author:

First author affiliation: [1]Capital Med Univ, Beijing Tongren Hosp, Beijing Tongren Eye Ctr, Beijing Ophthalmol & Visual Sci Key Lab, Beijing 100730, Peoples R China

Corresponding author:

Recommended citation (GB/T 7714):

APA:

MLA: