Early detection of eye diseases is vital for preventing vision loss. Existing ophthalmic artificial intelligence models focus on single modalities, overlooking multi-view information and struggling with rare diseases due to long-tail distributions. We propose EyeCLIP, a multimodal visual-language foundation model trained on 2.77 million ophthalmology images from 11 modalities with partial clinical text. Our novel pretraining strategy combines self-supervised reconstruction, multimodal image contrastive learning, and image-text contrastive learning to capture shared representations across modalities. EyeCLIP demonstrates robust performance across 14 benchmark datasets, excelling in disease classification, visual question answering, and cross-modal retrieval. It also exhibits strong few-shot and zero-shot capabilities, enabling accurate predictions in real-world, long-tail scenarios. EyeCLIP offers significant potential for detecting both ocular and systemic diseases and for bridging gaps in real-world clinical applications.
Funding:
Start-up Fund for RAPs under the Strategic Hiring Scheme; Hong Kong Jockey Club Charities Trust [P0046113]; Global STEM Professorship Scheme; Henry G. Leong Endowed Professorship in Elderly Vision Health
Language:
Foreign language (non-Chinese)
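Chinese Academy of Sciences (CAS) journal ranking:
Edition for the year of publication [2025]:
Major category | Tier 1, Medicine
Subcategory | Tier 1, Health Care Sciences & Services; Tier 1, Medical Informatics
Latest edition [2025]:
Major category | Tier 1, Medicine
Subcategory | Tier 1, Health Care Sciences & Services; Tier 1, Medical Informatics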
JCR quartile:
Edition for the year of publication [2023]:
Q1 HEALTH CARE SCIENCES & SERVICES; Q1 MEDICAL INFORMATICS
Latest edition [2024]:
Q1 HEALTH CARE SCIENCES & SERVICES; Q1 MEDICAL INFORMATICS
First author affiliations: [1] Hong Kong Polytech Univ, Sch Optometry, Kowloon, Hong Kong, Peoples R China; [2] Hong Kong Polytech Univ, Res Ctr SHARP Vis RCSV, Kowloon, Hong Kong, Peoples R China
Corresponding author affiliations: [1] Hong Kong Polytech Univ, Sch Optometry, Kowloon, Hong Kong, Peoples R China; [2] Hong Kong Polytech Univ, Res Ctr SHARP Vis RCSV, Kowloon, Hong Kong, Peoples R China; [12] Ctr Eye & Vis Res CEVR, 17W Hong Kong Sci Pk, Sci Pk, Hong Kong, Peoples R China
Recommended citation (GB/T 7714):
Shi Danli, Zhang Weiyi, Yang Jiancheng, et al. A multimodal visual-language foundation model for computational ophthalmology[J]. NPJ DIGITAL MEDICINE, 2025, 8(1). doi:10.1038/s41746-025-01772-2.
APA:
Shi, Danli, Zhang, Weiyi, Yang, Jiancheng, Huang, Siyu, Chen, Xiaolan ... & He, Mingguang. (2025). A multimodal visual-language foundation model for computational ophthalmology. NPJ DIGITAL MEDICINE, 8(1).
MLA:
Shi, Danli, et al. "A multimodal visual-language foundation model for computational ophthalmology." NPJ DIGITAL MEDICINE 8.1 (2025).