详情页 - 首都医科大学附属北京同仁医院知识库

当前位置：首页 > 详情页

MIL-ViT: A multiple instance vision transformer for fundus image classification

文献详情

资源类型：

WOS体系：

收录情况： ◇ SCIE

作者：

机构： [1]Jarvis Res Ctr, Tencent YouTu Lab, Shenzhen, Peoples R China [2]Pazhou Lab, Guangzhou, Peoples R China [3]Guangxi Med Univ, Life Sci Inst, Nanning, Peoples R China [4]Capital Med Univ, Beijing Tongren Hosp, Beijing, Peoples R China

出处：

DOI：

ISSN：

关键词： Vision transformer Multiple instance learning Fundus image Attention aggregation Calibrated attention mechanism

摘要：

Despite the great success of deep learning approaches, retinal disease classification is still challenging as the early-stage pathological regions of retinal diseases may be extremely tiny and subtle, which are difficult for networks to detect. The feature representations learnt by deep learning models focusing more on the local view may lead to indiscriminative semantic-level representation. On the contrary, if they focus more on the global semantic-level, they may ignore the discerning subtle local pathological regions. To address this issue, in this paper, we propose a hybrid framework, combining the strong global semantic representation learning capability of the vision Transformer (ViT) and the excellent capacity of local representation extraction from the conventional multiple instance learning (MIL). Particularly, a multiple instance vision Transformer (MIL-ViT) is implemented, where the vanilla ViT branch and the MIL branch generate semantic probability distributions separately, and a bag consistency loss is proposed to minimize the difference between them. Moreover, a calibrated attention mechanism is developed to embed the instance representation into the bag representation in our MIL-ViT. To further improve the feature representation capability for fundus images, we pre-train the vanilla ViT on a large-scale fundus image database. The experimental results validate that the generalization capability of the model using our pre-trained weights for fundus disease diagnosis is better than the one using ImageNet pre-trained weights. Extensive experiments on four publicly available benchmarks demonstrate that our proposed MIL-ViT outperforms latest fundus image classification methods, including various deep learning models and deep MIL methods. All our source code and pre-trained models are publicly available at https://github.com/greentreeys/MIL-VT.

基金：

语种：

被引次数：

WOS：

中科院(CAS)分区：

出版当年[2022]版：

大类 | 3 区计算机科学

小类 | 3 区计算机：软件工程 3 区计算机：信息系统

最新[2025]版：

大类 | 4 区计算机科学

小类 | 4 区计算机：信息系统 4 区计算机：软件工程

JCR分区：

出版当年[2021]版：

Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

最新[2023]版：

Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

影响因子： 2.6 最新[2023版] 2.4 最新五年平均 2.887 出版当年[2021版] 2.711 出版当年五年平均 2.678 出版前一年[2020版] 2.6 出版后一年[2022版]

第一作者：

第一作者机构： [1]Jarvis Res Ctr, Tencent YouTu Lab, Shenzhen, Peoples R China

通讯作者：

推荐引用方式(GB/T 7714)：

APA：

MLA：

相关文献

[1]MIL-VT: Multiple Instance Learning Enhanced Vision Transformer for Fundus Image Classification [2]SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images [3]Local-Global Dual Perception Based Deep Multiple Instance Learning for Retinal Disease Classification [4]Fundus Image Based Cataract Classification [5]眼底图像的配准 [6]Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network [7]A fundus image classification framework for learning with noisy labels [8]A VGG attention vision transformer network for benign and malignant classification of breast ultrasound images [9]Vision transformer-based stratification of pre/diabetic and pre/hypertensive patients from retinal photographs for 3PM applications [10]Versatile cataract fundus image restoration model utilizing unpaired cataract and high-quality images