
MIL-VT: Multiple Instance Learning Enhanced Vision Transformer for Fundus Image Classification

Publication details

Resource type:
WOS classification:

Indexed in: ◇ CPCI(ISTP) ◇ EI

Affiliations: [1] Tencent, Tencent Jarvis Lab, Shenzhen, Peoples R China; [2] Capital Med Univ, Beijing Tongren Hosp, Beijing, Peoples R China
Source:
ISSN:

Keywords: Vision Transformer; Multiple instance learning; Fundus image; Deep learning

Abstract:
With the advancement and prevailing success of Transformer models in the natural language processing (NLP) field, an increasing number of research works have explored the applicability of Transformer for various vision tasks and reported superior performance compared with convolutional neural networks (CNNs). However, as the proper training of Transformer generally requires an extremely large quantity of data, it has rarely been explored for the medical imaging tasks. In this paper, we attempt to adopt the Vision Transformer for the retinal disease classification tasks, by pre-training the Transformer model on a large fundus image database and then fine-tuning on downstream retinal disease classification tasks. In addition, to fully exploit the feature representations extracted by individual image patches, we propose a multiple instance learning (MIL) based 'MIL head', which can be conveniently attached to the Vision Transformer in a plug-and-play manner and effectively enhances the model performance for the downstream fundus image classification tasks. The proposed MIL-VT framework achieves superior performance over CNN models on two publicly available datasets when being trained and tested under the same setup. The implementation code and pre-trained weights are released for public access (Code link: https://github.com/greentreeys/MIL-VT).
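
The abstract does not spell out how the 'MIL head' combines the per-patch features, so the following is only a minimal sketch under stated assumptions: it treats the patch embeddings produced by a Vision Transformer as a bag of instances and pools them with softmax attention before a linear classifier, which is one common MIL aggregation. The names `mil_head`, `attn_vector`, and `class_weights` are hypothetical, the weights are random stand-ins, and nothing here is taken from the authors' released code at the GitHub link above, which remains the authoritative implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def mil_head(patch_tokens, attn_vector, class_weights):
    """Attention-pool patch embeddings into one bag embedding, then classify.

    patch_tokens:  N patch embeddings, each a list of D floats (the "instances")
    attn_vector:   D floats used to score each patch (hypothetical parameter)
    class_weights: C rows of D floats, a linear bag-level classifier (hypothetical)
    """
    # One attention weight per patch; the weights sum to 1 across the bag.
    attn = softmax([dot(tok, attn_vector) for tok in patch_tokens])
    dim = len(patch_tokens[0])
    # Bag embedding = attention-weighted sum of the patch embeddings.
    bag = [sum(a * tok[d] for a, tok in zip(attn, patch_tokens)) for d in range(dim)]
    # Bag-level class scores.
    logits = [dot(row, bag) for row in class_weights]
    return logits, attn


# Random stand-ins for the patch tokens of one fundus image (assumed sizes).
random.seed(0)
num_patches, dim, num_classes = 16, 8, 5
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
attn_vector = [random.gauss(0, 1) for _ in range(dim)]
class_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)]

logits, attn = mil_head(tokens, attn_vector, class_weights)
```

In a setup like this, the attention weights indicate which patches contributed most to the image-level prediction, and the head can sit beside the usual class-token classifier without changing the Transformer backbone, which is one plausible reading of "plug-and-play".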

Funding:
Language:
Times cited:
WOS:
First author:
First author affiliation: [1] Tencent, Tencent Jarvis Lab, Shenzhen, Peoples R China
Corresponding author:
Recommended citation (GB/T 7714):
APA:
MLA:

