With the advancement and prevailing success of Transformer models in the natural language processing (NLP) field, an increasing number of research works have explored the applicability of Transformers to various vision tasks and reported superior performance compared with convolutional neural networks (CNNs). However, as proper training of a Transformer generally requires an extremely large quantity of data, it has rarely been explored for medical imaging tasks. In this paper, we adopt the Vision Transformer for retinal disease classification by pre-training the Transformer model on a large fundus image database and then fine-tuning it on downstream retinal disease classification tasks. In addition, to fully exploit the feature representations extracted from individual image patches, we propose a multiple instance learning (MIL) based 'MIL head', which can be conveniently attached to the Vision Transformer in a plug-and-play manner and effectively enhances model performance on downstream fundus image classification tasks. The proposed MIL-VT framework achieves superior performance over CNN models on two publicly available datasets when trained and tested under the same setup. The implementation code and pre-trained weights are released for public access (Code link: https://github.com/greentreeys/MIL-VT).
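The abstract does not spell out the internals of the MIL head, but the general idea of aggregating per-patch features into a bag-level prediction is commonly realized with attention-based MIL pooling. The following is a minimal numpy sketch of that idea, assuming hypothetical dimensions (196 patch tokens of dimension 64, a 32-unit attention hidden layer); the actual MIL-VT head may differ in architecture and parameters.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of attention logits.
    e = np.exp(x - x.max())
    return e / e.sum()

def mil_attention_pool(patch_tokens, W, w):
    # patch_tokens: (N, D) feature vectors of individual image patches (the "instances").
    # W: (D, H), w: (H,) parameters of a small attention MLP.
    scores = np.tanh(patch_tokens @ W) @ w   # (N,) one attention logit per patch
    alpha = softmax(scores)                  # (N,) attention weights, sum to 1
    return alpha @ patch_tokens              # (D,) weighted bag-level representation

# Hypothetical sizes: a 224x224 image split into 14x14 = 196 patches.
rng = np.random.default_rng(0)
N, D, H = 196, 64, 32
tokens = rng.standard_normal((N, D))
W = rng.standard_normal((D, H)) * 0.1
w = rng.standard_normal(H) * 0.1

bag = mil_attention_pool(tokens, W, w)
print(bag.shape)  # (64,)
```

In a plug-and-play setting, this bag-level vector would be fed to a linear classifier alongside (or in place of) the usual class-token head, which is what makes the head easy to attach to an existing Vision Transformer.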
Funding: Key-Area Research and Development Program of Guangdong Province, China (No. 2018B010111001), and Scientific and Technical Innovation 2030 - 'New Generation Artificial Intelligence' Project (No. 2020AAA0104100).
Language: English
First author's affiliation: [1] Tencent, Tencent Jarvis Lab, Shenzhen, Peoples R China