Abstract
Deep learning (DL) has been used in the automatic diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer’s Disease (AD) with brain imaging data. However, previous methods have not fully exploited the relation between brain images and clinical information, which experts widely rely on in practice. To exploit the heterogeneous features from imaging and tabular data simultaneously, we propose the Visual-Attribute Prompt Learning-based Transformer (VAP-Former), a transformer-based network that efficiently extracts and fuses the multi-modal features with prompt fine-tuning. Furthermore, we propose a Prompt fine-Tuning (PT) scheme to transfer the knowledge from the AD prediction task to progressive MCI (pMCI) diagnosis. In detail, we first pre-train the VAP-Former without prompts on the AD diagnosis task and then fine-tune the model on the pMCI detection task with PT, which only needs to optimize a small number of parameters while keeping the backbone frozen. We also propose a novel global prompt token for the visual prompts to provide global guidance to the multi-modal representations. Extensive experiments not only show the superiority of our method over state-of-the-art methods in pMCI prediction but also demonstrate that the global prompt makes the prompt learning process more effective and stable. Interestingly, the proposed prompt learning model even outperforms the full fine-tuning baseline when transferring knowledge from AD to pMCI.
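The idea of tuning only prompts while the backbone stays frozen can be illustrated with a minimal sketch. This is not the VAP-Former implementation: the "backbone" here is a toy stand-in (mean pooling plus a fixed linear head) rather than a transformer, the global prompt token is not modeled, and every name (`FROZEN_W`, `FROZEN_B`, `predict`, `mean_loss`, `tune_prompts`) and every data value is hypothetical.

```python
import math

# Frozen "backbone": mean-pool all tokens, then a fixed linear head.
# Hypothetical stand-in for a pre-trained transformer; never updated below.
FROZEN_W = (1.0, -1.0)
FROZEN_B = -1.0


def predict(prompts, tokens):
    """Probability of the positive class for one sample."""
    seq = prompts + tokens  # prepend learnable prompt tokens to the input
    n = len(seq)
    pooled = [sum(t[j] for t in seq) / n for j in range(len(FROZEN_W))]
    z = sum(w * x for w, x in zip(FROZEN_W, pooled)) + FROZEN_B
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(prompts, data):
    """Mean binary cross-entropy over (tokens, label) pairs."""
    total = 0.0
    for tokens, y in data:
        p = predict(prompts, tokens)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def tune_prompts(prompts, data, lr=1.0, steps=200):
    """Gradient descent on the prompt tokens only; the backbone is untouched."""
    prompts = [list(p) for p in prompts]  # copy so the caller's list is kept
    n = len(prompts) + len(data[0][0])  # assumes all samples have equal length
    for _ in range(steps):
        # d(loss)/d(logit), averaged over the data set
        err = sum(predict(prompts, t) - y for t, y in data) / len(data)
        for p in prompts:
            for j, w in enumerate(FROZEN_W):
                p[j] -= lr * err * w / n  # d(logit)/d(p[j]) = w / n
    return prompts


# Toy target task: two samples with two 2-dimensional tokens each.
data = [([[1.0, 0.0], [1.0, 0.0]], 1), ([[0.0, 1.0], [0.0, 1.0]], 0)]
init_prompts = [[0.0, 0.0], [0.0, 0.0]]
tuned = tune_prompts(init_prompts, data)
loss_before = mean_loss(init_prompts, data)
loss_after = mean_loss(tuned, data)
```

In this toy setting only the four prompt values are trained, and the loss on the target task drops even though the backbone weights never change. With such a simple backbone the prompts can do little more than shift the decision threshold; in a transformer they would also interact with the input tokens through attention.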
Keywords
- Alzheimer’s disease
- Prompt learning
- Magnetic resonance imaging
- Multi-modal classification
- Transformer
- Attention modeling
This work is supported by the Chinese Key-Area Research and Development Program of Guangdong Province (2020B0101350001), the National Natural Science Foundation of China (No. 62102267), the Guangdong Basic and Applied Basic Research Foundation (2023A1515011464), the Shenzhen Science and Technology Program (JCYJ20220818103001002), and the Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen.
L. Kang and H. Gong contributed equally to this work.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kang, L., Gong, H., Wan, X., Li, H. (2023). Visual-Attribute Prompt Learning for Progressive Mild Cognitive Impairment Prediction. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14224. Springer, Cham. https://doi.org/10.1007/978-3-031-43904-9_53
DOI: https://doi.org/10.1007/978-3-031-43904-9_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43903-2
Online ISBN: 978-3-031-43904-9
eBook Packages: Computer Science, Computer Science (R0)
Published in cooperation with
http://miccai.org/