
Visual-Attribute Prompt Learning for Progressive Mild Cognitive Impairment Prediction


Part of the Lecture Notes in Computer Science book series (LNCS, volume 14224)


Deep learning (DL) has been used in the automatic diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer’s Disease (AD) with brain imaging data. However, previous methods have not fully exploited the relation between brain images and the clinical information that is widely used by experts in practice. To exploit heterogeneous features from imaging and tabular data simultaneously, we propose the Visual-Attribute Prompt Learning-based Transformer (VAP-Former), a transformer-based network that efficiently extracts and fuses multi-modal features with prompt fine-tuning. Furthermore, we propose a Prompt fine-Tuning (PT) scheme to transfer knowledge from the AD prediction task to progressive MCI (pMCI) diagnosis. In detail, we first pre-train the VAP-Former without prompts on the AD diagnosis task and then fine-tune the model on the pMCI detection task with PT, which only needs to optimize a small number of parameters while keeping the backbone frozen. Next, we propose a novel global prompt token for the visual prompts to provide global guidance to the multi-modal representations. Extensive experiments not only show the superiority of our method over state-of-the-art methods in pMCI prediction but also demonstrate that the global prompt makes the prompt learning process more effective and stable. Interestingly, the proposed prompt learning model even outperforms the full fine-tuning baseline in transferring knowledge from AD to pMCI.
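The fine-tuning scheme described above can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation) of the core idea: a backbone pre-trained on AD diagnosis is frozen, and only a small set of learnable prompt tokens, including one shared global prompt token, is prepended to the visual token sequence and optimized during pMCI fine-tuning. All dimensions and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
d = 64          # embedding dimension
n_img = 16      # visual tokens produced by the frozen backbone
n_prompt = 4    # learnable prompt tokens

# Frozen backbone projection: pre-trained on AD diagnosis, then kept fixed.
W_frozen = rng.normal(size=(d, d))

# The only trainable parameters during prompt fine-tuning:
# a small set of prompt tokens plus one shared global prompt token.
prompts = rng.normal(size=(n_prompt, d)) * 0.02
global_prompt = rng.normal(size=(1, d)) * 0.02

def forward(x):
    """Prepend [global prompt | prompts] to the visual tokens and apply
    the frozen projection. During fine-tuning, gradients would flow only
    into `prompts` and `global_prompt`, never into `W_frozen`."""
    tokens = np.concatenate([global_prompt, prompts, x], axis=0)
    return tokens @ W_frozen

x = rng.normal(size=(n_img, d))   # stand-in for backbone visual features
out = forward(x)
print(out.shape)                  # (1 + n_prompt + n_img, d) = (21, 64)
```

The parameter-efficiency claim follows directly: here only `(n_prompt + 1) * d` values are trainable, versus `d * d` for the frozen projection alone, which is why prompt tuning optimizes a small fraction of the model's parameters.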


  • Alzheimer’s disease
  • Prompt learning
  • Magnetic resonance imaging
  • Multi-modal classification
  • Transformer
  • Attention modeling

This work is supported by the Chinese Key-Area Research and Development Program of Guangdong Province (2020B0101350001), the National Natural Science Foundation of China (No. 62102267), the Guangdong Basic and Applied Basic Research Foundation (2023A1515011464), the Shenzhen Science and Technology Program (JCYJ20220818103001002), and the Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen.

L. Kang and H. Gong contributed equally to this work.





Author information

Correspondence to Haofeng Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 121 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kang, L., Gong, H., Wan, X., Li, H. (2023). Visual-Attribute Prompt Learning for Progressive Mild Cognitive Impairment Prediction. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14224. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43903-2

  • Online ISBN: 978-3-031-43904-9

  • eBook Packages: Computer Science (R0)