Skip to main content

Overcoming Data Limitation in Medical Visual Question Answering

  • Conference paper
  • First Online:
Medical Image Computing and Computer Assisted Intervention – MICCAI 2019 (MICCAI 2019)

Abstract

Traditional approaches for Visual Question Answering (VQA) require large amount of labeled data for training. Unfortunately, such large scale data is usually not available for medical domain. In this paper, we propose a novel medical VQA framework that overcomes the labeled data limitation. The proposed framework explores the use of the unsupervised Denoising Auto-Encoder (DAE) and the supervised Meta-Learning. The advantage of DAE is to leverage the large amount of unlabeled images while the advantage of Meta-Learning is to learn meta-weights that quickly adapt to VQA problem with limited labeled data. By leveraging the advantages of these techniques, it allows the proposed framework to be efficiently trained using a small labeled training set. The experimental results show that our proposed method significantly outperforms the state-of-the-art medical VQA. The source code is available at https://github.com/aioz-ai/MICCAI19-MedVQA.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    The descriptions of new defined classes are presented in Sect. 4.1.

  2. 2.

    https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia.

  3. 3.

    https://www.synapse.org/#!Synapse:syn3193805/wiki/217753.

  4. 4.

    Those frameworks are completed VQA models in which the core components in those frameworks are SAN and MCB attentions. We refer the reader to the corresponding papers [5, 19] for the detail of those models.

References

  1. Abacha, A.B., Gayen, S., Lau, J.J., Rajaraman, S., Demner-Fushman, D.: NLM at ImageCLEF 2018 visual question answering in the medical domain. In: CEUR Workshop Proceedings (2018)

    Google Scholar 

  2. Bar, Y., Diamant, I., Wolf, L., Greenspan, H.: Deep learning with non-medical training used for chest pathology identification. In: Medical Imaging: Computer-Aided Diagnosis (2015)

    Google Scholar 

  3. Clark, K., Vendt, B., Smith, K., Freymann, J., et al.: The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26(6), 1045–1057 (2013)

    Article  Google Scholar 

  4. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML (2017)

    Google Scholar 

  5. Fukui, A., Park, D.H., Yang, D., Rohrbach, A., Darrell, T., Rohrbach, M.: Multimodal compact bilinear pooling for visual question answering and visual grounding. In: EMNLP (2016)

    Google Scholar 

  6. Hasan, S.A., Ling, Y., Farri, O., Liu, J., Lungren, M., Müller, H.: Overview of the ImageCLEF 2018 medical domain visual question answering task. In: CEUR Workshop Proceedings (2018)

    Google Scholar 

  7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

    Google Scholar 

  8. Jifara, W., Jiang, F., Rho, S., Cheng, M., Liu, S.: Medical image denoising using convolutional neural network: a residual learning approach. J. Supercomputing 75, 1–15 (2017). https://doi.org/10.1007/s11227-017-2080-0

    Article  Google Scholar 

  9. Kim, J.H., Jun, J., Zhang, B.T.: Bilinear attention networks. In: NIPS (2018)

    Google Scholar 

  10. Lau, J.J., Gayen, S., Abacha, A.B., Demner-Fushman, D.: A dataset of clinically generated visual questions and answers about radiology images. Nature 5, 180251 (2018)

    Google Scholar 

  11. Maicas, G., Bradley, A.P., Nascimento, J.C., Reid, I., Carneiro, G.: Training medical image analysis systems like radiologists. In: MICCAI (2018)

    Google Scholar 

  12. Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: ICANN (2011)

    Google Scholar 

  13. Peng, Y., Liu, F., Rosen, M.P.: UMass at ImageCLEF medical visual question answering (MeD-VQA) 2018 task. In: CEUR Workshop Proceedings (2018)

    Google Scholar 

  14. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: EMNLP (2014)

    Google Scholar 

  15. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Tech. rep. (1985)

    Google Scholar 

  16. Russakovsky, O., Deng, J., Su, H., et al.: Imagenet large scale visual recognition challenge. In: IJCV, pp. 211–252 (2015)

    Google Scholar 

  17. Schmidhuber, J.: Evolutionary principles in self-referential learning (1987)

    Google Scholar 

  18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)

    Google Scholar 

  19. Yang, Z., He, X., Gao, J., Deng, L., Smola, A.J.: Stacked attention networks for image question answering. In: CVPR (2016)

    Google Scholar 

  20. Zhou, Y., Kang, X., Ren, F.: Employing inception-Resnet-v2 and Bi-LSTM for medical domain visual question answering. In: CEUR Workshop Proceedings (2018)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Thanh-Toan Do .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Nguyen, B.D., Do, TT., Nguyen, B.X., Do, T., Tjiputra, E., Tran, Q.D. (2019). Overcoming Data Limitation in Medical Visual Question Answering. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science(), vol 11767. Springer, Cham. https://doi.org/10.1007/978-3-030-32251-9_57

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-32251-9_57

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32250-2

  • Online ISBN: 978-3-030-32251-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics