Overcoming Data Limitation in Medical Visual Question Answering

Nguyen, Binh D.; Do, Thanh-Toan; Nguyen, Binh X.; Do, Tuong; Tjiputra, Erman; Tran, Quang D.

doi:10.1007/978-3-030-32251-9_57

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11767)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Traditional approaches to Visual Question Answering (VQA) require a large amount of labeled data for training. Unfortunately, such large-scale data is usually not available in the medical domain. In this paper, we propose a novel medical VQA framework that overcomes the labeled data limitation. The proposed framework explores the use of an unsupervised Denoising Auto-Encoder (DAE) and supervised Meta-Learning. The advantage of the DAE is that it leverages the large amount of unlabeled images, while the advantage of Meta-Learning is that it learns meta-weights that quickly adapt to the VQA problem with limited labeled data. By combining the advantages of these techniques, the proposed framework can be trained efficiently using a small labeled training set. The experimental results show that our proposed method significantly outperforms state-of-the-art medical VQA methods. The source code is available at https://github.com/aioz-ai/MICCAI19-MedVQA.
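To make the denoising auto-encoder idea concrete, the following is a minimal sketch, not the authors' implementation (which is in the repository linked above). It assumes a toy setting: synthetic 16-dimensional vectors stand in for unlabeled images, the network has a single tanh hidden layer, and training uses plain gradient descent on a mean-squared reconstruction loss. The names `train_dae`, `encode`, `unlabeled_images`, and all hyperparameters are hypothetical choices for illustration.

```python
import numpy as np


def train_dae(images, hidden_dim=8, noise_std=0.3, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer denoising auto-encoder by gradient descent.

    images: array of shape (n_samples, n_features); no labels are needed.
    Returns the learned parameters and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = images.shape[1]
    params = {
        "W_enc": rng.normal(0.0, 0.1, size=(n_features, hidden_dim)),
        "b_enc": np.zeros(hidden_dim),
        "W_dec": rng.normal(0.0, 0.1, size=(hidden_dim, n_features)),
        "b_dec": np.zeros(n_features),
    }
    losses = []
    for _ in range(epochs):
        # Corrupt the input with Gaussian noise; the target stays clean.
        noisy = images + rng.normal(0.0, noise_std, size=images.shape)
        hidden = np.tanh(noisy @ params["W_enc"] + params["b_enc"])
        recon = hidden @ params["W_dec"] + params["b_dec"]
        err = recon - images
        losses.append(float(np.mean(err ** 2)))

        # Backpropagate the mean-squared error through decoder and encoder.
        grad_recon = 2.0 * err / err.size
        grad_pre = (grad_recon @ params["W_dec"].T) * (1.0 - hidden ** 2)
        grads = {
            "W_dec": hidden.T @ grad_recon,
            "b_dec": grad_recon.sum(axis=0),
            "W_enc": noisy.T @ grad_pre,
            "b_enc": grad_pre.sum(axis=0),
        }
        for name in params:
            params[name] -= lr * grads[name]
    return params, losses


def encode(params, images):
    """Map clean images to hidden features using the trained encoder."""
    return np.tanh(images @ params["W_enc"] + params["b_enc"])


# Synthetic stand-in for unlabeled images: 64 samples built from 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.uniform(0.0, 1.0, size=(64, 2))
basis = rng.uniform(0.0, 1.0, size=(2, 16))
unlabeled_images = latent @ basis / 2.0

params, losses = train_dae(unlabeled_images)
features = encode(params, unlabeled_images)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}; features {features.shape}")
```

In this sketch the encoder output `features` is the part that a downstream model would reuse. How the paper combines such features with meta-learned weights is not specified in the abstract and is not modeled here.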

Notes

1. The descriptions of the newly defined classes are presented in Sect. 4.1.
2. https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
3. https://www.synapse.org/#!Synapse:syn3193805/wiki/217753
4. Those frameworks are complete VQA models whose core components are the SAN and MCB attention mechanisms. We refer the reader to the corresponding papers [5, 19] for the details of those models.

References

1. Abacha, A.B., Gayen, S., Lau, J.J., Rajaraman, S., Demner-Fushman, D.: NLM at ImageCLEF 2018 visual question answering in the medical domain. In: CEUR Workshop Proceedings (2018)
2. Bar, Y., Diamant, I., Wolf, L., Greenspan, H.: Deep learning with non-medical training used for chest pathology identification. In: Medical Imaging: Computer-Aided Diagnosis (2015)
3. Clark, K., Vendt, B., Smith, K., Freymann, J., et al.: The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26(6), 1045–1057 (2013)
4. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML (2017)
5. Fukui, A., Park, D.H., Yang, D., Rohrbach, A., Darrell, T., Rohrbach, M.: Multimodal compact bilinear pooling for visual question answering and visual grounding. In: EMNLP (2016)
6. Hasan, S.A., Ling, Y., Farri, O., Liu, J., Lungren, M., Müller, H.: Overview of the ImageCLEF 2018 medical domain visual question answering task. In: CEUR Workshop Proceedings (2018)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
8. Jifara, W., Jiang, F., Rho, S., Cheng, M., Liu, S.: Medical image denoising using convolutional neural network: a residual learning approach. J. Supercomputing 75, 1–15 (2017). https://doi.org/10.1007/s11227-017-2080-0
9. Kim, J.H., Jun, J., Zhang, B.T.: Bilinear attention networks. In: NIPS (2018)
10. Lau, J.J., Gayen, S., Abacha, A.B., Demner-Fushman, D.: A dataset of clinically generated visual questions and answers about radiology images. Sci. Data 5, 180251 (2018)
11. Maicas, G., Bradley, A.P., Nascimento, J.C., Reid, I., Carneiro, G.: Training medical image analysis systems like radiologists. In: MICCAI (2018)
12. Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: ICANN (2011)
13. Peng, Y., Liu, F., Rosen, M.P.: UMass at ImageCLEF medical visual question answering (MeD-VQA) 2018 task. In: CEUR Workshop Proceedings (2018)
14. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP (2014)
15. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Tech. rep. (1985)
16. Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
17. Schmidhuber, J.: Evolutionary principles in self-referential learning (1987)
18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
19. Yang, Z., He, X., Gao, J., Deng, L., Smola, A.J.: Stacked attention networks for image question answering. In: CVPR (2016)
20. Zhou, Y., Kang, X., Ren, F.: Employing Inception-ResNet-v2 and Bi-LSTM for medical domain visual question answering. In: CEUR Workshop Proceedings (2018)

Author information

Authors and Affiliations

AIOZ Pte Ltd., Singapore, Singapore
Binh D. Nguyen, Binh X. Nguyen, Tuong Do, Erman Tjiputra & Quang D. Tran
University of Liverpool, Liverpool, UK
Thanh-Toan Do

Corresponding author

Correspondence to Thanh-Toan Do.

Editor information

Editors and Affiliations

University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Dinggang Shen
University of Georgia, Athens, GA, USA
Tianming Liu
Western University, London, ON, Canada
Terry M. Peters
Yale University, New Haven, CT, USA
Lawrence H. Staib
University of Strasbourg, Illkirch, France
Caroline Essert
United Imaging Intelligence, Shanghai, China
Sean Zhou
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Pew-Thian Yap
Western University, London, ON, Canada
Ali Khan

About this paper

Cite this paper

Nguyen, B.D., Do, T.-T., Nguyen, B.X., Do, T., Tjiputra, E., Tran, Q.D. (2019). Overcoming Data Limitation in Medical Visual Question Answering. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science, vol 11767. Springer, Cham. https://doi.org/10.1007/978-3-030-32251-9_57

DOI: https://doi.org/10.1007/978-3-030-32251-9_57
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32250-2
Online ISBN: 978-3-030-32251-9
eBook Packages: Computer Science, Computer Science (R0)