Abstract
We address the task of automatically generating a medical report from chest X-rays. Many authors have proposed deep learning models for this task, but they focus mainly on improving NLP metrics, such as BLEU and CIDEr, which are not suitable for measuring clinical correctness in medical reports. In this work, we propose CNN-TRG, a Template-based Report Generation model that detects a set of abnormalities and verbalizes them via fixed sentences. It is much simpler than other state-of-the-art NLG methods and achieves better results on clinical correctness metrics.
We benchmark our model on the IU X-ray and MIMIC-CXR datasets against naive baselines as well as deep learning-based models, using the CheXpert labeler and MIRQI to evaluate clinical correctness and NLP metrics as a secondary evaluation. We also provide further evidence that traditional NLP metrics are not suitable for this task by showing their lack of robustness in multiple cases. We show that slightly altering a template-based model can increase NLP metrics considerably while maintaining high clinical performance. Our work contributes a simple but effective approach for chest X-ray report generation and supports a model evaluation focused primarily on clinical correctness metrics and secondarily on NLP metrics.
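The template idea described above can be illustrated with a minimal sketch. It assumes a classifier has already produced one probability per abnormality; the label names, the template sentences, the `TEMPLATES` table, the `generate_report` function, and the 0.5 threshold are all hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical templates: one fixed sentence per abnormality and per outcome.
# Labels and wording are illustrative assumptions, not the paper's templates.
TEMPLATES: Dict[str, Dict[bool, str]] = {
    "Cardiomegaly": {
        True: "The heart is enlarged.",
        False: "The heart size is normal.",
    },
    "Pleural Effusion": {
        True: "There is pleural effusion.",
        False: "No pleural effusion.",
    },
    "Pneumothorax": {
        True: "There is pneumothorax.",
        False: "No pneumothorax.",
    },
}


def generate_report(probabilities: Dict[str, float], threshold: float = 0.5) -> str:
    """Turn per-abnormality probabilities into a report made of fixed sentences.

    A label counts as present when its probability reaches the threshold
    (an assumed decision rule); labels missing from the input count as absent.
    """
    sentences: List[str] = []
    for label, by_presence in TEMPLATES.items():
        present = probabilities.get(label, 0.0) >= threshold
        sentences.append(by_presence[present])
    return " ".join(sentences)


# Example classifier output (made-up numbers).
report = generate_report(
    {"Cardiomegaly": 0.91, "Pleural Effusion": 0.12, "Pneumothorax": 0.03}
)
print(report)
```

Because every sentence comes from a fixed table, the report text is fully determined by the detected labels, which is what makes the approach easy to audit against a clinical labeler.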
Acknowledgments
This work was partially funded by ANID, Millennium Science Initiative Program, Code ICN17_002 and by ANID, Fondecyt grant 1191791.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Pino, P., Parra, D., Besa, C., Lagos, C. (2021). Clinically Correct Report Generation from Chest X-Rays Using Templates. In: Lian, C., Cao, X., Rekik, I., Xu, X., Yan, P. (eds) Machine Learning in Medical Imaging. MLMI 2021. Lecture Notes in Computer Science, vol 12966. Springer, Cham. https://doi.org/10.1007/978-3-030-87589-3_67
DOI: https://doi.org/10.1007/978-3-030-87589-3_67
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87588-6
Online ISBN: 978-3-030-87589-3
eBook Packages: Computer Science (R0)