Generalizable Inter-Institutional Classification of Abnormal Chest Radiographs Using Efficient Convolutional Neural Networks


Our objective is to evaluate the effectiveness of efficient convolutional neural networks (CNNs) for abnormality detection in chest radiographs and investigate the generalizability of our models on data from independent sources. We used the National Institutes of Health ChestX-ray14 (NIH-CXR) and the Rhode Island Hospital chest radiograph (RIH-CXR) datasets in this study. Both datasets were split into training, validation, and test sets. The DenseNet and MobileNetV2 CNN architectures were used to train models on each dataset to classify chest radiographs into normal or abnormal categories; models trained on NIH-CXR were designed to also predict the presence of 14 different pathological findings. Models were evaluated on both NIH-CXR and RIH-CXR test sets based on the area under the receiver operating characteristic curve (AUROC). DenseNet and MobileNetV2 models achieved AUROCs of 0.900 and 0.893 for normal versus abnormal classification on NIH-CXR and AUROCs of 0.960 and 0.951 on RIH-CXR. For the 14 pathological findings in NIH-CXR, MobileNetV2 achieved an AUROC within 0.03 of DenseNet for each finding, with an average difference of 0.01. When externally validated on independently collected data (e.g., RIH-CXR-trained models on NIH-CXR), model AUROCs decreased by 3.6–5.2% relative to their locally trained counterparts. MobileNetV2 achieved comparable performance to DenseNet in our analysis, demonstrating the efficacy of efficient CNNs for chest radiograph abnormality detection. In addition, models were able to generalize to external data albeit with performance decreases that should be taken into consideration when applying models on data from different institutions.
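The two quantities the abstract relies on, AUROC and the relative drop in AUROC under external validation, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names `auroc` and `relative_decrease_pct` are hypothetical, the labels and scores are toy values, and the formula for the percentage decrease is an assumption about how "relative to their locally trained counterparts" was computed.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen abnormal (1) case
    scores higher than a randomly chosen normal (0) case; ties count half.
    This is the Mann-Whitney U formulation, O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_decrease_pct(local_auroc, external_auroc):
    """Percentage drop of the externally validated AUROC relative to the
    locally trained model's AUROC (assumed definition, see lead-in)."""
    return 100.0 * (local_auroc - external_auroc) / local_auroc


# Toy example: 1 = abnormal, 0 = normal; scores are made-up model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75

# Illustrative AUROC values only, not results reported in the paper.
print(relative_decrease_pct(0.90, 0.855))
```

The pairwise loop is chosen for readability; on test sets of realistic size a rank-based implementation would be the more practical choice.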


Author information

Corresponding author

Correspondence to Ian Pan.

Cite this article

Pan, I., Agarwal, S. & Merck, D. Generalizable Inter-Institutional Classification of Abnormal Chest Radiographs Using Efficient Convolutional Neural Networks. J Digit Imaging 32, 888–896 (2019).

  • Convolutional neural networks
  • Deep learning
  • Generalizability
  • Chest radiographs
  • Classification