Effect of Training Data Volume on Performance of Convolutional Neural Network Pneumothorax Classifiers

Thian, Yee Liang; Ng, Dian Wen; Hallinan, James Thomas Patrick Decourcy; Jagmohan, Pooja; Sia, Soon Yiew; Mohamed, Jalila Sayed Adnan; Quek, Swee Tian; Feng, Mengling

doi:10.1007/s10278-022-00594-y

Published: 03 March 2022

Volume 35, pages 881–892, (2022)

Journal of Digital Imaging

Yee Liang Thian¹ (ORCID: orcid.org/0000-0001-9899-205X),
Dian Wen Ng¹,²,
James Thomas Patrick Decourcy Hallinan¹,
Pooja Jagmohan¹,
Soon Yiew Sia¹,
Jalila Sayed Adnan Mohamed¹,³,
Swee Tian Quek¹ &
Mengling Feng²

Abstract

Large datasets with high-quality labels required to train deep neural networks are challenging to obtain in the radiology domain. This work investigates the effect of training dataset size on the performance of deep learning classifiers, focusing on chest radiograph pneumothorax detection as a proxy visual task in the radiology domain. Two open-source datasets (ChestX-ray14 and CheXpert) comprising 291,454 images were merged, and convolutional neural networks were trained with stepwise increases in training dataset size. Model iterations at each dataset volume were evaluated on an external test set of 525 emergency department chest radiographs. Learning curve analysis was performed to fit the observed AUCs for all models generated. For all three network architectures tested, model AUCs and accuracy increased rapidly from 2 × 10³ to 20 × 10³ training samples, with a more gradual increase until the maximum training dataset size of 291 × 10³ images. AUCs for models trained with the maximum tested dataset size of 291 × 10³ images were significantly higher than for models trained with 20 × 10³ images: ResNet-50: AUC_20k = 0.86, AUC_291k = 0.95, p < 0.001; DenseNet-121: AUC_20k = 0.85, AUC_291k = 0.93, p < 0.001; EfficientNet: AUC_20k = 0.92, AUC_291k = 0.98, p < 0.001. Our study established learning curves describing the relationship between training dataset size and model performance of deep learning convolutional neural networks applied to a typical radiology binary classification task. These curves suggest a point of diminishing performance returns for increasing training data volumes, which algorithm developers should consider given the high costs of obtaining and labelling radiology data.
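As an illustration of what a learning curve analysis of this kind can look like, the following minimal sketch fits an inverse power-law curve, AUC(n) = ceiling − b·n^(−c), to AUC values observed at increasing training set sizes. The abstract does not state which functional form the authors fitted, so the power-law form, the fixed ceiling of 1.0, and the function names `fit_power_law` and `predict_auc` are assumptions made here for illustration. The data points are synthetic, generated from known parameters, and are not results from the study.

```python
import math


def predict_auc(n, b, c, ceiling=1.0):
    """Assumed learning-curve form: AUC(n) = ceiling - b * n**(-c)."""
    return ceiling - b * n ** (-c)


def fit_power_law(sizes, aucs, ceiling=1.0):
    """Fit b and c by ordinary least squares in log-log space.

    Taking logs of (ceiling - AUC) = b * n**(-c) gives a straight line:
    log(ceiling - AUC) = log(b) - c * log(n).
    Every AUC must be strictly below the assumed ceiling.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(ceiling - auc) for auc in aucs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope  # (b, c)


# Synthetic, noise-free data from known parameters (NOT values from the study).
TRUE_B, TRUE_C = 2.0, 0.3
sizes = [2_000, 5_000, 10_000, 20_000, 50_000, 100_000, 291_000]
aucs = [predict_auc(n, TRUE_B, TRUE_C) for n in sizes]

b, c = fit_power_law(sizes, aucs)

# Diminishing returns: compare the AUC gain over the first tenfold increase
# in data with the gain over the remaining, larger increase.
gain_early = predict_auc(20_000, b, c) - predict_auc(2_000, b, c)
gain_late = predict_auc(291_000, b, c) - predict_auc(20_000, b, c)

print(f"b = {b:.3f}, c = {c:.3f}")
print(f"AUC gain 2k to 20k: {gain_early:.3f}; 20k to 291k: {gain_late:.3f}")
```

Under this assumed form, the fitted exponent c controls how quickly additional data stops paying off: the gain from 2 × 10³ to 20 × 10³ samples exceeds the gain from 20 × 10³ to 291 × 10³, mirroring the qualitative pattern described in the abstract. With real, noisy AUCs the ceiling would normally be fitted as a third parameter rather than fixed at 1.0.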

Availability of Data and Material

Open-source training data as described in the “Materials and Methods” section.

Code Availability

Available on GitHub.

Funding

This research was supported by the NUHS Internal Grant Funding under NUHS Seed Fund (NUHSRO/2018/097/R05 + 5/Seed-Nov/07), NUHS-NHIC Joint MedTech Grant (NUHS-NHIC MT2020-02), NUHSRO/2018/019/RO5 + 5/NUHS), and NMRC Health Service Research Grant (HSRG-OC17nov004).

Author information

Yee Liang Thian and Dian Wen Ng contributed equally to this work and share co-first authorship.

Authors and Affiliations

Department of Diagnostic Imaging, National University Hospital, 5 Lower Kent Ridge Rd, Queenstown, 119074, Singapore
Yee Liang Thian, Dian Wen Ng, James Thomas Patrick Decourcy Hallinan, Pooja Jagmohan, Soon Yiew Sia, Jalila Sayed Adnan Mohamed & Swee Tian Quek
Saw Swee Hock School of Public Health, School of Computer Science, Yong Loo Lin School of Medicine, National University of Singapore, 12 Science Drive 2, #10-01, Queenstown, 117549, Singapore
Dian Wen Ng & Mengling Feng
Salmaniya Medical Complex Rd 2904, Manama, Bahrain
Jalila Sayed Adnan Mohamed

Contributions

Study design, YLT, DWN, MF; data acquisition, YLT, JTPDH, PJ, SYS, JSAM, QST; data analysis, YLT, DWN, JTPDH, MF; literature search, YLT, DWN, MF; clinical studies, YLT, DWN, JTPDH, PJ, SYS, JSAM, QST, MF; statistical analysis, YLT, DWN, MF; manuscript editing, YLT, DWN, JTPDH, PJ, SYS, JSAM, QST, MF.

Corresponding author

Correspondence to Yee Liang Thian.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Thian, Y.L., Ng, D.W., Hallinan, J.T.P.D. et al. Effect of Training Data Volume on Performance of Convolutional Neural Network Pneumothorax Classifiers. J Digit Imaging 35, 881–892 (2022). https://doi.org/10.1007/s10278-022-00594-y

Received: 02 June 2021
Revised: 27 December 2021
Accepted: 20 January 2022
Published: 03 March 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s10278-022-00594-y
