Artificial intelligence-assisted interpretation of bone age radiographs improves accuracy and decreases variability
Objective
Radiographic bone age assessment (BAA) is used in the evaluation of pediatric endocrine and metabolic disorders. We previously developed an automated artificial intelligence (AI) algorithm that performs BAA using deep convolutional neural networks. We compared the BAA performance of a cohort of pediatric radiologists with and without AI assistance.
Materials and methods
Six board-certified, subspecialty-trained pediatric radiologists interpreted 280 age- and gender-matched bone age radiographs of patients aged 5 to 18 years. Three of these radiologists then performed BAA with AI assistance. Bone age accuracy and root mean squared error (RMSE) were used as performance measures, and the intraclass correlation coefficient (ICC) was used to evaluate inter-rater variation.
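The abstract does not spell out how these measures are computed. The sketch below shows one plausible reading, assuming that estimates and the reference standard are both bone ages in years, that "accuracy" means exact agreement with the reference bone age, and that "within 1 year" means an absolute error of at most 1 year. The function name `bone_age_metrics` and the example values are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def bone_age_metrics(predicted: Sequence[float], reference: Sequence[float],
                     tolerance: float = 1.0) -> dict[str, float]:
    """Hypothetical helper: exact accuracy, within-tolerance accuracy and RMSE.

    Assumes both sequences hold bone ages in years for the same radiographs,
    in the same order.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("expected two non-empty sequences of equal length")
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    # Exact agreement (assumption): only meaningful if estimates and reference
    # are reported on the same discrete scale, e.g. Greulich-Pyle standard ages.
    accuracy = sum(math.isclose(e, 0.0, abs_tol=1e-9) for e in errors) / n
    # Small epsilon so that an error of exactly 1 year is not lost to
    # floating-point rounding (e.g. 12.1 - 11.1).
    within = sum(abs(e) <= tolerance + 1e-9 for e in errors) / n
    # Root mean squared error, in years.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"accuracy": accuracy, "within_tolerance": within, "rmse": rmse}


if __name__ == "__main__":
    # Made-up example values (years), not data from the study.
    m = bone_age_metrics([10.0, 12.0, 14.0, 15.0], [10.0, 13.0, 14.0, 17.0])
    print(m)  # accuracy 0.5, within_tolerance 0.75, rmse about 1.118
```

Because RMSE squares each error, a single large miss (here, 2 years) raises it more than several small misses do, which is why it is reported alongside the accuracy rates.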
Results
AI BAA accuracy was 68.2% overall and 98.6% within 1 year, and the mean accuracy of the six-reader cohort was 63.6% overall and 97.4% within 1 year. AI RMSE was 0.601 years, while mean single-reader RMSE was 0.661 years. With AI assistance, pooled RMSE decreased from 0.661 to 0.508 years, and each reader's RMSE decreased individually. The ICC was 0.9914 without AI and 0.9951 with AI.
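The abstract does not state which form of the ICC was used. The sketch below assumes the common two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), computed from a table with one row per radiograph and one column per reader; the function name `icc_2_1` is hypothetical. Under this form, values close to 1, such as those reported here, mean that nearly all of the variance in bone age estimates comes from differences between patients rather than from disagreement between readers.

```python
from statistics import fmean
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is reader j's bone age estimate (years) for radiograph i.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table of >= 2 radiographs x >= 2 readers")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]                       # per radiograph
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]  # per reader

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between radiographs
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readers
    ss_err = ss_total - ss_rows - ss_cols                   # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Reader (column) variance counts against agreement in this form.
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For example, `icc_2_1([[5, 5], [7, 7], [9, 9]])` returns 1.0, because the two readers agree exactly on every radiograph. The sketch uses only the standard library; a statistics package would be the more practical choice for real analyses.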
Conclusion
AI assistance improves radiologists' bone age assessment by increasing accuracy and decreasing both variability and RMSE. Radiologists using AI performed better than AI alone, a radiologist alone, or a pooled cohort of experts. This suggests that AI may be best utilized as an adjunct to radiologist interpretation of imaging studies to improve performance.
Keywords: Machine learning; Bone age; Augmented intelligence; Radiographs; Pediatric