Journal of Digital Imaging, Volume 31, Issue 4, pp 513–519

MABAL: a Novel Deep-Learning Architecture for Machine-Assisted Bone Age Labeling

  • Simukayi Mutasa
  • Peter D. Chang
  • Carrie Ruzal-Shapiro
  • Rama Ayyala


Bone age assessment (BAA) is a commonly performed diagnostic study in pediatric radiology to assess skeletal maturity. The most commonly used method for BAA is the Greulich and Pyle atlas (Pediatr Radiol 46.9:1269–1274, 2016; Arch Dis Child 81.2:172–173, 1999). BAA can be a tedious and time-consuming process for the radiologist. As such, several computer-assisted detection/diagnosis (CAD) methods have been proposed to automate it. Classical CAD tools have traditionally relied on hard-coded algorithmic features, which suffer from a variety of drawbacks. Recently, convolutional neural networks (CNNs) have shown promise in a variety of medical imaging applications. At least two applications of deep learning to the evaluation of bone age have been published (Med Image Anal 36:41–51, 2017; JDI 1–5, 2017). However, current implementations are limited by a combination of architecture design and relatively small datasets. The purpose of this study is to demonstrate the benefits of a customized neural network algorithm carefully calibrated to the evaluation of bone age, using a relatively large institutional dataset. In doing so, this study aims to show that advanced architectures can be successfully trained from scratch in the medical imaging domain and can generate results that outperform any existing proposed algorithm. The training data consisted of 10,289 images of different skeletal age examinations, 8909 from the hospital Picture Archiving and Communication System at our institution and 1383 from the public Digital Hand Atlas Database. The data were separated into four cohorts, one each for male and female children above the age of 8, and one each for male and female children below the age of 10. The testing set consisted of 20 radiographs for each 1-year age cohort, from 0–1 years to 14–15+ years, half male and half female.
The testing set included left-hand radiographs done for bone age assessment, trauma evaluation without significant findings, and skeletal surveys. A customized neural network with 14 hidden layers was designed for this study. The network included several state-of-the-art techniques, including residual-style connections, inception layers, and spatial transformer layers. Data augmentation was applied to the network inputs to prevent overfitting. A linear regression output was used. Mean square error was used as the network loss function, and mean absolute error (MAE) was used as the primary performance metric. MAE on the validation and test sets for young females was 0.654 and 0.561, respectively. For older females, validation and test MAE were 0.662 and 0.497, respectively. For young males, validation and test MAE were 0.649 and 0.585, respectively. Finally, for older males, validation and test MAE were 0.581 and 0.501, respectively. The female cohorts were trained for 900 epochs each, and the male cohorts were trained for 600 epochs. Eightfold cross-validation was employed for hyperparameter tuning. Test error was obtained after training on the full data set with the selected hyperparameters. Using our proposed customized neural network architecture on our large available data, we achieved aggregate validation and test set mean absolute errors of 0.637 and 0.536, respectively. To date, this is the best published performance for deep learning applied to bone age assessment. Our results support our initial hypothesis that customized, purpose-built neural networks provide improved performance over networks derived from pre-trained imaging data sets. We build on that initial work by showing that the addition of state-of-the-art techniques such as residual connections and inception architecture further improves prediction accuracy.
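The loss, metric, and cross-validation scheme named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the folds are drawn as contiguous, unshuffled blocks (the abstract does not say how the folds were formed), and the aggregate is computed as an unweighted mean of the four cohort values, which is an assumption that happens to be consistent with the reported 0.637 and 0.536.

```python
from statistics import mean


def mean_squared_error(y_true, y_pred):
    """Mean square error, the training loss named in the abstract."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error (MAE), the primary performance metric."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def kfold_indices(n_samples, k=8):
    """Yield (train_idx, val_idx) for k contiguous folds.

    Assumption: no shuffling or stratification; a real pipeline
    would usually shuffle the cases before splitting.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Per-cohort MAE values as reported in the abstract.
validation_mae = {
    "young_female": 0.654,
    "older_female": 0.662,
    "young_male": 0.649,
    "older_male": 0.581,
}
test_mae = {
    "young_female": 0.561,
    "older_female": 0.497,
    "young_male": 0.585,
    "older_male": 0.501,
}

# Assumption: the aggregate is the unweighted mean over the four cohorts.
aggregate_validation = mean(validation_mae.values())
aggregate_test = mean(test_mae.values())
```

Under this reading, the hyperparameters would be chosen by averaging the validation MAE over the eight folds, after which a single model is retrained on all training cases and scored once on the held-out test set.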
This is important because the current assumption for use of residual and/or inception architectures is that a large pre-trained network is required for successful implementation, given the relatively small datasets in medical imaging. Instead, we show that a small, customized architecture incorporating advanced CNN strategies can indeed be trained from scratch, yielding significant improvements in algorithm accuracy. It should be noted that for all four cohorts, test error was lower than validation error. One reason for this is that the ground truth for our test set was obtained by averaging two pediatric radiologist reads, whereas only a single read was used for our training data. This suggests that despite relatively noisy training data, the algorithm could successfully model the variation between observers and generate estimates that are close to the expected ground truth.
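The effect of averaging two reads can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the study: each reader is modeled as the true bone age plus independent Gaussian noise, and the noise level `reader_sd`, the age range, and the function name are all hypothetical.

```python
import random
from statistics import mean


def simulate_reader_noise(n_cases=20000, reader_sd=0.5, seed=0):
    """Compare label noise for a single read versus the mean of two reads.

    Assumptions (not from the source): reads are unbiased, independent,
    and Gaussian with standard deviation `reader_sd`.
    Returns (mean abs error of a single read, mean abs error of the average).
    """
    rng = random.Random(seed)
    single_err = []
    averaged_err = []
    for _ in range(n_cases):
        true_age = rng.uniform(0.0, 15.0)  # hypothetical latent bone age
        read_a = true_age + rng.gauss(0.0, reader_sd)
        read_b = true_age + rng.gauss(0.0, reader_sd)
        single_err.append(abs(read_a - true_age))
        averaged_err.append(abs((read_a + read_b) / 2 - true_age))
    return mean(single_err), mean(averaged_err)


single, averaged = simulate_reader_noise()
```

Under these assumptions the averaged label sits closer to the true value by a factor of about 1/sqrt(2), so a model scored against averaged reads would show a lower MAE than the same model scored against single reads, even with no change in the model itself.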


Keywords: Machine learning, Deep learning, Convolutional neural networks, Radiology, Pediatric radiology, Endocrinology


  1. Breen MA et al.: Bone age assessment practices in infants and older children among Society for Pediatric Radiology members. Pediatr Radiol 46(9):1269–1274, 2016
  2. Bull RK et al.: Bone age assessment: a large scale comparison of the Greulich and Pyle, and Tanner and Whitehouse (TW2) methods. Arch Dis Child 81(2):172–173, 1999
  3. Thodberg HH, Sävendahl L: Validation and reference values of automated bone age determination for four ethnicities. Acad Radiol 17(11):1425–1432, 2010
  4. Ontell FK et al.: Bone age in children of diverse ethnicity. AJR Am J Roentgenol 167(6):1395–1398, 1996
  5. Berst MJ et al.: Effect of knowledge of chronologic age on the variability of pediatric bone age determined using the Greulich and Pyle standards. Am J Roentgenol 176(2):507–510, 2001
  6. Spampinato C et al.: Deep learning for automated skeletal bone age assessment in X-ray images. Med Image Anal 36:41–51, 2017
  7. Thodberg HH et al.: The BoneXpert method for automated determination of skeletal maturity. IEEE Trans Med Imaging 28(1):52–66, 2009
  8. Lee H et al.: Fully Automated Deep Learning System for Bone Age Assessment. J Digit Imaging (2017): 1–15
  9. Shen W, Zhou M, Yang F, Yang C, Tian J: Multi-scale Convolutional Neural Networks for Lung Nodule Classification. Inf Process Med Imaging 24:588–599, 2015
  10. Anthimopoulos M, Christodoulidis S, Ebner L, Christe A, Mougiakakou S: Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network. IEEE Trans Med Imaging [Internet]. 2016 May [cited 2017 Jul 2];35(5):1207–16. Available from:
  11. Kooi T, Litjens G, van Ginneken B, Gubern-Mérida A, Sánchez CI, Mann R, et al: Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal [Internet]. Elsevier; 2017 Jan 1 [cited 2017 Sep 18]; 35:303–12. Available from:
  12. Dou Q, Chen H, Yu L, Zhao L, Qin J, Wang D, et al: Automatic Detection of Cerebral Microbleeds From MR Images via 3D Convolutional Neural Networks. IEEE Trans Med Imaging [Internet]. 2016 May [cited 2017 Jul 2];35(5):1182–95. Available from:
  13. Pereira S, Pinto A, Alves V, Silva CA: Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images. IEEE Trans Med Imaging [Internet]. 2016 May [cited 2017 Jul 2];35(5):1240–51. Available from:
  14. Chang PD: Fully Convolutional Deep Residual Neural Networks for Brain Tumor Segmentation. In: Crimi A, Menze B, Maier O, Reyes M, Winzeck S, Handels H, editors. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: Second International Workshop, BrainLes 2016, with the Challenges on BRATS, ISLES and mTOP 2016, Held in Conjunction with MICCAI 2016, Athens, Greece, October 17, 2016, Revised [Internet]. Cham: Springer International Publishing; 2016. p. 108–18. Available from:
  15. Wang J, Fang Z, Lang N, Yuan H, Su MY, Baldi P: A multi-resolution approach for spinal metastasis detection using deep Siamese neural networks. Comput Biol Med 84:137–146, 2017
  16. Cao F et al.: Digital hand atlas and web-based bone age assessment: system design and implementation. Comput Med Imaging Graph 24(5):297–307, 2000
  17. LeCun Y et al.: Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324, 1998
  18. He K et al.: Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016
  19. Szegedy C et al.: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. AAAI. 2017
  20. Szegedy C et al.: Going deeper with convolutions. Proceedings of the IEEE conference on computer vision and pattern recognition. 2015
  21. Jaderberg M, Simonyan K, Zisserman A: Spatial transformer networks. Adv Neural Inf Process Syst 2015
  22. Kingma D, Ba J: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014
  23. Nesterov Y: Gradient methods for minimizing composite objective function. 2007
  24. Dozat T: Incorporating nesterov momentum into Adam. 2016
  25. Glorot X, Bengio Y: Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 2010
  26. Srivastava N et al.: Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958, 2014
  27. Ioffe S, Szegedy C: Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning. 2015
  28. Maclaurin D, Duvenaud D, Adams R: Gradient-based hyperparameter optimization through reversible learning. International Conference on Machine Learning. 2015
  29. LeCun YA et al.: Efficient backprop. Neural networks: Tricks of the trade. Springer Berlin Heidelberg, 2012. 9–48
  30. Bengio Y: Practical recommendations for gradient-based training of deep architectures. Neural networks: Tricks of the trade. Berlin Heidelberg: Springer, 2012, pp. 437–478
  31. Sun C et al.: Revisiting unreasonable effectiveness of data in deep learning era. arXiv preprint arXiv:1707.02968 2017

Copyright information

© Society for Imaging Informatics in Medicine 2018

Authors and Affiliations

  1. Columbia University Medical Center, New York, USA
