CASED: Curriculum Adaptive Sampling for Extreme Data Imbalance

  • Andrew Jesson
  • Nicolas Guizard
  • Sina Hamidi Ghalehjegh
  • Damien Goblot
  • Florian Soudan
  • Nicolas Chapados
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10435)


We introduce CASED, a novel curriculum sampling algorithm that facilitates the optimization of deep learning segmentation or detection models on data sets with extreme class imbalance. We evaluate the CASED learning framework on the task of lung nodule detection in chest CT. In contrast to two-stage solutions, wherein nodule candidates are first proposed by a segmentation model and then refined by a second detection stage, CASED improves the training of deep nodule segmentation models (e.g., UNet) to the point where state-of-the-art results are achieved using only a trivial detection stage. CASED improves the optimization of deep segmentation models by allowing them to first learn how to distinguish nodules from their immediate surroundings, while continuously adding a greater proportion of difficult-to-classify global context, until training patches are sampled uniformly from the empirical data distribution. Using CASED during training yields a minimalist solution to the lung nodule detection problem that tops the LUNA16 nodule detection benchmark with an average sensitivity score of 88.35%. Furthermore, we find that models trained using CASED are robust to nodule annotation quality: comparable results can be achieved when only a point and radius for each ground truth nodule are provided during training. Finally, the CASED learning framework makes no assumptions with regard to imaging modality or segmentation target and should generalize to other medical imaging problems where class imbalance is persistent.
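The curriculum described in the abstract can be illustrated with a minimal sketch: early in training, patches are drawn around annotated nodules, and the share of patches drawn uniformly from the whole volume grows over time. The function names, the geometric decay schedule, and the `decay` parameter below are hypothetical choices made for illustration; the abstract does not specify the schedule, so this should not be read as the authors' implementation.

```python
import random


def nodule_patch_probability(step, decay=0.999):
    """Probability of drawing a nodule-centred patch at a training step.

    Assumption: a simple geometric decay from 1.0 towards 0.0. The
    schedule used in the paper may differ.
    """
    return decay ** step


def sample_patch_center(volume_shape, nodule_centers, step, rng, decay=0.999):
    """Pick the centre voxel of the next training patch.

    With probability p(step) the centre is an annotated nodule location;
    otherwise it is drawn uniformly from every voxel in the volume, which
    approximates sampling from the empirical data distribution.
    """
    p = nodule_patch_probability(step, decay)
    if nodule_centers and rng.random() < p:
        return rng.choice(nodule_centers), "nodule"
    center = tuple(rng.randrange(dim) for dim in volume_shape)
    return center, "uniform"


if __name__ == "__main__":
    rng = random.Random(0)
    shape = (64, 128, 128)                      # hypothetical CT volume size
    nodules = [(10, 20, 30), (40, 100, 90)]     # hypothetical nodule centres
    for step in (0, 1000, 5000):
        draws = [sample_patch_center(shape, nodules, step, rng)[1]
                 for _ in range(1000)]
        share = draws.count("nodule") / len(draws)
        print(f"step {step}: nodule-centred share = {share:.2f}")
```

At step 0 every patch is nodule-centred, so the model sees nodules against their immediate surroundings; as `step` grows the sampler approaches plain uniform sampling, where nodule voxels are extremely rare.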


Lung cancer · Computer aided detection · Nodule detection · Curriculum learning · Data imbalance · 3D convolutional neural networks



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Andrew Jesson (1)
  • Nicolas Guizard (1)
  • Sina Hamidi Ghalehjegh (1)
  • Damien Goblot (1)
  • Florian Soudan (1)
  • Nicolas Chapados (1)

  1. Imagia Cybernetics Inc., Montreal, Canada
