
Deep curriculum learning optimization

  • Original Research
  • Published in SN Computer Science

Abstract

We describe a quantitative and practical framework for integrating curriculum learning (CL) into the deep learning training pipeline to improve feature learning in deep feed-forward networks. The framework has several unique characteristics: (1) dynamicity—it proposes a set of batch-level training strategies (syllabi or curricula) that are sensitive to data complexity; (2) adaptivity—it dynamically estimates the effectiveness of a given strategy and performs objective comparisons with alternative strategies, making the method suitable for both practical and research purposes; and (3) a replace–retrain mechanism that is invoked when a strategy is unfit for the task at hand. In addition to these traits, the framework can combine CL with several variants of gradient descent (GD) algorithms and has been used to generate efficient batch-specific or dataset-specific strategies. Comparative studies of various current state-of-the-art vision models, such as FixEfficientNet and BiT-L (ResNet), on several benchmark datasets including CIFAR10 demonstrate the effectiveness of the proposed method. We present results that show training loss reduction by as much as a factor of 5. Additionally, we present a set of practical curriculum strategies to improve the generalization performance of select networks on various datasets.
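
To make the batch-level strategies concrete, the sketch below is a minimal illustration of the idea described above, assuming a Shannon-entropy score as the data-complexity measure: each mini-batch is reordered according to the active curriculum strategy, and the strategy is replaced (the replace–retrain step) when the training loss stops improving. The names complexity_score, apply_strategy, and train_step are hypothetical and are not taken from the authors' released code (Note 1).

# Illustrative sketch only; it approximates the batch-level curriculum idea with
# an entropy-based difficulty score. Any gradient-descent variant (SGD, Adam, ...)
# can supply train_step.
import numpy as np

def complexity_score(image, bins=256):
    """Shannon entropy of the pixel-intensity histogram, used here as one
    possible data-complexity measure."""
    hist, _ = np.histogram(image, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def apply_strategy(batch_x, batch_y, strategy="easy_to_hard"):
    """Reorder one mini-batch according to a curriculum strategy (syllabus)."""
    order = np.argsort([complexity_score(x) for x in batch_x])
    if strategy == "hard_to_easy":
        order = order[::-1]
    elif strategy == "random":
        np.random.shuffle(order)
    return batch_x[order], batch_y[order]

def train_with_curriculum(train_step, batches, strategies, patience=3):
    """Adaptive loop: track the loss under the current strategy and swap it
    (replace-retrain) when it stops reducing the loss."""
    current, best_loss, stale = strategies[0], float("inf"), 0
    for batch_x, batch_y in batches:
        bx, by = apply_strategy(batch_x, batch_y, current)
        loss = train_step(bx, by)  # one optimizer update, returns a scalar loss
        if loss < best_loss:
            best_loss, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:      # current strategy judged unfit: replace it
            current = strategies[(strategies.index(current) + 1) % len(strategies)]
            stale = 0
    return current

Here batch_x and batch_y are assumed to be NumPy arrays; in the paper's setting, train_step would wrap a forward and backward pass of the network under study.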


Notes

  1. https://github.com/h3nok/curriculum_learning_optimization.

  2. https://github.com/tensorflow/models/tree/master/research/slim.


Author information


Corresponding author

Correspondence to Henok Ghebrechristos.

Ethics declarations

Conflict of interest

On behalf of all authors, I state that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Machine Learning in Pattern Analysis” guest edited by Reinhard Klette, Brendan McCane, Gabriella Sanniti di Baja, Palaiahnakote Shivakumara and Liang Wang.


About this article


Cite this article

Ghebrechristos, H., Alaghband, G. Deep curriculum learning optimization. SN COMPUT. SCI. 1, 245 (2020). https://doi.org/10.1007/s42979-020-00251-7

