
Deep curriculum learning optimization

  • Original Research
  • Published in SN Computer Science

Abstract

We describe a quantitative and practical framework for integrating curriculum learning (CL) into the deep learning training pipeline to improve feature learning in deep feed-forward networks. The framework has several unique characteristics: (1) dynamicity—it proposes a set of batch-level training strategies (syllabi or curricula) that are sensitive to data complexity; (2) adaptivity—it dynamically estimates the effectiveness of a given strategy and performs objective comparisons with alternative strategies, making the method suitable for both practical and research purposes; and (3) a replace–retrain mechanism that is invoked when a strategy is unfit for the task at hand. In addition to these traits, the framework can combine CL with several variants of gradient descent (GD) algorithms and has been used to generate efficient batch-specific or dataset-specific strategies. Comparative studies of various current state-of-the-art vision models, such as FixEfficientNet and BiT-L (ResNet), on several benchmark datasets including CIFAR10 demonstrate the effectiveness of the proposed method. We present results that show training loss reduction by as much as a factor of 5. Additionally, we present a set of practical curriculum strategies to improve the generalization performance of select networks on various datasets.
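
To make the batch-level strategies concrete, the sketch below is a minimal illustration of the idea described above, assuming a Shannon-entropy score as the data-complexity measure: each mini-batch is reordered according to the active curriculum strategy, and the strategy is replaced (the replace–retrain step) when the training loss stops improving. The names complexity_score, apply_strategy, and train_step are hypothetical and are not taken from the authors' released code (Note 1).

# Illustrative sketch only; it approximates the batch-level curriculum idea with
# an entropy-based difficulty score. Any gradient-descent variant (SGD, Adam, ...)
# can supply train_step.
import numpy as np

def complexity_score(image, bins=256):
    """Shannon entropy of the pixel-intensity histogram, used here as one
    possible data-complexity measure."""
    hist, _ = np.histogram(image, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def apply_strategy(batch_x, batch_y, strategy="easy_to_hard"):
    """Reorder one mini-batch according to a curriculum strategy (syllabus)."""
    order = np.argsort([complexity_score(x) for x in batch_x])
    if strategy == "hard_to_easy":
        order = order[::-1]
    elif strategy == "random":
        np.random.shuffle(order)
    return batch_x[order], batch_y[order]

def train_with_curriculum(train_step, batches, strategies, patience=3):
    """Adaptive loop: track the loss under the current strategy and swap it
    (replace-retrain) when it stops reducing the loss."""
    current, best_loss, stale = strategies[0], float("inf"), 0
    for batch_x, batch_y in batches:
        bx, by = apply_strategy(batch_x, batch_y, current)
        loss = train_step(bx, by)  # one optimizer update, returns a scalar loss
        if loss < best_loss:
            best_loss, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:      # current strategy judged unfit: replace it
            current = strategies[(strategies.index(current) + 1) % len(strategies)]
            stale = 0
    return current

Here batch_x and batch_y are assumed to be NumPy arrays; in the paper's setting, train_step would wrap a forward and backward pass of the network under study.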


Notes

  1. https://github.com/h3nok/curriculum_learning_optimization.

  2. https://github.com/tensorflow/models/tree/master/research/slim.


Author information


Corresponding author

Correspondence to Henok Ghebrechristos.

Ethics declarations

Conflict of interest

On behalf of all authors, I state that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Machine Learning in Pattern Analysis” guest edited by Reinhard Klette, Brendan McCane, Gabriella Sanniti di Baja, Palaiahnakote Shivakumara and Liang Wang.


About this article


Cite this article

Ghebrechristos, H., Alaghband, G. Deep curriculum learning optimization. SN COMPUT. SCI. 1, 245 (2020). https://doi.org/10.1007/s42979-020-00251-7

