
A Comprehensive Survey of Loss Functions in Machine Learning

Published in: Annals of Data Science

Abstract

Loss functions are a central research topic in machine learning: they shape how learning algorithms are constructed and how their performance is improved, and they have accordingly attracted sustained attention from researchers. Yet a systematic summary, analysis, and comparison of the classical loss functions is still lacking. This paper therefore summarizes and analyzes 31 classical loss functions in machine learning. We organize them along two lines: traditional machine learning and deep learning. The former is divided by task type into classification, regression, and unsupervised learning; the latter is subdivided by application scenario, where we focus on object detection and face recognition. Within each task or application, we analyze every loss function in terms of its formula, meaning, graph, and algorithm, and we also compare the loss functions belonging to the same task or application, to deepen understanding and to guide the selection and improvement of loss functions.
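The abstract's split between classification and regression losses can be illustrated with minimal NumPy sketches of three classical examples of the kind such surveys cover: the hinge loss, the binary cross-entropy loss, and the Huber loss. These function names, the `eps` clipping constant, and the `delta` parameter value are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def hinge_loss(y, f):
    """Classification loss: label y in {-1, +1}, raw score f.
    Zero once the margin y*f exceeds 1, linear otherwise."""
    return np.maximum(0.0, 1.0 - y * f)

def cross_entropy_loss(y, p, eps=1e-12):
    """Classification loss: label y in {0, 1}, predicted probability p.
    Clipping avoids log(0) for saturated predictions."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def huber_loss(y, f, delta=1.0):
    """Regression loss: quadratic for small residuals, linear for large
    ones, which makes it more robust to outliers than squared error."""
    r = y - f
    return np.where(np.abs(r) <= delta,
                    0.5 * r ** 2,
                    delta * (np.abs(r) - 0.5 * delta))
```

All three operate elementwise on arrays, so a mean over a batch (e.g. `hinge_loss(y, f).mean()`) gives the empirical risk the survey's formulas refer to.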




Acknowledgements

This work has been partially supported by grants from the Science and Technology Service Network Program of the Chinese Academy of Sciences (STS Program, No. KFJ-STS-ZDTP-060), the National Natural Science Foundation of China (Nos. 71731009, 61472390, 71331005, 91546201), and the Beijing Social Science Foundation Project (No. 17GLB020).


Corresponding author

Correspondence to Yingjie Tian.



Cite this article

Wang, Q., Ma, Y., Zhao, K. et al. A Comprehensive Survey of Loss Functions in Machine Learning. Ann. Data. Sci. 9, 187–212 (2022). https://doi.org/10.1007/s40745-020-00253-5
