Abstract
Machine learning consists of designing algorithms that exploit data (sometimes called observations) to acquire domain knowledge and perform an automated decision-making task. Unlike most conventional computing tasks, learning algorithms are data-dependent, in the sense that they build task-specific models and improve upon them using the data fed to them. Machine learning algorithms are commonly divided into four classes: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Each of these classes has its own interest and peculiarities. In this book, we focus on supervised classification to illustrate and formalize the main concepts of robust machine learning. Most of the robustness techniques we discuss in the book, however, can also be applied to the other classes. In this chapter, we present the fundamentals of supervised learning through the specifics of the supervised classification task, and we review some of the standard optimization algorithms used to solve it.
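To ground these notions, the sketch below trains a linear classifier with stochastic gradient descent on the logistic loss, i.e., the kind of supervised classification pipeline this chapter formalizes. It is a minimal illustration only: the synthetic data, constants, and function names are ours, not the book's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (illustrative only):
# two Gaussian clusters labeled 0 and 1.
n, d = 200, 2
X = np.vstack([rng.normal(-1.0, 1.0, size=(n // 2, d)),
               rng.normal(+1.0, 1.0, size=(n // 2, d))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logistic(theta, x_i, y_i):
    """Gradient of the logistic (cross-entropy) loss on a single sample."""
    return (sigmoid(x_i @ theta) - y_i) * x_i

# Stochastic gradient descent on the empirical risk:
# one model update per (shuffled) training sample.
theta = np.zeros(d)
gamma = 0.1  # step size
for epoch in range(50):
    for i in rng.permutation(n):
        theta -= gamma * grad_logistic(theta, X[i], y[i])

accuracy = np.mean((sigmoid(X @ theta) > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

The data-dependence mentioned above is visible here: the model \(\boldsymbol{\theta}\) is nothing but an accumulation of updates computed from the observations fed to the algorithm.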
Notes
- 1. The animal icons we use in this figure are courtesy of the Freepik team. See the following links for the respective icons: Elephant, Chicken, Mouse and Snake.
- 2. Hint: When \(\lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert > 0\), there exists \(\delta > 0\) such that \(\frac{\psi(\gamma \nabla \mathcal{L}(\boldsymbol{\theta}))}{\gamma \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert} < \frac{1}{2} \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert\) for all \(\gamma\) such that \(\gamma \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert < \delta\). A numerical illustration of the resulting decrease appears after these notes.
- 3. We use the Bachmann–Landau notation \(\mathcal{O}_a(\cdot)\) to describe the asymptotic behavior of a function as \(a \to +\infty\). See section “Notation” for more details.
- 4. This condition is referred to as the second-order sufficient condition for optimality; its standard statement is displayed after these notes.
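As a companion to note 2, the following numerical check illustrates why a small enough step size \(\gamma\) guarantees a decrease of the loss: as \(\gamma\) shrinks, the first-order term \(\gamma \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert^2\) dominates the remainder \(\psi\). The test function and constants are our own choices for illustration, not taken from the book.

```python
import numpy as np

def loss(theta):
    """A simple smooth test function (our choice, for illustration)."""
    return np.sum(theta ** 2) + np.sin(theta[0])

def grad(theta, eps=1e-6):
    """Central finite-difference gradient; accurate enough here."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

theta = np.array([2.0, -1.5])
for gamma in [1.0, 0.1, 0.01]:
    g = grad(theta)
    decrease = loss(theta) - loss(theta - gamma * g)
    # As gamma shrinks, the actual decrease approaches the
    # first-order prediction gamma * ||g||^2.
    print(f"gamma={gamma:5.2f}  decrease={decrease:+.4f}  "
          f"first-order term={gamma * np.dot(g, g):+.4f}")
```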
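For note 4, the condition in question is presumably the standard second-order sufficient condition, stated below in the chapter's notation (a textbook fact, not a quote from the book):

\[ \nabla \mathcal{L}(\boldsymbol{\theta}^\star) = \mathbf{0} \quad \text{and} \quad \nabla^2 \mathcal{L}(\boldsymbol{\theta}^\star) \succ 0 \quad \Longrightarrow \quad \boldsymbol{\theta}^\star \text{ is a strict local minimizer of } \mathcal{L}. \]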
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Guerraoui, R., Gupta, N., Pinot, R. (2024). Basics of Machine Learning. In: Robust Machine Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-97-0688-4_2
DOI: https://doi.org/10.1007/978-981-97-0688-4_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0687-7
Online ISBN: 978-981-97-0688-4
eBook Packages: Computer Science, Computer Science (R0)