
Basics of Machine Learning

Chapter in: Robust Machine Learning

Abstract

Machine learning consists of designing algorithms that exploit data (sometimes called observations) to acquire domain knowledge and perform an automated decision-making task. Contrary to most conventional computing tasks, learning algorithms are data-dependent, in the sense that they build task-specific models and improve upon them using the data fed to them. Machine learning algorithms are commonly divided into four classes: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Each of these classes has its own applications and peculiarities. In this book, we focus on supervised classification to illustrate and formalize the main concepts of robust machine learning. Most of the robustness techniques we discuss in the book, however, can also be applied to the other classes of machine learning. In this chapter, we present the fundamentals of supervised learning through the specifics of the supervised classification task, and we review some of the standard optimization algorithms used to solve this task.
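
To make the setup concrete, here is a minimal, self-contained sketch, written for illustration rather than taken from the chapter: a logistic-regression classifier trained by plain gradient descent on toy data. All names and hyperparameters (e.g., `learning_rate`, the number of steps) are our own choices.

```python
import numpy as np

# Toy binary classification data: two Gaussian blobs in 2D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(theta, X, y):
    """Average logistic loss and its gradient with respect to theta."""
    p = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

# Plain (full-batch) gradient descent.
theta = np.zeros(X.shape[1])
learning_rate = 0.1
for step in range(200):
    loss, grad = loss_and_grad(theta, X, y)
    theta -= learning_rate * grad

accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == y)
print(f"final loss {loss:.3f}, train accuracy {accuracy:.2f}")
```

Replacing the full gradient with a gradient computed on a random mini-batch of the data turns this loop into stochastic gradient descent (SGD), of the kind the chapter reviews among standard optimization algorithms.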


Notes

  1. The animal icons we use in this figure are courtesy of the Freepik team. See the following links for the respective icons: Elephant, Chicken, Mouse, and Snake.

  2. Hint: When \(\lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert > 0\), there exists \(\delta > 0\) such that \(\frac{\psi(\gamma \nabla \mathcal{L}(\boldsymbol{\theta}))}{\gamma \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert} < \frac{1}{2} \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert\) for all \(\gamma\) such that \(\gamma \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert < \delta\).
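
     One way to unpack this hint, as a sketch under an assumption: we read \(\psi\) as the first-order Taylor remainder of \(\mathcal{L}\), that is, \(\psi(\boldsymbol{v}) = \mathcal{L}(\boldsymbol{\theta} - \boldsymbol{v}) - \mathcal{L}(\boldsymbol{\theta}) + \langle \nabla \mathcal{L}(\boldsymbol{\theta}), \boldsymbol{v} \rangle\) (the chapter body fixes the exact definition). Under that reading, the hint gives \(\psi(\gamma \nabla \mathcal{L}(\boldsymbol{\theta})) < \frac{\gamma}{2} \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert^2\), so

     \[
     \mathcal{L}\big(\boldsymbol{\theta} - \gamma \nabla \mathcal{L}(\boldsymbol{\theta})\big) = \mathcal{L}(\boldsymbol{\theta}) - \gamma \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert^2 + \psi\big(\gamma \nabla \mathcal{L}(\boldsymbol{\theta})\big) < \mathcal{L}(\boldsymbol{\theta}) - \frac{\gamma}{2} \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert^2
     \]

     whenever \(\gamma \lVert \nabla \mathcal{L}(\boldsymbol{\theta}) \rVert < \delta\): a small enough gradient step strictly decreases the loss.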

  3. We use the Bachmann–Landau notation \(\mathcal{O}_a(\cdot)\) to describe the asymptotic behavior of a function as \(a \to +\infty\). See the section "Notation" for more details.
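
     For reference, the usual convention behind this notation reads as follows (the section "Notation" remains authoritative):

     \[
     f(a) = \mathcal{O}_a\big(g(a)\big) \quad \Longleftrightarrow \quad \exists\, C > 0,\ a_0 > 0 : \ |f(a)| \le C\,|g(a)| \ \text{ for all } a \ge a_0.
     \]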

  4. This condition is referred to as the second-order sufficient condition for optimality.
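
     For a twice-differentiable loss \(\mathcal{L}\), the standard statement of this condition, recalled here for completeness, is:

     \[
     \nabla \mathcal{L}(\boldsymbol{\theta}^\star) = 0 \quad \text{and} \quad \nabla^2 \mathcal{L}(\boldsymbol{\theta}^\star) \succ 0 \quad \Longrightarrow \quad \boldsymbol{\theta}^\star \text{ is a strict local minimum of } \mathcal{L},
     \]

     where \(\succ 0\) means the Hessian is positive definite.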



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Guerraoui, R., Gupta, N., Pinot, R. (2024). Basics of Machine Learning. In: Robust Machine Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-97-0688-4_2


  • DOI: https://doi.org/10.1007/978-981-97-0688-4_2


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0687-7

  • Online ISBN: 978-981-97-0688-4

  • eBook Packages: Computer Science, Computer Science (R0)
