Beginning with machine learning: a comprehensive primer

  • Review
  • Published:
The European Physical Journal Special Topics

Abstract

This is a primer on machine learning for beginners. There are, of course, plenty of excellent books on the subject that provide detailed explanations of many algorithms. The intent of this primer is not to outdo those texts in rigor, but rather to provide an introduction that is accessible, yet covers all the mathematical details and provides implementations of most algorithms in Python. We feel this offers a well-rounded understanding of each algorithm: only by writing the code, seeing the math applied, and visually inspecting the algorithm at work will a reader be able to fully connect the dots. The style of the primer is largely conversational and avoids heavy formal jargon. We introduce all the required technical terms, but while explaining an algorithm we use plain English and avoid unnecessary formalism. We hope this proves useful for readers who wish to study the subject seriously.


Data Availability Statement

This manuscript has associated data in a data repository. [Authors’ comment: ...].

Notes

  1. Blog Link: https://beginningwithml.wordpress.com/.

  2. https://www.coursera.org/learn/machine-learning.

  3. Image from https://rasbt.github.io/mlxtend/user_guide/general_concepts/gradient-optimization/.

  4. https://see.stanford.edu/course/cs229.

  5. There are other necessary conditions for a matrix to be invertible (a non-zero determinant, for example), but being square is a fundamental requirement; the first code sketch after these notes illustrates both points.

  6. This is not, strictly speaking, true. In some cases the algorithm will perform worse than if the sample had been within the range, but in such cases not scaling would almost certainly not help either. You could address this by performing outlier analysis, which aims to find such samples, or by clipping the value to 1, a less common approach that is nevertheless useful in some domains (the second code sketch after these notes illustrates clipping).

  7. Source: https://en.wikipedia.org/wiki/Sigmoid_function.

  8. http://ece.eng.umanitoba.ca/undergraduate/ECE4850T02/Lecture%20Slides/LocallyWeightedRegression.pdf.

  9. We will talk about kernel functions in a lot more detail when we discuss support vector machines. This is just an intuitive understanding of kernels.

  10. https://web.as.uky.edu/statistics/users/pbreheny/621/F10/notes/11-4.pdf.

  11. https://en.wikipedia.org/wiki/Local_regression#Weight_function.

  12. https://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm.

  13. By Inductiveload—self-made, Mathematica, Inkscape, Public Domain, link: https://commons.wikimedia.org/w/index.php?curid=3817954.

  14. http://www.cs.princeton.edu/courses/archive/spr09/cos513/scribe/lecture11.pdf.

  15. https://stats.stackexchange.com/a/353342/212844.

  16. By Nicoguaro—Own work, CC BY 4.0, link: https://commons.wikimedia.org/w/index.php?curid=46259145.

  17. https://drive.google.com/file/d/1Ngq7t_HxcvVKRRQkrepgtU-P2PaUCYKx/view?usp=sharing.

  18. It is actually pretty friendly; it just has an unfortunate name.

  19. https://math.stackexchange.com/a/602192.

  20. https://math.stackexchange.com/a/38704.

  21. Credits: https://www.byclb.com/TR/Tutorials/neural_networks/ch4_1.htm.

  22. Credits: CS229 materials from Stanford SEE.

  23. This specific example is called the duck test, and it is where “duck typing” in Python gets its name (the third code sketch after these notes illustrates the idea).

  24. http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf.

  25. Credits: https://bricaud.github.io/personal-blog/entropy-in-decision-trees/.

  26. https://1drv.ms/b/s!AiFT_8UzfVHdtwT3lwKOb3mF6ssy.

  27. https://drive.google.com/open?id=1BjZrw5_alezgJEpsKgfzSFl0z5fFRq5S.

  28. Machine Learning, 2nd Edition, by Tom M. Mitchell.

  29. Fayyad and Irani, 1991. On the handling of continuous-valued attributes in decision tree generation. http://web.cs.iastate.edu/~honavar/fayyad.pdf.

  30. Fayyad and Irani, 1993. Multi-interval discretization of continuous-valued attributes for classification learning. https://www.ijcai.org/Proceedings/93-2/Papers/022.pdf.

  31. Quinlan, 1986. Induction of decision trees. http://hunch.net/~coms-4771/quinlan.pdf.

  32. https://commons.wikimedia.org/w/index.php?curid=73710028.

  33. https://xavierbourretsicotte.github.io/SVM_implementation.html.

  34. http://goelhardik.github.io/2016/11/28/svm-cvxopt/.

  35. https://jonchar.net/notebooks/SVM/.

  36. https://people.cs.pitt.edu/~milos/courses/cs3750-Fall2007/lectures/class-kernels.pdf.

  37. http://cs229.stanford.edu/notes/cs229-notes3.pdf.

  38. https://www.coursera.org/learn/neural-networks-deep-learning.

  39. https://1drv.ms/b/s!AiFT_8UzfVHdtwgyEQcKNYmIC4v5?e=CSXVdG.

  40. https://1drv.ms/b/s!AiFT_8UzfVHdtwO9luN6QZavlfq-?e=7vCGet.

  41. https://papers.nips.cc/paper/5422-on-the-number-of-linear-regions-of-deep-neural-networks.pdf.

  42. https://arxiv.org/abs/1806.01844.

  43. https://www.researchgate.net/publication/332513541_Evolution_of_Novel_Activation_Functions_in_Neural_Network_Training_and_implications_in_Habitability_Classification.

  44. https://arxiv.org/abs/1502.01852.

  45. Srivastava, Nitish, et al. “Dropout: a simple way to prevent neural networks from overfitting.” The Journal of Machine Learning Research 15.1 (2014): 1929–1958.

  46. Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” arXiv preprint arXiv:1502.03167 (2015).

  47. Santurkar, Shibani, et al. “How does batch normalization help optimization?” Advances in Neural Information Processing Systems. 2018.

  48. Salimans, Tim, and Durk P. Kingma. “Weight normalization: A simple reparameterization to accelerate training of deep neural networks.” Advances in Neural Information Processing Systems. 2016.

  49. He, Kaiming, et al. “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.” Proceedings of the IEEE international conference on computer vision. 2015.

  50. By Stephenekka—Own work, CC BY-SA 4.0, link: https://commons.wikimedia.org/w/index.php?curid=49572625.

  51. Smith, Leslie N. “A disciplined approach to neural network hyper-parameters: Part 1—learning rate, batch size, momentum, and weight decay.” arXiv preprint arXiv:1803.09820 (2018).

  52. Smith, Leslie N. “Cyclical learning rates for training neural networks.” 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2017.

  53. Seong, Sihyeon, et al. “Towards Flatter Loss Surface via Nonmonotonic Learning Rate Scheduling.” UAI. 2018.

  54. Yedida, Rahul, and Snehanshu Saha. “A novel adaptive learning rate scheduler for deep neural networks.” arXiv preprint arXiv:1902.07399 (2019).

  55. Li, Hao, et al. “Visualizing the loss landscape of neural nets.” Advances in Neural Information Processing Systems. 2018.

  56. Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” European conference on computer vision. Springer, Cham, 2014.

  57. Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. “Distilling the knowledge in a neural network.” arXiv preprint arXiv:1503.02531 (2015).

  58. Furlanello, Tommaso, et al. “Born again neural networks.” arXiv preprint arXiv:1805.04770 (2018).

  59. https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148.

  60. https://blog.floydhub.com/gans-story-so-far/.

  61. https://1drv.ms/b/s!AiFT_8UzfVHdtwIcgiINLQ-o6sCh?e=49iRq4.

  62. https://www.coursera.org/specializations/deep-learning.

  63. https://course.fast.ai/.

  64. https://scikit-learn.org/stable/modules/clustering.html.

  65. https://en.wikipedia.org/wiki/Coordinate_descent.

  66. Image taken from https://stats.stackexchange.com/questions/194734/dbscan-what-is-a-core-point.

  67. Tan, P.N., 2018. Introduction to data mining. Pearson Education India.

  68. https://en.wikipedia.org/wiki/DBSCAN.

  69. Image from https://www.analyticsvidhya.com/blog/2017/02/test-data-scientist-clustering/.

  70. Tan, P.N., 2018. Introduction to data mining. Pearson Education India.

  71. https://en.wikipedia.org/wiki/Ward%27s_method.

  72. https://newonlinecourses.science.psu.edu/stat505/node/146/.

  73. Tan, P.N., 2018. Introduction to data mining. Pearson Education India.

  74. From Tan, P.N., 2018. Introduction to data mining. Pearson Education India.

  75. https://en.wikipedia.org/wiki/Graph_partition#Problem.

  76. Bach, F.R. and Jordan, M.I., 2004. Learning spectral clustering. In Advances in neural information processing systems (pp. 305-312).

  77. https://en.wikipedia.org/wiki/Laplacian_matrix.

  78. https://calculatedcontent.com/2012/10/09/spectral-clustering/.

  79. Ng, A.Y., Jordan, M.I. and Weiss, Y., 2002. On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems (pp. 849–856).

  80. https://en.wikipedia.org/wiki/Dunn_index.

  81. See Bernard Desgraupes’ notes: https://cran.r-project.org/web/packages/clusterCrit/vignettes/clusterCrit.pdf.

  82. https://en.wikipedia.org/wiki/Silhouette_(clustering).

  83. From L. Kaufman and P. J. Rousseeuw, Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons, 2009.

  84. http://cda.psych.uiuc.edu/multivariate_fall_2012/systat_cluster_manual.pdf.

  85. See: https://en.wikipedia.org/wiki/Cophenetic_correlation.

  86. https://en.wikipedia.org/wiki/Rand_index.

  87. https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html.

  88. https://web.archive.org/web/20110124070213/http://gremlin1.gdcb.iastate.edu/MIP/gene/MicroarrayData/gapstatistics.pdf.

  89. https://datasciencelab.wordpress.com/tag/gap-statistic/.

  90. https://stats.stackexchange.com/a/11702.

  91. https://en.wikipedia.org/wiki/Jaccard_index.
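
Code sketches

The three short sketches below expand on notes 5, 6, and 23. They are minimal illustrations in Python with invented values and names, not code from the primer itself.

For note 5: a non-square matrix has no ordinary inverse, and a square matrix must additionally be non-singular. A minimal sketch using NumPy:

    import numpy as np

    # A non-square matrix has no (ordinary) inverse at all.
    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

    # A square matrix is invertible only if, in addition, its rows are
    # linearly independent (equivalently, its determinant is non-zero).
    B = np.array([[1.0, 2.0],
                  [2.0, 4.0]])  # second row = 2 * first row, so singular
    C = np.array([[1.0, 2.0],
                  [3.0, 4.0]])  # invertible

    for M in (A, B, C):
        try:
            np.linalg.inv(M)
            print(M.shape, "-> invertible")
        except np.linalg.LinAlgError as err:
            print(M.shape, "->", err)  # A: not square; B: singular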
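
For note 6: a minimal sketch of clipping an out-of-range test sample after min-max scaling to [0, 1]; the training and test values are invented:

    import numpy as np

    # Min-max parameters are learned from the training data only.
    x_train = np.array([10.0, 20.0, 30.0, 40.0])
    lo, hi = x_train.min(), x_train.max()

    def scale(x):
        return (x - lo) / (hi - lo)

    # A test sample outside the training range scales past 1.
    x_test = np.array([25.0, 55.0])
    scaled = scale(x_test)               # [0.5, 1.5]

    # The clipping option mentioned in note 6:
    clipped = np.clip(scaled, 0.0, 1.0)  # [0.5, 1.0]
    print(scaled, clipped)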
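
For note 23: a tiny, hypothetical example of duck typing in Python. The function never checks an object’s type, only whether it “quacks”:

    class Duck:
        def quack(self):
            return "Quack!"

    class Person:
        def quack(self):
            return "I am imitating a duck."

    def listen(animal):
        # No isinstance() check: if it quacks like a duck,
        # we treat it as a duck.
        print(animal.quack())

    listen(Duck())    # Quack!
    listen(Person())  # I am imitating a duck.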

Author information

Corresponding author

Correspondence to Snehanshu Saha.

About this article

Cite this article

Yedida, R., Saha, S. Beginning with machine learning: a comprehensive primer. Eur. Phys. J. Spec. Top. 230, 2363–2444 (2021). https://doi.org/10.1140/epjs/s11734-021-00209-7

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1140/epjs/s11734-021-00209-7
