Abstract
This is a primer on machine learning for beginners. Certainly, there are plenty of excellent books on the subject, providing detailed explanations of many algorithms. The intent of this primer is not to outdo those texts in rigor; rather, it is to provide an introduction to the subject that is accessible, yet covers all the mathematical details and provides implementations of most algorithms in Python. We feel this gives a well-rounded understanding of each algorithm: only by writing the code, seeing the math applied, and visually inspecting the algorithm’s workings will a reader be fully able to connect all the dots. The style of the primer is largely conversational, and it avoids heavy formal jargon. We will certainly introduce all required technical terms, but while explaining an algorithm, we will use simple English and avoid unnecessary formalisms. We hope this proves useful for readers willing to seriously study the subject.
Data Availability Statement
This manuscript has associated data in a data repository. [Authors’ comment: ...].
Notes
Blog Link: https://beginningwithml.wordpress.com/.
There are other necessary conditions for a matrix to be invertible, but being a square matrix is a fundamental requirement.
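As a quick illustrative sketch (not from the primer itself), NumPy makes both points concrete: a square matrix with a nonzero determinant can be inverted, while a square but rank-deficient matrix cannot, and `np.linalg.inv` raises an error for it.

```python
import numpy as np

# A square matrix with nonzero determinant is invertible.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)
assert np.allclose(A @ A_inv, np.eye(2))  # A times its inverse is the identity

# Being square is necessary but not sufficient: this square matrix is
# singular (its second row is twice the first), so inversion fails.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError:
    print("S is singular and has no inverse")
```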
This is not, strictly speaking, true. In some cases, the algorithm will perform worse than if the sample had been within the range, but in those cases, not scaling would almost certainly not have helped either. You could address this by performing outlier analysis, which aims to find such samples, or by clipping the value to 1, a less common approach that is nonetheless useful in some domains.
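A minimal sketch of the clipping idea mentioned above, assuming simple min-max scaling fitted on the training data (the function name `minmax_scale` is ours, for illustration):

```python
import numpy as np

def minmax_scale(train, test):
    """Fit min-max scaling on the training data, then scale test data,
    clipping any out-of-range values into [0, 1]."""
    lo, hi = train.min(), train.max()
    scaled = (test - lo) / (hi - lo)
    # Clipping forces out-of-range test samples to the nearest boundary
    # instead of letting them fall outside [0, 1].
    return np.clip(scaled, 0.0, 1.0)

train = np.array([10.0, 20.0, 30.0, 40.0])
test = np.array([25.0, 55.0])     # 55 lies outside the training range
print(minmax_scale(train, test))  # the out-of-range sample is clipped to 1.0
```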
We will talk about kernel functions in a lot more detail when we discuss support vector machines. This is just an intuitive understanding of kernels.
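To make that intuition tangible, here is a small sketch (our own, not from the SVM chapter) of one popular kernel, the RBF (Gaussian) kernel, which behaves like a similarity measure: it returns 1 for identical points and decays toward 0 as points move apart.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2), a similarity score in (0, 1]."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
print(rbf_kernel(x, x))                     # identical points: similarity 1.0
print(rbf_kernel(x, np.array([3.0, 5.0])))  # distant points: similarity near 0
```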
By Inductiveload—self-made, Mathematica, Inkscape, Public Domain, link: https://commons.wikimedia.org/w/index.php?curid=3817954.
By Nicoguaro—Own work, CC BY 4.0, link: https://commons.wikimedia.org/w/index.php?curid=46259145.
It is actually pretty friendly; it just has an unfortunate name.
Credits: CS229 materials from Stanford SEE.
This specific example is called the duck test—and it is where “duck typing” in Python gets its name.
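A tiny illustrative example of duck typing (the class and function names here are ours): Python code typically does not check an object’s type, only whether it supports the behavior being used—if it quacks like a duck, that is enough.

```python
class Duck:
    def quack(self):
        return "quack"

class Person:
    def quack(self):
        return "I'm quacking!"

def make_it_quack(thing):
    # No isinstance check: any object with a quack() method is accepted.
    return thing.quack()

print(make_it_quack(Duck()))
print(make_it_quack(Person()))
```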
Machine Learning, 2nd Edition, by Tom M. Mitchell.
Fayyad and Irani, 1991. On the handling of continuous-valued attributes in decision tree generation. http://web.cs.iastate.edu/~honavar/fayyad.pdf.
Fayyad and Irani, 1993. Multi-interval discretization of continuous-valued attributes for classification learning. https://www.ijcai.org/Proceedings/93-2/Papers/022.pdf.
Quinlan, 1986. Induction of decision trees. http://hunch.net/~coms-4771/quinlan.pdf.
Srivastava, Nitish, et al. “Dropout: a simple way to prevent neural networks from overfitting.” The Journal of Machine Learning Research 15.1 (2014): 1929–1958.
Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” arXiv preprint arXiv:1502.03167 (2015).
Santurkar, Shibani, et al. “How does batch normalization help optimization?” Advances in Neural Information Processing Systems. 2018.
Salimans, Tim, and Durk P. Kingma. “Weight normalization: A simple reparameterization to accelerate training of deep neural networks.” Advances in Neural Information Processing Systems. 2016.
He, Kaiming, et al. “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” Proceedings of the IEEE International Conference on Computer Vision. 2015.
By Stephenekka—Own work, CC BY-SA 4.0, Link: https://commons.wikimedia.org/w/index.php?curid=49572625.
Smith, Leslie N. “A disciplined approach to neural network hyper-parameters: Part 1—learning rate, batch size, momentum, and weight decay.” arXiv preprint arXiv:1803.09820 (2018).
Smith, Leslie N. “Cyclical learning rates for training neural networks.” 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2017.
Seong, Sihyeon, et al. “Towards Flatter Loss Surface via Nonmonotonic Learning Rate Scheduling.” UAI. 2018.
Yedida, Rahul, and Snehanshu Saha. “A novel adaptive learning rate scheduler for deep neural networks.” arXiv preprint arXiv:1902.07399 (2019).
Li, Hao, et al. “Visualizing the loss landscape of neural nets.” Advances in Neural Information Processing Systems. 2018.
Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” European conference on computer vision. Springer, Cham, 2014.
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. “Distilling the knowledge in a neural network.” arXiv preprint arXiv:1503.02531 (2015).
Furlanello, Tommaso, et al. “Born again neural networks.” arXiv preprint arXiv:1805.04770 (2018).
Tan, P.N., 2018. Introduction to data mining. Pearson Education India.
From Tan, P.N., 2018. Introduction to data mining. Pearson Education India.
Bach, F.R. and Jordan, M.I., 2004. Learning spectral clustering. In Advances in neural information processing systems (pp. 305-312).
Ng, A.Y., Jordan, M.I. and Weiss, Y., 2002. On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems (pp. 849–856).
See Bernard Desgraupes’ notes: https://cran.r-project.org/web/packages/clusterCrit/vignettes/clusterCrit.pdf.
From L. Kaufman and P. J. Rousseeuw, Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons, 2009.
Cite this article
Yedida, R., Saha, S. Beginning with machine learning: a comprehensive primer. Eur. Phys. J. Spec. Top. 230, 2363–2444 (2021). https://doi.org/10.1140/epjs/s11734-021-00209-7