Abstract
Many machine learning models are cast as continuous optimization problems in multiple variables. The simplest example is least-squares regression, which is also a fundamental problem in linear algebra, because solving a (consistent) system of equations is a special case of it. In least-squares regression, one finds the best-fit solution to a system of equations that may or may not be consistent, and the loss corresponds to the aggregate squared error of the best fit; the special case of a consistent system yields a loss value of 0. Least-squares regression occupies a special place in linear algebra, optimization, and machine learning, because it serves as a foundational problem in all three disciplines. Historically, it preceded the classification problem in machine learning, and optimization models for classification were often motivated as modifications of the least-squares regression model. The main difference between least-squares regression and classification is that the predicted target variable is numerical in the former, whereas it is discrete (typically binary) in the latter. Therefore, the optimization model for linear regression needs to be “repaired” in order to make it usable for discrete target variables. This chapter shows how least-squares regression is foundational to machine learning.
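As a concrete illustration of the idea in the abstract, the following sketch fits an inconsistent (overdetermined) system in the least-squares sense and computes the aggregate squared error. The data matrix and right-hand side here are hypothetical, chosen only so that no exact solution exists.

```python
import numpy as np

# Hypothetical overdetermined system A x ≈ b (3 equations, 2 unknowns).
# The system is inconsistent, so the best we can do is a least-squares fit.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Best-fit solution minimizing ||A x - b||^2.
x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)

# Aggregate squared error of the best fit; it would be 0 only if the
# system were consistent.
loss = np.sum((A @ x - b) ** 2)
print(x, loss)
```

For this particular data the loss is nonzero, reflecting the inconsistency of the system; replacing `b` with any vector in the column space of `A` would drive the loss to exactly 0, recovering the special case of solving a consistent system of equations.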
Notes
1. It is possible to construct pathological counter-examples where this is not true.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Aggarwal, C.C. (2020). Optimization Basics: A Machine Learning View. In: Linear Algebra and Optimization for Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-40344-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-40343-0
Online ISBN: 978-3-030-40344-7