Abstract
Many machine learning models are cast as continuous optimization problems in multiple variables. The simplest example is least-squares regression, which is also a fundamental problem in linear algebra, because solving a (consistent) system of equations is a special case of it. In least-squares regression, one finds the best-fit solution to a system of equations that may or may not be consistent, and the loss corresponds to the aggregate squared error of the best fit; the special case of a consistent system yields a loss value of 0. Least-squares regression occupies a special place in linear algebra, optimization, and machine learning, because it serves as a foundational problem in all three disciplines. Historically, it preceded the classification problem in machine learning, and optimization models for classification were often motivated as modifications of the least-squares regression model. The main difference between least-squares regression and classification is that the predicted target variable is numerical in the former, whereas it is discrete (typically binary) in the latter. Therefore, the optimization model for linear regression needs to be “repaired” in order to make it usable for discrete target variables. This chapter shows how least-squares regression is foundational to machine learning.
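As a concrete illustration of the idea in the abstract, the following sketch fits an inconsistent (overdetermined) system in the least-squares sense and computes the aggregate squared error. The data matrix and right-hand side here are hypothetical, chosen only so that no exact solution exists.

```python
import numpy as np

# Hypothetical overdetermined system A x ≈ b (3 equations, 2 unknowns).
# The system is inconsistent, so the best we can do is a least-squares fit.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Best-fit solution minimizing ||A x - b||^2.
x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)

# Aggregate squared error of the best fit; it would be 0 only if the
# system were consistent.
loss = np.sum((A @ x - b) ** 2)
print(x, loss)
```

For this particular data the loss is nonzero, reflecting the inconsistency of the system; replacing `b` with any vector in the column space of `A` would drive the loss to exactly 0, recovering the special case of solving a consistent system of equations.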
Notes
1. It is possible to construct pathological counter-examples where this is not true.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Aggarwal, C.C. (2020). Optimization Basics: A Machine Learning View. In: Linear Algebra and Optimization for Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-40344-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-40343-0
Online ISBN: 978-3-030-40344-7