Abstract
Online learning and its variants constitute one of the main models of computational learning theory, complementing statistical PAC learning and related models. An online learner makes predictions about a sequence of instances, one after the other, and receives feedback after each prediction. The learner's performance is typically compared to that of the best predictor from a given class, often in terms of its excess loss (the regret) over the best predictor. Some of the fundamental online learning algorithms and their variants are discussed: weighted majority, follow the perturbed leader, follow the regularized leader, the perceptron algorithm, the doubling trick, and bandit algorithms, along with the issue of adaptive versus oblivious instance sequences. A typical performance proof for an online learning algorithm is exemplified with the perceptron algorithm.
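The predict-then-observe-feedback protocol and the mistake counting used in the perceptron analysis can be illustrated with a minimal sketch. This is not the entry's own code; the function name and the toy data stream are illustrative assumptions.

```python
import numpy as np

def online_perceptron(stream):
    """Run the online perceptron on a stream of (x, y) pairs, y in {-1, +1}.

    Returns the final weight vector and the number of mistakes, which is
    the quantity bounded by the classical perceptron analysis.
    """
    w = None
    mistakes = 0
    for x, y in stream:
        x = np.asarray(x, dtype=float)
        if w is None:
            w = np.zeros_like(x)
        # Predict the sign of the inner product (treating 0 as -1).
        y_hat = 1.0 if w @ x > 0 else -1.0
        if y_hat != y:
            mistakes += 1
            w = w + y * x  # additive update only on a mistake
    return w, mistakes

# Toy linearly separable stream: label is the sign of the first coordinate.
stream = [((1.0, 0.5), 1), ((-1.0, 0.2), -1),
          ((2.0, -0.3), 1), ((-1.5, 0.1), -1)]
w, m = online_perceptron(stream)  # here a single mistake suffices
```

On a sequence separable with margin, the number of such mistakes is bounded independently of the sequence length, which is the shape of the performance proof the abstract refers to.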
Recommended Reading
Angluin D (1988) Queries and concept learning. Mach Learn 2:319–342
Auer P, Cesa-Bianchi N, Freund Y, Schapire R (2002) The nonstochastic multiarmed bandit problem. SIAM J Comput 32:48–77
Bartók G, Foster D, Pál D, Rakhlin A, Szepesvári C (2014) Partial monitoring—classification, regret bounds, and algorithms. Math Oper Res 39:967–997
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found Trends Mach Learn 5:1–122
Cesa-Bianchi N, Freund Y, Haussler D, Helmbold D, Schapire R, Warmuth M (1997) How to use expert advice. JACM 44:427–485
Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge/New York
Dekel O, Tewari A, Arora R (2012) Online bandit learning against an adaptive adversary: from regret to policy regret. In: Proceedings of the 29th international conference on machine learning, Edinburgh
Hannan J (1957) Approximation to Bayes risk in repeated play. Contrib Theory Games 3:97–139
Littlestone N (1988) Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Mach Learn 2:285–318
Littlestone N, Warmuth M (1994) The weighted majority algorithm. Inf Comput 108:212–261
Luo H, Schapire RE (2015) Achieving all with no parameters: AdaNormalHedge. In: Proceedings of the 28th conference on learning theory, Paris, pp 1286–1304
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386–408
Shalev-Shwartz S (2011) Online learning and online convex optimization. Found Trends Mach Learn 4:107–194
Vovk V (1990) Aggregating strategies. In: Proceedings of 3rd annual workshop on computational learning theory, Rochester. Morgan Kaufmann, pp 371–386
© 2017 Springer Science+Business Media New York
Cite this entry
Auer, P. (2017). Online Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_618
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science, Reference Module Computer Science and Engineering