Abstract
This paper surveys some basic techniques and recent results in online learning, with a focus on linear classification. The most familiar algorithm for this task is the perceptron. We explain the perceptron algorithm and its convergence proof as an instance of a generic method based on Bregman divergences. This leads to a more general algorithm known as the p-norm perceptron. We prove how the perceptron convergence theorem generalizes to the p-norm perceptron and to the non-separable case. We also show how regularization, again based on Bregman divergences, can make an online algorithm more robust against target movement.
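To make the two algorithms named above concrete, here is a minimal sketch in Python. It is an illustration under assumed conventions, not the chapter's own pseudocode: the function names (perceptron, p_norm_link), the epochs parameter, and the batch-of-examples interface are all choices made for this sketch. The perceptron part is the classical mistake-driven additive update; the link function follows the standard p-norm formulation from the literature, where the primal weights are the gradient of half the squared q-norm of a dual vector, with q conjugate to p.

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Classical (2-norm) perceptron for labels y in {-1, +1}.

    The weight vector changes only on trials the current hypothesis
    gets wrong, i.e. when y_t * <w, x_t> <= 0.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_t, y_t in zip(X, y):
            if y_t * np.dot(w, x_t) <= 0:  # mistake: additive update
                w += y_t * x_t
                mistakes += 1
        if mistakes == 0:  # a clean pass over separable data: done
            return w
    return w

def p_norm_link(theta, p):
    """Map the dual vector theta to primal weights for the p-norm
    perceptron: w = grad(0.5 * ||theta||_q^2), with 1/p + 1/q = 1.

    For p = q = 2 this is the identity, recovering the perceptron.
    """
    q = p / (p - 1.0)
    norm_q = np.linalg.norm(theta, ord=q)
    if norm_q == 0.0:
        return np.zeros_like(theta)
    return np.sign(theta) * np.abs(theta) ** (q - 1) / norm_q ** (q - 2)
```

In the p-norm variant, one keeps the dual vector theta, adds y_t * x_t to it on each mistake, and predicts with sign(<p_norm_link(theta, p), x_t>). Roughly speaking, choosing p on the order of log n (n the input dimension) makes the update behave like multiplicative, Winnow-style algorithms, which is what makes the family interesting.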
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kivinen, J. (2003). Online Learning of Linear Classifiers. In: Mendelson, S., Smola, A.J. (eds) Advanced Lectures on Machine Learning. Lecture Notes in Computer Science, vol 2600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36434-X_8
DOI: https://doi.org/10.1007/3-540-36434-X_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00529-2
Online ISBN: 978-3-540-36434-4