Abstract
This paper surveys some basic techniques and recent results in online learning, with a focus on linear classification. The most familiar algorithm for this task is the perceptron. We explain the perceptron algorithm and its convergence proof as an instance of a generic method based on Bregman divergences. This leads to a more general algorithm known as the p-norm perceptron. We prove how the perceptron convergence theorem generalizes to the p-norm perceptron and to the non-separable case. We also show how regularization, again based on Bregman divergences, can make an online algorithm more robust against target movement.
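To make the two algorithms named above concrete, here is a minimal sketch in Python. It is an illustration under assumed conventions, not the chapter's own pseudocode: the function names (perceptron, p_norm_link), the epochs parameter, and the batch-of-examples interface are all choices made for this sketch. The perceptron part is the classical mistake-driven additive update; the link function follows the standard p-norm formulation from the literature, where the primal weights are the gradient of half the squared q-norm of a dual vector, with q conjugate to p.

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Classical (2-norm) perceptron for labels y in {-1, +1}.

    The weight vector changes only on trials the current hypothesis
    gets wrong, i.e. when y_t * <w, x_t> <= 0.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_t, y_t in zip(X, y):
            if y_t * np.dot(w, x_t) <= 0:  # mistake: additive update
                w += y_t * x_t
                mistakes += 1
        if mistakes == 0:  # a clean pass over separable data: done
            return w
    return w

def p_norm_link(theta, p):
    """Map the dual vector theta to primal weights for the p-norm
    perceptron: w = grad(0.5 * ||theta||_q^2), with 1/p + 1/q = 1.

    For p = q = 2 this is the identity, recovering the perceptron.
    """
    q = p / (p - 1.0)
    norm_q = np.linalg.norm(theta, ord=q)
    if norm_q == 0.0:
        return np.zeros_like(theta)
    return np.sign(theta) * np.abs(theta) ** (q - 1) / norm_q ** (q - 2)
```

In the p-norm variant, one keeps the dual vector theta, adds y_t * x_t to it on each mistake, and predicts with sign(<p_norm_link(theta, p), x_t>). Roughly speaking, choosing p on the order of log n (n the input dimension) makes the update behave like multiplicative, Winnow-style algorithms, which is what makes the family interesting.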
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kivinen, J. (2003). Online Learning of Linear Classifiers. In: Mendelson, S., Smola, A.J. (eds) Advanced Lectures on Machine Learning. Lecture Notes in Computer Science, vol 2600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36434-X_8
DOI: https://doi.org/10.1007/3-540-36434-X_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00529-2
Online ISBN: 978-3-540-36434-4