Online Learning of Linear Classifiers

  • Chapter

Advanced Lectures on Machine Learning

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2600)

Abstract

This paper surveys some basic techniques and recent results related to online learning. Our focus is on linear classification. The most familiar algorithm for this task is the perceptron. We explain the perceptron algorithm and its convergence proof as an instance of a generic method based on Bregman divergences. This leads to a more general algorithm known as the p-norm perceptron. We show how the perceptron convergence theorem generalizes to the p-norm perceptron and to the non-separable case. We also show how regularization, again based on Bregman divergences, can make an online algorithm more robust against target movement.
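
The chapter presents the perceptron and its p-norm generalization through Bregman divergences. As a rough illustration only (not taken from the chapter), the sketch below implements a mistake-driven training loop in which the weight vector is obtained from the accumulated updates via the gradient of (1/2)·||theta||_p^2; for p = 2 this map is the identity and the algorithm reduces to the classical perceptron. The function names and the loop interface are made up for this sketch, and the exact parameterization (p versus its dual norm index) may differ from the chapter's presentation.

    import numpy as np

    def pnorm_link(theta, p):
        # Gradient of (1/2)*||theta||_p^2: maps accumulated updates to weights.
        # For p = 2 this is the identity, recovering the classical perceptron.
        # (Illustrative convention; presentations differ on p vs. its dual index.)
        norm = np.linalg.norm(theta, ord=p)
        if norm == 0.0:
            return np.zeros_like(theta)
        return np.sign(theta) * np.abs(theta) ** (p - 1) / norm ** (p - 2)

    def pnorm_perceptron(examples, p=2.0, eta=1.0):
        # One online pass over (x, y) pairs with labels y in {-1, +1}.
        # Updates are made only on mistakes, as in the perceptron.
        dim = len(examples[0][0])
        theta = np.zeros(dim)          # accumulated mistake-driven updates
        mistakes = 0
        for x, y in examples:
            w = pnorm_link(theta, p)
            if y * np.dot(w, x) <= 0:  # mistake (or zero margin): update
                theta = theta + eta * y * np.asarray(x, dtype=float)
                mistakes += 1
        return pnorm_link(theta, p), mistakes

With p = 2 this is a standard perceptron pass; taking p on the order of the logarithm of the dimension is known to give multiplicative-update-like behaviour.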

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kivinen, J. (2003). Online Learning of Linear Classifiers. In: Mendelson, S., Smola, A.J. (eds) Advanced Lectures on Machine Learning. Lecture Notes in Computer Science (LNAI), vol 2600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36434-X_8

  • DOI: https://doi.org/10.1007/3-540-36434-X_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00529-2

  • Online ISBN: 978-3-540-36434-4

  • eBook Packages: Springer Book Archive
