Learning Additive Models Online with Fast Evaluating Kernels

  • Conference paper
Computational Learning Theory (COLT 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2111)

Abstract

We develop three new techniques that build on recent advances in online learning with kernels. First, we show that an exponential speed-up in prediction time per trial is possible for algorithms such as the Kernel-Adatron, the Kernel-Perceptron, and ROMMA when they are used with specific additive models. Second, we show that the techniques of recent algorithms developed for online linear prediction when the best predictor changes over time can be implemented for kernel-based learners at no additional asymptotic cost. Finally, we introduce a new online kernel-based learning algorithm for which we give worst-case loss bounds for the ε-insensitive square loss.
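
To make the terminology concrete, here is a minimal sketch (in Python, with illustrative names only) of a standard kernel Perceptron run with an additive kernel, i.e. a kernel that decomposes into a sum of one-dimensional base kernels, together with the ε-insensitive square loss, commonly defined as max(0, |y − ŷ| − ε)². This is only the naive baseline whose prediction cost per trial grows linearly with the number of stored examples; it does not implement the paper's fast-evaluation or tracking techniques.

```python
# Minimal sketch: kernel Perceptron with an additive kernel.
# Assumptions: the one-dimensional base kernel below is the standard
# spline kernel on [0, infinity); class and function names are
# illustrative and not taken from the paper.
import numpy as np

def base_kernel(a: float, b: float) -> float:
    """One-dimensional spline kernel k(a, b) for a, b >= 0."""
    m = min(a, b)
    return 1.0 + a * b + a * b * m - (a + b) * m ** 2 / 2.0 + m ** 3 / 3.0

def additive_kernel(x: np.ndarray, z: np.ndarray) -> float:
    """Additive kernel: K(x, z) = sum_i k(x_i, z_i)."""
    return sum(base_kernel(xi, zi) for xi, zi in zip(x, z))

def eps_insensitive_square_loss(y: float, y_hat: float, eps: float) -> float:
    """Epsilon-insensitive square loss: max(0, |y - y_hat| - eps)^2."""
    return max(0.0, abs(y - y_hat) - eps) ** 2

class KernelPerceptron:
    """Mistake-driven online learner; predictions use a kernel expansion."""

    def __init__(self):
        self.support = []  # inputs on which a mistake was made
        self.coeffs = []   # signed coefficients (the labels +1/-1)

    def raw_score(self, x: np.ndarray) -> float:
        # Naive evaluation: one kernel call per stored example, hence
        # O(t) time per trial after t mistakes. The paper's speed-up
        # removes exactly this linear dependence for specific additive
        # models; it is NOT implemented here.
        return sum(c * additive_kernel(s, x)
                   for c, s in zip(self.coeffs, self.support))

    def predict(self, x: np.ndarray) -> int:
        return 1 if self.raw_score(x) >= 0.0 else -1

    def update(self, x: np.ndarray, y: int) -> None:
        if self.predict(x) != y:  # update only on mistakes
            self.support.append(x)
            self.coeffs.append(float(y))
```

Feeding the learner a stream of (x, y) pairs via `update` trains it online; the point of the sketch is only that `raw_score` touches every stored example, which is the per-trial cost the abstract's exponential speed-up addresses.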

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Herbster, M. (2001). Learning Additive Models Online with Fast Evaluating Kernels. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science (LNAI), vol 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_29

  • DOI: https://doi.org/10.1007/3-540-44581-1_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42343-0

  • Online ISBN: 978-3-540-44581-4
