Abstract
We develop three new techniques to build on the recent advances in online learning with kernels. First, we show that an exponential speed-up in prediction time per trial is possible for such algorithms as the Kernel-Adatron, the Kernel-Perceptron, and ROMMA for specific additive models. Second, we show that the techniques of the recent algorithms developed for online linear prediction when the best predictor changes over time may be implemented for kernel-based learners at no additional asymptotic cost. Finally, we introduce a new online kernel-based learning algorithm for which we give worst-case loss bounds for the ε-insensitive square loss.
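For concreteness, the following is a minimal sketch of the standard Kernel-Perceptron that the first result accelerates; the Gaussian kernel, class name, and data layout are illustrative assumptions rather than details from the paper. Each prediction costs one kernel evaluation per stored mistake, so per-trial prediction time grows linearly with the number of support examples; it is this cost that the paper reduces exponentially for specific additive models.

    import math

    def gaussian_kernel(x, z, gamma=1.0):
        # RBF kernel k(x, z) = exp(-gamma * ||x - z||^2); an assumed choice.
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

    class KernelPerceptron:
        def __init__(self, kernel=gaussian_kernel):
            self.kernel = kernel
            self.support = []  # (label, example) pairs stored on past mistakes

        def predict(self, x):
            # O(|support|) kernel evaluations per trial: the baseline cost
            # that the paper's first technique improves for additive models.
            score = sum(y_i * self.kernel(x_i, x) for y_i, x_i in self.support)
            return 1 if score >= 0 else -1

        def update(self, x, y):
            # Mistake-driven update: store the example only when misclassified.
            if self.predict(x) != y:
                self.support.append((y, x))

As a usage sketch, running learner = KernelPerceptron() and calling learner.update(x, y) on each trial grows the support set with every mistake, so late trials predict more slowly than early ones.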
References
M.A. Aizerman, E.M. Braverman, and L.I. Rozonoér. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.
P. Auer and M.K. Warmuth. Tracking the best disjunction. Machine Learning, 32(2):127–150, August 1998. Special issue on concept drift.
Katy S. Azoury and M.K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. In Kathryn B. Laskey and Henri Prade, editors, Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 31–40, S.F., Cal., July 30–August 1 1999. Morgan Kaufmann Publishers.
Heinz H. Bauschke and Jonathan M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, September 1996.
A. Blum and C. Burch. On-line learning and the metrical task system problem. Machine Learning, 39(1):35–58, 2000.
B.E. Boser, I.M. Guyon, and V.N. Vapnik. A training algorithm for optimal margin classifiers. In Proc. 5th Annu. Workshop on Comput. Learning Theory, pages 144–152. ACM Press, New York, NY, 1992.
N. Cesa-Bianchi, P. Long, and M.K. Warmuth. Worst-case quadratic loss bounds for on-line prediction of linear functions by gradient descent. IEEE Transactions on Neural Networks, 7(2):604–619, May 1996.
T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.
V. Faber and J. Mycielski. Applications of learning theorems. Fundamenta Informaticae, 15(2):145–167, 1991.
D.P. Foster. Prediction in the worst case. The Annals of Statistics, 19(2):1084–1090, 1991.
Yoav Freund and Robert E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
Thilo-Thomas Frieß, Nello Cristianini, and Colin Campbell. The Kernel-Adatron algorithm: a fast and simple learning procedure for Support Vector machines. In Proc. 15th International Conf. on Machine Learning, pages 188–196. Morgan Kaufmann, San Francisco, CA, 1998.
C. Gentile. A new approximate maximal margin classification algorithm. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13, 2001.
Federico Girosi, Michael Jones, and Tomaso Poggio. Regularization theory and neural networks architectures. Neural Computation, 7(2):219–269, 1995.
T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.
D. Haussler, J. Kivinen, and M.K. Warmuth. Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44(5):1906–1925, September 1998.
D.P. Helmbold, J. Kivinen, and M.K. Warmuth. Relative loss bounds for single neurons. Machine Learning, 2001. To appear.
Mark Herbster and Manfred Warmuth. Tracking the best expert. In Proc. 12th International Conference on Machine Learning, pages 286–294. Morgan Kaufmann, 1995.
Mark Herbster and Manfred K. Warmuth. Tracking the best regressor. In Proc. 11th Annu. Conf. on Comput. Learning Theory, pages 24–31. ACM Press, New York, NY, 1998.
G.S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applications, 33(1):82–95, 1971.
J. Kivinen and M.K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1–64, January 1997.
Y. LeCun, L. Jackel, L. Bottou, A. Brunot, C. Cortes, J. Denker, H. Drucker, I. Guyon, U. Muller, E. Sackinger, P. Simard, and V. Vapnik. Comparison of learning algorithms for handwritten digit recognition. In International Conference on Artificial Neural Networks (ICANN), 1995.
Y. Li and P. Long. The relaxed online maximum margin algorithm. Machine Learning, 2001.
N. Littlestone. Learning when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.
N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz, 1989.
N. Littlestone and M.K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.
J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, Ser. A, 209:415–446, 1909.
E.H. Moore. General Analysis. Part I. American Philosophical Society, Philadelphia, 1935.
J. von Neumann. Functional Operators, Vol. II: The Geometry of Orthogonal Spaces, volume 22 of Annals of Mathematics Studies. Princeton University Press, 1950.
A. Novikoff. On convergence proofs for perceptrons. In Proc. Sympos. Math. Theory of Automata (New York, 1962), pages 615–622. Polytechnic Press of Polytechnic Inst. of Brooklyn, Brooklyn, NY, 1963.
J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 185–208, Cambridge, MA, 1999. MIT Press.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psych. Rev., 65:386–407, 1958. (Reprinted in Neurocomputing (MIT Press, 1988).).
Walter Rudin. Real and Complex Analysis. McGraw-Hill, New York, 3rd edition, 1986.
G. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proc. 15th International Conf. on Machine Learning, pages 515–521. Morgan Kaufmann, San Francisco, CA, 1998.
John Shawe-Taylor and Nello Cristianini. Further results on the margin distribution. In Proc. 12th Annu. Conf. on Comput. Learning Theory, pages 278–285. ACM Press, New York, NY, 1999.
A. Smola. Large scale and online learning with kernels. Talk given December 5, 2000 at Royal Holloway University, based on joint work with J. Kivinen, P. Wankadia, and R. Williamson.
V. Vapnik. Statistical Learning Theory. John Wiley, 1998.
V. Vapnik, S. Golowich, and A. Smola. Support vector method for function approximation, regression estimation, and signal processing. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 281–287, Cambridge, MA, 1997. MIT Press.
V.N. Vapnik and A.Y. Chervonenkis. Teoriya raspoznavaniya obrazov. Statisticheskie problemy obucheniya [Theory of Pattern Recognition: Statistical Problems of Learning]. Izdat. “Nauka”, Moscow, 1974.
V. Vovk. Aggregating strategies. In Proc. 3rd Annu. Workshop on Comput. Learning Theory, pages 371–383. Morgan Kaufmann, 1990.
V. Vovk. Derandomizing stochastic prediction strategies. In Proc. 10th Annu. Workshop on Comput. Learning Theory. ACM Press, New York, NY, 1997.
Volodya Vovk. Competitive on-line linear regression. In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems, volume 10. The MIT Press, 1998.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Herbster, M. (2001). Learning Additive Models Online with Fast Evaluating Kernels. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_29
DOI: https://doi.org/10.1007/3-540-44581-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42343-0
Online ISBN: 978-3-540-44581-4