Abstract
We develop three new techniques to build on the recent advances in online learning with kernels. First, we show that an exponential speed-up in prediction time per trial is possible for such algorithms as the Kernel-Adatron, the Kernel-Perceptron, and ROMMA for specific additive models. Second, we show that the techniques of the recent algorithms developed for online linear prediction when the best predictor changes over time may be implemented for kernel-based learners at no additional asymptotic cost. Finally, we introduce a new online kernel-based learning algorithm for which we give worst-case loss bounds for the ε-insensitive square loss.
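For concreteness, the following is a minimal sketch of the standard Kernel-Perceptron that the first result accelerates; the Gaussian kernel, class name, and data layout are illustrative assumptions rather than details from the paper. Each prediction costs one kernel evaluation per stored mistake, so per-trial prediction time grows linearly with the number of support examples; it is this cost that the paper reduces exponentially for specific additive models.

    import math

    def gaussian_kernel(x, z, gamma=1.0):
        # RBF kernel k(x, z) = exp(-gamma * ||x - z||^2); an assumed choice.
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

    class KernelPerceptron:
        def __init__(self, kernel=gaussian_kernel):
            self.kernel = kernel
            self.support = []  # (label, example) pairs stored on past mistakes

        def predict(self, x):
            # O(|support|) kernel evaluations per trial: the baseline cost
            # that the paper's first technique improves for additive models.
            score = sum(y_i * self.kernel(x_i, x) for y_i, x_i in self.support)
            return 1 if score >= 0 else -1

        def update(self, x, y):
            # Mistake-driven update: store the example only when misclassified.
            if self.predict(x) != y:
                self.support.append((y, x))

As a usage sketch, running learner = KernelPerceptron() and calling learner.update(x, y) on each trial grows the support set with every mistake, so late trials predict more slowly than early ones.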
References
M.A. Aizerman, E.M. Braverman, and L.I. Rozonoér. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.
P. Auer and M.K. Warmuth. Tracking the best disjunction. Machine Learning, 32(2):127–150, August 1998. Special issue on concept drift.
Katy S. Azoury and M.K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. In Kathryn B. Laskey and Henri Prade, editors, Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 31–40, S.F., Cal., July 30–August 1 1999. Morgan Kaufmann Publishers.
Heinz H. Bauschke and Jonathan M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, September 1996.
A. Blum and C. Burch. On-line learning and the metrical task system problem. Machine Learning, 39(1):35–58, 2000.
B.E. Boser, I.M. Guyon, and V.N. Vapnik. A training algorithm for optimal margin classifiers. In Proc. 5th Annu. Workshop on Comput. Learning Theory, pages 144–152. ACM Press, New York, NY, 1992.
N. Cesa-Bianchi, P. Long, and M.K. Warmuth. Worst-case quadratic loss bounds for on-line prediction of linear functions by gradient descent. IEEE Transactions on Neural Networks, 7(2):604–619, May 1996.
T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.
V. Faber and J. Mycielski. Applications of learning theorems. Fundamenta Informaticae, 15(2):145–167, 1991.
D.P. Foster. Prediction in the worst case. The Annals of Statistics, 19(2):1084–1090, 1991.
Yoav Freund and Robert E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
Thilo-Thomas Frieß, Nello Cristianini, and Colin Campbell. The Kernel-Adatron algorithm: a fast and simple learning procedure for Support Vector machines. In Proc. 15th International Conf. on Machine Learning, pages 188–196. Morgan Kaufmann, San Francisco, CA, 1998.
C. Gentile. A new approximate maximal margin classification algorithm. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13, 2001.
Federico Girosi, Michael Jones, and Tomaso Poggio. Regularization theory and neural networks architectures. Neural Computation, 7(2):219–269, 1995.
T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.
D. Haussler, J. Kivinen, and M.K. Warmuth. Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44(5):1906–1925, September 1998.
D.P. Helmbold, J. Kivinen, and M.K. Warmuth. Relative loss bounds for single neurons. Machine Learning, 2001. To appear.
Mark Herbster and Manfred Warmuth. Tracking the best expert. In Proc. 12th International Conference on Machine Learning, pages 286–294. Morgan Kaufmann, 1995.
Mark Herbster and Manfred K. Warmuth. Tracking the best regressor. In Proc. 11th Annu. Conf. on Comput. Learning Theory, pages 24–31. ACM Press, New York, NY, 1998.
G.S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applications, 33(1):82–95, 1971.
J. Kivinen and M.K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1–64, January 1997.
Y. LeCun, L. Jackel, L. Bottou, A. Brunot, C. Cortes, J. Denker, H. Drucker, I. Guyon, U. Muller, E. Sackinger, P. Simard, and V. Vapnik. Comparison of learning algorithms for handwritten digit recognition. In International Conference on Artificial Neural Networks (ICANN), 1995.
Y. Li and P. Long. The relaxed online maximum margin algorithm. Machine Learning, 2001.
N. Littlestone. Learning when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.
N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz, 1989.
N. Littlestone and M.K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.
J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, Ser. A, 209:415–446, 1909.
E.H. Moore. General Analysis. Part I. American Philosophical Society, Philadelphia, 1935.
J. von Neumann. Functional Operators, Vol. II: The Geometry of Orthogonal Spaces, volume 22 of Annals of Mathematics Studies. Princeton University Press, 1950.
A. Novikoff. On convergence proofs for perceptrons. In Proc. Sympos. Math. Theory of Automata (New York, 1962), pages 615–622. Polytechnic Press of Polytechnic Inst. of Brooklyn, Brooklyn, NY, 1963.
J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 185–208, Cambridge, MA, 1999. MIT Press.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psych. Rev., 65:386–407, 1958. (Reprinted in Neurocomputing (MIT Press, 1988).).
Walter Rudin. Real and Complex Analysis. McGraw-Hill, New York, 3rd edition, 1986.
G. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proc. 15th International Conf. on Machine Learning, pages 515–521. Morgan Kaufmann, San Francisco, CA, 1998.
John Shawe-Taylor and Nello Cristianini. Further results on the margin distribution. In Proc. 12th Annu. Conf. on Comput. Learning Theory, pages 278–285. ACM Press, New York, NY, 1999.
A. Smola. Large scale and online learning with kernels. Talk given December 5, 2000 at Royal Holloway University, based on joint work with J. Kivinen, P. Wankadia, and R. Williamson.
V. Vapnik. Statistical Learning Theory. John Wiley, 1998.
V. Vapnik, S. Golowich, and A. Smola. Support vector method for function approximation, regression estimation, and signal processing. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 281–287, Cambridge, MA, 1997. MIT Press.
V.N. Vapnik and A.Y. Chervonenkis. Teoriya raspoznavaniya obrazov. Statisticheskie problemy obucheniya [Theory of Pattern Recognition: Statistical Problems of Learning]. Izdat. “Nauka”, Moscow, 1974.
V. Vovk. Aggregating strategies. In Proc. 3rd Annu. Workshop on Comput. Learning Theory, pages 371–383. Morgan Kaufmann, 1990.
V. Vovk. Derandomizing stochastic prediction strategies. In Proc. 10th Annu. Workshop on Comput. Learning Theory. ACM Press, New York, NY, 1997.
Volodya Vovk. Competitive on-line linear regression. In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems, volume 10. The MIT Press, 1998.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Herbster, M. (2001). Learning Additive Models Online with Fast Evaluating Kernels. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_29
DOI: https://doi.org/10.1007/3-540-44581-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42343-0
Online ISBN: 978-3-540-44581-4