An Identity for Kernel Ridge Regression

  • Fedor Zhdanov
  • Yuri Kalnishkan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6331)


This paper provides a probabilistic derivation of an identity connecting the square loss of ridge regression in on-line mode with the loss of a retrospectively best regressor. Some corollaries of the identity providing upper bounds for the cumulative loss of on-line ridge regression are also discussed.
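
The abstract does not spell the identity out. As a rough illustration only, the sketch below assumes the linear (non-kernel) special case, in which the identity is taken to read: sum_t (gamma_t - y_t)^2 / (1 + x_t' A_{t-1}^{-1} x_t) = min_theta [ sum_t (theta' x_t - y_t)^2 + a ||theta||^2 ], where A_{t-1} = a I + sum_{s<t} x_s x_s' and gamma_t is the on-line ridge regression prediction at step t. This form, the function name online_ridge_identity, and the ridge parameter name a are assumptions made for the example and are not taken from this page.

```python
import random


def online_ridge_identity(xs, ys, a):
    """Return (weighted on-line loss, regularised batch loss).

    Assumed setting: linear ridge regression with ridge parameter a > 0.
    xs is a list of feature vectors (lists of floats), ys a list of targets.
    """
    n = len(xs[0])
    # A_inv tracks (a*I + sum_s x_s x_s^T)^{-1}; b tracks sum_s y_s x_s.
    A_inv = [[(1.0 / a if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    weighted_loss = 0.0

    for x, y in zip(xs, ys):
        # u = A_{t-1}^{-1} x_t
        u = [sum(A_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
        # On-line prediction gamma_t = x_t^T A_{t-1}^{-1} b_{t-1} (A_inv is symmetric).
        pred = sum(u[i] * b[i] for i in range(n))
        # d_t = x_t^T A_{t-1}^{-1} x_t, the quantity in the denominator.
        d = sum(x[i] * u[i] for i in range(n))
        weighted_loss += (pred - y) ** 2 / (1.0 + d)
        # Sherman-Morrison rank-one update of the inverse, then update b.
        for i in range(n):
            for j in range(n):
                A_inv[i][j] -= u[i] * u[j] / (1.0 + d)
        for i in range(n):
            b[i] += y * x[i]

    # Retrospectively best regressor: theta = A_T^{-1} b_T.
    theta = [sum(A_inv[i][j] * b[j] for j in range(n)) for i in range(n)]
    fit = sum(
        (sum(t * xi for t, xi in zip(theta, x)) - y) ** 2 for x, y in zip(xs, ys)
    )
    batch_loss = fit + a * sum(t * t for t in theta)
    return weighted_loss, batch_loss


if __name__ == "__main__":
    random.seed(1)
    xs = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(25)]
    ys = [random.uniform(-1, 1) for _ in range(25)]
    lhs, rhs = online_ridge_identity(xs, ys, a=1.0)
    print(lhs, rhs)  # the two values should agree up to rounding error
```

The inverse is kept up to date with a rank-one update instead of being recomputed at every step, which keeps the example free of external libraries. A kernel version would replace the inner products with kernel evaluations.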





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Fedor Zhdanov (1)
  • Yuri Kalnishkan (1)
  1. Computer Learning Research Centre and Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
