Kernelization of Matrix Updates, When and How?

  • Manfred K. Warmuth
  • Wojciech Kotłowski
  • Shuisheng Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7568)


We define what it means for a learning algorithm to be kernelizable in the case when the instances are vectors, asymmetric matrices and symmetric matrices, respectively. We can characterize kernelizability in terms of an invariance of the algorithm to certain orthogonal transformations. If we assume that the algorithm’s action relies on a linear prediction, then we can show that in each case the linear parameter vector must be a certain linear combination of the instances. We give a number of examples of how to apply our methods. In particular we show how to kernelize multiplicative updates for symmetric instance matrices.
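The vector case of the abstract's two claims can be illustrated with a minimal sketch. It is not the paper's construction and does not cover the matrix cases. It assumes online gradient descent on the square loss as the learning algorithm; the function names `primal_gd`, `dual_gd`, and `rotate`, the learning rate, and the toy data are all hypothetical. The sketch checks that the weight vector can be kept as a linear combination of the instances, so predictions need only dot products, and that the prediction is unchanged when all instances are rotated by the same orthogonal map.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_gd(xs, ys, eta, x_new):
    """Online gradient descent on square loss with an explicit weight vector."""
    w = [0.0] * len(xs[0])
    for x, y in zip(xs, ys):
        err = dot(w, x) - y
        w = [wi - eta * err * xi for wi, xi in zip(w, x)]
    return dot(w, x_new)

def dual_gd(xs, ys, eta, x_new, kernel=dot):
    """Same update, with w kept implicitly as sum_i alpha[i] * xs[i]."""
    alpha = []
    for x, y in zip(xs, ys):
        # Prediction uses only kernel values between instances.
        pred = sum(a * kernel(xs[i], x) for i, a in enumerate(alpha))
        alpha.append(-eta * (pred - y))
    return sum(a * kernel(xs[i], x_new) for i, a in enumerate(alpha))

def rotate(v, theta):
    """Rotate a 2-dimensional vector by angle theta (an orthogonal map)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

# Toy data, chosen arbitrarily for illustration.
xs = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
ys = [1.0, -1.0, 0.5]
x_new = [1.5, 0.5]

p_primal = primal_gd(xs, ys, 0.1, x_new)
p_dual = dual_gd(xs, ys, 0.1, x_new)
p_rot = primal_gd([rotate(x, 0.7) for x in xs], ys, 0.1, rotate(x_new, 0.7))
```

All three predictions agree up to floating-point error. Replacing `dot` with another kernel function in `dual_gd` would give a kernelized version of the same update, which is possible only because the dual form never touches the instances except through `kernel`.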


Keywords: kernelization, multiplicative updates, rotational invariance





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Manfred K. Warmuth (1)
  • Wojciech Kotłowski (2)
  • Shuisheng Zhou (3)
  1. Department of Computer Science, University of California, Santa Cruz, USA
  2. Institute of Computing Science, Poznań University of Technology, Poland
  3. School of Science, Xidian University, Xian, China
