
Model Selection in Kernel Methods Based on a Spectral Analysis of Label Information

  • Mikio L. Braun
  • Tilman Lange
  • Joachim M. Buhmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4174)

Abstract

We propose a novel method for addressing the model selection problem in the context of kernel methods. In contrast to existing methods, which rely on hold-out testing or try to compensate for the optimism of the generalization error, our method is based on a structural analysis of the label information using the eigenstructure of the kernel matrix. In this setting, the label vector can be transformed into a representation in which the smooth information is easily discernible from the noise. This makes it possible to estimate a cut-off dimension such that the leading coefficients in that representation contain the learnable information while the remaining coefficients can be discarded as noise. Based on this cut-off dimension, the regularization parameter for kernel ridge regression is estimated.
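To make the procedure concrete, the following is a minimal sketch in Python/NumPy of the general idea described above: project the label vector onto the eigenbasis of the kernel matrix, estimate a cut-off dimension beyond which the coefficients sit at noise level, and derive a ridge regularization parameter from that cut-off. The RBF kernel, the noise-level heuristic (trailing coefficients), and the rule tying the regularization parameter to the eigenvalue at the cut-off are illustrative assumptions, not the estimators developed in the paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def spectral_cutoff(K, y):
    # Eigendecomposition of the symmetric kernel matrix,
    # eigenvalues sorted in decreasing order.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Spectral coefficients of the labels: s_i = u_i^T y.
    s = eigvecs.T @ y
    # Heuristic noise level estimated from the trailing half of the
    # coefficients, assumed to carry no smooth information.
    noise = np.sqrt(np.mean(s[len(s) // 2:] ** 2))
    # Cut-off: last coefficient that clearly exceeds the noise level.
    above = np.flatnonzero(np.abs(s) > 2.0 * noise)
    d = int(above.max()) + 1 if above.size else 1
    return d, eigvals

def krr_predict(K, y, d, eigvals):
    # Kernel ridge regression; the regularization parameter is tied to
    # the eigenvalue at the cut-off so that dimensions beyond d are damped.
    lam = max(eigvals[min(d, len(y) - 1)], 1e-12)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return K @ alpha, lam

# Toy usage: noisy sine labels.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(200)
K = rbf_kernel(X, gamma=0.5)
d, eigvals = spectral_cutoff(K, y)
y_hat, lam = krr_predict(K, y, d, eigvals)
```

The sketch only illustrates the flow (spectral coefficients of the labels, cut-off estimate, regularization parameter); the paper's analysis yields a principled estimator for the cut-off rather than the crude threshold used here.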

Keywords

Support Vector Machine · Regularization Parameter · Kernel Method · Kernel Matrix · Label Information



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mikio L. Braun (1)
  • Tilman Lange (2)
  • Joachim M. Buhmann (2)
  1. Fraunhofer Institute FIRST, Intelligent Data Analysis Group, Berlin, Germany
  2. Institute of Computational Science, ETH Zurich, Zurich, Switzerland
