
Dealing with large diagonals in kernel matrices

  • Jason Weston
  • Bernhard Schölkopf
  • Eleazar Eskin
  • Christina Leslie
  • William Stafford Noble
Special Section on New Trends in Statistical Information Processing

Abstract

In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, as happens for many types of kernels, kernel methods tend to perform poorly. We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that must be solved when training a Support Vector Machine, a common kernel approach to pattern recognition.
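The abstract does not spell out the individual methods. As a minimal sketch of the general idea only, assuming an elementwise sub-polynomial transform combined with an empirical kernel map (one standard way to compress a dominant diagonal while keeping the result positive semidefinite), the code below maps each Gram entry to sign(K_ij)|K_ij|^p with 0 < p < 1, treats the rows of the transformed matrix as explicit feature vectors, and takes their inner products as the new Gram matrix. The function names and the exponent `p` are hypothetical, and this is not presented as the exact procedure evaluated in the paper.

```python
import math


def subpolynomial_empirical_kernel(K, p=0.5):
    """Hypothetical helper: compress the dynamic range of a Gram matrix K.

    Each entry becomes sign(K_ij) * |K_ij|**p with 0 < p < 1, which shrinks
    the ratio between large entries (such as a dominant diagonal) and small
    ones. The rows of the transformed matrix are then used as explicit
    feature vectors (an empirical kernel map), and the new Gram matrix holds
    their pairwise inner products, so it is positive semidefinite by
    construction.
    """
    if not 0 < p < 1:
        raise ValueError("p should lie in (0, 1)")
    # Elementwise sub-polynomial transform that keeps the sign of each entry.
    phi = [[math.copysign(abs(v) ** p, v) for v in row] for row in K]
    n = len(phi)
    # New Gram matrix: inner products between rows of phi, i.e. phi @ phi.T.
    return [[sum(a * b for a, b in zip(phi[i], phi[j])) for j in range(n)]
            for i in range(n)]


def diag_to_offdiag_ratio(K):
    """Mean diagonal value divided by mean absolute off-diagonal value.

    Assumes n >= 2 and at least one nonzero off-diagonal entry.
    """
    n = len(K)
    diag = sum(K[i][i] for i in range(n)) / n
    off = sum(abs(K[i][j]) for i in range(n) for j in range(n) if i != j)
    return diag / (off / (n * (n - 1)))


# Usage: a strongly diagonally dominant 2x2 Gram matrix.
K = [[100.0, 1.0], [1.0, 100.0]]
K_new = subpolynomial_empirical_kernel(K, p=0.5)
print(K_new)                          # [[101.0, 20.0], [20.0, 101.0]]
print(diag_to_offdiag_ratio(K))       # 100.0
print(diag_to_offdiag_ratio(K_new))   # 5.05
```

In this toy case the diagonal-to-off-diagonal ratio drops from 100 to about 5. Forming the new matrix as an explicit product of feature vectors is what guarantees positive semidefiniteness; applying the elementwise power alone, without the empirical kernel map, would not in general preserve it.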

Key words and phrases

Kernel methods, Support Vector Machines, pattern recognition, bioinformatics, microarray data analysis, transduction, regularization



Copyright information

© The Institute of Statistical Mathematics 2003

Authors and Affiliations

  • Jason Weston (1)
  • Bernhard Schölkopf (1)
  • Eleazar Eskin (2)
  • Christina Leslie (2, 3)
  • William Stafford Noble (4)

  1. Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
  2. Department of Computer Science, Columbia University, New York, USA
  3. Center for Computational Biology (C2B2), Russ Berrie Pavilion, Columbia University, New York, USA
  4. Department of Genome Sciences, University of Washington, Health Sciences Center, Seattle, USA
