A Kernel Approach for Learning from Almost Orthogonal Patterns

  • Bernhard Schölkopf
  • Jason Weston
  • Eleazar Eskin
  • Christina Leslie
  • William Stafford Noble
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2431)

Abstract

In kernel methods, all the information about the training data is contained in the Gram matrix. If the diagonal entries of this matrix are large compared with its off-diagonal entries, as happens for many types of kernels, then kernel methods do not perform well. We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that one has to solve when training a Support Vector Machine.
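
The idea in the abstract can be illustrated with a small sketch. It is not taken from the paper: the toy matrix, the exponent 0.5, and the helper names `signed_power`, `gram_of_rows`, and `diag_to_offdiag_ratio` are all assumptions made for illustration. The sketch compresses the entries of a Gram matrix with an elementwise signed power, which in general can destroy positive semidefiniteness, and then forms the product of the compressed matrix with its own transpose, which is positive semidefinite by construction.

```python
import math


def signed_power(gram, p):
    """Apply sign(k) * |k|**p to every entry; 0 < p < 1 compresses large values."""
    return [[math.copysign(abs(k) ** p, k) for k in row] for row in gram]


def gram_of_rows(matrix):
    """Return M M^T, which is symmetric positive semidefinite for any real M."""
    return [[sum(a * b for a, b in zip(r, s)) for s in matrix] for r in matrix]


def diag_to_offdiag_ratio(gram):
    """Mean diagonal entry divided by mean absolute off-diagonal entry."""
    n = len(gram)
    diag = sum(gram[i][i] for i in range(n)) / n
    off = sum(
        abs(gram[i][j]) for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    return diag / off


# Hypothetical Gram matrix of almost orthogonal patterns: large diagonal,
# small off-diagonal entries.
gram = [
    [100.0, 1.0, 2.0],
    [1.0, 100.0, 1.0],
    [2.0, 1.0, 100.0],
]

# Step 1 (assumed): reduce the dynamic range elementwise.
squashed = signed_power(gram, 0.5)

# Step 2 (assumed): restore positive semidefiniteness by treating the rows of
# the squashed matrix as new feature vectors and taking their inner products.
repaired = gram_of_rows(squashed)

print("ratio before:", diag_to_offdiag_ratio(gram))
print("ratio after squashing:", diag_to_offdiag_ratio(squashed))
print("ratio after repair:", diag_to_offdiag_ratio(repaired))
```

In this toy case the diagonal dominates the original matrix far more strongly than it dominates the repaired one, and the repaired matrix could be handed to any SVM solver that accepts a precomputed kernel. Whether this particular combination matches one of the methods tested in the paper is not stated on this page.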

Keywords

Support Vector Machine · Feature Space · Kernel Method · Hide Variable · Functional Calculus
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Bernhard Schölkopf (1)
  • Jason Weston (1)
  • Eleazar Eskin (2)
  • Christina Leslie (2)
  • William Stafford Noble (2, 3)

  1. Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
  2. Department of Computer Science, Columbia University, New York
  3. Columbia Genome Center, Columbia University, New York