Learning Interpretable SVMs for Biological Sequence Classification

  • S. Sonnenburg
  • G. Rätsch
  • C. Schäfer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3500)


We propose novel algorithms for solving the so-called Support Vector Multiple Kernel Learning problem and show how they can be used to understand the resulting support vector decision function. While classical kernel-based algorithms (such as SVMs) are based on a single kernel, Multiple Kernel Learning solves a quadratically constrained quadratic program to find a sparse convex combination of a set of support vector kernels. We show how this problem can be cast as a semi-infinite linear optimization problem, which can in turn be solved efficiently by a boosting-like iterative method combined with standard SVM optimization algorithms. The proposed method can handle thousands of examples while combining hundreds of kernels within reasonable time.
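Each iteration of the boosting-like wrapper alternates between an SVM solve with the kernel weights β held fixed and a small linear program over β. The sketch below assumes the standard SVM-MKL semi-infinite form (maximize θ subject to β ≥ 0, Σ_k β_k = 1 and Σ_k β_k S_k(α) ≥ θ for every feasible α) and shows only the quantities passed between the two steps: the combined kernel, the per-kernel constraint terms S_k(α), and a relative-gap stopping test. The function names are hypothetical, and the SVM and LP solvers themselves are deliberately left out.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix], beta: Sequence[float]) -> Matrix:
    """Convex combination K = sum_k beta_k * K_k of precomputed Gram matrices."""
    if any(b < 0 for b in beta) or abs(sum(beta) - 1.0) > 1e-9:
        raise ValueError("beta must be non-negative and sum to one")
    n = len(kernels[0])
    return [
        [sum(b * K[i][j] for b, K in zip(beta, kernels)) for j in range(n)]
        for i in range(n)
    ]


def silp_term(K: Matrix, alpha: Sequence[float], y: Sequence[int]) -> float:
    """S(alpha) = 1/2 * sum_ij alpha_i alpha_j y_i y_j K_ij - sum_i alpha_i.

    Evaluated once per kernel K_k (assumed SILP form), these values give the
    new constraint sum_k beta_k * S_k(alpha) >= theta for the restricted LP.
    """
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
        for i in range(n)
        for j in range(n)
    )
    return 0.5 * quad - sum(alpha)


def relative_gap(beta: Sequence[float], s_values: Sequence[float], theta: float) -> float:
    """Assumed stopping test |1 - sum_k beta_k S_k(alpha) / theta|; stop when small."""
    if theta == 0:
        raise ValueError("theta must be non-zero")
    return abs(1.0 - sum(b * s for b, s in zip(beta, s_values)) / theta)
```

Because S(α) is linear in the kernel matrix, `silp_term(combine_kernels(kernels, beta), alpha, y)` equals `sum(b * silp_term(K, alpha, y) for b, K in zip(beta, kernels))` up to rounding, which is why each new constraint is linear in β.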

In the second part we show how this technique can be used to understand the obtained decision function and to extract biologically relevant knowledge about the sequence analysis problem at hand. We consider the problem of splice site identification and combine string kernels at different sequence positions and with various substring (oligomer) lengths. The proposed algorithm computes a sparse weighting over substring lengths and positions, highlighting which substrings are important for discrimination. Finally, we propose a bootstrap scheme to reliably identify a few statistically significant positions, which can then be used for further analysis such as consensus finding.
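To make the length and position weighting concrete, here is a minimal sketch assuming one sub-kernel per pair (substring length d, start position l), equal to 1 when the length-d substrings of two equal-length sequences starting at l are identical, in the spirit of the weighted degree kernel named in the keywords. A sparse weight β_{d,l} per sub-kernel gives the combined kernel, and summing the weights over d yields a per-position importance profile. The function names and example weights are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[int, int], float]  # (substring length d, start position l) -> beta


def wd_subkernels(x: str, z: str, max_degree: int) -> Dict[Tuple[int, int], float]:
    """Sub-kernel (d, l) is 1.0 if x and z share the length-d substring starting at l."""
    if len(x) != len(z):
        raise ValueError("sequences must have equal length")
    return {
        (d, l): float(x[l:l + d] == z[l:l + d])
        for d in range(1, max_degree + 1)
        for l in range(len(x) - d + 1)
    }


def combined_kernel(x: str, z: str, weights: Weights) -> float:
    """k(x, z) = sum_{d,l} beta_{d,l} * subkernel_{d,l}(x, z); unlisted pairs get weight 0."""
    max_degree = max(d for d, _ in weights)
    subs = wd_subkernels(x, z, max_degree)
    return sum(weights.get(key, 0.0) * value for key, value in subs.items())


def position_importance(weights: Weights, length: int) -> List[float]:
    """Sum the learned weights over substring lengths to score each start position."""
    importance = [0.0] * length
    for (_, l), beta in weights.items():
        importance[l] += beta
    return importance
```

With all seven weights for d ∈ {1, 2} set to 1, `combined_kernel("ACGT", "ACGA", ...)` counts 3 matching 1-mers and 2 matching 2-mers, giving 5.0; with a sparse weighting such as `{(2, 1): 0.7, (1, 3): 0.3}`, `position_importance` concentrates all mass on positions 1 and 3.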

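The abstract does not spell out the bootstrap procedure. As a hedged illustration only, the sketch below uses a generic percentile bootstrap, which may differ from the authors' exact test: refit a weight learner on resampled training sets and keep the positions whose lower percentile bound stays above a threshold. `learn_position_weights` stands in for the MKL fit (hypothetical), `toy_learner` is a made-up substitute that scores G enrichment per position, and the threshold and confidence level are arbitrary assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # (sequence, label in {+1, -1})


def bootstrap_weights(
    data: Sequence[Example],
    learn_position_weights: Callable[[Sequence[Example]], List[float]],
    n_boot: int = 100,
    seed: int = 0,
) -> List[List[float]]:
    """Refit the weight learner on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [
        learn_position_weights([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    ]


def significant_positions(
    replicates: List[List[float]], threshold: float, level: float = 0.95
) -> List[int]:
    """Keep positions whose lower percentile bound across replicates exceeds threshold."""
    q = (1.0 - level) / 2.0
    chosen = []
    for p in range(len(replicates[0])):
        values = sorted(r[p] for r in replicates)
        lower = values[int(q * (len(values) - 1))]  # simple lower-percentile estimate
        if lower > threshold:
            chosen.append(p)
    return chosen


def toy_learner(sample: Sequence[Example]) -> List[float]:
    """Hypothetical stand-in for an MKL fit: normalized excess G frequency in positives."""
    length = len(sample[0][0])
    pos = [s for s, y in sample if y == 1]
    neg = [s for s, y in sample if y == -1]

    def g_freq(seqs: List[str], p: int) -> float:
        return sum(s[p] == "G" for s in seqs) / len(seqs) if seqs else 0.0

    raw = [max(0.0, g_freq(pos, p) - g_freq(neg, p)) for p in range(length)]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else [0.0] * length


# Toy data: only position 2 separates the classes (G in every positive example).
POSITIVES = ["AAGA", "CAGC", "ACGA", "CCGC"]
NEGATIVES = ["AACA", "CACC", "ACTA", "CCAC"]
DATA = [(s, 1) for s in POSITIVES * 2] + [(s, -1) for s in NEGATIVES * 2]
```

On this toy data, `significant_positions(bootstrap_weights(DATA, toy_learner, n_boot=200, seed=1), threshold=0.5)` should single out position 2 (0-based), the only position that separates the classes. A real analysis would replace `toy_learner` with the learned kernel weighting and choose the threshold and level deliberately.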

Keywords: Support Vector Machine; Multiple Kernel Learning; String Kernel; Weighted Degree Kernel; Interpretation of SVM results; Splice Site Prediction


Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • S. Sonnenburg (1)
  • G. Rätsch (2)
  • C. Schäfer (1)

  1. Fraunhofer Institute FIRST, Berlin, Germany
  2. Friedrich Miescher Lab, Max Planck Society, Tübingen, Germany
