Opening the Black Box: Revealing Interpretable Sequence Motifs in Kernel-Based Learning Algorithms

  • Marina M.-C. Vidovic
  • Nico Görnitz
  • Klaus-Robert Müller
  • Gunnar Rätsch
  • Marius Kloft
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9285)


This work concerns kernel-based learning algorithms for sequence data. We present a probabilistic approach to automatically extract, from the output of such string-kernel-based learning algorithms, the subsequences—or motifs—truly underlying the machine’s predictions. The proposed framework views motifs as free parameters in a probabilistic model, which is fitted through a global optimization approach. In contrast to prevalent approaches, the proposed method can discover even difficult, long motifs, and can be combined with any kernel-based learning algorithm that is based on an adequate sequence kernel. We show that, by using a discriminative kernel machine such as a support vector machine, the approach can reveal the discriminative motifs underlying the kernel predictor. We demonstrate the efficacy of our approach through a series of experiments on synthetic and real data, including problems from handwritten digit recognition and a large-scale human splice site data set from the domain of computational biology.
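To make the setting concrete, the following is a minimal sketch of the kind of string-kernel classifier the abstract refers to: a spectrum (k-mer count) kernel with a linear SVM on toy DNA sequences, where positives contain a planted motif. The data, motif, and parameters are illustrative assumptions, not the paper's actual experimental setup.

```python
# Sketch: a spectrum (k-mer) string kernel with an SVM -- the kind of
# kernel-based sequence classifier whose predictions the paper's method
# aims to explain. Toy data and parameters are illustrative only.
from itertools import product
import numpy as np
from sklearn.svm import SVC

ALPHABET = "ACGT"
K = 3

def spectrum_features(seq, k=K):
    """Count occurrences of every length-k substring (k-mer) in seq."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {m: i for i, m in enumerate(kmers)}
    phi = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        phi[index[seq[i:i + k]]] += 1
    return phi

rng = np.random.default_rng(0)
def random_seq(n=30):
    return "".join(rng.choice(list(ALPHABET), n))

# Toy task: positive sequences carry a planted motif, negatives are random.
pos = [s[:10] + "GATTACA" + s[10:] for s in (random_seq(23) for _ in range(20))]
neg = [random_seq(30) for _ in range(20)]
X = np.array([spectrum_features(s) for s in pos + neg])
y = np.array([1] * 20 + [-1] * 20)

# The spectrum kernel is the inner product of k-mer count vectors, so a
# linear SVM on these explicit features is equivalent to the kernel machine.
clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))
```

The explanation problem the paper addresses is then: given only such a trained kernel machine, recover the motif (here, the planted subsequence) that drives its predictions.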


Keywords: Support Vector Machine · Multiple Kernel Learning · Handwritten Digit · Hilbert Curve · Candidate Motif





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Berlin Institute of Technology, Berlin, Germany
  2. Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
  3. Memorial Sloan-Kettering Cancer Center, New York, USA
  4. Humboldt University of Berlin, Berlin, Germany
