SVM-Based Classification of Distant Proteins Using Hierarchical Motifs

  • Jérôme Mikolajczack
  • Gérard Ramstein
  • Yannick Jacques
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3177)


This article presents a discriminative approach to protein classification in the particular case of remote homology. The protein family is modelled by a set M of motifs related to the physicochemical properties of the residues. We propose an algorithm for discovering motifs based on the ascending (agglomerative) hierarchical classification paradigm. The set M defines a feature space for the sequences: each sequence is transformed into a vector that indicates whether each motif of M is present. We then use the SVM learning method to discriminate the target family. Our hierarchical motif set specifically models interleukins among all the structural families of the SCOP database. Our method yields significantly better remote protein classification than spectrum kernel techniques.
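The feature-space step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the residue grouping, the motif syntax, or the motif discovery algorithm, so the four physicochemical classes, the lowercase class symbols, and the helper names `motif_to_regex`, `feature_vector` and `linear_kernel` are all assumptions made for illustration. The sketch covers only the mapping from a sequence to a binary motif-presence vector, plus the dot product a linear SVM would use on such vectors.

```python
import re

# Assumed grouping of the 20 amino acids into physicochemical classes.
# The paper's actual grouping is not stated in the abstract.
RESIDUE_CLASSES = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar or small
    "b": "KRH",       # basic
    "a": "DE",        # acidic
}


def motif_to_regex(motif):
    """Compile a motif into a regular expression.

    Hypothetical syntax: a lowercase class symbol matches any residue of
    that class, "x" matches any residue, and an uppercase letter matches
    that residue literally.
    """
    parts = []
    for symbol in motif:
        if symbol == "x":
            parts.append(".")
        elif symbol in RESIDUE_CLASSES:
            parts.append("[" + RESIDUE_CLASSES[symbol] + "]")
        else:
            parts.append(re.escape(symbol))
    return re.compile("".join(parts))


def feature_vector(sequence, motifs):
    """Return one 0/1 entry per motif: 1 if the motif occurs in the sequence."""
    return [1 if motif_to_regex(m).search(sequence) else 0 for m in motifs]


def linear_kernel(u, v):
    """Dot product of two feature vectors (number of shared motifs here)."""
    return sum(a * b for a, b in zip(u, v))


# Toy motif set M and toy sequences, invented for this example.
M = ["hb", "aaC", "pWh"]
v1 = feature_vector("MKTLLDDCW", M)
v2 = feature_vector("GWAEEC", M)
print(v1, v2, linear_kernel(v1, v2))
```

Vectors built this way could be passed to any SVM implementation. Discovering the motif set M itself, which is the paper's contribution, is outside the scope of this sketch.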


Support Vector Machine, SCOP Database, Support Vector Machine Algorithm, String Kernel, Linear Time Complexity
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Jérôme Mikolajczack (1)
  • Gérard Ramstein (2)
  • Yannick Jacques (1)
  1. Département de Cancérologie, Institut de Biologie, Nantes Cedex
  2. LINA, Ecole polytechnique de l’Université de Nantes, Nantes Cedex 3
