Partitional Clustering of Protein Sequences – An Inductive Logic Programming Approach

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5518)

Abstract

We present a novel approach to clustering sets of protein sequences based on Inductive Logic Programming (ILP). Preliminary results show that the proposed method produces understandable descriptions (explanations) of the clusters. Furthermore, it can be used as a knowledge elicitation tool to explain clusters proposed by other clustering approaches, such as standard phylogenetic programs.
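The abstract does not describe the ILP system or the background knowledge used, so what follows is only a rough illustration of the explanation idea. This Python sketch takes a given partition, for example one produced by a phylogenetic program, and explains each cluster with short conjunctive rules. The rules are built over assumed features: the length-k words (k-mers) occurring in each sequence. It is a propositional simplification, not the authors' ILP method. The function names `kmers` and `explain_clusters`, the k-mer features, the `k` and `max_len` parameters and the toy sequences are all hypothetical.

```python
from itertools import combinations

# Illustrative sketch only: the k-mer features and this brute-force rule
# search are assumptions, not the ILP system described in the paper.

def kmers(seq, k):
    """Return the set of all length-k substrings (words) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def explain_clusters(clusters, k=3, max_len=2):
    """For each cluster, return the shortest conjunctions (up to max_len words)
    that hold for every member of the cluster and for no sequence outside it."""
    features = {name: {sid: kmers(s, k) for sid, s in seqs.items()}
                for name, seqs in clusters.items()}
    rules = {}
    for name, members in features.items():
        # Candidate words: those present in every member of this cluster.
        shared = set.intersection(*members.values())
        # Word sets of all sequences assigned to the other clusters.
        outsiders = [ws for other, ms in features.items() if other != name
                     for ws in ms.values()]
        found = []
        for size in range(1, max_len + 1):
            for conj in combinations(sorted(shared), size):
                # Keep the conjunction only if no outsider satisfies all of it.
                if not any(set(conj) <= ws for ws in outsiders):
                    found.append(conj)
            if found:
                break  # prefer the shortest explanations
        rules[name] = found
    return rules

# Hypothetical toy partition with two clusters of short sequences.
toy = {
    "A": {"s1": "MKTAYIAK", "s2": "GMKTAYLL"},
    "B": {"s3": "QQLLNNPP", "s4": "LLNNQQRR"},
}
print(explain_clusters(toy))
# {'A': [('KTA',), ('MKT',), ('TAY',)], 'B': [('LLN',), ('LNN',)]}
```

An empty list for a cluster means that no conjunction of up to `max_len` shared words separates it from the rest. A relational learner could still find an explanation in that case by using richer background knowledge than word presence.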

Keywords

Clustering; Inductive Logic Programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Porto, Portugal
  2. CRACS-INESC Porto LA, Universidade do Porto, Porto, Portugal
  3. LIAAD-INESC Porto LA & FEUP, Universidade do Porto, Porto, Portugal