Partitional Clustering of Protein Sequences – An Inductive Logic Programming Approach

  • Nuno A. Fonseca
  • Vitor Santos Costa
  • Rui Camacho
  • Cristina Vieira
  • Jorge Vieira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5518)

Abstract

We present a novel approach to clustering sets of protein sequences, based on Inductive Logic Programming (ILP). Preliminary results show that the proposed method produces understandable descriptions and explanations of the clusters. Furthermore, it can be used as a knowledge elicitation tool to explain clusters proposed by other clustering approaches, such as standard phylogenetic programs.

Keywords

Clustering, Inductive Logic Programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nuno A. Fonseca (1, 2)
  • Vitor Santos Costa (2)
  • Rui Camacho (3)
  • Cristina Vieira (1)
  • Jorge Vieira (1)

  1. Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Porto, Portugal
  2. CRACS-INESC Porto LA, Universidade do Porto, Porto, Portugal
  3. LIAAD-INESC Porto LA & FEUP, Universidade do Porto, Porto, Portugal
