Algebraic Interpretations Towards Clustering Protein Homology Data

  • Fotis E. Psomopoulos
  • Pericles A. Mitkas
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 437)


Identifying meaningful groups of proteins has long been a principal goal in structural and functional genomics. A successful protein clustering can yield significant insight into both the evolutionary history of the respective molecules and the potential functions and interactions of novel sequences. In this work, we propose a novel distance metric for protein homology data. The metric is based on a matrix manipulation approach that defines the homology matrix as a form of block diagonal matrix. A first exploratory implementation of the overall process is shown to produce interesting results on a well-explored reference set of genomes. Next steps include a thorough theoretical validation and a comparison against similar approaches.
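The abstract does not spell out the proposed metric, so the following is only an illustrative sketch of what a block diagonal view of a homology matrix can look like, not the authors' method. It assumes a symmetric, binary homology relation given as a list of pairwise hits, and it groups proteins by connected components (a stand-in technique chosen here for illustration). The function names `homology_blocks` and `reordered_matrix` and the toy data are hypothetical.

```python
from collections import deque


def homology_blocks(n, hits):
    """Group protein indices 0..n-1 into connected components of the hit graph.

    Assumption: `hits` is a list of (i, j) pairs meaning proteins i and j
    are homologous; the relation is treated as symmetric.
    """
    adj = [set() for _ in range(n)]
    for i, j in hits:
        adj[i].add(j)
        adj[j].add(i)

    seen = [False] * n
    blocks = []
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search collects one component (one candidate block).
        seen[start] = True
        queue = deque([start])
        block = []
        while queue:
            u = queue.popleft()
            block.append(u)
            for v in sorted(adj[u]):
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        blocks.append(sorted(block))
    return blocks


def reordered_matrix(n, hits, blocks):
    """Build the binary homology matrix with rows/columns permuted so that
    members of the same block are adjacent (block diagonal layout)."""
    order = [i for block in blocks for i in block]
    pos = {protein: k for k, protein in enumerate(order)}
    m = [[0] * n for _ in range(n)]
    for k in range(n):
        m[k][k] = 1  # every protein is trivially homologous to itself
    for i, j in hits:
        m[pos[i]][pos[j]] = 1
        m[pos[j]][pos[i]] = 1
    return m


# Toy data (hypothetical): five proteins, three pairwise hits.
hits = [(0, 3), (3, 4), (1, 2)]
blocks = homology_blocks(5, hits)
matrix = reordered_matrix(5, hits, blocks)
for row in matrix:
    print(row)
```

In this toy case the permuted matrix has a 3×3 block followed by a 2×2 block, with zeros everywhere outside them. Real homology data is rarely this clean, which is presumably where a distance metric defined on such a matrix becomes useful.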





Copyright information

© IFIP International Federation for Information Processing 2014

Authors and Affiliations

  • Fotis E. Psomopoulos (1)
  • Pericles A. Mitkas (2)
  1. Center for Research and Technology Hellas, Thessaloniki, Greece
  2. Dept. of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
