Abstract
The identification of meaningful groups of proteins has long been a principal goal in structural and functional genomics. A successful protein clustering can yield significant insight into both the evolutionary history of the respective molecules and the potential functions and interactions of novel sequences. In this work we propose a novel distance metric for protein homology data. The metric is based on a matrix manipulation approach that treats the homology matrix as a form of block diagonal matrix. A first exploratory implementation of the overall process produces interesting results on a well-explored reference set of genomes. Near-future steps include a thorough theoretical validation and a comparison against similar approaches.
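As a rough illustration of the block diagonal view mentioned in the abstract, the sketch below reorders a small binary homology matrix so that groups of mutually linked proteins appear as diagonal blocks. It is not the metric proposed in the paper, which the abstract does not specify. The toy matrix, the connected-components grouping, and the function names are all assumptions made for this example.

```python
def connected_components(adj):
    """Group indices of a symmetric 0/1 matrix into connected components."""
    n = len(adj)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if adj[i][j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(group))
    return groups


def permute(adj, order):
    """Reorder rows and columns of the matrix by the same index order."""
    return [[adj[i][j] for j in order] for i in order]


# Hypothetical toy data: 1 means two proteins share a significant homology hit.
homology = [
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

groups = connected_components(homology)
order = [i for g in groups for i in g]   # place members of each group side by side
blocked = permute(homology, order)       # all nonzero entries now lie in diagonal blocks

for row in blocked:
    print(row)
```

In this toy case proteins 0 and 2 form one block and proteins 1, 3 and 4 form another. Real homology data is rarely this clean, so any entries left outside the blocks would have to be handled by whatever distance evaluation is actually used.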
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Psomopoulos, F.E., Mitkas, P.A. (2014). Algebraic Interpretations Towards Clustering Protein Homology Data. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H., Sioutas, S., Makris, C. (eds) Artificial Intelligence Applications and Innovations. AIAI 2014. IFIP Advances in Information and Communication Technology, vol 437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44722-2_15
DOI: https://doi.org/10.1007/978-3-662-44722-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44721-5
Online ISBN: 978-3-662-44722-2
eBook Packages: Computer Science (R0)