Algebraic Interpretations Towards Clustering Protein Homology Data
The identification of meaningful groups of proteins has always been a principal goal in structural and functional genomics. A successful protein clustering can yield significant insight into both the evolutionary history of the respective molecules and the potential functions and interactions of novel sequences. In this work we propose a novel distance metric for protein homology data. The metric is based on a matrix manipulation approach that treats the homology matrix as a form of block diagonal matrix. A first exploratory implementation of the overall process is shown to produce interesting results on a well-explored reference set of genomes. Planned next steps include a thorough theoretical validation and a comparison against similar approaches.
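To make the block diagonal view concrete, the following is a minimal sketch and not the metric proposed in the paper, whose definition the abstract does not give. It assumes a small symmetric homology matrix of pairwise scores, treats any score above a hypothetical `threshold` as a homology link, and reorders the proteins so that linked groups form contiguous diagonal blocks. The function names and the toy matrix are illustrative assumptions.

```python
from typing import List, Tuple


def connected_components(homology: List[List[float]],
                         threshold: float = 0.0) -> List[List[int]]:
    """Group protein indices that are linked, directly or transitively,
    by a homology score above `threshold` (an assumed cut-off)."""
    n = len(homology)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [start]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                linked = homology[i][j] > threshold or homology[j][i] > threshold
                if not seen[j] and linked:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(component))
    return components


def block_diagonal_order(homology: List[List[float]],
                         threshold: float = 0.0
                         ) -> Tuple[List[int], List[List[int]], List[List[float]]]:
    """Return a row/column order, the groups, and the reordered matrix
    in which each group occupies one diagonal block."""
    components = connected_components(homology, threshold)
    order = [i for component in components for i in component]
    reordered = [[homology[i][j] for j in order] for i in order]
    return order, components, reordered


# Toy data (made up for illustration): proteins 0 and 2 are homologous,
# as are proteins 1 and 3; there are no links between the two pairs.
toy = [
    [1.0, 0.0, 0.8, 0.0],
    [0.0, 1.0, 0.0, 0.6],
    [0.8, 0.0, 1.0, 0.0],
    [0.0, 0.6, 0.0, 1.0],
]
order, components, reordered = block_diagonal_order(toy)
for row in reordered:
    print(row)
```

In the reordered toy matrix, all nonzero scores fall inside two 2x2 diagonal blocks, one per group. Real homology data is rarely this clean, so any entries left outside the blocks would be what a distance measure of this kind has to account for.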