Abstract
Phylogenetic proximity has guided our understanding of the evolution of species for decades. It is now clear that the paradigm “phylogenetically close species should share similar characters” is just one facet of the complex process of evolution inherent in development and species differentiation. There is a need for novel mathematical approaches to cluster symbolic information organized into trees of characters that could highlight the evolutionary relations between characters and the processes by which characters coevolve. We propose a combinatorial method that does so and that derives groups of characters that appear to be correlated through their evolutionary history. This approach was first developed for protein sequences, but it proves to be general and applicable to any list of characters describing species. In particular, one does not need to know all characters for all species to perform coevolution analysis.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Carbone, A. (2014). Extracting Coevolving Characters from a Tree of Species. In: Jonoska, N., Saito, M. (eds) Discrete and Topological Models in Molecular Biology. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40193-0_3
DOI: https://doi.org/10.1007/978-3-642-40193-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40192-3
Online ISBN: 978-3-642-40193-0
eBook Packages: Computer Science, Computer Science (R0)