Abstract
Some proteins, such as homeodomain transcription factors, contain highly conserved regions of sequence. It has recently been suggested that multiple functional domains overlap in the homeodomain, and that this overlap explains its high conservation. However, the question remains why so many functional domains cluster together in one relatively small and constrained region of the protein. Here we model an evolutionary mechanism that can produce this kind of clustering: conserved functional domains are displaced from the parts of the molecule that are undergoing adaptive evolution, because novel functions generally out-compete conserved functions for control over the identity of amino acid residues. We call this model COAA, for Competition Over Amino Acids. We also studied the evolution of amino acid repeats (also known as homopeptides), which are especially prevalent in transcription factors. Repeats that are encoded by non-homogeneous mixtures of synonymous codons cannot be explained by replication slippage alone. Our model provides two explanations for their origin, maintenance, and over-representation in highly conserved proteins. We demonstrate that either competition between multiple functional domains for space within a sequence, or reuse of a sequence for many functions over time, can cause the evolution of amino acid repeats. Both of these processes are characteristic of multifunctional proteins such as homeodomain transcription factors. We conclude that the COAA model can explain two widely recognized features of transcription factor proteins: conserved domains and a tendency to accumulate homopeptides.
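The distinction between repeats encoded by a single codon and repeats encoded by a mixture of synonymous codons can be made concrete with a short script. The following is a minimal illustrative sketch, not the authors' method or the COAA model itself: it assumes the standard genetic code, an in-frame DNA coding sequence, and an arbitrary minimum run length of four residues. The function name `find_repeats` and the `pure` flag are hypothetical names introduced here for illustration.

```python
from itertools import groupby, product

# Standard genetic code, built from the conventional TCAG ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def find_repeats(cds, min_len=4):
    """Return amino acid runs of at least min_len residues in an in-frame
    DNA coding sequence, with the codons that encode each run.

    min_len=4 is an arbitrary illustrative threshold (an assumption).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    repeats = []
    position = 0  # index of the current codon within the sequence
    for aa, group in groupby(codons, key=CODON_TABLE.get):
        run = list(group)
        # Skip unrecognized codons (aa is None) and stop codons.
        if aa is not None and aa != "*" and len(run) >= min_len:
            repeats.append({
                "residue": aa,
                "start": position,
                "length": len(run),
                "codons": run,
                # True when a single codon encodes the whole run;
                # False when synonymous codons are mixed.
                "pure": len(set(run)) == 1,
            })
        position += len(run)
    return repeats


if __name__ == "__main__":
    # Hypothetical sequence: Met, four Gln (all CAG), four Ala (mixed), stop.
    example = "ATG" + "CAGCAGCAGCAG" + "GCTGCCGCAGCG" + "TAA"
    for repeat in find_repeats(example):
        print(repeat["residue"], repeat["length"], repeat["pure"], repeat["codons"])
```

In this sketch, a run flagged `pure` is encoded by one reiterated codon, the pattern that replication slippage would be expected to produce, whereas a run that is not `pure` is the kind of repeat the abstract says slippage alone cannot explain.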
Acknowledgments
We are tremendously grateful to JME Associate Editor Mark Jensen for his thorough and insightful review, and to two anonymous reviewers for their useful comments and suggestions. The experimental work in the Wagner lab is supported by a grant from the John Templeton Foundation (Grant number 12793). The views expressed in this paper do not necessarily reflect the views of the John Templeton Foundation.
Cite this article
Rorick, M.M., Wagner, G.P. The Origin of Conserved Protein Domains and Amino Acid Repeats Via Adaptive Competition for Control Over Amino Acid Residues. J Mol Evol 70, 29–43 (2010). https://doi.org/10.1007/s00239-009-9305-7
DOI: https://doi.org/10.1007/s00239-009-9305-7