
The Origin of Conserved Protein Domains and Amino Acid Repeats Via Adaptive Competition for Control Over Amino Acid Residues

Journal of Molecular Evolution

Abstract

Some proteins, such as homeodomain transcription factors, contain highly conserved regions of sequence. It has recently been suggested that multiple functional domains overlap in the homeodomain and that this overlap explains its high conservation. However, the question remains why so many functional domains cluster together in one relatively small and constrained region of the protein. Here we model an evolutionary mechanism that can produce this kind of clustering: conserved functional domains are displaced from the parts of the molecule that are undergoing adaptive evolution because novel functions generally out-compete conserved functions for control over the identity of amino acid residues. We call this model COAA, for Competition Over Amino Acids. We also studied the evolution of amino acid repeats (also known as homopeptides), which are especially prevalent in transcription factors. Repeats that are encoded by non-homogeneous mixtures of synonymous codons cannot be explained by replication slippage alone. Our model provides two explanations for their origin, maintenance, and over-representation in highly conserved proteins. We demonstrate that either competition between multiple functional domains for space within a sequence, or reuse of a sequence for many functions over time, can cause the evolution of amino acid repeats. Both of these processes are characteristic of multifunctional proteins such as homeodomain transcription factors. We conclude that the COAA model can explain two widely recognized features of transcription factor proteins: conserved domains and a tendency to accumulate homopeptides.
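The distinction drawn above between repeats encoded by a single codon and repeats encoded by a mixture of synonymous codons can be illustrated with a minimal sketch. This is not the authors' method: the function name `find_repeats`, the minimum run length of four residues, the toy sequence, and the abbreviated codon table are all assumptions made for illustration.

```python
from itertools import groupby

# Abbreviated codon table covering only the residues in the toy example.
# Assumption: a real analysis would use the full standard genetic code.
CODON_TABLE = {
    "CAA": "Q", "CAG": "Q",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "ATG": "M", "AAA": "K", "AAG": "K",
}


def find_repeats(cds, min_len=4):
    """Return homopeptide runs in a coding sequence with their codon usage.

    min_len is a hypothetical threshold, not a value taken from the paper.
    """
    # Split the sequence into complete codons, ignoring any trailing bases.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    residues = [CODON_TABLE[c] for c in codons]

    repeats = []
    pos = 0
    for residue, run in groupby(residues):
        length = len(list(run))
        if length >= min_len:
            used = codons[pos:pos + length]
            repeats.append({
                "residue": residue,
                "start": pos,          # 0-based residue index
                "length": length,
                "codons": used,
                # A single repeated codon is compatible with slippage alone;
                # a mixture of synonymous codons is not.
                "homogeneous": len(set(used)) == 1,
            })
        pos += length
    return repeats


# Toy sequence: M, five Q (all CAG), K, five A (mixed codons), K.
example = "ATG" + "CAG" * 5 + "AAA" + "GCTGCCGCAGCGGCT" + "AAG"
repeats = find_repeats(example)
for r in repeats:
    print(r["residue"], r["start"], r["length"], r["homogeneous"])
```

In this toy sequence the glutamine run uses one codon throughout, whereas the alanine run mixes four synonymous codons, which is the kind of repeat the abstract says replication slippage alone cannot explain.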



Acknowledgments

We are tremendously grateful to JME Associate Editor Mark Jensen for his thorough and insightful review, and to two anonymous reviewers for their useful comments and suggestions. The experimental work in the Wagner lab is supported by a grant from the John Templeton Foundation (JTF; grant number 12793). The views expressed in this paper do not necessarily reflect the views of the JTF.

Author information

Corresponding author

Correspondence to Mary M. Rorick.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 456 kb)


About this article

Cite this article

Rorick, M.M., Wagner, G.P. The Origin of Conserved Protein Domains and Amino Acid Repeats Via Adaptive Competition for Control Over Amino Acid Residues. J Mol Evol 70, 29–43 (2010). https://doi.org/10.1007/s00239-009-9305-7


  • DOI: https://doi.org/10.1007/s00239-009-9305-7
