Abstract
Gene duplication is one of the most important mechanisms for the origin of evolutionary novelties. Although various models of the fate of duplicated genes have been established, current knowledge about the role of divergent selection after gene duplication is limited. In this study, we analyzed sequence divergence in response to neo- and subfunctionalization of segmentally duplicated genes in the genome of Arabidopsis thaliana. We compared the genomes of A. thaliana and the poplar Populus trichocarpa to identify orthologous pairs of genes and their corresponding inparalogs. Maximum-likelihood analyses of the ratio of nonsynonymous to synonymous substitution rates \( (\omega = d_{\mathrm{N}}/d_{\mathrm{S}}) \) in pairs of A. thaliana inparalogs were used to detect differences in the evolutionary rates of protein-coding sequences. We analyzed 1,924 A. thaliana paralogous pairs; our results indicate that around 6.9% of them show divergent ω values between the lineages at a fraction of sites. We observe an enrichment of regulatory sequences, a reduced level of co-expression, and an increased number of substitutions that can be attributed to positive selection based on a McDonald–Kreitman type of analysis. Taken together, these results show that selection after duplication contributes substantially to gene novelties and hence to functional divergence in plants.
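To illustrate the kind of comparison behind a McDonald–Kreitman type of analysis, the following is a minimal Python sketch, not the authors' code. It assumes the textbook form of the test, in which counts of nonsynonymous and synonymous fixed differences (Dn, Ds) are compared with counts of nonsynonymous and synonymous polymorphisms (Pn, Ps). The class name `MKCounts`, the function names, and the example counts are hypothetical and chosen only for illustration; the estimator actually used in the study may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MKCounts:
    """Hypothetical site counts for one gene (or pooled over several genes)."""
    dn: int  # nonsynonymous fixed differences (divergence)
    ds: int  # synonymous fixed differences (divergence)
    pn: int  # nonsynonymous polymorphisms
    ps: int  # synonymous polymorphisms


def neutrality_index(c: MKCounts) -> float:
    """NI = (Pn/Ps) / (Dn/Ds).

    Values below 1 indicate an excess of nonsynonymous divergence
    relative to polymorphism. Raises ZeroDivisionError if ps or dn is 0.
    """
    return (c.pn * c.ds) / (c.ps * c.dn)


def alpha(c: MKCounts) -> float:
    """Simple estimate of the proportion of nonsynonymous substitutions
    attributable to positive selection: alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    return 1.0 - neutrality_index(c)


# Made-up counts, for illustration only (not data from the study).
example = MKCounts(dn=20, ds=40, pn=10, ps=40)
print(neutrality_index(example))  # 0.5
print(alpha(example))             # 0.5
```

In this simple form, alpha is undefined when Ps or Dn is zero, and it can be biased downward by slightly deleterious polymorphisms, so it is best read as a rough indicator rather than a precise estimate.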
Acknowledgments
The authors are grateful to the bioinformatics group of the IPK Gatersleben and to Christian Kauhaus at the University of Jena for access to their computer clusters. Matthias Höffken provided useful hints on Python scripting and contributed Python code for statistical analyses. The authors thank Adam Eyre-Walker for discussions on the MK analysis of inparalogs, and D. Tian and two anonymous referees for their valuable comments on the manuscript. The study was supported by an undergraduate scholarship from the Studienstiftung des deutschen Volkes to TG and by IPK core funding to KS.
Cite this article
Gossmann, T.I., Schmid, K.J. Selection-Driven Divergence After Gene Duplication in Arabidopsis thaliana. J Mol Evol 73, 153–165 (2011). https://doi.org/10.1007/s00239-011-9463-2
DOI: https://doi.org/10.1007/s00239-011-9463-2