Journal of Molecular Evolution, Volume 67, Issue 4, pp 343–357

Whole-Genome Duplications in the Ancestral Vertebrate Are Detectable in the Distribution of Gene Family Sizes of Tetrapod Species

  • Timothy Hughes
  • David A. Liberles


Clustering all protein-coding genes from the complete genomes of five tetrapod species into gene families reveals a clear deviation from the expected power-law distribution of gene family size. We hypothesize that at least part of the deviation results from the two whole-genome duplications (WGDs) that are now known, with reasonable certainty, to have occurred prior to the fish-tetrapod split. We build a model of homologous gene family evolution and perform simulations to show that speciations alone cannot produce a distribution that resembles the empirical data. To replicate the features of the empirical distribution, the simulation must incorporate two WGD events in which a significant number of the resulting gene duplicates have a higher retention rate than duplicates generated by small-scale duplication (SSD). This requirement is consistent with what is known about duplicate retention following a WGD, namely, that genes belonging to specific functional classes, such as genes regulating transcription, are much more likely to be retained following WGD than following SSD. We conclude that the deviation from the power law observed in the empirical data is the result of the two WGDs that occurred in the ancestral chordate. This implies that the two ancient WGDs continue to have a structural effect on gene families approximately 500 million years after the initial events. On the one hand, this result is surprising, given the limited retention of duplicates generated by a WGD and the continual SSD, which further weakens the signal created by the fraction of duplicate pairs that are retained. On the other hand, the capacity of WGD to change the architecture of gene families in a profound and lasting way is consistent with the observed correlation between WGDs and important evolutionary transitions.
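The kind of simulation described in the abstract can be illustrated with a toy model. The sketch below is not the authors' model: it assumes a simple per-gene duplication and loss process at each time step, treats a WGD as a single step in which every gene is copied once and each copy is kept with a fixed retention probability, and does not model speciation. All function names, parameter names, and parameter values (`dup_rate`, `loss_rate`, `retention`, `wgd_steps`) are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def apply_wgd(family_sizes, retention, rng):
    """Copy every gene once; keep each new copy with probability `retention`."""
    new_sizes = []
    for size in family_sizes:
        retained = sum(1 for _ in range(size) if rng.random() < retention)
        new_sizes.append(size + retained)
    return new_sizes


def apply_ssd_step(family_sizes, dup_rate, loss_rate, rng):
    """One time step of per-gene small-scale duplication and gene loss."""
    new_sizes = []
    for size in family_sizes:
        gains = sum(1 for _ in range(size) if rng.random() < dup_rate)
        losses = sum(1 for _ in range(size) if rng.random() < loss_rate)
        # Assumption of this toy model: a family never goes fully extinct.
        new_sizes.append(max(1, size + gains - losses))
    return new_sizes


def simulate(n_families, n_steps, wgd_steps, dup_rate, loss_rate, retention, seed=0):
    """Evolve `n_families` single-gene families; WGDs occur at `wgd_steps`."""
    rng = random.Random(seed)
    sizes = [1] * n_families
    for step in range(n_steps):
        if step in wgd_steps:
            sizes = apply_wgd(sizes, retention, rng)
        sizes = apply_ssd_step(sizes, dup_rate, loss_rate, rng)
    return sizes


def size_histogram(sizes):
    """Map family size -> number of families with that size, sorted by size."""
    return dict(sorted(Counter(sizes).items()))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to make the contrast visible.
    common = dict(n_families=5000, n_steps=200, dup_rate=0.01, loss_rate=0.01)
    no_wgd = simulate(wgd_steps=set(), retention=0.0, **common)
    two_wgd = simulate(wgd_steps={20, 40}, retention=0.3, **common)
    print("SSD only:   ", size_histogram(no_wgd))
    print("SSD + 2 WGD:", size_histogram(two_wgd))
```

Plotting the two histograms on log-log axes is one way to see whether the WGD run departs from the roughly straight line expected of a power law. A fuller model would also need retention to vary between functional classes of genes, which this sketch reduces to a single `retention` value.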


Keywords: Gene duplication, Whole-genome duplication, Pseudogenization, Nonsynonymous substitution, Gene family size, Power-law distribution, Speciation



This work was funded by FUGE, the functional genomics platform of the Norwegian Research Council.

Supplementary material

239_2008_9145_MOESM_ESM.pdf (320 kb)



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Computational Biology Unit, BCCS, University of Bergen, Bergen, Norway
  2. Department of Molecular Biology, University of Wyoming, Laramie, USA
