Molecular Genetics and Genomics, Volume 278, Issue 4, pp 393–402

Dating the early evolution of plants: detection and molecular clock analyses of orthologs

  • Andreas Zimmer
  • Daniel Lang
  • Sandra Richardt
  • Wolfgang Frank
  • Ralf Reski
  • Stefan A. Rensing
Original Paper


Orthologs are generally under selective pressure against loss of function, whereas paralogs usually accumulate mutations and eventually either die or diverge in function or regulation. Most ortholog detection methods contaminate the resulting datasets with a substantial number of paralogs. We therefore aimed to implement a straightforward method that detects ortholog clusters with a reduced number of paralogs from completely sequenced genomes. The described cross-species expansion of the reciprocal best BLAST hit method is a time-effective approach to ortholog detection: 68% of the resulting clusters are truly orthologous, and the procedure specifically enriches single-copy orthologs. The detection of true orthologs can provide a phylogenetic toolkit for better understanding evolutionary processes. In a study across six photosynthetic eukaryotes, nuclear genes of putative mitochondrial origin were shown to be over-represented among single-copy orthologs. These orthologs are involved in fundamental biological processes such as amino acid metabolism and translation. Molecular clock analyses based on this dataset yielded divergence time estimates for the red/green algae (1,142 MYA), green algae/land plant (725 MYA), moss/seed plant (496 MYA), gymnosperm/angiosperm (385 MYA) and monocotyledon/core eudicotyledon (301 MYA) splits.
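The idea of expanding reciprocal best hits across several species can be illustrated with a minimal sketch. It is not the authors' procedure, which is given as database queries in Supplementary data 1. It assumes the best BLAST hits have already been computed and stored in a hypothetical `best_hit` dictionary keyed by (query species, target species), and it assumes a cluster is accepted only if every pair of its members is a reciprocal best hit. The function names, species labels and gene identifiers are invented for illustration.

```python
from itertools import combinations


def is_reciprocal(best_hit, sp_a, gene_a, sp_b, gene_b):
    """True if gene_a and gene_b are each other's best hit between sp_a and sp_b."""
    forward = best_hit.get((sp_a, sp_b), {}).get(gene_a)
    backward = best_hit.get((sp_b, sp_a), {}).get(gene_b)
    return forward == gene_b and backward == gene_a


def rbh_clusters(best_hit, species):
    """Collect clusters with one gene per species in which all pairs are reciprocal best hits.

    Assumption: the first entry of `species` serves as the seed species.
    """
    seed, others = species[0], species[1:]
    clusters = []
    for gene in sorted(best_hit.get((seed, others[0]), {})):
        cluster = {seed: gene}
        for other in others:
            hit = best_hit.get((seed, other), {}).get(gene)
            if hit is None:
                break  # no hit in this species, so no complete cluster
            cluster[other] = hit
        else:
            # Cross-species check: every species pair must agree, not only seed pairs.
            if all(
                is_reciprocal(best_hit, a, cluster[a], b, cluster[b])
                for a, b in combinations(species, 2)
            ):
                clusters.append(cluster)
    return clusters


# Toy data (hypothetical): a1/b1/c1 agree in all directions, while b2's best hit
# in species C is c3 rather than c2, so the second candidate cluster is rejected.
best_hit = {
    ("A", "B"): {"a1": "b1", "a2": "b2"},
    ("B", "A"): {"b1": "a1", "b2": "a2"},
    ("A", "C"): {"a1": "c1", "a2": "c2"},
    ("C", "A"): {"c1": "a1", "c2": "a2"},
    ("B", "C"): {"b1": "c1", "b2": "c3"},
    ("C", "B"): {"c1": "b1", "c2": "b2"},
}
clusters = rbh_clusters(best_hit, ["A", "B", "C"])
print(clusters)
```

Requiring agreement among all species pairs, rather than only between the seed and each other species, is what makes a cluster likely to hold a single copy per genome. Whether the published method applies exactly this criterion is not stated in the abstract.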


Ortholog detection; Molecular clock; Phylogeny; Plant evolution



We would like to thank Wolfgang R. Hess, Clemens Kreutz and Klaas Vandepoele for helpful discussions. Financial support by the DFG (Re 837/10-1) is gratefully acknowledged.

Supplementary material

438_2007_257_MOESM1_ESM.pdf (24 kb)
Supplementary data 1: The database queries for the cross-species ortholog selection (PDF 24 kb)
438_2007_257_MOESM2_ESM.xls (152 kb)
Supplementary data 2: Names, numbers and annotation details (BLAST, GO, KEGG) of the 93 ortholog clusters as well as the results of the phylogenetic analyses (XLS 151 kb)
438_2007_257_MOESM3_ESM.xls (47 kb)
Supplementary data 3: The accession numbers of the 93 orthologous gene clusters for the 6 species (XLS 47.2 kb)
438_2007_257_MOESM4_ESM.xls (172 kb)
Supplementary data 4: The comparison results of our method and InParanoid for Cyanidioschyzon merolae and Chlamydomonas reinhardtii (XLS 172 kb)
438_2007_257_MOESM5_ESM.xls (2.2 Mb)
Supplementary data 5: The comparison results to MultiParanoid (all 6 species, confidence cut-off: 1.0) (XLS 2.22 Mb)



Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Andreas Zimmer (1)
  • Daniel Lang (1)
  • Sandra Richardt (1)
  • Wolfgang Frank (1)
  • Ralf Reski (1)
  • Stefan A. Rensing (1)
  1. Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
