
Dating the early evolution of plants: detection and molecular clock analyses of orthologs

  • Original Paper
  • Molecular Genetics and Genomics

Abstract

Orthologs are generally under selective pressure against loss of function, whereas paralogs usually accumulate mutations and eventually either die or diverge in function or regulation. Most ortholog detection methods contaminate the resulting datasets with a substantial number of paralogs. We therefore aimed to implement a straightforward method that detects ortholog clusters containing fewer paralogs from completely sequenced genomes. The described cross-species expansion of the reciprocal best BLAST hit method is a time-effective approach to ortholog detection: 68% of the resulting clusters are truly orthologous, and the procedure specifically enriches single-copy orthologs. The detection of true orthologs can provide a phylogenetic toolkit for better understanding evolutionary processes. In a study across six photosynthetic eukaryotes, nuclear genes of putative mitochondrial origin were shown to be over-represented among single-copy orthologs. These orthologs are involved in fundamental biological processes such as amino acid metabolism and translation. Molecular clock analyses based on this dataset yielded divergence time estimates for the red/green algae (1,142 MYA), green algae/land plant (725 MYA), moss/seed plant (496 MYA), gymnosperm/angiosperm (385 MYA) and monocotyledon/core eudicotyledon (301 MYA) splits.
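The idea of extending reciprocal best hits across several species can be illustrated with a minimal Python sketch. It is not the authors' implementation (their database queries are given in Supplementary data 1). It assumes that "cross-species expansion" means keeping only clusters with one gene per species in which every pair of members is a reciprocal best hit. The function names, the score dictionaries and the toy species `A`, `B`, `C` are hypothetical and stand in for parsed BLAST bit scores.

```python
from itertools import combinations


def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict {(query_gene, subject_gene): bit_score} (hypothetical input,
    e.g. parsed from tabular BLAST output).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return {gene_a: gene_b} for pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}


def ortholog_clusters(species, hits):
    """Build clusters with one gene per species, all pairs being RBHs.

    species: list of at least two species names.
    hits: dict {(sp1, sp2): scores} for every ordered pair of species.
    Assumption: a cluster is kept only if every species pair within it
    is a reciprocal best hit.
    """
    rbh = {
        (s1, s2): reciprocal_best_hits(hits[(s1, s2)], hits[(s2, s1)])
        for s1, s2 in combinations(species, 2)
    }
    reference = species[0]
    clusters = []
    for seed in sorted(rbh[(reference, species[1])]):
        cluster = {reference: seed}
        for other in species[1:]:
            partner = rbh[(reference, other)].get(seed)
            if partner is None:
                break
            cluster[other] = partner
        else:
            # Require reciprocity between all members, not only with the seed.
            if all(rbh[(s1, s2)].get(cluster[s1]) == cluster[s2]
                   for s1, s2 in combinations(species, 2)):
                clusters.append(cluster)
    return clusters


# Toy scores: b2's best hit in C is c1, so the a2/b2/c2 cluster is rejected.
hits = {
    ("A", "B"): {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 180},
    ("B", "A"): {("b1", "a1"): 200, ("b2", "a2"): 180},
    ("A", "C"): {("a1", "c1"): 210, ("a2", "c2"): 150},
    ("C", "A"): {("c1", "a1"): 210, ("c2", "a2"): 150},
    ("B", "C"): {("b1", "c1"): 190, ("b2", "c1"): 120, ("b2", "c2"): 90},
    ("C", "B"): {("c1", "b1"): 190, ("c2", "b2"): 90},
}
clusters = ortholog_clusters(["A", "B", "C"], hits)
print(clusters)
```

Requiring reciprocity between every pair of members, rather than only against one reference genome, is what would be expected to discard clusters in which a paralog has replaced the true ortholog in one species, at the cost of losing some genuine ortholog groups.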


Acknowledgments

We would like to thank Wolfgang R. Hess, Clemens Kreutz and Klaas Vandepoele for helpful discussions. Financial support by the DFG (Re 837/10-1) is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Stefan A. Rensing.

Additional information

Communicated by Y. Van de Peer.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary data 1: The database queries for the cross-species ortholog selection (PDF 24 kb)

438_2007_257_MOESM2_ESM.xls

Supplementary data 2: Names, numbers and annotation details (BLAST, GO, KEGG) of the 93 ortholog clusters as well as the results of the phylogenetic analyses (XLS 151 kb)

Supplementary data 3: The accession numbers of the 93 orthologous gene clusters for the 6 species (XLS 47.2 kb)

438_2007_257_MOESM4_ESM.xls

Supplementary data 4: The comparison results of our method and InParanoid for Cyanidioschyzon merolae and Chlamydomonas reinhardtii (XLS 172 kb)

Supplementary data 5: The comparison results to MultiParanoid (all 6 species, confidence cut-off: 1.0) (XLS 2.22 Mb)

About this article

Cite this article

Zimmer, A., Lang, D., Richardt, S. et al. Dating the early evolution of plants: detection and molecular clock analyses of orthologs. Mol Genet Genomics 278, 393–402 (2007). https://doi.org/10.1007/s00438-007-0257-6

  • DOI: https://doi.org/10.1007/s00438-007-0257-6
