Journal of Molecular Evolution, Volume 68, Issue 3, pp 197–204

Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum

  • Vadim V. Goremykin
  • Roberto Viola
  • Frank H. Hellwig


It is widely appreciated that noisy, highly variable data can impede phylogeny reconstruction. Researchers have long omitted problematic data, such as third-codon positions and variable regions, from phylogenetic analyses. In analyses of the phylogenetic relationships of the angiosperms, however, inclusion of complete gene sequences in genome-scale alignments has become common practice. Here we demonstrate that this practice can be misleading. We show that support for the basal-most position of Amborella trichopoda among the angiosperms in the chloroplast genomic data rests only on a tiny subset (<1% of the total alignment length) of the most variable positions in the alignment, which exhibit a mean maximum likelihood (ML) distance among the angiosperm operational taxonomic units (OTUs) of approximately 36 substitutions/site. Exclusion of these positions leads to the disappearance of the basal Amborella branch. Likewise, the recently reported sister-group relationship of Ceratophyllum to the eudicots depends on the presence of the 2% most variable positions in the genomic alignment, which exhibit, on average, 20 substitutions/site in comparisons among the angiosperm OTUs. These observations highlight the need to exclude a certain proportion of saturated alignment positions from phylogenomic analyses.
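The idea of excluding the most variable alignment positions can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract characterizes positions by ML distance, whereas the sketch below assumes a much simpler proxy, the mean pairwise mismatch among unambiguous bases in each column. The function names `column_variability` and `drop_most_variable` and the toy alignment are hypothetical and serve only as illustration.

```python
from itertools import combinations


def column_variability(column):
    """Mean pairwise mismatch among unambiguous bases in one column.

    Assumption: this simple proxy stands in for a model-based
    (e.g. ML distance) measure of site variability.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    pairs = list(combinations(bases, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def drop_most_variable(alignment, fraction):
    """Remove the given fraction of columns with the highest score.

    alignment: dict mapping OTU name -> aligned sequence (equal lengths).
    fraction:  share of columns to exclude, e.g. 0.02 for 2%.
    Returns the trimmed alignment and the sorted indices of dropped columns.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("sequences must be aligned to equal length")
    scores = [
        column_variability("".join(alignment[n][i] for n in names))
        for i in range(length)
    ]
    n_drop = int(length * fraction)
    # Rank columns from most to least variable; ties keep original order.
    ranked = sorted(range(length), key=lambda i: -scores[i])
    dropped = set(ranked[:n_drop])
    kept = [i for i in range(length) if i not in dropped]
    trimmed = {n: "".join(alignment[n][i] for i in kept) for n in names}
    return trimmed, sorted(dropped)


# Toy alignment (hypothetical data): columns 2 and 5 are the variable ones.
toy = {
    "otu_a": "AACAAGAA",
    "otu_b": "AAGAAGAA",
    "otu_c": "AATAACAA",
    "otu_d": "AAAAACAA",
}
trimmed, dropped = drop_most_variable(toy, 0.25)
```

In practice one would rerun the tree search on the trimmed alignment at several exclusion fractions and check whether a contested branch survives; the proxy score here is only a stand-in for whatever variability measure the analysis actually uses.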


  • Chloroplast genomes
  • Molecular evolution
  • Angiosperm diversification



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Vadim V. Goremykin (1)
  • Roberto Viola (1)
  • Frank H. Hellwig (2)

  1. IASMA Research Center, San Michele all’Adige, Italy
  2. Institut für Spezielle Botanik, Universität Jena, Jena, Germany
