Journal of Molecular Evolution, Volume 68, Issue 3, pp 197-204

Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum

  • Vadim V. Goremykin (IASMA Research Center), corresponding author
  • Roberto Viola (IASMA Research Center)
  • Frank H. Hellwig (Institut für Spezielle Botanik, Universität Jena)


It is widely appreciated that noisy, highly variable data can impede phylogeny reconstruction. Researchers have long omitted problematic data, such as third-codon positions and variable regions, from phylogenetic analyses. In analyses of the phylogenetic relationships of the angiosperms, however, inclusion of complete gene sequences in genome-scale alignments has become common practice. Here we demonstrate that this practice can be misleading. We show that support for the basal-most position of Amborella trichopoda among the angiosperms in the chloroplast genomic data rests on only a tiny subset (< 1% of the total alignment length) of the most variable positions in the alignment, which exhibit a mean maximum likelihood (ML) distance among the angiosperm operational taxonomic units (OTUs) of approximately 36 substitutions/site. Excluding these positions causes the basal Amborella branch to disappear. Likewise, the recently reported sister-group relationship of Ceratophyllum to the eudicots depends on the 2% of most variable positions in the genomic alignment, which exhibit, on average, 20 substitutions/site in comparisons among the angiosperm OTUs. These observations highlight the need to exclude a certain proportion of saturated positions in the alignment from phylogenomic analyses.
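The general idea of ranking alignment positions by variability and excluding the most variable fraction can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract characterizes positions by ML distance, whereas the sketch below substitutes per-column Shannon entropy as a simple stand-in for variability. The function names `column_entropy` and `drop_most_variable`, and the dict-of-strings alignment format, are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the unambiguous nucleotides in one column.

    Assumption: entropy is used here only as a crude proxy for site
    variability; gaps and ambiguity codes are ignored.
    """
    counts = Counter(c for c in column.upper() if c in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())


def drop_most_variable(alignment, fraction):
    """Remove the given fraction of columns, most variable first.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    fraction:  share of columns to remove (e.g. 0.01 for 1%).
    Returns (filtered alignment, sorted indices of the removed columns).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    scores = [column_entropy("".join(s[i] for s in seqs)) for i in range(length)]
    n_drop = int(length * fraction)

    # Rank columns by descending score; ties are broken by column index
    # so that the result is deterministic.
    ranked = sorted(range(length), key=lambda i: (-scores[i], i))
    removed = set(ranked[:n_drop])

    filtered = {
        name: "".join(ch for i, ch in enumerate(seq) if i not in removed)
        for name, seq in alignment.items()
    }
    return filtered, sorted(removed)
```

In a workflow of this kind, the tree would be re-estimated at several removal fractions to see whether a contested branch persists once the most variable positions are gone.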


Keywords: Chloroplast genomes · Molecular evolution · Angiosperm diversification