Journal of Molecular Evolution, Volume 78, Issue 2, pp 148–162

Multiple ITS Haplotypes in the Genome of the Lichenized Basidiomycete Cora inversa (Hygrophoraceae): Fact or Artifact?

  • Robert Lücking
  • James D. Lawrey
  • Patrick M. Gillevet
  • Masoumeh Sikaroodi
  • Manuela Dal-Forno
  • Simon A. Berger
Original Article

DOI: 10.1007/s00239-013-9603-y

Cite this article as:
Lücking, R., Lawrey, J.D., Gillevet, P.M. et al. J Mol Evol (2014) 78: 148. doi:10.1007/s00239-013-9603-y

Abstract

The internal transcribed spacer region (ITS) of the nuclear rDNA cistron represents the barcoding locus for Fungi. Intragenomic variation of this multicopy gene can interfere with accurate phylogenetic reconstruction of biological entities. We investigated the amount and nature of this variation for the lichenized fungus Cora inversa in the Hygrophoraceae (Basidiomycota: Agaricales), analyzing base call and length variation in ITS1 454 pyrosequencing data of three samples of the target mycobiont, for a total of 16,665 reads obtained from three separate repeats of the same samples under different conditions. Using multiple fixed alignment methods (PaPaRa) and maximum likelihood phylogenetic analysis (RAxML), we assessed phylogenetic relationships of the obtained reads, together with Sanger ITS sequences from the same samples. Phylogenetic analysis showed that all ITS1 reads belonged to a single species, C. inversa. Pyrosequencing data showed 266 insertion sites in addition to the 325 sites expected from Sanger sequences, for a total of 15,654 insertions (0.94 insertions per read). An additional 3,279 substitutions relative to the Sanger sequences were detected in the dataset, out of 5,461,125 bases to be called. Up to 99.3 % of the observed indels in the dataset could be interpreted as 454 pyrosequencing errors, approximately 65 % corresponding to incorrectly recovered homopolymer segments, and 35 % to carry-forward-incomplete-extension errors. Comparison of automated clustering and alignment-based phylogenetic analysis demonstrated that clustering of these reads produced a 35-fold overestimation of biological diversity in the dataset at the 95 % similarity threshold level, whereas phylogenetic analysis using a maximum likelihood approach accurately recovered a single biological entity. 
We conclude that variation detected in 454 pyrosequencing data must be interpreted with great care and that a combination of a sufficiently large number of reads per taxon, a set of Sanger references for the same taxon, and at least two runs under different emulsion PCR and sequencing conditions is necessary to reliably separate biological variation from 454 sequencing errors. Our study shows that clustering methods are highly sensitive to artifactual sequence variation and inadequate to properly recover biological diversity in a dataset if sequencing errors are substantial and not removed prior to clustering analysis.
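
The per-read and per-base rates quoted in the abstract follow directly from the reported counts. As a quick check, the minimal Python sketch below recomputes them. The variable names are illustrative and not taken from the paper, and treating the total number of alignment sites as the simple sum of Sanger sites and insertion sites is an assumption based on the abstract's wording.

```python
# Counts as reported in the abstract; variable names are illustrative only.
reads = 16_665              # ITS1 454 reads across the three repeats
insertions = 15_654         # insertions relative to the Sanger sequences
insertion_sites = 266       # alignment sites that exist only because of insertions
sanger_sites = 325          # sites expected from the Sanger sequences
substitutions = 3_279       # substitutions relative to the Sanger sequences
bases_called = 5_461_125    # bases to be called in the dataset

# Average number of insertions carried by a single read.
insertions_per_read = insertions / reads

# Fraction of called bases that differ from the Sanger references.
substitution_rate = substitutions / bases_called

# Assumption: total alignment sites = Sanger sites + insertion-only sites.
total_alignment_sites = sanger_sites + insertion_sites

print(f"insertions per read: {insertions_per_read:.2f}")
print(f"substitutions per called base: {substitution_rate:.4%}")
print(f"alignment sites including insertions: {total_alignment_sites}")
```

The insertion figure rounds to the 0.94 per read given in the abstract, while substitutions amount to roughly 0.06 % of called bases.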

Keywords

  • Basidiomycota
  • Dictyonema
  • Environmental sequencing
  • Next-generation sequencing

Supplementary material

239_2013_9603_MOESM1_ESM.txt (5 kb)
Supplementary material 1 (TXT 4 kb)
239_2013_9603_MOESM2_ESM.txt (5 kb)
Supplementary material 2 (TXT 4 kb)
239_2013_9603_MOESM3_ESM.txt (10.4 mb)
Supplementary material 3 (TXT 10672 kb)
239_2013_9603_MOESM4_ESM.xls (499 kb)
Supplementary material 4 (XLS 499 kb)

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Robert Lücking (1)
  • James D. Lawrey (2)
  • Patrick M. Gillevet (2)
  • Masoumeh Sikaroodi (2)
  • Manuela Dal-Forno (2)
  • Simon A. Berger (3)

  1. Science & Education, Integrative Research Center, The Field Museum, Chicago, USA
  2. Department of Environmental Science and Policy, George Mason University, Fairfax, USA
  3. The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany