A Bioinformatics Pipeline for Sequence-Based Analyses of Fungal Biodiversity

  • Protocol
Fungal Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 722)

Abstract

The internal transcribed spacer (ITS) is the locus of choice for characterizing fungal diversity in environmental samples. However, methods to analyze ITS datasets have lagged behind the capacity to generate large amounts of sequence information. Here, we describe our bioinformatics pipeline to process large fungal ITS sequence datasets, from raw chromatograms to a spreadsheet of operational taxonomic unit (OTU) abundances across samples. Steps include assembling reads that originate from one clone, identifying primer “barcodes” or “tags,” trimming vectors and primers, marking low-quality base calls and removing low-quality sequences, orienting sequences, extracting the ITS region from longer amplicons, and grouping sequences into OTUs. We expect that the principles and tools presented here will remain relevant to datasets arising from ever-evolving new technologies.
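
As a rough illustration of three of the steps named above (tag identification, marking low-quality base calls, and tabulating OTU abundances across samples), the following Python sketch may help. It is not the chapter's pipeline: the tag sequences, the quality threshold of 20, the 10% masked-base cutoff, and all function names are hypothetical choices made for this example, and OTU membership is assumed to have been determined already by a separate clustering step.

```python
from collections import Counter, defaultdict

# Hypothetical sample tags (barcodes); real tags depend on the primer design.
TAGS = {"ACGT": "sampleA", "TGCA": "sampleB"}


def assign_sample(seq, tags=TAGS):
    """Return (sample, remainder) if seq starts with a known tag, else (None, seq)."""
    for tag, sample in tags.items():
        if seq.startswith(tag):
            return sample, seq[len(tag):]
    return None, seq


def mask_low_quality(seq, quals, min_q=20):
    """Replace base calls whose quality score is below min_q with 'N'."""
    return "".join(b if q >= min_q else "N" for b, q in zip(seq, quals))


def passes_filter(masked, max_n_fraction=0.1):
    """Keep a sequence only if the fraction of masked bases is small enough."""
    return len(masked) > 0 and masked.count("N") / len(masked) <= max_n_fraction


def otu_table(assignments):
    """Tabulate (sample, otu) pairs into {otu: {sample: count}}."""
    table = defaultdict(Counter)
    for sample, otu in assignments:
        table[otu][sample] += 1
    return {otu: dict(counts) for otu, counts in table.items()}
```

In this sketch a read would first pass through `assign_sample`, then `mask_low_quality` and `passes_filter`; the surviving reads, once grouped into OTUs by whatever clustering tool is used, can be summarized with `otu_table` and written out as a spreadsheet.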

Acknowledgments

We thank James Long for writing several of the original pipeline scripts and Dan Cardin for writing the tag-finder script. Niall Lennon and Chad Nusbaum of the Broad Institute, MA, spearheaded high-throughput Sanger sequencing of our fungal clone libraries. Lab members Michael Booth, Robert Burgess, Ian Herriott, Jack McFarland, and Ina Timling have assisted with testing and improving our pipeline and also provided valuable comments on earlier drafts of the chapter. This work was supported in part by the National Science Foundation under grant numbers EF-0333308 and ARC-0632332. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation. This publication was also made possible by grant number 2P20RR016466 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH).

Author information

Corresponding author

Correspondence to D. Lee Taylor.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Taylor, D.L., Houston, S. (2011). A Bioinformatics Pipeline for Sequence-Based Analyses of Fungal Biodiversity. In: Xu, JR., Bluhm, B. (eds) Fungal Genomics. Methods in Molecular Biology, vol 722. Humana Press. https://doi.org/10.1007/978-1-61779-040-9_10

  • DOI: https://doi.org/10.1007/978-1-61779-040-9_10

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-039-3

  • Online ISBN: 978-1-61779-040-9

  • eBook Packages: Springer Protocols
