Raw Sequence Data and Quality Control

  • Protocol
Bacterial Pangenomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1231))

Abstract

Next-generation sequencing technologies are used extensively in many fields of biology. A recurring problem with this kind of data is assessing raw sequence quality and removing (trimming) low-quality segments while retaining enough information for subsequent analyses. Here, we present a series of methods for converting and refining one or more sequence files. One of these methods, dynamic trimming as implemented in the software StreamingTrim, allows fast and accurate trimming of sequence files with a low memory requirement.
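To make the idea of quality trimming concrete, the following is a minimal sketch of a sliding-window trimmer that streams FASTQ records one at a time, so memory use stays independent of file size. It is an illustration only and is not StreamingTrim's algorithm: the function names, the window size of 4, the mean-quality cutoff of 20, and the Sanger Phred+33 encoding are all assumptions chosen for the example.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    The Sanger offset of 33 is an assumption; older Illumina files use 64.
    """
    return [ord(ch) - offset for ch in quality]


def trim_read(sequence, quality, min_mean=20, window=4, offset=33):
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean (hypothetical defaults, not StreamingTrim's)."""
    scores = phred_scores(quality, offset)
    if not scores:
        return "", ""
    cut = len(scores)
    # Reads shorter than the window are evaluated as a single window.
    for start in range(0, max(len(scores) - window + 1, 1)):
        chunk = scores[start:start + window]
        if sum(chunk) / len(chunk) < min_mean:
            cut = start
            break
    return sequence[:cut], quality[:cut]


def trim_fastq(lines, **kwargs):
    """Stream over FASTQ lines four at a time and yield trimmed records.

    Only one record is held in memory at once; reads trimmed to zero
    length are dropped.
    """
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        seq, qual = trim_read(seq, qual, **kwargs)
        if seq:
            yield header.rstrip("\n"), seq, plus, qual
```

Passing an open file handle to `trim_fastq` processes a file of any size record by record. A fixed cutoff like the one used here is the simplest possible rule; a dynamic trimmer would instead adapt the threshold to the quality distribution of the data.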


Author information

Corresponding author

Correspondence to Giovanni Bacci.

Copyright information

© 2015 Springer Science+Business Media New York

About this protocol

Cite this protocol

Bacci, G. (2015). Raw Sequence Data and Quality Control. In: Mengoni, A., Galardini, M., Fondi, M. (eds) Bacterial Pangenomics. Methods in Molecular Biology, vol 1231. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1720-4_9

  • DOI: https://doi.org/10.1007/978-1-4939-1720-4_9

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-1719-8

  • Online ISBN: 978-1-4939-1720-4

  • eBook Packages: Springer Protocols