From FASTQ to Function: In Silico Methods for Processing Next-Generation Sequencing Data

  • Mark D. Preston
  • Richard A. Stabler
Part of the Methods in Molecular Biology book series (MIMB, volume 1476)


This chapter presents a method for processing Clostridium difficile whole-genome next-generation sequencing data straight from the sequencer. Quality control and de novo assembly of these data enable downstream analyses such as gene annotation and in silico multi-locus sequence typing (MLST) for strain-type identification.
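
As a rough illustration of what the quality control step involves, the sketch below reads four-line FASTQ records and trims low-quality bases from the 3' end of each read. It is not the chapter's method, which relies on dedicated tools. The function names `parse_fastq` and `trim_trailing`, the quality threshold of 20, and the Phred+33 encoding are all assumptions made for this example.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records.

    Assumes each record occupies exactly four lines with no blank lines
    in between; an incomplete trailing record is silently dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same iterator four times groups consecutive lines
    # into (header, sequence, separator, quality) records.
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield header, seq, qual


def trim_trailing(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q.

    `offset` is the ASCII offset of the quality encoding (33 is assumed here).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

Applied to a read such as `ACGTACGT` with quality string `IIIIII##`, `trim_trailing` keeps the first six bases, because `#` encodes a Phred score of 2 under the assumed offset. Production trimmers also handle adapters, sliding windows, and paired reads, none of which this sketch attempts.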

Key words

Read trimming, De novo assembly, Gene annotation, MLST



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. National Institute for Biological Standards and Control, South Mimms, UK
  2. Faculty of Infectious & Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
