NGS-QC Generator: A Quality Control System for ChIP-Seq and Related Deep Sequencing-Generated Datasets

  • Marco Antonio Mendoza-Parra
  • Mohamed-Ashick M. Saleem
  • Matthias Blum
  • Pierre-Etienne Cholley
  • Hinrich Gronemeyer
Part of the Methods in Molecular Biology book series (MIMB, volume 1418)


The combination of massive parallel sequencing with a variety of modern DNA/RNA enrichment technologies provides a means of interrogating functional protein–genome interactions (ChIP-seq), genome-wide transcriptional activity (RNA-seq; GRO-seq), chromatin accessibility (DNase-seq, FAIRE-seq, MNase-seq), and, more recently, the three-dimensional organization of chromatin (Hi-C, ChIA-PET). In systems biology-based approaches, several of these readouts are generally combined with the aim of describing living systems through a reconstitution of genome-regulatory functions. However, an often underestimated issue is that conclusions drawn from such multidimensional analyses of NGS-derived datasets critically depend on the quality of the compared datasets. To address this problem, we have developed the NGS-QC Generator, a quality control system that infers quality descriptors for any kind of ChIP-sequencing and related datasets. In this chapter we provide a detailed protocol for (1) assessing quality descriptors with the NGS-QC Generator; (2) interpreting the generated reports; and (3) exploring the database of QC indicators for >21,000 publicly available datasets.

Key words

Next-generation sequencing; Massive parallel sequencing; Quality control; ChIP-sequencing; Galaxy; Database



This work was supported by funds from SATT/Conectus, the Fondation pour la Recherche Médicale (FRM), the Alliance Nationale pour les Sciences de la Vie et de la Santé–Institut Thématique Multi-organismes Cancer–Institut National du Cancer (INCa) grants “Epigenomics of breast cancer” and “EpiPCa,” and the Ligue Nationale Contre le Cancer (to H.G.; Equipe Labellisée).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Marco Antonio Mendoza-Parra (1), corresponding author
  • Mohamed-Ashick M. Saleem (1)
  • Matthias Blum (1)
  • Pierre-Etienne Cholley (1)
  • Hinrich Gronemeyer (1)
  1. Equipe Labellisée Ligue Contre le Cancer, Department of Functional Genomics and Cancer, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC)/CNRS/INSERM/Université de Strasbourg, Illkirch Cedex, France
