Read Depth Analysis to Identify CNV in Bacteria Using CNOGpro

  • Ola Brynildsrud
Part of the Methods in Molecular Biology book series (MIMB, volume 1833)


Whole-genome sequencing with short-read technologies is well suited for calling single nucleotide polymorphisms, but it struggles to detect structural variants larger than the read length. One such type of variation is copy number variation (CNV), which entails the deletion or duplication of genomic regions and the expansion or contraction of repeated elements. Duplicated and deleted regions are typically collapsed during de novo assembly of sequence data, or ignored when reads are mapped to a reference. However, signatures of copy number variation can be detected in the resulting read depth at each position in the genome. Here we provide instructions on how to analyze this read depth signal with the R package CNOGpro, which estimates copy numbers, with uncertainty, for each feature in a genome.
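The idea that copy number leaves a signature in read depth can be illustrated with a minimal Python sketch. It is a simplified illustration only, not the model that CNOGpro implements: it assumes that the genome-wide median depth corresponds to one copy, ignores coverage biases, and gives no uncertainty estimate. The function name `estimate_copy_number` and the toy depth vector are hypothetical.

```python
from statistics import mean, median


def estimate_copy_number(depth, start, end):
    """Naive copy number estimate for the feature spanning depth[start:end].

    Assumption (for illustration only): the genome-wide median depth
    represents a single copy, so the ratio of the feature's mean depth
    to that median approximates the feature's copy number.
    """
    baseline = median(depth)
    if baseline == 0:
        raise ValueError("median depth is zero; no single-copy baseline available")
    ratio = mean(depth[start:end]) / baseline
    # Return the rounded copy number together with the raw depth ratio.
    return round(ratio), ratio


# Toy per-base read depth: 30x background, one duplicated region at 60x,
# and one deleted region with no coverage.
depth = [30] * 500 + [60] * 100 + [30] * 300 + [0] * 100

dup_cn, dup_ratio = estimate_copy_number(depth, 500, 600)
del_cn, del_ratio = estimate_copy_number(depth, 900, 1000)
print(f"duplicated feature: ratio={dup_ratio:.2f}, copy number={dup_cn}")
print(f"deleted feature: ratio={del_ratio:.2f}, copy number={del_cn}")
```

Real read depth is noisy and affected by systematic biases, so a plain ratio like this is only a starting point for intuition.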

Key words

Read depth, CNV, Bacteria, CNOGpro, Coverage, Whole-genome sequencing


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Norwegian Institute of Public Health, Oslo, Norway
