Evaluation of Whole Genome Sequencing Data

  • Protocol
Lymphoma

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1956))

Abstract

Whole genome sequencing (WGS) can provide comprehensive insights into the genetic makeup of lymphomas. Here we describe a selection of methods for the analysis of WGS data, including read alignment and the identification of different classes of genomic variants, driver mutations, and mutational signatures. We further outline design considerations for WGS studies and present a variety of quality control measures for detecting common quality problems in the data.

Acknowledgments

This work has been supported by the German Ministry of Science and Education (BMBF) in the framework of the ICGC MMML-Seq (01KU1002A-J) and the ICGC DE-MINING (01KU1505E) projects and the Heidelberg Center for Human Bioinformatics (HD-HuB) within the German Network for Bioinformatics Infrastructure (de.NBI) (#031A537A, #031A537C). We are grateful to all present and previous members of the Division of Theoretical Bioinformatics, the DKFZ-HIPO bioinformatics team, the Omics IT and Data Management Core Facility, and the Bioinformatics and Omics Data Analytics group of the German Cancer Research Center (DKFZ, Heidelberg) as well as coworkers in the ICGC MMML-seq and PedBrain projects who were involved in the establishment of the procedures described here.

Author information

Corresponding author

Correspondence to Matthias Schlesner.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Hübschmann, D., Schlesner, M. (2019). Evaluation of Whole Genome Sequencing Data. In: Küppers, R. (eds) Lymphoma. Methods in Molecular Biology, vol 1956. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9151-8_15

  • DOI: https://doi.org/10.1007/978-1-4939-9151-8_15

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-9150-1

  • Online ISBN: 978-1-4939-9151-8

  • eBook Packages: Springer Protocols
