SomaticSeq: An Ensemble and Machine Learning Method to Detect Somatic Mutations

  • Protocol
Bioinformatics for Cancer Immunotherapy

Part of the book series: Methods in Molecular Biology (MIMB, volume 2120)

Abstract

A standard strategy to discover somatic mutations in a cancer genome is to use next-generation sequencing (NGS) technologies to sequence the tumor tissue and its matched normal (commonly blood or adjacent normal tissue) for side-by-side comparison. However, when interrogating entire genomes (or even just the coding regions), sequencing errors easily outnumber real somatic mutations by orders of magnitude. Here, we describe SomaticSeq, which incorporates multiple somatic mutation detection algorithms and then uses machine learning to vastly improve the accuracy of the somatic mutation call sets.
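
As a rough illustration of the ensemble idea in the abstract, the sketch below merges the call sets of several callers into one candidate table and records which callers reported each variant. It is not SomaticSeq's implementation: the caller names, the `(chrom, pos, ref, alt)` tuple representation, and the `combine_calls` helper are all hypothetical, and real callers emit VCF files rather than Python sets. A machine learning step would then train a classifier on per-candidate features such as these flags; that step is omitted here.

```python
from collections import defaultdict


def combine_calls(calls_by_caller):
    """Union candidate variants and flag which callers reported each one.

    calls_by_caller maps a (hypothetical) caller name to a set of
    (chrom, pos, ref, alt) tuples. Returns the sorted caller names and a
    dict mapping each variant to a 0/1 list aligned with those names.
    """
    callers = sorted(calls_by_caller)
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    table = {
        variant: [1 if caller in support[variant] else 0 for caller in callers]
        for variant in sorted(support)
    }
    return callers, table


# Toy data only; these are not real variant calls.
toy_calls = {
    "caller_a": {("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C")},
    "caller_b": {("chr1", 1000, "A", "T")},
    "caller_c": {("chr1", 1000, "A", "T"), ("chr3", 42, "C", "G")},
}

callers, features = combine_calls(toy_calls)
for variant, flags in features.items():
    # The number of supporting callers is one simple candidate feature.
    print(variant, dict(zip(callers, flags)), "votes:", sum(flags))
```

Taking the union rather than the intersection keeps every candidate that any caller found, so the later filtering step, rather than caller agreement alone, decides which candidates are kept.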



Author information

Corresponding author

Correspondence to Li Tai Fang.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Fang, L.T. (2020). SomaticSeq: An Ensemble and Machine Learning Method to Detect Somatic Mutations. In: Boegel, S. (eds) Bioinformatics for Cancer Immunotherapy. Methods in Molecular Biology, vol 2120. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0327-7_4

  • DOI: https://doi.org/10.1007/978-1-0716-0327-7_4

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0326-0

  • Online ISBN: 978-1-0716-0327-7

  • eBook Packages: Springer Protocols
