Ensemble-Based Somatic Mutation Calling in Cancer Genomes

  • Weitai Huang (corresponding author)
  • Yu Amanda Guo
  • Mei Mei Chang
  • Anders Jacobsen Skanderup
Part of the Methods in Molecular Biology book series (MIMB, volume 2120)


Identification of somatic mutations in tumor tissue is challenged by technical artifacts, diverse somatic mutational processes, and genetic heterogeneity within tumors. Indeed, recent independent benchmark studies have revealed low concordance between different somatic mutation callers. Here, we describe SMuRF (Somatic Mutation calling method using a Random Forest), a portable ensemble method that uses supervised machine learning to combine the predictions and auxiliary features of individual mutation callers. Relative to the individual callers it combines, SMuRF improves prediction accuracy for both somatic point mutations (single nucleotide variants; SNVs) and small insertions/deletions (indels) in cancer genomes and exomes. In this chapter, we describe the method and provide a tutorial on installing and applying SMuRF.
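To make the ensemble idea more concrete, here is a minimal sketch of two preprocessing steps that a caller-combining method needs: measuring how often two callers agree (a Jaccard index over their call sets) and merging per-caller calls into one feature row per candidate variant. This is not SMuRF's code. The dict-based input layout, the use of variant allele fraction (VAF) as the only auxiliary feature, and the names `jaccard_concordance` and `build_feature_matrix` are hypothetical assumptions for illustration. In a real workflow, the resulting matrix and truth labels would be used to train a random forest classifier, which is not shown here.

```python
from typing import Dict, List, Tuple

# A variant key: (chromosome, position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]

# Hypothetical input layout: for each caller, a mapping from variant key to the
# VAF that caller reported. Real inputs would be parsed from VCF files.
CallerCalls = Dict[str, Dict[VariantKey, float]]


def jaccard_concordance(a: Dict[VariantKey, float], b: Dict[VariantKey, float]) -> float:
    """Share of the union of two call sets that both callers reported."""
    union = set(a) | set(b)
    if not union:
        return 1.0  # two empty call sets are trivially concordant
    return len(set(a) & set(b)) / len(union)


def build_feature_matrix(
    calls: CallerCalls,
) -> Tuple[List[VariantKey], List[str], List[List[float]]]:
    """Merge per-caller calls into one feature row per candidate variant.

    Columns are, for every caller in sorted order, a 0/1 flag for whether the
    caller reported the variant, followed by each caller's VAF (0.0 if absent).
    """
    callers = sorted(calls)
    # Candidate set: every variant reported by at least one caller.
    candidates = sorted(set().union(*(set(c) for c in calls.values())))
    columns = [f"{name}_called" for name in callers] + [f"{name}_vaf" for name in callers]
    rows: List[List[float]] = []
    for key in candidates:
        flags = [1.0 if key in calls[name] else 0.0 for name in callers]
        vafs = [calls[name].get(key, 0.0) for name in callers]
        rows.append(flags + vafs)
    return candidates, columns, rows
```

Calls that are absent are encoded as 0.0 rather than as missing values, so every row has the same length. A production pipeline might instead prefer an explicit missing-value encoding for auxiliary features.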

Key words

Somatic mutation calling, Next-generation sequencing

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  • Weitai Huang (1, 2; corresponding author)
  • Yu Amanda Guo (1)
  • Mei Mei Chang (1)
  • Anders Jacobsen Skanderup (1)
  1. Computational and Systems Biology 3, Genome Institute of Singapore, A*STAR (Agency for Science, Technology and Research), Singapore
  2. National University of Singapore Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
