OVarCall: Bayesian Mutation Calling Method Utilizing Overlapping Paired-End Reads

  • Takuya Moriyama
  • Yuichi Shiraishi
  • Kenichi Chiba
  • Rui Yamaguchi
  • Seiya Imoto
  • Satoru Miyano (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9683)


Detection of somatic mutations from tumor and matched normal sequencing data has become a standard approach in cancer research. Although a number of mutation callers have been developed, it is still difficult to detect mutations with low allele frequency, even in exome sequencing. We expect that overlapping paired-end read information is effective for this purpose, but no mutation caller has statistically modeled overlapping information in a proper form for exome sequence data. Here, we develop a Bayesian hierarchical method, OVarCall, in which overlapping paired-end read information improves the accuracy of low allele frequency mutation detection. First, we construct two generative models: one for reads with somatic variants generated from tumor cells, and the other for reads that do not have somatic variants but potentially include sequencing errors. Second, we calculate the marginal likelihood for each model using a variational Bayesian algorithm and compute the Bayes factor for the detection of somatic mutations. We empirically evaluated the performance of OVarCall and confirmed that it performs better than existing methods.
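As a rough illustration of the model comparison step described above, the sketch below computes a log Bayes factor between a "somatic variant" model and an "error only" model for a single genomic position. It is a deliberately simplified toy and not OVarCall's model: it uses a closed-form beta-binomial marginal likelihood instead of a variational Bayesian approximation, it ignores overlapping paired-end reads and the hierarchical structure, and the prior parameters and function names are hypothetical choices made only for this example.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_likelihood(k, n, alpha, beta):
    """Log marginal likelihood of observing k variant-supporting reads
    among n reads, with the variant allele frequency integrated out
    under a Beta(alpha, beta) prior.

    The binomial coefficient is omitted because it is identical in both
    models and cancels in the Bayes factor.
    """
    return log_beta(alpha + k, beta + n - k) - log_beta(alpha, beta)


def log_bayes_factor(k, n, tumor_prior=(1.0, 1.0), error_prior=(1.0, 100.0)):
    """Log Bayes factor of the toy 'somatic variant' model against the
    toy 'error only' model.

    Both priors are hypothetical: Beta(1, 1) lets the allele frequency
    take any value, while Beta(1, 100) concentrates it near zero, as
    would be expected if mismatches arise only from sequencing errors.
    """
    return (log_marginal_likelihood(k, n, *tumor_prior)
            - log_marginal_likelihood(k, n, *error_prior))


if __name__ == "__main__":
    # Ten variant-supporting reads out of 100 favor the variant model,
    # while zero out of 100 favor the error-only model.
    for k in (0, 10):
        print(k, log_bayes_factor(k, 100))
```

A positive log Bayes factor favors the variant model and a negative one favors the error-only model; in practice a caller would compare the value against a threshold chosen for the desired trade-off between sensitivity and false positives.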


Keywords: Somatic mutation detection; Next-generation sequencing data; Overlapping paired-end reads; Bayesian hierarchical model



The supercomputing resource was provided by the Human Genome Center, the Institute of Medical Science, the University of Tokyo.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Takuya Moriyama (1)
  • Yuichi Shiraishi (1)
  • Kenichi Chiba (1)
  • Rui Yamaguchi (1)
  • Seiya Imoto (2)
  • Satoru Miyano (1, 2), email author
  1. Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
  2. Health Intelligence Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
