Accurate and Flexible Bayesian Mutation Call from Multi-regional Tumor Samples
We propose MultiMuC, a Bayesian method for accurate detection of somatic mutations (mutation calling) from multi-regional tumor sequence data sets. To improve detection performance, our method relies on the assumption of mutation sharing: if a mutation is predicted in at least one tumor region, we can detect the same mutation in other tumor regions with more confidence by lowering the original detection threshold. We identify two drawbacks in existing methods that leverage this assumption. First, they do not consider the probability of the “No-TP (True Positive)” case, in which mutation candidates are observed in multiple regions but no true mutation actually exists. Second, they cannot leverage scores from state-of-the-art mutation calling methods designed for a single-regional tumor. We overcome the first drawback by evaluating the probability of the No-TP case. We address the second through Bayes-factor-based model construction, which enables flexible integration of probability-based mutation call scores as building blocks of a Bayesian statistical model. Through a real-data-based simulation study, we empirically show that our method steadily improves results from mutation calling methods for a single-regional tumor, e.g., Strelka2 and NeuSomatic, and outperforms existing methods for multi-regional tumors. Our implementation of MultiMuC is available at https://github.com/takumorizo/MultiMuC.
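To make the mutation-sharing idea and the No-TP check concrete, the following is a minimal sketch and not the MultiMuC model itself. It assumes that a single-region caller supplies one posterior probability per tumor region, that regions are independent (so the No-TP probability is a simple product), and that the thresholds `strict`, `relaxed`, and `max_no_tp` are hypothetical values chosen only for illustration. The helper `bayes_factor` shows the standard conversion from a posterior probability and a prior to a Bayes factor, which is one way a probability-based score could enter a Bayesian model.

```python
import math


def bayes_factor(posterior, prior):
    """Convert a posterior probability into a Bayes factor.

    Uses the identity: posterior odds = Bayes factor * prior odds.
    """
    posterior_odds = posterior / (1.0 - posterior)
    prior_odds = prior / (1.0 - prior)
    return posterior_odds / prior_odds


def no_tp_probability(posteriors):
    """Probability that no region carries a true mutation.

    Assumption for this sketch: regions are independent, so the
    probability is the product of the per-region (1 - posterior) terms.
    """
    return math.prod(1.0 - p for p in posteriors)


def call_with_sharing(posteriors, strict=0.9, relaxed=0.5, max_no_tp=0.05):
    """Call mutations per region, relaxing the threshold under sharing.

    The relaxed threshold is used only when the No-TP probability is
    small, i.e. when at least one true mutation is likely to exist.
    All three threshold values are hypothetical.
    """
    sharing_supported = no_tp_probability(posteriors) <= max_no_tp
    threshold = relaxed if sharing_supported else strict
    return [p >= threshold for p in posteriors]


# One region is called confidently, so the weaker second region is rescued.
confident = call_with_sharing([0.99, 0.6, 0.2])

# No region is convincing, so the No-TP case stays plausible and the
# strict threshold is kept.
doubtful = call_with_sharing([0.6, 0.6, 0.2])
```

In the first example the No-TP probability is 0.01 × 0.4 × 0.8 = 0.0032, which is below the assumed cutoff, so the relaxed threshold applies. In the second it is 0.4 × 0.4 × 0.8 = 0.128, so the candidates in multiple regions are not taken as evidence of sharing.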
We used the supercomputers at the Human Genome Center, the Institute of Medical Science, the University of Tokyo. This work was supported by the Grant-in-Aid for JSPS Research Fellow (17J08884) and MEXT/JSPS KAKENHI Grant (15H05912, hp180198, hp170227, 18H03329, hp190158).
- 7. Moriyama, T., et al.: A Bayesian model integration for mutation calling through data partitioning. Bioinformatics, btz233 (2019). https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz233/5423180
- 15. Detering, H., et al.: Accuracy of somatic variant detection in multiregional tumor sequencing data. bioRxiv 655605 (2019)
- 17. Neal, R.M.: Probabilistic inference using Markov Chain Monte Carlo methods. Technical report, Department of Computer Science, University of Toronto (1993)