Tumor Copy Number Deconvolution Integrating Bulk and Single-Cell Sequencing Data
Characterizing intratumor heterogeneity (ITH) is crucial to understanding cancer development, but it is hampered by the limits of available data sources. Bulk DNA sequencing is the most common technology for assessing ITH, but each bulk sample mixes many genetically distinct cells, whose contributions must then be computationally deconvolved. Single-cell sequencing (SCS) is a promising alternative, but its limitations (e.g., high noise, difficulty scaling to large populations, technical artifacts, and large data sets) have so far made it impractical for studying cohorts of sufficient size to identify statistically robust features of tumor evolution. We have developed strategies for deconvolution and tumor phylogenetics that combine limited amounts of bulk and single-cell data to gain some of the advantages of single-cell resolution at much lower cost, with a specific focus on deconvolving genomic copy number data. We first developed a mixed membership model for clonal deconvolution via non-negative matrix factorization (NMF), which balances deconvolution quality against similarity to single-cell samples and is solved with an associated efficient coordinate descent algorithm. We then improved on that algorithm by integrating deconvolution with clonal phylogeny inference, using a mixed integer linear programming (MILP) model to incorporate a minimum evolution phylogenetic tree cost into the problem objective. We demonstrate the effectiveness of these methods on semi-simulated data with known ground truth, showing improved deconvolution accuracy relative to using bulk data alone.
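To make the NMF idea concrete, the following is a minimal sketch and not the method of the paper. It assumes the bulk copy number matrix (loci by samples) is modeled as a product of clone profiles (loci by clones) and mixture fractions (clones by samples), with a quadratic penalty that pulls the clone profiles toward observed single-cell profiles. The objective, the weight `alpha`, the function names, and the toy data are all illustrative assumptions. The sketch substitutes multiplicative updates in the style of Lee and Seung for the coordinate descent algorithm mentioned above, and it does not enforce that the fractions for each sample sum to one.

```python
import numpy as np


def objective(bulk, clones, fractions, sc_ref, alpha):
    """Squared reconstruction error plus a penalty for drifting from single-cell profiles."""
    fit = np.sum((bulk - clones @ fractions) ** 2)
    penalty = alpha * np.sum((clones - sc_ref) ** 2)
    return fit + penalty


def deconvolve(bulk, sc_ref, alpha=0.5, n_iter=300, eps=1e-9, seed=0):
    """Approximate bulk (loci x samples) as clones (loci x k) @ fractions (k x samples).

    sc_ref (loci x k) holds one reference single-cell profile per clone (an assumption
    of this sketch); alpha weights the penalty alpha * ||clones - sc_ref||_F^2.
    """
    rng = np.random.default_rng(seed)
    bulk = np.asarray(bulk, dtype=float)
    sc_ref = np.asarray(sc_ref, dtype=float)
    n_clones = sc_ref.shape[1]
    clones = sc_ref + eps  # start from the single-cell profiles
    fractions = rng.random((n_clones, bulk.shape[1])) + eps
    history = [objective(bulk, clones, fractions, sc_ref, alpha)]
    for _ in range(n_iter):
        # Multiplicative update for the mixture fractions (keeps entries non-negative).
        fractions *= (clones.T @ bulk) / (clones.T @ clones @ fractions + eps)
        # Multiplicative update for the clone profiles, including the penalty term.
        clones *= (bulk @ fractions.T + alpha * sc_ref) / (
            clones @ (fractions @ fractions.T) + alpha * clones + eps
        )
        history.append(objective(bulk, clones, fractions, sc_ref, alpha))
    return clones, fractions, history


# Toy data (hypothetical): two clones, six loci, three bulk samples.
true_clones = np.array(
    [[2, 3], [2, 1], [4, 2], [1, 2], [2, 2], [3, 5]], dtype=float
)
true_fractions = np.array([[0.7, 0.4, 0.2], [0.3, 0.6, 0.8]])
bulk = true_clones @ true_fractions

sc_ref = true_clones.copy()
sc_ref[0, 0] += 1.0  # simulate a single-cell measurement error at one locus

clones, fractions, history = deconvolve(bulk, sc_ref)
print("objective before and after:", history[0], history[-1])
print("estimated fractions:\n", fractions)
```

Larger values of `alpha` keep the inferred clones closer to the single-cell profiles, while smaller values favor fitting the bulk data. Choosing that trade-off is the balance the abstract describes, though the actual model may weight and constrain it differently.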
Keywords: Cancer, Heterogeneity, Genomic deconvolution, Copy number alteration (CNA), Non-negative matrix factorization (NMF)
This research was supported in part by the Intramural Research Program of the National Institutes of Health, National Library of Medicine, and both the Center for Cancer Research and the Division of Cancer Epidemiology and Genetics within the National Cancer Institute. This research was supported in part by the Exploration Program of the Shenzhen Science and Technology Innovation Committee [JCYJ20170303151334808]. Portions of this work have been funded by U.S. N.I.H. award R21CA216452 and Pennsylvania Dept. of Health award 4100070287. The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations or conclusions.