Towards Recovering Allele-Specific Cancer Genome Graphs
Integrated analysis of structural variants (SVs) and copy number alterations (CNAs) in aneuploid cancer genomes is key to understanding tumor genome complexity. A recently developed algorithm, Weaver, can estimate, for the first time, the allele-specific copy number of SVs and their interconnectivity in aneuploid cancer genomes. However, one major limitation is that not all SVs identified by Weaver are phased. In this paper, we develop a general convex programming framework that predicts the interconnectivity of unphased SVs, taking possibly noisy allele-specific copy number estimates as input. We demonstrate through applications to both simulated data and HeLa whole-genome sequencing data that our method is robust to noise in the input copy numbers and can predict SV phasings with high specificity. We find that our method makes predictions consistent with Weaver even when a large proportion of the input variants are unphased. We also apply our method to TCGA ovarian cancer whole-genome sequencing samples to phase unphased SVs obtained by Weaver. Our work provides an important new algorithmic framework for recovering more complete allele-specific cancer genome graphs.
Keywords: Integer Linear Program, Cancer Genome, Tumor Genome, Region Extremity, Somatic Copy Number Alteration
The authors would like to thank anonymous reviewers for suggestions that improved the paper. The authors would also like to thank the TCGA Research Network for making the data publicly available. This work is supported in part by National Institutes of Health Grants CA182360, HG007352, and DK107965 (to J.M.), and National Science Foundation Grants 1054309 and 1262575 (to J.M.).