Abstract
Background
The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones.
Results
In this paper, we review the theoretical framework and the assumptions under which the clonal reconstruction problem is formulated. We formally define the problem and then discuss its complexity and solution space. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods by the type of input data they use (space-resolved or time-resolved) and by their computational formulation (combinatorial or probabilistic). It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information from single-cell sequencing or from whole genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that have been developed for solving either the clonal reconstruction problem directly or a related computational problem.
Conclusions
In this review, we discuss the various formulations of the problem of inferring clonal evolutionary history from allele frequency data, review existing algorithms, and categorize them according to their problem formulation and solution approach. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution, whereas clonal reconstruction for unicellular genomes has received less attention. We conclude the review by discussing open problems such as the lack of benchmark datasets and of performance comparisons between available tools.
Abbreviations
- DNA: deoxyribonucleic acid
- WGS: whole genome sequencing
- SNV: single nucleotide variation
- CNV: copy number variation
- SV: structural variation
- LTEE: long term evolution experiment
- VAF: variant allele frequency
- VAFFP: variant allele frequency factorization problem
- ISA: infinite sites assumption
- ILP: integer linear programming
- MILP: mixed integer linear programming
- QIP: quadratic integer programming
- BTP: binary tree partition
- MCMC: Markov Chain Monte Carlo
- BIC: Bayesian information criterion
Acknowledgements
This research was partially supported by a Multidisciplinary University Research Initiative Award W911NF-09-1-0444 from the US Army Research Office, the National Institutes of Health grant 1R01AI108888 and the Indiana University (IU) Precision Health Initiative (PHI). We thank Drs. Megan Behringer and Michael Lynch for very inspiring discussions.
Ethics declarations
The authors Wazim Mohammed Ismail, Etienne Nzabarushimana and Haixu Tang declare that they have no conflicts of interest.
This article is a review article and does not contain any studies with human or animal subjects performed by any of the authors.
Additional information
Author summary: As cells divide, they often gain new mutations, creating newborn cells that are genetically distinct from their parent cells. Each new genetically distinct cell is called a clone. The problem of inferring the number of clones in a given population of cells, the unique set of mutations that identifies each clone, and the ancestral history of the identified clones is known as the clonal reconstruction problem. In this review, we discuss the theoretical framework of this problem, briefly review and classify the existing algorithms based on their approach, and discuss open problems in this area of research.
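To make the problem statement concrete, the following minimal Python sketch runs the forward direction on a toy example: given a clone tree and the proportion of each clone in a sample, it computes the fraction of cells carrying each mutation. Clonal reconstruction is the inverse task of recovering the tree and proportions from such fractions. The clone names, mutation names, proportions and function names are all hypothetical, and the sketch assumes the infinite sites assumption (each mutation arises once and is never lost) and exactly one new mutation per clone. It is an illustration under these assumptions, not the formulation used by any particular tool.

```python
# Hypothetical toy clone tree: parent[c] is the parent of clone c (None marks the root).
parent = {"A": None, "B": "A", "C": "A"}
# Hypothetical labels: the single mutation that first appears in each clone.
new_mutation = {"A": "m1", "B": "m2", "C": "m3"}


def genotype(clone):
    """Mutations carried by a clone: its own plus those inherited from ancestors."""
    muts = set()
    while clone is not None:
        muts.add(new_mutation[clone])
        clone = parent[clone]
    return muts


def mutation_fractions(proportions):
    """Fraction of cells carrying each mutation, given clone proportions in one sample."""
    fractions = {m: 0.0 for m in new_mutation.values()}
    for clone, p in proportions.items():
        for m in genotype(clone):
            fractions[m] += p
    return fractions


def satisfies_sum_rule(fractions, tol=1e-9):
    """True if each clone's mutation is at least as frequent as its children's combined."""
    for c in parent:
        children_total = sum(
            fractions[new_mutation[k]] for k, p in parent.items() if p == c
        )
        if fractions[new_mutation[c]] + tol < children_total:
            return False
    return True


# Two hypothetical samples (for example, two time points or two tissue regions).
samples = {
    "s1": {"A": 0.5, "B": 0.3, "C": 0.2},
    "s2": {"A": 0.1, "B": 0.0, "C": 0.9},
}
for name, props in samples.items():
    f = mutation_fractions(props)
    print(name, f, satisfies_sum_rule(f))
```

The consistency check in `satisfies_sum_rule` is one kind of constraint that observed frequencies place on candidate trees under these assumptions: a tree in which a child mutation is more frequent than its parent mutation in some sample can be ruled out, which is how additional samples shrink the solution space.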
Cite this article
Ismail, W.M., Nzabarushimana, E. & Tang, H. Algorithmic approaches to clonal reconstruction in heterogeneous cell populations. Quant Biol 7, 255–265 (2019). https://doi.org/10.1007/s40484-019-0188-3
DOI: https://doi.org/10.1007/s40484-019-0188-3