Abstract
Intratumor heterogeneity provides the fuel for the evolution and selection of subclonal tumor cell populations. However, accurate inference of tumor subclonal architecture and reconstruction of tumor evolutionary histories from bulk DNA sequencing data remain challenging. Frequently, sequencing and alignment artifacts are not fully filtered out from cancer somatic mutations, and errors in the identification of copy number alterations or complex evolutionary events (e.g., mutation losses) affect the estimated cellular prevalence of mutations. Together, such errors propagate into the analysis of mutation clustering and phylogenetic reconstruction. In this Protocol, we present a new computational framework, CONIPHER (COrrecting Noise In PHylogenetic Evaluation and Reconstruction), that accurately infers subclonal structure and phylogenetic relationships from multisample tumor sequencing, accounting for both copy number alterations and mutation errors. CONIPHER has been used to reconstruct subclonal architecture and tumor phylogeny from multisample tumors with high-depth whole-exome sequencing from the TRACERx421 dataset, as well as matched primary-metastatic cases. CONIPHER outperforms similar methods on simulated datasets, and in particular scales to a large number of tumor samples and clones, while completing in under 1.5 h on average. CONIPHER enables automated phylogenetic analysis that can be effectively applied to large sequencing datasets generated with different technologies. CONIPHER can be run with a basic knowledge of bioinformatics and the R and bash scripting languages.
Key points
- CONIPHER is a computational framework for accurately inferring subclonal structure and phylogenetic relationships from multisample tumor sequencing, accounting for both copy number alterations and mutation errors.
- Benchmarking analyses on simulations show that CONIPHER outperforms similar methods, and in particular scales to a large number of tumor samples and clones. This enables automated phylogenetic analysis that can be effectively applied to large sequencing datasets generated with different technologies.
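The abstract notes that errors in copy number calls affect the estimated cellular prevalence of mutations. The sketch below illustrates why, using the commonly used simplified relationship between variant allele frequency (VAF), tumor purity and local copy number. It is not CONIPHER's own implementation; the function name `cancer_cell_fraction` and its parameters are hypothetical and chosen only for illustration, and the model assumes a diploid normal genome and a known mutation multiplicity.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of cancer cells carrying a mutation.

    Simplified model (an assumption for this sketch):
        VAF = purity * multiplicity * CCF
              / (purity * tumor_cn + (1 - purity) * normal_cn)
    solved here for CCF.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return vaf * avg_copies / (purity * multiplicity)


# Same observed VAF and purity, two different copy number calls.
ccf_correct_cn = cancer_cell_fraction(vaf=0.25, purity=0.5, tumor_cn=2)
ccf_wrong_cn = cancer_cell_fraction(vaf=0.25, purity=0.5, tumor_cn=1)

print(ccf_correct_cn)  # 1.0: the mutation looks clonal
print(ccf_wrong_cn)    # 0.75: the same mutation now looks subclonal
```

In this toy case a single miscalled copy number state is enough to move a clonal mutation into the subclonal range, which is the kind of error that then propagates into mutation clustering and tree building.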
Data availability
The WES data (from the TRACERx study) used during this study have been deposited in the European Genome–phenome Archive, which is hosted by The European Bioinformatics Institute and the Centre for Genomic Regulation under the accession code EGAS00001006494; access is controlled by the TRACERx data access committee. Details on how to apply for access are available on the linked page. The three simulated datasets used in the benchmarking analyses are available at https://zenodo.org/doi/10.5281/zenodo.10048164.
Code availability
The code to run the CONIPHER clustering and tree-building wrapper functions, with documentation and run examples, is available on GitHub at https://github.com/McGranahanLab/CONIPHER-wrapper. The source code for the CONIPHER R package is available at https://github.com/McGranahanLab/CONIPHER. The simulation framework is available at https://github.com/zaccaria-lab/TRACERx_simulation_tool. The code in this protocol has been peer reviewed.
Acknowledgements
The TRACERx study (Clinicaltrials.gov no. NCT01888601) is sponsored by University College London (UCL/12/0279) and has been approved by an independent Research Ethics Committee (13/LO/1546). TRACERx is funded by Cancer Research UK (CRUK) (C11496/A17786) and coordinated through the CRUK and UCL Cancer Trials Centre, which has a core grant from CRUK (C444/A15953). We gratefully acknowledge the patients and relatives who participated in the TRACERx study. We thank all site personnel, investigators, funders and industry partners that supported the generation of the data within this study. This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (CC2041), the UK Medical Research Council (CC2041) and the Wellcome Trust (CC2041). This work was also supported by the Cancer Research UK Lung Cancer Centre of Excellence, the CRUK City of London Centre Award (C7893/A26233) and the UCL Experimental Cancer Research Centre. C.S. is a Royal Society Napier Research Professor (RSRP\R\210001); is supported by the Francis Crick Institute that receives its core funding from Cancer Research UK (CC2041), the UK Medical Research Council (CC2041) and the Wellcome Trust (CC2041). For the purpose of open access, the author has applied a CC BY public copyright licence to any author accepted manuscript version arising from this submission. C.S. is funded by Cancer Research UK (TRACERx (C11496/A17786), PEACE (C416/A21999) and CRUK Cancer Immunotherapy Catalyst Network); Cancer Research UK Lung Cancer Centre of Excellence (C11496/A30025); the Rosetrees Trust, Butterfield and Stoneygate Trusts; NovoNordisk Foundation (ID16584); Royal Society Professorship Enhancement Award (RP/EA/180007); National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre; the Cancer Research UK–University College London Centre; the Experimental Cancer Medicine Centre; the Breast Cancer Research Foundation (US); the Mark Foundation for Cancer Research Aspire Award (21-029-ASP); and is in receipt of an ERC Advanced Grant (PROTEUS) from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (835297). This work was supported by a Stand Up To Cancer-LUNGevity-American Lung Association Lung Cancer Interception Dream Team Translational Research Grant (SU2C-AACR-DT23-17 to S. M. Dubinett and A. E. Spira). Stand Up To Cancer is a division of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the Scientific Partner of SU2C. S.Z. is a Cancer Research UK Career Development Fellow (RCCCDF-Nov21\100005), is supported by Rosetrees Trust (M917) and is also supported by a Cancer Research UK UCL Centre Non-Clinical Training Award (CANTAC721\100022). N.M. is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society (211179/Z/18/Z) and also receives funding from Cancer Research UK, Rosetrees and the NIHR BRC at University College London Hospitals and the CRUK University College London Experimental Cancer Medicine Centre.
Author information
Contributions
K.G., A.H., E.C., A.M.F., K.T., N.J.B. and N.M. helped to develop the protocol and wrote the manuscript. A.B. and S.Z. created the simulations, performed the benchmarking and wrote the manuscript. M.S.H. helped with bioinformatics pipeline development. C.S., S.Z. and N.M. jointly designed and supervised the study and helped to write the manuscript.
Ethics declarations
Competing interests
A.M.F. is co-inventor to a patent application to determine methods and systems for tumor monitoring (PCT/EP2022/077987). N.J.B. is a co-inventor to a patent to identify responders to cancer treatment (PCT/GB2018/051912) and a co-inventor on a patent for methods for predicting anti-cancer response (US14/466,208). C.S. acknowledges grant support from AstraZeneca, Boehringer-Ingelheim, Bristol Myers Squibb, Pfizer, Roche-Ventana, Invitae (previously Archer Dx Inc; collaboration in minimal residual disease sequencing technologies) and Ono Pharmaceutical, and Personalis. He is an AstraZeneca Advisory Board member and Chief Investigator for the AZ MeRmaiD 1 and 2 clinical trials and is also Co-Chief Investigator of the NHS Galleri trial funded by GRAIL and a paid member of GRAIL’s Scientific Advisory Board. He receives consultant fees from Achilles Therapeutics (also a Scientific Advisory Board member), Bicycle Therapeutics (also a Scientific Advisory Board member), Genentech, Medicxi, China Innovation Centre of Roche (CICoR) formerly Roche Innovation Centre–Shanghai, Metabomed (until July 2022) and the Sarah Cannon Research Institute. He has received honoraria from Amgen, AstraZeneca, Pfizer, Novartis, GlaxoSmithKline, MSD, Bristol Myers Squibb, Illumina, and Roche-Ventana; had stock options in Apogen Biotechnologies and GRAIL until June 2021, and currently has stock options in Epic Bioscience, Bicycle Therapeutics, and has stock options and is co-founder of Achilles Therapeutics. C.S. is listed as an inventor on a European patent application relating to assay technology to detect tumor recurrence (PCT/GB2017/053289), which has been licensed to commercial entities and, under his terms of employment, C.S. is due a revenue share of any revenue generated from such license(s). C.S. holds patents relating to targeting neoantigens (PCT/EP2016/059401), to identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), to determining human leukocyte antigen loss of heterozygosity (HLA LOH) (PCT/GB2018/052004), to predicting survival rates of patients with cancer (PCT/GB2020/050221), to identifying patients who respond to cancer treatment (PCT/GB2018/051912), to detecting tumor mutations (US patent: PCT/US2017/28013) and to methods for lung cancer detection (US20190106751A1). He also holds both a European and US patent related to identifying indel targets (PCT/GB2018/051892) and is co-inventor to a patent application to determine methods and systems for tumor monitoring (PCT/EP2022/077987) and is a named inventor on a provisional patent protection related to a ctDNA detection algorithm. N.M. has received consultancy fees and has stock options in Achilles Therapeutics. N.M. holds European patents relating to targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA LOH (PCT/GB2018/052004) and predicting survival rates of patients with cancer (PCT/GB2020/050221).
Peer review
Peer review information
Nature Protocols thanks Alexander Anderson, Tim Coorens, Andrew Roth and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Frankell, A. M. et al. Nature 616, 525–533 (2023): https://doi.org/10.1038/s41586-023-05783-5
Al Bakir, M. et al. Nature 616, 534–542 (2023): https://doi.org/10.1038/s41586-023-05729-x
Martínez-Ruiz, C. et al. Nature 616, 543–552 (2023): https://doi.org/10.1038/s41586-023-05706-4
Abbosh, C. et al. Nature 616, 553–562 (2023): https://doi.org/10.1038/s41586-023-05776-4
Karasaki, T. et al. Nat. Med. 29, 833–845 (2023): https://doi.org/10.1038/s41591-023-02230-w
Supplementary information
Supplementary Methods 1–5, Notes 1–4, Figs. 1–4 and Table 1
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Grigoriadis, K., Huebner, A., Bunkum, A. et al. CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nat Protoc 19, 159–183 (2024). https://doi.org/10.1038/s41596-023-00913-9
DOI: https://doi.org/10.1038/s41596-023-00913-9
- Springer Nature Limited