PyClone: statistical inference of clonal population structure in cancer

  • Brief Communication

From Nature Methods

Abstract

We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
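
To make the role of copy number and normal-cell contamination concrete, the sketch below computes the expected variant allele fraction of a mutation as a read-weighted mix of three cell populations: normal cells, tumour cells without the mutation ("reference") and tumour cells with it ("variant"), the last present at cellular prevalence phi. It is a minimal sketch under loudly stated assumptions: each population is given one fixed genotype (total copy number and fraction of copies carrying the variant), whereas the published model is more elaborate, for example in how it handles uncertainty in genotypes. The single-mutation grid posterior with a flat prior is a hypothetical stand-in for PyClone's MCMC-based clustering, and all function names here are illustrative only.

```python
import math


def expected_vaf(phi, purity, cn_normal, mu_normal, cn_ref, mu_ref, cn_var, mu_var):
    """Expected fraction of sequencing reads carrying the variant allele.

    Assumed populations (one fixed genotype each, a simplification):
      normal    : fraction (1 - purity) of cells
      reference : fraction purity * (1 - phi), tumour cells lacking the mutation
      variant   : fraction purity * phi, tumour cells carrying the mutation
    cn_* is the total copy number of the locus in that population and
    mu_* is the fraction of those copies that carry the variant allele.
    """
    # Each population contributes reads in proportion to cell fraction x copy number.
    w_normal = (1.0 - purity) * cn_normal
    w_ref = purity * (1.0 - phi) * cn_ref
    w_var = purity * phi * cn_var
    numerator = w_normal * mu_normal + w_ref * mu_ref + w_var * mu_var
    return numerator / (w_normal + w_ref + w_var)


def binomial_log_likelihood(alt_reads, depth, p):
    """log P(alt_reads | depth, p) under a Binomial(depth, p) read model."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite at p = 0 or 1
    return (math.lgamma(depth + 1) - math.lgamma(alt_reads + 1)
            - math.lgamma(depth - alt_reads + 1)
            + alt_reads * math.log(p) + (depth - alt_reads) * math.log(1.0 - p))


def grid_posterior_phi(alt_reads, depth, purity, genotype, grid_size=101):
    """Normalised posterior over phi on a uniform grid in [0, 1], flat prior.

    genotype is the tuple (cn_normal, mu_normal, cn_ref, mu_ref, cn_var, mu_var).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = [binomial_log_likelihood(alt_reads, depth,
                                        expected_vaf(phi, purity, *genotype))
                for phi in grid]
    peak = max(log_post)  # subtract the maximum before exponentiating for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


# Heterozygous mutation in a diploid region with 80% tumour purity:
# the expected VAF is 0.4 * phi, so phi = 1 gives a VAF of about 0.4.
het_diploid = (2, 0.0, 2, 0.0, 2, 0.5)
print(expected_vaf(1.0, 0.8, *het_diploid))
grid, probs = grid_posterior_phi(20, 100, 0.8, het_diploid)
print(grid[probs.index(max(probs))])  # posterior mode of phi
```

With 20 variant reads out of 100, the observed VAF of 0.2 corresponds to phi = 0.5 under these assumptions, which is why a mutation's raw VAF cannot be read directly as its cellular prevalence once purity and copy number are taken into account.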

Figure 1: Comparison of clustering performance for the mixture of normal-tissue data sets.
Figure 2: Joint analysis of multiple samples from high-grade serous ovarian cancer 2.

Acknowledgements

This work was funded by Canadian Institutes of Health Research (CIHR), Genome Canada, Genome British Columbia, Canadian Cancer Society Research Institute and Canadian Breast Cancer Foundation grants to S.P.S. and S.A. S.P.S. is supported by the Michael Smith Foundation for Health Research and is the Canada Research Chair (CRC) for Computational Cancer Genomics. S.A. is the CRC for Molecular Oncology. A.R. is supported by a CIHR Banting scholarship.

Author information

Contributions

Project conception and oversight: S.P.S., S.A., A.R.; method development: A.R., A.B.-C., S.P.S.; implementation and benchmarking: A.R.; manuscript writing and editing, study design and execution: A.R., A.B.-C., S.P.S., S.A.; single-cell sequencing: J.K., D.Y., A.W., E.L., J.B.; data analysis and interpretation: G.H.

Corresponding author

Correspondence to Sohrab P. Shah.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14, Supplementary Results, Supplementary Discussion and Supplementary Note (PDF 5370 kb)

Supplementary Table 1

Allelic counts, IBBMM and PyClone PCN cellular prevalence estimates for mutations in high-grade serous ovarian cancer case 2. Copy-number predictions were inferred using PICNIC as described in the Online Methods. Cellular prevalences were computed as the mean of the post-burn-in trace of the cellular prevalence parameter for each method; the standard deviation of that parameter over the post-burn-in trace is also included. Cluster IDs (last two columns) were predicted from the post-burn-in trace using the MPEAR clustering criterion as described in the Online Methods and Supplementary Note. Mutation IDs list gene name, chromosome and chromosome coordinate. All coordinates are in the hg19 coordinate system. (XLS 50 kb)
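
The prevalence summaries described above reduce an MCMC trace to two numbers per mutation. A minimal sketch of that step, assuming the trace for one mutation is available as a list of sampled values in sampler order and that burn-in is specified as a count of leading samples to drop (the function name and arguments are hypothetical). The description does not say whether the sample or population standard deviation was used; this sketch uses the sample version from Python's standard `statistics` module.

```python
import statistics


def summarize_trace(trace, burnin):
    """Posterior mean and standard deviation of one parameter's MCMC trace.

    trace  : sampled cellular-prevalence values for one mutation, in sampler order
    burnin : number of initial samples to discard before summarising
    """
    if not 0 <= burnin < len(trace):
        raise ValueError("burnin must leave at least one sample")
    kept = trace[burnin:]  # the post-burn-in portion of the trace
    mean = statistics.fmean(kept)
    # Sample standard deviation; undefined for one sample, so report 0.0 there.
    sd = statistics.stdev(kept) if len(kept) > 1 else 0.0
    return mean, sd


# Toy trace: the first two draws are treated as burn-in and discarded.
print(summarize_trace([0.1, 0.9, 0.5, 0.6, 0.7], burnin=2))
```

Applying the function per mutation and per method yields the mean and standard-deviation columns of a table like this one.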

Supplementary Table 2

Allelic counts, IBBMM and PyClone PCN cellular prevalence estimates for mutations in high-grade serous ovarian cancer case 1. Copy-number predictions were inferred using PICNIC as described in the Online Methods. Cellular prevalences were computed as the mean of the post-burn-in trace of the cellular prevalence parameter for each method; the standard deviation of that parameter over the post-burn-in trace is also included. Cluster IDs (last two columns) were predicted from the post-burn-in trace using the MPEAR clustering criterion as described in the Online Methods and Supplementary Note. Mutation IDs list gene name, chromosome and chromosome coordinate. All coordinates are in the hg19 coordinate system. (XLSX 40 kb)
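
MPEAR picks a single clustering that maximises the posterior expected adjusted Rand index. The sketch below follows the widely used approximation of Fritsch and Ickstadt, which replaces the co-clustering indicators of the unknown true clustering with posterior co-clustering probabilities pi_ij estimated from the trace. As an assumption for simplicity, it searches only the clusterings visited by the sampler; other search strategies are possible, and this is not presented as PyClone's own implementation. Function names are hypothetical, and each sampled clustering is a list of cluster labels, one per mutation.

```python
from itertools import combinations


def posterior_similarity(clusterings):
    """pi[i][j] (i < j): fraction of sampled clusterings placing items i and j together."""
    n = len(clusterings[0])
    pi = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                pi[i][j] += 1.0
    for i, j in combinations(range(n), 2):
        pi[i][j] /= len(clusterings)
    return pi


def approx_pear(labels, pi):
    """Approximate posterior expected adjusted Rand index of one candidate clustering."""
    pairs = list(combinations(range(len(labels)), 2))
    same = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    probs = [pi[i][j] for i, j in pairs]
    sum_same, sum_pi = sum(same), sum(probs)
    sum_both = sum(s * p for s, p in zip(same, probs))
    chance = sum_same * sum_pi / len(pairs)  # agreement expected by chance
    denom = 0.5 * (sum_same + sum_pi) - chance
    if denom == 0.0:  # degenerate case (e.g. all singletons everywhere); score as 0
        return 0.0
    return (sum_both - chance) / denom


def mpear_from_samples(clusterings):
    """Return the sampled clustering with the highest approximate PEAR."""
    pi = posterior_similarity(clusterings)
    return max(clusterings, key=lambda labels: approx_pear(labels, pi))


# Four toy mutations; three of four draws agree on {0, 1} and {2, 3}.
draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
print(mpear_from_samples(draws))
```

Because only pairwise co-membership enters the score, the result does not depend on how the sampler happens to number its clusters.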

About this article

Cite this article

Roth, A., Khattra, J., Yap, D. et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 11, 396–398 (2014). https://doi.org/10.1038/nmeth.2883

  • DOI: https://doi.org/10.1038/nmeth.2883

  • Springer Nature America, Inc.
