Parallel-Tempered Feature Allocation for Large-Scale Tumor Heterogeneity with Deep Sequencing Data

  • Yang Ni
  • Peter Müller
  • Max Shpak
  • Yuan Ji
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 218)


We developed a parallel-tempered feature allocation algorithm to infer tumor heterogeneity from deep DNA sequencing data. The feature allocation model is based on a binomial likelihood and an Indian buffet process prior on the latent haplotypes. A variation of the parallel tempering technique is introduced to flatten peaked local modes of the posterior distribution, yielding a more efficient Markov chain Monte Carlo algorithm. Simulation studies provide empirical evidence that the proposed method outperforms competing methods at high read depth. In our application to glioblastoma multiforme data, we found several distinctive haplotypes that indicate the presence of multiple subclones in the tumor sample.
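As a rough illustration of how tempering flattens peaked modes, the following is a minimal sketch of standard parallel tempering (in the sense of Geyer's exchange scheme), not the variation proposed in the paper. It assumes a one-dimensional two-mode toy density in place of the feature allocation posterior; the names `log_target`, `swap_log_ratio`, and `parallel_tempering`, the temperature ladder, and the step size are all hypothetical choices made for this example.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of a toy two-mode target.

    Assumption: an equal mixture of N(-4, 0.5^2) and N(4, 0.5^2) stands in
    for a multimodal posterior; it is not the model from the paper.
    """
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def swap_log_ratio(logp_i, logp_j, temp_i, temp_j):
    """Log Metropolis ratio for exchanging the states of chains i and j,
    where chain k targets pi(x)^(1 / temp_k)."""
    return (1.0 / temp_i - 1.0 / temp_j) * (logp_j - logp_i)


def parallel_tempering(n_iter=5000, temps=(1.0, 4.0, 16.0, 64.0),
                       step=1.0, seed=0):
    """Run generic parallel tempering; return the samples of the cold chain."""
    rng = random.Random(seed)
    states = [-4.0] * len(temps)          # every chain starts in the left mode
    logps = [log_target(x) for x in states]
    cold_samples = []
    for _ in range(n_iter):
        # Random-walk Metropolis update within each tempered chain.
        for k, temp in enumerate(temps):
            prop = states[k] + rng.gauss(0.0, step * math.sqrt(temp))
            logp_prop = log_target(prop)
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) < (logp_prop - logps[k]) / temp:
                states[k], logps[k] = prop, logp_prop
        # Propose exchanging the states of one random adjacent pair.
        k = rng.randrange(len(temps) - 1)
        log_r = swap_log_ratio(logps[k], logps[k + 1], temps[k], temps[k + 1])
        if math.log(1.0 - rng.random()) < log_r:
            states[k], states[k + 1] = states[k + 1], states[k]
            logps[k], logps[k + 1] = logps[k + 1], logps[k]
        cold_samples.append(states[0])    # chain 0 has temperature 1
    return cold_samples


if __name__ == "__main__":
    samples = parallel_tempering()
    share_right = sum(x > 0 for x in samples) / len(samples)
    print(f"share of cold-chain samples in the right mode: {share_right:.2f}")
```

The hotter chains see a flattened version of the target and cross between modes easily; the exchange step passes those crossings down to the temperature-1 chain, which is the only chain whose samples are retained.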


Haplotype deconvolution; Single nucleotide variants; Next-generation sequencing data; Indian buffet process; Glioblastoma multiforme



YN, YJ and PM were partially funded by grant NIH R01 CA132891-06A1. MS was supported by the St. David’s Foundation impact fund. Specimen collection, processing and analysis were supported by funds from the St. David’s Impact Fund and the NeuroTexas Research Foundation.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, USA
  2. Department of Mathematics, The University of Texas at Austin, Austin, USA
  3. Sarah Cannon Research Institute, Nashville, USA
  4. Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, USA
  5. Fresh Pond Research Institute, Cambridge, USA
  6. Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, USA
  7. Department of Public Health Sciences, The University of Chicago, Chicago, USA
