Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference

  • Conference paper
Comparative Genomics (RECOMB-CG 2023)

Abstract

Advances in variational inference offer promising approaches to Bayesian estimation problems and make variational phylogenetic inference an alternative to Markov chain Monte Carlo methods for approximating the phylogenetic posterior. However, a main drawback of such approaches is that the prior is modelled with fixed distributions, which can bias the posterior approximation when they are far from the distribution of the data at hand. In this paper, we propose an approach and an implementation framework that relax the rigidity of the prior densities by learning their parameters with a gradient-based method and a neural network-based parameterization. We applied this approach to the estimation of branch lengths and evolutionary parameters under several Markov chain substitution models. Simulation results show that the approach performs well in estimating branch lengths and evolutionary model parameters. They also show that a flexible prior model can provide better results than a predefined prior model. Finally, the results indicate that using neural networks improves the initialization of the optimization of the prior density parameters.
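As a rough illustration of what learning a prior parameter by gradient ascent can look like, the toy sketch below fits the rate of an exponential branch-length prior to a fixed variational approximation. It is not the paper's framework: the log-normal variational factors, their parameter values, the exponential prior and all function names are assumptions made for this example, and the neural network parameterization mentioned in the abstract is not reproduced. Only the prior-dependent part of the evidence lower bound, the expected log prior under the variational distribution, is optimized.

```python
import math

def expected_branch_length(mu, sigma):
    """Mean of a LogNormal(mu, sigma) variational factor: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + 0.5 * sigma ** 2)

def expected_log_prior(rate, means):
    """Sum over branches of E_q[log Exp(b | rate)] = log(rate) - rate * E_q[b]."""
    return sum(math.log(rate) - rate * m for m in means)

def learn_prior_rate(means, steps=500, lr=0.1, init_rate=10.0):
    """Gradient ascent on the expected log prior with respect to eta = log(rate)."""
    eta = math.log(init_rate)  # optimizing in log-space keeps the rate positive
    mean_b = sum(means) / len(means)
    for _ in range(steps):
        rate = math.exp(eta)
        # Derivative w.r.t. eta of the per-branch average of log(rate) - rate * E_q[b]
        grad = 1.0 - rate * mean_b
        eta += lr * grad
    return math.exp(eta)

# Hypothetical variational parameters (mu, sigma) for three branches.
q_params = [(-2.5, 0.3), (-3.0, 0.2), (-2.0, 0.4)]
means = [expected_branch_length(mu, s) for mu, s in q_params]
learned_rate = learn_prior_rate(means)
```

In this toy setting the optimum has a closed form, the reciprocal of the average expected branch length, so the gradient loop only demonstrates the mechanism. In a full variational treatment the prior and variational parameters would be updated jointly rather than with the variational factors held fixed.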

Supported by NSERC, FRQNT, Genome Canada and The Digital Research Alliance of Canada.

Author information

Corresponding author

Correspondence to Amine M. Remita.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Remita, A.M., Vitae, G., Diallo, A.B. (2023). Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference. In: Jahn, K., Vinař, T. (eds) Comparative Genomics. RECOMB-CG 2023. Lecture Notes in Computer Science, vol 13883. Springer, Cham. https://doi.org/10.1007/978-3-031-36911-7_8

  • DOI: https://doi.org/10.1007/978-3-031-36911-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36910-0

  • Online ISBN: 978-3-031-36911-7

  • eBook Packages: Computer Science, Computer Science (R0)
