Abstract
Recent advances in variational inference offer promising approaches to Bayesian estimation problems. They make variational phylogenetic inference an alternative to Markov chain Monte Carlo methods for approximating the phylogenetic posterior. However, one of the main drawbacks of such approaches is that the prior is modelled with fixed distributions, which can bias the posterior approximation if they are far from the distribution of the data at hand. In this paper, we propose an approach and an implementation framework that relax the rigidity of the prior densities by learning their parameters with a gradient-based method and a neural network-based parameterization. We applied this approach to the estimation of branch lengths and evolutionary parameters under several Markov chain substitution models. The simulation results show that the approach is effective at estimating branch lengths and evolutionary model parameters. They also show that a flexible prior model can provide better results than a predefined prior model. Finally, the results highlight that using neural networks improves the initialization of the optimization of the prior density parameters.
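The idea of learning prior parameters by gradient ascent can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy model in which the substitution count on branch j is Poisson with mean `n_sites * b_j` (a stand-in for a Markov substitution model), an exponential prior with a single shared rate, a log-normal variational density, and no neural network. The function `fit` and all of its parameters are hypothetical names chosen for this example. Under these assumptions the evidence lower bound has a closed form, so the gradients can be written by hand.

```python
import math

def fit(counts, n_sites, learn_prior, prior_rate=10.0, lr=0.02, steps=5000):
    """Gradient ascent on a closed-form ELBO for a toy branch-length model.

    Toy assumptions (not from the paper): k_j ~ Poisson(n_sites * b_j),
    b_j ~ Exponential(rate), and q(b_j) = LogNormal(mu_j, sigma_j).
    """
    mu = [math.log(0.1)] * len(counts)      # mean of log b_j under q
    omega = [math.log(0.5)] * len(counts)   # log sigma_j, keeps sigma_j > 0
    eta = math.log(prior_rate)              # log prior rate, keeps rate > 0

    def means():
        # E_q[b_j] of a log-normal is exp(mu_j + sigma_j^2 / 2)
        return [math.exp(m + 0.5 * math.exp(2 * w)) for m, w in zip(mu, omega)]

    for _ in range(steps):
        rate = math.exp(eta)
        mean_b = means()
        for j, k in enumerate(counts):
            pull = (n_sites + rate) * mean_b[j]
            sigma2 = math.exp(2 * omega[j])
            mu[j] += lr * (k + 1 - pull)          # d ELBO / d mu_j
            omega[j] += lr * (1 - pull * sigma2)  # d ELBO / d omega_j
        if learn_prior:
            # d ELBO / d eta = B - rate * sum_j E_q[b_j], B = number of branches
            eta += lr * (len(counts) - rate * sum(mean_b))
    return math.exp(eta), means()

# Made-up substitution counts on three branches of a 200-site alignment.
counts, n_sites = [4, 10, 20], 200
fixed_rate, fixed_means = fit(counts, n_sites, learn_prior=False)
learned_rate, learned_means = fit(counts, n_sites, learn_prior=True)
```

In this toy setting the stationary point can be derived by setting the three gradients to zero: the learned rate equals `len(counts) * n_sites / sum(counts)` and each variational mean equals `(k_j + 1) / (n_sites + rate)`. With `learn_prior=False` the rate stays at its predefined value, so the two runs show how a fixed prior and a learned prior pull the branch-length estimates differently.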
Supported by NSERC, FRQNT, Genome Canada and The Digital Research Alliance of Canada.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Remita, A.M., Vitae, G., Diallo, A.B. (2023). Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference. In: Jahn, K., Vinař, T. (eds) Comparative Genomics. RECOMB-CG 2023. Lecture Notes in Computer Science, vol 13883. Springer, Cham. https://doi.org/10.1007/978-3-031-36911-7_8
DOI: https://doi.org/10.1007/978-3-031-36911-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36910-0
Online ISBN: 978-3-031-36911-7
eBook Packages: Computer Science, Computer Science (R0)