Langevin incremental mixture importance sampling
This work proposes a novel method through which local information about the target density can be used to construct an efficient importance sampler. The backbone of the proposed method is the incremental mixture importance sampling (IMIS) algorithm of Raftery and Bao (Biometrics 66(4):1162–1173, 2010), which builds a mixture importance distribution incrementally, by positioning new mixture components where the importance density lacks mass, relative to the target. The key innovation proposed here is to construct the mean vectors and covariance matrices of the mixture components by numerically solving certain differential equations, whose solution depends on the local shape of the target log-density. The new sampler has a number of advantages: (a) it provides an extremely parsimonious parametrization of the mixture importance density, whose configuration effectively depends only on the shape of the target and on a single free parameter representing pseudo-time; (b) it scales well with the dimensionality of the target; (c) it can deal with targets that are not log-concave. The performance of the proposed approach is demonstrated on two synthetic non-Gaussian densities, one being defined on up to eighty dimensions, and on a Bayesian logistic regression model, using the Sonar dataset. The Julia code implementing the importance sampler proposed here can be found at https://github.com/mfasiolo/LIMIS.
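The abstract describes the core mechanism only at a high level: each new mixture component is a Gaussian whose mean vector and covariance matrix are obtained by numerically integrating differential equations driven by the local shape of the target log-density, up to a pseudo-time that is the sampler's single free parameter. The Julia sketch below (Julia being the language of the authors' released code) illustrates one plausible reading of that step. The specific ODEs used here, namely the deterministic Langevin drift dμ/dt = ∇log π(μ)/2 and a Kalman–Bucy-style covariance equation dΣ/dt = (HΣ + ΣHᵀ)/2 + I with H the Hessian of log π, are assumptions suggested by the paper's keywords rather than the authors' exact formulation; the function name `langevin_component` is likewise hypothetical.

```julia
# Hedged sketch: one plausible way a mixture component's mean and covariance
# could be built by integrating Langevin-type ODEs in pseudo-time T.
# The drift dμ/dt = ∇log π(μ)/2 and the Kalman–Bucy-style covariance equation
# dΣ/dt = (H Σ + Σ H')/2 + I (H the Hessian of log π) are ASSUMPTIONS inferred
# from the abstract's keywords, not the authors' exact equations.

using LinearAlgebra

"""
    langevin_component(grad, hess, x0, T; steps = 100)

Integrate the assumed mean/covariance ODEs with forward Euler, starting from
the point `x0`, up to pseudo-time `T`. Returns the mean vector and covariance
matrix of a new Gaussian mixture component. `grad` and `hess` evaluate the
gradient and Hessian of the target log-density.
"""
function langevin_component(grad, hess, x0::Vector{Float64}, T::Real; steps::Int = 100)
    δ = T / steps
    μ = copy(x0)
    Σ = zeros(length(x0), length(x0))   # degenerate start: a point mass at x0
    for _ in 1:steps
        H = hess(μ)
        # Covariance update, using the Hessian at the current mean.
        Σ += δ * (0.5 * (H * Σ + Σ * H') + I)
        # Mean follows the deterministic Langevin drift.
        μ += (δ / 2) * grad(μ)
    end
    return μ, Symmetric(Σ)
end

# Toy usage on a standard Gaussian target, log π(x) = -x'x/2 (up to a constant):
grad(x) = -x
hess(x) = -Matrix{Float64}(I, length(x), length(x))
μ, Σ = langevin_component(grad, hess, [3.0, -2.0], 1.0)
```

On this toy target the flow contracts μ toward the mode and drives Σ toward the target covariance as the pseudo-time grows, which is consistent with the abstract's claim that the configuration of the importance density effectively depends only on the shape of the target and a single pseudo-time parameter.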
Keywords: Importance sampling · Langevin diffusion · Mixture density · Optimal importance distribution · Local approximation · Kalman-Bucy filter
The authors would like to thank Samuel Livingstone and two anonymous referees for providing useful comments on an earlier version of this paper.
- Ascher, U.M., Petzold, L.R.: Computer methods for ordinary differential equations and differential-algebraic equations, pp. 73–78. Soc. Ind. Appl. Math. (1998)
- Bates, S.: Bayesian inference for deterministic simulation models for environmental assessment. PhD thesis, University of Washington (2001)
- Bezanson, J., Karpinski, S., Shah, V.B., Edelman, A.: Julia: a fast dynamic language for technical computing. arXiv:1209.5145 (2012)
- Bucy, R.S., Joseph, P.D.: Filtering for stochastic processes with applications to guidance, pp. 43–55. Am. Math. Soc. (1987)
- Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A.: Stan: a probabilistic programming language. J. Stat. Softw. 76(1), 1–32 (2017)
- Daum, F., Huang, J.: Particle flow for nonlinear filters with log-homotopy. In: SPIE Defense and Security Symposium, International Society for Optics and Photonics, p. 696918 (2008)
- Lichman, M.: UCI Machine Learning Repository. http://archive.ics.uci.edu/ml (2013)
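- Raftery, A.E., Bao, L.: Estimating and projecting trends in HIV/AIDS generalized epidemics using incremental mixture importance sampling. Biometrics 66(4), 1162–1173 (2010)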
- Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2(4), 341–363 (1996)
- Schuster, I.: Gradient importance sampling. arXiv:1507.05781 (2015)
- Sim, A., Filippi, S., Stumpf, M.P.: Information geometry and sequential Monte Carlo. arXiv:1212.0764 (2012)
- West, M.: Modelling with mixtures. In: Berger, J., Bernardo, J., Dawid, A., Smith, A. (eds.) Bayesian Statistics, pp. 503–525. Oxford University Press, Oxford (1992)