Abstract
We propose an importance sampling algorithm whose proposal distribution is obtained from a variational approximation. The method combines the strengths of importance sampling and variational methods: on one hand, importance sampling avoids the bias inherent in the variational approximation; on the other hand, the variational approximation provides a principled way to design the proposal distribution for the importance sampling algorithm. Theoretical justification of the proposed method is provided. Numerical results show that using the variational approximation as the proposal can improve the performance of both importance sampling and sequential importance sampling.
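As a rough illustration of the idea, the sketch below is not the paper's exact algorithm: the two-component mixture target, the Gaussian variational family, and the Nelder–Mead optimizer are all illustrative choices. It fits a Gaussian variational approximation by minimizing a Monte Carlo estimate of the negative ELBO, then uses that approximation as the proposal in self-normalized importance sampling:

```python
import numpy as np
from scipy import stats, optimize

# Illustrative target: a two-component Gaussian mixture with mean 0.75.
# (Only the unnormalized log-density is needed in general.)
def log_target(x):
    return np.logaddexp(np.log(0.3) + stats.norm.logpdf(x, -1.0, 0.7),
                        np.log(0.7) + stats.norm.logpdf(x, 1.5, 1.0))

# Step 1: fit q = N(mu, sigma^2) by minimizing a Monte Carlo estimate of
# the negative ELBO (one common way to compute a variational
# approximation; the paper's family and optimizer may differ).
def neg_elbo(params, n=2000):
    mu, log_sigma = params
    rng = np.random.default_rng(0)   # fixed seed: deterministic objective
    z = mu + np.exp(log_sigma) * rng.standard_normal(n)
    return -np.mean(log_target(z) - stats.norm.logpdf(z, mu, np.exp(log_sigma)))

res = optimize.minimize(neg_elbo, x0=[0.0, 0.0], method="Nelder-Mead")
mu, sigma = res.x[0], np.exp(res.x[1])

# Step 2: use q as the importance sampling proposal; the self-normalized
# weights correct the bias of the variational approximation.
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, size=10_000)
log_w = log_target(x) - stats.norm.logpdf(x, mu, sigma)
w = np.exp(log_w - log_w.max())
w /= w.sum()

posterior_mean = np.sum(w * x)   # IS estimate of E[X] (true value 0.75)
ess = 1.0 / np.sum(w ** 2)       # effective sample size diagnostic
```

Reporting the variational mean directly would retain the approximation bias; reweighting the draws removes it, and the effective sample size gives a quick diagnostic of how well the variational proposal matches the target.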
This work was supported in part by National Science Foundation Grant DMS-2015561.
Appendices
Proof of Lemma 1
Proof
We have \(\lim _{n\rightarrow \infty } \beta _{2,n} = 1\) immediately from the definition of convergence in (4).
Now we prove \(\lim _{n\rightarrow \infty } \beta _{1,n} = 1\). For any \(\epsilon > 0\) and \(\delta > 0\), define \(I_1^{(n)} = \{x:\frac{p_n(x)}{q(x)}< 1-\epsilon \}\), \(I_2^{(n)} = \{x:1-\epsilon \le \frac{p_n(x)}{q(x)}< 1+\delta \}\), and \(I_3^{(n)} = \{x:\frac{p_n(x)}{q(x)}\ge 1+\delta \}\).
From (4), for any given \(\epsilon >0\) there exists \(N\in \mathbb N\) such that for all \(n>N\) we have
By the definition of essential infimum, we have
which implies
Then we have
So
From the definitions of \(I_2^{(n)}\) and \(I_3^{(n)}\), we have
Similarly, we also have
From (10) and (11), the following inequality holds:
Let \(\limsup \limits _{n \rightarrow \infty }\int _{I_2^{(n)}}\,q\,dx = \theta (\epsilon , \delta ) \in [0,1]\); then \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx = 1-\theta (\epsilon , \delta )\) by (11). Since the definition of \(I_3^{(n)}\) depends only on \(\delta \) and not on \(\epsilon \), \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx \) also depends only on \(\delta \). Thus \(\theta (\epsilon , \delta ) = \theta (\delta )\) does not depend on \(\epsilon \).
Taking the limit inferior on both sides of (12), we obtain
Therefore,
Note that (14) holds for any \(\epsilon > 0\) and \(\delta > 0\) chosen at the beginning of the proof. Since the left-hand side of (14) does not depend on \(\epsilon \), letting \(\epsilon \rightarrow 0\) on the right-hand side of (14) gives \(\theta (\delta ) \ge 1\). On the other hand, \(\limsup \limits _{n \rightarrow \infty }\int _{I_2^{(n)}}\,q\,dx = \theta (\delta ) \in [0,1]\). Therefore \(\theta (\delta )=1\), which implies \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx =1-\theta (\delta )= 0\) for any \(\delta >0\). From the definition of \(\beta _{1,n}\), it follows that \(\lim _{n\rightarrow \infty } \beta _{1,n} = 1\).
Since \(\mu (\{x: p_n(x)/q(x) < \beta _{2,n} \}) = \mu (\{x: p_n(x)/q(x) > \beta _{1,n}^{-1} \}) =0\), we have
Letting \(n\rightarrow \infty \) and using the continuity of f at 1, we obtain \(\lim _{n\rightarrow \infty } D_f(p_n||q) \le f(1) = 0\); since an f-divergence is nonnegative, the limit equals 0. \(\square \)
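The lemma can be checked numerically in a simple case (an illustrative example, not from the paper): take \(q\) uniform on \([0,1]\) and \(p_n(x) = 1 + \sin (2\pi x)/n\), so that \(p_n/q\) lies in \([1-1/n,\,1+1/n]\) and both \(\beta _{1,n}\) and \(\beta _{2,n}\) tend to 1, and take \(f(t) = t\log t\), for which \(D_f\) is the KL divergence. A midpoint-rule quadrature then shows the divergence vanishing, at the rate \(\approx 1/(4n^2)\) predicted by a Taylor expansion of \(t\log t\) around 1:

```python
import numpy as np

# Numerical check of Lemma 1 on an illustrative example:
# q = Uniform(0, 1) and p_n(x) = 1 + sin(2*pi*x)/n, so that
# ess-inf p_n/q = 1 - 1/n and ess-sup p_n/q = 1 + 1/n both tend to 1.
grid = (np.arange(200_000) + 0.5) / 200_000   # midpoint rule on [0, 1]

def kl(n):
    p = 1.0 + np.sin(2 * np.pi * grid) / n
    return np.mean(p * np.log(p))             # D_f(p_n||q) with f(t) = t log t

kls = [kl(n) for n in (2, 4, 8, 16)]          # decays roughly like 1/(4 n^2)
```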
Proof of Theorem 1
Proof
From Lemma 1, we have
By L’Hospital’s rule, \(\lim _{t \rightarrow 1}\kappa (t) = 1\), where \(\kappa (t)\) is defined in (5). Therefore, taking limits on both sides of (6) and (7), we have
\(\square \)
Su, X., Chen, Y. Variational approximation for importance sampling. Comput Stat 36, 1901–1930 (2021). https://doi.org/10.1007/s00180-021-01063-w