Abstract
We propose an importance sampling algorithm whose proposal distribution is obtained from a variational approximation. The method combines the strengths of importance sampling and variational methods: on one hand, importance sampling avoids the bias inherent in the variational approximation; on the other hand, the variational approximation provides a principled way to design the proposal distribution for the importance sampling algorithm. Theoretical justification of the proposed method is provided. Numerical results show that using the variational approximation as the proposal can improve the performance of both importance sampling and sequential importance sampling.
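As a rough illustration of the idea, the sketch below is not the paper's exact algorithm: the two-component mixture target, the Gaussian variational family, and the Nelder–Mead optimizer are all illustrative choices. It fits a Gaussian variational approximation by minimizing a Monte Carlo estimate of the negative ELBO, then uses that approximation as the proposal in self-normalized importance sampling:

```python
import numpy as np
from scipy import stats, optimize

# Illustrative target: a two-component Gaussian mixture with mean 0.75.
# (Only the unnormalized log-density is needed in general.)
def log_target(x):
    return np.logaddexp(np.log(0.3) + stats.norm.logpdf(x, -1.0, 0.7),
                        np.log(0.7) + stats.norm.logpdf(x, 1.5, 1.0))

# Step 1: fit q = N(mu, sigma^2) by minimizing a Monte Carlo estimate of
# the negative ELBO (one common way to compute a variational
# approximation; the paper's family and optimizer may differ).
def neg_elbo(params, n=2000):
    mu, log_sigma = params
    rng = np.random.default_rng(0)   # fixed seed: deterministic objective
    z = mu + np.exp(log_sigma) * rng.standard_normal(n)
    return -np.mean(log_target(z) - stats.norm.logpdf(z, mu, np.exp(log_sigma)))

res = optimize.minimize(neg_elbo, x0=[0.0, 0.0], method="Nelder-Mead")
mu, sigma = res.x[0], np.exp(res.x[1])

# Step 2: use q as the importance sampling proposal; the self-normalized
# weights correct the bias of the variational approximation.
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, size=10_000)
log_w = log_target(x) - stats.norm.logpdf(x, mu, sigma)
w = np.exp(log_w - log_w.max())
w /= w.sum()

posterior_mean = np.sum(w * x)   # IS estimate of E[X] (true value 0.75)
ess = 1.0 / np.sum(w ** 2)       # effective sample size diagnostic
```

Reporting the variational mean directly would retain the approximation bias; reweighting the draws removes it, and the effective sample size gives a quick diagnostic of how well the variational proposal matches the target.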
This work was supported in part by National Science Foundation Grant DMS-2015561.
Appendices
Proof of Lemma 1
Proof
We have \(\lim _{n\rightarrow \infty } \beta _{2,n} = 1\) immediately from the definition of convergence in (4).
Now we prove \(\lim _{n\rightarrow \infty } \beta _{1,n} = 1\). For any \(\epsilon > 0\) and \(\delta > 0\), define \(I_1^{(n)} = \{x:\frac{p_n(x)}{q(x)}< 1-\epsilon \}\), \(I_2^{(n)} = \{x:1-\epsilon \le \frac{p_n(x)}{q(x)}< 1+\delta \}\), and \(I_3^{(n)} = \{x:\frac{p_n(x)}{q(x)}\ge 1+\delta \}\).
From (4), for any given \(\epsilon >0\) there exists \(N\in \mathbb N\) such that for all \(n>N\) we have
By the definition of essential infimum, we have
which implies
Then we have
So
From the definitions of \(I_2^{(n)}\) and \(I_3^{(n)}\), we have
Similarly, we also have
From (10) and (11), the following inequality holds:
Let \(\limsup \limits _{n \rightarrow \infty }\int _{I_2^{(n)}}\,q\,dx = \theta (\epsilon , \delta ) \in [0,1]\); then \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx = 1-\theta (\epsilon , \delta )\) by (11). Since the definition of \(I_3^{(n)}\) depends only on \(\delta \) and not on \(\epsilon \), \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx \) also depends only on \(\delta \). Thus \(\theta (\epsilon , \delta ) = \theta (\delta )\) does not depend on \(\epsilon \).
Taking the limit inferior on both sides of (12), we obtain
Therefore,
Note that (14) holds for any \(\epsilon > 0\) and \(\delta > 0\) chosen at the beginning of the proof. Since the left-hand side of (14) does not depend on \(\epsilon \), letting \(\epsilon \rightarrow 0\) on the right-hand side of (14) gives \(\theta (\delta ) \ge 1\). On the other hand, \(\limsup \limits _{n \rightarrow \infty }\int _{I_2^{(n)}}\,q\,dx = \theta (\delta ) \in [0,1]\). Therefore \(\theta (\delta )=1\), which implies \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx =1-\theta (\delta )= 0\) for any \(\delta >0\). From the definition of \(\beta _{1,n}\), it follows that \(\lim _{n\rightarrow \infty } \beta _{1,n} = 1\).
Since \(\mu (\{x: p_n(x)/q(x) < \beta _{2,n} \}) = \mu (\{x: p_n(x)/q(x) > \beta _{1,n}^{-1} \}) =0\), we have
Letting \(n\rightarrow \infty \) and using the continuity of f at 1, we obtain \(\lim _{n\rightarrow \infty } D_f(p_n||q) \le f(1) = 0\); since an f-divergence is nonnegative, the limit equals 0. \(\square \)
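The lemma can be checked numerically in a simple case (an illustrative example, not from the paper): take \(q\) uniform on \([0,1]\) and \(p_n(x) = 1 + \sin (2\pi x)/n\), so that \(p_n/q\) lies in \([1-1/n,\,1+1/n]\) and both \(\beta _{1,n}\) and \(\beta _{2,n}\) tend to 1, and take \(f(t) = t\log t\), for which \(D_f\) is the KL divergence. A midpoint-rule quadrature then shows the divergence vanishing, at the rate \(\approx 1/(4n^2)\) predicted by a Taylor expansion of \(t\log t\) around 1:

```python
import numpy as np

# Numerical check of Lemma 1 on an illustrative example:
# q = Uniform(0, 1) and p_n(x) = 1 + sin(2*pi*x)/n, so that
# ess-inf p_n/q = 1 - 1/n and ess-sup p_n/q = 1 + 1/n both tend to 1.
grid = (np.arange(200_000) + 0.5) / 200_000   # midpoint rule on [0, 1]

def kl(n):
    p = 1.0 + np.sin(2 * np.pi * grid) / n
    return np.mean(p * np.log(p))             # D_f(p_n||q) with f(t) = t log t

kls = [kl(n) for n in (2, 4, 8, 16)]          # decays roughly like 1/(4 n^2)
```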
Proof of Theorem 1
Proof
From Lemma 1, we have
By L’Hospital’s rule, \(\lim _{t \rightarrow 1}\kappa (t) = 1\), where \(\kappa (t)\) is defined in (5). Therefore, taking limits on both sides of (6) and (7), we have
\(\square \)
Su, X., Chen, Y. Variational approximation for importance sampling. Comput Stat 36, 1901–1930 (2021). https://doi.org/10.1007/s00180-021-01063-w