Variational approximation for importance sampling


Abstract

We propose an importance sampling algorithm whose proposal distribution is obtained from a variational approximation. This method combines the strengths of both importance sampling and variational methods: on one hand, it avoids the bias of the variational approximation; on the other hand, the variational approximation provides a principled way to design the proposal distribution for importance sampling. Theoretical justification for the proposed method is provided. Numerical results show that using the variational approximation as the proposal can improve the performance of importance sampling and sequential importance sampling.
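To make the method concrete, the following is a minimal Python sketch, not the authors' implementation: a Gaussian family is fitted to a toy unnormalized target by maximizing a Monte Carlo estimate of the evidence lower bound (ELBO), and the fitted Gaussian then serves as the proposal for self-normalized importance sampling. The target density, the Gaussian variational family, and all function names are illustrative assumptions.

```python
import numpy as np
from scipy import optimize, stats

# Minimal sketch: importance sampling with a variationally fitted proposal.
# The toy target and all names below are illustrative, not from the paper.

def log_p(x):
    # Unnormalized log-target: p(x) proportional to exp(-x^4/4 - x^2/2).
    return -x**4 / 4.0 - x**2 / 2.0

def neg_elbo(params, n=2000):
    # Monte Carlo estimate of the negative ELBO for a Gaussian q(x; m, s),
    # using common random numbers so the objective is smooth in (m, log s).
    m, log_s = params
    s = np.exp(log_s)
    z = m + s * np.random.default_rng(0).standard_normal(n)
    return -(np.mean(log_p(z)) + log_s)  # entropy of N(m, s^2) = log s + const

# Step 1: variational approximation gives the proposal parameters.
res = optimize.minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
m_hat, s_hat = res.x[0], float(np.exp(res.x[1]))

# Step 2: self-normalized importance sampling with the fitted q as proposal.
rng = np.random.default_rng(1)
x = rng.normal(m_hat, s_hat, size=50_000)
log_w = log_p(x) - stats.norm.logpdf(x, loc=m_hat, scale=s_hat)
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Consistent estimate of E_p[x^2]; the weights correct the variational bias.
print("E_p[x^2] estimate:", np.sum(w * x**2))
print("effective sample size:", 1.0 / np.sum(w**2))
```

The importance weights correct, asymptotically, the bias that would remain if expectations were computed under the variational approximation alone; the effective sample size printed at the end is the usual diagnostic of proposal quality.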



Author information


Corresponding author

Correspondence to Yuguo Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by National Science Foundation Grant DMS-2015561.

Appendices

Proof of Lemma 1

Proof

We have \(\lim _{n\rightarrow \infty } \beta _{2,n} = 1\) immediately from the definition of convergence in (4).

Now we prove \(\lim _{n\rightarrow \infty } \beta _{1,n} = 1\). Fix arbitrary \(\epsilon > 0\) and \(\delta > 0\), and define \(I_1^{(n)} = \{x:\frac{p_n(x)}{q(x)}< 1-\epsilon \}\), \(I_2^{(n)} = \{x:1-\epsilon \le \frac{p_n(x)}{q(x)}< 1+\delta \}\), and \(I_3^{(n)} = \{x:\frac{p_n(x)}{q(x)}\ge 1+\delta \}\).

From (4), for any given \(\epsilon >0\) there exists \(N\in \mathbb {N}\) such that for all \(n>N\),

$$\begin{aligned} \operatorname*{ess\,inf}\,\frac{p_n}{q} > 1-\epsilon . \end{aligned}$$

By the definition of essential infimum, we have

$$\begin{aligned} \sup \{b \in \mathbb R: \mu (\{x: p_n(x)/q(x) < b \}) = 0 \} > 1-\epsilon , \end{aligned}$$

which implies

$$\begin{aligned} \mu (I_1^{(n)})=\mu (\{x: p_n(x)/q(x) < 1-\epsilon \}) = 0. \end{aligned}$$

Then we have

$$\begin{aligned} \int _{I_1^{(n)}}\frac{p_n}{q}\,q\,dx = \int _{I_1^{(n)}}p_n\,dx =0 \ \ \ \text{ for } n > N.\end{aligned}$$

So

$$\begin{aligned} 1 = \int _{\mathbb {R}} p_n \,dx &= \int _{I_1^{(n)}}\frac{p_n}{q}\,q\,dx + \int _{I_2^{(n)}}\frac{p_n}{q}\,q\,dx + \int _{I_3^{(n)}}\frac{p_n}{q}\,q\,dx \\ &= \int _{I_2^{(n)}}\frac{p_n}{q}\,q\,dx + \int _{I_3^{(n)}}\frac{p_n}{q}\,q\,dx \quad \text{for } n > N. \end{aligned}$$

From the definitions of \(I_2^{(n)}\) and \(I_3^{(n)}\), we have

$$\begin{aligned} 1 &= \int _{I_2^{(n)}}\frac{p_n}{q}\,q\,dx + \int _{I_3^{(n)}}\frac{p_n}{q}\,q\,dx \\ &\ge (1-\epsilon )\int _{I_2^{(n)}} q\,dx + (1+\delta )\int _{I_3^{(n)}} q\,dx \quad \text{for } n > N. \end{aligned}$$
(10)

Similarly, we also have

$$\begin{aligned} 1 = \int _{\mathbb {R}} q \,dx &= \int _{I_1^{(n)}} q\,dx + \int _{I_2^{(n)}} q\,dx + \int _{I_3^{(n)}} q\,dx \\ &= \int _{I_2^{(n)}} q\,dx + \int _{I_3^{(n)}} q\,dx \quad \text{for } n > N. \end{aligned}$$
(11)

From (10) and (11), the following inequality holds:

$$\begin{aligned} 1 &\ge (1-\epsilon )\int _{I_2^{(n)}} q\,dx + (1+\delta )\left( 1-\int _{I_2^{(n)}} q\,dx\right) \\ &= (1+\delta ) - (\epsilon +\delta ) \int _{I_2^{(n)}} q\,dx \quad \text{for } n > N. \end{aligned}$$
(12)

Suppose \(\limsup \limits _{n \rightarrow \infty }\int _{I_2^{(n)}}\,q\,dx = \theta (\epsilon , \delta ) \in [0,1]\); then \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx = 1-\theta (\epsilon , \delta )\) by (11). Since the definition of \(I_3^{(n)}\) depends only on \(\delta \), not on \(\epsilon \), the quantity \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx \) also depends only on \(\delta \). Thus \(\theta (\epsilon , \delta ) = \theta (\delta )\) does not depend on \(\epsilon \).

Taking the limit inferior of both sides of (12), we obtain

$$\begin{aligned} 1 \ge (1+\delta ) - \limsup \limits _{n \rightarrow \infty }\left( (\epsilon +\delta ) \int _{I_2^{(n)}}\,q\,dx \right) = (1+\delta ) - (\epsilon +\delta )\,\theta (\delta ). \end{aligned}$$
(13)

Therefore,

$$\begin{aligned} \theta (\delta ) \ge \frac{\delta }{\delta +\epsilon }. \end{aligned}$$
(14)

Note that (14) is true for any \(\epsilon > 0\) and \(\delta > 0\) selected at the beginning of the proof. Since the left hand side of (14) does not depend on \(\epsilon \), letting \(\epsilon \rightarrow 0\) on the right hand side of (14), we have \(\theta (\delta ) \ge 1\). On the other hand, \(\limsup \limits _{n \rightarrow \infty }\int _{I_2^{(n)}}\,q\,dx = \theta (\delta ) \in [0,1]\). Therefore we have \(\theta (\delta )=1\), which implies \(\liminf \limits _{n \rightarrow \infty }\int _{I_3^{(n)}}\,q\,dx =1-\theta (\delta )= 0\) for any \(\delta >0\). From the definition of \(\beta _{1,n}\), we have \(\lim _{n\rightarrow \infty } \beta _{1,n} = 1\).

Since \(\mu (\{x: p_n(x)/q(x) < \beta _{2,n} \}) = \mu (\{x: p_n(x)/q(x) > \beta _{1,n}^{-1} \}) =0\), we have

$$\begin{aligned} D_f(p_n||q) = \int _{\{\beta _{2,n}\le \frac{p_n}{q}\le \beta _{1,n}^{-1}\}} f\left( \frac{p_n}{q}\right) \,q\,dx \le \sup _{\beta _{2,n}\le \beta \le \beta _{1,n}^{-1}}|f(\beta )|.\end{aligned}$$

Letting \(n\rightarrow \infty \), by the continuity of \(f\) at 1 we have \(\lim _{n\rightarrow \infty } D_f(p_n||q) \le f(1) = 0\). Since \(D_f(p_n||q) \ge 0\) for every \(n\), the limit is exactly 0. \(\square \)
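The nonnegativity used in the final step is the standard Jensen argument for \(f\)-divergences, valid for any convex \(f\) with \(f(1)=0\); for completeness,

$$\begin{aligned} D_f(p_n||q) = \int f\left( \frac{p_n}{q}\right) q\,dx \ \ge \ f\left( \int \frac{p_n}{q}\,q\,dx\right) = f(1) = 0. \end{aligned}$$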

Proof of Theorem 1

Proof

From Lemma 1, we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \beta _{1,n} =\lim _{n\rightarrow \infty } \beta _{2,n}= 1. \end{aligned}$$

By L’Hôpital’s rule, we have \(\lim _{t \rightarrow 1}\kappa (t) = 1\), where \(\kappa (t)\) is defined in (5). Therefore, taking limits on both sides of (6) and (7), we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{KL(p_n||q)}{KL(q||p_n)} = 1, \qquad \lim _{n\rightarrow \infty } \frac{KL(p_n||q)}{\chi ^2(p_n||q)} = \frac{1}{2}. \end{aligned}$$

\(\square \)
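As a heuristic check (not part of the proof above), the constant \(\frac{1}{2}\) matches the second-order behavior of the two divergences near \(p_n = q\). With the standard generators \(f_{KL}(t) = t\log t\) and \(f_{\chi ^2}(t) = (t-1)^2\), a Taylor expansion at \(t=1\) gives

$$\begin{aligned} t\log t = (t-1) + \tfrac{1}{2}(t-1)^2 + O\big ((t-1)^3\big ), \end{aligned}$$

and since \(\int (p_n/q-1)\,q\,dx = 0\), integrating the expansion at \(t = p_n/q\) against \(q\) yields \(KL(p_n||q) \approx \frac{1}{2}\int (p_n/q-1)^2\,q\,dx = \frac{1}{2}\,\chi ^2(p_n||q)\) whenever \(p_n/q\) is uniformly close to 1.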


About this article


Cite this article

Su, X., Chen, Y. Variational approximation for importance sampling. Comput Stat 36, 1901–1930 (2021). https://doi.org/10.1007/s00180-021-01063-w
