Abstract
In this paper we consider computations with the multivariate Student density, truncated to a set described by a linear system of inequalities. Our goal is both to simulate from this truncated density and to estimate its normalizing constant. To this end we consider an exponentially tilted sequential importance sampling (IS) density. We prove that the corresponding IS estimator of the normalizing constant, a rare-event probability, has bounded relative error under certain conditions. Along the way, we establish a multivariate extension of Mills' ratio for the Student distribution. We present applications of the proposed sampling and estimation algorithms in Bayesian inference. In particular, we construct efficient rejection samplers for the posterior densities of the Bayesian Constrained Linear Regression model, the Bayesian Tobit model, and the Bayesian smoothing spline for non-negative functions. Typically, sampling from such posterior densities is viable only via approximate Markov chain Monte Carlo (MCMC). Finally, we propose a novel Reject-Regenerate sampler, a hybrid between rejection sampling and MCMC. The Reject-Regenerate sampler creates a Markov chain whose states are, with a certain probability, flagged as commencing a new regenerative (renewal) cycle. Whenever a state initiates a new regenerative cycle, a further biased coin flip decides whether or not the state is an exact draw from the target. We show that the proposed MCMC algorithm is strongly efficient in a rare-event regime and provide a numerical example.
References
Botev, Z.I.: The normal law under linear restrictions: simulation and estimation via minimax tilting. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 79(1), 125–148 (2017)
Botev, Z.I., L’Ecuyer, P.: Efficient probability estimation and simulation of the truncated multivariate student-t distribution. In: 2015 Winter Simulation Conference (WSC), pp. 380–391. IEEE (2015)
Botev, Z., L’Ecuyer, P.: Simulation from the normal distribution truncated to an interval in the tail. In: Proceedings of the 10th EAI International Conference on Performance Evaluation Methodologies and Tools, pp. 23–29 (2017)
Botev, Z.I., Mackinlay, D., Chen, Y.L.: Logarithmically efficient estimation of the tail of the multivariate normal distribution. In: 2017 Winter Simulation Conference (WSC), pp. 1903–1913. IEEE (2017)
Botev, Z.I., Chen, Y.L., L’Ecuyer, P., MacNamara, S., Kroese, D.P.: Exact posterior simulation from the linear lasso regression. In: 2018 Winter Simulation Conference (WSC), pp. 1706–1717. IEEE (2018)
Chen, M.H., Deely, J.J.: Bayesian analysis for a constrained linear multiple regression problem for predicting the new crop of apples. J. Agric. Biol. Environ. Stat. 1(4), 467–489 (1996)
Chen, M.H., Ibrahim, J.G., Shao, Q.M.: Monte Carlo Methods in Bayesian Computation. Springer (2000)
Chib, S.: Bayes inference in the Tobit censored regression model. J. Econom. 51(1–2), 79–99 (1992)
Gelfand, A.E., Smith, A.F., Lee, T.M.: Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. J. Am. Stat. Assoc. 87(418), 523–532 (1992)
Genz, A., Bretz, F.: Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts. J. Stat. Comput. Simul. 63(4), 103–117 (1999)
Hashorva, E., Hüsler, J.: On multivariate Gaussian tails. Ann. Inst. Stat. Math. 55(3), 507–522 (2003)
Kroese, D.P., Botev, Z.I., Taimre, T., Vaisman, R.: Data Science and Machine Learning: Mathematical and Statistical Methods. Chapman and Hall/CRC (2019)
Kroese, D.P., Taimre, T., Botev, Z.I.: Handbook of Monte Carlo Methods. Wiley (2011)
L’Ecuyer, P., Blanchet, J.H., Tuffin, B., Glynn, P.W.: Asymptotic robustness of estimators in rare-event simulation. ACM Trans. Model. Comput. Simul. (TOMACS) 20(1), 1–41 (2010)
Mengersen, K.L., Tweedie, R.L.: Rates of convergence of the Hastings and Metropolis algorithms. Ann. Stat. 24(1), 101–121 (1996)
Mills, J.P.: Table of the ratio: area to bounding ordinate, for any portion of normal curve. Biometrika, pp. 395–400 (1926)
Mroz, T.A.: The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econom. J. Econom. Soc. 55(4), 765–799 (1987)
Nummelin, E.: A splitting technique for Harris recurrent Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 43(4), 309–318 (1978)
Pakman, A., Paninski, L.: Exact Hamiltonian Monte Carlo for truncated multivariate Gaussians. J. Comput. Graph. Stat. 23(2), 518–542 (2014)
Soms, A.P.: An asymptotic expansion for the tail area of the t-distribution. J. Am. Stat. Assoc. 71(355), 728–730 (1976)
Soms, A.P.: Rational bounds for the t-tail area. J. Am. Stat. Assoc. 75(370), 438–440 (1980)
Appendix
1.1 Proof of Theorem 2
Proof
First, we use the normal scale-mixture representation of \(\boldsymbol{Y}\sim \textsf{t}_\nu (\textbf{0},\Sigma )\) as \(\boldsymbol{Y}=\sqrt{\nu }\boldsymbol{Z}/R\), where \( \boldsymbol{Z}\sim \mathcal {N}\left( \textbf{0},\Sigma \right) \) is independent of \( R\sim c_\nu (r)=\frac{\exp \left( -\frac{r^2}{2}+(\nu -1)\ln r\right) }{2^{\nu /2-1}\Gamma (\nu /2)}, \quad r>0. \) We can thus write \(\ell \) as a conditional expectation: \( {\ell }(\gamma )={\mathbb {P}}\left[ \frac{\sqrt{\nu }\boldsymbol{Z}}{R}\ge \boldsymbol{l}(\gamma )\right] = \mathbb {E}\left[ \mathbb {P}\left[ \frac{\sqrt{\nu }\boldsymbol{Z}}{R}\ge \boldsymbol{l}(\gamma )\,\Big |\,R\right] \right] .\) Next, condition on \(R=r\), and let \(\boldsymbol{\mu }=r\boldsymbol{x}^*/\sqrt{\nu }\), where \(\boldsymbol{x}^*\) is the solution of the QPP. Denoting \(\boldsymbol{t}=[\boldsymbol{t}_1^\top ,\boldsymbol{t}_2^\top ]^\top =: r\boldsymbol{l}/\sqrt{\nu }\), and making a change of variable \(\boldsymbol{z}\leftarrow \boldsymbol{z}-\boldsymbol{\mu }\), we obtain \(\mathbb {P}\left[ \frac{\sqrt{\nu }\boldsymbol{Z}}{R}\ge \boldsymbol{l}(\gamma )\,\Big |\,R=r\right] =\mathbb {P}[\boldsymbol{Z}\ge \boldsymbol{t}]=\)
In other words, we have:
Let \(\mathfrak {D} \equiv \{\boldsymbol{z} :\boldsymbol{z}_1\ge {\textbf{0}},\boldsymbol{z}_2\ge \frac{r(\boldsymbol{l}_2-\Sigma _{21}\Sigma _{11}^{-1}\boldsymbol{l}_1)}{\sqrt{\nu }}\}\). We can now rewrite (10) as an integral and integrate over r. This gives \({\ell }(\gamma )=\):
where the third line follows from the change of variable \( u=r\sqrt{1+\frac{\boldsymbol{l}_1^\top \Sigma _{11}^{-1}\boldsymbol{l}_1}{\nu }}\;.\) Next, using formula (10) we rewrite the last expression as:
We now seek to apply the dominated convergence theorem to the expectation in the last displayed equation. For this we need the upper bound (recall that \(\Sigma _{11}^{-1}\boldsymbol{l}_1\ge {\textbf{0}}\))
The last expression is integrable in the sense that \(\int _0^\infty c_\nu (r)\exp (r^2/2)\overline{\Phi }\left( r\right) \textrm{d} r=\)
In addition, as \(\gamma \uparrow \infty \), by Lemma 1 we have the pointwise limits:
Therefore, by the dominated convergence theorem
This concludes the proof. \(\square \)
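As a numerical illustration (not part of the proof), the scale-mixture representation \(\boldsymbol{Y}=\sqrt{\nu }\boldsymbol{Z}/R\) used at the start of the proof is easy to verify by simulation. The following Python sketch uses the fact that \(R\sim c_\nu \) means \(R^2\sim \chi ^2_\nu \), with the arbitrary example choices \(\nu =5\) and \(\Sigma =1\) in one dimension:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, n = 5, 200_000  # example degrees of freedom and sample size

# R ~ c_nu is the chi distribution with nu degrees of freedom, i.e. R^2 ~ chi^2_nu
Z = rng.standard_normal(n)
R = np.sqrt(rng.chisquare(df=nu, size=n))
Y = np.sqrt(nu) * Z / R  # scale-mixture representation of the Student-t

# The empirical CDF of Y should agree with the t_nu CDF
x = 1.3
ecdf = (Y <= x).mean()
assert abs(ecdf - stats.t.cdf(x, df=nu)) < 5e-3
```

The same construction, with \(\boldsymbol{Z}\sim \mathcal {N}(\textbf{0},\Sigma )\) drawn from a multivariate normal, yields draws from \(\textsf{t}_\nu (\textbf{0},\Sigma )\).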
Lemma 1
(Continuity of Gaussian tail) Suppose that \(\boldsymbol{Z}\sim \mathcal {N}({\textbf{0}},\Sigma )\) for some positive definite matrix \(\Sigma \), and \(\boldsymbol{a}_n\rightarrow \boldsymbol{a}\) as \(n\uparrow \infty \). Then, the tail of the multivariate Gaussian is continuous: \( \lim _{n\uparrow \infty }\mathbb {P}[\boldsymbol{Z}\ge \boldsymbol{a}_n]=\mathbb {P}[\boldsymbol{Z}\ge \boldsymbol{a}]. \)
Proof
The proof is another application of the dominated convergence theorem, this time to show that \( \int _{[{\textbf{0}},\boldsymbol{\infty })}\phi _{\Sigma }(\boldsymbol{z}+\boldsymbol{a}_n)\textrm{d} \boldsymbol{z}\rightarrow \int _{[{\textbf{0}},\boldsymbol{\infty })}\phi _{\Sigma }(\boldsymbol{z}+\boldsymbol{a})\textrm{d} \boldsymbol{z}=\mathbb {P}[\boldsymbol{Z}\ge \boldsymbol{a}]. \) Since \(\Sigma \) is a positive definite matrix, \(\Vert \boldsymbol{x}\Vert ^2_{\Sigma }:=\boldsymbol{x}^\top \Sigma ^{-1}\boldsymbol{x}\) defines a norm satisfying \(\Vert \boldsymbol{z}\Vert _{\Sigma }^2\le 2(\Vert \boldsymbol{z}+\boldsymbol{a}_n\Vert _{\Sigma }^2+\Vert \boldsymbol{a}_n\Vert _{\Sigma }^2)\), so that \(\Vert \boldsymbol{z}+\boldsymbol{a}_n\Vert _{\Sigma }^2\ge \frac{1}{2}\Vert \boldsymbol{z}\Vert _{\Sigma }^2-\Vert \boldsymbol{a}_n\Vert _{\Sigma }^2\). Therefore, \( \phi _{\Sigma }(\boldsymbol{z}+\boldsymbol{a}_n)\le \exp \left( \tfrac{1}{2}\sup _n\Vert \boldsymbol{a}_n\Vert ^2_{\Sigma }\right) \exp \left( -\tfrac{1}{4}\Vert \boldsymbol{z}\Vert ^2_{\Sigma }\right) \big /\sqrt{|2\pi \Sigma |}, \) where the supremum is finite because \(\boldsymbol{a}_n\rightarrow \boldsymbol{a}\). The right-hand side is proportional to \(\phi _{2\Sigma }(\boldsymbol{z})\) and hence integrable over \([{\textbf{0}},\boldsymbol{\infty })\), so the conditions for the dominated convergence theorem are met. \(\square \)
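A quick numerical illustration of the lemma (not part of the proof), with an arbitrarily chosen covariance \(\Sigma \) and the example sequence \(\boldsymbol{a}_n=\boldsymbol{a}+\textbf{1}/n\); the upper tail is computed via the symmetry \(\mathbb {P}[\boldsymbol{Z}\ge \boldsymbol{b}]=\mathbb {P}[\boldsymbol{Z}\le -\boldsymbol{b}]\) of the centered Gaussian:

```python
import numpy as np
from scipy.stats import multivariate_normal

Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])  # example covariance matrix
a = np.array([0.3, -0.7])                   # example limit point
mvn = multivariate_normal(mean=np.zeros(2), cov=Sigma)

def upper_tail(b):
    # P[Z >= b] = P[Z <= -b] for a centered Gaussian
    return mvn.cdf(-b)

# a_n = a + 1/n converges to a, and the tail probabilities converge as well
p_lim = upper_tail(a)
p_n = [upper_tail(a + 1.0 / n) for n in (10, 100, 1000)]
assert abs(p_n[-1] - p_lim) < 1e-3
```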
1.2 Proof of Theorem 3
Proof
First, note that the second moment is \(\int g(\boldsymbol{z},r;\boldsymbol{\mu }^*,\eta ^*) \exp (2\psi (\boldsymbol{z},r;\boldsymbol{\mu }^*,\eta ^*))\) \(\textrm{d}\boldsymbol{z}\textrm{d} r =\)
Since the properties of \(\psi \) imply that
bounded relative error will follow if we can show that \( \frac{(r^*)^{\nu -1}\Phi (\eta ^*)\exp (\frac{(\eta ^*)^2}{2}-r^*\eta ^*)}{\ell (\gamma )} \) remains bounded in \(\gamma \). The pair \((r^*,\eta ^*)\) is determined by solving (3), namely by finding the saddle point of \(\max _{r,\boldsymbol{z}}\min _{\eta ,\boldsymbol{\mu }}\psi (\boldsymbol{z},r;\boldsymbol{\mu },\eta )\). This can be obtained by setting the gradient of \(\psi \) with respect to the vector \((\boldsymbol{z},r,\boldsymbol{\mu },\eta )\) to zero: \(\nabla \psi ={\textbf{0}}\). We now introduce notation that allows us to express \(\nabla \psi ={\textbf{0}}\) explicitly. Let \( L\) be the lower triangular Cholesky factor of \(\Sigma = L L^\top \). Define \(D=\textrm{diag}( L)\), \(\breve{ L}= D^{-1} L\), \( \tilde{\boldsymbol{l}}=\frac{r}{\sqrt{\nu }} D^{-1} \boldsymbol{l}(\gamma )-(\breve{ L}- I)\boldsymbol{z}, \) and the vector \(\boldsymbol{\Psi }\) with elements \( \Psi _k=\phi (\tilde{l}_k-\mu _k)/\overline{\Phi }(\tilde{l}_k-\mu _k)\). Then, \(\nabla \psi ={\textbf{0}}\) can be written as
Next, we verify via substitution that the solution of (11) as \(\gamma \uparrow \infty \) satisfies \( r^*=\mathcal {O}(\gamma ^{-1})\), \(\boldsymbol{z}^*=\mathcal {O}(\textbf{1})\), \(\eta ^*=\mathcal {O}(-\gamma )\), \(\boldsymbol{\mu }^*=\mathcal {O}(\textbf{1}).\) First, equations one and three in (11) are trivially satisfied and we can deduce that \(\boldsymbol{\Psi }=\mathcal {O}(\textbf{1})\). Second, since \(\tilde{\boldsymbol{l}}=\mathcal {O}(r\boldsymbol{l}(\gamma ))=\mathcal {O}(\textbf{1})\), it follows that equation two in (11) is equivalent to
Finally, note that the Mills' ratio expansion \( \frac{\Phi (\eta )}{\phi (\eta )}\simeq -\frac{1}{\eta }+\frac{1}{\eta ^3}\) as \(\eta \downarrow -\infty \) implies that equation four is asymptotically equivalent to \(r\eta ^2+\eta -r\simeq 0 \). The solution of this quadratic equation in turn implies that \( \eta \simeq (-1-\sqrt{1+4r^2})/(2r) \simeq -1/r\). In other words, \(\eta ^* r^*=\mathcal {O}(1)\), as desired. Therefore, if \(\tilde{\psi }\) denotes the value of \(\psi \) at the solution of (11), we have
By the Mills' ratio inequality \( \ln \overline{\Phi }(-\eta )\le -\eta ^2/2-\frac{1}{2}\ln (2\pi )-\ln (-\eta ), \) we obtain: \( \tilde{\psi }\lesssim \mathcal {O}(1)-\ln (-\eta ^*)-\frac{1}{2}\ln (2\pi )+(\nu -1)\ln r^*=-\nu \log (\gamma )+\mathcal {O}(1). \) In other words, there exist constants \(c_1,c_2>0\) such that \( \exp (\tilde{\psi })\le c_1\gamma ^{-\nu } \) for every \(\gamma >c_2\). Therefore,
and since by Theorem 2
we have \(\limsup _{\gamma \uparrow \infty }\textrm{Var}(\hat{\ell })/\ell ^{2}<\infty .\) \(\square \)
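Two of the asymptotic ingredients above lend themselves to a quick numerical check (an illustration only): the Mills'-ratio expansion of \(\Phi (\eta )/\phi (\eta )\) as \(\eta \downarrow -\infty \), and the root \(\eta \simeq (-1-\sqrt{1+4r^2})/(2r)\simeq -1/r\) of the quadratic \(r\eta ^2+\eta -r=0\):

```python
import numpy as np
from scipy.stats import norm

# Mills'-ratio expansion: Phi(eta)/phi(eta) ~ -1/eta + 1/eta^3 as eta -> -inf
eta = -10.0
ratio = norm.cdf(eta) / norm.pdf(eta)
assert abs(ratio - (-1 / eta + 1 / eta**3)) < 1e-4

# The root eta = (-1 - sqrt(1 + 4 r^2)) / (2 r) of r*eta^2 + eta - r = 0
# behaves like -1/r for small r, so that eta* r* stays bounded
r = 1e-3
eta_root = (-1 - np.sqrt(1 + 4 * r**2)) / (2 * r)
assert abs(r * eta_root**2 + eta_root - r) < 1e-6  # it solves the quadratic
assert abs(eta_root * r + 1) < 1e-2                # eta * r is close to -1
```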
1.3 Proof of Theorem 4
Proof
Ignoring the \(B_i\) variables in Algorithm 2, the state \(\boldsymbol{X}_n\) has the marginal distribution of an independence Metropolis-Hastings sampler. From [15, Theorem 2.1] we know that, for an independence sampler with proposal \(g(\boldsymbol{x})\) and target \(f(\boldsymbol{x})\) such that \(\sup _{\boldsymbol{x}}f(\boldsymbol{x})/g(\boldsymbol{x})<c\) for some constant \(c>0\), the Markov chain is uniformly ergodic with convergence rate
Thus, to ensure the total variation bound remains below \(\epsilon \), we need to run the independence sampler for \(t^*\) steps such that
In other words, we have \( t^*\ge \left\lceil -c\ln (\epsilon )\right\rceil \) and the length of the chain will remain bounded in the rarity parameter \(\gamma \) provided that \(c(\gamma )\) remains bounded in \(\gamma \). In Algorithm 2 we have
where \(\psi ^*=\psi (\boldsymbol{z}^*,r^*;\boldsymbol{\mu }^*,\eta ^*)\). However, from the proof of Theorem 3 we know that \(\frac{\exp (\psi ^*)}{\ell (\gamma )}\) remains bounded as \(\gamma \uparrow \infty \). Hence, the Markov chain in Algorithm 2 is strongly efficient. \(\square \)
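As a concrete illustration of the step-count bound in the proof (with hypothetical example values of \(c\) and \(\epsilon \)), note that \((1-1/c)^t\le \exp (-t/c)\), so \(t^*=\lceil -c\ln \epsilon \rceil \) steps suffice:

```python
import math

c, eps = 5.0, 1e-6  # hypothetical uniform-ergodicity constant and TV target

# t* = ceil(-c * ln(eps)) steps drive the TV bound (1 - 1/c)^t below eps
t_star = math.ceil(-c * math.log(eps))
assert t_star == 70
assert (1 - 1 / c) ** t_star <= eps
```

When \(c(\gamma )\) is bounded in \(\gamma \), as established here, this step count does not grow with the rarity parameter, which is the sense in which the chain is strongly efficient.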
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Botev, Z.I., Chen, Y.L. (2022). Truncated Multivariate Student Computations via Exponential Tilting. In: Botev, Z., Keller, A., Lemieux, C., Tuffin, B. (eds) Advances in Modeling and Simulation. Springer, Cham. https://doi.org/10.1007/978-3-031-10193-9_4
DOI: https://doi.org/10.1007/978-3-031-10193-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10192-2
Online ISBN: 978-3-031-10193-9
eBook Packages: Mathematics and Statistics (R0)