Abstract
The need to simulate from a positive multivariate normal distribution arises in several settings, particularly in Bayesian analysis. A variety of algorithms can be used to sample from this distribution, but most of them rely on Gibbs sampling. Because such a sample is generated from a Markov chain, the user must account for the fact that sequential draws depend on one another and that the sample follows a positive multivariate normal distribution only asymptotically. Neither issue arises if the sample is i.i.d. In this paper, an accept-reject algorithm is introduced in which variates from a positive multivariate normal distribution are proposed from a multivariate skew-normal distribution. This new algorithm generates an i.i.d. sample and is shown, under certain conditions, to be very efficient.
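To fix ideas, here is a minimal R sketch of the generic accept-reject scheme; the target density f, proposal density g, proposal sampler rg, and bound M satisfying f(x) ≤ M g(x) are user-supplied placeholders, and this sketch is not the tuned skew-normal sampler developed in the paper.

```r
## Generic accept-reject: a minimal sketch. f and g are the target and
## proposal densities, rg draws one variate from the proposal, and M is a
## constant with f(x) <= M * g(x) for all x. Each accepted draw is an
## independent variate from the target.
accept_reject <- function(n, f, g, rg, M) {
  out <- vector("list", n)
  for (i in seq_len(n)) {
    repeat {
      x <- rg()                                 # propose a candidate
      if (runif(1) <= f(x) / (M * g(x))) break  # accept w.p. f/(M g)
    }
    out[[i]] <- x
  }
  do.call(rbind, out)   # one i.i.d. draw per row
}
```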
References
Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88:669–679
Chen M, Deely J (1996) Bayesian analysis for a constrained linear multiple regression problem for predicting the new crop of apples. J Agric Biol Environ Stat 1:467–489
Chib S (1992) Bayes inference in the Tobit censored regression model. J Econom 51:79–99
Chib S, Greenberg E (1998) Analysis of multivariate probit models. Biometrika 85:347–361
Damien P, Walker SG (2001) Sampling truncated normal, beta, and gamma densities. J Comput Graph Stat 10:206–215
Gelfand AE, Smith AFM, Lee TM (1992) Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. J Am Stat Assoc 87:523–532
Genz A (1992) Numerical computation of multivariate normal probabilities. J Comput Graph Stat 1:141–150
Genz A (1993) Comparison of methods for the computation of multivariate normal probabilities. Comput Sci Stat 25:400–405
Geweke J (1991) Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints. In: Computing science and statistics: proceedings of the 23rd symposium on the interface, pp 571–577
Gupta AK, González-Farías G, Domínguez-Molina JA (2004) A multivariate skew normal distribution. J Multivar Anal 89:181–190
Hajivassiliou VA, McFadden D, Ruud PA (1996) Simulation of multivariate normal rectangle probabilities and their derivatives: theoretical and computational results. J Econom 72:85–134
Linardakis M, Dellaportas P (2003) Assessment of Athens’ metro passenger behavior via a multiranked probit model. J R Stat Soc Ser C 52:185–200
Liu X, Daniels MJ, Marcus B (2009) Joint models for the association of longitudinal binary and continuous processes with application to a smoking cessation trial. J Am Stat Assoc 104:429–438
Mira A, Møller J, Roberts GO (2001) Perfect slice samplers. J R Stat Soc Ser B 63:593–606
Philippe A, Robert CP (2003) Perfect simulation of positive Gaussian distributions. Stat Comput 13:179–186
Pitt M, Chan D, Kohn R (2006) Efficient Bayesian inference for Gaussian copula regression models. Biometrika 93:537–554
Robert CP (1995) Simulation of truncated normal variables. Stat Comput 5:121–125
Appendix
1.1 The perfect sampling algorithm of Philippe and Robert
Perfect sampling involves coupling two Markov chains: one begins at \(\mathbf{x}^\mathrm{min}\), the value of \(\mathbf{x}\) at which the density of \(\mathbf{X}\) attains its lowest value, and the other begins at \(\mathbf{x}^\mathrm{max}\), the value at which the density attains its highest value. Once these two chains coalesce, an exact (or perfect) sample is generated.
While PR’s perfect sampling algorithm for the positive multivariate normal distribution is theoretically appealing, it has some computational issues of which any practitioner should be aware. The first involves the location of \(\mathbf{x}^\mathrm{min}\), the value of \(\mathbf{x}\) at which the density of \(\mathbf{X}\) attains its lowest value. Mira et al. (2001) make it clear that for a perfect sampling algorithm such as this one to truly generate an exact sample, one of the chains must begin at \(\mathbf{x}^\mathrm{min}\). For a positive multivariate normal distribution, \(\mathbf{x}^\mathrm{min} = \left( + \infty , +\infty , \ldots , +\infty \right)^T\). Starting a Markov chain at this point, however, is computationally infeasible. Philippe and Robert get around this problem by setting \(\mathbf{x}^\mathrm{min}\) to an extreme positive value, but it is never made clear how this extreme value should vary with the parameters of the positive multivariate normal distribution, and setting \(\mathbf{x}^\mathrm{min}\) to such a value also means the resulting sample is not truly exact. Their algorithm also involves an accept-reject step, and it is not clear how the acceptance probability of this step varies with the parameters of the positive multivariate normal distribution.
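For contrast with PR’s algorithm, the following is a minimal R sketch (ours, not theirs) of the standard one-variable-at-a-time Gibbs sampler for the positive multivariate normal distribution; it illustrates the Markov-chain approach discussed in the abstract, with the function name and interface being our own.

```r
## A minimal Gibbs sampler for MVN(mu, Sigma) restricted to the positive
## orthant. Draws are dependent and follow the target only asymptotically,
## which is the drawback the accept-reject sampler of this paper avoids.
gibbs_pos_mvn <- function(n, mu, Sigma, x0 = pmax(mu, 1)) {
  d   <- length(mu)
  x   <- x0
  out <- matrix(NA_real_, n, d)
  for (t in seq_len(n)) {
    for (i in seq_len(d)) {
      ## Full conditional of x[i] given x[-i]: a univariate normal
      ## truncated to (0, Inf).
      s12 <- Sigma[i, -i, drop = FALSE]
      inv <- solve(Sigma[-i, -i, drop = FALSE])
      m   <- drop(mu[i] + s12 %*% inv %*% (x[-i] - mu[-i]))
      s   <- sqrt(drop(Sigma[i, i] - s12 %*% inv %*% t(s12)))
      ## Inverse-CDF draw from N(m, s^2) truncated to (0, Inf).
      u    <- runif(1, pnorm(0, m, s), 1)
      x[i] <- qnorm(u, m, s)
    }
    out[t, ] <- x
  }
  out
}
```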
1.2 Correction to Gupta et al. (2004)
Gupta et al. (2004) state the following theorem: if
\[ \mathbf{Z} \sim MVN_d \left( \beta , \Omega \right) \quad \text{and} \quad \mathbf{W} | \left( \mathbf{Z} = \mathbf{z} \right) \sim MVN_d \left( \xi + \mathbf{R} \left( \mathbf{z} - \beta \right), \mathbf{I} \right), \]
then \(\mathbf{Y} = \mathbf{Z}|(\mathbf{W} > \mathbf{0}) \sim f_\mathbf{Y} \left(\mathbf{y}; \beta , \xi , \mathbf{R}, \Omega \right)\). The density \(f_\mathbf{Y}(\mathbf{y}; \beta , \xi , \mathbf{R}, \Omega )\) they give is incorrect. The correct density is
\[ f_\mathbf{Y} \left( \mathbf{y}; \beta , \xi , \mathbf{R}, \Omega \right) = \frac{ \phi _d \left( \mathbf{y}; \beta , \Omega \right) \, \Phi _d \left( \xi + \mathbf{R} \left( \mathbf{y} - \beta \right); \mathbf{0}, \mathbf{I} \right) }{ \Phi _d \left( \xi ; \mathbf{0}, \mathbf{I} + \mathbf{R} \Omega \mathbf{R}^T \right) }. \]
Proof
First observe that
\[ f_\mathbf{Y} \left( \mathbf{y}; \beta , \xi , \mathbf{R}, \Omega \right) = \frac{ \phi _d \left( \mathbf{y}; \beta , \Omega \right) \, P \left( \mathbf{W} > \mathbf{0} \mid \mathbf{Z} = \mathbf{y} \right) }{ P \left( \mathbf{W} > \mathbf{0} \right) }. \]
Since \(\mathbf{W}| \left( \mathbf{Z} = \mathbf{z} \right) \sim MVN_d \left( \xi + \mathbf{R} \left( \mathbf{z} - \beta \right), \mathbf{I} \right)\), we get \(P \left( \mathbf{W} > \mathbf{0} \mid \mathbf{Z} = \mathbf{y} \right) = \Phi _d \left( \xi + \mathbf{R} \left( \mathbf{y} - \beta \right); \mathbf{0}, \mathbf{I} \right)\) and, marginally, \(\mathbf{W} \sim MVN_d \left( \xi , \mathbf{I} + \mathbf{R} \Omega \mathbf{R}^T \right)\), so \(P \left( \mathbf{W} > \mathbf{0} \right) = \Phi _d \left( \xi ; \mathbf{0}, \mathbf{I} + \mathbf{R} \Omega \mathbf{R}^T \right)\). Substituting these into the display above gives the stated density.
\(\square \)
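As a concrete (if inefficient) check of this construction, the following R sketch draws from Y = Z | (W > 0) by brute-force rejection, directly from the two distributional statements above; it assumes the mvtnorm package and is only an illustration, not the sampler proposed in the paper.

```r
## Brute-force draw from Y = Z | (W > 0): Z ~ MVN_d(beta, Omega) and
## W | (Z = z) ~ MVN_d(xi + R (z - beta), I). Illustrative only.
library(mvtnorm)
r_skew_normal <- function(n, beta, Omega, xi, R) {
  out <- matrix(NA_real_, n, length(beta))
  for (i in seq_len(n)) {
    repeat {
      z <- drop(rmvnorm(1, beta, Omega))
      w <- drop(rmvnorm(1, drop(xi + R %*% (z - beta)), diag(length(xi))))
      if (all(w > 0)) break   # keep z only on the event W > 0
    }
    out[i, ] <- z
  }
  out
}
```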
1.3 Proof of Lemma 1
By setting \(\beta = \mu + \alpha \) for some \(\alpha \in \mathbb{R}^d\), \(m \left( \mathbf{y}; \beta , \Omega \right)\) becomes
\[ m \left( \mathbf{y}; \beta , \Omega \right) = \left( \mathbf{y} - \mu \right)^T \Sigma ^{-1} \left( \mathbf{y} - \mu \right) - \left( \mathbf{y} - \mu - \alpha \right)^T \Omega ^{-1} \left( \mathbf{y} - \mu - \alpha \right). \]
Observe that at \(\mathbf{y} = \mu \), \(m \left( \mathbf{y}; \beta , \Omega \right) = -\alpha ^T \Omega ^{-1} \alpha \), which is negative for all \(\alpha \ne \mathbf{0}\) since \(\Omega ^{-1}\) is positive definite. Since \(m \left( \mathbf{y}; \beta , \Omega \right) \ge 0~\forall ~\mathbf{y} \in \mathbb{R}^d\), \(\alpha \) must equal \(\mathbf{0}\), which implies \(\beta = \mu \). \(\square \)
1.4 Proof of Lemma 2
The proof is by contradiction. If \(m \left( \mathbf{y}; \beta , \Omega \right) \ge 0~\forall ~\mathbf{y} \in \mathbb{R}^d\), then with some algebra it follows that
\[ \phi _d \left( \mathbf{y}; \mu , \Sigma \right) \le \sqrt{ \frac{|\Omega |}{|\Sigma |} } \, \phi _d \left( \mathbf{y}; \beta , \Omega \right) \quad \forall ~\mathbf{y} \in \mathbb{R}^d. \]
Now assume that \(|\Omega | < |\Sigma |\). If this is true, it follows that \(\phi _d \left( \mathbf{y}; \mu , \Sigma \right) < \phi _d \left( \mathbf{y}; \beta , \Omega \right)~\forall ~\mathbf{y} \in \mathbb{R}^d\), which is impossible since both functions integrate to 1 over \(\mathbb{R}^d\). \(\square \)
1.5 Proof of Theorem 1
By restricting \(m \left( \mathbf{y}; \beta , \Omega \right) \ge 0~\forall ~\mathbf{y} \in \mathbb{R}^d\), it follows from Lemma 1 that \(\beta = \mu \), and from Lemma 2 that \(\Omega \) must be a matrix with \(|\Omega | \ge |\Sigma |\). Since \(\sup _{\mathbf{y} \in \mathbb{R}^d} \left\{ h \left( \mathbf{y}; \beta , \Omega \right) \right\} \) increases with \(|\Omega |\), \(\Omega \) should have a determinant as close to \(|\Sigma |\) as possible while still satisfying \(m \left( \mathbf{y}; \beta , \Omega \right) \ge 0~\forall ~\mathbf{y} \in \mathbb{R}^d\). A value of \(\Omega \) with determinant greater than or equal to \(|\Sigma |\) that satisfies this restriction is \(\Omega = \Sigma \). With \(\Omega = \Sigma \), \(\inf _{\beta , \Omega : m \left( \mathbf{y}; \beta , \Omega \right) > 0 } \left[ \sup _{ \mathbf{y} \in \mathbb{R}^d} \left\{ h \left( \mathbf{y}; \beta , \Omega \right) \right\} \right] = 1\). \(\square \)
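Theorem 1 thus identifies MVN(\(\mu , \Sigma \)) itself as the best untruncated normal proposal, which corresponds to the simple accept-reject sampler sketched below in R (the function name is ours): propose from MVN(\(\mu , \Sigma \)) and keep only draws that land in the positive orthant, so the acceptance rate equals the positive-orthant probability of MVN(\(\mu , \Sigma \)).

```r
## Simple accept-reject for the positive MVN implied by Theorem 1: propose
## from the untruncated MVN(mu, Sigma) and keep draws in the positive
## orthant. Assumes the mvtnorm package.
library(mvtnorm)
sar_pos_mvn <- function(n, mu, Sigma) {
  out <- matrix(NA_real_, n, length(mu))
  i <- 0
  while (i < n) {
    y <- drop(rmvnorm(1, mu, Sigma))
    if (all(y > 0)) {          # accept iff y is in the positive orthant
      i <- i + 1
      out[i, ] <- y
    }
  }
  out
}
```

When the orthant probability is small this sampler is slow, which is what motivates the skew-normal proposal analyzed in Theorems 2 and 3.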
1.6 Proof of Theorem 2
First observe that
The second to last equality in (15) follows since \(\mathbf{W} \sim MVN_{l^*} \left( \xi , \mathbf{I} + r^2 \mathrm{Cov} \left( \mathbf{Z}^* \right) \right)\), and
Since \(r > 0\), \(\min ( \mathbf{e} ) < 0\), and every element of \(\mathbf{H}\) equals 0 or 1, the vector \(-r \mathbf{e}^T \mathbf{H}\) has at least one positive element. Consequently, \(-r \mathbf{e}^T \mathbf{H} \mathbf{y}\) ranges over all of \(\mathbb{R}\) as \(\mathbf{y}\) ranges over \(\mathbb{R}_+^d\). It follows that \(\inf _{\mathbf{y} \in \mathbb{R}_+^d} \left\{ \Phi _1 \left(0; -\mathbf{e}^T \xi - r \mathbf{e}^T \mathbf{H} \mathbf{y} + r \mathbf{e}^T \mathbf{H} \mu , 1 \right) \right\} = 0,\) making
\(\square \)
1.7 Proof of Theorem 3
First observe that
We have to select values of \(r\) and \(\xi \) that make \(q\) as small as possible and \(s\) as large as possible, where
After a considerable amount of algebra, we find that \(r\) satisfies
If \(q > \left( \mathbf{h}^{\mathbf{Z}^*} \right)^T \mathbb{E} \left( \mathbf{Z}^* \right) \sqrt{ \omega ^{\mathbf{Z}^*}}\) and \(s > |q|\), then \(r\) is real and positive. The value of \(s\) can thus be made arbitrarily larger than \(|q|\), making the denominator of (16) arbitrarily close to \(p_\mathrm{SAR} (\mu , \Sigma )\), and \(q\) can be made arbitrarily close to (while remaining larger than) \(\left( \mathbf{h}^{\mathbf{Z}^*} \right)^T \mathbb{E} \left( \mathbf{Z}^* \right) \sqrt{ \omega ^{ \mathbf{Z}^*}}\), making the numerator of (16) arbitrarily close to \({ \Phi _1 \left( \left( \mathbf{h}^{\mathbf{Z}^*} \right)^T \mathbb{E} \left( \mathbf{Z}^* \right) \sqrt{ \omega ^{ \mathbf{Z}^*}}; 0, 1 \right) = \Phi _1 \left( g(\mu , \Sigma ); 0,1 \right)}. \) \(\square \)
1.8 Proof of Corollary 1
To prove the corollary, the following three results are necessary.
Result 1
Let \(\alpha \in (0,1)\). For a negative value of \(\kappa _1\), \(\Phi _1 \left( \kappa _1 + \delta _1 ; 0, 1 \right) - \Phi _1 \left( \kappa _1; 0, 1 \right) = u_1 \left( \delta _1 \right)\), where \(u_1 \left( \delta _1 \right) = o \left( \delta _1^{1 - \alpha } \right)\) as \(\delta _1 \searrow 0\).
Proof
For a negative value of \(\kappa _1\),
With the Taylor expansion of the erf function, we get
Now observe that, for \(0 < \delta _1 < |\kappa _1|\), \( |\kappa _1|^j - |\kappa _1 + \delta _1|^j = |\kappa _1|^j - \left( | \kappa _1| - \delta _1 \right)^j = - \sum _{k=1}^j \binom{j}{k} | \kappa _1|^{j-k} (-\delta _1)^k\), in which case
and the right-hand side goes to 0 as \(\delta _1 \searrow 0\). Thus, \( \Phi _1 \left( \kappa _1 + \delta _1; 0, 1 \right) - \Phi _1 \left( \kappa _1; 0, 1 \right) = u_1 \left( \delta _1 \right)\) where \( u_1 \left( \delta _1 \right) = o \left( \delta _1^{1-\alpha } \right)\) as \(\delta _1 \searrow 0\). \(\square \)
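As a quick numeric illustration of Result 1 (ours, not part of the original proof), the following R snippet shows the CDF increment divided by \(\delta _1^{1-\alpha }\) shrinking to 0 as \(\delta _1 \searrow 0\):

```r
## Numeric check of Result 1: for negative kappa1 and alpha in (0,1),
## [pnorm(kappa1 + delta1) - pnorm(kappa1)] / delta1^(1 - alpha) -> 0.
kappa1 <- -1.5
alpha  <- 0.5
for (delta1 in 10^-(1:6)) {
  ratio <- (pnorm(kappa1 + delta1) - pnorm(kappa1)) / delta1^(1 - alpha)
  cat(sprintf("delta1 = %.0e   ratio = %.3e\n", delta1, ratio))
}
```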
Result 2
Let \(\gamma \in (0,1)\). \(\Phi _1 \left( \left( \kappa _2 + \delta _2 \right); 0, 1 \right) - 1 = u_2 \left( \delta _2 \right)\) where \(u_2 \left( \delta _2 \right) = o \left( e^{-\delta _2 \gamma } \right)\) as \(\delta _2 \longrightarrow + \infty \).
Proof
First observe that
As \(\delta _2\) grows, \(\kappa _2 + \delta _2\) eventually exceeds 2, in which case
and, when \(\gamma \in (0,1)\), the right-hand side of the last equation goes to 0 as \(\delta _2 \longrightarrow + \infty \). \(\square \)
Result 3
Let \(\alpha \in (0,1)\) and \(\gamma \in (0,1)\). Then \({\frac{ \Phi _1 \left( \kappa _1 + \delta _1 ; 0, 1 \right)}{ \Phi _1 \left( \kappa _2 + \delta _2 ; 0, 1 \right)}} - \Phi _1 \left( \kappa _1; 0, 1 \right) = v_1 \left( \delta _1 \right) + v_2 \left( \delta _2 \right)\), where \(v_1 \left( \delta _1 \right) = o \left( \delta _1^{1 - \alpha } \right)\) as \(\delta _1 \searrow 0\), and \(v_2 \left( \delta _2 \right) = o \left( e^{-{\delta _2 \gamma }} \right)\) as \(\delta _2 \longrightarrow \infty \).
Proof
From Results 1 and 2, we can say that
where \(u_1 \left( \delta _1 \right) = o \left( \delta _1^{1 - \alpha } \right)\) as \(\delta _1 \searrow 0\), and \(u_2 \left( \delta _2 \right) = o \left( e^{-{\delta _2 \gamma }} \right)\) as \(\delta _2 \longrightarrow \infty \). Note that
The last inequality holds because \(1 + u_2 \left( \delta _2 \right) = \Phi _1 \left( \kappa _2 + \delta _2; 0, 1 \right) \ge 0.5\) when \(\delta _2\) is sufficiently large. This implies that
where \(v_1(\delta _1) = 2 u_1 \left( \delta _1 \right)\) and \(v_2 \left( \delta _2 \right) = -2 \Phi _1 \left( \kappa _1; 0, 1 \right) u_2 \left( \delta _2 \right)\), and since \(v_1 \left( \delta _1 \right)\) and \(v_2 \left( \delta _2 \right)\) are multiples of \(u_1 \left( \delta _1 \right)\) and \(u_2 \left( \delta _2 \right)\), equivalent asymptotic results hold. \(\square \)
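A similar numeric illustration of Result 3 (again ours, not part of the proof): with \(\kappa _1\) and \(\kappa _2\) fixed, the ratio \(\Phi _1 \left( \kappa _1 + \delta _1; 0, 1 \right) / \Phi _1 \left( \kappa _2 + \delta _2; 0, 1 \right)\) approaches \(\Phi _1 \left( \kappa _1; 0, 1 \right)\) as \(\delta _1 \searrow 0\) and \(\delta _2 \longrightarrow \infty \).

```r
## Numeric check of Result 3: the ratio converges to pnorm(kappa1).
kappa1 <- -1.0
kappa2 <-  0.5
for (k in 1:5) {
  delta1 <- 10^-k      # delta1 -> 0
  delta2 <- 2 * k      # delta2 -> infinity
  ratio  <- pnorm(kappa1 + delta1) / pnorm(kappa2 + delta2)
  cat(sprintf("delta1 = %.0e  delta2 = %4.1f  ratio = %.6f  (limit %.6f)\n",
              delta1, delta2, ratio, pnorm(kappa1)))
}
```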
Proof of Corollary 1
From the steps given in the proof of Theorem 3, it is clear that, by setting \(\mathbf{R} = r \mathbf{H}\) with \(r = b \left( \delta _1, \delta _2 \right)\) and \(\xi = \chi \left( \delta _1, \delta _2 \right)\),
and from Result 3 above, it follows that
where \(f_1 \left( \delta _1 \right) = o \left( \delta _1^{1 - \alpha } \right)\) with \(\alpha \in (0,1)\) as \(\delta _1 \searrow 0\), and \(f_2 \left( \delta _2 \right) = o \left( e^{-\delta _2 \gamma } \right)\) with \(\gamma \in (0,1)\) as \(\delta _2 \longrightarrow \infty \). \(\square \)
1.9 Website with R Code
The website http://carstenbotts.com/wp-content/uploads/2011/10/TruncNormalSampler21.txt contains the R function trunc_normal_smplr.
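A hypothetical usage sketch follows; the argument names (n, mu, Sigma) are our assumption and are not taken from the posted file, so consult the source at the URL above for the actual interface of trunc_normal_smplr.

```r
## Hypothetical usage -- argument names are assumed, not confirmed; see the
## linked file for trunc_normal_smplr's actual signature.
source("TruncNormalSampler21.txt")   # after downloading the file locally
mu    <- c(0.5, -0.2)
Sigma <- matrix(c(1.0, 0.3,
                  0.3, 1.0), 2, 2)
draws <- trunc_normal_smplr(n = 1000, mu = mu, Sigma = Sigma)
colMeans(draws)                      # sample means of the i.i.d. draws
```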