Abstract
We propose a Bayesian hypothesis testing procedure for comparing the multivariate distributions of several treatment groups against a control group. This test is derived from a flexible model for the group distributions based on a random binary vector such that, if its jth element equals one, then the jth treatment group is merged with the control group. The group distributions’ flexibility comes from a dependent Dirichlet process, while the latent vector prior distribution ensures a multiplicity correction to the testing procedure. We explore the posterior consistency of the Bayes factor and provide a Monte Carlo simulation study comparing the performance of our procedure with state-of-the-art alternatives. Our results show that the presented method performs better than competing approaches. Finally, we apply our proposal to two classical experiments. The first one studies the effects of tuberculosis vaccines on multiple health outcomes for rabbits, and the second one analyzes the effects of two drugs on weight gain for rats. In both applications, we find relevant differences between the control group and at least one treatment group.
References
Allison, M.J., Zappasodi, P., Lurie, M.B.: The correlation of a biphasic metabolic response with a biphasic response in resistance to tuberculosis in rabbits. J. Exp. Med. 115(5), 881–890 (1962). https://doi.org/10.1084/jem.115.5.881
Andrieu, C., Doucet, A., Holenstein, R.: Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B 72(3), 269–342 (2010). https://doi.org/10.1111/j.1467-9868.2009.00736.x
Barrientos, A.F., Jara, A., Quintana, F.A.: On the support of MacEachern’s dependent Dirichlet processes and extensions. Bayesian Anal. 7(2), 277–310 (2012). https://doi.org/10.1214/12-BA709
Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, New York (1994)
Biswas, M., Mukhopadhyay, M., Ghosh, A.K.: A distribution-free two-sample run test applicable to high-dimensional data. Biometrika 101(4), 913–926 (2014). https://doi.org/10.1093/biomet/asu045
Blackwell, D., MacQueen, J.B.: Ferguson distributions via Polya urn schemes. Ann. Stat. 1(2), 353–355 (1973). https://doi.org/10.1214/aos/1176342372
Bouchard-Côté, A., Doucet, A., Roth, A.: Particle Gibbs split-merge sampling for Bayesian inference in mixture models. J. Mach. Learn. Res. 18(28), 1–39 (2017). https://arxiv.org/abs/1508.02663
Box, G.E.P.: Problems in the analysis of growth and wear curves. Biometrics 6(4), 362–389 (1950). https://doi.org/10.2307/3001781
Castillo, I., Schmidt-Hieber, J., van der Vaart, A.: Bayesian linear regression with sparse priors. Ann. Stat. 43(5), 1986–2018 (2015). https://doi.org/10.1214/15-AOS1334
Chatfield, C., Collins, A.J.: Introduction to Multivariate Analysis. Chapman and Hall, London (1980)
Chen, Y., Hanson, T.E.: Bayesian nonparametric k-sample tests for censored and uncensored data. Comput. Stat. Data Anal. 71, 335–346 (2014). https://doi.org/10.1016/j.csda.2012.11.003
Chen, H., Friedman, J.H.: A new graph-based two-sample test for multivariate and object data. J. Am. Stat. Assoc. 112(517), 397–409 (2017). https://doi.org/10.1080/01621459.2016.1147356
Chen, H., Chen, X., Su, Y.: A weighted edge-count two-sample test for multivariate and object data. J. Am. Stat. Assoc. 113(523), 1146–1155 (2018). https://doi.org/10.1080/01621459.2017.1307757
Cipolli, W., III., Hanson, T.E., McLain, A.C.: Bayesian nonparametric multiple testing. Comput. Stat. Data Anal. 101, 64–79 (2016). https://doi.org/10.1016/j.csda.2016.02.016
Cole, D.A., Maxwell, S.E., Arvey, R., et al.: How the power of MANOVA can both increase and decrease as a function of the intercorrelations among the dependent variables. Psychol. Bull. 115(3), 465–474 (1994). https://doi.org/10.1037/0033-2909.115.3.465
De Iorio, M., Müller, P., Rosner, G.L., et al.: An ANOVA model for dependent random measures. J. Am. Stat. Assoc. 99(465), 205–215 (2004). https://doi.org/10.1198/016214504000000205
Duncan, D.B.: A Bayesian approach to multiple comparisons. Technometrics 7, 171–222 (1965). https://doi.org/10.2307/1266670
Escobar, M.D., West, M.: Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc. 90(430), 577–588 (1995). https://doi.org/10.1080/01621459.1995.10476550
Friedman, J.H., Rafsky, L.C.: Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. Ann. Stat. 7(4), 697–717 (1979). https://doi.org/10.1214/aos/1176344722
Gelfand, A.E., Smith, A.F.M.: Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85(410), 398–409 (1990). https://doi.org/10.2307/2289776
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984). https://doi.org/10.1109/TPAMI.1984.4767596
George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88(423), 881–889 (1993). https://doi.org/10.1080/01621459.1993.10476353
Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)
Gutiérrez, L., Barrientos, A.F., González, J., et al.: A Bayesian nonparametric multiple testing procedure for comparing several treatments against a control. Bayesian Anal. 14(2), 649–675 (2019). https://doi.org/10.1214/18-BA1122
Harville, D.A.: Matrix Algebra from a Statistician’s Perspective. Springer, New York (2008)
Holmes, C.C., Caron, F., Griffin, J.E., et al.: Two-sample Bayesian nonparametric hypothesis testing. Bayesian Anal. 10(2), 297–320 (2015). https://doi.org/10.1214/14-BA914
Hotelling, H.: A generalized t test and measure of multivariate dispersion. In: Proceedings of Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, pp. 23–41 (1951). https://projecteuclid.org/euclid.bsmsp/1200500217
Ishwaran, H., James, L.F.: Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. 96, 161–173 (2001). https://doi.org/10.1198/016214501750332758
Jain, S., Neal, R.M.: A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. J. Comput. Graph. Stat. 13(1), 158–182 (2004). https://doi.org/10.1198/1061860043001
Jeffreys, H.: Some tests of significance, treated by the theory of probability. Math. Proc. Camb. Philos. Soc. 31(2), 203–222 (1935). https://doi.org/10.1017/S030500410001330X
Jefferys, W.H., Berger, J.O.: Ockham’s razor and Bayesian analysis. Am. Sci. 80(1), 64–72 (1992). www.jstor.org/stable/29774559
Kim, S., Dahl, D.B., Vannucci, M.: Spiked Dirichlet process prior for Bayesian multiple hypothesis testing in random effects models. Bayesian Anal. 4(4), 707–732 (2009). https://doi.org/10.1214/09-BA426
Konietschke, F., Bathke, A.C., Harrar, S.W., et al.: Parametric and nonparametric bootstrap methods for general MANOVA. J. Multivar. Anal. 140, 291–301 (2015). https://doi.org/10.1016/j.jmva.2015.05.001
Lo, A.Y.: On a class of Bayesian nonparametric estimates: I. Density estimates. Ann. Stat. 12(1), 351–357 (1984). https://doi.org/10.1214/aos/1176346412
Ma, L., Wong, W.H.: Coupling optional Pólya trees and the two sample problem. J. Am. Stat. Assoc. 106(496), 1553–1565 (2011). https://doi.org/10.1198/jasa.2011.tm10003
MacEachern, S.N.: Dependent nonparametric processes. In: ASA Proceedings of the Section on Bayesian Statistical Science, American Statistical Association, Alexandria, VA (1999)
Miller, R.G.: Simultaneous Statistical Inference. Springer Series in Statistics, 2nd edn. Springer, New York (1981)
Mitchell, T.J., Beauchamp, J.J.: Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 83(404), 1023–1032 (1988). https://doi.org/10.1080/01621459.1988.10478694
Mukhopadhyay, S., Wang, K.: A nonparametric approach to high-dimensional k-sample comparison problems. Biometrika 107(3), 555–572 (2020). https://doi.org/10.1093/biomet/asaa015
Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9(2), 249–265 (2000). https://doi.org/10.1080/10618600.2000.10474879
Pillai, K.C.S.: Some new test criteria in multivariate analysis. Ann. Math. Stat. 26(1), 117–121 (1955). https://doi.org/10.1214/aoms/1177728599
Quintana, F.A., Müller, P., Jara, A., et al.: The dependent Dirichlet process and related models. Stat. Sci. 37(1), 24–41 (2022). https://doi.org/10.1214/20-STS819
Rosenbaum, P.R.: An exact distribution-free test comparing two multivariate distributions based on adjacency. J. R. Stat. Soc. Ser. B 67(4), 515–530 (2005). https://doi.org/10.1111/j.1467-9868.2005.00513.x
Roy, S.N.: On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Stat. 24(2), 220–238 (1953). https://doi.org/10.1214/aoms/1177729029
Rupasinghe, H.S., Olive, D.J.: Bootstrapping analogs of the one way MANOVA test. Commun. Stat. Theory Methods 48(22), 5546–5558 (2019). https://doi.org/10.1080/03610926.2018.1515363
Scott, J.G., Berger, J.O.: An exploration of aspects of Bayesian multiple testing. J. Stat. Plan. Inference 136, 2144–2162 (2006). https://doi.org/10.1016/J.JSPI.2005.08.031
Sethuraman, J.: A constructive definition of Dirichlet priors. Stat. Sin. 4(2), 639–650 (1994). http://www.jstor.org/stable/24305538
Smirnov, N.: On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Moscow Univ. Math. Bull. 2(2), 1 (1939)
Taylor-Rodriguez, D., Womack, A., Bliznyuk, N.: Bayesian variable selection on model spaces constrained by heredity conditions. J. Comput. Graph. Stat. 25(2), 515–535 (2016). https://doi.org/10.1080/10618600.2015.1056793
Wald, A., Wolfowitz, J.: On a test whether two samples are from the same population. Ann. Math. Stat. 11(2), 147–162 (1940). https://doi.org/10.1214/aoms/1177731909
Warne, R.T., Lazo, M., Ramos, T., et al.: Statistical methods used in gifted education journals, 2006–2010. Gift Child Q. 56(3), 134–149 (2012). https://doi.org/10.1177/0016986212444122
Weiss, L.: Two-sample tests for multivariate distributions. Ann. Math. Stat. 31(1), 159–164 (1960). https://doi.org/10.1214/aoms/1177705995
Wilks, S.S.: Certain generalizations in the analysis of variance. Biometrika 24(3/4), 471–494 (1932). https://doi.org/10.2307/2331979
Wilson, M.A., Iversen, E.S., Clyde, M.A., et al.: Bayesian model search and multilevel inference for SNP association studies. Ann. Appl. Stat. 4(3), 1342–1364 (2010). https://doi.org/10.1214/09-aoas322
Womack, A.J., Fuentes, C., Taylor-Rodriguez, D.: Model space priors for objective sparse Bayesian regression (2015). arXiv:1511.04745
Zanella, G.: Informed proposals for local MCMC in discrete spaces. J. Am. Stat. Assoc. 115(530), 852–865 (2020). https://doi.org/10.1080/01621459.2019.1585255
Zientek, L.R., Capraro, M.M., Capraro, R.M.: Reporting practices in quantitative teacher education research: one look at the evidence cited in the AERA panel report. Educ. Res. 37(4), 208–216 (2008). https://doi.org/10.3102/0013189X08319762
Acknowledgements
The first author was supported by CONICYT PFCHA/DOCTORADO BECAS CHILE/2020-21201742. The second author was supported by Fondecyt Grant 1220229 and ANID–Millennium Science Initiative Program–NCN17_059. The third author was partially supported by Fondecyt Grant 11190018 and UKRI Medical Research Council Grant MC_UU_00002/5.
Appendices
Appendix A: Useful properties of the Bayesian Normal model
For future reference, we review here some useful properties of the conjugate Normal model. Let \(\varvec{y}_{1:m}:= (\varvec{y}_1, \ldots , \varvec{y}_{m})\), where \((\varvec{y}_i \,|\, \varvec{\mu }, \varvec{\Sigma }) {\mathop {\sim }\limits ^{iid}} \text {N}_D(\varvec{\mu }, \varvec{\Sigma })\) and \((\varvec{\mu }, \varvec{\Sigma }) \sim \text {NIW}_D(\varvec{u}_0,r_0, \nu _0, \varvec{S}_0)\). Then, \(\big ((\varvec{\mu }, \varvec{\Sigma }) \,|\, \varvec{y}_{1:m}\big ) \sim \text {NIW}_D(\varvec{u}_m, r_m, \nu _m, \varvec{S}_m)\), where \((\varvec{u}_m, r_m, \nu _m, \varvec{S}_m)\) can be computed recursively as
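The recursion display is not reproduced here; as a sketch, one standard form of these conjugate NIW updates (cf. Bernardo and Smith 1994), written in the notation above, is

```latex
\begin{aligned}
r_m &= r_{m-1} + 1, \qquad
\nu_m = \nu_{m-1} + 1, \qquad
\varvec{u}_m = \frac{r_{m-1}\,\varvec{u}_{m-1} + \varvec{y}_m}{r_{m-1} + 1}, \\
\varvec{S}_m &= \varvec{S}_{m-1}
  + \frac{r_{m-1}}{r_{m-1} + 1}\,
    (\varvec{y}_m - \varvec{u}_{m-1})(\varvec{y}_m - \varvec{u}_{m-1})'.
\end{aligned}
```

where each new observation \(\varvec{y}_m\) increments the pseudo-counts \(r\) and \(\nu\) by one and adds a rank-1 term to the scale matrix.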
Moreover, the marginal density and the predictive density can be written as
$$\begin{aligned} p(\varvec{y}_{1:m}) = \pi ^{-mD/2} \left( \frac{r_0}{r_m}\right) ^{D/2} \frac{\Gamma _D(\nu _m/2)}{\Gamma _D(\nu _0/2)} \frac{\vert \varvec{S}_0\vert ^{\nu _0/2}}{\vert \varvec{S}_m\vert ^{\nu _m/2}} \end{aligned}$$
and
$$\begin{aligned} p(\varvec{y}_{m+1} \mid \varvec{y}_{1:m}) = \pi ^{-D/2} \left( \frac{r_m}{r_{m+1}}\right) ^{D/2} \frac{\Gamma _D(\nu _{m+1}/2)}{\Gamma _D(\nu _m/2)} \frac{\vert \varvec{S}_m\vert ^{\nu _m/2}}{\vert \varvec{S}_{m+1}\vert ^{\nu _{m+1}/2}} \end{aligned}$$
respectively, where \(\Gamma _D(\cdot )\) is the D-variate Gamma function (Bernardo and Smith 1994). In practice, we almost never compute \(\varvec{S}_m\) directly but rather its Cholesky decomposition, \(\varvec{S}_m = \varvec{P}_m' \varvec{P}_m\). Specifically, we compute \(\varvec{P}_0\) from scratch and then obtain \(\varvec{P}_1, \ldots , \varvec{P}_m\) through a series of rank-1 updates (Golub and Van Loan 2013, Section 6.5.4).
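Since the rank-1 Cholesky update drives the implementation, here is a minimal NumPy sketch (the function name is ours, and we work with the lower factor \(L = \varvec{P}'\) rather than the paper's upper factor):

```python
import numpy as np

def chol_update(L, x):
    """Rank-1 update of a Cholesky factor: given S = L @ L.T with L
    lower triangular, return the lower factor of S + x x'."""
    L, x = L.copy(), x.copy()
    n = x.size
    for k in range(n):
        r = np.hypot(L[k, k], x[k])          # new diagonal entry
        c, s = r / L[k, k], x[k] / L[k, k]   # Givens-like rotation
        L[k, k] = r
        # update the rest of column k, then rotate the workspace x
        L[k + 1:, k] = (L[k + 1:, k] + s * x[k + 1:]) / c
        x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L
```

This costs \(O(D^2)\) per observation, versus \(O(D^3)\) for refactorizing \(\varvec{S}_m\) from scratch.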
Appendix B: Posterior simulation algorithm
Appendix B.1: Updating the DP concentration parameter (Step 1, Algorithm 1)
We update \(\alpha \) using the algorithm of Escobar and West (1995), which proceeds as follows:

1. Draw \(\phi \,|\, \alpha \sim \text {Beta}(\alpha + 1, N)\).

2. Compute \(n_{k} = \# \{s_i: i \in {\mathcal {N}}\}\), the number of occupied clusters.

3. Compute
$$\begin{aligned} \psi / (1 - \psi ) = (a_0 + n_{k} - 1) / \{N (b_0 - \log \phi )\}. \end{aligned}$$

4. Draw \(\chi \sim \text {Bernoulli}(\psi )\).

5. Draw
$$\begin{aligned} \alpha \sim {\left\{ \begin{array}{ll} \text {Gamma}(a_0 + n_{k}, b_0 - \log \phi ) &{} \text {if }\chi = 1, \\ \text {Gamma}(a_0 + n_{k} - 1, b_0 - \log \phi ) &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
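The five steps above can be sketched in Python as follows (a hypothetical implementation; note that NumPy's Gamma sampler takes a scale parameter, the reciprocal of the rate used above):

```python
import numpy as np

def update_alpha(alpha, n_k, N, a0, b0, rng):
    """One refresh of the DP concentration parameter alpha
    (Escobar and West 1995). n_k is the number of occupied clusters,
    N the sample size, (a0, b0) the Gamma prior shape/rate."""
    phi = rng.beta(alpha + 1.0, N)                      # step 1
    odds = (a0 + n_k - 1.0) / (N * (b0 - np.log(phi)))  # step 3
    psi = odds / (1.0 + odds)
    chi = rng.random() < psi                            # step 4
    shape = a0 + n_k if chi else a0 + n_k - 1.0         # step 5
    # NumPy's gamma() is parameterized by scale = 1 / rate
    return rng.gamma(shape, 1.0 / (b0 - np.log(phi)))
```

Since \(\phi \in (0, 1)\), the rate \(b_0 - \log \phi \) is always strictly positive, so the draw is well defined.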
Appendix B.2: Updating the cluster labels (Step 2, Algorithm 1)
It is not hard to see that, conditionally on \((\alpha , \varvec{\gamma })\), model (3) reduces to a Dirichlet process mixture model (Lo 1984). Let \({\mathcal {I}}_{-i}:= {\mathcal {N}} \backslash \{i\}\) and \(\varvec{s}_{-i}:= (s_\ell : \ell \in {\mathcal {I}}_{-i})\). Then, there is a well-known closed-form expression for \(p(s_i \,|\, \varvec{s}_{-i}, \varvec{z}, \varvec{y},\alpha ) = p(s_i \,|\, \varvec{s}_{-i}, \varvec{x}, \varvec{y},\alpha , \varvec{\gamma })\), namely
where \(k^*:= 1 + \max (\varvec{s}_{-i})\), \(\varvec{y}_{-i}:= (\varvec{y}_\ell : \ell \in {\mathcal {I}}_{-i})\) and \(n_{-ik} = \#\{\ell \in {\mathcal {I}}_{-i}: s_\ell = k\}\) (Neal 2000). Now, let \({\mathcal {I}}_{-ik} = \{\ell \in {\mathcal {I}}_{-i}: z_\ell = z_i, s_\ell = k\}\) and \(\varvec{y}_{-ik} = (\varvec{y}_\ell : \ell \in {\mathcal {I}}_{-ik})\). Then, we can rewrite \(p(\varvec{y}_i \,|\, \varvec{z}, \varvec{y}_{-i}, \varvec{s}_{-i}, s_i = k)\) as \(p(\varvec{y}_i \,|\, z_i, \varvec{y}_{-ik}, s_i = k)\), which coincides with the predictive likelihood of an out-of-sample observation \(\varvec{y}_i\), given a sample \(\varvec{y}_{-ik}\), under the conjugate Normal model described in the previous appendix. Hence, \(p(\varvec{y}_i \,|\, \varvec{z}, \varvec{y}_{-i}, \varvec{s}_{-i}, s_i = k)\) can be computed efficiently using Eqs. (7) and (8).
In practice, we rarely need to compute the quantity \(p(\varvec{y}_i \,|\, \varvec{y}_{-ik}, s_i = k, z_i = j)\) from scratch. Instead, the statistics used in its computation, denoted by \((\varvec{u}_m, r_m, \nu _m, \varvec{S}_m)\) in (6), are cached as \(\varvec{T}_{jk}\); each time \(s_i\) changes its value, say from \(k_1\) to \(k_2\), only \(\varvec{T}_{jk_1}\) and \(\varvec{T}_{jk_2}\) are updated using recursion (6).
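As an illustration of this caching scheme, the following Python sketch (all names hypothetical) stores the NIW statistics per pair \((j, k)\) and touches only the two affected entries when an observation moves between clusters; the add/remove formulas are one standard form of the conjugate update, not necessarily the paper's exact Eq. (6):

```python
import numpy as np

def niw_add(stats, y):
    """Add one observation to cached NIW statistics (u, r, nu, S)."""
    u, r, nu, S = stats
    u_new = (r * u + y) / (r + 1.0)
    S_new = S + (r / (r + 1.0)) * np.outer(y - u, y - u)
    return (u_new, r + 1.0, nu + 1.0, S_new)

def niw_remove(stats, y):
    """Remove one observation (the exact inverse of niw_add)."""
    u, r, nu, S = stats
    u_old = (r * u - y) / (r - 1.0)
    S_old = S - ((r - 1.0) / r) * np.outer(y - u_old, y - u_old)
    return (u_old, r - 1.0, nu - 1.0, S_old)

def move_observation(T, y, j, k1, k2):
    """Move y from cluster k1 to k2: only two cache entries change."""
    T[(j, k1)] = niw_remove(T[(j, k1)], y)
    T[(j, k2)] = niw_add(T[(j, k2)], y)
```

Because each move costs \(O(D^2)\) regardless of cluster sizes, a full label sweep stays cheap even for large samples.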
Appendix B.3: Updating the hypothesis vector (Step 3, Algorithm 1)
Let \(g_j:= P(\Vert \varvec{\gamma }\Vert _1 = j)\), \(j \in {\mathcal {J}}\). Then, under Womack’s prior,
where \(\zeta _0\) is a hyperparameter (Womack et al. 2015). This is a non-singular linear system with \(J + 1\) equations. Hence, \(\varvec{g} = (g_j: j \in {\mathcal {J}})\) is always well defined. Once \(\varvec{g}\) is computed, the cost of computing \(\pi _0(\varvec{\gamma })\) is negligible, because Womack’s prior implies that \(\pi _0(\varvec{\gamma }) = g_{\Vert \varvec{\gamma }\Vert _1} \big / \left( {\begin{array}{c}J\\ \Vert \varvec{\gamma }\Vert _1\end{array}}\right) \).
Now, let \({\mathcal {I}}_{\varvec{\gamma } jk}:= \{i \in {\mathcal {N}}: \gamma _{x_i}x_i = j, s_i = k\}\) and \(\varvec{y}_{\varvec{\gamma } jk}:= (\varvec{y}_i: i \in {\mathcal {I}}_{\varvec{\gamma } jk})\). Then,
under the convention that \(p(\varvec{y}_{\varvec{\gamma } jk} \,|\, {\mathcal {I}}_{\varvec{\gamma } jk}) = 1\) if \({\mathcal {I}}_{\varvec{\gamma } jk} = \emptyset \). Moreover, \(p(\varvec{y}_{\varvec{\gamma } jk} \,|\, {\mathcal {I}}_{\varvec{\gamma } jk})\) is numerically equal to the marginal likelihood of a sample \(\varvec{y}_{\varvec{\gamma } jk}\) under the model described in Appendix A. Hence, \(\pi _1(\varvec{\gamma })\) can be computed efficiently (up to a proportionality constant) using Eq. (7), and thus we can draw \(\varvec{\gamma } \sim \pi _1\) using any variant of the MH algorithm. In particular, if we use \(W(\varvec{\beta } \,|\, \varvec{\gamma }) \propto I(\Vert \varvec{\beta } - \varvec{\gamma }\Vert _1 = 1)\) as the proposal distribution, the acceptance ratio becomes
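One such MH step can be sketched in Python as follows (the interface of `log_post`, returning \(\log \pi _1\) up to a constant, is hypothetical); since the one-flip proposal is symmetric, the acceptance ratio reduces to \(\pi _1(\varvec{\beta }) / \pi _1(\varvec{\gamma })\):

```python
import numpy as np

def mh_flip_step(gamma, log_post, rng):
    """One MH step with the one-coordinate-flip proposal
    W(beta | gamma) ∝ I(||beta - gamma||_1 = 1)."""
    beta = gamma.copy()
    j = rng.integers(gamma.size)
    beta[j] = 1 - beta[j]                  # flip exactly one coordinate
    # symmetric proposal: acceptance ratio is pi_1(beta) / pi_1(gamma)
    log_ratio = log_post(beta) - log_post(gamma)
    if np.log(rng.random()) < log_ratio:
        return beta
    return gamma
```

Each step evaluates \(\log \pi _1\) at only two hypotheses, so the per-iteration cost is dominated by the marginal likelihood computation.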
Appendix C: Proof of Theorem 1
Let \(\varvec{\beta }\) and \(\varvec{\gamma }\) be two hypotheses such that \(\varvec{\beta } \succ \varvec{\gamma }\). Without loss of generality, assume that \(\varvec{\beta } = \varvec{e}_1\) and \(\varvec{\gamma } = \varvec{0}_J\), where \(\varvec{e}_1\) is the first column of \(\varvec{I}_J\).
(a) To prove this claim, let us start by noting that \(\bar{\nu }_{\varvec{\gamma } jk} / n_{\varvec{\gamma } jk} \rightarrow 1\), \(\bar{r}_{\varvec{\gamma } jk} / n_{\varvec{\gamma } jk} \rightarrow 1\), and \(\bar{\varvec{S}}_{\varvec{\gamma } 0k} / n_{\varvec{\gamma } 0k} {\mathop {\rightarrow }\limits ^{p}} \varvec{\Sigma }_{0k}\) for all active groups. In addition, if the true hypothesis is \(\varvec{\gamma }\), then \(\bar{\varvec{S}}_{\varvec{\beta } jk} / n_{\varvec{\beta } jk} {\mathop {\rightarrow }\limits ^{p}} \varvec{\Sigma }_{0k}\), \(j = 0, 1\). Hence,
where \(\kappa _{\varvec{\gamma } jkd}:= (\bar{\nu }_{\varvec{\gamma } jk} + 1 - d) / 2\) and \(a_n {\mathop {\sim }\limits ^{\centerdot }} b_n \Leftrightarrow a_n = O_p(b_n)\), i.e., \(a_n / b_n\) is stochastically bounded. This, in turn, implies that
where the terms involving \(\pi \) and \(\varvec{\Sigma }_{0k}\) disappear because the exponents in the numerator and the denominator cancel out, except for a term that does not depend on the sample size and thus becomes irrelevant in the subsequent computations.
Now, let us assume that \(n_{\varvec{\beta } jk} / n_{\varvec{\gamma } 0k} {\mathop {\rightarrow }\limits ^{p}} \chi _j > 0\), \(j = 0, 1\). This must be the case since the sample is assumed to be independent and identically distributed. Then,
where most of the terms involving the sample size disappear because \(\bar{\nu }_{\varvec{\gamma } 0k} - \bar{\nu }_{\varvec{\beta } 0k} - \bar{\nu }_{\varvec{\beta } 1k} = - \nu _0\). On the other hand, we can rewrite \(A_{3d}\) as a Beta function times a compensating term
where the last step follows because \(\kappa _{\varvec{\gamma } 0kd} / n_{\varvec{\gamma } 0k}\) converges to 1/2. Hence, using Stirling's approximation, we have
but we know that
Therefore,
Hence,
which converges to zero.
(b) First, note that
From the first part of the theorem, we already know that \(b_{k1} = O_p(n_{\varvec{\gamma } 0k}^{q_1})\) and \(b_{k2} = O_p(n_{\varvec{\gamma } 0k}^{q_2})\) for some \(q_1, q_2 < \infty \). In addition,
where the second equality is due to the fact that \(\bar{\varvec{S}}_{\varvec{\gamma } 0k} / n_{\varvec{\gamma } 0k}\), \(\bar{\varvec{S}}_{\varvec{\beta } 0k} / n_{\varvec{\beta } 0k}\), and \(\bar{\varvec{S}}_{\varvec{\beta } 1k} / n_{\varvec{\beta } 1k}\) converge in probability to (positive definite) finite matrices. Hence, \(b_{k1} b_{k2} b_{k3} = O_p(n_{\varvec{\gamma } 0k}^q)\) for some \(q < \infty \). In the next paragraphs, we will prove that \(b_{k4} = O_p((1 + u)^{n_{\varvec{\gamma } 0k}})\) for some \(u > 0\), ensuring that \(b_k\) diverges to \(\infty \) in probability, no matter the value of the aforementioned q.
First, recall a basic property of the conjugate Normal model: if \(\varvec{a}_i \,|\, \varvec{\mu }, \varvec{\Sigma } {\mathop {\sim }\limits ^{iid}} \text {N}_D(\varvec{\mu }, \varvec{\Sigma })\) and \((\varvec{\mu }, \varvec{\Sigma }) \sim \text {NIW}(\bar{\varvec{u}}_0, \bar{r}_0, \bar{\nu }_0, \bar{\varvec{S}}_0)\), then \((\varvec{\mu }, \varvec{\Sigma }) \,|\, \varvec{a}_1, \ldots , \varvec{a}_m \sim \text {NIW}(\bar{\varvec{u}}_m, \bar{r}_m, \bar{\nu }_m, \bar{\varvec{S}}_m)\), where the posterior hyperparameters can be computed recursively as (Bouchard-Côté et al. 2017)
Now, for any fixed \(j \in \{0, 1\}\), replace m with \(n_{\varvec{\gamma } 0k}\), and replace the \(\varvec{a}_i\)s with the elements of \(\{\varvec{y}_i: i \in {\mathcal {I}}_{\varvec{\beta } jk}\}\) followed by the elements of \(\{\varvec{y}_i: i \in {\mathcal {I}}_{\varvec{\beta } (1-j)k}\}\). Then, it is not hard to note that \(\bar{\varvec{S}}_{n_{\varvec{\beta } jk}} = \bar{\varvec{S}}_{\varvec{\beta }jk}\) and \(\bar{\varvec{S}}_{n_{\varvec{\gamma } 0k}} = \bar{\varvec{S}}_{\varvec{\gamma }0k}\). Hence, \(\bar{\varvec{S}}_{\varvec{\gamma }0k}\) can be obtained from any \(\bar{\varvec{S}}_{\varvec{\beta }jk}\) after a finite sequence of rank-1 updates.
Next, recall the well-known matrix determinant lemma (Harville 2008, Theorem 18.1.1). Given an invertible matrix \(\varvec{A}\) and two vectors \(\varvec{u}\) and \(\varvec{v}\), this lemma states that \(\vert \varvec{A} + \varvec{u}\varvec{v}'\vert = \vert \varvec{A}\vert \, (1 + \varvec{v}'\varvec{A}^{-1}\varvec{u})\). Applying this lemma \(n_{\varvec{\gamma }0k} - n_{\varvec{\beta }jk}\) consecutive times to \(\bar{\varvec{S}}_{\varvec{\beta }jk}\) (once per rank-1 update), we have that
since the matrices \(\bar{\varvec{S}}_q\) are all positive definite. Now, each term in the sum is \(O(n_{\varvec{\gamma } 0k}^{-1})\), but there are \(n_{\varvec{\gamma } 0k} - n_{\varvec{\beta } jk} = O_p(n_{\varvec{\gamma } 0k})\) terms. So, the whole sum is \(O_p(1)\) and greater than zero as well. This is an important point, because it means that \(\vert \bar{\varvec{S}}_{\varvec{\gamma } 0k}\vert > \vert \bar{\varvec{S}}_{\varvec{\beta } jk}\vert (1 + O_p(1))\). Hence,
where the \(O_p(1)\) term is greater than zero. Then, \(b_{k4}\) diverges exponentially fast, dominating the polynomially bounded terms, and thus \(b_k\) also diverges, as desired.
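The matrix determinant lemma invoked above is easy to verify numerically; a small NumPy sanity check:

```python
import numpy as np

# Check |A + u v'| = |A| (1 + v' A^{-1} u) for a random invertible A.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)  # comfortably invertible
u, v = rng.normal(size=4), rng.normal(size=4)

lhs = np.linalg.det(A + np.outer(u, v))
rhs = np.linalg.det(A) * (1.0 + v @ np.linalg.solve(A, u))
assert np.isclose(lhs, rhs)
```

In the proof the matrices are symmetric positive definite, so \(\varvec{v}'\varvec{A}^{-1}\varvec{u}\) and \(\varvec{u}'\varvec{A}^{-1}\varvec{v}\) coincide and each rank-1 update multiplies the determinant by a factor greater than one.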
Appendix D: Software
We developed a Julia package, available at https://github.com/igutierrezm/MANOVABNPTest.jl, that is relatively easy to call from R (thanks to the R package JuliaConnectoR). Details about its installation, as well as a minimal reproducible example, are available at the repository’s README. A more elaborate example (specifically, an R script reproducing Fig. 2) is available at https://raw.githubusercontent.com/igutierrezm/MANOVABNPTest.jl/master/extras/elaborate-example.R.
Cite this article
Gutiérrez, I., Gutiérrez, L. & Alvares, D. A new flexible Bayesian hypothesis test for multivariate data. Stat Comput 33, 50 (2023). https://doi.org/10.1007/s11222-023-10214-6