Abstract
Estimation and hypothesis testing for the covariance matrix in high dimensions are challenging problems because the traditional multivariate asymptotic theory is no longer valid. When the dimension is larger than, or increasing with, the sample size, standard likelihood-based tests for the covariance matrix perform poorly. Existing high-dimensional tests are either computationally expensive or have very weak control of the type I error. In this paper, we propose a test procedure, CRAMP (covariance testing using random matrix projections), for testing hypotheses involving one or more covariance matrices using random projections. Randomly projecting the high-dimensional data into lower-dimensional subspaces alleviates the curse of dimensionality, allowing the use of traditional multivariate tests. An extensive simulation study compares CRAMP against asymptotics-based high-dimensional test procedures, and an application of the proposed method to two gene expression data sets is presented.
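The project-then-test idea can be sketched in a few lines. The toy version below is illustrative only, not the authors' implementation: it uses John's (1972) sphericity test as the low-dimensional test, draws projections with orthonormal rows via a QR decomposition, and combines p-values by simple averaging; all function names are ours.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def random_projection(p, k, rng):
    """Draw a k x p projection with orthonormal rows (QR of a Gaussian matrix)."""
    G = rng.standard_normal((p, k))
    Q, _ = np.linalg.qr(G)   # p x k, orthonormal columns
    return Q.T               # k x p, satisfies R R^T = I_k

def john_sphericity_pvalue(X):
    """John (1972) sphericity test p-value for n x k data X, k small."""
    n, k = X.shape
    S = np.cov(X, rowvar=False)
    D = S / (np.trace(S) / k) - np.eye(k)
    U = np.trace(D @ D) / k
    stat = n * k * U / 2.0                    # ~ chi2 with k(k+1)/2 - 1 df under H0
    return chi2.sf(stat, k * (k + 1) // 2 - 1)

def cramp_sketch(X, k=5, M=100, rng=rng):
    """Average the low-dimensional test's p-values over M random projections."""
    p = X.shape[1]
    pvals = [john_sphericity_pvalue(X @ random_projection(p, k, rng).T)
             for _ in range(M)]
    return np.mean(pvals)

# Under H0 (spherical covariance) each projected p-value is roughly Unif(0,1),
# so the averaged p-value concentrates near 1/2.
X = rng.standard_normal((50, 200))   # n = 50 observations, p = 200 >> n
print(cramp_sketch(X))
```

The paper's actual procedure calibrates the averaged p-value via its null distribution (see the appendix); the sketch only shows the projection-and-combine mechanics.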
References
Achlioptas D (2001) Database-friendly random projections. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’01, pp 274–281, New York, NY, USA. Association for Computing Machinery. ISBN 1581133618
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc National Acad Sci 96(12):6745–6750. ISSN 0027-8424
Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley Series in Probability and Statistics. ISBN 978-0-471-36091-9
Ayyala DN (2020) High-dimensional statistical inference: Theoretical development to data analytics (Chapter 6), volume 43 of Handbook of Statistics, pp. 289–335. Elsevier. https://doi.org/10.1016/bs.host.2020.02.003
Burr M, Gao S, Knoll F (2018) Optimal bounds for Johnson-Lindenstrauss transformations. J Mach Learn Res 19:1–22
Cai T, Liu W, Xia Y (2013) Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J Am Stat Assoc 108(501):265–277
Cai TT, Li H, Liu W, Xie J (2012) Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika 100(1):139–156, 11. ISSN 0006-3444. https://doi.org/10.1093/biomet/ass058
Cannings TI (2021) Random projections: data perturbation for classification problems. WIREs Comput Stat 13(1):e1499. https://doi.org/10.1002/wics.1499
Cannings TI, Samworth RJ (2017) Random-projection ensemble classification. J R Stat Soc Ser B (Stat Methodol) 79(4):959–1035
Chen SX, Zhang LX, Zhong PS (2010) Tests for high-dimensional covariance matrices. J Am Stat Assoc 105(490):810–819
Fisher TJ (2012) On testing for an identity covariance matrix when the dimensionality equals or exceeds the sample size. J Stat Plann Inference 142(1):312–326
Fisher TJ, Sun X, Gallagher CM (2010) A new test for sphericity of the covariance matrix for high dimensional data. J Multivar Anal 101(10):2554–2570
Hu J, Bai Z (2016) A review of 20 years of naive tests of significance for high-dimensional mean vectors and covariance matrices. Sci China Math 59:2281–2300
John S (1972) The distribution of a statistic used for testing sphericity of normal distributions. Biometrika 59(1):169–173
Johnson WB, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. Contemp Math 26:189–206
Ledoit O, Wolf M (2002) Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann Stat 30(4):1081–1102
Li J, Chen SX (2012) Two sample tests for high-dimensional covariance matrices. Ann Stat 40(2):908–940
Lopes M, Jacob L, Wainwright MJ (2011) A more powerful two-sample test in high dimensions using random projection. In: Advances in Neural Information Processing Systems, pp 1206–1214
Nagao H (1973) On some test criteria for covariance matrix. Ann Stat 1(4):700–709
Qian M, Tao L, Li E, Tian M (2020) Hypothesis testing for the identity of high-dimensional covariance matrices. Stat Probab Lett 161:108699
Rencher AC, Christensen WF (2012) Methods of multivariate analysis, 3rd edn. Wiley. ISBN 9781118391686
Schclar A, Rokach L (2009) Random projection ensemble classifiers. In: Filipe J, Cordeiro J (eds) Enterprise information systems. Springer, Berlin, pp 309–316
Schott JR (2007) A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Comput Stat Data Anal 51(12):6535–6542
Srivastava MS, Yanagihara H, Kubokawa T (2014) Tests for covariance matrices in high dimension with less sample size. J Multivar Anal 130:289–309
Thanei G-A, Heinze C, Meinshausen N (2017) Random projections for large-scale regression, pp 51–68. Springer International Publishing, Cham. ISBN 978-3-319-41573-4. https://doi.org/10.1007/978-3-319-41573-4_3
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(86):2579–2605
Wu T-L, Li P (2020) Projected tests for high-dimensional covariance matrices. J Stat Plann Inference, 207:73–85. ISSN 0378-3758
Zhao SD, Cai TT, Li H (2014) Direct estimation of differential networks. Biometrika 101(2):253–268. ISSN 0006-3444. https://doi.org/10.1093/biomet/asu009
Appendix
Proof of Theorem 1
The proof of Theorem 1 is along the same lines as the proof of Theorem 2 in Srivastava et al. (2014). To show that the distribution of \(\overline{\pi }_U\) is independent of \(\sigma \), define \(\mathbf {X}^*_{m; i} = \mathcal {R}_m \mathbf {X}_i, i = 1, \ldots , n, m = 1, \ldots , M\) as the projection of the \(i^\mathrm{th}\) observation using the \(m^\mathrm{th}\) random projection matrix. Then we have \(\mathcal {S}^*_m = \mathcal {R}_m \mathcal {S} \mathcal {R}_m^{\top }\),
where \(\mathcal {S}\) and \(\mathcal {S}_m^*\) are the sample covariance matrices of the original and projected observations respectively. From equation (17), the p-values based on M i.i.d. random projection matrices are
First, since the random projection matrices are independent, the p-values \(\pi _1, \ldots , \pi _M\) are independent and identically distributed conditional on the data \(\mathcal {X} = \{\mathbf {X}_1, \ldots , \mathbf {X}_n\}\) and \(\mathcal {Y} = \{\mathbf {Y}_1, \ldots , \mathbf {Y}_m \}\). The identical distribution follows from the orthogonality of the projection matrices, which preserves the spherical covariance structure (\(\mathcal {R} \left( \sigma ^2 \mathcal {I}_p \right) \mathcal {R}^{\top } = \sigma ^2 \mathcal {I}_k\)). Additionally, we can write
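The covariance-preservation identity invoked here can be checked numerically. A minimal sketch, assuming the projection is drawn with orthonormal rows via a QR decomposition (one common construction; the paper's generation scheme may differ):

```python
import numpy as np

rng = np.random.default_rng(42)
p, k, sigma2 = 100, 4, 2.5

# Draw a k x p projection with orthonormal rows: R R^T = I_k.
G = rng.standard_normal((p, k))
Q, _ = np.linalg.qr(G)   # p x k, orthonormal columns
R = Q.T                  # k x p

# Orthonormal rows preserve a spherical covariance:
# R (sigma^2 I_p) R^T = sigma^2 (R R^T) = sigma^2 I_k.
projected_cov = R @ (sigma2 * np.eye(p)) @ R.T
print(np.allclose(projected_cov, sigma2 * np.eye(k)))  # True
```

Because the projected covariance is again spherical with the same \(\sigma^2\), each projection faces the same null hypothesis, which is what makes the conditional p-values identically distributed.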
where the expected value is with respect to the distribution of the observations and the probability is with respect to the randomness of the projection matrix.
By the conditional independence of \(\pi _1, \ldots , \pi _M\) and the central limit theorem, we have a normal approximation to the probability in (A.20)
Hence the probability \(P \left[ \overline{\pi } < u \right] \) can be approximated using only the moments of \(U | \mathcal {X}, \mathcal {Y}\). Under the null hypothesis \(H_{0S}\), the variable \(U | \mathcal {X}, \mathcal {Y}\) is defined as
The uniform distribution follows from the standard property of p-values under the null hypothesis and does not depend on \(\sigma ^2\). Using this property, we shall show that the distributions of \(E_{\mathcal {R}} \left[ U | \mathcal {X}, \mathcal {Y} \right] \) and \(\mathrm{var}_{\mathcal {R}} \left[ U | \mathcal {X}, \mathcal {Y} \right] \) with respect to \(\mathcal {X}, \mathcal {Y}\) are also independent of \(\sigma ^2\).
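The normal approximation above is easy to visualize: under \(H_0\) the conditional p-values behave like i.i.d. Unif(0, 1) draws, so their average \(\overline{\pi}\) is approximately \(N(1/2, 1/(12M))\). A quick Monte Carlo sketch (the values of M and the replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
M, reps = 100, 20000

# Each row: M i.i.d. Unif(0,1) "p-values"; pi_bar is their mean.
pi_bar = rng.uniform(size=(reps, M)).mean(axis=1)

# CLT: pi_bar ~ approx N(1/2, 1/(12 M)), since var(Unif(0,1)) = 1/12.
print(pi_bar.mean())            # close to 0.5
print(pi_bar.var() * 12 * M)    # close to 1.0
```

In the actual procedure the \(\pi_m\) are only conditionally i.i.d. given the data, so the moments of \(U | \mathcal{X}, \mathcal{Y}\) replace the exact uniform moments used in this sketch.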
Let W denote the expected value of \(U | \mathcal {X}, \mathcal {Y}\) with respect to \(\mathcal {R}\),
where the integral is with respect to the distribution of the random projection matrix \(\mathcal {R}\). While the exact value of the integral is not important, note that by equation (A.22) the integrand is independent of \(\sigma ^2\). Since the random projection matrices are generated independently of the distribution of the observations, the variable W is independent of \(\sigma ^2\). For any \(m \ge 1\), the \(m^\mathrm{th}\) moment of W is given by
Interchanging the integrals by Fubini’s theorem, we have
By the construction of U in equation (A.22), the integral \(\left\{ \int U_{\mathcal {R}_1} \ldots U_{\mathcal {R}_m} \, dF_{\mathcal {X}, \mathcal {Y}} \right\} \) is independent of \(\sigma ^2\). Therefore, all moments of W are independent of \(\sigma ^2\) which implies that the distribution of W is independent of \(\sigma ^2\).
Similarly, it can be shown that the distribution of \(\mathrm{var}_{\mathcal {R}} \left( U| \mathcal {X}, \mathcal {Y} \right) \) is also independent of \(\sigma ^2\). From the independence of the mean and variance, we have the distributions of
are independent of \(\sigma ^2\). Finally, combining this independence with equation (A.21), we have
with the right hand side independent of \(\sigma ^2\). Taking expected values with respect to \(\mathcal {X}\) and \(\mathcal {Y}\), we have
By equation (A.25), the right hand side in (A.26) is also independent of \(\sigma ^2\), completing the proof. \(\square \)
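The scale invariance established by Theorem 1 can be illustrated numerically. In the sketch below (ours, not the paper's code), John's statistic standardizes the sample covariance by its average eigenvalue, so scaling the data by any \(\sigma\) leaves every projected p-value, and hence \(\overline{\pi}\), unchanged:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)

def john_pvalue(X):
    """John (1972) sphericity p-value; scale-invariant in X by construction."""
    n, k = X.shape
    S = np.cov(X, rowvar=False)
    D = S / (np.trace(S) / k) - np.eye(k)   # S is standardized, so sigma cancels
    stat = n * np.trace(D @ D) / 2.0
    return chi2.sf(stat, k * (k + 1) // 2 - 1)

p, k, n, M = 150, 4, 40, 50
X = rng.standard_normal((n, p))

pv_unit, pv_scaled = [], []
for _ in range(M):
    Q, _ = np.linalg.qr(rng.standard_normal((p, k)))
    R = Q.T                                   # same projection for both scales
    pv_unit.append(john_pvalue(X @ R.T))      # sigma^2 = 1
    pv_scaled.append(john_pvalue((5.0 * X) @ R.T))  # sigma^2 = 25

# The averaged p-value pi_bar is the same at both scales.
print(np.allclose(np.mean(pv_unit), np.mean(pv_scaled)))  # True
```
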
Proof of Theorem 2
Invariance of the distribution of the two-sample test statistic can be shown along the same lines as the proof above. Apart from the computation of the test statistic, the rest of the argument remains unchanged, since the p-value of the Box M test statistic also follows a standard uniform distribution under the null hypothesis. Hence in Algorithm 2, \(\pi _m \sim \mathrm{Unif}(0, 1)\) under \(H_0\), independent of the choice of \(\Sigma \). \(\square \)
Ayyala, D.N., Ghosh, S. & Linder, D.F. Covariance matrix testing in high dimension using random projections. Comput Stat 37, 1111–1141 (2022). https://doi.org/10.1007/s00180-021-01166-4