Tests of Zero Correlation Using Modified RV Coefficient for High-Dimensional Vectors

Tests of zero correlation between two or more vectors of large dimension, possibly larger than the sample size, are considered when the data may not necessarily follow a normal distribution. A test for the single-sample case with several vectors is proposed first and then extended to the common covariance matrix of several independent populations under the assumption of homogeneity. The test statistics are constructed using a recently proposed modification of the RV coefficient (a correlation coefficient for vector-valued random variables) for high-dimensional vectors. The accuracy of the tests is shown through simulations.


Introduction
Let X_k = (X_k1, …, X_kp)^T, k = 1, …, n, be iid random vectors drawn from a population with E(X_k) = μ ∈ ℝ^p and Cov(X_k) = Σ ∈ ℝ^{p×p}, where Σ > 0 can be expressed as a partitioned matrix Σ = (Σ_ij)_{i,j=1}^b with blocks Σ_ij ∈ ℝ^{p_i×p_j}, Σ_ji = Σ_ij^T, Σ_ii^T = Σ_ii. We are interested in testing

(1) H_0b: Σ_ij = 0 ∀ i < j vs. H_1b: Σ_ij ≠ 0 for at least one pair i < j,

when the block dimensions p_i may exceed the sample size n and the data may not necessarily follow the multivariate normal distribution. Under H_0b, Σ reduces to a block-diagonal structure, Σ = diag(Σ_11, …, Σ_bb), Σ_ii ∈ ℝ^{p_i×p_i}. Obviously, under normality, the test of H_0b is equivalent to testing independence of the corresponding vectors. Now consider g ≥ 2 independent populations with X_lk = (X_lk1, …, X_lkp)^T, k = 1, …, n_l, iid random vectors drawn from the lth population with E(X_lk) = μ_l; the one-sample test is extended to the corresponding test of H_0g in Sect. 4. Accuracy of the tests is shown through simulations in Sect. 5, and technical proofs are deferred to the Appendix.

Notations and Preliminaries
Let the random vectors X_k ∈ ℝ^p, k = 1, …, n, with E(X_k) = μ ∈ ℝ^p and Cov(X_k) = Σ ∈ ℝ^{p×p}, be partitioned as X_k = (X_1k^T, …, X_bk^T)^T, X_ik ∈ ℝ^{p_i}, with μ and Σ partitioned accordingly, where ℝ^{a×b} is the space of real a × b matrices and ⊗ is the Kronecker product. We assume Σ > 0 and Σ_ii > 0 ∀ i. Let X̄ = ∑_{k=1}^n X_k/n and Σ̂ = ∑_{k=1}^n (X̃_k ⊗ X̃_k)/(n − 1) be unbiased estimators of μ and Σ, likewise partitioned, so that X̄_i = ∑_{k=1}^n X_ik/n and Σ̂_ij = ∑_{k=1}^n (X̃_ik ⊗ X̃_jk)/(n − 1) are unbiased estimators of μ_i and Σ_ij, where X̃_k = X_k − X̄ and X̃_ik = X_ik − X̄_i. Denoting X = (X_1, …, X_n)^T ∈ ℝ^{n×p} as the entire data matrix and X_i = (X_i1, …, X_in)^T ∈ ℝ^{n×p_i} as the data matrix for the ith block, we can express the estimators more succinctly using C = I − J/n, the n × n centering matrix, where I is the identity matrix and J = 11^T with 1 a vector of 1s. The null hypotheses in (1) thus assert the nullity of all off-diagonal blocks Σ_ij of Σ. Consider first the simplest case of b = 2, with only one off-diagonal block Σ_12 = Cov(X_1k, X_2k). Obviously, a test of zero correlation (or, under normality, of independence) between X_1k and X_2k can be based on an empirical estimator of ‖Σ_12‖² = tr(Σ_12 Σ_21), or on an appropriately normed version of it. One such normed measure, proposed in [5], is given in Eq. (3), where ‖Σ̂_ij‖² is an unbiased (and consistent) estimator of ‖Σ_ij‖², i, j = 1, 2; see Sect. 3 for the formal definition. The resulting ρ̂ is a modified form of the original RV coefficient, ρ̃ = ‖Σ̂_12‖²/(‖Σ̂_11‖‖Σ̂_22‖), itself an extension of the scalar correlation coefficient to measure correlation between vectors of possibly different dimension. Note that ρ̂ is constructed by using unbiased estimators in the true RV coefficient ρ = ‖Σ_12‖²/(‖Σ_11‖‖Σ_22‖). The RV coefficient extends the usual scalar correlation coefficient to data vectors of possibly different dimension, is often used to study the relationship between different data configurations, and has many attractive properties; see also [25].
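As a concrete illustration, the plug-in (original) RV coefficient ρ̃ described above can be computed directly from two data blocks. The following Python sketch implements only this classical plug-in version, not the paper's modified coefficient, which replaces the squared Frobenius norms by the unbiased estimators of Sect. 3:

```python
import numpy as np

def rv_plugin(X1, X2):
    """Plug-in RV coefficient ||S12||^2 / (||S11|| ||S22||) between two
    data blocks whose rows are the n common observations.

    This is the classical plug-in version; the modified coefficient of [5]
    replaces the squared norms by unbiased estimators (not shown here)."""
    n = X1.shape[0]
    X1c = X1 - X1.mean(axis=0)              # mean-deviated blocks
    X2c = X2 - X2.mean(axis=0)
    S11 = X1c.T @ X1c / (n - 1)             # empirical covariance blocks
    S22 = X2c.T @ X2c / (n - 1)
    S12 = X1c.T @ X2c / (n - 1)
    return np.sum(S12 ** 2) / np.sqrt(np.sum(S11 ** 2) * np.sum(S22 ** 2))
```

By construction the coefficient lies in [0, 1], with rv_plugin(X, X) = 1, mirroring the behavior of the scalar squared correlation.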
Based on the aforementioned modification and the resulting test of zero correlation, we extend the same concept here to construct tests of the hypotheses in (1) and their multi-sample variants. The proposed tests are valid under the following general multivariate model, which helps us relax the normality assumption.
We define the multivariate model in Eq. (4), where F denotes a p-dimensional distribution function with finite fourth moment; see the assumptions below. Model (4) covers a wide class of distributions, including the elliptical class and, in particular, the multivariate normal. Further, it helps extend the validity of the proposed tests to a variety of applications under fairly general conditions.

One-Sample Test with Multiple Blocks
For any pair of blocks X_ik and X_jk, the RV coefficient of correlation between them can be computed as in Eq. (5), which implies that a test for Σ_ij = 0 can equivalently be based on ρ_ij. In fact, as will be shown shortly, the denominator of ρ̂_ij adjusts itself through Var(‖Σ̂_ij‖²), so that it suffices to use ‖Σ̂_ij‖² to construct a test statistic. We begin by defining the estimators used to compose ρ̂_ij and their moments, which will also be useful in subsequent computations. For notational convenience, write the square root Σ_ii^{1/2}, so that ‖Σ_ii^{1/2}‖² = tr(Σ_ii) and, correspondingly, ‖Σ̂_ii^{1/2}‖² = tr(Σ̂_ii). For the proof of the following theorem, see "Appendix 1.2.1".

Theorem 1
The unbiased estimators of ‖Σ_ii‖², ‖Σ_ij‖², and ‖Σ_ii‖‖Σ_jj‖ are defined in Eqs. (6)–(9). Note that the terms Q_ij are needed to make the estimators, and hence the subsequent test statistics, valid under Model (4) beyond the normality assumption. Essentially, these terms involve fourth-order moments of X_ik, arising from the variances of the bilinear forms to be computed. This in turn calls for such moments of X_ik to be bounded (see the assumptions below). For this, define κ_ij as in Eq. (10), with z_ik = X_ik^T X_ik. As κ_ij = 0 under normality, it serves as a measure of non-normality and, given a finite fourth moment, helps extend the results to a wide class of distributions under Model (4). The results of Theorem 1 also help obtain unbiased estimators of ρ_ij and ρ_ii (see "Appendix 1.2.1"). The estimators in Theorem 1 are composed of Σ̂_ij (see Eq. 2) and Q_ij, both of which are simple functions of the mean-deviated vectors X̃_ik. This makes the estimators very simple and computationally highly efficient for practical use. For mathematical amenability, however, an alternative form of the same estimators in terms of U-statistics helps us study their properties through the attractive projection and asymptotic theory of U-statistics.
Here the sum is quadruple, over {k, l, r, s}, and (⋅) = (k, r, l, s) indicates k ≠ r ≠ l ≠ s. The moments in the following theorem follow conveniently from the alternative form of the estimators in (13) and will be very useful for further computations in the sequel.
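To see numerically why bias-correcting terms such as Q_ij are needed, one can check that the naive plug-in estimator of ‖Σ_12‖² is far from unbiased in high dimensions. The following Monte Carlo sketch (hypothetical dimensions, standard normal data) shows that its expectation under Σ_12 = 0 is approximately p_1 p_2/(n − 1) rather than 0:

```python
import numpy as np

def mean_plugin_norm_under_null(n=30, p1=40, p2=60, reps=300, seed=0):
    """Monte Carlo mean of the plug-in ||S_12||_F^2 when Sigma_12 = 0.

    With independent N(0, I) blocks, the true ||Sigma_12||^2 is 0, yet
    E||S_12||^2 = p1 * p2 / (n - 1), which is large when p_i >> n; this
    is the kind of bias the unbiased estimators of Theorem 1 remove."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(reps):
        X1 = rng.standard_normal((n, p1))        # independent blocks
        X2 = rng.standard_normal((n, p2))
        X1c = X1 - X1.mean(axis=0)
        X2c = X2 - X2.mean(axis=0)
        S12 = X1c.T @ X2c / (n - 1)              # plug-in covariance block
        vals.append(np.sum(S12 ** 2))
    return float(np.mean(vals))
```

For the defaults above, p_1 p_2/(n − 1) ≈ 82.8, so the plug-in estimator is severely biased upward even though the parameter it targets is exactly zero.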
We skip the proof of Theorem 2, which follows from that of Theorem 2 in [5]. The terms K and O(⋅) collect constants that eventually vanish under the assumptions. In particular, K involves κ_ij in (10) and terms involving Hadamard products, such as Σ_ii ⊙ Σ_jj, which converge to zero. We now have the required tools to proceed with the test of H_0b in (1). As mentioned above, a test of H_0b can be based on a sum of Frobenius norms over all off-diagonal blocks. We thus define the test statistic for H_0b as in Eq. (17). Here, T_ij is a statistic to test H_0ij: Σ_ij = 0 for any single off-diagonal block Σ_ij. Moreover, the scaling factor p_i p_j will help us obtain the limit of T_b as p_i → ∞ along with n → ∞, under the following assumptions. Recall X_k ∈ ℝ^p in Model (4).
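A minimal sketch of assembling such a statistic from data blocks follows. Note the simplification: plug-in squared norms stand in for the unbiased estimators of Theorem 1 that compose the paper's T_ij, so this only illustrates the structure of the sum over off-diagonal blocks with the p_i p_j scaling:

```python
import numpy as np
from itertools import combinations

def t_b(blocks):
    """Sketch of a one-sample multi-block statistic of the form
    sum_{i<j} ||Sigma_ij_hat||^2 / (p_i * p_j).

    `blocks` is a list of (n x p_i) arrays sharing the same n rows
    (observations). NOTE: the paper's T_ij use bias-corrected norm
    estimators; the simple plug-in ||S_ij||_F^2 is used here instead."""
    n = blocks[0].shape[0]
    centered = [B - B.mean(axis=0) for B in blocks]   # mean-deviated blocks
    total = 0.0
    for i, j in combinations(range(len(blocks)), 2):  # all pairs i < j
        S_ij = centered[i].T @ centered[j] / (n - 1)
        total += np.sum(S_ij ** 2) / (blocks[i].shape[1] * blocks[j].shape[1])
    return total
```

For b = 2 the sum reduces to the single term ‖S_12‖²/(p_1 p_2), matching the two-block case of Sect. 2.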
Assumption 3 bounds the fourth moment of X_k so that, by the Cauchy–Schwarz inequality, E(y²_k1s y²_k2s) < ∞, which implies that κ_ij/(‖Σ_ii‖²‖Σ_jj‖²) → 0. This conforms to the definition of κ_ij and helps us present the test under Model (4); it also makes the terms involving K vanish in Theorem 2. Assumption 4 bounds the average of the eigenvalues of the diagonal blocks. It is a mild assumption, often used in high-dimensional testing, and as its consequence, ‖Σ_ij‖²/(p_i p_j) = O(1).
Whereas Assumption 4 is needed for the limits under both H_0b and H_1b, its consequence is only needed under H_1b, since it neither holds nor is required under H_0b. Assumption 5 controls the joint growth of n and the p_i so that the limit holds under a high-dimensional framework; it too is needed only under H_1b. Now, for T_b, we have the following moments: Eq. (18) gives Var(T_ij), and the covariances, say C_1, C_2, C_3, are given in Eqs. (19)–(21) below.
Under H_0b, the covariances vanish and the variance reduces to Eq. (22), so that (18) is bounded and a nondegenerate limit of nT_b may exist. That this indeed holds under the assumptions follows from the limit of T_ij given in Ahmad ([5], Theorem 3), by noting that the covariance terms above essentially vanish under the same assumptions. The following theorem summarizes the result; its proof, along with that of Theorem 12 for the multi-sample case, is given in "Appendix 1.2.3".

Theorem 6

Given T_b in (17) and Var(T_b) as in Eq. (22), the appropriately standardized nT_b has a standard normal limit as n, p_i → ∞. Further, the same limits hold when Var(T_b) is replaced by a consistent estimator. The last part of Theorem 6 makes the statistic T_b applicable in practice, once a consistent variance estimator is plugged in. From the proof of the theorem, we also note that the moments and the limit of T_b are functions of the parameters in the subspaces corresponding to H_0b and H_1b, respectively. Let z_α denote the 100α% quantile of the N(0, 1) distribution, so that H_0b is rejected when the standardized statistic exceeds z_{1−α}; similar arguments imply that the local power also converges to 1 as n, p_i → ∞.
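The resulting decision rule can be sketched as below, with `var_hat` standing for any consistent estimator of Var(nT_b); the paper's exact variance estimator is not reproduced here:

```python
from statistics import NormalDist

def reject_h0b(n, t_b_value, var_hat, alpha=0.05):
    """One-sided test based on the normal limit of n*T_b:
    reject H_0b when n*T_b / sqrt(var_hat) > z_{1-alpha}.

    `var_hat` is assumed to be a consistent estimate of Var(n*T_b)."""
    z = NormalDist().inv_cdf(1.0 - alpha)        # upper-alpha N(0,1) quantile
    return n * t_b_value / var_hat ** 0.5 > z
```

The rule is one-sided because T_b accumulates squared norms, which are inflated only under the alternative.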

Multi-Sample Extension with Multiple Blocks
Consider the multi-sample setup given after (1). Let X̄_l and Σ̂_l be the unbiased estimators of μ_l and Σ_l, correspondingly partitioned. Denoting X_l = (X_l1, …, X_ln_l)^T ∈ ℝ^{n_l×p} as the entire data matrix for the lth sample, with X_li = (X_li1^T, …, X_lin_l^T)^T ∈ ℝ^{n_l×p_i} as the ith data matrix in X_l, we can rewrite the estimators using C_l = I_{n_l} − J_{n_l}/n_l, l = 1, …, g. The RV coefficient in Eq. (3) can now be computed from the lth sample, l = 1, …, g, as ρ̂_lij, which estimates ρ_lij = ‖Σ_lij‖²/(‖Σ_lii‖‖Σ_ljj‖). We are interested in testing the hypotheses in (1) for the common Σ, assuming Σ_l = Σ ∀ l, i.e., in testing whether Σ is block diagonal, Σ = diag(Σ_11, …, Σ_bb). Formally, we state the hypothesis H_0g, where the subscript g refers to the g populations. A test of H_0g can thus be constructed by pooling information from the g populations. For this, we state Assumptions 3–5 for the multi-sample case.
Assumption 9 states the joint growth condition for n_l, p_i → ∞ for each sample. Note that these are unbiased estimators of ‖Σ_ii‖², ‖Σ_ij‖², (‖Σ_ii‖²)², and ‖Σ_ii‖‖Σ_jj‖, respectively, obtained from sample l, so that they can be used to construct pooled estimators of the same parameters, as follows.
The pooled estimators are given in Eqs. (31)–(34). In pooling information across the g samples, Eqs. (31)–(34) use weights 1/ν_l, and the pooled estimators correspond to Σ = (Σ_ij)_{i,j=1}^b, for which H_0g is defined. Thus, an appropriate test statistic for H_0g can be defined as in Eq. (35), which extends Eq. (17) to the multi-sample case under homogeneity. Equivalently, we can write it as in Eq. (36). In this form, T_lij, and hence T_lb, are the same as T_ij and T_b in Sect. 3, but now defined for the lth population. In either case, the main focus in the formulation of T_g is simplicity, so that, by the independence of the g samples, the computations for the one-sample case mainly suffice for the multi-sample case, as, for example, in the results of the following theorem; see "Appendix 1.2.2" for the proof.
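The structure of the multi-sample statistic, a sum of per-sample block statistics over g independent samples as in Eq. (36), can be sketched as follows. Again, plug-in norms stand in for the paper's pooled unbiased estimators and their weights, so this only illustrates how information is accumulated across samples:

```python
import numpy as np
from itertools import combinations

def t_g(samples):
    """Sketch of a multi-sample statistic as a sum of per-sample block
    statistics over g independent samples.

    `samples[l][i]` is the (n_l x p_i) data matrix of block i in sample l.
    NOTE: the paper pools unbiased estimators with per-sample weights
    (Eqs. 31-34); the plug-in ||S_ij||_F^2 is used here instead."""
    total = 0.0
    for blocks in samples:                            # independent samples
        n_l = blocks[0].shape[0]
        centered = [B - B.mean(axis=0) for B in blocks]
        for i, j in combinations(range(len(blocks)), 2):
            S_ij = centered[i].T @ centered[j] / (n_l - 1)
            total += np.sum(S_ij ** 2) / (blocks[i].shape[1] * blocks[j].shape[1])
    return total
```

By independence of the samples, the statistic is additive over l, which is exactly why the one-sample computations carry over.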

Theorem 11
The pooled estimators ‖Σ̂_pii‖² and ‖Σ̂_pij‖² in Eqs. (31)–(32) are unbiased for ‖Σ_ii‖² and ‖Σ_ij‖², respectively; further, under Assumptions 7–9, the stated limits hold as n, p_i → ∞. Theorem 11 provides bounds on the variance ratios, which are what matter for our purpose; the exact variances, which follow from those of the single-sample case in Theorem 2, are given in "Appendix 1.2.2". Moreover, Eq. (38) also implies the corresponding consistency. It is obvious from the construction and moments of T_g that its limit, by the independence of the samples, follows along the same lines as that of T_b without much new computation; the same holds for the consistency of ‖Σ̂_lii‖²/p_i², using that of ‖Σ̂_ii‖²/p_i², so that, plugged in, these yield V̂ar(T_g) as a consistent estimator of Var(T_g), where V_pij = ‖Σ̂_pii‖²‖Σ̂_pjj‖²/(p_i² p_j²). The following theorem extends Theorem 6 to the multi-sample case; see "Appendix 1.2.3" for the proof.

Simulations
We evaluate the performance of T_b and T_g through simulations. For the one-sample multi-block statistic T_b, we take b = 3 and sample random vectors of sizes n ∈ {20, 50, 100} from multivariate normal, t, and chi-square distributions, with 10 degrees of freedom for each of the latter two, and block dimensions p_1 ∈ {10, 25, 50, 150, 300}, p_2 = 2p_1, p_3 = 3p_1. Two patterns are assumed for the true covariance matrix, compound symmetry (CS) and AR(1), defined, respectively, as Σ = (1 − ρ)I_p + ρJ_p and Σ = DRD, where R has (k, l)th element ρ^{|k−l|/5} and D is a diagonal matrix with entries the square roots of k + 1/p, k = 1, …, p; here I_p is the identity matrix and J_p is the matrix of 1s. We set ρ = 0.5. Under H_0b, the same structures are imposed on the three diagonal blocks Σ_ii of dimensions p_i, so that Σ = ⊕_{i=1}^3 Σ_ii, where ⊕ denotes the direct sum. For the multi-sample statistic T_g, we take g = 2 with b = 2 and generate n_1 ∈ {20, 40, 75} and n_2 ∈ {30, 50, 100} iid vectors from the respective populations, with p_1 ∈ {20, 50, 100, 300, 500} and p_2 = 2p_1. Under homoscedasticity, the common Σ is assumed to follow the two covariance structures under the alternative, whereas under the null the two diagonal blocks follow the same structures with their respective dimensions. The estimated size and power of T_b and T_g, reported in Tables 1, 2, 3 and 4, respectively, are averages over 1000 simulation runs.
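Covariance structures of this kind and the block-diagonal null can be generated as below. This is a simplified sketch: the AR form keeps only the correlation part ρ^{|k−l|/5} and omits the diagonal scaling matrix D of the design above:

```python
import numpy as np

def cs_cov(p, rho=0.5):
    """Compound symmetry: (1 - rho) * I_p + rho * J_p."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def ar_cov(p, rho=0.5):
    """AR-type correlation with (k, l)th element rho^{|k - l|/5}
    (the diagonal scaling D of the full design is omitted here)."""
    idx = np.arange(p)
    return rho ** (np.abs(idx[:, None] - idx[None, :]) / 5.0)

def direct_sum(blocks):
    """Block-diagonal (direct-sum) matrix, as used for the null H_0b."""
    p = sum(B.shape[0] for B in blocks)
    out = np.zeros((p, p))
    k = 0
    for B in blocks:
        m = B.shape[0]
        out[k:k + m, k:k + m] = B                # place block on the diagonal
        k += m
    return out
```

Data for a simulation run can then be drawn with a matrix square root of the chosen Σ applied to iid (normal, t, or centered chi-square) vectors.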
We observe accurate test size for both T_b and T_g across all parameter settings and distributions. For the multi-sample case, the test performs relatively more accurately even though the sample sizes differ considerably. Generally, both tests are conservative for relatively small sample sizes but improve with increasing sample size; in particular, the accuracy of the tests remains intact as the dimension increases. The results under the t and chi-square distributions point to the robustness of the tests to the normality assumption under Model (4).
Similar performance can be witnessed for the power of both test statistics. As with the test size, whose accuracy increases with the sample size, the power improves with increasing sample size, and also with increasing dimension, even for widely differing block dimensions. In particular, for the multi-sample case, we observe accurate performance of T_g for unbalanced designs, and the accuracy is not disturbed by increasing dimension. The robustness of the two tests likewise resembles that observed for the test size.

Discussion and Conclusions
Test statistics for correlation between two or more vectors are presented when the dimensions of the vectors, possibly unequal, may exceed the number of vectors. The one-sample case is further extended to two or more independent samples drawn from populations assumed to have a common covariance matrix. The accuracy of the tests is shown through simulations with different parameters. Potential advantages of the tests include their simple construction, particularly for the multi-sample case, where most computations follow conveniently from the one-sample results, and their wide practical applicability under fairly general conditions for a large class of multivariate models including the multivariate normal. A particularly distinguishing feature of the tests is that they are composed of computationally very efficient estimators defined as simple functions of the empirical covariance matrix. From an applications perspective, it may be mentioned that the tests are constructed using the RV coefficient, so that they can only be used to assess linear independence of vectors. This distinguishes them from measures such as distance correlation or kernel methods, which can also capture nonlinear dependence; see [5] for more details.

Some Basic Moments
Given Model (4), with Y_ik = X_ik − μ_i, let A_ik = Y_ik^T Y_ik, A_ikr = Y_ik^T Y_ir, k ≠ r, and let κ_ii, κ_ij be as in Eq. (10); then the moments in the following theorem hold under Model (4).

Proof of Theorem 1
Since we can write Σ̂_ij as in Eq. (2), using Σ̂_ij and Theorem 13, Eqs. (7) and (9) can be obtained from 'Appendix B.1' in [4] by replacing 1 with i and 2 with j. They express ‖Σ̂_ij‖², ‖Σ̂_ii‖²‖Σ̂_jj‖², and Q_ij in terms of functions of A_ik and A_ikr, defined above, so that, taking expectations, the stated equalities follow. Solving these equations simultaneously gives Eqs. (7) and (9), which can be used to show the unbiasedness of ρ̂_ij. Following the same lines, and using Σ̂_ii above, we can write the corresponding expansion, where, for simplicity, A_01, A_02, A_03 contain terms whose expectation vanishes, so that, using Theorem 13, the stated equalities follow after some simplification. Solving simultaneously gives Eqs. (6) and (8), and also the unbiasedness of ρ̂_ii in Eq. (12).

Proof of Theorem 11
Under the assumption Σ_l = Σ ∀ l = 1, …, g, the unbiasedness follows immediately since

Proofs of Theorems 6 and 12
In Ahmad ([5], Theorem 3), it is shown that, under Assumptions 3–5, as n, p_i → ∞, the limit of T_ij holds, where Var(T_ij) follows from Theorem 2. As the limit holds for all T_ij, i < j, and T_b is a sum of all such T_ij, we basically need to focus on the covariances in Eqs. (19)–(21) for the limit of T_b. Consider C_1, where the first term, normed by ‖Σ_ii‖²‖Σ_jj‖‖Σ_jj′‖, vanishes under Assumptions 3–5 as p_i → ∞, and the same holds for the second term. Repeating the same for C_2 and C_3, and noting that terms like ‖Σ_ii‖²/p_i² are uniformly bounded under the same assumptions, it follows that the covariances vanish, where ρ_ij = ‖Σ_ij‖²/(‖Σ_ii‖‖Σ_jj‖); see Eq. (5). Combined with Eq. (22), this implies that Var(nT_b) is bounded, while the covariances of the same order vanish. Now denote by B the vector collecting the T_ij and by 1 the vector of 1s. By the above arguments, as n, p_i → ∞, Cov(B) is a diagonal matrix with diagonal elements Var(T_ij), i, j = 1, …, b, i < j, i.e., with blocks Δ_i = diag(Var(T_i1), …, Var(T_ib)), i = 1, …, b − 1. Hence (see Eq. 22), the limit of T_b follows by a simple application of the Cramér–Wold device ([31], p. 16), including the case under the null, where the covariances vanish exactly. For the last part of the theorem, we note from Theorem 2 the expansion of the variance of the norm estimator. Its first and last terms vanish under the assumptions, so that Var(‖Σ̂_ii‖²/‖Σ_ii‖²) is of order O(1/n²) as p_i → ∞. Thus, ‖Σ̂_ii‖²/p_i² converges in probability to ‖Σ_ii‖²/p_i² as n, p_i → ∞. Plugging in ‖Σ̂_ii‖² for ‖Σ_ii‖² in Var(T_b) gives V̂ar(T_b) as a consistent estimator of Var(T_b). This completes the proof of Theorem 6.