
Two-sample mean vector projection test in high-dimensional data


Abstract

Statistical hypothesis testing for high-dimensional data poses challenges for modern inference. The Hotelling test is commonly used to compare mean vectors when their dimension is fixed, but it becomes unavailable when the dimension diverges or exceeds the sample sizes. For high-dimensional regimes, we propose a two-sample mean vector test statistic that adds a projection term based on the Euclidean norm of the mean vectors. The projection term improves power and ensures validity when the dimension exceeds the sample sizes, without relying on any inverse matrices. Suitably standardized, the proposed projection statistic is approximately standard normal under mild conditions. Extensive simulation results, under different scenarios, show that the proposed approach maintains a comparable Type I error and achieves improved power. We further illustrate its application by testing the equality of mean vectors for two acute lymphocytic leukemia genetic data sets and by a significance test of the “Sell in May and Go Away” effect in the China A-share stock market.


Acknowledgements

The authors are grateful to the Associate Editor and referees for their useful comments that enabled us to improve the paper.

Funding

Funding was provided by the National Natural Science Foundation of China (Grant No. 11871173) and the National Statistical Science Research Project (Grant No. 2020LZ09).

Author information

Corresponding author

Correspondence to Xia Cui.


Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 1459 KB)

Appendix 1


\(\mathbf{A}\) Under \(H_0\) and the conditions \((a)\), \((b_2)\) and \((c_2)\), the expectation and the variance of the statistic \(T_n\) are

$$\begin{aligned} E(T_n) =\,&\text {tr}(D\Sigma _1/n_1) + \text {tr}(D\Sigma _2/n_2), \\ \text{ Var }({T_{n}}) =\,&\left[ \frac{2}{n_1^2} \text {tr}\{(D\Sigma _1)^2\} + \frac{2}{n_2^2} \text {tr}\{(D\Sigma _2)^2\} + \frac{4}{n_1n_2}\text {tr}(D\Sigma _2D\Sigma _1) \right] \{1 + o(1 )\}. \end{aligned}$$

In particular, assuming \(\Sigma = \Sigma _1 = \Sigma _2\), under \(H_0\) and conditions \((a), (b_1)\), \((c_1)\), the expectation and the variance of the statistic \(T_n\) simplify to

$$\begin{aligned} E(T_n) =\,&\tau \text {tr}(D\Sigma ), \\ \text{ Var }({T_{n}}) =\,&2{\tau ^2}\text {tr}{\{(D\Sigma )^2 \}}\{1 + o(1 )\}. \end{aligned}$$
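These two displays can be checked numerically before working through the proof. The following Python sketch is our illustration, not part of the paper: it simulates Gaussian data under \(H_0\) with a common covariance matrix, so that the skewness and kurtosis terms appearing later in the proof vanish, and compares the empirical mean and variance of \(T_n\) with \(\tau \text {tr}(D\Sigma )\) and \(2\tau ^2\text {tr}\{(D\Sigma )^2\}\). The AR(1)-type covariance and the diagonal matrix D used here are placeholder choices; the paper's own projection-based D is defined in the main text and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n1, n2, reps = 50, 30, 40, 20_000            # dimension, sample sizes, Monte Carlo replications
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # illustrative AR(1) covariance
D = np.diag(rng.uniform(0.5, 1.5, size=m))      # placeholder diagonal weighting matrix
L = np.linalg.cholesky(Sigma)

# Under H0 with Gaussian data the sample means are exactly N(0, Sigma/n_i), so draw them directly.
xbar1 = rng.standard_normal((reps, m)) @ L.T / np.sqrt(n1)
xbar2 = rng.standard_normal((reps, m)) @ L.T / np.sqrt(n2)
diff = xbar1 - xbar2
Tn = np.einsum("ij,jk,ik->i", diff, D, diff)    # T_n = (xbar1 - xbar2)^T D (xbar1 - xbar2)

tau = (n1 + n2) / (n1 * n2)
DS = D @ Sigma
print(Tn.mean(), tau * np.trace(DS))            # compare with E(T_n) = tau tr(D Sigma)
print(Tn.var(), 2 * tau**2 * np.trace(DS @ DS)) # compare with Var(T_n) = 2 tau^2 tr{(D Sigma)^2}
```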

Proof

The proposed statistic \({T_{n}}\) can be rewritten as

$$\begin{aligned} {T_{n}}&= ({{\bar{\textbf {X}}}_1}-{{\bar{\textbf {X}}}_2})^TD({{\bar{\textbf {X}}}_1}-{{\bar{\textbf {X}}}_2}) = {{\bar{\textbf {X}}}_1}^T D{{\bar{\textbf {X}}}_1} + {{\bar{\textbf {X}}}_2}^T D{{\bar{\textbf {X}}}_2 }-2{{\bar{\textbf {X}}}_1}^T D{{\bar{\textbf {X}}}_2}. \end{aligned}$$

Thus,

$$\begin{aligned} E(T_n) =\,&E({\bar{{\textbf {X}}}}_1^T D {\bar{{\textbf {X}}}}_1 + {\bar{{\textbf {X}}}}_2^T D {\bar{{\textbf {X}}}}_2 - 2{\bar{{\textbf {X}}}}_1^T D {\bar{{\textbf {X}}}}_2)\\ =\,&\text {tr}(D\Sigma _1)/n_1 + \varvec{\mu }_1^T D \varvec{\mu }_1 + \text {tr}(D\Sigma _2)/n_2 + \varvec{\mu }_2^T D \varvec{\mu }_2 - 2\varvec{\mu }_1^TD \varvec{\mu }_2\\ =\,&(\varvec{\mu }_1-\varvec{\mu }_2)^T D (\varvec{\mu }_1-\varvec{\mu }_2) + \text {tr}(D\Sigma _1)/n_1 + \text {tr}(D\Sigma _2)/n_2, \end{aligned}$$

where

$$\begin{aligned} E({\bar{{\textbf {X}}}}_1^T D {\bar{{\textbf {X}}}}_1) =\,&E({\bar{{\textbf {X}}}}_1 - \varvec{\mu }_1 +\varvec{\mu }_1)^T D({\bar{{\textbf {X}}}}_1 - \varvec{\mu }_1 +\varvec{\mu }_1)\\ =\,&E({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)^T D ({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1) + E({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)^TD\varvec{\mu }_1 + \varvec{\mu }_1^TDE({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)+\varvec{\mu }_1^TD\varvec{\mu }_1\\ =\,&E\left[ \text {tr}\{D({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)^T\}\right] +\varvec{\mu }_1^TD\varvec{\mu }_1\\ =\,&\text {tr}\left[ E\{D({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)^T\} \right] + \varvec{\mu }_1^TD\varvec{\mu }_1\\ =\,&\text {tr}(D\Sigma _1)/n_1 + \varvec{\mu }_1^TD\varvec{\mu }_1. \end{aligned}$$

Similarly, \(E({\bar{{\textbf {X}}}}_2^T D {\bar{{\textbf {X}}}}_2) = \text {tr}(D\Sigma _2)/n_2 + \varvec{\mu }_2^TD\varvec{\mu }_2\). Since \({\bar{{\textbf {X}}}}_1\) and \({\bar{{\textbf {X}}}}_2\) are independent, we have that

$$\begin{aligned} E(2{\bar{{\textbf {X}}}}_1^TD{\bar{{\textbf {X}}}}_2) =\,&2 E \left\{ ({\bar{{\textbf {X}}}}_1 -\varvec{\mu }_1 +\varvec{\mu }_1)^T D ({\bar{{\textbf {X}}}}_2 -\varvec{\mu }_2 +\varvec{\mu }_2)\right\} \\ =\,&2E\left\{ ({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)^T D ({\bar{{\textbf {X}}}}_2-\varvec{\mu }_2)\right\} + 2E({\bar{{\textbf {X}}}}_1-\varvec{\mu }_1)^TD\varvec{\mu }_2\\&+ 2\varvec{\mu }_1^TDE({\bar{{\textbf {X}}}}_2-\varvec{\mu }_2) + 2\varvec{\mu }_1^TD\varvec{\mu }_2\\ =\,&2\varvec{\mu }_1^TD\varvec{\mu }_2. \end{aligned}$$

Hence, under \(H_0\), \(E(T_n) = \text {tr}(D\Sigma _1)/n_1 + \text {tr}(D\Sigma _2)/n_2\). In case of identical covariance matrices, \(E(T_n) = \tau \text {tr}(D\Sigma )\), where \(\tau = (n_1+n_2)/(n_1n_2)\).

Here, we derive the variance of \(T_n\) under some mild conditions. First, we write

$$\begin{aligned} \text{ Var }({T_{n}}) =\,&\text{ Var }({\bar{\textbf {X}}^T_1}D{{\bar{\textbf {X}}}_1}) + \text{ Var }({{\bar{\textbf {X}}}^T_2}D{{\bar{\textbf {X}}}_2}) + 4\text{ Var }({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_2}) \\&+ 2\text{ Cov }({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1},{{\bar{\textbf {X}}}^T_2} D{{\bar{\textbf {X}}}_2})-4\text{ Cov }({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1},{{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_2} ) \\&-4\text{ Cov }({{\bar{\textbf {X}}}^T_2}D{{\bar{\textbf {X}}}_2},{{\bar{\textbf {X}}}^T_1} D{{\bar{\textbf {X}}}_2}). \end{aligned}$$

We develop the six terms on the right-hand side of the above equation separately, in Steps 1 to 6.

Step 1: We have

$$\begin{aligned} \text{ Var }({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1}) =\,&E\{({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1})^2\} - {E^2}({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1}) \\ =\,&\frac{\Delta }{{n_1^3}}\sum \limits _{i = 1}^m {\gamma _{1ii}^2} + \frac{2}{{n_1^2}}\text {tr}\{(D\Sigma _1 )^2\} \nonumber \\&+ \frac{4}{{{n_1}}}{{{\varvec{\mu }^T_1}}}D\Sigma _1 D{{\varvec{\mu }_1}} + \frac{4}{{n_1^2}}(\Gamma _1^T D{{\varvec{\mu }_1}})\textrm{diag}(\Gamma _1^TD\Gamma _1 ){\beta _3^1}. \end{aligned}$$

Here \(\beta _3^1 = E( z_{1ij}^3)\). Denoting \({\bar{{\textbf {Z}}}}_i = \sum _{j=1}^{n_i} Z_{ij}/n_i\), we first compute

$$\begin{aligned} E\{({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1})^2\} =\,&E\left[ \left\{ (\Gamma _1{\bar{{\textbf {Z}}}}_1 + \varvec{\mu }_1)^T D (\Gamma _1{\bar{{\textbf {Z}}}}_1 + \varvec{\mu }_1)\right\} ^2\right] \\ =\,&E\left\{ ({\bar{{\textbf {Z}}}}_1^T\Gamma _1^TD\Gamma _1{\bar{{\textbf {Z}}}}_1 + {\bar{{\textbf {Z}}}}_1^T\Gamma _1^TD\varvec{\mu }_1 + 2\varvec{\mu }_1^TD\Gamma _1{\bar{{\textbf {Z}}}}_1+ \varvec{\mu }_1^TD\varvec{\mu }_1)^2 \right\} \\ =\,&E\left\{ ({\bar{{\textbf {Z}}}}_1^T\Gamma _1^TD\Gamma _1{\bar{{\textbf {Z}}}}_1 + 2{\bar{{\textbf {Z}}}}_1^T\Gamma _1^TD\varvec{\mu }_1 + \varvec{\mu }_1^TD\varvec{\mu }_1)^2 \right\} \\ =\,&E\left\{ {({{\bar{\textbf {Z}}}^T_1}\Gamma _1^T D\Gamma _1 {{\bar{\textbf {Z}}}_1})^2} + 4{({{\bar{\textbf {Z}}}^T_1}\Gamma _1^TD{{\varvec{\mu }_1}})^2} + {({{{\varvec{ \mu }^T_1}}}D{{\varvec{\mu }_1}})^2}\right. \\&\quad \quad \left. +4{\bar{{\textbf {Z}}}}_1^T\Gamma _1^TD\Gamma _1{\bar{{\textbf {Z}}}}_1{\bar{{\textbf {Z}}}}_1^T\Gamma _1^TD\varvec{\mu }_1+ 2{{\bar{\textbf {Z}}}_1^T} \Gamma _1^TD\Gamma _1 {{\bar{\textbf {Z}}}_1}{{{\varvec{\mu }^T_1}}}D{{\varvec{\mu }_1}}\right. \\&\quad \quad \left. + 4{{\bar{\textbf {Z}}}^T_1}\Gamma _1^TD\varvec{\mu }_1\varvec{\mu }_1^TD{{\varvec{\mu }_1}} \right\} ; \end{aligned}$$

Since \(E({\bar{{\textbf {Z}}}}_1) = 0\), and writing \(\Gamma _1^TD\Gamma _1 = [\gamma _{1ij}]_{m \times m}\), we have

$$\begin{aligned} E\{({\bar{\textbf {Z}}_1^T} \Gamma _1^TD\Gamma _1{\bar{\textbf {Z}}_1})^2\} = \sum \limits _{i = 1}^m {\sum \limits _{j = 1} ^m {\sum \limits _{k = 1}^m {\sum \limits _{l = 1}^m {E({\gamma _{1ij}}{{{\bar{z}}}_{1i}}{{{\bar{z}}}_{1j}} {\gamma _{1kl}} {{{\bar{z}}}_{1k}}{{{\bar{z}}}_{1l}})}}}}, \end{aligned}$$

where \({{\bar{z}}_{1k}} = \sum \nolimits _{j = 1}^{{n_1}} {{z_{1jk}}}/n_1,k = 1, \dots ,m\), and

$$\begin{aligned} E({{\bar{z}}_{1i}}{{\bar{z}}_{1j}}{{\bar{z}}_{1k}}{{\bar{z}}_{1l}}) = \left\{ \begin{array}{l} {{{\tilde{m}}}_4},\quad \quad \quad i = j = k = l;\\ \frac{1}{{n_1^2}},\quad \quad \quad i = j \ne k = l;i = k \ne j = l;i = l \ne j = k; \\ 0,\quad \quad \quad \;\; {\text {other}}. \\ \end{array} \right. \end{aligned}$$

According to the assumption (a), \({{\tilde{m}}_4} = E({\bar{z}}_{1k}^4) = \frac{\Delta }{{n_1^3}} + \frac{3}{{n_1^2}}\), where

$$\begin{aligned} E({\bar{z}}_{1k}^4) =\,&\frac{E \left\{ \left( \sum _{j=1}^{n_1}z_{1jk} \right) ^4\right\} }{n_1^4} = \frac{1}{{n_1^4}} \left\{ \sum \limits _{i = 1}^{{n_1}} {( 3 + \Delta ) + 3\mathop {\sum {\sum 1} }\limits _{1 \le i \ne j \le {n_1}}}\right\} \\ =\,&\frac{3n_1+\Delta n_1 + 3(n_1^2 -n_1) }{n_1^4}=\frac{\Delta }{{n_1^3}} + \frac{3}{{n_1^2}},\\ E({\bar{z}}_{1j}^2{\bar{z}}_{1k}^2 ) =\,&\text{ Cov }({\bar{z}}_{1j}^2, {\bar{z}}_{1k}^2) + E({\bar{z}}_{1j}^2) E({\bar{z}}_{1k}^2) =\frac{1}{n_1^2}\cdot \end{aligned}$$

Therefore,

$$\begin{aligned} E\{({{\bar{\textbf {Z}}}_1^T} \Gamma _1^TD\Gamma _1 {{\bar{\textbf {Z}}}_1})^2\} =\,&{{{\tilde{m}}}_4}\sum \limits _{i = 1 }^m {\gamma _{1ii}^2} + \frac{1}{{n_1^2}}\left( \sum \limits _{i \ne k} {{\gamma _{1ii}}{\gamma _ {1kk}} + \sum \limits _{i \ne j} {\gamma _{1ij}^2 + \sum \limits _{i \ne j} {{\gamma _{1ij}}{\gamma _{1ji }}}}}\right) \\ =\,&\tilde{m}_4\sum _{i=1}^{m}\gamma _{1ii}^2 - \frac{3}{n_1^2}\sum _{i=1}^{m}\gamma _{1ii}^2 + \frac{1}{n_1^2} \sum \limits _{i \ne k} {\gamma _{1ii}}{\gamma _ {1kk}} + \frac{1}{n_1^2} \sum _{i=1}^{m}\gamma _{1ii}^2 \\&+ \frac{1}{n_1^2} \sum \limits _{i \ne j} {\gamma _{1ij}^2} + \frac{1}{n_1^2} \sum _{i=1}^{m}\gamma _{1ii}^2 + \frac{1}{n_1^2} \sum \limits _{i \ne j} {\gamma _{1ij}}{\gamma _ {1ji}} + \frac{1}{n_1^2} \sum _{i=1}^{m}\gamma _{1ii}^2 \\ =\,&\frac{\Delta }{{n_1^3}}\sum \limits _{i = 1}^m {\gamma _{1ii}^ 2} + \frac{1}{{n_1^2}}\left[ \text {tr}^2(D\Sigma _1) + 2\text {tr}\{(D\Sigma _1 )^2\}\right] \cdot \end{aligned}$$
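The fourth-moment identity \(E({\bar{z}}_{1k}^4) = \Delta /n_1^3 + 3/n_1^2\) used in this computation is easy to confirm by simulation. The short sketch below is our addition, not part of the proof; it uses centred exponential innovations, for which the excess kurtosis is \(\Delta = 6\).

```python
import numpy as np

rng = np.random.default_rng(1)
n1, reps = 20, 500_000
Delta = 6.0                                 # excess kurtosis E(z^4) - 3 of a centred exponential
z = rng.exponential(size=(reps, n1)) - 1.0  # i.i.d. innovations with mean 0 and variance 1
zbar = z.mean(axis=1)

print((zbar ** 4).mean())                   # empirical E(zbar^4)
print(Delta / n1**3 + 3 / n1**2)            # theoretical Delta/n1^3 + 3/n1^2
```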

Moreover, we have \(E{({\bar{\textbf {Z}}^T_1}\Gamma _1^TD{{\varvec{\mu }_1}})^2} = \frac{1}{{{n_1}}}{{\varvec{\mu }^T_1}}D\Sigma _1 D{{\varvec{\mu }}_1}\) and \(E({\bar{\textbf {Z}}^T_1}\Gamma _1^TD\Gamma _1{\bar{\textbf {Z}}_1}{{\varvec{\mu }^T_1}}D{{\varvec{\mu }}_1}) = \frac{1}{{{n_1}}}\text {tr}(D\Sigma _1 ){{\varvec{\mu }^T_1}}D{{\varvec{\mu }}_1}\), since \(\text{ Var }({\bar{{\textbf {Z}}}}_1) = E({\bar{{\textbf {Z}}}}_1{\bar{{\textbf {Z}}}}_1^T) = {\text {I}}_m/n_1\).

Now, we consider the calculation of \(E({\bar{\textbf {Z}}^T_1}\Gamma _1^TD\Gamma _1{\bar{\textbf {Z}}_1}{{\varvec{\mu }^T_1}}D\Gamma _1{\bar{\textbf {Z}}_1}) = \sum \nolimits _{i = 1}^m {\sum \nolimits _{j = 1}^m {\sum \nolimits _{k = 1}^m {E({\gamma _{1ij}}{b_{1k}}{{{\bar{z}}}_{1i}}{{{\bar{z}}}_{1j}}{{{\bar{z}}}_{1k}})}}}\), where \({{\varvec{\mu }^T_1}}D\Gamma _1 = (b_{11}, \dots , b_{1m})\). Note that

$$\begin{aligned} E({{\bar{z}}_{1i}}{{\bar{z}}_{1j}}{{\bar{z}}_{1k}}) = \left\{ \begin{array}{l} {{{\tilde{m}}}_3},\quad \quad \quad i = j = k; \\ 0,\quad \quad \quad \;\; {\text {other}}, \end{array} \right. \end{aligned}$$

where \({{\tilde{m}}_3} = E({\bar{z}}_{1i}^3) = \frac{1}{{n_1^3}}\sum \nolimits _{j = 1}^{{n_1}} {E( z_{1ij}^3)} = \frac{{{\beta _3^1}}}{{n_1^2}}\). Hence, we have

$$\begin{aligned} E({{\bar{\textbf {Z}}}^T_1}\Gamma _1^TD\Gamma _1{{\bar{\textbf {Z}}}_1}{{{\varvec{\mu }^T_1}}}D\Gamma _1{{\bar{\textbf {Z}}}_1 }) =\,&{{{\tilde{m}}}_3}\sum \limits _{i = 1}^m {{\gamma _{1ii}}{b_{1i}}} = \frac{{{\beta _3^1}}}{{n_1 ^2}}\sum \limits _{i = 1}^m {{\gamma _{1ii}}{b_{1i}}} \nonumber \\ =\,&\frac{{{\beta _3^1}}}{{n_1^2}}(\Gamma _1^T D{{\varvec{\mu }}_1})\textrm{diag}(\Gamma _1^TD\Gamma _1 ), \end{aligned}$$

where \(\textrm{diag}(\Gamma _1^TD\Gamma _1 )\) denotes the m-vector whose entries are the diagonal elements of the matrix \(\Gamma _1^TD\Gamma _1\). Thus,

$$\begin{aligned} E\{({{\bar{\textbf {X}}}_1^T} D{{\bar{\textbf {X}}}_1})^2\} =\,&\frac{\Delta }{{n_1^3}}\sum \limits _{i = 1}^m {\gamma _{1ii}^2} + \frac{1}{{n_1^2}}\left[ \text {tr}{^2}(D\Sigma _1 ) + 2\text {tr}\{(D\Sigma _1)^2\}\right] + \frac{4}{{{n_1}}}{{{\varvec{\mu }}}^T_1}D\Sigma _1 D{{\varvec{\mu }}_1} \\&+ {({{{\varvec{\mu }}}^T_1}D{{\varvec{\mu }}_1})^2}\; + \frac{2}{{{n_1}}}\text {tr}(D\Sigma _1 ){{{\varvec{\mu }}}^T_1}D{{\varvec{\mu }}_1} + \frac{4}{{n_1^2}}(\Gamma _1^T D{{\varvec{\mu }}_1})\textrm{diag}(\Gamma _1^TD\Gamma _1 ){\beta _3^1}; \\ E({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1}) =\,&{{{\varvec{\mu }}}^T_1}D{{\varvec{\mu }}_1} + \frac{1}{{{n_1}}}\text {tr}(D\Sigma _1 );\\ {E^2}({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1}) =\,&{({{{\varvec{\mu }}}^T_1}D{{\varvec{\mu }}_1})^2} + \frac{2}{{{n_1}}}{{{\varvec{\mu }}}^T_1}D{{\varvec{\mu }}_1}\text {tr}(D\Sigma _1) + \frac{1}{{n_1^2}}\text {tr}{^2}(D\Sigma _1 ). \end{aligned}$$

From the above result, we obtain that

$$\begin{aligned} \text{ Var }({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1}) =\,&\frac{\Delta }{{n_1^3}}\sum \limits _{i = 1}^m {\gamma _{1ii}^2} + \frac{2}{{n_1^2}}\text {tr}\{(D\Sigma _1 )^2\} + \frac{4}{{{n_1}}}{{{\varvec{\mu }}}^T_1}D\Sigma _1 D{{\varvec{\mu }}_1} \nonumber \\&+ \frac{4}{{n_1^2}}(\Gamma _1^T D{{\varvec{\mu }}_1})\textrm{diag}(\Gamma _1^T D\Gamma _1 ){\beta _3^1}. \end{aligned}$$
(4)
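Formula (4) can also be verified numerically. The sketch below is an illustration added here, with an arbitrary factor matrix \(\Gamma _1\), diagonal \(D\) and mean \(\varvec{\mu }_1\) as stand-ins; it draws \({\bar{\textbf {X}}}_1 = \Gamma _1{\bar{\textbf {Z}}}_1 + \varvec{\mu }_1\) with centred exponential innovations (\(\beta _3^1 = 2\), \(\Delta = 6\)) and compares the empirical variance of \({\bar{\textbf {X}}}_1^TD{\bar{\textbf {X}}}_1\) with the right-hand side of (4), reading the product \((\Gamma _1^TD\varvec{\mu }_1)\textrm{diag}(\Gamma _1^TD\Gamma _1)\) as an inner product.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n1, reps = 20, 25, 40_000
Gamma1 = rng.uniform(-1, 1, size=(m, m))       # hypothetical factor matrix, Sigma_1 = Gamma1 Gamma1^T
Sigma1 = Gamma1 @ Gamma1.T
D = np.diag(rng.uniform(0.5, 1.5, size=m))     # placeholder diagonal D
mu1 = rng.uniform(-0.5, 0.5, size=m)
beta3, Delta = 2.0, 6.0                        # skewness and excess kurtosis of the innovations

vals = np.empty(reps)
for r in range(reps):
    zbar = (rng.exponential(size=(n1, m)) - 1.0).mean(axis=0)  # zbar_1
    xbar = Gamma1 @ zbar + mu1                                  # Xbar_1 = Gamma_1 zbar_1 + mu_1
    vals[r] = xbar @ D @ xbar                                   # Xbar_1^T D Xbar_1

G = Gamma1.T @ D @ Gamma1
theory = (Delta / n1**3 * np.sum(np.diag(G) ** 2)
          + 2 / n1**2 * np.trace(D @ Sigma1 @ D @ Sigma1)
          + 4 / n1 * mu1 @ D @ Sigma1 @ D @ mu1
          + 4 / n1**2 * beta3 * (Gamma1.T @ D @ mu1) @ np.diag(G))
print(vals.var(), theory)                      # the two numbers should be close
```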

Step 2: With the same developments as in Step 1, we obtain

$$\begin{aligned} \text{ Var }({\bar{\textbf {X}}^T_2}D{\bar{\textbf {X}}_2}) =\,&\frac{\Delta }{{n_2^3}}\sum \limits _{i = 1}^m {\gamma _{2ii}^2} + \frac{2}{{n_2^2}}\text {tr}\{(D\Sigma _2 )^2\} + \frac{4}{{{n_2}}}{{\varvec{\mu }}^T_2}D\Sigma _2 D{{\varvec{\mu }}_2} \nonumber \\&+ \frac{4}{{n_2^2}}(\Gamma _2^TD{{\varvec{\mu }}_2})\textrm{diag}(\Gamma _2^TD\Gamma _2 ){\beta _3^2}, \end{aligned}$$
(5)

where \(\beta _3^2 = {E( z_{2ij}^3)}\) and \(\Gamma _2^TD\Gamma _2 = [\gamma _{2ij}]_{m \times m}\).

Step 3: We have that

$$\begin{aligned} \text{ Var }({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2}) =\,&E\left\{ {({\bar{\textbf {X}}_1^T } D{\bar{\textbf {X}}_2})^2}\right\} -{E^2}({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2}) \\ =\,&\frac{1}{{{n_1}{n_2}}}\text {tr}{(D\Sigma _2 D \Sigma _1)} + \frac{1}{{{n_1}}}{{\varvec{\mu }}^T_2}D\Sigma _1 D{{\varvec{\mu }}_2} + \frac{1}{{{n_2}}}{{\varvec{\mu }}^T_1}D\Sigma _2 D{{\varvec{\mu }}_1}, \end{aligned}$$

where

$$\begin{aligned} E\left\{ ({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2})^2\right\} =\,&E\left\{ ({\bar{\textbf {Z}}^T_1}\Gamma _1^TD\Gamma _2 {\bar{\textbf {Z}}_2} + {\bar{\textbf {Z}}^T_1}\Gamma _1^TD{{\varvec{\mu }}_2} + {{\varvec{\mu }}^T_1}D\Gamma _2 {\bar{\textbf {Z}}_2} + {{\varvec{\mu }}^T_1 }D{{\varvec{\mu }}_2})^2\right\} \\ =\,&E\left\{ {({\bar{\textbf {Z}}^T_1}\Gamma _1^TD\Gamma _2 {\bar{\textbf {Z}}_2})^2 } + {({\bar{\textbf {Z}}^T_1}\Gamma _1^TD{{\varvec{\mu }}_2})^2} + {({{\varvec{\mu }}^T_1}D\Gamma _2 {\bar{\textbf {Z}}_2})^2} + {({{\varvec{\mu }}^T_1}D{{\varvec{\mu }}_2})^2}\right\} . \end{aligned}$$

Here the cross-product terms have zero expectation because \({\bar{{\textbf {Z}}}}_1\) and \({\bar{{\textbf {Z}}}}_2\) are independent with zero means. Writing \(\Gamma _1^TD\Gamma _2 = [\lambda _{ij}]_{m \times m}\), we have

$$\begin{aligned} E{({\bar{\textbf {Z}}^T_1}\Gamma _1^TD\Gamma _2 {\bar{\textbf {Z}}_2})^2} = \sum \limits _{i = 1}^m {\sum \limits _{j = 1}^ m {\sum \limits _{k = 1}^m {\sum \limits _{l = 1}^m {E({\lambda _{ij}}{{{\bar{z}}} _{1i}}{{{\bar{z}}}_{2j}} {\lambda _{kl}} {{{\bar{z}}}_{2k}}{{{\bar{z}}}_{1l}})}}}}, \end{aligned}$$

where

$$\begin{aligned} E({{\bar{z}}_{1i}}{{\bar{z}}_{2j}}{{\bar{z}}_{2k}}{{\bar{z}}_{1l}}) = \left\{ \begin{array}{l} \frac{1}{{{n_1}{n_2}}},\quad \quad \quad i = j = k = l;i = l \ne j = k; \\ 0,\quad \quad \quad \quad \;\; \text {other} \end{array} \right. \end{aligned}$$

Since \(E({\bar{z}}_{1i}^2{\bar{z}}_{2i}^2) = E({\bar{z}}_{1i}^2)E({\bar{z}}_{2i}^2) = \frac{1}{{{n_1}{n_2}}}\) and \(E({\bar{z}}_{1i}^2{\bar{z}}_{2j}^2) = E({\bar{z}}_{1i}^2)E({\bar{z}}_{2j} ^2) = \frac{1}{{{n_1}{n_2}}}\), it follows that

$$\begin{aligned} E\left\{ ({{\bar{\textbf {Z}}}^T_1}\Gamma _1^TD\Gamma _2 {{\bar{\textbf {Z}}}_2})^2\right\} =\,&\frac{1}{{{n_1}{n_2}}}\left( \sum \limits _{i = 1}^m {\lambda _{ii}^2} + \mathop {\sum {\sum {\lambda _{ij}^2}} }\limits _{1 \le i \ne j \le m}\right) \\ =\,&\frac{1}{{{n_1}{n_2}}}\sum \limits _{i = 1}^m {\lambda _{ii}^2} + \frac{1}{{{n_1}{n_2}}}\sum \limits _{i = 1}^m {\sum \limits _{j = 1}^m {\lambda _{ ij}^2}}-\frac{1}{{{n_1}{n_2}}}\sum \limits _{i = 1}^m {\lambda _{ii}^2} \\ =\,&\frac{1}{n_1n_2} \text {tr}(\Gamma _1^T D \Gamma _2 \Gamma _2^T D \Gamma _1)\\ =\,&\frac{1}{{{n_1}{n_2}}}\text {tr}{(D\Sigma _2 D\Sigma _1)}, \end{aligned}$$

and it is straightforward to obtain \(E\left\{ ({\bar{\textbf {Z}}^T_1}\Gamma _1^TD{{\varvec{\mu }}_2})^2\right\} = \frac{1}{{{n_1}}}{{\varvec{\mu }}^T_2}D\Sigma _1D{{\varvec{\mu }}_2}\) and \(E\left\{ ({{\varvec{\mu }}^T_1}D\Gamma _2 {\bar{\textbf {Z}}_2})^2\right\} = \frac{1}{{{n_2}}}{{\varvec{\mu }}^T_1}D\Sigma _2D{{\varvec{\mu }}_1}\); hence, we have

$$\begin{aligned} E\left\{ ({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2})^2\right\} = \frac{1}{{{n_1}{n_2}}}\text {tr}{(D\Sigma _2D\Sigma _1 )} + \frac{1}{{{n_1}}}{{\varvec{\mu }}^T_2}D\Sigma _1 D{{\varvec{\mu }}_2} + \frac{1}{{{n_2}} }{{\varvec{\mu }}^T_1}D\Sigma _2 D{{\varvec{\mu }}_1} + {({{\varvec{\mu }}^T_1}D{{\varvec{\mu }}_2})^2}, \end{aligned}$$

and

$$\begin{aligned} {E^2}({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2}) = {({{\varvec{\mu }}^T_1}D {{\varvec{\mu }}_2 })^2}. \end{aligned}$$

From the above results, we have

$$\begin{aligned} \text{ Var }({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2}) =\,&\frac{1}{{{n_1}{n_2}}}\text {tr}{(D\Sigma _2D\Sigma _1 )} + \frac{1}{{{n_1}}}{{\varvec{\mu }}^T_2}D\Sigma _1 D{{\varvec{\mu }}_2} + \frac{1}{{{n_2}}}{{\varvec{\mu }}^T_1}D\Sigma _2 D{{\varvec{\mu }}_1}. \end{aligned}$$
(6)
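Formula (6) can be checked the same way; since it involves only second moments, Gaussian data suffice. A minimal sketch with arbitrary (assumed) covariances, means and diagonal \(D\), where the sample means are drawn directly from their exact Gaussian distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n1, n2, reps = 15, 20, 30, 100_000
A1 = rng.uniform(-1, 1, size=(m, m)); Sigma1 = A1 @ A1.T   # arbitrary covariance matrices
A2 = rng.uniform(-1, 1, size=(m, m)); Sigma2 = A2 @ A2.T
D = np.diag(rng.uniform(0.5, 1.5, size=m))                 # placeholder diagonal D
mu1, mu2 = rng.uniform(-0.5, 0.5, size=m), rng.uniform(-0.5, 0.5, size=m)
L1, L2 = np.linalg.cholesky(Sigma1), np.linalg.cholesky(Sigma2)

# Sample means of Gaussian data are exactly N(mu_i, Sigma_i / n_i); draw them directly.
xbar1 = mu1 + rng.standard_normal((reps, m)) @ L1.T / np.sqrt(n1)
xbar2 = mu2 + rng.standard_normal((reps, m)) @ L2.T / np.sqrt(n2)
vals = np.einsum("ij,jk,ik->i", xbar1, D, xbar2)           # Xbar_1^T D Xbar_2

theory = (np.trace(D @ Sigma2 @ D @ Sigma1) / (n1 * n2)
          + mu2 @ D @ Sigma1 @ D @ mu2 / n1
          + mu1 @ D @ Sigma2 @ D @ mu1 / n2)
print(vals.var(), theory)                                  # empirical variance vs formula (6)
```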

Step 4: By the independence of \({{\bar{\textbf {X}}}_1}\) and \({{\bar{\textbf {X}}}_2}\), we have

$$\begin{aligned} \text{ Cov }({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_1},{\bar{\textbf {X}}^T_2}D{\bar{\textbf {X}}_2}) = 0. \end{aligned}$$
(7)

Step 5: \(\text{ Cov }({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_1},{\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2}) = E({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_1}{\bar{\textbf {X}}^T_2}D{\bar{\textbf {X}}_1})-E({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_1})E({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2})\). Then, we rewrite

$$\begin{aligned} E({{\bar{\textbf {X}}}^T_1}D{{\bar{\textbf {X}}}_1}{{\bar{\textbf {X}}}^T_2}D{{\bar{\textbf {X}}}_1}) =\,&E({{\bar{\textbf {Z}}}^T _1}\Gamma _1^TD\Gamma _1{{\bar{\textbf {Z}}}_1}{{{\varvec{\mu }}}^T_2}D\Gamma _1{{\bar{\textbf {Z}}}_1} + {{\bar{\textbf {Z}}}^T_1 }\Gamma _1^TD\Gamma _1 {{\bar{\textbf {Z}}}_1}{{{\varvec{\mu }}}^T_2}D{{\varvec{\mu }}_1} \\&+ 2{{\bar{\textbf {Z}}}^T_1}\Gamma _1^TD{{\varvec{\mu }}_1}{{{\varvec{\mu }}}^T_2}D\Gamma _1 {{\bar{\textbf {Z}}}_1} + {{{\varvec{\mu }}}^T_1}D{{\varvec{\mu }}_1}{{{ \varvec{\mu }}}^T_2}D{{\varvec{\mu }}_1}), \end{aligned}$$

where, from Step 1,

$$\begin{aligned} E({\bar{\textbf {Z}}_1^T} \Gamma _1^T D\Gamma _1 {\bar{\textbf {Z}}_1}{{\varvec{\mu }}^T_2}D\Gamma _1 {\bar{\textbf {Z}}_1}) =\,&\frac{\beta _3^1}{{n_1^2}}(\Gamma _1^TD{{\varvec{\mu }}_2})\textrm{diag}(\Gamma _1^TD\Gamma _1 );\\ E({\bar{\textbf {Z}}^T_1}\Gamma _1^TD\Gamma _1 {\bar{{\textbf {Z}}}}_1 \varvec{\mu }_2^T D \varvec{\mu }_1) =\,&\frac{1}{{{n_1}}} \text {tr}(D\Sigma _1) {{\varvec{\mu }}^T_2}D{{\varvec{\mu }}_1};\\ E({\bar{\textbf {Z}}^T_1}\Gamma _1^TD{{\varvec{\mu }}_1}{{\varvec{\mu }}^T_2}D\Gamma _1 {\bar{\textbf {Z}}_1}) =\,&\frac{1}{{{n_1}}}{{\varvec{\mu }}^T_1}D\Sigma _1 D{{\varvec{\mu }}_2}. \end{aligned}$$

Then,

$$\begin{aligned} E({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_1}{\bar{\textbf {X}}^T_2}D{\bar{\textbf {X}}_1}) =\,&\frac{{{\beta _3^1}}}{{n_1^2}}( \Gamma _1^TD{{\varvec{\mu }}_2})\textrm{diag}(\Gamma _1^TD\Gamma _1) + \frac{1}{{{n_1}}}\text {tr}(D\Sigma _1 ){{\varvec{\mu }}^T_2}D{{\varvec{\mu }}_1} \\&+ \frac{2}{{{n_1}}}{{\varvec{\mu }}^T_1 }D\Sigma _1 D{{\varvec{\mu }}_2} + {{\varvec{\mu }}^T_1}D{{\varvec{\mu }}_1}{{\varvec{\mu } }^T_2}D{{\varvec{\mu }}_1} \end{aligned}$$

and

$$\begin{aligned} E({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_1})E({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2}) = {{\varvec{\mu }}^T_1}D{{ \varvec{\mu }}_1}{{\varvec{\mu }}^T_2}D{{\varvec{\mu }}_1} + \frac{1}{{{n_1}}}\text {tr}(D\Sigma _1 ){{\varvec{\mu }}^T_2}D{{\varvec{\mu }}_1}. \end{aligned}$$

Therefore, we have

$$\begin{aligned} \text{ Cov }({\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_1},{\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2}) = \frac{{\beta _3^1}}{{n_1^2}}(\Gamma _1^TD {{\varvec{\mu }}_2})\textrm{diag}(\Gamma _1^TD\Gamma _1 ) + \frac{2}{{{n_1}}}{{\varvec{\mu }}^T_1 }D\Sigma _1 D{{\varvec{\mu }}_2}. \end{aligned}$$
(8)

Step 6: With similar developments, we have

$$\begin{aligned} \text{ Cov }({\bar{\textbf {X}}^T_2}D{\bar{\textbf {X}}_2},{\bar{\textbf {X}}^T_1}D{\bar{\textbf {X}}_2}) = \frac{{\beta _3^2}}{{n_2^2}}(\Gamma _2^TD {{\varvec{\mu }}_1})\textrm{diag}(\Gamma _2^TD\Gamma _2 ) + \frac{2}{{{n_2}}}{{\varvec{\mu }}^T_1 }D\Sigma _2 D{{\varvec{\mu }}_2}. \end{aligned}$$
(9)

Combining the results of Steps 1 to 6, namely (4)–(9), we obtain

$$\begin{aligned} \text{ Var }({T_{n}}) =\,&\frac{\Delta }{{n_1^3}}\sum \limits _{i = 1}^m {\gamma _{1ii}^2} + \frac{2 }{{n_1^2}}\text {tr}\{(D\Sigma _1 )^2\} + \frac{4}{{{n_1}}}{{{\varvec{\mu }}}^T_1}D\Sigma _1 D{ {\varvec{\mu }}_1} + \frac{4}{{n_1^2}}(\Gamma _1^TD{{\varvec{\mu }}_1})\textrm{diag}(\Gamma _1^TD\Gamma _1 ){ \beta _3^1} \\&+ \frac{\Delta }{{n_2^3}}\sum \limits _{i = 1}^m {\gamma _{2ii}^ 2} + \frac{2}{{n_2^2}}\text {tr}\{(D\Sigma _2 )^2\} + \frac{4}{{{n_2}}}{{{\varvec{\mu }}}^T _2}D\Sigma _2 D{{\varvec{\mu }}_2} + \frac{4}{{n_2^2}}(\Gamma _2^TD{{\varvec{\mu }}_2})\textrm{diag}(\Gamma _2^TD\Gamma _2 ){\beta _3^2} \\&+ 4\left\{ \frac{1}{{{n_1}{n_2}}}\text {tr}{(D\Sigma _2D\Sigma _1 )} + \frac{1 }{{{n_1}}}{{{\varvec{\mu }}}^T_2}D\Sigma _1 D{{\varvec{\mu }}_2} + \frac{1}{{{n_2}}} {{{\varvec{\mu }}}^T_1}D\Sigma _2 D{{\varvec{\mu }}_1}\right\} \\&-4\left\{ \frac{1}{{n_1^2}}(\Gamma _1^TD{{\varvec{\mu }}_2})\textrm{diag}(\Gamma _1^T D\Gamma _1 ){\beta _3^1} + \frac{2}{{{n_1}}}{{{\varvec{\mu ^T}}}_1}D\Sigma _1 D{{\varvec{\mu }}_2}\right\} \\&-4\left\{ \frac{1}{{n_2^2}}(\Gamma _2^TD{{\varvec{\mu }}_1})\textrm{diag}(\Gamma _2^T D\Gamma _2 ){\beta _3^2} + \frac{2}{{{n_2}}}{{{\varvec{\mu }}}^T_1}D\Sigma _2 D{{\varvec{\mu }}_2}\right\} \\ =\,&\frac{\Delta }{{n_1^3}}\sum \limits _{i = 1}^m {\gamma _{1ii}^2} + \frac{\Delta }{{n_2^3}}\sum \limits _{i = 1}^m {\gamma _{2ii}^2}+ \frac{2 }{{n_1^2}}\text {tr}\{(D\Sigma _1 )^2\} + \frac{2 }{{n_2^2}}\text {tr}\{(D\Sigma _2 )^2\}+ \frac{4}{{{n_1}{n_2}}}\text {tr}{(D\Sigma _2D\Sigma _1 )} \\&+ \frac{4}{n_1} (\varvec{\mu }_1 - \varvec{\mu }_2)^T D \Sigma _1 D (\varvec{\mu }_1 - \varvec{\mu }_2) + \frac{4}{n_2} (\varvec{\mu }_1 - \varvec{\mu }_2)^T D \Sigma _2 D (\varvec{\mu }_1 - \varvec{\mu }_2)\\&+ \frac{4}{{n_1^2}}\Gamma _1^TD({{\varvec{\mu }}_1} - \varvec{\mu }_2)\textrm{diag}(\Gamma _1^TD\Gamma _1 ){ \beta _3^1} + \frac{4}{{n_2^2}}\Gamma _2^TD({{\varvec{\mu }}_2} -\varvec{\mu }_1)\textrm{diag}(\Gamma _2^TD\Gamma _2 ){\beta _3^2}. \end{aligned}$$

Under \(H_0: \varvec{\mu }_1 = \varvec{\mu }_2\) and the condition \((b_2)\), we have

$$\begin{aligned} \text{ Var }(T_n) = \left[ \frac{2}{{n_1^2}}\text {tr}\{(D\Sigma _1 )^2\} + \frac{2}{{n_2^2}}\text {tr}\{(D\Sigma _2 )^2\} + \frac{4}{{{n_1}{n_2}}}\text {tr}{(D\Sigma _2D\Sigma _1 )} \right] \{1+o(1)\}. \end{aligned}$$
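For intuition, the sketch below standardizes \(T_n\) with these population quantities under \(H_0\) and checks that the resulting values behave roughly like a standard normal even when the dimension exceeds both sample sizes. This is purely illustrative: in practice the standardization uses estimated quantities (not reproduced here), and the two covariance matrices and the identity choice of \(D\) are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n1, n2, reps = 200, 40, 50, 5_000             # dimension larger than both sample sizes
idx = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
Sigma1, Sigma2 = 0.6 ** idx, 1.5 * 0.3 ** idx    # unequal AR(1)-type covariances
D = np.eye(m)                                    # placeholder D; the paper's projection-based D is in the main text
L1, L2 = np.linalg.cholesky(Sigma1), np.linalg.cholesky(Sigma2)

mean_T = np.trace(D @ Sigma1) / n1 + np.trace(D @ Sigma2) / n2
var_T = (2 / n1**2 * np.trace(D @ Sigma1 @ D @ Sigma1)
         + 2 / n2**2 * np.trace(D @ Sigma2 @ D @ Sigma2)
         + 4 / (n1 * n2) * np.trace(D @ Sigma2 @ D @ Sigma1))

z = np.empty(reps)
for r in range(reps):
    X1 = rng.standard_normal((n1, m)) @ L1.T     # H0: both samples have mean zero
    X2 = rng.standard_normal((n2, m)) @ L2.T
    d = X1.mean(axis=0) - X2.mean(axis=0)
    z[r] = (d @ D @ d - mean_T) / np.sqrt(var_T)

print(z.mean(), z.std())                         # roughly 0 and 1
print(np.mean(z > 1.645))                        # empirical size of a nominal one-sided 5% test
```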

In particular, when the two covariance matrices are identical, we obtain

$$\begin{aligned} \text{ Var }({T_{n}}) =\,&\frac{\Delta }{{n_1^3}}\sum \limits _{i = 1}^m {\gamma _{ii}^2} + \frac{2 }{{n_1^2}}\text {tr}\{(D\Sigma )^2\} + \frac{4}{{{n_1}}}{{{\varvec{\mu }}}^T_1}D\Sigma D{ {\varvec{\mu }}_1} + \frac{4}{{n_1^2}}(\Gamma ^TD{{\varvec{\mu }}_1})\textrm{diag}(\Gamma ^TD\Gamma ){ \beta _3^1} \\&+ \frac{\Delta }{{n_2^3}}\sum \limits _{i = 1}^m {\gamma _{ii}^ 2} + \frac{2}{{n_2^2}}\text {tr}\{(D\Sigma )^2\} + \frac{4}{{{n_2}}}{{{\varvec{\mu }}}^T _2}D\Sigma D{{\varvec{\mu }}_2} + \frac{4}{{n_2^2}}(\Gamma ^TD{{\varvec{\mu }}_2})\textrm{diag}(\Gamma ^TD\Gamma ){\beta _3^2} \\&+ 4\left\{ \frac{1}{{{n_1}{n_2}}}\text {tr}\{(D\Sigma )^2\} + \frac{1 }{{{n_1}}}{{{\varvec{\mu }}}^T_2}D\Sigma D{{\varvec{\mu }}_2} + \frac{1}{{{n_2}}} {{{\varvec{\mu }}}^T_1}D\Sigma D{{\varvec{\mu }}_1}\right\} \\&-4\left\{ \frac{1}{{n_1^2}}(\Gamma ^TD{{\varvec{\mu }}_2})\textrm{diag}(\Gamma ^T D\Gamma ){\beta _3^1} + \frac{2}{{{n_1}}}{{{\varvec{\mu ^T}}}_1}D\Sigma D{{\varvec{\mu }}_2}\right\} \\&-4\left\{ \frac{1}{{n_2^2}}(\Gamma ^TD{{\varvec{\mu }}_1})\textrm{diag}(\Gamma ^T D\Gamma ){\beta _3^2} + \frac{2}{{{n_2}}}{{{\varvec{\mu }}}^T_1}D\Sigma D{{\varvec{\mu }}_2}\right\} . \end{aligned}$$

Under the null hypothesis and assumption \((b_1)\), the above expression simplifies to

$$\begin{aligned} \text{ Var }({T_{n}}) = 2{\tau ^2}\text {tr}\{(D\Sigma )^2\}\{1 + o(1)\}. \end{aligned}$$

\(\square\)


About this article


Cite this article

Huang, C., Cui, X. & Kenne Pagui, E. Two-sample mean vector projection test in high-dimensional data. Comput Stat 39, 1061–1091 (2024). https://doi.org/10.1007/s00180-023-01374-0
