Abstract
This article focuses on the problem of measuring and testing the conditional mean dependence of a response variable on a predictor variable. A local influence detection approach is developed in combination with the martingale difference divergence (MDD) metric, and an efficient wild bootstrap implementation is provided. The resulting new metric of conditional mean dependence retains the merits of MDD while being more sensitive than the original, and it leads to a powerful test for nonlinear relationships. Simulations show that the proposed test achieves higher power for general conditional mean dependence relationships, even in high-dimensional settings. Theoretical asymptotic properties of the local influence test statistic are established, and a real data analysis is presented for further illustration. The localization idea can also be combined with other conditional mean dependence metrics.
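To make the ideas above concrete, the following is a minimal sketch, not the paper's exact procedure: it assumes Euclidean distances for \(\rho\), a double-centered distance matrix for \(\Phi_{\rho}\), \(\Psi(Y_i, Y_j)=\langle Y_i-\mu, Y_j-\mu\rangle\) as in the appendix, the empirical CDF of pairwise distances for \(g\) (so the local indicator is \(c_{ij}(t)=I\{g(\rho(X_i,X_j))\le t\}\)), and Rademacher multipliers for the wild bootstrap in the spirit of Theorem 3.3. The function name `lmdd_test` and all normalizations are illustrative assumptions.

```python
import numpy as np

def lmdd_test(X, Y, t, B=199, seed=0):
    """Hedged sketch of a localized-MDD-type test with a wild bootstrap.

    Assumptions (not the paper's exact definitions): rho = Euclidean distance,
    phi = double-centered distance matrix, psi_ij = <Y_i - mean, Y_j - mean>,
    g = empirical CDF of the pairwise distances, Rademacher multipliers.
    """
    X = np.asarray(X, float)
    if X.ndim == 1:
        X = X[:, None]
    n = X.shape[0]
    Y = np.asarray(Y, float).reshape(n, -1)
    rho = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # double-centering, as in distance-covariance-style statistics
    phi = rho - rho.mean(axis=0) - rho.mean(axis=1)[:, None] + rho.mean()
    Yc = Y - Y.mean(axis=0)
    psi = Yc @ Yc.T                       # Psi(Y_i, Y_j)
    off = ~np.eye(n, dtype=bool)
    d = np.sort(rho[off])
    g = np.searchsorted(d, rho, side="right") / d.size  # empirical CDF of rho
    K = phi * psi * (g <= t) * off        # localized kernel matrix, zero diagonal
    stat = -K.sum() / (n - 1)
    rng = np.random.default_rng(seed)
    boot = np.empty(B)
    for b in range(B):
        eps = rng.choice([-1.0, 1.0], size=n)          # Rademacher multipliers
        boot[b] = -(eps @ K @ eps) / (n - 1)           # multiplier statistic
    pval = (1 + np.sum(boot >= stat)) / (1 + B)
    return stat, pval
```

The bootstrap p-value uses the standard `(1 + #{exceedances}) / (1 + B)` convention so it is never exactly zero.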
References
Arcones, M.A., Giné, E.: Limit theorems for U-processes. Ann. Probab. 21(3), 1494–1542 (1993)
Fang, L., Yuan, Q., Ye, C., Yin, X.: High-dimensional variable screening via conditional martingale difference divergence. arXiv preprint arXiv:2206.11944 (2022)
Free, S., O’Higgins, P., Maudgil, D., Dryden, I., Lemieux, L., Fish, D., Shorvon, S.: Landmark-based morphometrics of the normal adult brain using MRI. Neuroimage 13(5), 801–813 (2001)
Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bull. Lond. Math. Soc. 16(2), 81–121 (1984)
Kosorok, M.R.: Introduction to empirical processes and semiparametric inference. Springer, New York (2007)
Lai, T., Zhang, Z., Wang, Y.: A kernel-based measure for conditional mean dependence. Comput. Stat. Data Anal. 160, 107246 (2021)
Lee, C., Zhang, X., Shao, X.: Testing conditional mean independence for functional data. Biometrika 107(2), 331–346 (2020)
Li, R., Xu, K., Zhou, Y., Zhu, L.: Testing the effects of high-dimensional covariates via aggregating cumulative covariances. J. Am. Stat. Assoc. (2022). https://doi.org/10.1080/01621459.2022.2044334
Lyons, R.: Distance covariance in metric spaces. Ann. Probab. 41(5), 3284–3305 (2013)
Nolan, D., Pollard, D.: Functional limit theorems for U-processes. Ann. Probab. 16(3), 1291–1298 (1988)
Pan, W., Wang, X., Zhang, H., Zhu, H., Zhu, J.: Ball covariance: a generic measure of dependence in Banach space. J. Am. Stat. Assoc. 115(529), 307–317 (2020)
Park, T., Shao, X., Yao, S.: Partial martingale difference correlation. Electron. J. Stat. 9(1), 1492–1517 (2015)
Shao, X., Zhang, J.: Martingale difference correlation and its use in high-dimensional variable screening. J. Am. Stat. Assoc. 109(507), 1302–1318 (2014)
Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. Ann. Stat. 35(6), 2769–2794 (2007)
van der Vaart, A.W., Wellner, J.A.: Weak convergence and empirical processes. Springer, New York (1996)
de la Peña, V.H., Giné, E.: Decoupling: from dependence to independence. Springer, New York (1999)
Wang, G., Zhu, K., Shao, X.: Testing for the martingale difference hypothesis in multivariate time series models. J. Bus. Econ. Stat. 40, 1–15 (2021)
Zhang, X., Yao, S., Shao, X.: Conditional mean and quantile dependence testing in high dimension. Ann. Stat. 46(1), 219–246 (2018)
Zhou, T., Zhu, L., Xu, C., Li, R.: Model-free forward screening via cumulative divergence. J. Am. Stat. Assoc. 115(531), 1393–1405 (2020)
Zhu, J., Pan, W., Zheng, W., Wang, X.: Ball: an R package for detecting distribution difference and association in metric spaces. J. Stat. Softw. 97(1), 1–31 (2021)
Acknowledgements
The authors are very grateful to the Editors and two anonymous referees for their helpful suggestions. The research was partly supported by the Project of Improving the Basic Scientific Research Ability of Young and Middle-aged College Teachers in Guangxi (2023KY0058) and the National Natural Science Foundation of China (No. 12271014 and No.11971045).
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
6 Appendix
Proof of Proposition 2.1
Note that \(\Psi (Y, Y^{\prime })=\langle Y - \mu , Y^{\prime }-\mu \rangle \), where \(\mu = E(Y)\). Therefore,
The third equation holds since
We complete the proof. \(\square \)
Proof of Theorem 2.5
Since \(E(Y\vert X)=E(Y)\) almost surely, we have, for any \(t\in [0, 1]\),
The last equation holds since \(E\{\Psi (Y, Y^{\prime })\}=0\).
The second assertion of the theorem is clear since \( L_{X, X^{\prime }}(1)=1 \) when \( t=1 \), which leads to \( \textrm{LMDD}_{Y\vert X}(1)= \textrm{FMDD}_{\rho }(Y\vert X)\). \(\square \)
Proof of Theorem 2.6
For any \(\delta \) such that \(t+\delta \in [0, 1]\), we have
Next we prove that \(E[\Phi _{\rho }(X, X^{\prime })\Psi (Y, Y^{\prime })\{L_{X, X^{\prime }}(t+\delta )-L_{X, X^{\prime }}(t)\}]\) converges to zero as \(\delta \rightarrow 0\).
By the conditions that \( \rho (X, X^{\prime }) \) is a continuous random variable and that g is continuous, \(L_{X, X^{\prime }}(t+\delta )\) converges almost surely to \(L_{X, X^{\prime }}(t)\) as \( \delta \rightarrow 0 \); therefore, \(\Phi _{\rho }(X, X^{\prime })\Psi (Y, Y^{\prime })\{L_{X, X^{\prime }}(t+\delta )-L_{X, X^{\prime }}(t)\}\rightarrow 0\) almost surely. Moreover, \( \vert \Phi _{\rho }(X, X^{\prime })\Psi (Y, Y^{\prime })\{L_{X, X^{\prime }}(t+\delta )-L_{X, X^{\prime }}(t)\} \vert \le 2 \vert \Phi _{\rho }(X, X^{\prime })\Psi (Y, Y^{\prime }) \vert \), so if \(E[ \vert \Phi _{\rho }(X, X^{\prime })\Psi (Y, Y^{\prime }) \vert ] < \infty \), the dominated convergence theorem implies that \(E \vert \Phi _{\rho }(X, X^{\prime })\Psi (Y, Y^{\prime })\{L_{X, X^{\prime }}(t+\delta )-L_{X, X^{\prime }}(t)\} \vert \) converges to zero as \(\delta \rightarrow 0\). Combining this result with the inequality
the conclusion follows.
We verify the condition \(E[ \vert \Phi _{\rho }(X, X^{\prime })\Psi (Y, Y^{\prime }) \vert ] < \infty \). By the conditions of Theorem 2.6, there exists an \(o\in {\mathcal {X}}\) such that \(E[\rho (o, X)] <\infty \) and \(E \vert Y \vert <\infty \). Decomposing \(\Phi _{\rho }(X, X^{\prime })\Psi (Y, Y^{\prime })\), we have
where \(\mu =E(Y)\) and \(E_{X^{\prime \prime }}\) means taking expectation with respect to \(X^{\prime \prime }\). It can be verified that the expectation of the absolute value of each term in the above display is bounded. We only treat the term \(\rho (X, X^{\prime })\langle Y, Y^{\prime }\rangle \); the others can be handled by similar arguments. By direct computations,
which completes the proof. \(\square \)
Proof of Theorem 2.10
We will drop the argument t and simply write \({\tilde{C}}_{ij}(t)\) as \(c_{ij}\). The statistic \(\textrm{LMDD}_{n, Y\vert X}(t)\) can be decomposed into nine terms as follows:
where
Meanwhile, the \(\textrm{U}\)-process \(U_n(t)\) defined in (2.9) can be rewritten as
where, here and in what follows, \(\sum _{(i_1,\dots ,i_k)}\) denotes summation over all k-tuples of pairwise distinct indices \(i_1,\dots ,i_k\).
It can be verified that
We only verify \(\sup _{t\in [0, 1]}\{ \vert J_4-J^*_4 \vert \}\xrightarrow [ ]{a.s.} 0\), and the others can be done in a similar way. By direct calculations,
It is easy to show that \(N_i\xrightarrow [ ]{a.s.}0\), \(i=1, \cdots , 4\). We only show \(N_1\xrightarrow [ ]{a.s.}0\). To this end, observe that
and that, by the law of large numbers for \(\textrm{U}\)-statistics, \(\frac{1}{n(n-1)(n-2)}\sum _{(i, j, k)}a_{ij}b_{ki}\xrightarrow []{a.s.}E[a_{12}b_{31}]\) and \(\frac{1}{n(n-1)}\sum _{i\ne j}a_{ij}b_{ji}\xrightarrow []{a.s.}E[a_{12}b_{21}]\), provided that \(E[ \vert a_{12}b_{31} \vert ]<\infty \) and \(E[ \vert a_{12}b_{21} \vert ]<\infty \). This is indeed the case, since
and
Therefore, \(N_1\xrightarrow []{a.s.}0\).
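The strong law of large numbers for \(\textrm{U}\)-statistics invoked above can be checked numerically. The kernels below, \(a_{ij}=x_i+x_j\) and \(b_{ij}=x_i x_j^2\), are hypothetical toy choices (not the paper's \(a_{ij}, b_{ij}\)); for \(X\sim \textrm{Uniform}(0,1)\) the limit \(E[a_{12}b_{21}]=E[(X_1+X_2)X_2X_1^2]=E[X^3]E[X]+E[X^2]^2=1/8+1/9=17/72\).

```python
import numpy as np

def u_stat_pair(x):
    """(1/(n(n-1))) * sum_{i != j} a_ij * b_ji for the toy kernels
    a_ij = x_i + x_j and b_ij = x_i * x_j**2 (hypothetical choices)."""
    n = len(x)
    a = x[:, None] + x[None, :]       # a_ij
    b = x[:, None] * x[None, :] ** 2  # b_ij, asymmetric on purpose
    off = ~np.eye(n, dtype=bool)      # exclude the diagonal i == j
    return np.sum(a * b.T * off) / (n * (n - 1))

rng = np.random.default_rng(0)
x = rng.uniform(size=2000)
approx = u_stat_pair(x)               # should be close to 17/72 ~ 0.2361
```

With n = 2000 the statistic is already within a few thousandths of the limit 17/72.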
By (6.2), we have \(\sup _{t\in [0, 1]}\{ \vert \textrm{LMDD}_{n,Y\vert X}(t)-U_n(t) \vert \}\xrightarrow []{a.s.}0\). Thus, to prove that \(\sup _{t\in [0, 1]}\{ \vert \textrm{LMDD}_{n,Y\vert X}(t)-\textrm{LMDD}_{Y\vert X}(t) \vert \}\xrightarrow []{a.s.}0\), we only need to show that \(\sup _{t\in [0, 1]}\{ \vert U_n(t)-\textrm{LMDD}_{Y\vert X}(t) \vert \}\xrightarrow []{a.s.}0\). Since \(U_n(t)\) is a \(\textrm{U}\)-process and \(E[U_n(t)]=\textrm{LMDD}_{Y\vert X}(t)\), we can achieve the goal by applying the theory of \(\textrm{U}\)-processes. Specifically, by Corollary 3.3 of Arcones and Giné [1], we only need to verify that the function class \({\mathcal {F}}=\{h_t(w_1, \dots , w_5): t \in [0, 1]\}\) is an image-admissible Suslin (see page 138 of de la Peña and Giné [16] for the definition) VC-class with an envelope function \(G(w_1, \dots , w_5)\) satisfying \(E\{G(W_1, \dots , W_5)\}<\infty \), where \(W_i=(X_i, Y_i)\), \(w_i=(x_i, y_i)\) and
We verify these conditions in the following. Choose the envelope function as \(G(w_1, \cdots , w_5)=|\left\{ \rho (x_1, x_2)-\rho (x_1, x_3)\right\} \left\{ \langle y_1, y_2\rangle -\langle y_1,y_4\rangle -\langle y_2, y_5\rangle +\langle y_4, y_5\rangle \right\} |\). By the conditions of the theorem, we have
where the last inequality is obtained by using similar arguments as in (6.3) and (6.4).
Next, we show that \({\mathcal {F}}\) is an image-admissible Suslin VC-class. We first show that the set \({\mathcal {C}}=\{C_t: t\in [0, 1]\}\), where \(C_t=\{(x, x^{\prime }): g(\rho (x, x^{\prime }))\le t\}\), is a VC-class. Suppose that \(p_1=(x_1, x_1^{\prime })\) and \(p_2=(x_2, x_2^{\prime })\) are two different points in \({\mathcal {X}}\times {\mathcal {X}}\) and that \({\mathcal {C}}\) shatters the set of these two points. Then there exist \(0\le t_1,t_2\le 1\) such that \(p_1\in C_{t_1}, p_2\not \in C_{t_1}\) and \(p_2\in C_{t_2}, p_1\not \in C_{t_2}\). Without loss of generality, assume \(0 \le t_1 < t_2 \le 1\). By the form of \(C_t\), we have \(C_{t_1}\subset C_{t_2}\). Then \(p_1\in C_{t_2}\), which contradicts the fact that \(p_1\not \in C_{t_2}\). Hence, \({\mathcal {C}}\) cannot shatter the set \(\{p_1, p_2\}\), and \({\mathcal {C}}\) is a VC-class with VC-index 2. By Lemma 9.8 of Kosorok [5], the function class \(\{I(g(\rho (X, X^{\prime }))\le t): t \in [0, 1]\}\) is a VC-class. By Lemma 9.9 (vi) of Kosorok [5], \({\mathcal {F}}\) is a VC-class. Since the class of kernels \( {\mathcal {F}} \) is parametrized by [0, 1] and these kernels are jointly measurable in \( w_1, \cdots , w_5, t \), this class is image-admissible Suslin. Therefore, \({\mathcal {F}}\) is an image-admissible Suslin VC-class. We complete the proof. \(\square \)
Proof of Theorem 2.11
Observe that
By straightforward computations, we have
Under \(H_0\), \(E(Y\vert X)=E(Y)\) almost surely; therefore, we have
It follows that
Similarly, we have
Since \(E[b_{12}]=E[b_{13}]=E[b_{14}]=E[b_{23}]=E[b_{24}]=E[b_{34}]=E[b_{45}]\), we have
where \(U_n(t)\) is the \(\textrm{U}\)-process defined in equation (2.9). Based on this result, we can utilize the theory of \(\textrm{U}\)-processes to study the convergence of \(n\textrm{LMDD}_{n, Y\vert X}(t)\). We will show that for every \(t\in [0, 1]\) the kernel of \(U_n(t)\) is degenerate if \(H_0\) is true.
Recall that the kernel of \(U_n(t)\) is
The corresponding symmetrized kernel is
in which the sum extends over all permutations \( (\pi (1), \dots , \pi (5)) \) of \( \{1, 2, 3, 4, 5\} \). Under \(H_0\), we have
Similarly, we have \(E\left( h_t(W_1, \dots , W_5) \vert W_i=(x, y)\right) =0\) for \(i=2, \dots , 5\). This means \(E\left( {\bar{h}}(W_1, \dots , W_5) \vert W_1=(x, y)\right) =0\), which indicates that \(U_n(t)\) is degenerate for every \(t\in [0, 1]\). By the proof of Theorem 2.10, we know that the function class
is an image-admissible Suslin \(\textrm{VC}\)-class and that \(G(w_1, \cdots , w_5)\) is an envelope function of \({\mathcal {F}}\). Since X and Y have finite second moments, under \( H_0 \), we have
By Corollary 5.7 of Arcones and Giné [1],
By Slutsky lemma, we have
which completes the proof. \(\square \)
Proof of Corollary 2.13
By the continuous mapping theorem, the conclusion follows. \(\square \)
Proof of Theorem 2.14
We first show that \( \sqrt{n}\textrm{LMDD}_{n, Y\vert X}(t)=\sqrt{n}U_n(t) +o_{p}(1)\), where \( U_n(t) \) is the U-process as before. Similar to the proof of Theorem 2.11, we have
Using arguments similar to those in the proof of Theorem 2.11, the second term on the right-hand side of equation (6.6) can be expressed as
Applying similar calculations to the remaining terms, we have
Therefore, in order to prove the conclusion of Theorem 2.14, it suffices to show that
For every \( t\in [0, 1] \), \(U_n(t)\) is a non-degenerate \(\textrm{U}\)-statistic (this can be obtained by techniques similar to those in the proof of Lee et al. [7]). Thus, we have \(\sqrt{n}(U_n(t)-E[U_n(t)])\xrightarrow {d}N(0, \sigma (t))\), where \( N(0, \sigma (t)) \) is a normal distribution with mean zero and variance \( \sigma (t) \). Combining this result with the fact that the kernel function class of \( U_n(t) \) is an image-admissible Suslin VC-class, by Theorem 5.3.3 of de la Peña and Giné [16], we have
This completes the proof. \(\square \)
Proof of Theorem 3.2
For every fixed \(t\in T\), by Theorem 4 of Lee et al. [7], we have
where \( d^* \) denotes convergence in distribution given \( Z_1, Z_2, \dots \). Using a similar argument to the one therein, we can obtain the finite-dimensional convergence, that is,
Further, by the conditions that \( (T, \nu ) \) is totally bounded and that \( nU^*_n(t) \) is asymptotically uniformly \( \nu \)-equicontinuous in probability, we have, by Theorems 1.5.4 and 1.5.7 of van der Vaart and Wellner [15],
We complete the proof. \(\square \)
Proof of Theorem 3.3
Recall that
Denote \(\phi _{ij}=\Phi _{\rho }(X_i, X_j)\), \(\psi _{ij}=\Psi (Y_i, Y_j)\) and \( {\mathcal {G}}_n^*(t)=-\frac{1}{n-1}\sum _{i\ne j}^{n}\varepsilon _i\phi _{ij}\psi _{ij}c_{ij}\varepsilon _j\). Our proof follows from the two steps:
(i) \( {\mathcal {K}}_n^*(t) = {\mathcal {G}}_n^*(t)+o^*_{p}(1) ~~a.s.; \)
(ii) \({\mathcal {G}}^{*}_n(t)\rightarrow _{{\mathcal {L}}^*} {\mathcal {K}}(t), \ t\in [0, 1]\).
We first prove (i). It suffices to verify that
Using the same arguments as in the proof of Theorem 5 of Lee et al. [7], the above assertion can be obtained. Since the details are almost the same, we omit them here.
Next we prove (ii). By Theorem 3.2, it suffices to show that \( {\mathcal {G}} ^*_n(t)\) is equicontinuous in probability (with respect to the Euclidean metric on [0, 1]). By the Markov inequality, it suffices to have
as \( \delta \rightarrow 0 \), where the expectation is taken over the \(\varepsilon _i\) given \( W_1, W_2, \dots \). To this end, note that, for every \( s, t\in [0, 1] \),
by the law of large numbers for \(\textrm{U}\)-statistics, provided that
Using an argument similar to that in the proof of Theorem 2.6, under the conditions of the theorem,
Condition (6.7) can be verified easily by arguments similar to those in the previous proofs. An application of the maximal inequality (see Theorem 2.2.4 of van der Vaart and Wellner [15]) gives (ii).
By (i) and (ii), \({\mathcal {K}}^{*}_n(t)\rightarrow _{{\mathcal {L}}^*} {\mathcal {K}}(t), t\in [0, 1]\). The second assertion follows by the continuous mapping theorem. \(\square \)
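A remark on why the multiplier statistic \( {\mathcal {G}}_n^*(t)=-\frac{1}{n-1}\sum _{i\ne j}\varepsilon _i\phi _{ij}\psi _{ij}c_{ij}\varepsilon _j \) can mimic the null distribution: since \(E^*[\varepsilon _i\varepsilon _j]=0\) for \(i\ne j\) and the diagonal is excluded, the statistic is conditionally centered given the data, paralleling the degeneracy of \(U_n(t)\) under \(H_0\). The sketch below checks this exactly by enumerating all \(2^n\) Rademacher sign vectors for a small n; the matrix `K` is a generic stand-in for the entries \(\phi _{ij}\psi _{ij}c_{ij}\) (an assumption, not data from the paper).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 8
# stand-in for the matrix with entries phi_ij * psi_ij * c_ij(t);
# only the zero diagonal matters for the centering property below
K = rng.normal(size=(n, n))
np.fill_diagonal(K, 0.0)

def g_star(eps, K):
    """G*_n = -(1/(n-1)) * sum_{i != j} eps_i * K_ij * eps_j."""
    return -(eps @ K @ eps) / (K.shape[0] - 1)

# exact expectation over all 2^n Rademacher sign vectors:
# E*[eps_i eps_j] = 0 for i != j, so the mean is zero (up to float error)
mean_exact = np.mean([g_star(np.array(e, float), K)
                      for e in product([-1.0, 1.0], repeat=n)])
```

Because the average is over the full sign lattice, `mean_exact` vanishes up to floating-point round-off, confirming the conditional centering without any Monte Carlo error.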
About this article
Cite this article
Lai, T., Zhang, Z. Local Influence Detection of Conditional Mean Dependence. Commun. Math. Stat. (2023). https://doi.org/10.1007/s40304-023-00365-3