Abstract
We propose a unified framework for sufficient dimension reduction through independence and conditional mean independence measures. When the object of interest is the conditional distribution of Y given X, the α-distance covariance is used to recover the central space. When the focus is the conditional mean of Y given X, the central mean space can be estimated through the α-martingale difference divergence. Compared with existing estimators based on the distance covariance, which recover the central space, the new estimators are more accurate when the target is the central mean space. By choosing α smaller than one, the new estimators also outperform existing estimators when the predictor distribution is heavy-tailed or the data are contaminated.
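As a concrete illustration of the building block shared by both estimators, here is a minimal sketch of the usual V-statistic estimate of the squared α-distance covariance; the double-centering recipe is that of Székely et al. (2007) for α = 1, its use for general α ∈ (0, 2) mirrors the chapter's setup, and the function name adcov2 is illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def adcov2(x, y, alpha=1.0):
    """V-statistic estimate of the squared alpha-distance covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    # Pairwise Euclidean distances raised to the power alpha, 0 < alpha < 2.
    a = squareform(pdist(x)) ** alpha
    b = squareform(pdist(y)) ** alpha
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
print(adcov2(x, rng.standard_normal(500), alpha=0.5))  # near zero: independence
print(adcov2(x, x**2, alpha=0.5))                      # bounded away from zero
```

Taking α = 0.5 in the calls above reflects the abstract's point that exponents below one temper the influence of heavy tails and contaminated observations.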
References
X. Chen, R.D. Cook, C. Zou, Diagnostic studies in sufficient dimension reduction. Biometrika 102, 545–558 (2015)
X. Chen, W. Sheng, X. Yin, Efficient sparse estimate of sufficient dimension reduction in high dimension. Technometrics 60, 161–168 (2018)
R.D. Cook, Regression Graphics: Ideas for Studying Regressions Through Graphics (Wiley, New York, 1998)
R.D. Cook, L. Forzani, Likelihood-based sufficient dimension reduction. J. Am. Stat. Assoc. 104, 197–208 (2009)
R.D. Cook, B. Li, Dimension reduction for the conditional mean. Ann. Stat. 30, 455–474 (2002)
R.D. Cook, S. Weisberg, Discussion of sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86, 28–33 (1991)
Y. Dong, A note on moment-based sufficient dimension reduction estimators. Stat. Interface 9, 141–145 (2016)
Y. Dong, A brief review of linear sufficient dimension reduction through optimization. J. Stat. Plann. Inference 211, 154–161 (2021)
Y. Dong, B. Li, Dimension reduction for non-elliptically distributed predictors: second order methods. Biometrika 97, 279–294 (2010)
Y. Dong, Q. Xia, C. Tang, Z. Li, On sufficient dimension reduction with missing responses through estimating equations. Comput. Stat. Data Anal. 126, 67–77 (2018)
B. Li, Sufficient Dimension Reduction: Methods and Applications with R (CRC Press, Boca Raton, 2018)
B. Li, Y. Dong, Dimension reduction for non-elliptically distributed predictors. Ann. Stat. 37, 1272–1298 (2009)
B. Li, S. Wang, On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)
K.C. Li, Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86, 316–327 (1991)
K.C. Li, On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)
K.C. Li, N. Duan, Regression analysis under link violation. Ann. Stat. 17, 1009–1052 (1989)
Y. Ma, L.P. Zhu, A semiparametric approach to dimension reduction. J. Am. Stat. Assoc. 107, 168–179 (2012)
Y. Ma, L. Zhu, A review on dimension reduction. Int. Stat. Rev. 81, 134–150 (2013)
Y. Ma, L.P. Zhu, On estimation efficiency of the central mean subspace. J. R. Stat. Soc. Ser. B 76, 885–901 (2014)
X. Shao, J. Zhang, Martingale difference correlation and its use in high dimensional variable screening. J. Am. Stat. Assoc. 109, 1302–1318 (2014)
W. Sheng, X. Yin, Direction estimation in single-index models via distance covariance. J. Multivar. Anal. 122, 148–161 (2013)
W. Sheng, X. Yin, Sufficient dimension reduction via distance covariance. J. Comput. Graph. Stat. 25, 91–104 (2016)
G.J. Székely, M.L. Rizzo, Brownian distance covariance. Ann. Appl. Stat. 3, 1236–1265 (2009)
G.J. Székely, M.L. Rizzo, Energy statistics: a class of statistics based on distances. J. Stat. Plann. Inference 143, 1249–1272 (2013)
G.J. Székely, M.L. Rizzo, N.K. Bakirov, Measuring and testing dependence by correlation of distances. Ann. Stat. 35, 2769–2794 (2007)
Y. Xia, H. Tong, W. Li, L. Zhu, An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B 64, 363–410 (2002)
Y. Zhang, J. Liu, Y. Wu, X. Fang, A martingale-difference-divergence-based estimation of central mean subspace. Stat. Interface 12, 489–500 (2019)
Y. Zhu, P. Zeng, Fourier methods for estimating the central subspace and the central mean subspace in regression. J. Am. Stat. Assoc. 101, 1638–1651 (2006)
L.P. Zhu, L.X. Zhu, Z.H. Feng, Dimension reduction in regressions through cumulative slicing estimation. J. Am. Stat. Assoc. 105, 1455–1466 (2010)
Acknowledgements
The author sincerely thanks the editor and two anonymous referees for useful comments that led to a much improved presentation of the paper.
Appendix
Proof of Proposition 1
For part (i), since the weight function \(w_{q,p}(t,s)\) is positive, \(\Phi_\alpha(V,U)=0\) if and only if \(f_{V,U}(t,s)=f_V(t)f_U(s)\) for almost all \(s\) and \(t\). Thus, as long as it is well defined, \(\Phi_\alpha(V,U)\) is zero if and only if \(V\) and \(U\) are independent. The proof of part (ii) follows directly from the proof of Theorem 7 in Székely and Rizzo (2009) and is thus omitted. □
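Written out with the standard weight of Székely et al. (2007) extended to exponent \(\alpha\), and with \(c_{q,\alpha}\) and \(c_{p,\alpha}\) denoting the usual normalizing constants (whose exact values are not needed for the argument), the measure in this proof takes the weighted-\(L_2\) form
\[
\Phi_\alpha^2(V,U)=\frac{1}{c_{q,\alpha}c_{p,\alpha}}\int_{{\mathbb R}^q}\int_{{\mathbb R}^p}\frac{\bigl|f_{V,U}(t,s)-f_V(t)f_U(s)\bigr|^2}{|t|_q^{q+\alpha}\,|s|_p^{p+\alpha}}\,ds\,dt,
\]
so that \(w_{q,p}(t,s)=\{c_{q,\alpha}c_{p,\alpha}|t|_q^{q+\alpha}|s|_p^{p+\alpha}\}^{-1}\) is indeed positive for all \(t\) and \(s\).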
The following lemma is needed before we prove Proposition 2. Its proof follows directly from the proof of Theorem 3 in Székely and Rizzo (2009) and is thus omitted.
Lemma 1
For random vectors \(V_1,V_2\in{\mathbb R}^q\) and \(U_1,U_2\in{\mathbb R}^p\), assume \(E(|U_1|_p^\alpha)<\infty\), \(E(|U_2|_p^\alpha)<\infty\), \(E(|V_1|_q^\alpha)<\infty\), and \(E(|V_2|_q^\alpha)<\infty\). Denote \(\Phi_\alpha\) as the square root of \(\Phi_\alpha^2\). If \([V_1^T,U_1^T]^T\) is independent of \([V_2^T,U_2^T]^T\), then
\[
\Phi_\alpha(V_1+V_2,\,U_1+U_2)\le\Phi_\alpha(V_1,U_1)+\Phi_\alpha(V_2,U_2).
\]
Equality holds if and only if \(U_1\) and \(V_1\) are both constants, or \(U_2\) and \(V_2\) are both constants, or \(U_1\), \(U_2\), \(V_1\), \(V_2\) are mutually independent.
Proof of Proposition 2
We follow the proof of Proposition 2 in Sheng and Yin (2016). For any \(\beta\in{\mathbb R}^{p\times d}\), there exists a rotation matrix \(M\in{\mathbb R}^{d\times d}\) such that \(\beta M=[\beta_a,\beta_b]\), where \(\mathrm{Span}(\beta_a)\subseteq\mathrm{Span}(\beta_0)\) and \(\mathrm{Span}(\beta_b)\subseteq\mathrm{Span}(\beta_0)^\perp\), with \(\mathrm{Span}(\beta_0)^\perp\) denoting the orthogonal complement of \(\mathrm{Span}(\beta_0)\). From (1), we have that \(Y\) is independent of \(\beta_b^TX\) given \(\beta_0^TX\). Together with the independence between \(\beta_b^TX\) and \(\beta_0^TX\), we have that \(\beta_b^TX\) is independent of \([Y,X^T\beta_0]^T\). It follows that \(\beta_b^TX\) is independent of \([Y,X^T\beta_a]^T\). Let \(U_1=[X^T\beta_a,0]^T\), \(U_2=[0,X^T\beta_b]^T\), \(V_1=Y\), and \(V_2=0\). Then \([V_1,U_1^T]^T\) is independent of \([V_2,U_2^T]^T\). According to Lemma 1,
\[
\Phi_\alpha(Y,M^T\beta^TX)=\Phi_\alpha(V_1+V_2,\,U_1+U_2)\le\Phi_\alpha(V_1,U_1)+\Phi_\alpha(V_2,U_2)=\Phi_\alpha(Y,\beta_a^TX), \qquad (11)
\]
where the last equality uses the facts that \(V_2=0\) is constant, so that \(\Phi_\alpha(V_2,U_2)=0\), and that padding \(X^T\beta_a\) with zeros does not change pairwise distances.
On the other hand, \(M\) being a rotation matrix implies that \(MM^T=M^TM=I_d\) and \(|M^T\beta^T(X-X')|_d=|\beta^T(X-X')|_d\). It follows from Proposition 1 that
\[
\Phi_\alpha(Y,M^T\beta^TX)=\Phi_\alpha(Y,\beta^TX). \qquad (12)
\]
Similarly, \(\mathrm{Span}(\beta_a)\subseteq\mathrm{Span}(\beta_0)\) implies \(|\beta_a^T(X-X')|_{d_a}\le|\beta_0^T(X-X')|_d\), where \(d_a\) is the number of columns of \(\beta_a\). Applying Proposition 1, we have
\[
\Phi_\alpha(Y,\beta_a^TX)\le\Phi_\alpha(Y,\beta_0^TX). \qquad (13)
\]
(11), (12), and (13) together lead to \(\Phi_\alpha(Y,\beta^TX)\le\Phi_\alpha(Y,\beta_0^TX)\). Equality holds if and only if \(\mathrm{Span}(\beta_a)=\mathrm{Span}(\beta_0)\), in which case \(\beta_b\) vanishes. Since \(\beta^*\) maximizes \(\Phi_\alpha^2(Y,\beta^TX)\) over \(\beta\in{\mathbb R}^{p\times d}\), we must have \(\mathrm{Span}(\beta^*)=\mathrm{Span}(\beta_0)={\mathcal S}_{Y|X}\). □
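To see Proposition 2 at work numerically, the sketch below grid-searches unit vectors in \({\mathbb R}^2\) and checks that the sample criterion peaks near the true direction of a single-index model; the model and sample size are illustrative choices, not taken from the chapter's simulations.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def adcov2(x, y, alpha=1.0):
    # Squared alpha-distance covariance (V-statistic), as sketched earlier.
    x = np.asarray(x, dtype=float); y = np.asarray(y, dtype=float)
    a = squareform(pdist(x.reshape(len(x), -1))) ** alpha
    b = squareform(pdist(y.reshape(len(y), -1))) ** alpha
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

rng = np.random.default_rng(0)
n = 400
X = rng.standard_normal((n, 2))
beta0 = np.array([0.6, 0.8])                    # true direction, unit length
Y = np.sin(X @ beta0) + 0.1 * rng.standard_normal(n)

# Unit vectors are identified up to sign, so a grid over [0, pi) suffices.
thetas = np.linspace(0.0, np.pi, 181)
vals = [adcov2(X @ np.array([np.cos(t), np.sin(t)]), Y, alpha=0.5)
        for t in thetas]
best = thetas[int(np.argmax(vals))]
print(np.array([np.cos(best), np.sin(best)]))   # approximately +/- beta0
```

In practice the maximization is carried out over \(\beta\in{\mathbb R}^{p\times d}\) with a numerical optimizer rather than a grid; the grid is used here only because \(d=1\) and \(p=2\) make the criterion surface one-dimensional.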
Proof of Proposition 3
The proof follows directly from the proof of Proposition 3 in Sheng and Yin (2016) and is thus omitted. □
Proof of Proposition 4
For part (i), note that \(\Xi_\alpha(V\mid U)=0\) if and only if \(g_{V,U}(s)=g_Vf_U(s)\) for almost all \(s\). Thus \(\Xi_\alpha(V\mid U)=0\) if and only if \(E(V)=E(V\mid U)\) almost surely. For part (ii), the proof of Theorem 1 in Shao and Zhang (2014) can be followed directly. □
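The notation parallels the martingale difference divergence of Shao and Zhang (2014): with \(g_{V,U}(s)=E\{Ve^{i\langle s,U\rangle}\}\), \(g_V=E(V)\), and \(f_U(s)=E\{e^{i\langle s,U\rangle}\}\) for \(U\in{\mathbb R}^p\), the squared α-martingale difference divergence used in this proof takes the weighted-\(L_2\) form
\[
\Xi_\alpha^2(V\mid U)=\frac{1}{c_{p,\alpha}}\int_{{\mathbb R}^p}\frac{\bigl|g_{V,U}(s)-g_Vf_U(s)\bigr|^2}{|s|_p^{p+\alpha}}\,ds,
\]
where \(c_{p,\alpha}\) is the same normalizing constant as in the distance covariance weight; Shao and Zhang's measure is the special case \(\alpha=1\).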
Proof of Proposition 5
Denote \(\eta_{0\perp}\in{\mathbb R}^{p\times(p-r)}\) as a basis for the orthogonal complement of \(\mathrm{Span}(\eta_0)\). We choose \(\eta_0\) and \(\eta_{0\perp}\) such that \(\eta_0^T\Sigma\eta_0=I_r\), \(\eta_{0\perp}^T\Sigma\eta_0=0\), and \(\eta_{0\perp}^T\Sigma\eta_{0\perp}=I_{p-r}\). For any \(\eta\in{\mathbb R}^{p\times r}\) satisfying \(\eta^T\Sigma\eta=I_r\), there exist \(A\in{\mathbb R}^{r\times r}\) and \(C\in{\mathbb R}^{(p-r)\times r}\) such that \(\eta=\eta_0A+\eta_{0\perp}C\). Then
\[
I_r=\eta^T\Sigma\eta=A^TA+C^TC. \qquad (14)
\]
For \(s\in{\mathbb R}^r\), because \(s^T\eta^TX=(As)^T\eta_0^TX+(Cs)^T\eta_{0\perp}^TX\) and \(\eta_0^TX\) is independent of \(\eta_{0\perp}^TX\), we have
\[
f_{\eta^TX}(s)=f_{\eta_0^TX}(As)\,f_{\eta_{0\perp}^TX}(Cs). \qquad (15)
\]
Note that (2) implies \(E(Y\mid X)=E(Y\mid\eta_0^TX)\), and we have
\[
g_{Y,\eta^TX}(s)=g_{Y,\eta_0^TX}(As)\,f_{\eta_{0\perp}^TX}(Cs). \qquad (16)
\]
(15), (16), and the definition of \(\Xi_\alpha^2\) in (7) together lead to
\[
\Xi_\alpha^2(Y\mid\eta^TX)=\frac{1}{c_{r,\alpha}}\int_{{\mathbb R}^r}\frac{\bigl|g_{Y,\eta_0^TX}(As)-E(Y)f_{\eta_0^TX}(As)\bigr|^2\,\bigl|f_{\eta_{0\perp}^TX}(Cs)\bigr|^2}{|s|_r^{r+\alpha}}\,ds. \qquad (17)
\]
On the other hand, it follows from (14) that \(A^TA=I_r-C^TC\), and thus
\[
|As|_r\le|s|_r. \qquad (18)
\]
(17), (18), and equation (8) from Proposition 4 together lead to
\[
\Xi_\alpha^2(Y\mid\eta^TX)\le\Xi_\alpha^2(Y\mid\eta_0^TX). \qquad (19)
\]
We get equality if and only if \(A^TA=I_r\), in which case \(C\) vanishes and \(\eta=\eta_0A\). Since \(\eta^*\) maximizes \(\Xi_\alpha^2(Y\mid\eta^TX)\) over \(\eta\in{\mathbb R}^{p\times r}\), we must have \(\mathrm{Span}(\eta^*)=\mathrm{Span}(\eta_0)={\mathcal S}_{E(Y|X)}\). □
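A sample counterpart of \(\Xi_\alpha^2\) can be sketched along the lines of Shao and Zhang's (2014) MDD estimator, with the predictor distances raised to the power α; the generalization beyond α = 1 and the function name amdd2 are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def amdd2(u, v, alpha=1.0):
    """V-statistic estimate of the squared alpha-martingale difference
    divergence of a scalar v given u, after Shao and Zhang (2014)."""
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    v = np.asarray(v, dtype=float).ravel()
    a = squareform(pdist(u)) ** alpha             # |U_k - U_l|^alpha
    b = 0.5 * (v[:, None] - v[None, :]) ** 2      # |V_k - V_l|^2 / 2
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

rng = np.random.default_rng(2)
u = rng.standard_normal(500)
print(amdd2(u, rng.standard_normal(500)))  # near zero: mean independence
print(amdd2(u, u + u**2, alpha=0.5))       # bounded away from zero
```

Estimating the central mean space then amounts to maximizing this criterion over \(\eta\in{\mathbb R}^{p\times r}\) subject to \(\eta^T\Sigma\eta=I_r\), exactly as in Proposition 5.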