Abstract
Dimension reduction is a useful technique when working with high-dimensional predictors, as it enables meaningful data visualizations and graphical analyses using fewer predictors. We propose a new estimator of the effective dimension reduction (e.d.r) subspace that is non-iterative and robust against extreme values, based on estimating the conditional median function of the predictors given the response. The existing literature on robust estimation of the e.d.r subspace relies on iterative algorithms, such as the composite quantile minimum average variance estimation and the sliced regression. Compared with these existing robust dimension reduction methods, the new method avoids iterations by estimating the e.d.r subspace directly and has better finite-sample performance. It is shown that the inverse Tukey and Oja median regression curves fall into the e.d.r subspace, and that their directions can be estimated \(\sqrt{n}\)-consistently.
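As a concrete illustration of this inverse-regression recipe, the sketch below standardizes the predictors, slices the response, takes a multivariate median within each slice, and extracts the leading eigenvectors of the resulting weighted outer-product matrix. It is only a sketch of the general idea, not the paper's estimator: for simplicity it substitutes the coordinatewise median for the Tukey or Oja median (giving up the affine equivariance the theory relies on), and the function name and settings are our own.

```python
import numpy as np

def sliced_inverse_median(X, y, n_slices=5, d=1):
    """Illustrative sliced-inverse-median-style estimator.

    Standardizes X, slices y into quantile bins, takes a per-slice
    multivariate median (coordinatewise here, as a simple stand-in
    for the Tukey/Oja medians), forms the weighted outer-product
    matrix V-hat, and returns its top-d eigenvectors mapped back
    to the original predictor scale.
    """
    n, p = X.shape
    med = np.median(X, axis=0)
    cov = np.cov(X, rowvar=False)
    # Symmetric inverse square root of the covariance matrix
    w, U = np.linalg.eigh(cov)
    cov_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    Z = (X - med) @ cov_inv_sqrt              # standardized predictors
    # Assign each observation to one of n_slices quantile slices of y
    edges = np.quantile(y, np.linspace(0, 1, n_slices + 1))
    idx = np.clip(np.searchsorted(edges[1:-1], y, side="right"),
                  0, n_slices - 1)
    V = np.zeros((p, p))
    for h in range(n_slices):
        in_slice = idx == h
        m = np.median(Z[in_slice], axis=0)    # slice median of Z
        V += in_slice.mean() * np.outer(m, m) # weight by slice proportion
    _, vecs = np.linalg.eigh(V)               # eigenvalues ascending
    eta = vecs[:, -d:]                        # top-d eigenvectors
    beta = cov_inv_sqrt @ eta                 # back to original scale
    return beta / np.linalg.norm(beta, axis=0)

rng = np.random.default_rng(0)
beta0 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
X = rng.standard_normal((2000, 4))
y = np.exp(X @ beta0) + 0.1 * rng.standard_normal(2000)
b = sliced_inverse_median(X, y)[:, 0]
print(abs(b @ beta0))  # close to 1: the e.d.r direction is recovered
```

With a monotone link as in this toy model, the per-slice medians of the standardized predictors line up along the single index direction, so one eigenvector suffices.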
References
Arcones M, Chen ZQ, Gine E (1994) Estimators related to U-processes with applications to multivariate medians: asymptotic normality. Ann Stat 22(3):1460–1477
Bai Z-D, He X (1999) Asymptotic distributions of the maximal depth estimators for regression and multivariate location. Ann Stat 27:1617–1637
Bura E, Cook RD (2001) Extending sliced inverse regression: the weighted chi-squared test. J Am Stat Assoc 96(455):996–1003
Christou E, Akritas MG (2016) Single index quantile regression for heteroscedastic data. J Multivar Anal 150:169–182
Cook RD (1994) Using dimension-reduction subspaces to identify important inputs in models of physical systems. In: Proceedings of the Section on Physical and Engineering Sciences. American Statistical Association, Alexandria, VA, pp 18–25
Cook RD (1996) Graphics for regressions with a binary response. J Am Stat Assoc 91:983–992
Cook RD (1998) Regression graphics: ideas for studying regressions through graphics. Wiley, New York
Cook RD, Nachtsheim CJ (1994) Reweighting to achieve elliptically contoured covariates in regression. J Am Stat Assoc 89:592–599
Cook RD, Weisberg S (1991) Comment on sliced inverse regression for dimension reduction. J Am Stat Assoc 86:328–332
Davis C, Kahan WM (1970) The rotation of eigenvectors by a perturbation III. SIAM J Numer Anal 7(1):1–46
Diaconis P, Freedman D (1984) Asymptotics of graphical projection pursuit. Ann Stat 12:793–815
Dong Y, Li B (2010) Dimension reduction for non-elliptically distributed predictors: second-order methods. Biometrika 97:279–294
Donoho DL, Gasko M (1992) Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann Stat 20(4):1803–1827
Eaton ML, Tyler D (1994) The asymptotic distribution of singular values with applications to canonical correlations and correspondence analysis. J Multivar Anal 50:238–264
Hayford J (1902) What is the center of an area or the center of a population. J Am Stat Assoc 8(58):47–58
He X, Wang L, Hong HG (2013) Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann Stat 41(1):342–369
Hettmansperger TP, Mottonen J, Oja H (1997) Affine equivariant multivariate one-sample signed-rank tests. J Am Stat Assoc 92(440):1591–1600
Hristache M, Juditsky A, Polzehl J, Spokoiny V (2001) Structure adaptive approach for dimension reduction. Ann Stat 29(6):1537–1566
Kong E, Xia Y (2012) A single-index quantile regression model and its estimator. Econ Theory 28:730–768
Kong E, Xia Y (2014) An adaptive composite quantile approach to dimension reduction. Ann Stat 42(4):1657–1688
Li K-C (1991) Sliced inverse regression for dimension reduction. J Am Stat Assoc 86(414):316–327
Li K-C (1992) On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s Lemma. J Am Stat Assoc 87(420):1025–1039
Li B, Cook RD, Chiaromonte F (2003) Dimension reduction for the conditional mean in regressions with categorical predictors. Ann Stat 31:1636–1668
Li B, Dong Y (2009) Dimension reduction for nonelliptically distributed predictors. Ann Stat 37:1272–1298
Li B, Wang S (2007) On directional regression for dimension reduction. J Am Stat Assoc 102(479):997–1008
Li R-Z, Fang K-T, Zhu L-X (1997) Some Q-Q probability plots to test spherical and elliptical symmetry. J Comput Graph Stat 6(4):350–435
Li B, Zha H, Chiaromonte F (2005) Contour regression: a general approach to dimension reduction. Ann Stat 33(4):1580–1616
Liu X, Luo S, Zuo Y (2017) Some results on the computing of Tukey’s halfspace median. Stat Pap. https://doi.org/10.1007/s00362-017-0941-5
Lue H-H (2004) Principal Hessian directions for regression with measurement error. Biometrika 91(2):409–423
Luo W, Li B, Yin X (2014) On efficient dimension reduction with respect to a statistical functional of interest. Ann Stat 42(1):382–412
Ma Y, Zhu L (2012) A semiparametric approach to dimension reduction. J Am Stat Assoc 107(497):168–179
Massé J-C (2002) Asymptotics for the Tukey median. J Multivar Anal 81:286–300
Nolan D (1999) On min-max majority and deepest points. Stat Probab Lett 43:325–333
Oja H (1983) Descriptive statistics for multivariate distributions. Stat Probab Lett 1(6):327–332
Ronkainen T, Oja H, Orponen P (2003) Computation of the multivariate Oja median. In: Dutter R, Filzmoser P, Gather U, Rousseeuw PJ (eds) Developments in robust statistics, Proceedings of the International Conference on Robust Statistics (ICORS’01, Stift Vorau, Austria, July 2001). Springer, Berlin, pp 344–359
Shen G (2008) Asymptotics of Oja median estimate. Stat Probab Lett 78:2137–2141
Tukey JW (1975) Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, vol 2. Vancouver, Canada, pp 523–531
Wang H, Xia Y (2008) Sliced regression for dimension reduction. J Am Stat Assoc 103:811–821
Wang G, Zhou J, Wu W, Chen M (2017) Robust functional sliced inverse regression. Stat Pap 58(1):227–245
Weber A (1909) Über den standort der industrien. Mohr, Tübingen
Xia Y (2007) A constructive approach to the estimation of dimension reduction directions. Ann Stat 35(6):2654–2690
Xia Y, Tong H, Li WK, Zhu L-X (2002) An adaptive estimation of dimension reduction space. J R Stat Soc 64:363–410
Yin X, Cook RD (2002) Dimension reduction for the conditional \(k\)th moment in regression. J R Stat Soc 64:159–175
Yin X, Li B (2011) Sufficient dimension reduction based on an ensemble of minimum average variance estimators. Ann Stat 39:3392–3416
Yin X, Li B, Cook RD (2008) Successive direction extraction for estimating the central subspace in a multiple-index regression. J Multivar Anal 99:1733–1757
Zhao X, Zhou X (2017) Partial sufficient dimension reduction on additive rates model for recurrent event data with high-dimensional covariates. Stat Pap. https://doi.org/10.1007/s00362-017-0949-x
Zhu Y, Zeng P (2006) Fourier methods for estimating the central subspace and the central mean subspace in regression. J Am Stat Assoc 101:1638–1651
Zhu L-P, Zhu L-X, Feng Z-H (2010) Dimension reduction in regression through cumulative slicing estimation. J Am Stat Assoc 105(492):1455–1466
Acknowledgements
We would like to thank Professors Michael Akritas and Bing Li from the Pennsylvania State University for useful discussions regarding this paper. We are also grateful to the referees for their valuable remarks and careful reading of a previous version of the manuscript, which helped improve the quality of the paper. This work was supported, in part, by funds provided by the University of North Carolina at Charlotte.
Appendix
Proof of Theorem 2.4: Without loss of generality, assume that \(med(\mathbf {X})=\mathbf {0}\). We will first prove the result for \(\varvec{\Sigma }_{\mathbf {xx}}=\mathbf {I}_{p}\) (spherical distribution), and then extend it to a general \(\varvec{\Sigma }_{\mathbf {xx}}\) (elliptical distribution).
Let \(\varOmega _{Y}\) be the sample space of Y, J be a subset of \(\varOmega _{Y}\), and define \(S_{1}(\mathbf {m})=dep_{\mathbf {X}| Y \in J}(\mathbf {m})\) and \(S_{2}(\mathbf {m})=E \{ \mathcal {V}(\mathbf {X}_{1}, \ldots , \mathbf {X}_{p}, \mathbf {m}) | Y \in J\}\). Let \(\mathbf {A}\) be a \((p-d_{0}) \times (p-d_{0})\) orthogonal matrix that is not \(\mathbf {I}_{p-d_{0}}\), \(\mathbf {u}_{1}, \ldots , \mathbf {u}_{d_{0}}\) be an orthonormal basis for \(\mathcal {S}_{y|\mathbf {x}}\), and \(\mathbf {w}_{1}, \ldots , \mathbf {w}_{p-d_{0}}\) be an orthonormal basis for \(\mathcal {S}^{\perp }_{y|\mathbf {x}}\). Define \(\mathbf {R}=\mathbf {U}\mathbf {U}^{\top }+\mathbf {W}\mathbf {A}\mathbf {W}^{\top }\), where \(\mathbf {U}=(\mathbf {u}_{1}, \ldots , \mathbf {u}_{d_{0}})\), and \(\mathbf {W}=(\mathbf {w}_{1}, \ldots , \mathbf {w}_{p-d_{0}})\). Note that (a) if \(\mathbf {v} \in \mathcal {S}_{y|\mathbf {x}}^{\perp }\), then \(\mathbf {R}\) rotates \(\mathbf {v}\), and (b) if \(\mathbf {v} \in \mathcal {S}_{y|\mathbf {x}}\), then \(\mathbf {R}\) leaves \(\mathbf {v}\) unchanged.
Since \(\mathbf {X}\) has a spherical distribution, \(\mathbf {X}\) and \(\mathbf {R}\mathbf {X}\) have the same distribution. Let \(\mathbf {P}\) be the projection onto \(\mathcal {S}_{y|\mathbf {x}}\), and \(\mathbf {Q}=\mathbf {I}_{p}-\mathbf {P}\) be the projection onto \(\mathcal {S}_{y|\mathbf {x}}^{\perp }\). Moreover, since \(Y\) depends on \(\mathbf {X}\) only through \(\mathbf {P}\mathbf {X}\) and \(\mathbf {P}\mathbf {R}\mathbf {X}=\mathbf {P}\mathbf {X}\), \(Y | \mathbf {X}\) and \(Y | \mathbf {R} \mathbf {X}\) have the same conditional distribution. Therefore, \((Y,\mathbf {X})\) and \((Y,\mathbf {R}\mathbf {X})\) have the same distribution.
Moreover, the depth and the volume are invariant under rotation, so that \(dep_{\mathbf {X}}(\mathbf {m})=dep_{\mathbf {RX}}(\mathbf {Rm})\) and \(\mathcal {V}(\mathbf {X}_{1},\ldots,\mathbf {X}_{p},\mathbf {m})=\mathcal {V}(\mathbf {RX}_{1},\ldots,\mathbf {RX}_{p},\mathbf {Rm})\). Hence,
\( S_{1}(\mathbf {m}) = dep_{\mathbf {X}|Y \in J}(\mathbf {m}) = dep_{\mathbf {RX}|Y \in J}(\mathbf {Rm}) = dep_{\mathbf {X}|Y \in J}(\mathbf {Rm}) = S_{1}(\mathbf {Rm}), \)
where the third equality uses that \((Y,\mathbf {X})\) and \((Y,\mathbf {R}\mathbf {X})\) have the same distribution. Similarly,
\( S_{2}(\mathbf {m}) = S_{2}(\mathbf {Rm}). \)
Let \(\varvec{\alpha }_{1}^{*}=tmed(\mathbf {X}|Y\in J)\) and suppose that \(\varvec{\alpha }_{1}^{*} \notin \mathcal {S}_{y|\mathbf {x}}\). Then \(\varvec{\alpha }_{1}^{*}=\mathbf {P}\varvec{\alpha }_{1}^{*}+\mathbf {Q}\varvec{\alpha }_{1}^{*}\), where \(\mathbf {Q}\varvec{\alpha }_{1}^{*} \ne \mathbf {0}\), and \(\mathbf {A}\) can be chosen so that \(\mathbf {WAW}^{\top }\mathbf {Q}\varvec{\alpha }_{1}^{*} \ne \mathbf {Q}\varvec{\alpha }_{1}^{*}\). By the definition of \(\mathbf {R}\), \(\mathbf {R}\varvec{\alpha }_{1}^{*}=\mathbf {P}\varvec{\alpha }_{1}^{*}+\mathbf {WAW}^{\top }\mathbf {Q}\varvec{\alpha }_{1}^{*} \ne \varvec{\alpha }_{1}^{*}\). However, since \((Y,\mathbf {X})\) and \((Y,\mathbf {R}\mathbf {X})\) have the same distribution and the Tukey median is affine equivariant, \(\varvec{\alpha }_{1}^{*}=tmed(\mathbf {R}\mathbf {X}|Y\in J)=\mathbf {R}\,tmed(\mathbf {X}|Y\in J)=\mathbf {R}\varvec{\alpha }_{1}^{*}\), a contradiction. Hence \(\varvec{\alpha }_{1}^{*} \in \mathcal {S}_{y|\mathbf {x}}\). Similar steps give the result for the Oja median.
Now, if \(\varvec{\Sigma }_{\mathbf {xx}} \ne \mathbf {I}_{p}\) (elliptical distribution), then \(\varvec{\Sigma }_{\mathbf {xx}}^{-1/2}\mathbf {X}\) has covariance matrix \(\mathbf {I}_{p}\) (spherical distribution), with e.d.r directions \(\varvec{\Sigma }_{\mathbf {xx}}^{1/2}\varvec{\beta }_{0k}\), \(k=1,\ldots,d_{0}\). Applying the spherical case together with the affine equivariance property, \(\varvec{\Sigma }_{\mathbf {xx}}^{-1/2} med(\mathbf {X}|Y)= med(\varvec{\Sigma }_{\mathbf {xx}}^{-1/2} \mathbf {X}|Y) \in span\{\varvec{\Sigma }_{\mathbf {xx}}^{1/2}\varvec{\beta }_{01}, \ldots , \varvec{\Sigma }_{\mathbf {xx}}^{1/2}\varvec{\beta }_{0d_{0}}\}\), so that \(\varvec{\Sigma }_{\mathbf {xx}}^{-1} med(\mathbf {X}|Y) \in span\{\varvec{\beta }_{01}, \ldots , \varvec{\beta }_{0d_{0}}\}\).
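The affine equivariance property invoked throughout the proof can be checked numerically for a median that possesses it. Since the Tukey and Oja medians are costly to compute, the sketch below uses the spatial (geometric) median computed by Weiszfeld's algorithm, which is equivariant under orthogonal transformations; this is an illustrative substitute, not the paper's choice, and all names are ours.

```python
import numpy as np

def spatial_median(X, n_iter=200, eps=1e-10):
    """Weiszfeld's algorithm for the spatial (geometric) median,
    a multivariate median that is orthogonally equivariant."""
    m = X.mean(axis=0)  # starting point
    for _ in range(n_iter):
        d = np.linalg.norm(X - m, axis=1)
        w = 1.0 / np.maximum(d, eps)       # guard against zero distance
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < eps:
            break
        m = m_new
    return m

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3)) + np.array([1.0, -2.0, 0.5])
# A random orthogonal matrix Q (via QR decomposition)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
# Equivariance: the median of the rotated sample equals the rotated median
lhs = spatial_median(X @ Q.T)   # med(QX)
rhs = Q @ spatial_median(X)     # Q med(X)
print(np.max(np.abs(lhs - rhs)))  # near zero
```

Because Euclidean distances are rotation invariant, every Weiszfeld iterate for the rotated sample is exactly the rotated iterate for the original sample, so the two results agree up to floating-point error.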
Proof of Corollary 3.1: We will first show that \(\widehat{\mathbf {V}}\), defined in (2), converges to
\( \mathbf {V}=\sum _{h=1}^{H} p_{h}\, med(\mathbf {Z}|Y \in I_{h})\, med(\mathbf {Z}^{\top }|Y \in I_{h}). \)
Letting \(\left\| \cdot \right\| \) denote the Frobenius norm, consider the difference
\( \left\| \widehat{\mathbf {V}}-\mathbf {V} \right\| \le \sum _{h=1}^{H} \left\| \widehat{p}_{h}\, med_{n}(\widetilde{\mathbf {Z}}|Y \in I_{h})\, med_{n}(\widetilde{\mathbf {Z}}^{\top }|Y \in I_{h}) - p_{h}\, med(\mathbf {Z}|Y \in I_{h})\, med(\mathbf {Z}^{\top }|Y \in I_{h}) \right\| , \)
where the inequality follows from the triangle inequality. Therefore, since \(H<\infty \), it is enough to show that \(\widehat{p}_{h}\, med_{n}(\widetilde{\mathbf {Z}}|Y \in I_{h})\, med_{n}(\widetilde{\mathbf {Z}}^{\top }|Y \in I_{h})\) converges to \(p_{h}\, med(\mathbf {Z}|Y \in I_{h})\, med(\mathbf {Z}^{\top }|Y \in I_{h})\), for every \(h=1, \ldots , H\).
Note that:

1. elementary probability theory shows that \(\widehat{p}_{h}\) converges to \(p_{h}\);

2. \(med_{n}(\widetilde{\mathbf {Z}}|Y \in I_{h})\) converges to \(med(\mathbf {Z} | Y \in I_{h})\) for both the Tukey and Oja medians; this follows from their asymptotic normality (see, for example, Bai and He 1999, Nolan 1999, and Massé 2002 for the regularity conditions and the proof for the Tukey median, and Arcones et al. 1994, Hettmansperger et al. 1997, and Shen 2008 for the corresponding conditions and proof for the Oja median);

3. \(\widehat{p}_{h}\) is bounded in probability;

4. \(med(\mathbf {Z}|Y \in I_{h})\) exists for both the Tukey and Oja medians (Donoho and Gasko 1992 state that the Tukey median always exists, and Oja 1983 gives a similar conclusion for the Oja median);

5. \(med_{n}(\widetilde{\mathbf {Z}}|Y \in I_{h})\) is bounded in probability for both the Tukey and Oja medians, since \(med(\mathbf {Z}|Y \in I_{h})\) exists and \(med_{n}(\widetilde{\mathbf {Z}}|Y \in I_{h})\) converges to it.
Using all the above facts, the weighted covariance-type matrix \(\widehat{\mathbf {V}}\) converges to \(\mathbf {V}\) in the Frobenius norm. Consequently, the consistency of \(\widehat{\mathbf {V}}\) for \(\mathbf {V}\), along with the assumption that all eigenvalues of \(\mathbf {V}\) are distinct, implies that the \(d_{0}\) leading eigenvectors of \(\widehat{\mathbf {V}}\), \(\widehat{\varvec{\eta }}_{k}\), \(k=1,\ldots,d_{0}\), converge to the corresponding eigenvectors of \(\mathbf {V}\); this is a direct consequence of the Davis–Kahan \(\sin \theta \) theorem (Davis and Kahan 1970). Moreover, Theorem 2.4 implies that \(med(\mathbf {Z}|Y)\) is contained in the standardized e.d.r subspace spanned by \(\varvec{\eta }_{0k}\), \(k=1,\ldots,d_{0}\), so the \(d_{0}\) leading eigenvectors of \(\mathbf {V}\) fall into that subspace.
Since \(\widehat{\varvec{\beta }}_{k}=\widehat{\varvec{\Sigma }}_{\mathbf {xx}}^{-1/2} \widehat{\varvec{\eta }}_{k}\) and \(\widehat{\varvec{\Sigma }}_{\mathbf {xx}}^{-1/2}\) converges to \(\varvec{\Sigma }_{\mathbf {xx}}^{-1/2}\) at the \(\sqrt{n}\)-rate in the Frobenius norm, it follows that \(\widehat{\varvec{\beta }}_{k}\), \(k=1,\ldots,d_{0}\), converge to the e.d.r directions \(\varvec{\beta }_{0k}\), \(k=1,\ldots,d_{0}\).
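The eigenvector-perturbation step above can be illustrated numerically: when the eigenvalues of \(\mathbf {V}\) are well separated, a small symmetric perturbation moves the leading eigenvector only slightly. The sketch below checks one common form of the \(\sin \theta \) bound (the Yu–Wang–Samworth variant of Davis–Kahan); the matrices are arbitrary illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)

# A symmetric matrix V with well-separated eigenvalues (eigengap 3)
V = np.diag([5.0, 2.0, 1.0, 0.5])
# A small symmetric perturbation E, mimicking estimation error in V-hat
G = rng.standard_normal((4, 4))
E = 0.05 * (G + G.T) / 2
Vhat = V + E

def top_eigvec(M):
    """Eigenvector of a symmetric matrix for its largest eigenvalue."""
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(vals)]

u, u_hat = top_eigvec(V), top_eigvec(Vhat)
# sin of the angle between the leading eigenvectors (sign-invariant)
sin_theta = np.sqrt(max(0.0, 1.0 - (u @ u_hat) ** 2))
# Davis-Kahan (Yu-Wang-Samworth form): sin(theta) <= 2*||E||_op / eigengap
bound = 2 * np.linalg.norm(E, 2) / (5.0 - 2.0)
print(sin_theta, bound)  # sin_theta stays below the bound
```

The same mechanism underlies the consistency argument: as \(\widehat{\mathbf {V}}-\mathbf {V}\) shrinks in norm, the angle between \(\widehat{\varvec{\eta }}_{k}\) and \(\varvec{\eta }_{0k}\) shrinks proportionally, provided the eigengap stays bounded away from zero.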
Cite this article
Christou, E. Robust dimension reduction using sliced inverse median regression. Stat Papers 61, 1799–1818 (2020). https://doi.org/10.1007/s00362-018-1007-z
Keywords
- Affine equivariance property
- Dimension reduction subspace
- Oja and Tukey medians
- Robustness
- Sliced inverse regression