Abstract
In this paper, conditional distance correlation (CDC) is used as a measure of dependence to develop a conditional feature screening procedure for ultrahigh-dimensional data, given some significant variables. The proposed procedure, called conditional distance correlation sure independence screening (CDC-SIS for short), is model free: no model structure is specified between the response and the predictors, which is appealing in some practical problems of ultrahigh-dimensional data analysis. The sure screening property of CDC-SIS is proved, and a simulation study is conducted to evaluate its finite-sample performance. A real data analysis illustrates the proposed method. The results indicate that CDC-SIS performs well.
Acknowledgements
Wang’s research was supported by the National Natural Science Foundation of China (General Program 11171331 and Key Program 11331011), the National Natural Science Foundation for Creative Research Groups in China (61621003), a grant from the Key Lab of Random Complex Structure and Data Science, CAS, and the Natural Science Fund of SZU.
Appendix
We first establish the following regularity conditions:
(C1) Denote the density function of W by \(f(\cdot )\), and assume that it has a continuous second derivative. The support of W is assumed to be bounded and is denoted by \(\mathcal {W}=[a,b]\) with finite constants a and b.
(C2) \(K(\cdot )\) is a symmetric density function that has bounded support and is bounded on that support.
(C3) The random variables \(\mathbf X \) and Y satisfy the sub-exponential tail condition uniformly in p. That is, there exists a positive constant \(s_0\) such that, for \(0\le s<s_0\),
$$\begin{aligned} \sup _{w\in \mathcal {W}}\max _{1\le j \le p}E\{\exp (sX_j^2)\mid W=w\}&< \infty ,\\ \sup _{w\in \mathcal {W}}E\{\exp (s Y^2)\mid W=w\}&< \infty . \end{aligned}$$
(C4) \(\min _{j\in \mathcal {{M}}^{*}}{\rho }_{j0}^{*} \ge 2cn^{-\kappa }\) for some constant \(c>0\) and \(0\le \kappa < 1/2\).
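As an illustration of condition (C2), the following minimal sketch checks numerically that the Epanechnikov kernel is a symmetric, bounded density with bounded support. The choice of this particular kernel, the function name `epanechnikov`, and the grid are assumptions made for illustration only; the conditions above do not prescribe a specific kernel.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

# Evaluate the kernel on a symmetric grid that extends beyond its support.
grid = np.linspace(-2.0, 2.0, 40001)
values = epanechnikov(grid)
step = grid[1] - grid[0]

# Properties required by (C2), checked numerically.
integral = float(np.sum(values) * step)                   # density: integrates to about 1
is_symmetric = bool(np.allclose(values, values[::-1]))    # K(u) = K(-u)
max_value = float(values.max())                           # bounded on its support
vanishes_outside = bool(np.all(values[np.abs(grid) > 1.0] == 0.0))  # support within [-1, 1]

print(integral, is_symmetric, max_value, vanishes_outside)
```

Any other kernel with the same three properties (for example, a uniform or biweight kernel) would serve equally well in this check.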
Proof of Theorem 1
The proof consists of three steps. Throughout, c and C denote generic positive constants whose values may vary from line to line.
Step 1. For some \(0\le \kappa <1/2\), we first prove
$$\begin{aligned} \max _{1\le j\le p}\sup _{w\in [a,b]}P\left( |\hat{\rho }^2 (X_j,Y|W=w)-{\rho }^2 (X_j,Y|W=w)| \ge cn^{-\kappa }\right) \le C \exp \left( -\frac{n^{-\kappa }}{Ch}\right) . \end{aligned}\tag{7}$$
Refer to the Supplemental material for the proof of Step 1.
Step 2. We prove \(P(\max _{1\le j\le p}|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }) \le O(np\exp (-{n^{\gamma -\kappa }}/{\xi }))\). Note that
$$\begin{aligned} P(|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa })&\le P(|\hat{\rho }_{j}^{*}-{\rho }_{j}^{*}|+|{\rho }_{j}^{*}-{\rho }_{j0}^{*}| \ge cn^{-\kappa })\\&\le P(|\hat{\rho }_{j}^{*}-{\rho }_{j}^{*}| \ge cn^{-\kappa }/2)+ P(|{\rho }_{j}^{*}-{\rho }_{j0}^{*}| \ge cn^{-\kappa }/2). \end{aligned}$$
By the definitions of \(\hat{\rho }_{j}^{*}\) and \({\rho }_{j}^{*}=\frac{1}{n}\sum _{i=1}^n{\rho }_j^2 (W_i)\), with \({\rho }_j^2 (w)=\rho ^2(X_j,Y|W=w)\), and the result of Step 1, we have, for \(j=1,2,\ldots ,p\),
$$\begin{aligned} P(|\hat{\rho }_{j}^{*}-{\rho }_{j}^{*}| \ge cn^{-\kappa }/2)&= P\left( \left| \frac{1}{n}\sum _{i=1}^n\hat{\rho }_j^2 (W_i)-\frac{1}{n}\sum _{i=1}^n{\rho }_j^2 (W_i)\right| \ge cn^{-\kappa }/2\right) \\&\le \sum _{i=1}^n P(|\hat{\rho }_j^2 (W_i)-{\rho }_j^2 (W_i)| \ge cn^{-\kappa }/2)\\&\le Cn\exp \left( -\frac{n^{-\kappa }}{Ch}\right) \\&= O(n\exp (-n^{\gamma -\kappa }/\xi )), \end{aligned}\tag{8}$$
where \(\xi \) is a positive constant and \(0\le \kappa <\gamma \); the last equality holds when the bandwidth satisfies \(h=O(n^{-\gamma })\). By Hoeffding’s inequality, for \(j=1,2,\ldots ,p\), it follows that
$$\begin{aligned} P(|{\rho }_{j}^{*}-{\rho }_{j0}^{*}| \ge cn^{-\kappa }/2)&= P\left( \left| \frac{1}{n}\sum _{i=1}^n{\rho }_j^2 (W_i)-E{\rho }_j^2 (W_i)\right| \ge cn^{-\kappa }/2\right) \\&\le 2\exp (-nc^2n^{-2\kappa }/2)= O(\exp (-n^{1-2\kappa }/\xi )). \end{aligned}\tag{9}$$
The bound in Eq. (8) dominates the bound in Eq. (9). Hence, for \(j=1,2,\ldots ,p\), we get
$$\begin{aligned} P(|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }) \le O(n\exp (-{n^{\gamma -\kappa }}/{\xi })). \end{aligned}$$
Applying the union bound over \(j=1,2,\ldots ,p\), we thus have
$$\begin{aligned} P\left( \max _{1\le j\le p}|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\right) \le O(np\exp (-{n^{\gamma -\kappa }}/{\xi })). \end{aligned}$$
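The Hoeffding step behind (9) can be spelled out as a short calculation. This is a sketch under the assumption that \(\rho _j^2(W_1),\ldots ,\rho _j^2(W_n)\) are independent and identically distributed with values in \([0,1]\), as is the case for a squared correlation-type quantity evaluated at i.i.d. \(W_i\); the symbol \(t\) is introduced here only for illustration. For any \(t>0\),

$$\begin{aligned} P\left( \left| \frac{1}{n}\sum _{i=1}^n{\rho }_j^2 (W_i)-E{\rho }_j^2 (W_1)\right| \ge t\right)&\le 2\exp (-2nt^2),\\ t=\frac{cn^{-\kappa }}{2}\quad \Longrightarrow \quad 2\exp \left( -2n\cdot \frac{c^2n^{-2\kappa }}{4}\right)&= 2\exp \left( -\frac{c^2n^{1-2\kappa }}{2}\right) . \end{aligned}$$

The exponent is of order \(n^{1-2\kappa }\), which diverges because \(\kappa <1/2\).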
Step 3. We prove \(P(\mathcal {{M}}^{*} \subset \mathcal {\hat{M}}) \ge 1-O(ns_n\exp (-{n^{\gamma -\kappa }}/{\xi }))\). If \(\mathcal {{M}}^{*} \not \subset \mathcal {\hat{M}}\), then there exists some \(j\in \mathcal {{M}}^{*}\) such that \(\hat{\rho }_{j}^{*}<cn^{-\kappa }\). Because \(\min _{j\in \mathcal {{M}}^{*}}{\rho }_{j0}^{*} \ge 2cn^{-\kappa }\), this implies \(|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\) for that \(j\), so that
$$\begin{aligned} \{\mathcal {{M}}^{*} \not \subset \mathcal {\hat{M}}\} \subset \{|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\quad \text{ for } \text{ some } \; j\in \mathcal {{M}}^{*}\}. \end{aligned}$$
Consequently,
$$\begin{aligned} P\{\mathcal {{M}}^{*} \subset \mathcal {\hat{M}}\}&\ge P\{\max _{j\in \mathcal {{M}}^{*}}|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|< cn^{-\kappa }\}\\&= 1-P\{\max _{j\in \mathcal {{M}}^{*}}|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\}\\&\ge 1-s_n \max _{j\in \mathcal {{M}}^{*}}P\{|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\}\\&\ge 1 - O(ns_n\exp (-{n^{\gamma -\kappa }}/{\xi })). \end{aligned}$$
\(\square \)
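The quantities appearing in the proof can be illustrated with a small simulation: a local squared correlation \(\hat{\rho }_j^2(W_i)\) is averaged over \(W_1,\ldots ,W_n\) to give \(\hat{\rho }_j^{*}\), and predictors with \(\hat{\rho }_j^{*}\ge cn^{-\kappa }\) are retained. The sketch below is not the CDC estimator used in the paper: as a simplifying assumption it uses a kernel-weighted plug-in version of the squared distance correlation. The function names, the Epanechnikov kernel, the bandwidth `h`, the constants `c` and `kappa`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, one possible choice of K (an assumption of this sketch)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def weighted_dcor_sq(a, b, weights):
    """Squared distance correlation computed from distance matrices a and b
    under normalized sample weights (simplified plug-in version)."""
    def center(d):
        row = d @ weights  # weighted row means (d is symmetric)
        return d - row[:, None] - row[None, :] + weights @ row
    A, B = center(a), center(b)
    ww = np.outer(weights, weights)
    dcov = np.sum(ww * A * B)
    denom = np.sqrt(np.sum(ww * A * A) * np.sum(ww * B * B))
    # Clip to [0, 1] to guard against floating-point round-off.
    return float(np.clip(dcov / denom, 0.0, 1.0)) if denom > 0 else 0.0

def screening_utilities(X, y, w, h):
    """For each predictor, average the local squared correlations over W_1, ..., W_n."""
    n, p = X.shape
    K = epanechnikov((w[:, None] - w[None, :]) / h)
    K = K / K.sum(axis=1, keepdims=True)  # row i holds weights localized at W_i
    b = np.abs(y[:, None] - y[None, :])
    util = np.zeros(p)
    for j in range(p):
        a = np.abs(X[:, j, None] - X[None, :, j])
        util[j] = np.mean([weighted_dcor_sq(a, b, K[i]) for i in range(n)])
    return util

# Simulated data (hypothetical): only the first predictor is active given W.
rng = np.random.default_rng(0)
n, p = 80, 10
w = rng.uniform(0.0, 1.0, size=n)
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + w + 0.5 * rng.normal(size=n)

util = screening_utilities(X, y, w, h=0.3)

# Thresholding rule with hypothetical constants c and kappa.
c, kappa = 0.5, 0.25
threshold = c * n ** (-kappa)
selected = np.flatnonzero(util >= threshold)
print(np.round(util, 3), selected)
```

In practice the constants \(c\) and \(\kappa \) are unknown, so a common alternative is to keep a fixed number of top-ranked predictors instead of applying a threshold.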
About this article
Cite this article
Liu, Y., Wang, Q. Model-free feature screening for ultrahigh-dimensional data conditional on some variables. Ann Inst Stat Math 70, 283–301 (2018). https://doi.org/10.1007/s10463-016-0597-2