Model-free feature screening for ultrahigh-dimensional data conditional on some variables

Liu, Yi; Wang, Qihua

doi:10.1007/s10463-016-0597-2

Published: 17 January 2017

Volume 70, pages 283–301 (2018)

Annals of the Institute of Statistical Mathematics

Yi Liu^{1,2} & Qihua Wang^{1,3}

Abstract

In this paper, the conditional distance correlation (CDC) is used as a measure of correlation to develop a conditional feature screening procedure, given some significant variables, for ultrahigh-dimensional data. The proposed procedure is model free and is called conditional distance correlation-sure independence screening (CDC-SIS for short). That is, we do not specify any model structure between the response and the predictors, which is appealing in some practical problems of ultrahigh-dimensional data analysis. The sure screening property of CDC-SIS is proved, and a simulation study is conducted to evaluate its finite-sample performance. A real data analysis illustrates the proposed method. The results indicate that CDC-SIS performs well.

References

Fan, J., Gijbels, I. (1996). Local polynomial modelling and its applications, Monographs on Statistics and Applied Probability, vol. 66. Chapman and Hall, London.
Fan, J., Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 849–911.
Fan, J., Song, R. (2010). Sure independence screening in generalized linear models with np-dimensionality. The Annals of Statistics, 38(6), 3567–3604.
Fan, J., Samworth, R., Wu, Y. (2009). Ultrahigh dimensional feature selection: beyond the linear model. The Journal of Machine Learning Research, 10, 2013–2038.
Fan, J., Feng, Y., Song, R. (2011). Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association, 106(494), 544–557.
Fan, J., Ma, Y., Dai, W. (2014). Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. Journal of the American Statistical Association, 109(507), 1270–1284.
Harrison, D., Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81–102.
Li, R., Zhong, W., Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107(499), 1129–1139.
Liu, J., Li, R., Wu, R. (2014). Feature selection for varying coefficient models with ultrahigh-dimensional covariates. Journal of the American Statistical Association, 109(505), 266–274.
Székely, G. J., Rizzo, M. L., Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794.
Wang, Q. H., Rao, J. N. K. (2002). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30(3), 896–924.
Wang, X., Pan, W., Hu, W., Tian, Y., Zhang, H. (2015). Conditional distance correlation. Journal of the American Statistical Association, 110(512), 1726–1734.
Zhong, W., Zhu, L., Li, R., Cui, H. (2016). Regularized quantile regression and robust feature screening for single index models. Statistica Sinica, 26(1), 69–95.
Zhu, L. P., Li, L., Li, R., Zhu, L. X. (2011). Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association, 106(496), 1464–1475.

Acknowledgements

Wang’s research was supported by the National Natural Science Foundation of China (General Program 11171331 and Key Program 11331011) and the National Natural Science Foundation for Creative Research Groups in China (61621003), a Grant from the Key Lab of Random Complex Structure and Data Science, CAS and Natural Science Fund of SZU.

Author information

Authors and Affiliations

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
Yi Liu & Qihua Wang
College of Science, China University of Petroleum, Qingdao, 266580, China
Yi Liu
Institute of Statistical Science, Shenzhen University, Shenzhen, 518006, China
Qihua Wang

Corresponding author

Correspondence to Qihua Wang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 230 KB)

Appendix

We first establish the following regularity conditions:

(C1)
Denote the density function of W by $f(\cdot )$, and assume that it has continuous second derivatives. The support of W is assumed to be bounded and is denoted by $\mathcal {W}=[a,b]$ with finite constants a and b.
(C2)
$K(\cdot )$ is a symmetric density function with bounded support, and it is bounded on that support.
(C3)
The random variables $\mathbf X $ and Y satisfy a sub-exponential tail condition uniformly in p. That is, there exists a positive constant $s_0$ such that, for $0\le s<s_0$,
$$\begin{aligned} \sup _{w\in \mathcal {W}}\max _{1\le j \le p}E\{\exp (sX_j^2)\mid W=w\}&< \infty ,\\ \sup _{w\in \mathcal {W}}E\{\exp (s Y^2)\mid W=w\}&< \infty . \end{aligned}$$
(C4)
$\min _{j\in \mathcal {{M}}^{*}}{\rho }_{j0}^{*} \ge 2cn^{-\kappa }$ for some constant $c>0$ and $0\le \kappa < 1/2$.
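Condition (C2) can be made concrete with a small example. The sketch below uses the Epanechnikov kernel, which is symmetric, supported on $[-1,1]$ and bounded by 0.75; this kernel is chosen here only for illustration and is not stated in the source. The function names `epanechnikov` and `kernel_weights` are hypothetical, and the normalized weights are one common way to localize an estimator at a point $w$ with bandwidth $h$, not necessarily the weighting used by the authors.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: symmetric density on [-1, 1], bounded by 0.75.

    Illustrative choice only; any kernel satisfying (C2) could be used.
    """
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def kernel_weights(W, w, h):
    """Normalized kernel weights K((W_i - w) / h) / sum_k K((W_k - w) / h).

    W : 1-d array of observed conditioning values W_1, ..., W_n
    w : point at which the conditional quantity is evaluated
    h : bandwidth (assumed positive)
    """
    k = epanechnikov((np.asarray(W, dtype=float) - w) / h)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observations within bandwidth h of w")
    return k / total


if __name__ == "__main__":
    W = np.array([0.0, 0.1, 0.5, 0.9])
    # Only observations within h = 0.2 of w = 0.1 receive positive weight.
    print(kernel_weights(W, w=0.1, h=0.2))
```

Observations farther than $h$ from $w$ receive zero weight, which is where the bounded support in (C2) enters.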

Proof of Theorem 1

The proof consists of three steps. Throughout, c and C denote generic positive constants whose values may vary from line to line.

Step 1.
For some $0\le \kappa <1/2$, we first prove
$$\begin{aligned} \max _{1\le j\le p}\sup _{w\in [a,b]}P\left( |\hat{\rho }^2 (X_j,Y|W=w)-{\rho }^2 (X_j,Y|W=w)| \ge cn^{-\kappa }\right) \le C \exp \left( -\frac{n^{-\kappa }}{Ch}\right) . \end{aligned} \tag{7}$$
The proof of Step 1 is given in the Supplementary material.
Step 2.
We prove $P(\max _{1\le j\le p}|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }) \le O(np\exp (-{n^{\gamma -\kappa }}/{\xi }))$. By the triangle inequality,
$$\begin{aligned} P(|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa })&\le P(|\hat{\rho }_{j}^{*}-{\rho }_{j}^{*}|+|{\rho }_{j}^{*}-{\rho }_{j0}^{*}| \ge cn^{-\kappa })\\ &\le P(|\hat{\rho }_{j}^{*}-{\rho }_{j}^{*}| \ge cn^{-\kappa }/2)+ P(|{\rho }_{j}^{*}-{\rho }_{j0}^{*}| \ge cn^{-\kappa }/2). \end{aligned}$$
By the definition of $\hat{\rho }_{j}^{*}$, writing ${\rho }_{j}^{*}=\frac{1}{n}\sum _{i=1}^n{\rho }_j^2 (W_i)$ with ${\rho }_j^2 (w)=\rho ^2(X_j,Y|W=w)$, the result of Step 1 gives, for $j=1,2,\ldots ,p$,
$$\begin{aligned} P(|\hat{\rho }_{j}^{*}-{\rho }_{j}^{*}| \ge cn^{-\kappa }/2)&= P\left( \left| \frac{1}{n}\sum _{i=1}^n\hat{\rho }_j^2 (W_i)-\frac{1}{n}\sum _{i=1}^n{\rho }_j^2 (W_i)\right| \ge cn^{-\kappa }/2\right) \\ &\le \sum _{i=1}^n P(|\hat{\rho }_j^2 (W_i)-{\rho }_j^2 (W_i)| \ge cn^{-\kappa }/2)\\ &\le Cn\exp \left( -\frac{n^{-\kappa }}{Ch}\right) \\ &= O(n\exp (-n^{\gamma -\kappa }/\xi )), \end{aligned} \tag{8}$$
where $\xi $ is a positive constant, and $0\le \kappa <\gamma $. By Hoeffding’s inequality, for $j=1,2,\ldots ,p$, it follows that
$$\begin{aligned} P(|{\rho }_{j}^{*}-{\rho }_{j0}^{*}| \ge cn^{-\kappa }/2)&= P\left( \left| \frac{1}{n}\sum _{i=1}^n{\rho }_j^2 (W_i)-E{\rho }_j^2 (W_i)\right| \ge cn^{-\kappa }/2\right) \\ &\le 2\exp (-nc^2n^{-2\kappa }/2)= O(\exp (-n^{1-2\kappa }/\xi )). \end{aligned} \tag{9}$$
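The constant in the Hoeffding step can be checked directly. The following is a sketch under two assumptions that the appendix does not restate: that $0\le {\rho }_j^2 (W_i)\le 1$, as is usual for a squared distance-correlation-type quantity, and that $W_1,\ldots ,W_n$ are independent and identically distributed. The symbols $Z_i$ and $t$ are introduced here for illustration only.

$$\begin{aligned} P\left( \left| \frac{1}{n}\sum _{i=1}^n Z_i-EZ_1\right| \ge t\right) &\le 2\exp (-2nt^2), \qquad Z_i={\rho }_j^2 (W_i)\in [0,1],\\ t=\frac{cn^{-\kappa }}{2}\quad &\Longrightarrow \quad 2nt^2=\frac{c^2n^{1-2\kappa }}{2}. \end{aligned}$$

Under these assumptions the $O(\cdot )$ bound in (9) holds with $\xi =2/c^2$, or with any larger constant.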
The bound in (8) dominates the bound in (9). Hence, for $j=1,2,\ldots ,p$, we get
$$\begin{aligned} P(|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }) \le O(n\exp (-{n^{\gamma -\kappa }}/{\xi })). \end{aligned}$$
Taking the union bound over $j=1,2,\ldots ,p$, we thus have
$$\begin{aligned} P\left( \max _{1\le j\le p}|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\right) \le O(np\exp (-{n^{\gamma -\kappa }}/{\xi })). \end{aligned}$$
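The passage from $\exp (-n^{-\kappa }/(Ch))$ to $\exp (-n^{\gamma -\kappa }/\xi )$ in Step 2 relies on the rate of the bandwidth $h$, which this appendix does not restate. A sketch, assuming $h=c_hn^{-\gamma }$ for some constant $c_h>0$ (the symbol $c_h$ is hypothetical and introduced here only for illustration):

$$\begin{aligned} \frac{n^{-\kappa }}{Ch}=\frac{n^{-\kappa }}{Cc_hn^{-\gamma }}=\frac{n^{\gamma -\kappa }}{Cc_h}, \qquad \text{so that}\qquad Cn\exp \left( -\frac{n^{-\kappa }}{Ch}\right) =Cn\exp \left( -\frac{n^{\gamma -\kappa }}{\xi }\right) \quad \text{with } \xi =Cc_h. \end{aligned}$$

Under the same assumption, the kernel-estimation bound decays no faster than the Hoeffding bound when $\gamma -\kappa \le 1-2\kappa $, that is, when $\gamma \le 1-\kappa $, taking $\xi $ to be the larger of the two constants.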
Step 3.
We prove $P(\mathcal {{M}}^{*} \subset \mathcal {\hat{M}}) \ge 1-O(ns_n\exp (-{n^{\gamma -\kappa }}/{\xi }))$. If $\mathcal {{M}}^{*} \not \subset \mathcal {\hat{M}}$, then there exists some $j\in \mathcal {{M}}^{*}$ such that $\hat{\rho }_{j}^{*}<cn^{-\kappa }$. Since $\min _{j\in \mathcal {{M}}^{*}}{\rho }_{j0}^{*} \ge 2cn^{-\kappa }$ by condition (C4), this implies $|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }$ for that $j$, so that
$$\begin{aligned} \{\mathcal {{M}}^{*} \not \subset \mathcal {\hat{M}}\} \subset \{|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\ \text{ for some } j\in \mathcal {{M}}^{*}\}. \end{aligned}$$
Consequently,
$$\begin{aligned} P\{\mathcal {{M}}^{*} \subset \mathcal {\hat{M}}\}&\ge P\{\max _{j\in \mathcal {{M}}^{*}}|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|< cn^{-\kappa }\}\\ &= 1-P\{\max _{j\in \mathcal {{M}}^{*}}|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\}\\ &\ge 1-s_n \max _{j\in \mathcal {{M}}^{*}}P\{|\hat{\rho }_{j}^{*}-{\rho }_{j0}^{*}|\ge cn^{-\kappa }\}\\ &\ge 1 - O(ns_n\exp (-{n^{\gamma -\kappa }}/{\xi })). \end{aligned}$$

$\square $
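The selection rule implicit in Step 3, namely keeping predictor $j$ when $\hat{\rho }_{j}^{*}\ge cn^{-\kappa }$ with $\hat{\rho }_{j}^{*}=\frac{1}{n}\sum _{i=1}^n\hat{\rho }_j^2 (W_i)$, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `cdc_sis_select` is hypothetical, and the pointwise estimates $\hat{\rho }_j^2 (W_i)$ are assumed to have been computed already by some conditional distance correlation estimator, which the sketch does not provide. The numbers in the usage example are made up purely to exercise the code.

```python
import numpy as np


def cdc_sis_select(pointwise_rho2, c, kappa):
    """Threshold averaged screening utilities at c * n**(-kappa).

    pointwise_rho2 : (n, p) array; entry (i, j) is assumed to hold an
        estimate of rho_j^2(W_i), computed elsewhere.
    c, kappa : threshold constants, with c > 0 and 0 <= kappa < 1/2.

    Returns the selected column indices, the averaged utilities
    rho_star_hat (length p), and the threshold value.
    """
    pointwise_rho2 = np.asarray(pointwise_rho2, dtype=float)
    n, _ = pointwise_rho2.shape
    # rho_star_hat[j] = (1/n) * sum_i rho_hat_j^2(W_i)
    rho_star_hat = pointwise_rho2.mean(axis=0)
    threshold = c * n ** (-kappa)
    selected = np.flatnonzero(rho_star_hat >= threshold)
    return selected, rho_star_hat, threshold


if __name__ == "__main__":
    # Toy input: n = 4 evaluation points, p = 3 predictors (made-up values).
    toy = np.array([
        [0.9, 0.01, 0.5],
        [0.7, 0.03, 0.3],
        [0.8, 0.02, 0.4],
        [0.8, 0.02, 0.4],
    ])
    selected, utilities, threshold = cdc_sis_select(toy, c=0.5, kappa=0.25)
    print(selected, utilities, threshold)
```

In practice the constants $c$ and $\kappa $ are unknown, so screening procedures of this kind are often run by keeping a fixed number of top-ranked predictors instead; the thresholded form is shown here because it is the one the proof works with.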

About this article

Cite this article

Liu, Y., Wang, Q. Model-free feature screening for ultrahigh-dimensional data conditional on some variables. Ann Inst Stat Math 70, 283–301 (2018). https://doi.org/10.1007/s10463-016-0597-2

Received: 02 June 2016
Revised: 18 November 2016
Published: 17 January 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s10463-016-0597-2
