Abstract
Given prior knowledge of a set of predetermined active predictors, we develop a feature screening procedure based on conditional distance correlation learning. The proposed procedure can substantially lower the correlation among the predictors when they are highly correlated, and thus reduces the numbers of false positives and false negatives. When the conditioning set is not available beforehand, we provide a data-driven method to select it. We establish both the ranking consistency and the sure screening property of the proposed procedure. Extensive simulations comparing our method with its competitors show that the new procedure performs well in both linear and nonlinear models. Finally, a real data analysis further illustrates the effectiveness of the new method.
Acknowledgements
We would like to thank Shaomin Li and Wei Shen at Shandong University for their constructive suggestions and their efforts to improve the English writing of this paper. We also thank two anonymous reviewers and the associate editor for their useful comments, which improved the quality of the paper. The research was supported by NNSF Projects (11571204, 11231005, 11526205 and 11626247) of China.
Cite this article
Lu, J., Lin, L. Model-free conditional screening via conditional distance correlation. Stat Papers 61, 225–244 (2020). https://doi.org/10.1007/s00362-017-0931-7
DOI: https://doi.org/10.1007/s00362-017-0931-7