Model-free conditional screening via conditional distance correlation

  • Regular Article
  • Statistical Papers

Abstract

Given knowledge of a predetermined set of active predictors, we develop a feature screening procedure based on conditional distance correlation learning. When the predictors are highly correlated, the proposed procedure can substantially lower the correlation among them and thus reduce the numbers of false positives and false negatives. When the conditioning set is not available beforehand, we provide a data-driven method to select it. We establish both the ranking consistency and the sure screening property of the proposed procedure. Extensive simulations comparing our method with its competitors show that the new procedure performs well in both linear and nonlinear models. Finally, a real data analysis further illustrates the effectiveness of the new method.
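
As a rough illustration of the screening idea summarized in the abstract, the sketch below ranks predictors by the ordinary (unconditional) sample distance correlation with the response and keeps the top few. It is not the conditional procedure proposed in the paper: the conditioning set and the conditional distance correlation statistic are omitted. The function names, the `keep` parameter, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def _centered_distances(v):
    """Double-centered matrix of pairwise Euclidean distances of a sample."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]
    diff = v[:, None, :] - v[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    row = d.mean(axis=1, keepdims=True)
    col = d.mean(axis=0, keepdims=True)
    return d - row - col + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version), a value in [0, 1]."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0  # at least one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen(X, y, keep):
    """Rank the columns of X by distance correlation with y; keep the top ones.

    Hypothetical helper: this is unconditional screening, shown only to
    illustrate ranking by a model-free dependence measure.
    """
    scores = np.array(
        [distance_correlation(X[:, j], y) for j in range(X.shape[1])]
    )
    order = np.argsort(-scores)  # indices sorted by decreasing score
    return order[:keep], scores


# Simulated data (assumed for illustration): one linear and one nonlinear signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 2.0 * X[:, 0] + X[:, 3] ** 2 + 0.5 * rng.normal(size=200)
selected, scores = screen(X, y, keep=5)
print("selected predictor indices:", selected)
```

Each score costs O(n²) time and memory in the sample size, so this sketch suits moderate n; a conditional version would additionally need the weighting by the conditioning variables that the paper describes.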

Acknowledgements

We would like to thank Shaomin Li and Wei Shen at Shandong University for their constructive suggestions and for their efforts in improving the English of this paper. We also thank two anonymous reviewers and the associate editor for their useful comments, which improved the quality of the paper. The research was supported by NNSF Projects (11571204, 11231005, 11526205 and 11626247) of China.

Author information

Corresponding author

Correspondence to Lu Lin.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (tex 38 KB)

About this article

Cite this article

Lu, J., Lin, L. Model-free conditional screening via conditional distance correlation. Stat Papers 61, 225–244 (2020). https://doi.org/10.1007/s00362-017-0931-7

  • DOI: https://doi.org/10.1007/s00362-017-0931-7
