
RCV-based error density estimation in the ultrahigh dimensional additive model

Science China Mathematics

Abstract

This paper studies estimation of the error density in the ultrahigh dimensional sparse additive model, where the number of variables is larger than the sample size. First, a B-spline smoothing method is applied to estimate the regression functions. Second, an improved two-stage refitted cross-validation (RCV) procedure based on a random splitting technique is used to obtain the residuals of the model, and a residual-based kernel method is then applied to estimate the error density function. Under suitable sparsity conditions, large sample properties of the estimator are established, including weak and strong consistency, asymptotic normality, and the law of the iterated logarithm. In particular, the relationship between the sparsity and the convergence rate of the kernel density estimator is given. The methodology is illustrated by simulations and a real data example, which suggest that the proposed method performs well.
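The two-stage RCV idea described above can be illustrated with a minimal sketch: split the sample at random, select variables on one half, refit on the other half to obtain (nearly) selection-unbiased residuals, swap the roles of the two halves, pool the residuals, and apply a kernel density estimate to them. The sketch below uses a simplified sparse *linear* model and crude correlation screening as stand-ins for the paper's B-spline additive fit and screening procedure; every function name, the choice of screening rule, and the bandwidth rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sparse model: n = 100 observations, p = 500 candidate
# predictors, only the first 3 active; true error density is N(0, 1).
n, p = 100, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.standard_normal(n)

def screen(Xa, ya, k=10):
    """Stage 1: keep the k predictors most correlated with the response
    (a crude stand-in for a sure-independence-type screening step)."""
    corr = np.abs(Xa.T @ (ya - ya.mean())) / len(ya)
    return np.argsort(corr)[-k:]

def refit_residuals(Xa, ya, Xb, yb):
    """Select variables on split (a), refit by least squares on split (b),
    and return the split-(b) residuals."""
    S = screen(Xa, ya)
    coef, *_ = np.linalg.lstsq(Xb[:, S], yb, rcond=None)
    return yb - Xb[:, S] @ coef

# Two-stage RCV: random split, cross-refit, pool the residuals.
idx = rng.permutation(n)
a, b = idx[: n // 2], idx[n // 2:]
res = np.concatenate([
    refit_residuals(X[a], y[a], X[b], y[b]),
    refit_residuals(X[b], y[b], X[a], y[a]),
])

def kde(t, r, h):
    """Gaussian-kernel estimate of the error density at the points t,
    built from the pooled residuals r with bandwidth h."""
    u = (t[:, None] - r[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

h = 1.06 * res.std() * len(res) ** (-1 / 5)  # Silverman's rule of thumb
grid = np.linspace(-3, 3, 7)
fhat = kde(grid, res, h)
print(np.round(fhat, 3))
```

Because the residuals on each half come from a model selected on the *other* half, spurious variables picked up by screening do not systematically shrink the residuals, which is the point of the refitting step.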



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 11971324 and 11471223), the Foundation of Science and Technology Innovation Service Capacity Building, the Interdisciplinary Construction of Bioinformatics and Statistics, and the Academy for Multidisciplinary Studies, Capital Normal University. The authors thank the reviewers for their constructive comments, which improved an earlier version of this paper.


Corresponding author

Correspondence to Hengjian Cui.


Cite this article

Zou, F., Cui, H. RCV-based error density estimation in the ultrahigh dimensional additive model. Sci. China Math. 65, 1003–1028 (2022). https://doi.org/10.1007/s11425-019-1722-2
