Abstract
In this paper, we study the estimation of the error density in the ultrahigh dimensional sparse additive model, where the number of variables exceeds the sample size. First, a smoothing method based on B-splines is applied to estimate the regression functions. Second, an improved two-stage refitted cross-validation (RCV) procedure based on a random splitting technique is used to obtain the residuals of the model, and a residual-based kernel method is then applied to estimate the error density function. Under suitable sparsity conditions, large sample properties of the estimator are established, including weak and strong consistency, asymptotic normality, and the law of the iterated logarithm. In particular, the relationship between the sparsity and the convergence rate of the kernel density estimator is characterized. The methodology is illustrated by simulations and a real data example, which suggest that the proposed method performs well.
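The two-stage procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's estimator: a sparse linear working model and marginal-correlation screening stand in for the B-spline additive fit and the paper's screening step, and the number of selected variables `select_k` is an illustrative choice.

```python
# Sketch of RCV-based error density estimation: split the sample, screen
# variables on one half, refit on the other half, pool the refitted
# residuals from both directions, then kernel-smooth them.
import numpy as np

rng = np.random.default_rng(0)
n, p, select_k = 200, 500, 5            # n < p: ultrahigh-dimensional regime
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]             # sparse truth: 3 active variables
X = rng.standard_normal((n, p))
eps = rng.standard_normal(n)            # true error density: N(0, 1)
y = X @ beta + eps

def screen_then_refit(Xa, ya, Xb, yb, k):
    """Screen variables on (Xa, ya); refit by OLS and return residuals on (Xb, yb)."""
    corr = np.abs(Xa.T @ (ya - ya.mean())) / len(ya)
    sel = np.argsort(corr)[-k:]                       # keep top-k marginal signals
    coef, *_ = np.linalg.lstsq(Xb[:, sel], yb, rcond=None)
    return yb - Xb[:, sel] @ coef                     # refitted residuals

half = n // 2
resid = np.concatenate([
    screen_then_refit(X[:half], y[:half], X[half:], y[half:], select_k),
    screen_then_refit(X[half:], y[half:], X[:half], y[:half], select_k),
])

def kde(u, res, h):
    """Gaussian kernel density estimate of the error density at points u."""
    z = (u[:, None] - res[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

h = 1.06 * resid.std() * len(resid) ** (-1 / 5)       # Silverman's rule of thumb
grid = np.linspace(-3, 3, 7)
print(np.round(kde(grid, resid, h), 3))
```

Splitting the two stages across independent halves is what guards against the screening step's spurious correlations contaminating the residuals; fitting and screening on the same half would bias the residual distribution toward zero.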
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 11971324 and 11471223), the Foundation of Science and Technology Innovation Service Capacity Building, the Interdisciplinary Construction of Bioinformatics and Statistics, and the Academy for Multidisciplinary Studies, Capital Normal University. The authors thank the reviewers for their constructive comments, which have led to an improvement of an earlier version of this paper.
About this article
Cite this article
Zou, F., Cui, H. RCV-based error density estimation in the ultrahigh dimensional additive model. Sci. China Math. 65, 1003–1028 (2022). https://doi.org/10.1007/s11425-019-1722-2
Keywords
- ultrahigh dimensional additive model
- B-spline
- kernel density estimation
- refitted cross-validation method
- asymptotic property