Abstract
This paper focuses on error density estimation in the ultrahigh dimensional sparse linear model, where the error term may have a heavy-tailed distribution. First, an improved two-stage refitted cross-validation (RCV) method, combined with robust variable screening procedures such as RRCS and robust variable selection methods such as LAD-SCAD, is used to obtain the submodel; the residual-based kernel density method is then applied to estimate the error density through LAD regression. Under the given conditions, the large sample properties of the estimator are established. In particular, we explicitly give the relationship between the sparsity and the convergence rate of the kernel density estimator. Simulation results show that the proposed error density estimator performs well. A real data example is presented to illustrate our methods.
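The pipeline sketched in the abstract (refitted cross-validation splitting, robust screening, LAD refitting on the selected submodel, then a kernel density estimate of the pooled residuals) can be illustrated with the following minimal sketch. It is not the paper's procedure: Spearman rank correlation stands in for RRCS, plain LAD on the screened submodel stands in for LAD-SCAD, and a Gaussian kernel with Silverman's rule-of-thumb bandwidth is assumed.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Toy sparse high-dimensional design: n = 200 observations, p = 500 predictors,
# only the first 3 coefficients nonzero; heavy-tailed t(3) errors.
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.standard_t(df=3, size=n)

def screen(Xa, ya, d):
    """Rank-correlation screening (a stand-in for RRCS): keep the d
    predictors with the largest absolute Spearman correlation with y."""
    scores = np.array([abs(spearmanr(Xa[:, j], ya)[0]) for j in range(Xa.shape[1])])
    return np.sort(np.argsort(scores)[-d:])

def lad_fit(Xa, ya, iters=50, eps=1e-6):
    """LAD regression via iteratively reweighted least squares."""
    b = np.linalg.lstsq(Xa, ya, rcond=None)[0]          # OLS starting value
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(ya - Xa @ b), eps)  # weights 1 / |r_i|
        sw = np.sqrt(w)
        b = np.linalg.lstsq(sw[:, None] * Xa, sw * ya, rcond=None)[0]
    return b

# Refitted cross-validation: screen on one half, refit LAD and compute
# residuals on the other half, then swap the roles and pool the residuals.
idx = rng.permutation(n)
halves = (idx[: n // 2], idx[n // 2:])
resid = []
for tr, te in (halves, halves[::-1]):
    keep = screen(X[tr], y[tr], d=10)
    b = lad_fit(X[te][:, keep], y[te])
    resid.append(y[te] - X[te][:, keep] @ b)
resid = np.concatenate(resid)

def kde(sample, grid):
    """Gaussian kernel density estimate, Silverman rule-of-thumb bandwidth."""
    h = 1.06 * sample.std() * len(sample) ** (-0.2)
    u = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

grid = np.linspace(-15.0, 15.0, 601)
fhat = kde(resid, grid)   # estimated error density on the grid
```

The half-sample swap mirrors the RCV idea that screening and refitting use disjoint data, which prevents spurious correlations picked up at the screening stage from biasing the residuals; the pooled residuals then feed the kernel density estimator.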
References
Anscombe, F. J., Glynn, W. J.: Distribution of the kurtosis statistic b2 for normal samples. Biometrika, 70, 227–234 (1983)
Arslan, O.: Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression. Computational Statistics & Data Analysis, 56, 1952–1965 (2012)
Bassett, G., Koenker, R.: Asymptotic theory of least absolute error regression. Journal of the American Statistical Association, 73, 618–622 (1978)
Bowman, A. W.: An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71, 353–360 (1984)
Chai, G., Li, Z.: Asymptotic theory for estimation of error distribution in linear model. Science in China Series A, 36, 408–419 (1993)
Chen, X., Bai, Z., Zhao, L., et al.: Asymptotic normality of minimum l1-norm estimates in linear model. Science in China Series A, 33, 1311–1328 (1990)
Cheng, F.: Weak and strong uniform consistency of a kernel error density estimator in nonparametric regression. Journal of Statistical Planning and Inference, 119, 95–107 (2004)
Cheng, F.: Asymptotic distributions of error density and distribution function estimators in nonparametric regression. Journal of Statistical Planning and Inference, 128, 327–349 (2005)
D’Agostino, R. B.: Transformation to normality of the null distribution of g1. Biometrika, 57, 679–681 (1970)
Fan, J., Guo, S., Hao, N.: Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society Series B, 74, 37–65 (2012)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360 (2001)
Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society Series B, 70, 849–911 (2008)
Fu, W. J., Knight, K.: Asymptotics for lasso-type estimators. Annals of Statistics, 28, 1356–1378 (2000)
Gao, X., Huang, J.: Asymptotic analysis of high-dimensional LAD regression with Lasso smoother. Statistica Sinica, 20, 187–193 (2010)
Hall, P.: Laws of the iterated logarithm for nonparametric density estimators. Probability Theory & Related Fields, 56, 47–61 (1981)
Hall, P.: Large sample optimality of least squares cross-validation in density estimation. Annals of Statistics, 11, 1156–1174 (1983)
Huang, J., Horowitz, J. L., Ma, S.: Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Annals of Statistics, 36, 587–613 (2008)
Jarque, C. M., Bera, A. K.: A test for normality of observations and regression residuals. International Statistical Review, 55, 163–172 (1987)
Koenker, R., Bassett, G. W.: Regression quantiles. Econometrica, 46, 211–244 (1978)
Li, G., Peng, H., Zhang, J., et al.: Robust rank correlation based screening. Annals of Statistics, 40, 1846–1877 (2012)
Li, G., Peng, H., Zhu, L.: Nonconcave penalized m-estimation with a diverging number of parameters. Statistica Sinica, 21, 391–419 (2011)
Li, R., Zhong, W., Zhu, L.: Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129–1139 (2012)
Li, Z.: A study of nonparametric estimation of error distribution in linear model based on l1-norm. Hiroshima Mathematical Journal, 25, 171–205 (1995)
Liang, H., Hardle, W.: Large sample theory of the estimation of the error distribution for a semiparametric model. Communications in Statistics, 28, 2025–2036 (1999)
Maye, J., Gerken, L.: Learning phonemes without minimal pairs. In: Proceedings of the 24th Annual Boston University Conference on Language Development, Vol. 2, 522–533 (2000)
McKean, J., Schrader, R.: Least absolute errors analysis of variance. In: Statistical Data Analysis Based on the L1-norm and Related Methods, North-Holland, Amsterdam, 1987
Meinshausen, N.: Relaxed lasso. Computational Statistics & Data Analysis, 52, 374–393 (2007)
Meinshausen, N., Meier, L., Bühlmann, P.: p-values for high-dimensional regression. Journal of the American Statistical Association, 104, 1671–1681 (2009)
Pollard, D.: Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7, 186–199 (1991)
Portnoy, S., Koenker, R.: The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science, 12, 279–300 (1997)
Powell, J.L.: Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25, 303–325 (1984)
Powell, J. L.: Censored regression quantiles. Journal of Econometrics, 32, 143–155 (1986)
Pourahmadi, M.: High-Dimensional Covariance Estimation, Wiley, New Jersey, 2013
Rudemo, M.: Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65–78 (1982)
Silverman, B. W.: Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, 1986
Städler, N., Bühlmann, P., van de Geer, S.: l1-penalization for mixture regression models. Test, 19, 209–256 (2010)
Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58, 267–288 (1996)
Wahid, A., Khan, D. M., Hussain, I.: Robust adaptive lasso method for parameter’s estimation and variable selection in high-dimensional sparse models. PLoS ONE, 12, 1–17 (2017)
Wang, H., Li, G., Jiang, G.: Robust regression shrinkage and consistent variable selection through the LAD-LASSO. Journal of Business & Economic Statistics, 25, 347–355 (2007)
Wang, L.: The l1 penalized LAD estimator for high dimensional linear regression. Journal of Multivariate Analysis, 120, 135–151 (2013)
Wang, L., Wu, Y., Li, R.: Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107, 214–222 (2012)
Wu, Y.: Strong consistency and exponential rate of the “minimum l1-norm” estimates in linear regression models. Computational Statistics & Data Analysis, 6, 285–295 (1988)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38, 894–942 (2010)
Zhong, W., Zhu, L., Li, R., et al.: Regularized quantile regression and robust feature screening for single index models. Statistica Sinica, 26, 69–95 (2016)
Zou, F., Cui, H.: Error density estimation in high-dimensional sparse linear model. Annals of the Institute of Statistical Mathematics, 72, 427–449 (2020)
Zou, F., Cui, H.: RCV-based error density estimation in the ultrahigh dimensional additive model. Science China Mathematics, 65 (2022), https://doi.org/10.1007/s11425-019-1722-2
Acknowledgements
The authors thank the Editor, the AE and reviewers for their constructive comments, which have led to an improvement of the earlier version of this paper.
Additional information
Supported by the National Natural Science Foundation of China (Grant No. 11971324) and the State Key Program of National Natural Science Foundation of China (Grant No. 12031016)
Cite this article
Zou, F., Cui, H. J.: Robust Error Density Estimation in Ultrahigh Dimensional Sparse Linear Model. Acta Math. Sin.-English Ser., 38, 963–984 (2022). https://doi.org/10.1007/s10114-022-1134-2
Keywords
- Ultrahigh dimensional sparse linear model
- robust density estimation
- refitted cross-validation method
- asymptotic properties