Abstract
This paper focuses on error density estimation in the ultrahigh dimensional sparse linear model, where the error term may have a heavy-tailed distribution. First, an improved two-stage refitted cross-validation (RCV) method, combined with robust variable screening procedures such as RRCS and robust variable selection methods such as LAD-SCAD, is used to obtain the submodel; the residual-based kernel density method is then applied to estimate the error density through LAD regression. Under the given conditions, the large sample properties of the estimator are established. In particular, we explicitly give the relationship between the sparsity and the convergence rate of the kernel density estimator. Simulation results show that the proposed error density estimator performs well. A real data example is presented to illustrate our methods.
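The pipeline sketched in the abstract (refitted cross-validation splitting, robust screening, LAD refitting on the selected submodel, then a kernel density estimate of the pooled residuals) can be illustrated with the following minimal sketch. It is not the paper's procedure: Spearman rank correlation stands in for RRCS, plain LAD on the screened submodel stands in for LAD-SCAD, and a Gaussian kernel with Silverman's rule-of-thumb bandwidth is assumed.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Toy sparse high-dimensional design: n = 200 observations, p = 500 predictors,
# only the first 3 coefficients nonzero; heavy-tailed t(3) errors.
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.standard_t(df=3, size=n)

def screen(Xa, ya, d):
    """Rank-correlation screening (a stand-in for RRCS): keep the d
    predictors with the largest absolute Spearman correlation with y."""
    scores = np.array([abs(spearmanr(Xa[:, j], ya)[0]) for j in range(Xa.shape[1])])
    return np.sort(np.argsort(scores)[-d:])

def lad_fit(Xa, ya, iters=50, eps=1e-6):
    """LAD regression via iteratively reweighted least squares."""
    b = np.linalg.lstsq(Xa, ya, rcond=None)[0]          # OLS starting value
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(ya - Xa @ b), eps)  # weights 1 / |r_i|
        sw = np.sqrt(w)
        b = np.linalg.lstsq(sw[:, None] * Xa, sw * ya, rcond=None)[0]
    return b

# Refitted cross-validation: screen on one half, refit LAD and compute
# residuals on the other half, then swap the roles and pool the residuals.
idx = rng.permutation(n)
halves = (idx[: n // 2], idx[n // 2:])
resid = []
for tr, te in (halves, halves[::-1]):
    keep = screen(X[tr], y[tr], d=10)
    b = lad_fit(X[te][:, keep], y[te])
    resid.append(y[te] - X[te][:, keep] @ b)
resid = np.concatenate(resid)

def kde(sample, grid):
    """Gaussian kernel density estimate, Silverman rule-of-thumb bandwidth."""
    h = 1.06 * sample.std() * len(sample) ** (-0.2)
    u = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

grid = np.linspace(-15.0, 15.0, 601)
fhat = kde(resid, grid)   # estimated error density on the grid
```

The half-sample swap mirrors the RCV idea that screening and refitting use disjoint data, which prevents spurious correlations picked up at the screening stage from biasing the residuals; the pooled residuals then feed the kernel density estimator.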
References
Anscombe, F. J., Glynn, W. J.: Distribution of the kurtosis statistic b2 for normal samples. Biometrika, 70, 227–234 (1983)
Arslan, O.: Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression. Computational Statistics & Data Analysis, 56, 1952–1965 (2012)
Bassett, G., Koenker, R.: Asymptotic theory of least absolute error regression. Journal of the American Statistical Association, 73, 618–622 (1978)
Bowman, A. W.: An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71, 353–360 (1984)
Chai, G., Li, Z.: Asymptotic theory for estimation of error distribution in linear model. Science in China Series A, 36, 408–419 (1993)
Chen, X., Bai, Z., Zhao, L., et al.: Asymptotic normality of minimum l1-norm estimates in linear model. Science in China Series A, 33, 1311–1328 (1990)
Cheng, F.: Weak and strong uniform consistency of a kernel error density estimator in nonparametric regression. Journal of Statistical Planning and Inference, 119, 95–107 (2004)
Cheng, F.: Asymptotic distributions of error density and distribution function estimators in nonparametric regression. Journal of Statistical Planning and Inference, 128, 327–349 (2005)
D’Agostino, R. B.: Transformation to normality of the null distribution of g1. Biometrika, 57, 679–681 (1970)
Fan, J., Guo, S., Hao, N.: Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society Series B, 74, 37–65 (2012)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360 (2001)
Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society Series B, 70, 849–911 (2008)
Fu, W. J., Knight, K.: Asymptotics for lasso-type estimators. Annals of Statistics, 28, 1356–1378 (2000)
Gao, X., Huang, J.: Asymptotic analysis of high-dimensional LAD regression with Lasso smoother. Statistica Sinica, 20, 187–193 (2010)
Hall, P.: Laws of the iterated logarithm for nonparametric density estimators. Probability Theory & Related Fields, 56, 47–61 (1981)
Hall, P.: Large sample optimality of least squares cross-validation in density estimation. Annals of Statistics, 11, 1156–1174 (1983)
Huang, J., Horowitz, J. L., Ma, S.: Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Annals of Statistics, 36, 587–613 (2008)
Jarque, C. M., Bera, A. K.: A test for normality of observations and regression residuals. International Statistical Review, 55, 163–172 (1987)
Koenker, R., Bassett, G. W.: Regression quantiles. Econometrica, 46, 211–244 (1978)
Li, G., Peng, H., Zhang, J., et al.: Robust rank correlation based screening. Annals of Statistics, 40, 1846–1877 (2012)
Li, G., Peng, H., Zhu, L.: Nonconcave penalized m-estimation with a diverging number of parameters. Statistica Sinica, 21, 391–419 (2011)
Li, R., Zhong, W., Zhu, L.: Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129–1139 (2012)
Li, Z.: A study of nonparametric estimation of error distribution in linear model based on l1-norm. Hiroshima Mathematical Journal, 25, 171–205 (1995)
Liang, H., Hardle, W.: Large sample theory of the estimation of the error distribution for a semiparametric model. Communications in Statistics, 28, 2025–2036 (1999)
Maye, J., Gerken, L.: Learning phonemes without minimal pairs. In: Proceedings of the 24th Annual Boston University Conference on Language Development, Vol. 2, 522–533 (2000)
McKean, J., Schrader, R.: Least absolute errors analysis of variance. In: Statistical Data Analysis Based on the L1-norm and Related Methods, North-Holland, Amsterdam, 1987
Meinshausen, N.: Relaxed lasso. Computational Statistics & Data Analysis, 52, 374–393 (2007)
Meinshausen, N., Meier, L., Bühlmann, P.: p-values for high-dimensional regression. Journal of the American Statistical Association, 104, 1671–1681 (2009)
Pollard, D.: Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7, 186–199 (1991)
Portnoy, S., Koenker, R.: The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science, 12, 279–300 (1997)
Powell, J.L.: Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25, 303–325 (1984)
Powell, J. L.: Censored regression quantiles. Journal of Econometrics, 32, 143–155 (1986)
Pourahmadi, M.: High-Dimensional Covariance Estimation, Wiley, New Jersey, 2013
Rudemo, M.: Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65–78 (1982)
Silverman, B. W.: Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, 1986
Städler, N., Bühlmann, P., van de Geer, S.: l1-penalization for mixture regression models. Test, 19, 209–256 (2010)
Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58, 267–288 (1996)
Wahid, A., Khan, D. M., Hussain, I.: Robust adaptive lasso method for parameter’s estimation and variable selection in high-dimensional sparse models. PLoS ONE, 12, 1–17 (2017)
Wang, H., Li, G., Jiang, G.: Robust regression shrinkage and consistent variable selection through the LAD-LASSO. Journal of Business & Economic Statistics, 25, 347–355 (2007)
Wang, L.: The l1 penalized LAD estimator for high dimensional linear regression. Journal of Multivariate Analysis, 120, 135–151 (2013)
Wang, L., Wu, Y., Li, R.: Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107, 214–222 (2012)
Wu, Y.: Strong consistency and exponential rate of the “minimum l1-norm” estimates in linear regression models. Computational Statistics & Data Analysis, 6, 285–295 (1988)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38, 894–942 (2010)
Zhong, W., Zhu, L., Li, R., et al.: Regularized quantile regression and robust feature screening for single index models. Statistica Sinica, 26, 69–95 (2016)
Zou, F., Cui, H.: Error density estimation in high-dimensional sparse linear model. Annals of the Institute of Statistical Mathematics, 72, 427–449 (2020)
Zou, F., Cui, H.: RCV-based error density estimation in the ultrahigh dimensional additive model. Science China Mathematics, 65 (2022), https://doi.org/10.1007/s11425-019-1722-2
Acknowledgements
The authors thank the Editor, the AE and reviewers for their constructive comments, which have led to an improvement of the earlier version of this paper.
Additional information
Supported by the National Natural Science Foundation of China (Grant No. 11971324) and the State Key Program of National Natural Science Foundation of China (Grant No. 12031016)
Cite this article
Zou, F., Cui, H. J.: Robust Error Density Estimation in Ultrahigh Dimensional Sparse Linear Model. Acta Math. Sin.-English Ser., 38, 963–984 (2022). https://doi.org/10.1007/s10114-022-1134-2
Keywords
- Ultrahigh dimensional sparse linear model
- robust density estimation
- refitted cross-validation method
- asymptotic properties