Abstract
Regression problems in which the number of predictors, p, exceeds the number of observations, n, have become increasingly important in many diverse fields over the last couple of decades. In the classical case of “small p and large n,” the least squares estimator is a practical and effective tool for estimating the model parameters. In the so-called Big Data era, however, models often have p much larger than n. Statisticians have developed a number of regression techniques for dealing with such problems, such as the Lasso by Tibshirani (J R Stat Soc Ser B Stat Methodol 58:267–288, 1996), the SCAD by Fan and Li (J Am Stat Assoc 96(456):1348–1360, 2001), the LARS algorithm by Efron et al. (Ann Stat 32(2):407–499, 2004), the MCP estimator by Zhang (Ann Stat 38:894–942, 2010), and a tuning-free regression algorithm by Chatterjee (High dimensional regression and matrix estimation without tuning parameters, 2015, https://arxiv.org/abs/1510.07294). In this paper, we investigate the relative performance of some of these methods for parameter estimation and variable selection by analyzing real and synthetic data sets. Through an extensive Monte Carlo simulation study, we also compare their relative performance under a correlated design matrix.
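As a minimal sketch of the kind of simulation design described above (not the authors' exact setup), the following generates a sparse high-dimensional linear model with an AR(1)-correlated Gaussian design and fits the Lasso with a cross-validated tuning parameter via scikit-learn. The dimensions n = 50, p = 200, the correlation level rho = 0.5, and the sparsity pattern are illustrative assumptions, and only the Lasso is shown; SCAD and MCP require other packages.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, rho = 50, 200, 0.5  # n << p; rho controls design correlation (assumed values)

# AR(1) covariance: Sigma[i, j] = rho^|i - j|
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Sparse truth: only the first 5 coefficients are nonzero
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(scale=1.0, size=n)

# Lasso with 5-fold cross-validation over the regularization path
fit = LassoCV(cv=5).fit(X, y)

est_error = np.linalg.norm(fit.coef_ - beta)  # parameter estimation error
selected = np.flatnonzero(fit.coef_)          # indices chosen by variable selection
print(f"l2 estimation error: {est_error:.3f}, variables selected: {selected.size}")
```

In a full Monte Carlo study this fit would be repeated over many replications and correlation levels, averaging the estimation error and recording how often the true support is recovered.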
References
Ahmed, S. E. (2014). Penalty, shrinkage and pretest strategies: Variable selection and estimation. New York: Springer.
Belloni, A., Chernozhukov, V., & Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791–806.
Bickel, P. J., Ritov, Y., & Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37, 1705–1732.
Bühlmann, P., Kalisch, M., & Meier, L. (2014). High-dimensional statistics with a view towards applications in biology. Annual Review of Statistics and Its Application, 1, 255–278.
Bühlmann, P., & van de Geer, S. (2011). Statistics for high-dimensional data: Methods, theory and applications. Springer series in statistics. Heidelberg: Springer.
Chatterjee, S. (2015). High dimensional regression and matrix estimation without tuning parameters. https://arxiv.org/abs/1510.07294
Chatterjee, S., & Jafarov, J. (2016). Prediction error of cross-validated Lasso. https://arxiv.org/abs/1502.06291
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1.
Frank, I., & Friedman, J. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics, 35, 109–148.
Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3), 397–416.
Greenshtein, E., & Ritov, Y. A. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10(6), 971–988.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning. Data mining, inference, and prediction. Springer series in statistics (2nd ed.). New York: Springer.
Hastie, T., Wainwright, M., & Tibshirani, R. (2016). Statistical learning with sparsity: The lasso and generalizations. Boca Raton, FL: Chapman and Hall/CRC.
Hebiri, M., & Lederer, J. (2013). How correlations influence lasso prediction. IEEE Transactions on Information Theory, 59(3), 1846–1854.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for non-orthogonal problems. Technometrics, 12, 69–82.
Huang, J., Ma, S., & Zhang, C. H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statistica Sinica, 18, 1603–1618.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. Springer texts in statistics. New York: Springer.
Knight, K., & Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28, 1356–1378.
Lederer, J., & Müller, C. (2014). Don’t fall for tuning parameters: Tuning-free variable selection in high dimensions with the TREX. Preprint. arXiv:1404.0541.
Leng, C., Lin, Y., & Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statistica Sinica, 16, 1273–1284.
Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462.
Reid, S., Tibshirani, R., & Friedman, J. (2016). A study of error variance estimation in lasso regression. Statistica Sinica, 26(1), 35–67.
Stamey, T. A., Kabalin, J. N., McNeal, J. E., Johnstone, I. M., Freiha, F., Redwine, E. A., et al. (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients. The Journal of Urology, 141(5), 1076–1083.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B Statistical Methodology, 58, 267–288.
Tibshirani, R. (2013). The lasso problem and uniqueness. Electronic Journal of Statistics, 7, 1456–1490.
Zhang, C. H. (2007). Penalized linear unbiased selection. Technical report 3, Department of Statistics and Bioinformatics, Rutgers University.
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.
Acknowledgements
The research of S. Ejaz Ahmed is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Ahmed, S.E., Kim, H., Yıldırım, G., Yüzbaşı, B. (2019). High-Dimensional Regression Under Correlated Design: An Extensive Simulation Study. In: Ahmed, S., Carvalho, F., Puntanen, S. (eds) Matrices, Statistics and Big Data. IWMS 2016. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-17519-1_11
DOI: https://doi.org/10.1007/978-3-030-17519-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17518-4
Online ISBN: 978-3-030-17519-1
eBook Packages: Mathematics and Statistics (R0)