High-Dimensional Regression Under Correlated Design: An Extensive Simulation Study

Conference paper in Matrices, Statistics and Big Data (IWMS 2016), part of the book series Contributions to Statistics (Springer).

Abstract

Regression problems in which the number of predictors, p, exceeds the number of observations, n, have become increasingly important in many fields over the last couple of decades. In the classical case of “small p and large n,” the least squares estimator is a practical and effective tool for estimating the model parameters. In the Big Data era, however, models are often characterized by p much larger than n. Statisticians have developed a number of regression techniques for such problems, including the Lasso by Tibshirani (J R Stat Soc Ser B Stat Methodol 58:267–288, 1996), the SCAD by Fan and Li (J Am Stat Assoc 96(456):1348–1360, 2001), the LARS algorithm by Efron et al. (Ann Stat 32(2):407–499, 2004), the MCP estimator by Zhang (Ann Stat 38:894–942, 2010), and a tuning-free regression algorithm by Chatterjee (High dimensional regression and matrix estimation without tuning parameters, 2015, https://arxiv.org/abs/1510.07294). In this paper, we investigate the relative performance of some of these methods for parameter estimation and variable selection by analyzing real and synthetic data sets. Through an extensive Monte Carlo simulation study, we also compare their relative performance under a correlated design matrix.
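
As a rough illustration of the kind of Monte Carlo experiment the abstract describes, the sketch below generates a correlated AR(1) design with p much larger than n, fits a cross-validated Lasso, and records parameter-estimation error and variable-selection accuracy over repeated replications. Everything in it is an assumption made for illustration: scikit-learn's LassoCV stands in for the full set of estimators compared in the paper (SCAD and MCP have no scikit-learn implementation), and the dimensions, correlation level, and sparse coefficient vector are not the paper's actual settings.

```python
# Minimal Monte Carlo sketch (assumed settings, not the paper's):
# correlated AR(1) design with p >> n, sparse truth, cross-validated Lasso.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, rho, n_reps = 50, 200, 0.5, 100            # p >> n; rho is assumed

# AR(1) correlation structure: Corr(x_j, x_k) = rho^|j - k|
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
L = np.linalg.cholesky(Sigma)

beta = np.zeros(p)
beta[:5] = 2.0                                   # 5 active predictors (assumed)

mse, true_pos = [], []
for _ in range(n_reps):
    X = rng.standard_normal((n, p)) @ L.T        # rows ~ N(0, Sigma)
    y = X @ beta + rng.standard_normal(n)        # linear model, N(0, 1) noise
    fit = LassoCV(cv=5).fit(X, y)
    mse.append(np.mean((fit.coef_ - beta) ** 2))        # estimation error
    true_pos.append(int(np.sum(fit.coef_[:5] != 0)))    # signals selected

print(f"mean MSE of beta estimates: {np.mean(mse):.4f}")
print(f"mean true positives out of 5: {np.mean(true_pos):.2f}")
```

For a comparison covering the full set of methods, R packages such as glmnet (by Friedman et al., cited below) and ncvreg (which implements SCAD and MCP) would be natural choices.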


References

  1. Ahmed, S. E. (2014). Penalty, shrinkage and pretest strategies: Variable selection and estimation. New York: Springer.

  2. Belloni, A., Chernozhukov, V., & Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791–806.

  3. Bickel, P. J., Ritov, Y., & Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37, 1705–1732.

  4. Bühlmann, P., Kalisch, M., & Meier, L. (2014). High-dimensional statistics with a view towards applications in biology. Annual Review of Statistics and Its Application, 1, 255–278.

  5. Bühlmann, P., & van de Geer, S. (2011). Statistics for high-dimensional data: Methods, theory and applications. Springer series in statistics. Heidelberg: Springer.

  6. Chatterjee, S. (2015). High dimensional regression and matrix estimation without tuning parameters. https://arxiv.org/abs/1510.07294

  7. Chatterjee, S., & Jafarov, J. (2016). Prediction error of cross-validated Lasso. https://arxiv.org/abs/1502.06291

  8. Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.

  9. Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.

  10. Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.

  11. Frank, I., & Friedman, J. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics, 35, 109–148.

  12. Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3), 397–416.

  13. Greenshtein, E., & Ritov, Y. A. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10(6), 971–988.

  14. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning. Data mining, inference, and prediction. Springer series in statistics (2nd ed.). New York: Springer.

  15. Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical learning with sparsity: The lasso and generalizations. Boca Raton, FL: Chapman and Hall/CRC.

  16. Hebiri, M., & Lederer, J. (2013). How correlations influence lasso prediction. IEEE Transactions on Information Theory, 59(3), 1846–1854.

  17. Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for non-orthogonal problems. Technometrics, 12, 69–82.

  18. Huang, J., Ma, S., & Zhang, C. H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statistica Sinica, 18, 1603–1618.

  19. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. Springer texts in statistics. New York: Springer.

  20. Knight, K., & Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28, 1356–1378.

  21. Lederer, J., & Müller, C. (2014). Don’t fall for tuning parameters: Tuning-free variable selection in high dimensions with the TREX. Preprint. arXiv:1404.0541.

  22. Leng, C., Lin, Y., & Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statistica Sinica, 16, 1273–1284.

  23. Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462.

  24. Reid, S., Tibshirani, R., & Friedman, J. (2016). A study of error variance estimation in lasso regression. Statistica Sinica, 26(1), 35–67.

  25. Stamey, T. A., Kabalin, J. N., McNeal, J. E., Johnstone, I. M., Freiha, F., Redwine, E. A., et al. (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients. The Journal of Urology, 141(5), 1076–1083.

  26. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 58, 267–288.

  27. Tibshirani, R. (2013). The lasso problem and uniqueness. Electronic Journal of Statistics, 7, 1456–1490.

  28. Zhang, C. H. (2007). Penalized linear unbiased selection. Technical report, Department of Statistics and Bioinformatics, Rutgers University.

  29. Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.

  30. Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.


Acknowledgements

The research of S. Ejaz Ahmed is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Correspondence to S. Ejaz Ahmed.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ahmed, S.E., Kim, H., Yıldırım, G., Yüzbaşı, B. (2019). High-Dimensional Regression Under Correlated Design: An Extensive Simulation Study. In: Ahmed, S., Carvalho, F., Puntanen, S. (eds) Matrices, Statistics and Big Data. IWMS 2016. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-17519-1_11
