High Dimensional Data Analysis: Integrating Submodels

  • Chapter

Part of the book series: Contributions to Statistics

Abstract

We consider efficient prediction with sparse high dimensional data. In high dimensional settings where d ≫ n, many penalized regularization strategies have been proposed for simultaneous variable selection and estimation. Different strategies, however, yield different submodels with d_i < n, where d_i is the number of predictors included in the ith submodel; some procedures select submodels with more predictors than others. Because of the trade-off between model complexity and prediction accuracy, statistical inference after model selection is both important and challenging in high dimensional data analysis. For this reason, we suggest shrinkage and pretest strategies to improve the prediction performance of two selected submodels. The pretest and shrinkage strategy is constructed by shrinking an overfitted model estimator in the direction of an underfitted model estimator. Numerical studies indicate that our post-selection pretest and shrinkage strategies improve the prediction performance of the selected submodels.
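The construction described above can be illustrated with a short numerical sketch. The example below is a minimal, generic version of the idea, not the chapter's exact estimators: it fits two penalized submodels of different sizes (a cross-validated Lasso as the overfitted model, a more heavily penalized Lasso as the underfitted one), then combines their predictions by a pretest rule and by positive-part Stein-type shrinkage of the overfitted fit toward the underfitted fit. The test statistic, the chi-square critical value, and all tuning constants are illustrative assumptions.

```python
# A minimal sketch (not the chapter's exact estimators) of post-selection
# pretest and Stein-type shrinkage between two penalized submodels.
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)
n, d = 100, 500                        # d >> n: sparse high dimensional setting
beta = np.zeros(d)
beta[:5] = 2.0                         # only five active predictors
X = rng.standard_normal((n, d))
y = X @ beta + rng.standard_normal(n)
X_test = rng.standard_normal((50, d))
y_test = X_test @ beta + rng.standard_normal(50)

# Two penalized fits standing in for the overfitted and underfitted
# submodels: cross-validated Lasso (larger support) and a Lasso with a
# deliberately stronger penalty (smaller support). Illustrative choices.
over = LassoCV(cv=5).fit(X, y)
under = Lasso(alpha=3 * over.alpha_).fit(X, y)

mu_of, mu_uf = over.predict(X_test), under.predict(X_test)

# Generic distance statistic between the two in-sample fits, scaled by the
# overfitted model's residual variance (an assumption, not the paper's Tn).
sigma2 = np.mean((y - over.predict(X)) ** 2)
k = max(int(np.sum(over.coef_ != 0)) - int(np.sum(under.coef_ != 0)), 3)
Tn = max(np.sum((over.predict(X) - under.predict(X)) ** 2) / sigma2, 1e-12)

# Pretest rule: keep the parsimonious fit unless the data reject it.
crit = stats.chi2.ppf(0.95, df=k)
mu_pt = mu_uf if Tn <= crit else mu_of

# Positive-part Stein-type shrinkage of the overfitted prediction toward
# the underfitted one; the weight grows with the evidence against the
# smaller submodel, and the positive part prevents over-shrinking.
w = max(1.0 - (k - 2) / Tn, 0.0)
mu_sh = mu_uf + w * (mu_of - mu_uf)

for name, mu in [("overfitted", mu_of), ("underfitted", mu_uf),
                 ("pretest", mu_pt), ("shrinkage", mu_sh)]:
    print(f"{name:>11} test MSE: {np.mean((y_test - mu) ** 2):.3f}")
```

In sketches of this kind the shrinkage combination typically tracks the better of the two submodels: when the smaller submodel is adequate the shrinkage weight stays small and the prediction stays near the parsimonious fit, while strong evidence against it pulls the prediction toward the larger submodel.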



Author information

Correspondence to Syed Ejaz Ahmed.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Ejaz Ahmed, S., Yüzbaşı, B. (2017). High Dimensional Data Analysis: Integrating Submodels. In: Ahmed, S. (eds) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_14
