Abstract
We consider efficient prediction in sparse high-dimensional data. In settings where d ≫ n, many penalized regularization strategies have been proposed for simultaneous variable selection and estimation. However, different strategies yield different submodels with d_i < n, where d_i denotes the number of predictors included in the i-th submodel; some procedures select submodels with more predictors than others. Owing to the trade-off between model complexity and prediction accuracy, statistical inference after model selection is both important and challenging in high-dimensional data analysis. We therefore suggest pretest and shrinkage strategies to improve the prediction performance of two selected submodels. The pretest and shrinkage estimators are constructed by shrinking an overfitted (larger) submodel estimator in the direction of an underfitted (smaller) submodel estimator. Numerical studies indicate that the proposed post-selection pretest and shrinkage strategies improve the prediction performance of the selected submodels.
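The construction described in the abstract — shrinking an overfitted submodel estimator toward an underfitted one, with the amount of shrinkage governed by a test statistic — can be sketched as a Stein-type combination of the two estimators. This is a minimal illustrative sketch, not the authors' implementation; the function names, the use of a generic distance-type statistic `test_stat`, and the classical shrinkage factor 1 − (k − 2)/T are assumptions for illustration.

```python
import numpy as np

def shrinkage_estimator(beta_over, beta_under, test_stat, k):
    """Stein-type shrinkage: pull the overfitted submodel estimator
    toward the underfitted one. A large test statistic (strong evidence
    against the restricted submodel) leaves the estimate close to the
    overfitted fit; a small one pulls it toward the underfitted fit."""
    shrink = 1.0 - (k - 2) / test_stat
    return beta_under + shrink * (beta_over - beta_under)

def positive_shrinkage_estimator(beta_over, beta_under, test_stat, k):
    """Positive-part variant: truncate the shrinkage factor at zero so
    the estimator never over-shrinks past the underfitted estimator."""
    shrink = max(0.0, 1.0 - (k - 2) / test_stat)
    return beta_under + shrink * (beta_over - beta_under)

# Toy illustration with hypothetical submodel estimates.
beta_over = np.array([1.0, 2.0, 0.5])   # overfitted submodel estimate
beta_under = np.array([1.0, 2.0, 0.0])  # underfitted submodel estimate
print(shrinkage_estimator(beta_over, beta_under, test_stat=10.0, k=3))
print(positive_shrinkage_estimator(beta_over, beta_under, test_stat=0.5, k=3))
```

With a large test statistic the combined estimate stays near the overfitted fit; with a small one, the positive-part estimator collapses to the underfitted fit rather than over-shrinking.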
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Ahmed, S.E., Yüzbaşı, B. (2017). High Dimensional Data Analysis: Integrating Submodels. In: Ahmed, S.E. (ed.) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_14