Abstract
We consider efficient prediction in sparse high-dimensional data. In settings where d ≫ n, many penalized regularization strategies have been proposed for simultaneous variable selection and estimation. However, different strategies yield different submodels with d_i < n, where d_i denotes the number of predictors included in the i-th submodel; some procedures select submodels with more predictors than others. Owing to the trade-off between model complexity and prediction accuracy, statistical inference after model selection is both important and challenging in high-dimensional data analysis. We therefore suggest pretest and shrinkage strategies to improve the prediction performance of two selected submodels. The pretest and shrinkage estimators are constructed by shrinking an overfitted (larger) submodel estimator in the direction of an underfitted (smaller) submodel estimator. Numerical studies indicate that the proposed post-selection pretest and shrinkage strategies improve the prediction performance of the selected submodels.
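The construction described in the abstract — shrinking an overfitted submodel estimator toward an underfitted one, with the amount of shrinkage governed by a test statistic — can be sketched as a Stein-type combination of the two estimators. This is a minimal illustrative sketch, not the authors' implementation; the function names, the use of a generic distance-type statistic `test_stat`, and the classical shrinkage factor 1 − (k − 2)/T are assumptions for illustration.

```python
import numpy as np

def shrinkage_estimator(beta_over, beta_under, test_stat, k):
    """Stein-type shrinkage: pull the overfitted submodel estimator
    toward the underfitted one. A large test statistic (strong evidence
    against the restricted submodel) leaves the estimate close to the
    overfitted fit; a small one pulls it toward the underfitted fit."""
    shrink = 1.0 - (k - 2) / test_stat
    return beta_under + shrink * (beta_over - beta_under)

def positive_shrinkage_estimator(beta_over, beta_under, test_stat, k):
    """Positive-part variant: truncate the shrinkage factor at zero so
    the estimator never over-shrinks past the underfitted estimator."""
    shrink = max(0.0, 1.0 - (k - 2) / test_stat)
    return beta_under + shrink * (beta_over - beta_under)

# Toy illustration with hypothetical submodel estimates.
beta_over = np.array([1.0, 2.0, 0.5])   # overfitted submodel estimate
beta_under = np.array([1.0, 2.0, 0.0])  # underfitted submodel estimate
print(shrinkage_estimator(beta_over, beta_under, test_stat=10.0, k=3))
print(positive_shrinkage_estimator(beta_over, beta_under, test_stat=0.5, k=3))
```

With a large test statistic the combined estimate stays near the overfitted fit; with a small one, the positive-part estimator collapses to the underfitted fit rather than over-shrinking.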
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Ahmed, S.E., Yüzbaşı, B. (2017). High Dimensional Data Analysis: Integrating Submodels. In: Ahmed, S.E. (ed.) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_14