Abstract
In high-dimensional data analysis, bi-level sparsity is often assumed when covariates act in groups and sparsity can appear either at the group level or within certain groups. In such cases, an ideal model should encourage consistent bi-level variable selection. Bi-level variable selection becomes even more challenging when the data are heavy-tailed or when outliers are present in the random errors and covariates. In this paper, we study a framework of high-dimensional M-estimation for bi-level variable selection. This framework encourages bi-level sparsity through a computationally efficient two-stage procedure. In theory, we provide sufficient conditions under which our two-stage penalized M-estimator simultaneously achieves local estimation consistency and bi-level variable selection consistency, provided that certain non-convex penalty functions are used at the group level. Both our simulation studies and our real data analysis demonstrate satisfactory finite-sample performance of the proposed estimators under different irregular settings.
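As a rough illustration of what a two-stage penalized M-estimator for bi-level selection can look like, the sketch below first selects groups and then selects variables within the retained groups. It is not the procedure analysed in the paper: for simplicity it assumes a Huber loss, a convex group lasso penalty in stage one (the paper's theory concerns non-convex group-level penalties), a plain lasso penalty in stage two, and a basic proximal gradient solver. All function names and tuning parameters (`two_stage`, `lam_group`, `lam_within`, `delta`) are hypothetical.

```python
import numpy as np

def huber_grad(r, delta=1.345):
    # Derivative of the Huber loss in the residual: residuals are clipped
    # to [-delta, delta], which bounds the influence of outliers.
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1 (coordinate-wise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def group_soft_threshold(v, t):
    # Proximal operator of t * ||v||_2: shrinks the whole block,
    # setting it exactly to zero when its norm is at most t.
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def prox_gradient(X, y, prox, n_iter=500, delta=1.345):
    # Proximal gradient descent on (1/n) * sum Huber(y_i - x_i' beta) + penalty.
    n, p = X.shape
    # The Huber loss has second derivative at most 1, so the gradient is
    # Lipschitz with constant ||X||_2^2 / n; use its inverse as step size.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ beta, delta) / n
        beta = prox(beta - step * grad, step)
    return beta

def two_stage(X, y, groups, lam_group, lam_within):
    # Assumed illustration only: stage 1 uses a convex group lasso penalty,
    # stage 2 a lasso penalty restricted to the groups kept in stage 1.
    groups = np.asarray(groups)
    labels = np.unique(groups)

    def group_prox(v, step):
        out = v.copy()
        for g in labels:
            idx = groups == g
            weight = np.sqrt(idx.sum())  # usual group-size weight
            out[idx] = group_soft_threshold(v[idx], step * lam_group * weight)
        return out

    # Stage 1: group-level selection.
    beta1 = prox_gradient(X, y, group_prox)
    kept = [g for g in labels if np.linalg.norm(beta1[groups == g]) > 0]
    active = np.isin(groups, kept)

    # Stage 2: within-group selection on the retained columns.
    beta = np.zeros(X.shape[1])
    if active.any():
        beta[active] = prox_gradient(
            X[:, active], y,
            lambda v, step: soft_threshold(v, step * lam_within),
        )
    return beta
```

Calling `two_stage(X, y, groups, lam_group, lam_within)` with `groups` holding one group label per column returns a coefficient vector whose entries are exactly zero for every group dropped in stage one. The penalty levels would in practice be chosen by some data-driven rule, which this sketch leaves out.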
Gao is partially supported by Simons Foundation Grant: SF359337.
About this article
Cite this article
Luo, B., Gao, X. A high-dimensional M-estimator framework for bi-level variable selection. Ann Inst Stat Math 74, 559–579 (2022). https://doi.org/10.1007/s10463-021-00809-z