Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models

Laria, Juan C.; Aguilera-Morillo, M. Carmen; Lillo, Rosa E.

doi:10.1007/s00362-022-01313-z


Regular Article
Published: 06 May 2022

Volume 64, pages 227–253 (2023)

Statistical Papers

Juan C. Laria¹ (ORCID: 0000-0001-7734-9647),
M. Carmen Aguilera-Morillo²,³ &
Rosa E. Lillo³,⁴


Abstract

This paper introduces the Group Linear Algorithm with Sparse Principal decomposition, an algorithm for supervised variable selection and clustering. Our approach extends the Sparse Group Lasso regularization to calculate clusters as part of the model fit. Therefore, unlike Sparse Group Lasso, our idea does not require prior specification of clusters between variables. To determine the clusters, we solve a particular case of sparse Singular Value Decomposition, with a regularization term that follows naturally from the Group Lasso penalty. Moreover, this paper proposes a unified implementation to deal with, but not limited to, linear regression, logistic regression, and proportional hazards models with right-censoring. Our methodology is evaluated using both biological and simulated data, and details of the implementation in R and hyperparameter search are discussed.
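
For orientation, the Sparse Group Lasso penalty that the method extends can be sketched as follows. This is a minimal illustration of the standard penalty with groups fixed in advance, in the parameterization of Simon, Friedman, Hastie and Tibshirani (2013, "A sparse-group lasso"); it is not the authors' algorithm, which estimates the clusters as part of the model fit. The names sgl_penalty, sgl_prox, lam, alpha and step are assumptions introduced here for illustration.

```python
import math


def sgl_penalty(beta, groups, lam, alpha):
    """Sparse Group Lasso penalty: lam * (alpha * L1 + (1 - alpha) * weighted group L2).

    `groups` is a partition of the coefficient indices, given as a list of index lists.
    Each group norm is weighted by the square root of the group size.
    """
    l1_term = sum(abs(b) for b in beta)
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g)) for g in groups
    )
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


def sgl_prox(z, groups, lam, alpha, step=1.0):
    """Proximal operator of step * sgl_penalty, evaluated at the point z."""
    # Step 1: elementwise soft-thresholding handles the L1 part.
    t1 = step * lam * alpha
    out = [math.copysign(max(abs(v) - t1, 0.0), v) for v in z]
    # Step 2: groupwise shrinkage handles the group-L2 part;
    # a group whose norm falls below its threshold is set to zero as a whole.
    for g in groups:
        norm = math.sqrt(sum(out[j] ** 2 for j in g))
        t2 = step * lam * (1.0 - alpha) * math.sqrt(len(g))
        scale = 0.0 if norm <= t2 else 1.0 - t2 / norm
        for j in g:
            out[j] *= scale
    return out


# Hypothetical toy input: four coefficients in two predefined groups.
beta = [3.0, -4.0, 0.1, 0.0]
groups = [[0, 1], [2, 3]]
print(sgl_penalty(beta, groups, lam=1.0, alpha=0.5))
print(sgl_prox(beta, groups, lam=1.0, alpha=0.5))
```

In this toy input the second group is removed entirely by the proximal step while the first is only shrunk, which is the combination of group-level and within-group sparsity that the penalty is designed to produce. The grouping is an input here; removing that requirement is the contribution described in the abstract.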

Acknowledgements

We gratefully acknowledge the help provided by Prof. Daniela Witten, who gave us access to the source code of CEN and the simulation set-ups compared in Sect. 4. We also thank the anonymous referees, whose constructive comments helped improve this paper.

Author information

Authors and Affiliations

TomTom Maps-Analytics, Madrid, Spain
Juan C. Laria
Department of Applied Statistics and Operational Research and Quality, Universitat Politècnica de València, València, Spain
M. Carmen Aguilera-Morillo
UC3M-BS Santander Big Data Institute, Getafe, Spain
M. Carmen Aguilera-Morillo & Rosa E. Lillo
Department of Statistics, University Carlos III of Madrid, Getafe, Spain
Rosa E. Lillo

Corresponding author

Correspondence to Juan C. Laria.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 173 KB)

About this article

Cite this article

Laria, J.C., Aguilera-Morillo, M.C. & Lillo, R.E. Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models. Stat Papers 64, 227–253 (2023). https://doi.org/10.1007/s00362-022-01313-z


Received: 08 January 2021
Revised: 12 February 2022
Accepted: 05 April 2022
Published: 06 May 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s00362-022-01313-z
