Penalized regression combining the L₁ norm and a correlation based penalty

Anbari, Mohammed El; Mkhadri, Abdallah

doi:10.1007/s13571-013-0065-4

Published: 27 August 2013

Volume 76, pages 82–102, (2014)

Sankhya B

Mohammed El Anbari¹ &
Abdallah Mkhadri²

Abstract

We consider the problem of feature selection in a linear regression model with p covariates and n observations. We propose a new method that simultaneously selects variables and favors a grouping effect, in which strongly correlated predictors tend to enter or leave the model together. The method is based on penalized least squares with a penalty function that combines the L₁ norm and a Correlation based Penalty (CP). We call it the L1CP method. Like the Lasso penalty, L1CP shrinks some coefficients to exactly zero; in addition, the CP term explicitly links the strength of penalization to the correlation among predictors. A detailed simulation study in small and high dimensional settings illustrates the advantages of our approach compared to several alternatives. Finally, we apply the methodology to two real data sets: US Crime Data and GC-Retention PAC data. In terms of prediction accuracy and estimation error, our empirical study suggests that L1CP is better suited than the Elastic-Net to situations where p ≤ n (the number of variables is less than or equal to the sample size). If p ≫ n, our method remains competitive and also allows the selection of more than n variables.
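
To make the structure of such a penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the CP term takes the form introduced by Tutz and Ulbricht (2009), namely the sum over pairs i < j of (β_i − β_j)²/(1 − ρ_ij) + (β_i + β_j)²/(1 + ρ_ij), where ρ_ij is the sample correlation between predictors i and j, and that the criterion is ‖y − Xβ‖² + λ₁‖β‖₁ + λ₂·CP(β). The scaling of the criterion, the function names (cp_matrix, soft_threshold, l1cp), the plain coordinate-descent solver, and the simulated data are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def cp_matrix(X):
    """Matrix W such that beta @ W @ beta equals the assumed CP penalty.

    Assumes no two columns of X are perfectly correlated (|rho_ij| < 1).
    """
    rho = np.corrcoef(X, rowvar=False)
    denom = 1.0 - rho ** 2
    np.fill_diagonal(denom, 1.0)          # avoid 0/0 on the diagonal
    W = -2.0 * rho / denom                # off-diagonal entries
    inv = 1.0 / denom
    np.fill_diagonal(inv, 0.0)
    np.fill_diagonal(W, 2.0 * inv.sum(axis=1))  # diagonal entries
    return W


def soft_threshold(z, t):
    """Soft-thresholding operator used by L1-penalized updates."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def l1cp(X, y, lam1, lam2, n_iter=500, tol=1e-10):
    """Coordinate descent for ||y - X b||^2 + lam1 ||b||_1 + lam2 b'Wb.

    Illustrative solver only; X is assumed standardized and y centered.
    """
    n, p = X.shape
    W = cp_matrix(X)
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.astype(float).copy()        # residual for beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # inner product of x_j with the residual that excludes x_j
            z = X[:, j] @ resid + col_sq[j] * old
            # coupling of beta_j with the other coefficients through W
            c = W[j] @ beta - W[j, j] * old
            a = col_sq[j] + lam2 * W[j, j]
            new = soft_threshold(z - lam2 * c, lam1 / 2.0) / a
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
            max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


# Simulated data (assumption): two correlated signal predictors, two noise ones.
rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.5 * rng.normal(size=n),
    z + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=n)
y = y - y.mean()

beta_hat = l1cp(X, y, lam1=5.0, lam2=0.5)
print(np.round(beta_hat, 3))
```

In this sketch, setting lam2 = 0 reduces the criterion to a Lasso-type problem and setting lam1 = 0 gives a purely correlation-penalized fit, so the two tuning parameters trade off sparsity against the grouping effect described above.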

References

Bondell, H.D. and Reich, B.J. (2008). Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR. Biometrics 64, 115–123.
Chen, S., Donoho, D. and Saunders, M. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20, no. 1, 33–61.
Daye, Z.J. and Jeng, X.J. (2009). Shrinkage and model selection with correlated variables via weighted fusion. Comput. Statist. Data Anal., 54, 1284–1298.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32, 407–499.
El Anbari, M. and Mkhadri, A. (2008). Penalized regression with a combination of the L1 norm and the correlation based penalty. Rapports de Recherche de L’Institut National de Recherche en Informatique et Automatique, France, N° 6746.
Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J. and Caligiuri, M. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.
Hoerl, A. and Kennard, R. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B, 58, 267–288.
Tutz, G. and Ulbricht, J. (2009). Penalized regression with correlation based penalty. Stat. Comput., 19, 239–253.
Varmuza, K. and Filzmoser, P. (2009). Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press.
Witten, D.M. and Tibshirani, R. (2009). Covariance-regularized regression and classification for high-dimensional problems. J. R. Stat. Soc. Ser. B, 71, 615–636.
Wu, S., Shen, X. and Geyer, C.J. (2009). Adaptive regularization using the entire solution surfaces. Biometrika, 96, 513–527.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B, 68, 49–67.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic-net. J. R. Stat. Soc. Ser. B, 67, 301–320.

Author information

Authors and Affiliations

Dept. de mathématiques Bâtiment 425, Université Paris-Sud, 91405, Orsay, Cedex, France
Mohammed El Anbari
Department of Mathematics, Faculty of Sciences Semlalia, Cadi Ayyad University, B.P. 2390, Marrakesh, Morocco
Abdallah Mkhadri

Corresponding author

Correspondence to Abdallah Mkhadri.

About this article

Cite this article

Anbari, M.E., Mkhadri, A. Penalized regression combining the L₁ norm and a correlation based penalty. Sankhya B 76, 82–102 (2014). https://doi.org/10.1007/s13571-013-0065-4

Received: 22 January 2011
Revised: 12 April 2013
Published: 27 August 2013
Issue Date: May 2014
DOI: https://doi.org/10.1007/s13571-013-0065-4

Keywords and phrases

AMS (2000) subject classification.

Primary 62J05; Secondary 62J07
