Regularization and Model Selection with Categorical Covariates

Gertheiss, Jan; Stelz, Veronika; Tutz, Gerhard

doi:10.1007/978-3-319-00035-0_21


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

The challenge in regression problems with categorical covariates is the high number of parameters involved. Common regularization methods like the Lasso, which allow for selection of predictors, are typically designed for metric predictors. If independent variables are categorical, selection strategies should be based on modified penalties. For categorical predictor variables with many categories, a useful strategy is to search for clusters of categories with similar effects. We focus on generalized linear models and present L₁-penalty approaches for factor selection and clustering of categories. The methods proposed are investigated in simulation studies and applied to a real world classification problem.
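As an illustration of the kind of L₁ penalty for categorical predictors described in the abstract, the following is a minimal sketch, assuming dummy coding with category 0 as the reference (effect fixed at 0) and an L₁ penalty on all pairwise differences of category effects, in the spirit of Gertheiss and Tutz (2010). The function names, the optional `weights` argument and the choice of a logistic model are illustrative assumptions, not the chapter's implementation. Categories whose effects become equal are effectively clustered, and if every effect equals the reference value 0 the factor drops out of the model.

```python
import math
from itertools import combinations


def pairwise_fusion_penalty(beta, weights=None):
    """L1 penalty on all pairwise differences of category effects (illustrative sketch).

    beta    : effects of the non-reference categories 1..k
    weights : hypothetical optional dict {(i, j): w_ij} for 0 <= i < j <= k;
              every pair gets weight 1 if omitted
    """
    coefs = [0.0] + list(beta)  # reference category 0 has its effect fixed at 0
    total = 0.0
    for i, j in combinations(range(len(coefs)), 2):
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * abs(coefs[i] - coefs[j])
    return total


def softplus(eta):
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def penalized_logistic_loss(intercept, beta, categories, y, lam, weights=None):
    """Negative Bernoulli log-likelihood of a logit model with one factor, plus lam * penalty.

    categories : category index (0 = reference, 1..k) of each observation
    y          : 0/1 responses
    """
    coefs = [0.0] + list(beta)
    nll = 0.0
    for c, yi in zip(categories, y):
        eta = intercept + coefs[c]  # linear predictor of this observation
        nll += softplus(eta) - yi * eta
    return nll + lam * pairwise_fusion_penalty(beta, weights)


# Categories 1 and 2 share the effect 1.0, so their difference contributes nothing.
print(pairwise_fusion_penalty([1.0, 1.0, -0.5]))  # 5.5
print(pairwise_fusion_penalty([0.0, 0.0, 0.0]))   # 0.0: all categories fused with the reference
```

Setting `lam` to 0 recovers the unpenalized maximum-likelihood criterion. Actually minimizing the penalized loss requires a solver that handles non-differentiable L₁ terms, which this sketch deliberately leaves out.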


Notes

1.
The original dataset (Wolberg and Mangasarian 1990) contained 369 instances (reported January 1989). Two instances were later removed, and additional groups totalling 332 samples were collected between October 1989 and November 1991.
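As a quick consistency check, the counts in this note combine as follows. The total is not stated in the note itself; as far as we know, it matches the 699 observations of the BreastCancer data distributed with the mlbench R package (Leisch and Dimitriadou 2010).

$$369 - 2 + 332 = 699$$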

References

Bondell, H. D., & Reich, B. J. (2009). Simultaneous factor selection and collapsing levels in ANOVA. Biometrics, 65, 169–177.
Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modelling based on generalized linear models (2nd ed.). New York: Springer.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Gertheiss, J. (2011). ordPens: Selection and/or Smoothing of Ordinal Predictors. R package version 0.1-7.
Gertheiss, J., & Tutz, G. (2009). Penalized regression with ordinal predictors. International Statistical Review, 77, 345–365.
Gertheiss, J., & Tutz, G. (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4, 2150–2180.
Leisch, F., & Dimitriadou, E. (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.0-0.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). New York: Chapman & Hall.
Newman, D. J., Hettich, S., Blake, C. L., & Merz, C. J. (1998). UCI Repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine, CA. URL http://www.ics.uci.edu/~mlearn/MLRepository.html
Park, M. Y., & Hastie, T. (2007). L1 regularization-path algorithm for generalized linear models. Journal of the Royal Statistical Society, Series B, 69, 659–677.
Stelz, V. (2010). L1-Regularisierung bei kategorialen Prädiktoren in generalisierten linearen Modellen [L1 regularization for categorical predictors in generalized linear models]. Master's thesis, Ludwig-Maximilians-University Munich.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.
Ulbricht, J. (2010). lqa: Penalized Likelihood Inference for GLMs. R package version 1.0-3.
Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 87, 9193–9196.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.


Acknowledgements

This work was supported in part by DFG project GE2353/1-1.

Author information

Authors and Affiliations

Department of Statistics, LMU Munich, Akademiestr. 1, 80799, Munich, Germany
Jan Gertheiss, Veronika Stelz & Gerhard Tutz


Corresponding author

Correspondence to Jan Gertheiss.

Editor information

Editors and Affiliations

University of Essex, Department of Mathematical Sciences, Colchester, United Kingdom
Berthold Lausen
Ghent University, Department of Marketing, Ghent, Belgium
Dirk Van den Poel
University of Marburg, Databionics, FB 12, Marburg, Germany
Alfred Ultsch


About this paper

Cite this paper

Gertheiss, J., Stelz, V., Tutz, G. (2013). Regularization and Model Selection with Categorical Covariates. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_21


DOI: https://doi.org/10.1007/978-3-319-00035-0_21
Published: 16 July 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00034-3
Online ISBN: 978-3-319-00035-0
eBook Packages: Mathematics and Statistics (R0)
