LASSO regularization within the LocalGLMnet architecture

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

Deep learning models have been very successful in the application of machine learning methods, often outperforming classical statistical models such as linear regression models or generalized linear models. On the other hand, deep learning models are often criticized for not being explainable and for not allowing for variable selection. There are two ways of dealing with this problem: either we use post-hoc model interpretability methods, or we design specific deep learning architectures that allow for easier interpretation and explanation. This paper builds on our previous work on the LocalGLMnet architecture, which gives an interpretable deep learning architecture. In the present paper, we show how group LASSO regularization (and other regularization schemes) can be implemented within the LocalGLMnet architecture so that we obtain feature sparsity for variable selection. We benchmark our approach against the recently developed LassoNet of Lemhadri et al. (LassoNet: a neural network with feature sparsity. J Mach Learn Res 22:1–29, 2021).
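In a schematic form (our notation, simplified relative to the paper), the LocalGLMnet of Richman and Wüthrich (2022) uses a GLM-type skip connection with feature-dependent regression attentions,

\[ g\bigl(\mu(\boldsymbol{x})\bigr) \;=\; \beta_0 + \sum_{j=1}^{q} \beta_j(\boldsymbol{x})\, x_j, \]

where \(\boldsymbol{\beta}(\cdot)\) is a feed-forward neural network. Feature sparsity is then obtained by adding a group LASSO penalty to the objective,

\[ \frac{1}{n}\sum_{i=1}^{n} L\bigl(y_i, \mu(\boldsymbol{x}_i)\bigr) \;+\; \lambda \sum_{j=1}^{q} \bigl\lVert \boldsymbol{w}_j \bigr\rVert_2, \]

where \(\boldsymbol{w}_j\) denotes the group of network weights that generates the attention \(\beta_j(\cdot)\); a group norm shrunk to zero removes feature \(x_j\) from the model. The precise grouping, the loss \(L\) and the tuning of the regularization parameter \(\lambda \) are as discussed in the paper.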

Notes

  1. We call our proposal LASSO regularization of the LocalGLMnet. While the LASSO was originally proposed for the linear regression model, it has since been extended to GLMs; see Sect. 3.4 in Hastie et al. (2015).

  2. The dataset is available at http://lib.stat.cmu.edu/datasets/boston and the code for this example is available on GitHub at https://github.com/RonRichman/Regularized-LocalGLMnet.

  3. The dataset is available at this link: http://www2.math.uconn.edu/~valdez/telematics_syn-032021.csv

  4. Note that, due to privacy concerns, these 100,000 records were generated synthetically based on real data; see So et al. (2021) for a detailed description.

  5. The grouped version of the model was applied in accordance with the instructions at https://github.com/lasso-net/lassonet/issues/7.

References

  • Agarwal R, Frosst N, Zhang X, Caruana R, Hinton GE (2020) Neural additive models: interpretable machine learning with neural nets. arXiv:2004.13912v1

  • Apley DW, Zhu J (2020) Visualizing the effects of predictor variables in black box supervised learning models. J R Stat Soc Ser B 82(4):1059–1086

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232

  • Gneiting T (2011) Making and evaluating point forecasts. J Am Stat Assoc 106(494):746–762

  • Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102(477):359–378

  • Harrison D, Rubinfeld DL (1978) Hedonic prices and the demand for clean air. J Environ Econ Manag 5:81–102

  • Hastie T, Tibshirani R, Wainwright M (2015) Statistical learning with sparsity: the Lasso and generalizations. CRC Press

  • Hoerl A, Kennard R (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67

  • Lee JD, Sun DL, Sun Y, Taylor JE (2016) Exact post-selection inference, with application to the LASSO. Ann Stat 44(3):907–927

  • Lemhadri I, Ruan F, Abraham L, Tibshirani R (2021) LassoNet: a neural network with feature sparsity. J Mach Learn Res 22:1–29

  • Lindholm M, Richman R, Tsanakas A, Wüthrich MV (2022) Discrimination-free insurance pricing. ASTIN Bull J IAA 52:55–89

  • Merity S, McCann B, Socher R (2017) Revisiting activation regularization for language RNNs. arXiv:1708.01009v1

  • Merz M, Richman R, Tsanakas A, Wüthrich MV (2022) Interpreting deep learning models with marginal attribution by conditioning on quantiles. Data Min Knowl Discov 36:1335–1370

  • Oelker M-R, Tutz G (2017) A uniform framework for the combination of penalties in generalized structured models. Adv Data Anal Classif 11:97–120

  • Parikh N, Boyd S (2013) Proximal algorithms. Found Trends Optim 1(3):123–231

  • Richman R (2021) Mind the gap—safely incorporating deep learning models into the actuarial toolkit. SSRN Manuscript ID 3857693

  • Richman R, Wüthrich MV (2022) LocalGLMnet: interpretable deep learning for tabular data. Scand Actuar J, in press

  • So B, Boucher JP, Valdez EA (2021) Synthetic dataset generation of driver telematics. Risks 9(4):58

  • Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B Stat Methodol 58:267–288

  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused LASSO. J R Stat Soc Ser B Stat Methodol 67:91–108

  • Tikhonov AN (1943) On the stability of inverse problems. Dokl Akad Nauk SSSR 39(5):195–198

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. arXiv:1706.03762v5

  • Vaughan J, Sudjianto A, Brahimi E, Chen J, Nair VN (2018) Explainable neural networks based on additive index models. arXiv:1806.01933v1

  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol 68:49–67

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol 67:301–320

Acknowledgements

The authors wish to thank the editor, assistant editor and reviewers of an earlier version of this manuscript for their comments, which helped to improve the manuscript significantly.

Author information

Corresponding author

Correspondence to Ronald Richman.

Ethics declarations

Conflict of interest

Both authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Appendix: R code

R code listings (figures a–c in the published article).
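Since figures a–c are not reproduced here, the following is a minimal Python/Keras sketch, not the authors' R implementation, of a LocalGLMnet whose attention output layer carries a group LASSO penalty. The hidden-layer sizes, the identity link, the penalty strength lam and the exact grouping of the weights are assumptions made only for this sketch.

import tensorflow as tf
from tensorflow import keras

def group_lasso(lam):
    # Group LASSO penalty on a kernel matrix: one group per column, i.e. per
    # attention component beta_j(x); lam is the regularization strength.
    def penalty(kernel):
        return lam * tf.reduce_sum(tf.norm(kernel, axis=0))
    return penalty

def local_glm_net(q, lam=1e-3, hidden=(20, 15, 10)):
    # q = number of (standardized) input features.
    inputs = keras.Input(shape=(q,), name="features")
    z = inputs
    for k, units in enumerate(hidden):
        z = keras.layers.Dense(units, activation="tanh", name=f"hidden_{k + 1}")(z)
    # Attention (regression) weights beta(x); the group LASSO acts on this kernel
    # so that an entire column, and hence beta_j(.), can be shrunk to zero.
    # use_bias=False is a simplification: with a bias, the bias would also have
    # to vanish for exact feature sparsity.
    beta = keras.layers.Dense(q, activation="linear", use_bias=False,
                              kernel_regularizer=group_lasso(lam),
                              name="attention")(z)
    # Skip connection <beta(x), x> plus an intercept; identity link for a
    # Gaussian/MSE example, to be replaced by a sigmoid or exponential output
    # for binary or Poisson responses.
    dot = keras.layers.Dot(axes=1, name="skip_connection")([beta, inputs])
    out = keras.layers.Dense(1, activation="linear", kernel_initializer="ones",
                             name="response")(dot)
    return keras.Model(inputs=inputs, outputs=out)

model = local_glm_net(q=13)   # e.g. 13 standardized Boston housing feature components
model.compile(optimizer="adam", loss="mse")

Note that, in this sketch, plain gradient descent on the penalized loss only shrinks the group norms towards zero without setting them exactly to zero; exact sparsity requires a proximal update or thresholding of small fitted group norms (cf. Parikh and Boyd 2013).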

B LassoNet-training details

The LassoNet models were based on the Python code provided for the group LASSO version of the LassoNet at https://github.com/lasso-net/lassonet/tree/group-lasso (see Footnote 5). The dimensions of the hidden layers of the LassoNet were set to the same dimensions as those of the corresponding LocalGLMnet so that the model capacities are roughly comparable, i.e., any differences in performance will be mainly attributable to the way in which regularization is applied within each of the models. The main hyperparameter tested for the LassoNet was the budget parameter M; for each value of M, a range of LassoNet models is fit automatically for different values of the regularization parameter \(\eta \). Values of \(M \in \{1, 10, 100\}\) were tested for each example.
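As an illustration of this search, the following Python sketch loops over the budget values using the public lassonet package; the exact interface of the group-lasso branch used for the paper may differ, and the toy data, the hidden dimension and the train/validation split are assumptions made only for the sketch (in the paper, the Boston example is selected on the learning set and the telematics example on the validation set).

import numpy as np
from lassonet import LassoNetRegressor  # public lassonet package

# Toy stand-in data; in the paper, the standardized Boston housing features are used.
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13)).astype("float32")
y = (X[:, 0] - 0.5 * X[:, 5] + rng.normal(scale=0.1, size=506)).astype("float32")
X_train, X_valid = X[:400], X[400:]
y_train, y_valid = y[:400], y[400:]

best = None
for M in (1, 10, 100):                                  # budget values tested
    model = LassoNetRegressor(hidden_dims=(20,), M=M)   # hidden size is an assumption
    path = model.path(X_train, y_train)                 # fits a sequence of models over increasing regularization
    for checkpoint in path:
        model.load(checkpoint.state_dict)
        pred = np.asarray(model.predict(X_valid)).ravel()
        mse = float(np.mean((pred - y_valid) ** 2))
        if best is None or mse < best[0]:
            best = (mse, M, checkpoint.lambda_, int(checkpoint.selected.sum()))

print("best MSE %.4f with M=%d, lambda=%.3f, %d features selected" % best)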

For the Boston housing dataset, the best LassoNet model as indicated by the MSE on the learning set was selected (since there are no validation or test sets used in that example). For the telematics data, the LassoNet producing the lowest values of the binary cross-entropy loss on the validation set was selected.

Only a single run of the LassoNet model was used for these results; nonetheless, it was observed that the results varied quite significantly between training runs (see the last line in Table 8, which shows that the LassoNet has the highest standard deviation over training runs among the models considered), indicating that better results could perhaps be achieved by performing multiple runs and averaging over these.

Table 9 Boston housing data - feature components used
Table 10 Telematics data - feature components used
Fig. 12 Telematics example: importance measures produced by the group LASSO regularized LocalGLMnet; validation set. Territory variable only

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Richman, R., Wüthrich, M.V. LASSO regularization within the LocalGLMnet architecture. Adv Data Anal Classif 17, 951–981 (2023). https://doi.org/10.1007/s11634-022-00529-z
