Extended differential geometric LARS for high-dimensional GLMs with general dispersion parameter

Abstract

A large class of modeling and prediction problems involves outcomes that belong to an exponential family distribution. Generalized linear models (GLMs) are the standard way of dealing with such situations, and they can be extended to high-dimensional feature spaces. Penalized inference approaches, such as the \(\ell _1\) penalty or SCAD, and extensions of least angle regression, such as dgLARS, have been proposed to deal with GLMs with high-dimensional feature spaces. Although the theory underlying these methods is in principle generic, their implementation has remained restricted to dispersion-free models, such as the Poisson and logistic regression models. The aim of this manuscript is to extend the differential geometric least angle regression (dgLARS) method for high-dimensional GLMs to arbitrary exponential dispersion family distributions with arbitrary link functions. This entails, first, extending the predictor–corrector (PC) algorithm to arbitrary distributions and link functions and, second, proposing an efficient estimator of the dispersion parameter. Furthermore, improvements to the computational algorithm lead to an important speed-up of the PC algorithm. Simulations provide supportive evidence for the proposed efficient algorithms for estimating the coefficients and the dispersion parameter. The resulting method has been implemented in our R package (which will be merged with the original dglars package) and is shown to be an effective method of inference for arbitrary classes of GLMs.
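
To give a sense of how the extended method is used in practice, the following is a minimal R sketch, assuming the dglars() interface of the merged package (Augugliaro and Pazira 2017); the family and control arguments shown here are illustrative and may differ slightly across package versions.

```r
## Minimal usage sketch: a sparse Gamma GLM with log link fitted along the
## extended dgLARS path via the predictor-corrector (PC) algorithm.
## The call assumes the dglars() interface of the merged package
## (Augugliaro and Pazira 2017); argument names are illustrative.
library(dglars)

set.seed(1)
n <- 100; p <- 500
X  <- matrix(rnorm(n * p), n, p)
mu <- exp(1 + 0.5 * X[, 1] - 0.5 * X[, 2])   # mean on the log-link scale
y  <- rgamma(n, shape = 2, rate = 2 / mu)    # Gamma responses with E(y) = mu

## Solution path for a GLM with a general dispersion parameter
fit <- dglars(y ~ X, family = Gamma("log"),
              control = list(algorithm = "pc"))
summary(fit)
```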

Notes

  1. \(\gamma \ge 0\) is a tuning parameter that controls the size of the coefficients: increasing \(\gamma \) shrinks the coefficients towards each other and towards zero. In practice, it is usually chosen by cross-validation.

  2. If \(n\) is odd, we split the data such that \(|n_1-n_2|=1\) and then randomly select one observation from the larger subset and also assign it to the smaller one, so that both subsets have the same size \(n_1=n_2\approx n/2\) (see the R sketch after these notes).

  3. This package has been merged into the original dglars package (Augugliaro and Pazira 2017).
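
The splitting rule of Note 2 can be sketched in R as follows; this is an illustrative reconstruction of the stated rule, not the implementation used in the package.

```r
## Split n observations into two subsets of (effectively) equal size, as in
## Note 2: when n is odd, one randomly chosen observation of the larger
## subset is also used in the smaller one. Illustrative sketch only.
split_in_two <- function(n) {
  idx  <- sample(n)                    # random permutation of 1:n
  half <- floor(n / 2)
  set1 <- idx[seq_len(half)]
  set2 <- idx[(half + 1):n]            # the larger subset when n is odd
  if (n %% 2 == 1)
    set1 <- c(set1, sample(set2, 1))   # reuse one observation so sizes match
  list(set1 = set1, set2 = set2)
}

## Example: one subset is used for variable selection, the other to estimate
## the dispersion parameter on the refitted model (and vice versa).
str(split_in_two(101))
```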

References

  • Aho, K., Derryberry, D., Peterson, T.: Model selection for ecologists: the worldviews of AIC and BIC. Ecology 95(3), 631–636 (2014)

  • Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)

  • Allgower, E., Georg, K.: Introduction to Numerical Continuation Methods. Society for Industrial and Applied Mathematics, New York (2003)

  • Arlot, S., Celisse, A.: A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79 (2010)

  • Augugliaro, L., Mineo, A.M., Wit, E.C.: Differential geometric least angle regression: a differential geometric approach to sparse generalized linear models. J. R. Stat. Soc. B 75(3), 471–498 (2013)

  • Augugliaro, L., Mineo, A.M., Wit, E.C.: dglars: an R package to estimate sparse generalized linear models. J. Stat. Softw. 59(8), 1–40 (2014a)

  • Augugliaro, L.: dglars: Differential Geometric LARS (dgLARS) Method. R package version 1.0.5. http://CRAN.R-project.org/package=dglars (2014b)

  • Augugliaro, L., Mineo, A.M., Wit, E.C.: A differential geometric approach to generalized linear models with grouped predictors. Biometrika 103, 563–593 (2016)

  • Augugliaro, L., Pazira, H.: dglars: Differential Geometric Least Angle Regression. R package version 2.0.0. http://CRAN.R-project.org/package=dglars (2017)

  • Burnham, K.P., Anderson, D.R.: Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd edn. Springer, New York (2002)

  • Candes, E.J., Tao, T.: The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann. Stat. 35, 2313–2351 (2007)

  • Chen, Y., Du, P., Wang, Y.: Variable selection in linear models. Wiley Interdiscip. Rev. Comput. Stat. 6, 1–9 (2014)

  • Cordeiro, G.M., McCullagh, P.: Bias correction in generalized linear models. J. R. Stat. Soc. B 53(3), 629–643 (1991)

  • Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)

  • Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)

  • Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. B 70(5), 849–911 (2008)

  • Fan, J., Guo, S., Hao, N.: Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J. R. Stat. Soc. B 74(1), 37–65 (2012)

  • Farrington, C.P.: On assessing goodness of fit of generalized linear model to sparse data. J. R. Stat. Soc. B 58(2), 349–360 (1996)

  • Friedman, J., Hastie, T., Tibshirani, R.: glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models. R package version 1.1-5. http://CRAN.R-project.org/package=glmnet (2010b)

  • Hastie, T., Efron, B.: lars: Least Angle Regression, Lasso and Forward Stagewise. R package version 1.2. http://CRAN.R-project.org/package=lars (2013)

  • Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009)

  • Hoerl, A.E., Kennard, R.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970)

  • Ishwaran, H., Kogalur, U.B., Rao, J.: spikeslab: prediction and variable selection using spike and slab regression. R J. 2(2), 68–73 (2010a)

  • Ishwaran, H., Kogalur, U.B., Rao, J.: spikeslab: prediction and variable selection using spike and slab regression. R package version 1.1.2. http://CRAN.R-project.org/package=spikeslab (2010b)

  • James, G., Radchenko, P.: A generalized Dantzig selector with shrinkage tuning. Biometrika 96, 323–337 (2009)

  • Jorgensen, B.: Exponential dispersion models. J. R. Stat. Soc. B 49, 127–162 (1987)

  • Jorgensen, B.: The Theory of Dispersion Models. Chapman & Hall, London (1997)

  • Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)

  • Li, K.C.: Asymptotic optimality for \(c_p\), \(c_l\), cross-validation and generalized cross-validation: discrete index set. Ann. Stat. 15, 958–975 (1987)

  • Littell, R.C., Stroup, W.W., Freund, R.J.: SAS for Linear Models, 4th edn. SAS Institute Inc., Cary (2002)

  • McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman & Hall, London (1989)

  • McQuarrie, A.D.R., Tsai, C.L.: Regression and Time Series Model Selection, 1st edn. World Scientific Publishing Co. Pte. Ltd, Singapore (1998)

  • Meng, R.: Estimation of dispersion parameters in GLMs with and without random effects. Master’s thesis, Stockholm University (2004)

  • Park, M.Y., Hastie, T.: glmpath: \(L_1\) Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model. R package version 0.94. http://CRAN.R-project.org/package=glmpath (2007b)

  • Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T.: Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge (1992)

  • Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)

  • Shao, J.: An asymptotic theory for linear model selection. Stat. Sin. 7, 221–264 (1997)

  • Shibata, R.: An optimal selection of regression variables. Biometrika 68, 45–54 (1981)

  • Shibata, R.: Approximation efficiency of a selection procedure for the number of regression variables. Biometrika 71, 43–49 (1984)

  • Stone, M.: Asymptotics for and against cross-validation. Biometrika 64, 29–35 (1977)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58(1), 267–288 (1996)

  • Ulbricht, J., Tutz, G.: Combining quadratic penalization and variable selection via forward boosting. Tech. Rep., Department of Statistics, Munich University, Technical Reports No. 99 (2011)

  • Vos, P.W.: A geometric approach to detecting influential cases. Ann. Stat. 19, 1570–1581 (1991)

  • Whittaker, E.T., Robinson, G.: The Calculus of Observations: An Introduction to Numerical Analysis, 4th edn. Dover Publications, New York (1967)

  • Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC, Boca Raton (2006)

  • Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)

  • Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67(2), 301–320 (2005a)

Acknowledgements

We would like to thank the editor and the anonymous reviewers for their valuable comments, which improved the presentation of the paper.

Author information

Corresponding author

Correspondence to Hassan Pazira.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 65 KB)

About this article

Cite this article

Pazira, H., Augugliaro, L. & Wit, E. Extended differential geometric LARS for high-dimensional GLMs with general dispersion parameter. Stat Comput 28, 753–774 (2018). https://doi.org/10.1007/s11222-017-9761-7
