Penalized Generalized Quasi-Likelihood Based Variable Selection for Longitudinal Data

  • Conference paper
Advances and Challenges in Parametric and Semi-parametric Analysis for Correlated Data

Abstract

High-dimensional longitudinal data with a large number of covariates have become increasingly common in many biomedical applications. Identifying a sub-model that adequately represents the data is necessary for easy interpretation, and including redundant variables may reduce the accuracy and efficiency of estimation and inference. The joint likelihood function for longitudinal data is difficult to specify, particularly for correlated discrete data. To overcome this problem, Wang et al. (Biometrics 68:353–360, 2012) introduced penalized generalized estimating equations (PGEEs) with a non-convex penalty function, an approach that requires only the first two marginal moments and a working correlation matrix. This method works reasonably well in high-dimensional problems; however, there is a risk of model mis-specification, for example of the variance function or the correlation structure. For such situations, we propose variable selection based on penalized generalized quasi-likelihood (PGQL). Simulation studies show that when the model assumptions hold, the PGQL method performs comparably to PGEEs, whereas when the model is mis-specified, the PGQL method has clear advantages over PGEEs. We have applied the proposed method to a real data example.
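
As an illustration of the kind of non-convex penalty mentioned in the abstract, the following minimal sketch evaluates the first derivative of the SCAD penalty of Fan and Li (2001), which is the quantity that typically enters penalized estimating equations. It assumes SCAD is the penalty of interest and uses the conventional default a = 3.7; the function name and signature are hypothetical and are not taken from the chapter.

```python
def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty evaluated at |theta|.

    Illustrative sketch only (hypothetical helper, not the chapter's code).
    lam is the tuning parameter (> 0); a is the SCAD shape constant (> 2).
    """
    if lam <= 0 or a <= 2:
        raise ValueError("require lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients receive the full, lasso-like penalty rate.
        return lam
    # The rate decays linearly and is exactly zero once |theta| >= a * lam,
    # so large coefficients are left essentially unpenalized.
    return max(a * lam - t, 0.0) / (a - 1.0)


if __name__ == "__main__":
    for value in (0.5, 2.0, 5.0):
        print(value, scad_derivative(value, lam=1.0))
```

The design point is that the penalty rate is constant near zero, which shrinks small coefficients to zero, and vanishes for large coefficients, which limits the bias on strong effects.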

References

  • Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Second International Symposium on Information Theory, pp. 267–282. Akademiai Kiado, Budapest (1973)

  • Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)

  • Antoniadis, A.: Wavelets in statistics: a review (with discussion). J. Italian Stat. Assoc. 6, 97–144 (1997)

  • Antoniadis, A., Fan, J.: Regularization of wavelet approximations. J. Am. Stat. Assoc. 96, 939–967 (2001)

  • Cantoni, E., Flemming, J.M., Ronchetti, E.: Variable selection for marginal longitudinal generalized linear models. Biometrics 61, 507–514 (2005)

  • Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403 (1979)

  • Crowder, M.J.: On the use of a working correlation matrix in using generalized linear models for repeated measures. Biometrika 82, 407–410 (1995)

  • Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)

  • Dziak, J.J., Li, R., Qu, A.: An overview on quadratic inference function approaches for longitudinal data. In: Fan, J., Liu, J.S., Lin, X. (eds.) Frontiers of Statistics, Volume 1: New Developments in Biostatistics and Bioinformatics, pp. 49–72. World Scientific, Singapore (2009)

  • Fan, J.: Comments on “Wavelets in Statistics: A Review” by A. Antoniadis. J. Italian Stat. Assoc. 6, 131–138 (1997)

  • Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  • Fan, J., Li, R.: New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Am. Stat. Assoc. 99, 710–723 (2004)

  • Liang, K.Y., Zeger, S.L.: Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22 (1986)

  • Lv, J., Fan, Y.: A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat. 37, 3498–3528 (2009)

  • McKenzie, E.: Some ARMA models for dependent sequences of Poisson counts. Adv. Appl. Probab. 20, 822–835 (1988)

  • Nadarajah, T.: Penalized empirical likelihood based variable selection. M.Sc. thesis, Memorial University of Newfoundland, St. John’s (2011)

  • Pan, W.: Akaike’s information criterion in generalized estimating equations. Biometrics 57, 120–125 (2001)

  • Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)

  • Sutradhar, B.C.: An overview on regression models for discrete longitudinal responses. Stat. Sci. 18, 377–393 (2003)

  • Sutradhar, B.C.: Dynamic Mixed Models for Familial Longitudinal Data. Springer, New York (2011)

  • Sutradhar, B.C., Das, K.: On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika 86, 459–465 (1999)

  • Sutradhar, B.C., Kovacevic, M.: Analysing ordinal longitudinal survey data: generalized estimating equations approach. Biometrika 87, 837–848 (2000)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  • Variyath, A.M.: Variable selection in generalized linear models by empirical likelihood. Ph.D. thesis, University of Waterloo, Waterloo (2006)

  • Variyath, A.M., Chen, J., Abraham, B.: Empirical likelihood based variable selection. J. Stat. Plan. Infer. 140, 971–981 (2010)

  • Wang, H., Leng, C.: Unified LASSO estimation via least squares approximation. J. Am. Stat. Assoc. 102, 1039–1048 (2007)

  • Wang, L., Qu, A.: Consistent model selection and data-driven smooth tests for longitudinal data in the estimating equations approach. J. R. Stat. Soc. Ser. B 71, 177–190 (2009)

  • Wang, L., Li, H., Huang, J.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 103, 1556–1569 (2008)

  • Wang, L., Zhou, J., Qu, A.: Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 68, 353–360 (2012)

  • Wedderburn, R.W.M.: Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61, 439–444 (1974)

  • Xu, P., Wu, P., Wang, Y., Zhu, L.X.: A GEE based shrinkage estimation for the generalized linear model in longitudinal data analysis. Technical report, Department of Mathematics, Hong Kong Baptist University, Hong Kong (2010)

  • Xue, L., Qu, A., Zhou, J.: Consistent model selection for marginal generalized additive model for correlated data. J. Am. Stat. Assoc. 105, 1518–1530 (2010)

  • Xiao, N., Zhang, D., Zhang, H.H.: Variable selection for semiparametric mixed models in longitudinal studies. Biometrics 66, 79–88 (2009)

  • Zhang, H.H., Lu, W.: Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94, 691–703 (2007)

  • Zou, H.: The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

Acknowledgements

The authors are grateful for the opportunity to present their work at the 2015 International Symposium in Statistics (ISS) on Advances in Parametric and Semiparametric Analysis of Multivariate, Time Series, Spatial-temporal, and Familial-longitudinal Data. Special thanks go to Professor Brajendra Sutradhar for organizing the conference, to members of the symposium audience for their insightful discussion of our presentation, and to two anonymous referees for their thoughtful comments on our manuscript. The authors’ research was partially supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Institutes of Health Research.

Author information

Corresponding author

Correspondence to Tharshanna Nadarajah.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nadarajah, T., Variyath, A.M., Loredo-Osti, J.C. (2016). Penalized Generalized Quasi-Likelihood Based Variable Selection for Longitudinal Data. In: Sutradhar, B. (ed.) Advances and Challenges in Parametric and Semi-parametric Analysis for Correlated Data. Lecture Notes in Statistics, vol 218. Springer, Cham. https://doi.org/10.1007/978-3-319-31260-6_8
