Abstract
High-dimensional longitudinal data with a large number of covariates have become increasingly common in many biomedical applications. Identifying a sub-model that adequately represents the data is necessary for easy interpretation, and the inclusion of redundant variables may hinder the accuracy and efficiency of estimation and inference. The joint likelihood function for longitudinal data is challenging to specify, particularly for correlated discrete data. To overcome this problem, Wang et al. (Biometrics 68:353–360, 2012) introduced penalized generalized estimating equations (PGEEs) with a non-convex penalty function, which require only the first two marginal moments and a working correlation matrix. This method works reasonably well in high-dimensional problems; however, there is a risk of model mis-specification, for example of the variance function or the correlation structure. For such situations, we propose variable selection based on penalized generalized quasi-likelihood (PGQL). Simulation studies show that when the model assumptions hold, the PGQL method performs comparably to PGEEs, whereas when the model is mis-specified, the PGQL method has clear advantages over PGEEs. We illustrate the proposed method with a real data example.
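The abstract does not say which non-convex penalty is used. As an illustration only, the sketch below assumes the SCAD penalty of Fan and Li (2001), a common choice in this literature, and evaluates its first derivative, which is the quantity that typically enters a penalized estimating equation. The function name `scad_derivative` and the default `a = 3.7` are assumptions made for this example, not details taken from the chapter.

```python
def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty evaluated at |theta|.

    Assumed form (Fan and Li, 2001), for t = |theta|:
        p'(t) = lam                          if t <= lam
        p'(t) = max(a*lam - t, 0) / (a - 1)  if t >  lam
    """
    if lam <= 0 or a <= 2:
        raise ValueError("requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients receive the full lasso-type penalty rate.
        return lam
    # The penalty rate decays linearly and reaches zero at t = a * lam.
    return max(a * lam - t, 0.0) / (a - 1.0)


if __name__ == "__main__":
    for coef in (0.5, 2.0, 5.0):
        print(coef, scad_derivative(coef, lam=1.0))
```

Under this assumed penalty, coefficients larger than `a * lam` in absolute value are not shrunk at all, while small coefficients are penalized at the constant rate `lam`, which is what allows sparse selection without heavily biasing large effects.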
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Second International Symposium on Information Theory, pp. 267–282. Akademiai Kiado, Budapest (1973)
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)
Antoniadis, A.: Wavelets in statistics: a review (with discussion). J. Italian Stat. Assoc. 6, 97–144 (1997)
Antoniadis, A., Fan, J.: Regularization of wavelets approximations. J. Am. Stat. Assoc. 96, 939–967 (2001)
Cantoni, E., Flemming, J.M., Ronchetti, E.: Variable selection for marginal longitudinal generalized linear models. Biometrics 61, 507–514 (2005)
Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403 (1979)
Crowder, M.J.: On use of a working correlation matrix in using generalized linear models for repeated measures. Biometrika 82, 407–410 (1995)
Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)
Dziak, J.J., Li, R., Qu, A.: An overview on quadratic inference function approaches for longitudinal data. In: Fan, J., Liu, J.S., Lin, X. (eds.) Frontiers of Statistics, Volume 1: New Developments in Biostatistics and Bioinformatics, Chap. 3, pp. 49–72. World Scientific Publishing, Singapore (2009)
Fan, J.: Comments on “Wavelets in Statistics: A Review” by A. Antoniadis. J. Italian Stat. Assoc. 6, 131–138 (1997)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fan, J., Li, R.: New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Am. Stat. Assoc. 99, 710–723 (2004)
Liang, K.Y., Zeger, S.L.: Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22 (1986)
Lv, J., Fan, Y.: A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat. 37, 3498–3528 (2009)
McKenzie, E.: Some ARMA models for dependent sequences of Poisson counts. Adv. Appl. Probab. 20, 822–835 (1988)
Nadarajah, T.: Penalized empirical likelihood based variable selection. M.Sc. thesis, Memorial University of Newfoundland, St. John’s (2011)
Pan, W.: Akaike’s information criterion in generalized estimating equations. Biometrics 57, 120–125 (2001)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Sutradhar, B.C.: An overview on regression models for discrete longitudinal responses. Stat. Sci. 18, 377–393 (2003)
Sutradhar, B.C.: Dynamic Mixed Models for Familial Longitudinal Data. Springer, New York (2011)
Sutradhar, B.C., Das, K.: On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika 86, 459–465 (1999)
Sutradhar, B.C., Kovacevic, M.: Analysing ordinal longitudinal survey data: generalized estimating equations approach. Biometrika 87, 837–848 (2000)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Variyath, A.M.: Variable selection in generalized linear models by empirical likelihood. Ph.D. thesis, University of Waterloo, Waterloo (2006)
Variyath, A.M., Chen, J., Abraham, B.: Empirical likelihood based variable selection. J. Stat. Plan. Infer. 140, 971–981 (2010)
Wang, H., Leng, C.: Unified LASSO estimation via least squares approximation. J. Am. Stat. Assoc. 102, 1039–1048 (2007)
Wang, L., Qu, A.: Consistent model selection and data-driven smooth tests for longitudinal data in the estimating equations approach. J. R. Stat. Soc. Ser. B 71, 177–190 (2009)
Wang, L., Li, H., Huang, J.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 103, 1556–1569 (2008)
Wang, L., Zhou, J., Qu, A.: Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics 68, 353–360 (2012)
Wedderburn, R.W.M.: Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61 (3), 439–444 (1974)
Xu, P., Wu, P., Wang, Y., Zhu, L.X.: A GEE based shrinkage estimation for the generalized linear model in longitudinal data analysis. Technical report, Department of Mathematics, Hong Kong Baptist University, Hong Kong (2010)
Xue, L., Qu, A., Zhou, J.: Consistent model selection for marginal generalized additive model for correlated data. J. Am. Stat. Assoc. 105, 1518–1530 (2010)
Xiao, N., Zhang, D., Zhang, H.H.: Variable selection for semiparametric mixed models in longitudinal studies. Biometrics 66, 79–88 (2009)
Zhang, H.H., Lu, W.: Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94, 691–703 (2007)
Zou, H.: The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 101 (476), 1418–1429 (2006)
Acknowledgements
The authors are grateful for the opportunity to present their work at the 2015 International Symposium in Statistics (ISS) on Advances in Parametric and Semiparametric Analysis of Multivariate, Time Series, Spatial-temporal, and Familial-longitudinal Data. Special thanks go to Professor Brajendra Sutradhar for organizing the conference, to members of the symposium audience for insightful discussion of our presentation, and to two anonymous referees for their thoughtful comments on our manuscript. The authors’ research was partially supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Institutes of Health Research.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Nadarajah, T., Variyath, A.M., Loredo-Osti, J.C. (2016). Penalized Generalized Quasi-Likelihood Based Variable Selection for Longitudinal Data. In: Sutradhar, B. (eds) Advances and Challenges in Parametric and Semi-parametric Analysis for Correlated Data. Lecture Notes in Statistics, vol 218. Springer, Cham. https://doi.org/10.1007/978-3-319-31260-6_8
DOI: https://doi.org/10.1007/978-3-319-31260-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31258-3
Online ISBN: 978-3-319-31260-6
eBook Packages: Mathematics and Statistics (R0)