Penalized Generalized Quasi-Likelihood Based Variable Selection for Longitudinal Data
High-dimensional longitudinal data with a large number of covariates have become increasingly common in many biomedical applications. Identifying a sub-model that adequately represents the data is necessary for ease of interpretation, and including redundant variables may reduce the accuracy and efficiency of estimation and inference. The joint likelihood function for longitudinal data is difficult to specify, particularly for correlated discrete data. To overcome this problem, Wang et al. (Biometrics 68:353–360, 2012) introduced penalized GEEs (PGEEs) with a non-convex penalty function, an approach that requires only the first two marginal moments and a working correlation matrix. This method works reasonably well in high-dimensional problems; however, it is exposed to model mis-specification, for example of the variance function or the correlation structure. For such situations, we propose variable selection based on penalized generalized quasi-likelihood (PGQL). Simulation studies show that when the model assumptions hold, the PGQL method performs comparably to PGEEs, whereas when the model is mis-specified, PGQL has clear advantages over PGEEs. We also apply the proposed method to a real data example.
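To make the PGEE idea concrete, here is a minimal NumPy sketch of a penalized GEE fit for a marginal logistic model, in the spirit of Wang et al. (2012): the GEE score, which uses only the marginal mean, the variance function and a working correlation matrix, is combined with the derivative of the non-convex SCAD penalty (Fan and Li, a = 3.7) and solved by a Newton-type iteration with a local quadratic approximation of the penalty. It is not the authors' PGQL procedure and not a reference PGEE implementation; the function names (`scad_derivative`, `ar1_corr`, `pgee_logistic`), the AR(1) working correlation with a fixed ρ, the constant ε = 1e-6, the 1e-3 zero threshold, the penalty level λ = 0.2 and the simulated data are all illustrative assumptions.

```python
import numpy as np


def scad_derivative(theta, lam, a=3.7):
    """Derivative q_lam(|theta|) of the SCAD penalty (Fan and Li)."""
    theta = np.abs(np.asarray(theta, dtype=float))
    if lam == 0:
        return np.zeros_like(theta)  # lam = 0 means no penalty at all
    linear_part = np.maximum(a * lam - theta, 0.0) / ((a - 1) * lam)
    return lam * np.where(theta <= lam, 1.0, linear_part)


def ar1_corr(m, rho):
    """AR(1) working correlation matrix with (j, k) entry rho^|j - k|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def pgee_logistic(X, y, lam, rho=0.3, eps=1e-6, tol=1e-8, max_iter=100):
    """Hypothetical PGEE sketch for a marginal logistic model.

    X: (n, m, p) covariates, column 0 an intercept; y: (n, m) binary responses.
    rho is held fixed here (an assumption); in practice it is estimated.
    """
    n, m, p = X.shape
    R_inv = np.linalg.inv(ar1_corr(m, rho))

    def iterate(beta, lam_k):
        for _ in range(max_iter):
            S = np.zeros(p)        # sum_i X_i^T A_i^{1/2} R^{-1} A_i^{-1/2} (y_i - mu_i)
            H = np.zeros((p, p))   # sum_i X_i^T A_i^{1/2} R^{-1} A_i^{1/2} X_i
            for Xi, yi in zip(X, y):
                mu = 1.0 / (1.0 + np.exp(-Xi @ beta))   # marginal means
                a_half = np.sqrt(mu * (1.0 - mu))       # diagonal of A_i^{1/2}
                W = (Xi * a_half[:, None]).T @ R_inv     # X_i^T A_i^{1/2} R^{-1}
                S += W @ ((yi - mu) / a_half)
                H += W @ (Xi * a_half[:, None])
            # Local quadratic approximation: E = diag(q(|b_j|) / (eps + |b_j|))
            E = scad_derivative(beta, lam_k) / (eps + np.abs(beta))
            E[0] = 0.0                                   # intercept left unpenalized
            step = np.linalg.solve(H + n * np.diag(E), S - n * E * beta)
            beta = beta + step
            if np.max(np.abs(step)) < tol:
                break
        return beta

    beta = iterate(np.zeros(p), 0.0)   # unpenalized GEE estimate as starting value
    beta = iterate(beta, lam)          # penalized iterations
    small = np.abs(beta) < 1e-3
    small[0] = False                   # never threshold the intercept
    beta[small] = 0.0                  # tiny coefficients become exact zeros
    return beta


# Illustrative simulated data: two active covariates, three inactive ones.
rng = np.random.default_rng(0)
n, m, p = 200, 4, 6
X = np.concatenate([np.ones((n, m, 1)), rng.standard_normal((n, m, p - 1))], axis=2)
beta_true = np.array([0.0, 1.0, -1.0, 0.0, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta_hat = pgee_logistic(X, y, lam=0.2)
print(np.round(beta_hat, 3))
```

Because the SCAD derivative vanishes beyond aλ, large coefficients such as the two active ones are left essentially unpenalized, while coefficients whose unpenalized estimates are small are typically driven to zero. The only model ingredients are the mean, the variance function μ(1 − μ) and the working correlation, which is exactly where mis-specification can enter; swapping the fixed AR(1) matrix for an estimated correlation structure is one plausible direction toward a quasi-likelihood variant, but that substitution is an assumption here, not the chapter's stated PGQL algorithm.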
Keywords: GEEs; Generalized quasi-likelihood; Longitudinal data; Variable selection
The authors are grateful for the opportunity to present their work at the 2015 International Symposium in Statistics (ISS) on Advances in Parametric and Semiparametric Analysis of Multivariate, Time Series, Spatial-temporal, and Familial-longitudinal Data. Special thanks go to Professor Brajendra Sutradhar for organizing the conference, to members of the symposium audience for insightful discussion of our presentation, and to two anonymous referees for their thoughtful comments on our manuscript. The authors’ research was partially supported by grants from the Natural Sciences & Engineering Research Council of Canada and the Canadian Institutes of Health Research.