Abstract
We propose a two-stage shrinkage approach for analyzing correlated binary data with high-dimensional covariates. First, we construct a weighted least-squares (WLS) type function by applying a special weighting scheme to the non-conservative vector field of the generalized estimating equations (GEE) model. Second, we define a penalized WLS in the spirit of the adaptive LASSO for simultaneous variable selection and parameter estimation. The proposed procedure enjoys the oracle properties in a high-dimensional framework where the number of parameters grows to infinity with the number of clusters. Moreover, we prove the consistency of the sandwich formula for the covariance matrix even when the working correlation matrix is misspecified. To select the tuning parameter, we develop a consistent penalized quadratic form (PQF) function criterion. The performance of the proposed method is assessed through a comparison with existing methods and through an application to a crossover trial in a pain relief study.
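The two-stage idea can be illustrated with a minimal sketch, under stated assumptions. The abstract does not give the weighting scheme of the WLS function, so the code below substitutes a generic quadratic approximation around an unpenalized GEE estimate (independence working correlation, logit link) and adds an adaptive-LASSO penalty; it also computes the usual sandwich covariance. All function names, the simulation design, and the value of `lam` are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def simulate_clusters(n_clusters, cluster_size, beta, rng):
    """Binary responses correlated within cluster via a shared random intercept.
    (Illustrative design only; the marginal coefficients differ slightly from beta.)"""
    X = rng.normal(size=(n_clusters, cluster_size, beta.size))
    b = rng.normal(scale=0.5, size=(n_clusters, 1))  # cluster effect
    prob = 1.0 / (1.0 + np.exp(-(X @ beta + b)))
    y = (rng.uniform(size=prob.shape) < prob).astype(float)
    return X, y

def fit_gee_independence(X, y, n_iter=25):
    """Solve sum_i X_i^T (y_i - mu_i) = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[2])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        U = np.einsum("imp,im->p", X, y - mu)                  # estimating function
        H = np.einsum("imp,im,imq->pq", X, mu * (1 - mu), X)   # minus its derivative
        beta = beta + np.linalg.solve(H, U)
    return beta

def sandwich_covariance(X, y, beta):
    """Robust covariance H^{-1} M H^{-1}, with M built from per-cluster scores."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    H = np.einsum("imp,im,imq->pq", X, mu * (1 - mu), X)
    S = np.einsum("imp,im->ip", X, y - mu)  # one score vector per cluster
    H_inv = np.linalg.inv(H)
    return H_inv @ (S.T @ S) @ H_inv, H

def adaptive_lasso_quadratic(beta_tilde, H, lam, n_sweeps=200):
    """Coordinate descent for
    0.5 (b - beta_tilde)^T H (b - beta_tilde) + lam * sum_j |b_j| / |beta_tilde_j|.
    The quadratic term is an assumed stand-in for the paper's WLS function."""
    w = 1.0 / np.abs(beta_tilde)  # adaptive weights
    beta = beta_tilde.copy()
    for _ in range(n_sweeps):
        for j in range(beta.size):
            diff = beta - beta_tilde
            off_diag = H[j] @ diff - H[j, j] * diff[j]
            z = H[j, j] * beta_tilde[j] - off_diag
            # soft-thresholding update for coordinate j
            beta[j] = np.sign(z) * max(abs(z) - lam * w[j], 0.0) / H[j, j]
    return beta

rng = np.random.default_rng(0)
beta_true = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
X, y = simulate_clusters(200, 4, beta_true, rng)

beta_tilde = fit_gee_independence(X, y)             # stage 1: unpenalized fit
cov, H = sandwich_covariance(X, y, beta_tilde)
H_scaled = H / X.shape[0]                           # scale by number of clusters
beta_hat = adaptive_lasso_quadratic(beta_tilde, H_scaled, lam=0.05)  # stage 2

print("unpenalized:", np.round(beta_tilde, 3))
print("penalized:  ", np.round(beta_hat, 3))
print("sandwich SE:", np.round(np.sqrt(np.diag(cov)), 3))
```

In this sketch `lam` is fixed by hand; the paper instead selects the tuning parameter with its PQF criterion, which is not reproduced here.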
About this article
Cite this article
Xu, P., Fu, W. & Zhu, L. Shrinkage estimation analysis of correlated binary data with a diverging number of parameters. Sci. China Math. 56, 359–377 (2013). https://doi.org/10.1007/s11425-012-4564-y
DOI: https://doi.org/10.1007/s11425-012-4564-y
Keywords
- correlated binary data
- variable selection
- diverging number of parameters
- adaptive LASSO
- GEE
- oracle properties
- sandwich covariance formula
- penalized quadratic form function