
Shrinkage estimation analysis of correlated binary data with a diverging number of parameters

Science China Mathematics

Abstract

For analyzing correlated binary data with high-dimensional covariates, we propose a two-stage shrinkage approach. First, we construct a weighted least-squares (WLS) type function by applying a special weighting scheme to the non-conservative vector field of the generalized estimating equations (GEE) model. Second, we define a penalized WLS criterion in the spirit of the adaptive LASSO for simultaneous variable selection and parameter estimation. The proposed procedure enjoys the oracle properties in a high-dimensional framework in which the number of parameters grows to infinity with the number of clusters. Moreover, we prove that the sandwich formula for the covariance matrix is consistent even when the working correlation matrix is misspecified. For selecting the tuning parameter, we develop a consistent penalized quadratic form (PQF) criterion. The performance of the proposed method is assessed through a comparison with existing methods and through an application to a crossover trial in a pain relief study.
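The second stage can be illustrated with a minimal sketch, assuming that an initial GEE estimate beta_init and a symmetric positive definite weighting matrix W are already available. W here is only a stand-in (for example, an inverse estimated covariance, as in least-squares approximation); it is not the paper's special weighting scheme, which is not reproduced. The sketch minimizes (b - beta_init)^T W (b - beta_init) + lam * sum_j |b_j| / |beta_init_j|^gamma by cyclic coordinate descent. The function name adaptive_lasso_quadratic is hypothetical, and lam is fixed by hand rather than chosen by the paper's PQF criterion.

```python
import math
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Return sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def adaptive_lasso_quadratic(
    beta_init: Sequence[float],
    W: Sequence[Sequence[float]],
    lam: float,
    gamma: float = 1.0,
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> List[float]:
    """Hypothetical helper: minimise
        (b - beta_init)^T W (b - beta_init) + lam * sum_j w_j * |b_j|
    with adaptive weights w_j = 1 / |beta_init_j| ** gamma, using cyclic
    coordinate descent. W is assumed symmetric positive definite."""
    p = len(beta_init)
    # Adaptive weights: a zero initial estimate gets an infinite weight,
    # which keeps that coefficient at zero whenever lam > 0.
    weights = [1.0 / abs(b) ** gamma if b != 0 else math.inf for b in beta_init]
    # Per-coordinate soft-threshold levels lam * w_j / 2 (zero if lam == 0).
    thresh = [0.0 if lam == 0 else lam * w / 2.0 for w in weights]
    beta = list(beta_init)  # warm start at the unpenalised minimiser
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # r_j = sum over k != j of W_jk * (beta_k - beta_init_k)
            r = sum(W[j][k] * (beta[k] - beta_init[k]) for k in range(p) if k != j)
            # Exact minimiser in coordinate j: S(z, lam * w_j / 2) / W_jj
            z = W[j][j] * beta_init[j] - r
            new = soft_threshold(z, thresh[j]) / W[j][j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


# Usage: with W = I the problem separates into soft-thresholding, so the
# small initial coefficient (large adaptive weight) is shrunk to exactly zero.
print(adaptive_lasso_quadratic([3.0, 0.1], [[1.0, 0.0], [0.0, 1.0]], lam=1.0))
```

The adaptive weights are what give the adaptive LASSO its selection behavior: coefficients whose initial estimates are small receive heavy penalties and are set to zero, while large ones are only mildly shrunk.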

Author information

Correspondence to LiXing Zhu.

About this article

Cite this article

Xu, P., Fu, W. & Zhu, L. Shrinkage estimation analysis of correlated binary data with a diverging number of parameters. Sci. China Math. 56, 359–377 (2013). https://doi.org/10.1007/s11425-012-4564-y

  • DOI: https://doi.org/10.1007/s11425-012-4564-y
