Small Sample Inference for Clustered Data

  • Ziding Feng
  • Thomas Braun
  • Charles McCulloch
Part of the Lecture Notes in Statistics book series (LNS, volume 179)


When the number of independent units is not adequate to invoke large sample approximations in clustered data analysis, a situation that often arises in group randomized trials (GRTs), valid and efficient small sample inference becomes important. We review the current methods for analyzing data from small numbers of clusters, namely methods based on full distribution assumptions (mixed effect models), semi-parametric methods based on Generalized Estimating Equations (GEE), and non-parametric methods based on permutation tests.
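As an illustration of the non-parametric approach, the following is a minimal sketch of an exact permutation test for a two-arm group randomized trial, with the cluster as the unit of analysis. It is not the authors' procedure: the function name `cluster_permutation_test`, the choice of statistic (difference in the mean of cluster-level means), and the equal weighting of clusters are all assumptions made for this example.

```python
from itertools import combinations


def cluster_permutation_test(treated, control):
    """Exact two-sided permutation test on cluster-level summaries.

    `treated` and `control` hold one summary value (e.g. a cluster mean)
    per cluster. Hypothetical helper for illustration only.
    """
    pooled = list(treated) + list(control)
    n = len(pooled)
    n_treated = len(treated)
    total = sum(pooled)

    def mean_difference(indices):
        # Mean of the clusters labelled "treated" minus mean of the rest.
        s = sum(pooled[i] for i in indices)
        return s / n_treated - (total - s) / (n - n_treated)

    observed = mean_difference(range(n_treated))

    # Enumerate every way of assigning n_treated of the n clusters to treatment.
    n_assignments = 0
    n_extreme = 0
    for indices in combinations(range(n), n_treated):
        n_assignments += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean_difference(indices)) >= abs(observed) - 1e-12:
            n_extreme += 1

    return observed, n_extreme / n_assignments
```

With three clusters per arm there are only 20 possible assignments, so the smallest two-sided p-value this test can return is 2/20 = 0.1. Discreteness of this kind is one reason the number of clusters matters so much for small sample inference.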

Key words

Correlated data; group randomized trials; linear mixed models; Generalized Estimating Equations (GEE); permutation tests; small sample inference





Copyright information

© Springer Science+Business Media New York 2004

Authors and Affiliations

  • Ziding Feng
  • Thomas Braun
  • Charles McCulloch
  1. Cancer Prevention Research Program, Fred Hutchinson Cancer Research Center, Seattle, USA
  2. Department of Biostatistics, University of Michigan, Ann Arbor, USA
  3. Department of Epidemiology and Biostatistics, San Francisco, USA
