Causes, Problems and Benefits of Different Between and Within Effects in the Analysis of Clustered Data

Health Services and Outcomes Research Methodology

Abstract

In analyzing longitudinal (panel), multicenter or otherwise clustered data, one often discovers that the between- and within-individual (or between- and within-cluster) regression coefficients are not equal. This complicates interpretation and casts doubt on any causal reading of the coefficients. To help interpretation, it is useful to note that several types of model misspecification leading to this phenomenon can be reformulated as problems of omitted covariates. We consider models where the relationship between the outcome and the covariates is known, as long as all covariates are correctly measured and included. However, when there are unknown confounders, selection bias or measurement error, this is structurally equivalent to covariates being omitted. We first derive the correct form of the mean structure conditional on the observed covariates, and then examine the consequences of fitting models that do not take into account the possibility of omitted covariates. We give examples of constraints that allow estimation of the original parameters of interest. The models are used to help interpret analyses of data from the Asset and Health Dynamics of the Oldest Old (AHEAD), a panel survey of individuals aged 70 and above.
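
The phenomenon described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the article: the linear model, the parameter values (`beta`, `gamma`, the 0.5 loading of x on z) and all variable names are assumptions chosen for illustration. A cluster-level covariate z affects the outcome and is correlated with the observed covariate x; when z is omitted, regressing the outcome on the within-cluster deviation of x and on the cluster mean of x gives two different coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clusters, n_per = 2000, 5   # hypothetical design: 2000 clusters of size 5
beta, gamma = 1.0, 2.0        # hypothetical true effects of x and of the omitted z

# Cluster-level covariate z, later omitted from the fitted model.
z = rng.normal(size=n_clusters)
# Observed covariate x is correlated with z at the cluster level.
x = 0.5 * z[:, None] + rng.normal(size=(n_clusters, n_per))
y = beta * x + gamma * z[:, None] + rng.normal(size=(n_clusters, n_per))

# Split x into its cluster mean (between part) and deviation from it (within part).
x_bar = x.mean(axis=1, keepdims=True)
x_within = x - x_bar

# Ordinary least squares of y on intercept, within part and between part; z is left out.
X = np.column_stack([
    np.ones(x.size),
    x_within.ravel(),
    np.broadcast_to(x_bar, x.shape).ravel(),
])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
within_coef, between_coef = coef[1], coef[2]

print(f"within-cluster coefficient:  {within_coef:.2f}")
print(f"between-cluster coefficient: {between_coef:.2f}")
```

Under these assumptions the within-cluster coefficient stays close to `beta`, because z is constant within a cluster and drops out of within-cluster comparisons, while the between-cluster coefficient absorbs part of the effect of z and is noticeably larger.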

Cite this article

Palta, M., Seplaki, C. Causes, Problems and Benefits of Different Between and Within Effects in the Analysis of Clustered Data. Health Services & Outcomes Research Methodology 3, 177–193 (2002). https://doi.org/10.1023/A:1025893627073

  • DOI: https://doi.org/10.1023/A:1025893627073
