Abstract
In analyzing longitudinal (panel), multicenter, or otherwise clustered data, one often finds that the between- and within-individual (or between- and within-cluster) regression coefficients are not equal. This complicates interpretation and casts doubt on any causal reading of the coefficients. To aid interpretation, it is useful to note that several types of model misspecification that lead to this phenomenon can be reformulated as problems of omitted covariates. We consider models in which the relationship between the outcome and the covariates is known, provided that all covariates are correctly measured and included. Unknown confounders, selection bias, and measurement error are each structurally equivalent to omitting covariates. We first derive the correct form of the mean structure conditional on the observed covariates, and then examine the consequences of fitting models that do not allow for omitted covariates. We give examples of constraints that allow estimation of the original parameters of interest. The models are used to help interpret analyses of data from the Asset and Health Dynamics of the Oldest Old (AHEAD), a panel survey of individuals aged 70 and above.
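The divergence between between-cluster and within-cluster coefficients can be illustrated with a small simulation. The sketch below is not the article's model: it assumes a simple linear data-generating process in which an unobserved cluster-level covariate `u` affects both the observed covariate `x` and the outcome `y`, and all names and parameter values (`beta`, `gamma`, `n`, `m`) are hypothetical. Under these assumptions the within-cluster slope recovers `beta`, while the between-cluster slope tends toward `beta + gamma * var(u) / (var(u) + var(e) / m)`.

```python
import numpy as np


def between_within_slopes(x, y):
    """Return (between, within) least-squares slopes.

    x, y: arrays of shape (n_clusters, n_obs_per_cluster), balanced design.
    """
    x_bar = x.mean(axis=1)
    y_bar = y.mean(axis=1)

    # Between slope: regress cluster means of y on cluster means of x.
    xb = x_bar - x_bar.mean()
    yb = y_bar - y_bar.mean()
    slope_between = (xb @ yb) / (xb @ xb)

    # Within slope: regress deviations from cluster means on each other.
    xw = x - x_bar[:, None]
    yw = y - y_bar[:, None]
    slope_within = (xw * yw).sum() / (xw ** 2).sum()

    return slope_between, slope_within


# Hypothetical settings chosen only for illustration.
rng = np.random.default_rng(0)
n, m = 2000, 5           # clusters, observations per cluster
beta, gamma = 1.0, 2.0   # effect of x, effect of the omitted covariate u

u = rng.normal(size=n)                       # omitted cluster-level covariate
x = u[:, None] + rng.normal(size=(n, m))     # observed covariate, correlated with u
y = beta * x + gamma * u[:, None] + rng.normal(size=(n, m))

b_between, b_within = between_within_slopes(x, y)
print(f"between slope: {b_between:.3f}, within slope: {b_within:.3f}")
```

Because `u` is constant within a cluster, centering on cluster means removes it from the within comparison, whereas it remains confounded with the cluster mean of `x` in the between comparison.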
Cite this article
Palta, M., Seplaki, C. Causes, Problems and Benefits of Different Between and Within Effects in the Analysis of Clustered Data. Health Services & Outcomes Research Methodology 3, 177–193 (2002). https://doi.org/10.1023/A:1025893627073
DOI: https://doi.org/10.1023/A:1025893627073