Strategies for Bias Reduction in Estimation of Marginal Means with Data Missing at Random

  • Baojiang Chen
  • Richard J. Cook
Part of the Fields Institute Communications book series (FIC, volume 63)


Incomplete data are common in many fields of research, and interest often lies in estimating a marginal mean based on available information. This paper compares different strategies for estimating the marginal mean of a response when data are missing at random. We evaluate these methods based on asymptotic bias, empirical bias and efficiency. We show that complete case analysis gives biased results when data are missing at random, but inverse probability weighted estimating equations (IPWEE) and a method based on the expected conditional mean (ECM) yield consistent estimators. While these methods give estimators that behave similarly in the contexts studied, they are based on quite different assumptions. The IPWEE approach requires analysts to specify a model for the missing data mechanism, whereas the ECM approach requires a model for the distribution of auxiliary variables driving the missing data mechanism. The latter can be a challenge in practice, particularly when the covariates are of high dimension or are a mixture of continuous and categorical variables. The IPWEE approach therefore has considerable appeal in many practical settings.
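The contrast between the three estimators can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the data-generating model (a single binary auxiliary variable x, a normal response with mean 1 + 2x, and observation probabilities 0.9 and 0.3), the function names, and the use of saturated stratum-level estimates for both the missing data mechanism and the covariate distribution are all hypothetical choices made for illustration. Under this setup the true marginal mean is 2.

```python
import random


def simulate(n, seed=0):
    """Draw (x, y, r) triples; x is always observed, y is used only when r == 1.

    Assumed toy model: x ~ Bernoulli(0.5), y | x ~ Normal(1 + 2x, 1),
    P(r = 1 | x) = 0.9 if x == 0 else 0.3. Missingness depends only on x,
    so the response is missing at random given x.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = rng.gauss(1.0 + 2.0 * x, 1.0)
        p_obs = 0.3 if x == 1 else 0.9
        r = 1 if rng.random() < p_obs else 0
        data.append((x, y, r))
    return data


def complete_case_mean(data):
    """Average the response over observed cases only, ignoring missingness."""
    observed = [y for _, y, r in data if r == 1]
    return sum(observed) / len(observed)


def ipw_mean(data):
    """Weight each observed response by 1 / estimated P(r = 1 | x)."""
    n = len(data)
    pi_hat = {}
    for level in (0, 1):
        indicators = [r for x, _, r in data if x == level]
        pi_hat[level] = sum(indicators) / len(indicators)
    return sum(y / pi_hat[x] for x, y, r in data if r == 1) / n


def ecm_mean(data):
    """Average estimated E[y | x, r = 1] over the estimated distribution of x."""
    n = len(data)
    total = 0.0
    for level in (0, 1):
        n_level = sum(1 for x, _, _ in data if x == level)
        observed = [y for x, y, r in data if x == level and r == 1]
        total += (n_level / n) * (sum(observed) / len(observed))
    return total


data = simulate(20000, seed=1)
cc, ipw, ecm = complete_case_mean(data), ipw_mean(data), ecm_mean(data)
print(f"complete case: {cc:.3f}  IPW: {ipw:.3f}  ECM: {ecm:.3f}  (true mean 2)")
```

In this toy setting the complete case average is pulled toward the stratum that is observed more often, while the weighted and conditional-mean estimates both target the marginal mean. With one binary auxiliary variable and saturated estimates the two coincide algebraically; they would generally differ once the missing data model or the covariate model is a parametric approximation.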


Keywords: Consistent Estimator; Covariate Vector; Covariate Distribution; Inverse Probability Weight; Missing Data Mechanism



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Biostatistics, University of Nebraska Medical Center, Omaha, USA
  2. Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
