Missing Data

Little, Roderick J. A.; Schenker, Nathaniel

doi:10.1007/978-1-4899-1292-3_2

Abstract

Studies in the social and behavioral sciences frequently suffer from missing data. For instance, sample surveys often have some individuals who either refuse to participate or do not supply answers to certain questions, and panel studies often have incomplete data due to attrition. Recent comprehensive treatments of the subject of missing data include three volumes produced by the Panel on Incomplete Data of the Committee on National Statistics (Madow, Nisselson, and Olkin 1983; Madow and Olkin 1983; Madow, Olkin, and Rubin 1983) and Little and Rubin (1987).

References

Afifi, A. A., and Elashoff, R. M. (1967), “Missing observations in multivariate statistics II: point estimation in simple linear regression,” Journal of the American Statistical Association, 62, 595–604.
Anderson, A. B., Basilevsky, A., and Hum, D. P. J. (1983), “Missing data: a review of the literature,” in P. H. Rossi, J. D. Wright, and A. B. Anderson (eds.), Handbook of Survey Research, pp. 415–494, New York: Academic Press.
Anderson, T. W. (1957), “Maximum likelihood estimation for the multivariate normal distribution when some observations are missing,” Journal of the American Statistical Association, 52, 200–203.
Azen, S. P., Van Guilder, M., and Hill, M. A. (1989), “Estimation of parameters and missing values under a regression model with non-normally distributed and non-randomly incomplete data,” Statistics in Medicine, 8, 217–228.
Baker, S. G., and Laird, N. M. (1988), “Regression analysis for categorical variables with outcome subject to nonignorable nonresponse,” Journal of the American Statistical Association, 83, 62–69.
Basu, D. (1971), “An essay on the logical foundations of survey sampling, Part 1,” in V. P. Godambe and D. A. Sprott (eds.), Foundations of Statistical Inference, pp. 203–242, Toronto: Holt, Rinehart, and Winston.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, Cambridge, MA: MIT Press.
Casella, G., and George, E. I. (1992), “Explaining the Gibbs sampler,” The American Statistician, 46, 167–174.
Chen, T., and Fienberg, S. E. (1974), “Two-dimensional contingency tables with both completely and partially classified data,” Biometrics, 30, 629–642.
Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991), “Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression,” Journal of the American Statistical Association, 86, 68–78.
Czajka, J. L., Hirabayashi, S. M., Little, R. J. A., and Rubin, D. B. (1992), “Projecting from advance data using propensity modeling: an application to income and tax statistics,” Journal of Business and Economic Statistics, 10, 117–132.
David, M. H., Little, R. J. A., Samuhel, M. E., and Triest, R. K. (1986), “Alternative methods for CPS income imputation,” Journal of the American Statistical Association, 81, 29–41.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Ser. B, 39, 1–38.
Dixon, W. J., ed. (1988), BMDP Statistical Software, Los Angeles: University of California Press.
Dorey, F. J., Little, R. J. A., and Schenker, N. (1993), “Multiple imputation for threshold-crossing data with interval censoring,” UCLA Statistics Series, No. 81. To appear in Statistics in Medicine.
Efron, B. (1979), “Bootstrap methods: Another look at the jackknife,” The Annals of Statistics, 7, 1–26.
Fay, R. E. (1986), “Causal models for patterns of nonresponse,” Journal of the American Statistical Association, 81, 354–365.
Fuchs, C. (1982), “Maximum likelihood estimation and model selection in contingency tables with missing data,” Journal of the American Statistical Association, 77, 270–278.
Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990), “Illustration of Bayesian inference in normal data models using Gibbs sampling,” Journal of the American Statistical Association, 85, 972–985.
Gelfand, A. E., and Smith, A. F. M. (1990), “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, 85, 398–409.
Gelman, A., and Rubin, D. B. (1992), “Inference from iterative simulation using multiple sequences (with discussion),” Statistical Science, 7, 457–511.
Geman, S., and Geman, D. (1984), “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Geyer, C. J. (1992), “Practical Markov chain Monte Carlo (with discussion),” Statistical Science, 7, 473–511.
Glasser, M. (1964), “Linear regression analysis with missing observations among the independent variables,” Journal of the American Statistical Association, 59, 834–844.
Glynn, R., Laird, N. M., and Rubin, D. B. (1986), “Selection modeling versus mixture modeling with nonignorable nonresponse,” in H. Wainer (ed.), Drawing Inferences from Self-Selected Samples, pp. 119–146, New York: Springer-Verlag.
Göksel, H., Judkins, D. R., and Mosher, W. D. (1991), “Nonresponse adjustments for a telephone follow-up to a national in-person survey,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 581–586.
Greenlees, J. S., Reece, W. S., and Zieschang, K. D. (1982), “Imputation of missing values when the probability of response depends on the variable being imputed,” Journal of the American Statistical Association, 77, 251–261.
Haitovsky, Y. (1968), “Missing data in regression analysis,” Journal of the Royal Statistical Society, Ser. B, 30, 67–81.
Hanson, R. H. (1978), “The current population survey: design and methodology,” Technical Paper No. 40, Washington, DC: U.S. Bureau of the Census.
Harte, J. M. (1982), “Post-stratification approaches in the Corporation Statistics of Income Program,” in American Statistical Association, 1982, Proceedings of the section on Survey Research Methods, 250–253.
Heckman, J. (1976), “The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models,” Annals of Economic and Social Measurement, 5, 475–492.
Heitjan, D. F., and Little, R. J. A. (1991), “Multiple imputation for the fatal accident reporting system,” Applied Statistics, 40, 13–29.
Herzog, T. N., and Rubin, D. B. (1983), “Using multiple imputations to handle nonresponse in sample surveys,” in W. G. Madow, I. Olkin, and D. B. Rubin (eds.), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies, pp. 209–245, New York: Academic Press.
Jennrich, R. I., and Schluchter, M. D. (1986), “Unbalanced repeated-measures models with structured covariance matrices,” Biometrics, 42, 805–820.
Kass, R. E., and Steffey, D. (1989), “Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models),” Journal of the American Statistical Association, 84, 717–726.
Kennickell, A. B. (1991), “Imputation of the 1989 Survey of Consumer Finances: stochastic relaxation and multiple imputation,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 1–10.
Kim, J. O., and Curry, J. (1977), “The treatment of missing data in multivariate analysis,” Sociological Methods and Research, 6, 215–240.
Laird, N. M. (1988), “Missing data in longitudinal studies,” Statistics in Medicine, 7, 305–315.
Laird, N. M., and Ware, J. H. (1982), “Random-effects models for longitudinal data,” Biometrics, 38, 963–974.
Lange, K. L., Little, R. J. A., and Taylor, J. M. G. (1989), “Robust statistical modeling using the t distribution,” Journal of the American Statistical Association, 84, 881–896.
Lazzeroni, L. C., Schenker, N., and Taylor, J. M. G. (1990), “Robustness of multiple imputation techniques to model specification,” in American Statistical Association, 1990, Proceedings of the section on Survey Research Methods, pp. 260–265.
Lee, L. F. (1982), “Some approaches to the correction of selectivity bias,” Review of Economic Studies, 49, 355–372.
Li, K. H. (1988), “Imputation using Markov chains,” Journal of Statistical Computation and Simulation, 30, 57–79.
Li, K. H., Meng, X. L., Raghunathan, T. E., and Rubin, D. B. (1991), “Significance levels from repeated p-values with multiply imputed data,” Statistica Sinica, 1, 65–92.
Li, K. H., Raghunathan, T. E., and Rubin, D. B. (1991), “Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution,” Journal of the American Statistical Association, 86, 1065–1073.
Liang, K. Y., Zeger, S. L., and Qaqish, B. (1992), “Multivariate regression analysis for categorical data,” Journal of the Royal Statistical Society, Ser. B, 54, 3–40.
Lillard, L., Smith, J. P., and Welch, F. (1986), “What do we really know about wages: the importance of nonreporting and census imputation,” Journal of Political Economy, 94, 489–506.
Lindstrom, M. J., and Bates, D. M. (1988), “Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data,” Journal of the American Statistical Association, 83, 1014–1022.
Lipsitz, S. R., Laird, N. M., and Harrington, D. P. (1990), “Using the jackknife to estimate the variance of regression estimators from repeated measures studies,” Communications in Statistics, Ser. A, 19, 821–845.
Little, R. J. A. (1985a), “Nonresponse adjustments in longitudinal surveys: models for categorical data,” Bulletin of the International Statistical Institute, 15. 1, 1–15.
Little, R. J. A. (1985b), “A note about models for selectivity bias,” Econometrica, 53, 1469–1474.
Little, R. J. A. (1986), “Survey nonresponse adjustments,” International Statistical Review, 54, 139–157.
Little, R. J. A. (1988a), “A test of missing completely at random for multivariate data with missing values,” Journal of the American Statistical Association, 83, 1198–1202.
Little, R. J. A. (1988b), “Missing data adjustments in large surveys,” Journal of Business and Economic Statistics, 6, 287–301.
Little, R. J. A. (1988c), “Robust estimation of the mean and covariance matrix from data with missing values,” Applied Statistics, 37, 23–38.
Little, R. J. A. (1992), “Regression with incomplete X’s; a review,” Journal of the American Statistical Association, 87, 1227–1237.
Little, R. J. A. (1993a), “Post-stratification: a modeler’s perspective,” Journal of the American Statistical Association, 88, 1001–1012.
Little, R. J. A. (1993b), “Pattern-mixture models for multivariate incomplete data,” Journal of the American Statistical Association, 88, 125–134.
Little, R. J. A. (1993c), “A class of pattern-mixture models for normal incomplete data,” To appear in Biometrika.
Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis with Missing Data, New York: Wiley.
Little, R. J. A., and Schluchter, M. D. (1985), “Maximum likelihood estimation for mixed continuous and categorical data with missing values,” Biometrika, 72, 497–512.
Little, R. J. A., and Su, H. L. (1989), “Item nonresponse in panel surveys,” in D. Kasprzyk, G. Duncan, G. Kalton, and M. P. Singh (eds.), Panel Surveys, pp. 400–425, New York: Wiley.
Madow, W. G., Nisselson, H., and Olkin, I. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 1: Report and Case Studies. Academic Press, New York.
Madow, W. G., and Olkin, I. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 3: Proceedings of the Symposium. Academic Press, New York.
Madow, W. G., Olkin, I., and Rubin, D. B. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies. Academic Press, New York.
Marini, M. M., Olsen, A. R., and Rubin, D. B. (1980), “Maximum-likelihood estimation in panel studies with missing data,” Sociological Methodology, 11, 314–357.
McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, second edition, London: Chapman and Hall.
McKendrick, A. G. (1926), “Applications of mathematics to medical problems,” Proceedings of the Edinburgh Mathematical Society, 44, 98–130.
Meng, X. L., and Rubin, D. B. (1991), “Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm,” Journal of the American Statistical Association, 86, 899–909.
Meng, X. L., and Rubin, D. B. (1992), “Performing likelihood ratio tests with multiply-imputed data sets,” Biometrika, 79, 103–111.
Meng, X. L., and Rubin, D. B. (1993), “Maximum likelihood estimation via the ECM algorithm: a general framework,” Biometrika, 80, 267–278.
Moulton, L. H., and Zeger, S. L. (1989), “Analyzing repeated measures on generalized linear models via the bootstrap,” Biometrics, 45, 381–394.
Muthén, B., Kaplan, D., and Hollis, M. (1987), “On structural equation modeling with data that are not missing completely at random,” Psychometrika, 52, 431–462.
Nelson, F. D. (1984), “Efficiency of the two-step estimator for models with endogenous sample selection,” Journal of Econometrics, 24, 181–196.
Oh, H. L., and Scheuren, F. S. (1983), “Weighting adjustments for unit nonresponse,” in W. G. Madow, I. Olkin, and D. B. Rubin (eds.), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies, pp. 143–184, New York: Academic Press.
Olkin, I., and Tate, R. F. (1961), “Multivariate correlation models with mixed discrete and continuous variables,” The Annals of Mathematical Statistics, 32, 448–465.
Olsen, R. J. (1982), “Distributional tests for selectivity bias and a more robust likelihood estimator,” International Economic Review, 23, 223–240.
Orchard, T., and Woodbury, M. A. (1972), “A missing information principle: theory and applications,” Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 1, 697–715.
Prentice, R. L., and Zhao, L. P. (1991), “Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses,” Biometrics, 47, 825–839.
Rosenbaum, P. R., and Rubin, D. B. (1983), “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.
Rubin, D. B. (1976a), “Inference and missing data,” Biometrika, 63, 581–592.
Rubin, D. B. (1976b), “Comparing regressions when some predictor values are missing,” Technometrics, 18, 201–205.
Rubin, D. B. (1977), “Formalizing subjective notions about the effect of nonrespondents in sample surveys,” Journal of the American Statistical Association, 72, 538–543.
Rubin, D. B. (1978), “Multiple imputations in sample surveys - A phenomenological Bayesian approach to nonresponse,” in American Statistical Association, 1978, Proceedings of the section on Survey Research Methods, pp. 20–34.
Rubin, D. B. (1986), “Statistical matching and file concatenation with adjusted weights and multiple imputations,” Journal of Business and Economic Statistics, 4, 87–94.
Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, New York: Wiley.
Rubin, D. B. (1988), “An overview of multiple imputation,” in American Statistical Association, 1988, Proceedings of the section on Survey Research Methods, pp. 79–84.
Rubin, D. B., Schafer, J. L. and Schenker, N. (1988), “Imputation strategies for missing values in post-enumeration surveys,” Survey Methodology, 14, 209–221.
Rubin, D. B. and Schenker, N. (1986), “Multiple imputation for interval estimation from simple random samples with ignorable nonresponse,” Journal of the American Statistical Association, 81, 366–374.
Rubin, D. B. and Schenker, N. (1987), “Interval estimation from multiply-imputed data: A case study using census agriculture industry codes,” Journal of Official Statistics, 3, 375–387.
Rubin, D. B. and Schenker, N. (1991), “Multiple imputation in health-care databases: An overview and some applications,” Statistics in Medicine, 10, 585–598.
SAS (1992), “The MIXED Procedure,” chapter 16 in: SAS/STAT Software: Changes and Enhancements, Release 6.07. Technical Report P-229, SAS Institute, Inc., Cary, NC.
Schafer, J. L. (1991), Algorithms for Multiple Imputation and Posterior Simulation from Incomplete Multivariate Data with Ignorable Nonresponse. Ph.D. Thesis, Department of Statistics, Harvard University.
Schenker, N., Treiman, D. J., and Weidman, L. (1993), “Analyses of public-use data with multiply-imputed industry and occupation codes,” Applied Statistics, 42, 545–556.
Schenker, N., and Welsh, A. H. (1988), “Asymptotic results for multiple imputation,” The Annals of Statistics, 16, 1550–1566.
Schluchter, M. D. (1988), “Analysis of incomplete multivariate data using linear models with structured covariance matrices,” Statistics in Medicine, 7, 317–324.
Schoenberg, R. S. (1988), “MISS: a program for missing data,” in GAUSS Programming Language, Aptech Systems Inc., P.O. Box 6487, Kent, WA 98064.
Stolzenberg, R. M. and Relles, D. A. (1990), “Theory testing in a world of constrained research design–The significance of Heckman’s censored sampling bias correction for nonexperimental research,” Sociological Methods and Research, 18, 395–415.
Tanner, M. A. (1991), Tools for Statistical Inference: Observed Data and Data Augmentation Methods, New York: Springer-Verlag.
Tanner, M. A., and Wong, W. H. (1987), “The calculation of posterior distributions by data augmentation,” Journal of the American Statistical Association, 82, 528–550.
Treiman, D. J., Bielby, W. T., and Cheng, M. T. (1988), “Evaluating a multiple-imputation method for recalibrating 1970 U.S. census detailed industry codes to the 1980 standard,” Sociological Methodology, 18, 309–345.
Van Praag, B. M. S., Dijkstra, T. K., and Van Velzen, J. (1985), “Least-squares theory based on general distributional assumptions with an application to the incomplete observations problem,” Psychometrika, 50, 25–36.
Waterton, J., and Lievesley, D. (1987), “Attrition in a panel study of attitudes,” Journal of Official Statistics, 3, 267–282.
Weidman, L. (1989), “Final report: industry and occupation imputation,” Statistical Research Division Report Number Census/SRD/89/03, Washington, DC: U.S. Bureau of the Census.
Wilks, S. S. (1932), “Moments and distribution of estimates of population parameters from fragmentary samples,” The Annals of Mathematical Statistics, 3, 163–195.
Woodburn, L. (1991), “Using auxiliary information to investigate nonresponse bias,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 278–283.
Zeger, S. L., and Liang, K. Y. (1986), “Longitudinal data analysis for discrete and continuous outcomes,” Biometrics, 42, 121–130.

Author information

Authors and Affiliations

Department of Biostatistics, University of Michigan, 1420 Washington Heights, Ann Arbor, MI, 48109-2029, USA
Roderick J. A. Little
Department of Biostatistics, UCLA School of Public Health, 10833 Le Conte Avenue, Los Angeles, CA, 90024-1772, USA
Nathaniel Schenker

Editor information

Editors and Affiliations

Department of Economics, Bergische Universität-GH Wuppertal, D-42097, Wuppertal, Germany
Gerhard Arminger
Department of Sociology and Department of Statistics, Late of Pennsylvania State University, 16802, University Park, Pennsylvania, USA
Clifford C. Clogg
Department of Sociology, University of Arizona, 85721, Tucson, Arizona, USA
Michael E. Sobel

About this chapter

Cite this chapter

Little, R.J.A., Schenker, N. (1995). Missing Data. In: Arminger, G., Clogg, C.C., Sobel, M.E. (eds) Handbook of Statistical Modeling for the Social and Behavioral Sciences. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-1292-3_2

DOI: https://doi.org/10.1007/978-1-4899-1292-3_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-1294-7
Online ISBN: 978-1-4899-1292-3
eBook Packages: Springer Book Archive
