Abstract
Studies in the social and behavioral sciences frequently suffer from missing data. For instance, sample surveys often have some individuals who either refuse to participate or do not supply answers to certain questions, and panel studies often have incomplete data due to attrition. Recent comprehensive treatments of the subject of missing data include three volumes produced by the Panel on Incomplete Data of the Committee on National Statistics (Madow, Nisselson, and Olkin 1983; Madow and Olkin 1983; Madow, Olkin, and Rubin 1983) and Little and Rubin (1987).
References
Afifi, A. A., and Elashoff, R. M. (1967), “Missing observations in multivariate statistics II: point estimation in simple linear regression,” Journal of the American Statistical Association, 62, 595–604.
Anderson, A. B., Basilevsky, A., and Hum, D. P. J. (1983), “Missing data: a review of the literature,” in P. H. Rossi, J. D. Wright, and A. B. Anderson (eds.), Handbook of Survey Research, pp. 415–494, New York: Academic Press.
Anderson, T. W. (1957), “Maximum likelihood estimation for the multivariate normal distribution when some observations are missing,” Journal of the American Statistical Association, 52, 200–203.
Azen, S. P., Van Guilder, M., and Hill, M. A. (1989), “Estimation of parameters and missing values under a regression model with non-normally distributed and non-randomly incomplete data,” Statistics in Medicine, 8, 217–228.
Baker, S. G., and Laird, N. M. (1988), “Regression analysis for categorical variables with outcome subject to nonignorable nonresponse,” Journal of the American Statistical Association, 83, 62–69.
Basu, D. (1971), “An essay on the logical foundations of survey sampling, Part 1,” in V. P. Godambe and D. A. Sprott (eds.), Foundations of Statistical Inference, pp. 203–242, Toronto: Holt, Rinehart, and Winston.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, Cambridge, MA: MIT Press.
Casella, G., and George, E. I. (1992), “Explaining the Gibbs sampler,” The American Statistician, 46, 167–174.
Chen, T., and Fienberg, S. E. (1974), “Two-dimensional contingency tables with both completely and partially classified data,” Biometrics, 30, 629–642.
Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991), “Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression,” Journal of the American Statistical Association, 86, 68–78.
Czajka, J. L., Hirabayashi, S. M., Little, R. J. A., and Rubin, D. B. (1992), “Projecting from advance data using propensity modeling: an application to income and tax statistics,” Journal of Business and Economic Statistics, 10, 117–132.
David, M. H., Little, R. J. A., Samuhel, M. E., and Triest, R. K. (1986), “Alternative methods for CPS income imputation,” Journal of the American Statistical Association, 81, 29–41.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Ser. B, 39, 1–38.
Dixon, W. J., ed. (1988), BMDP Statistical Software, Los Angeles: University of California Press.
Dorey, F. J., Little, R. J. A., and Schenker, N. (1993), “Multiple imputation for threshold-crossing data with interval censoring,” UCLA Statistics Series, No. 81. To appear in Statistics in Medicine.
Efron, B. (1979), “Bootstrap methods: Another look at the jackknife,” The Annals of Statistics, 7, 1–26.
Fay, R. E. (1986), “Causal models for patterns of nonresponse,” Journal of the American Statistical Association, 81, 354–365.
Fuchs, C. (1982), “Maximum likelihood estimation and model selection in contingency tables with missing data,” Journal of the American Statistical Association, 77, 270–278.
Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990), “Illustration of Bayesian inference in normal data models using Gibbs sampling,” Journal of the American Statistical Association, 85, 972–985.
Gelfand, A. E., and Smith, A. F. M. (1990), “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, 85, 398–409.
Gelman, A., and Rubin, D. B. (1992), “Inference from iterative simulation using multiple sequences (with discussion),” Statistical Science, 7, 457–511.
Geman, S., and Geman, D. (1984), “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Geyer, C. J. (1992), “Practical Markov chain Monte Carlo (with discussion),” Statistical Science, 7, 473–511.
Glasser, M. (1964), “Linear regression analysis with missing observations among the independent variables,” Journal of the American Statistical Association, 59, 834–844.
Glynn, R., Laird, N. M., and Rubin, D. B. (1986), “Selection modeling versus mixture modeling with nonignorable nonresponse,” in H. Wainer (ed.), Drawing Inferences from Self-Selected Samples, pp. 119–146, New York: Springer-Verlag.
Göksel, H., Judkins, D. R., and Mosher, W. D. (1991), “Nonresponse adjustments for a telephone follow-up to a national in-person survey,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 581–586.
Greenlees, J. S., Reece, W. S., and Zieschang, K. D. (1982), “Imputation of missing values when the probability of response depends on the variable being imputed,” Journal of the American Statistical Association, 77, 251–261.
Haitovsky, Y. (1968), “Missing data in regression analysis,” Journal of the Royal Statistical Society, Ser. B, 30, 67–81.
Hanson, R. H. (1978), “The current population survey: design and methodology,” Technical Paper No. 40, Washington, DC: U.S. Bureau of the Census.
Harte, J. M. (1982), “Post-stratification approaches in the Corporation Statistics of Income Program,” in American Statistical Association, 1982, Proceedings of the section on Survey Research Methods, 250–253.
Heckman, J. (1976), “The common structure of statistical models of truncation, sample selection and limited dependent variables, and a simple estimator for such models,” Annals of Economic and Social Measurement, 5, 475–492.
Heitjan, D. F., and Little, R. J. A. (1991), “Multiple imputation for the fatal accident reporting system,” Applied Statistics, 40, 13–29.
Herzog, T. N., and Rubin, D. B. (1983), “Using multiple imputations to handle nonresponse in sample surveys,” in W. G. Madow, I. Olkin, and D. B. Rubin (eds.), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies, pp. 209–245, New York: Academic Press.
Jennrich, R. I., and Schluchter, M. D. (1986), “Unbalanced repeated-measures models with structured covariance matrices,” Biometrics, 42, 805–820.
Kass, R. E., and Steffey, D. (1989), “Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models),” Journal of the American Statistical Association, 84, 717–726.
Kennickell, A. B. (1991), “Imputation of the 1989 Survey of Consumer Finances: stochastic relaxation and multiple imputation,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 1–10.
Kim, J. O., and Curry, J. (1977), “The treatment of missing data in multivariate analysis,” Sociological Methods and Research, 6, 215–240.
Laird, N. M. (1988), “Missing data in longitudinal studies,” Statistics in Medicine, 7, 305–315.
Laird, N. M., and Ware, J. H. (1982), “Random-effects models for longitudinal data,” Biometrics, 38, 963–974.
Lange, K. L., Little, R. J. A., and Taylor, J. M. G. (1989), “Robust statistical modeling using the t distribution,” Journal of the American Statistical Association, 84, 881–896.
Lazzeroni, L. C., Schenker, N., and Taylor, J. M. G. (1990), “Robustness of multiple imputation techniques to model specification,” in American Statistical Association, 1990, Proceedings of the section on Survey Research Methods, pp. 260–265.
Lee, L. F. (1982), “Some approaches to the correction of selectivity bias,” Review of Economic Studies, 49, 355–372.
Li, K. H. (1988), “Imputation using Markov chains,” Journal of Statistical Computation and Simulation, 30, 57–79.
Li, K. H., Meng, X. L., Raghunathan, T. E., and Rubin, D. B. (1991), “Significance levels from repeated p-values with multiply imputed data,” Statistica Sinica, 1, 65–92.
Li, K. H., Raghunathan, T. E., and Rubin, D. B. (1991), “Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution,” Journal of the American Statistical Association, 86, 1065–1073.
Liang, K. Y., Zeger, S. L., and Qaqish, B. (1992), “Multivariate regression analysis for categorical data,” Journal of the Royal Statistical Society, Ser. B, 54, 3–40.
Lillard, L., Smith, J. P., and Welch, F. (1986), “What do we really know about wages: the importance of nonreporting and census imputation,” Journal of Political Economy, 94, 489–506.
Lindstrom, M. J., and Bates, D. M. (1988), “Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data,” Journal of the American Statistical Association, 83, 1014–1022.
Lipsitz, S. R., Laird, N. M., and Harrington, D. P. (1990), “Using the jackknife to estimate the variance of regression estimators from repeated measures studies,” Communications in Statistics, Ser. A, 19, 821–845.
Little, R. J. A. (1985a), “Nonresponse adjustments in longitudinal surveys: models for categorical data,” Bulletin of the International Statistical Institute, 15. 1, 1–15.
Little, R. J. A. (1985b), “A note about models for selectivity bias,” Econometrica, 53, 1469–1474.
Little, R. J. A. (1986), “Survey nonresponse adjustments,” International Statistical Review, 54, 139–157.
Little, R. J. A. (1988a), “A test of missing completely at random for multivariate data with missing values,” Journal of the American Statistical Association, 83, 1198–1202.
Little, R. J. A. (1988b), “Missing data adjustments in large surveys,” Journal of Business and Economic Statistics, 6, 287–301.
Little, R. J. A. (1988c), “Robust estimation of the mean and covariance matrix from data with missing values,” Applied Statistics, 37, 23–38.
Little, R. J. A. (1992), “Regression with incomplete X’s: a review,” Journal of the American Statistical Association, 87, 1227–1237.
Little, R. J. A. (1993a), “Post-stratification: a modeler’s perspective,” Journal of the American Statistical Association, 88, 1001–1012.
Little, R. J. A. (1993b), “Pattern-mixture models for multivariate incomplete data,” Journal of the American Statistical Association, 88, 125–134.
Little, R. J. A. (1993c), “A class of pattern-mixture models for normal incomplete data,” to appear in Biometrika.
Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis with Missing Data, New York: Wiley.
Little, R. J. A., and Schluchter, M. D. (1985), “Maximum likelihood estimation for mixed continuous and categorical data with missing values,” Biometrika, 72, 497–512.
Little, R. J. A., and Su, H. L. (1989), “Item nonresponse in panel surveys,” in D. Kasprzyk, G. Duncan, G. Kalton, and M. P. Singh (eds.), Panel Surveys, pp. 400–425, New York: Wiley.
Madow, W. G., Nisselson, H., and Olkin, I. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 1: Report and Case Studies, New York: Academic Press.
Madow, W. G., and Olkin, I. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 3: Proceedings of the Symposium, New York: Academic Press.
Madow, W. G., Olkin, I., and Rubin, D. B. (eds.) (1983), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies, New York: Academic Press.
Marini, M. M., Olsen, A. R., and Rubin, D. B. (1980), “Maximum-likelihood estimation in panel studies with missing data,” Sociological Methodology, 11, 314–357.
McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, second edition, London: Chapman and Hall.
McKendrick, A. G. (1926), “Applications of mathematics to medical problems,” Proceedings of the Edinburgh Mathematical Society, 44, 98–130.
Meng, X. L., and Rubin, D. B. (1991), “Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm,” Journal of the American Statistical Association, 86, 899–909.
Meng, X. L., and Rubin, D. B. (1992), “Performing likelihood ratio tests with multiply-imputed data sets,” Biometrika, 79, 103–111.
Meng, X. L., and Rubin, D. B. (1993), “Maximum likelihood estimation via the ECM algorithm: a general framework,” Biometrika, 80, 267–278.
Moulton, L. H., and Zeger, S. L. (1989), “Analyzing repeated measures on generalized linear models via the bootstrap,” Biometrics, 45, 381–394.
Muthén, B., Kaplan, D., and Hollis, M. (1987), “On structural equation modeling with data that are not missing completely at random,” Psychometrika, 52, 431–462.
Nelson, F. D. (1984), “Efficiency of the two-step estimator for models with endogenous sample selection,” Journal of Econometrics, 24, 181–196.
Oh, H. L., and Scheuren, F. S. (1983), “Weighting adjustments for unit nonresponse,” in W. G. Madow, I. Olkin, and D. B. Rubin (eds.), Incomplete Data in Sample Surveys, Volume 2: Theory and Bibliographies, pp. 143–184, New York: Academic Press.
Olkin, I., and Tate, R. F. (1961), “Multivariate correlation models with mixed discrete and continuous variables,” The Annals of Mathematical Statistics, 32, 448–465.
Olsen, R. J. (1982), “Distributional tests for selectivity bias and a more robust likelihood estimator,” International Economic Review, 23, 223–240.
Orchard, T., and Woodbury, M. A. (1972), “A missing information principle: theory and applications,” Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 1, 697–715.
Prentice, R. L., and Zhao, L. P. (1991), “Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses,” Biometrics, 47, 825–839.
Rosenbaum, P. R., and Rubin, D. B. (1983), “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.
Rubin, D. B. (1976a), “Inference and missing data,” Biometrika, 63, 581–592.
Rubin, D. B. (1976b), “Comparing regressions when some predictor values are missing,” Technometrics, 18, 201–205.
Rubin, D. B. (1977), “Formalizing subjective notions about the effect of nonrespondents in sample surveys,” Journal of the American Statistical Association, 72, 538–543.
Rubin, D. B. (1978), “Multiple imputations in sample surveys - A phenomenological Bayesian approach to nonresponse,” in American Statistical Association, 1978, Proceedings of the section on Survey Research Methods, pp. 20–34.
Rubin, D. B. (1986), “Statistical matching and file concatenation with adjusted weights and multiple imputations,” Journal of Business and Economic Statistics, 4, 87–94.
Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, New York: Wiley.
Rubin, D. B. (1988), “An overview of multiple imputation,” in American Statistical Association, 1988, Proceedings of the section on Survey Research Methods, pp. 79–84.
Rubin, D. B., Schafer, J. L., and Schenker, N. (1988), “Imputation strategies for missing values in post-enumeration surveys,” Survey Methodology, 14, 209–221.
Rubin, D. B., and Schenker, N. (1986), “Multiple imputation for interval estimation from simple random samples with ignorable nonresponse,” Journal of the American Statistical Association, 81, 366–374.
Rubin, D. B., and Schenker, N. (1987), “Interval estimation from multiply-imputed data: A case study using census agriculture industry codes,” Journal of Official Statistics, 3, 375–387.
Rubin, D. B., and Schenker, N. (1991), “Multiple imputation in health-care databases: An overview and some applications,” Statistics in Medicine, 10, 585–598.
SAS (1992), “The MIXED Procedure,” chapter 16 in: SAS/STAT Software: Changes and Enhancements, Release 6.07. Technical Report P-229, SAS Institute, Inc., Cary, NC.
Schafer, J. L. (1991), Algorithms for Multiple Imputation and Posterior Simulation from Incomplete Multivariate Data with Ignorable Nonresponse, Ph.D. Thesis, Department of Statistics, Harvard University.
Schenker, N., Treiman, D. J., and Weidman, L. (1993), “Analyses of public-use data with multiply-imputed industry and occupation codes,” Applied Statistics, 42, 545–556.
Schenker, N., and Welsh, A. H. (1988), “Asymptotic results for multiple imputation,” The Annals of Statistics, 16, 1550–1566.
Schluchter, M. D. (1988), “Analysis of incomplete multivariate data using linear models with structured covariance matrices,” Statistics in Medicine, 7, 317–324.
Schoenberg, R. S. (1988), “MISS: a program for missing data,” in GAUSS Programming Language, Aptech Systems Inc., P.O. Box 6487, Kent, WA 98064.
Stolzenberg, R. M., and Relles, D. A. (1990), “Theory testing in a world of constrained research design: The significance of Heckman’s censored sampling bias correction for nonexperimental research,” Sociological Methods and Research, 18, 395–415.
Tanner, M. A. (1991), Tools for Statistical Inference: Observed Data and Data Augmentation Methods, New York: Springer-Verlag.
Tanner, M. A., and Wong, W. H. (1987), “The calculation of posterior distributions by data augmentation,” Journal of the American Statistical Association, 82, 528–550.
Treiman, D. J., Bielby, W. T., and Cheng, M. T. (1988), “Evaluating a multiple-imputation method for recalibrating 1970 U.S. census detailed industry codes to the 1980 standard,” Sociological Methodology, 18, 309–345.
Van Praag, B. M. S., Dijkstra, T. K., and Van Velzen, J. (1985), “Least-squares theory based on general distributional assumptions with an application to the incomplete observations problem,” Psychometrika, 50, 25–36.
Waterton, J., and Lievesley, D. (1987), “Attrition in a panel study of attitudes,” Journal of Official Statistics, 3, 267–282.
Weidman, L. (1989), “Final report: industry and occupation imputation,” Statistical Research Division Report Number Census/SRD/89/03, Washington, DC: U.S. Bureau of the Census.
Wilks, S. S. (1932), “Moments and distribution of estimates of population parameters from fragmentary samples,” The Annals of Mathematical Statistics, 3, 163–195.
Woodburn, L. (1991), “Using auxiliary information to investigate nonresponse bias,” in American Statistical Association, 1991, Proceedings of the section on Survey Research Methods, pp. 278–283.
Zeger, S. L., and Liang, K. Y. (1986), “Longitudinal data analysis for discrete and continuous outcomes,” Biometrics, 42, 121–130.
Copyright information
© 1995 Springer Science+Business Media New York
About this chapter
Cite this chapter
Little, R.J.A., Schenker, N. (1995). Missing Data. In: Arminger, G., Clogg, C.C., Sobel, M.E. (eds) Handbook of Statistical Modeling for the Social and Behavioral Sciences. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-1292-3_2
DOI: https://doi.org/10.1007/978-1-4899-1292-3_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-1294-7
Online ISBN: 978-1-4899-1292-3
eBook Packages: Springer Book Archive