Abstract
Understanding the effects and consequences of missing data imputation is vital to obtaining meaningful and reliable statistics and coefficients in the examination of any quantitatively based phenomenon. Over time, a series of sophisticated methods has been developed to handle missing data imputation; however, these methods may not always be appropriate or attainable. In such cases, more traditional approaches to missing data imputation must be employed, driven by the research project, the theoretical framework, and the data. In this research note we offer a brief account of one such instance, implementing a large-group mean imputation approach to handling missing data. The analysis is drawn from a much larger project and shows the effect of proper group selection in mean imputation, using a cross-validation approach based on the imputed data’s relation to known values. Ultimately, the results show that the use of Rural-Urban Continuum Codes is superior to the group means currently used in the U.S., thus introducing a new, and more efficient, approach to handling missing data using group-mean imputation.
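The general idea of group-mean imputation with a check against known values can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the field names (`state`, `rucc`, `value`), the choice of mean absolute error as the comparison metric, and the six toy records are all assumptions made for the example, and the numbers do not reproduce any result from the article.

```python
from collections import defaultdict
from statistics import mean


def group_mean_impute(records, group_key, value_key):
    """Return copies of records with None values replaced by the group mean."""
    observed = defaultdict(list)
    for rec in records:
        if rec[value_key] is not None:
            observed[rec[group_key]].append(rec[value_key])
    group_means = {g: mean(vals) for g, vals in observed.items()}

    imputed = []
    for rec in records:
        filled = dict(rec)
        if filled[value_key] is None:
            # Stays None when the group has no observed values at all.
            filled[value_key] = group_means.get(rec[group_key])
        imputed.append(filled)
    return imputed


def holdout_mae(records, group_key, value_key, holdout_idx):
    """Hide known values, impute them, and return the mean absolute error."""
    masked = [dict(rec) for rec in records]
    for i in holdout_idx:
        masked[i][value_key] = None
    imputed = group_mean_impute(masked, group_key, value_key)
    errors = [
        abs(imputed[i][value_key] - records[i][value_key])
        for i in holdout_idx
        if imputed[i][value_key] is not None
    ]
    return mean(errors)


# Hypothetical toy records: "rucc" stands in for a Rural-Urban Continuum Code
# and "state" for a geographic grouping. Values are invented for illustration.
counties = [
    {"state": "A", "rucc": 1, "value": 100.0},
    {"state": "A", "rucc": 1, "value": 110.0},
    {"state": "A", "rucc": 9, "value": 10.0},
    {"state": "B", "rucc": 1, "value": 105.0},
    {"state": "B", "rucc": 9, "value": 12.0},
    {"state": "B", "rucc": 9, "value": 8.0},
]
holdout = [0, 5]  # indices of known values to hide and then re-impute

mae_by_state = holdout_mae(counties, "state", "value", holdout)
mae_by_rucc = holdout_mae(counties, "rucc", "value", holdout)
print("MAE, grouping by state:", mae_by_state)
print("MAE, grouping by rucc:", mae_by_rucc)
```

In this constructed example the variable depends on density, so grouping by the density-based code yields imputed values closer to the hidden ones than grouping by state. Whether that holds for real data is an empirical question of the kind the note examines.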
Notes
All nine Beale code categories were checked to ensure that each still contained an n large enough to compute a statistically meaningful group average.
References
Afifi, A. A., & Elashoff, R. M. (1966). Missing observations in multivariate statistics: Review of the literature. Journal of the American Statistical Association, 61, 595–604.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, B 39, 1–38.
Economic Research Service (ERS). (2004). Measuring rurality: Rural-urban continuum codes. Retrieved April 28, 2004, from http://www.ers.usda.gov/Briefing/Rurality/RuralUrbCon/.
Gelman, A., King, G., & Liu, C. (1998). Not asked and not answered: Multiple imputation for multiple surveys. Journal of the American Statistical Association, 93, 846–874.
Hartley, H. O., & Hocking, R. R. (1971). The analysis of incomplete data. Biometrics, 27, 783–808.
Little, R. J. A., & Rubin, D. B. (1983). Incomplete data. Encyclopedia of Statistical Science, 4, 46–53.
Myrtveit, I., Stensrud, E., & Olsson, U. H. (2001). Analyzing data sets with missing data: An empirical evaluation of imputation methods and likelihood-based methods. IEEE Transactions on Software Engineering, 27, 999–1013.
Acknowledgments
The authors would like to acknowledge support for this project from the Social Science Research Center and Mississippi State University, through a master grant from the Highway Watch (HWW) in conjunction with the U.S. Department of Homeland Security (USDHS).
Cite this article
Porter, J.R., Cossman, R.E. & James, W.L. Research note: imputing large group averages for missing data, using rural-urban continuum codes for density driven industry sectors. J Pop Research 26, 273–278 (2009). https://doi.org/10.1007/s12546-009-9018-1
DOI: https://doi.org/10.1007/s12546-009-9018-1