Abstract
This paper discusses the importance of managing data quality in academic research and its relation to satisfying the customer. The focus is on the data completeness objective/dimension of data quality, in light of recent advances in methods for analysing incomplete multivariate data. An overview of the traditional techniques and a comparison with these recent advances are provided. Multiple imputation is also discussed as a method of analysing incomplete multivariate data that can potentially reduce some of the biases which can arise from some of the traditional techniques. Despite these advances, evidence is presented which shows that researchers across a variety of academic disciplines are not using these techniques to manage the data quality of their current research. An analysis is then provided of why these techniques have not been adopted, along with suggestions for increasing their use in the future.
Source-Reference. The ideas for this paper originated from research for David J. Fogarty's Ph.D. dissertation, whose subject is the use of advanced techniques for the imputation of incomplete multivariate data in corporate data warehouses.
Fogarty, D.J., Blake, J. Utilising Recent Advancements in Techniques for the Analysis of Incomplete Multivariate Data to Improve the Data Quality Management of Current Academic Research. Quality & Quantity 36, 277–289 (2002). https://doi.org/10.1023/A:1016028622217
DOI: https://doi.org/10.1023/A:1016028622217