Utilising Recent Advancements in Techniques for the Analysis of Incomplete Multivariate Data to Improve the Data Quality Management of Current Academic Research

Quality and Quantity

Abstract

This paper discusses the importance of managing data quality in academic research in relation to satisfying the customer. The focus is on the data completeness objective dimension of data quality, in light of recent advancements in methods for analysing incomplete multivariate data. An overview of the traditional techniques and a comparison with the recent advancements are provided. Multiple imputation is also discussed as a method of analysing incomplete multivariate data that can potentially reduce some of the biases that arise from some of the traditional techniques. Despite these advancements, evidence is presented that researchers across a variety of academic disciplines are not using these techniques to manage the data quality of their current research. An analysis of why these techniques have not been adopted is then provided, along with suggestions for increasing their use in the future.
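As a rough illustration of the multiple imputation idea mentioned in the abstract, the sketch below pools results from several completed data sets using the standard combining rules (mean of the estimates, plus within-imputation and between-imputation variance). It is not taken from the paper: the function name `pool_estimates` and the numbers are hypothetical, and the step that actually creates the imputed data sets is assumed to have been done elsewhere.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Pool m completed-data results with the standard MI combining rules.

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each of those estimates
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total


# Hypothetical results from m = 3 imputed data sets (illustrative numbers only).
estimates = [10.0, 12.0, 11.0]
variances = [4.0, 4.0, 4.0]
pooled_estimate, total_variance = pool_estimates(estimates, variances)
print(pooled_estimate, total_variance)
```

The between-imputation term is what a single imputation or a listwise-deleted analysis leaves out: it carries the extra uncertainty caused by the missing values into the final standard error.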

Source-Reference. The ideas for this paper originated from research work on David J. Fogarty's Ph.D. dissertation. The subject area is the use of advanced techniques for the imputation of incomplete multivariate data on corporate data warehouses.




About this article

Cite this article

Fogarty, D.J., Blake, J. Utilising Recent Advancements in Techniques for the Analysis of Incomplete Multivariate Data to Improve the Data Quality Management of Current Academic Research. Quality & Quantity 36, 277–289 (2002). https://doi.org/10.1023/A:1016028622217


  • DOI: https://doi.org/10.1023/A:1016028622217
