Abstract
We present an algorithm to impute missing values from the given data alone, and analyse its performance. The proposed procedure is based on non-numeric, rule-based data analysis, and aims to maximise the consistency of imputation from known values. In contrast to the prevailing statistical imputation algorithms, it does not make representational assumptions or presuppose other model constraints. Therefore, it is suitable for a wide variety of data sets, and can be used as a pre-processing step before resorting to harder numerical methods.
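The idea of imputing only from known values can be illustrated with a small sketch. It is a hypothetical simplification, not the procedure from the paper. It assumes a table of categorical values with `None` marking missing entries, and calls two rows compatible when they agree wherever both values are known. A missing cell is filled only when every compatible row that knows that attribute agrees on a single value. The function names `compatible` and `impute_unambiguous` are illustrative.

```python
from typing import List, Optional, Sequence

Row = List[Optional[str]]


def compatible(r: Sequence[Optional[str]], s: Sequence[Optional[str]]) -> bool:
    """Two rows are compatible if they agree wherever both values are known."""
    return all(a is None or b is None or a == b for a, b in zip(r, s))


def impute_unambiguous(table: List[Row]) -> List[Row]:
    """Fill a missing cell only when all compatible rows that know the
    attribute agree on one value; repeat until no cell changes.
    Ambiguous or unsupported cells are left as None."""
    data = [list(row) for row in table]  # work on a copy, leave input intact
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(data):
            for j, value in enumerate(row):
                if value is not None:
                    continue
                # Known values of attribute j in the other compatible rows.
                candidates = {
                    other[j]
                    for k, other in enumerate(data)
                    if k != i and other[j] is not None and compatible(row, other)
                }
                if len(candidates) == 1:
                    row[j] = candidates.pop()
                    changed = True
    return data


if __name__ == "__main__":
    table = [
        ["a", "x", "1"],
        ["a", "x", None],   # determined by row 0
        ["b", "y", "2"],
        ["b", None, "2"],   # determined by row 2
        ["c", "z", None],   # no compatible row knows the value
    ]
    for row in impute_unambiguous(table):
        print(row)
```

Leaving a cell empty when the compatible rows disagree (or when none exist) reflects the aim of not adding assumptions beyond the data; how the paper itself handles such cases is not shown here.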
Cite this article
Gediga, G., Düntsch, I. Maximum Consistency of Incomplete Data via Non-Invasive Imputation. Artificial Intelligence Review 19, 93–107 (2003). https://doi.org/10.1023/A:1022188514489