Abstract
Missing data are common in many fields of study and introduce inaccuracy into analysis and evaluation. Traditional methods for handling missing data (e.g., deleting cases with incomplete information, or substituting estimated mean scores for the missing values) are simple to implement but problematic, because they may produce biased data models. Recent advances in theoretical and computational statistics have led to more flexible techniques for the missing data problem. In this paper, we present missing data imputation methods based on clustering, one of the most popular techniques in Knowledge Discovery in Databases (KDD). We combine clustering with soft computing, which tends to be more tolerant of imprecision and uncertainty, and apply fuzzy and rough clustering algorithms to incomplete data. Experiments show that, of the four imputation algorithms we study (crisp K-means, fuzzy K-means, rough K-means, and rough-fuzzy K-means), the rough-fuzzy hybrid, which combines fuzzy set and rough set theories, performs best.
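To make the idea of clustering-based imputation concrete, the following is a minimal sketch of one of the four variants, fuzzy K-means imputation. It is not the paper's implementation. The function name `fuzzy_kmeans_impute`, the fuzzifier `m`, the farthest-point initialisation, and the choice to measure distances over observed attributes only are all assumptions made for illustration. Each missing entry is filled with the membership-weighted average of the cluster centroids.

```python
import numpy as np

def fuzzy_kmeans_impute(X, k=2, m=2.0, n_iter=50):
    """Illustrative fuzzy K-means imputation (hypothetical helper).

    X is a 2-D float array in which np.nan marks a missing entry.
    Assumes at least k complete rows and no all-missing column.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    comp = X[~missing.any(axis=1)]          # rows with no missing values

    # Assumed initialisation: first complete row, then repeatedly the
    # complete row farthest from the centroids chosen so far.
    cents = [comp[0]]
    for _ in range(1, k):
        d = np.min([((comp - c) ** 2).sum(axis=1) for c in cents], axis=0)
        cents.append(comp[np.argmax(d)])
    centroids = np.array(cents)

    # Start from a column-mean fill; it is refined on every iteration.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        # Squared distances using observed attributes only.
        diff = filled[:, None, :] - centroids[None, :, :]
        diff = np.where(missing[:, None, :], 0.0, diff)
        d2 = (diff ** 2).sum(axis=2) + 1e-12   # guard against division by zero

        # Standard fuzzy membership: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)

        # Centroids are membership-weighted means of the filled data.
        W = U ** m
        centroids = (W.T @ filled) / W.sum(axis=0)[:, None]

        # Impute each missing entry as a membership-weighted centroid value.
        filled = np.where(missing, U @ centroids, X)

    return filled, U, centroids

# Toy data: two well-separated groups, two rows with a missing second attribute.
X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],
    [1.1, np.nan], [8.0, np.nan],
])
filled, U, centroids = fuzzy_kmeans_impute(X, k=2)
```

In this sketch a crisp K-means variant would differ only in replacing the soft memberships `U` with hard assignments to the nearest centroid, so that a missing value is copied from a single centroid rather than blended across several.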
This work was supported, in part, by a grant from NSF (EIA-0091530), a cooperative agreement with USDA FCIC/RMA (2IE08310228), and an NSF EPSCoR Grant (EPS-0091900).
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, D., Deogun, J., Spaulding, W., Shuart, B. (2005). Dealing with Missing Data: Algorithms Based on Fuzzy Set and Rough Set Theories. In: Peters, J.F., Skowron, A. (eds) Transactions on Rough Sets IV. Lecture Notes in Computer Science, vol 3700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11574798_3
DOI: https://doi.org/10.1007/11574798_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29830-4
Online ISBN: 978-3-540-32016-6
eBook Packages: Computer Science (R0)