Dealing with Missing Data: Algorithms Based on Fuzzy Set and Rough Set Theories

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3700)

Abstract

Missing data, which are common in many fields of study, reduce the accuracy of analysis and evaluation. Earlier methods for handling missing data (e.g., deleting cases with incomplete information, or substituting the missing values with estimated mean scores), though simple to implement, are problematic because they may produce biased data models. Fortunately, recent advances in theoretical and computational statistics have led to more flexible techniques for dealing with the missing data problem. In this paper, we present missing data imputation methods based on clustering, one of the most popular techniques in Knowledge Discovery in Databases (KDD). We combine clustering with soft computing, which tends to be more tolerant of imprecision and uncertainty, and apply fuzzy and rough clustering algorithms to incomplete data. The experiments show that hybridizing fuzzy set and rough set theories yields the best performance among the four imputation algorithms we compare: crisp K-means, fuzzy K-means, rough K-means, and rough-fuzzy K-means.
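
To make the idea of clustering-based imputation concrete, the following is a minimal sketch of the crisp K-means variant only. It is not the authors' implementation. The function names, the use of `None` to mark a missing value, the partial-distance rule over observed attributes, and the seeding of centroids from the first k complete rows are all assumptions made for illustration.

```python
import math


def partial_distance(row, centroid):
    """Euclidean distance over the attributes observed in `row` (None = missing)."""
    return math.sqrt(
        sum((x - c) ** 2 for x, c in zip(row, centroid) if x is not None)
    )


def kmeans_impute(rows, k, iterations=20):
    """Fill each None entry with the matching attribute of the nearest centroid.

    Illustrative sketch: hard (crisp) cluster assignment, deterministic seeding.
    """
    complete = [r for r in rows if None not in r]
    if len(complete) < k:
        raise ValueError("need at least k complete rows to seed the centroids")
    # Assumption: seed the centroids with the first k complete rows.
    centroids = [list(r) for r in complete[:k]]
    n_attrs = len(rows[0])
    assignment = [0] * len(rows)

    for _ in range(iterations):
        # Assignment step: nearest centroid, measured on observed attributes only.
        assignment = [
            min(range(k), key=lambda j: partial_distance(r, centroids[j]))
            for r in rows
        ]
        # Update step: each centroid attribute becomes the mean of the
        # observed values of that attribute within the cluster.
        for j in range(k):
            for a in range(n_attrs):
                observed = [
                    r[a]
                    for r, c in zip(rows, assignment)
                    if c == j and r[a] is not None
                ]
                if observed:
                    centroids[j][a] = sum(observed) / len(observed)

    # Imputation step: keep observed values, fill gaps from the assigned centroid.
    return [
        [centroids[c][a] if r[a] is None else r[a] for a in range(n_attrs)]
        for r, c in zip(rows, assignment)
    ]


if __name__ == "__main__":
    data = [
        [1.0, 2.0],
        [9.0, 10.0],
        [1.2, None],
        [None, 9.8],
        [0.8, 1.8],
        [9.2, 10.2],
    ]
    for row in kmeans_impute(data, k=2):
        print(row)
```

In this sketch every incomplete record takes its missing values from a single cluster. A fuzzy variant would presumably replace the hard assignment with membership degrees and fill each gap with a membership-weighted combination of centroid values, while a rough variant would distinguish records that clearly belong to one cluster from those on the boundary between clusters. Neither is shown here.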

Keywords

Missing data imputation, K-means clustering, fuzzy sets, rough sets, rough-fuzzy hybridization

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  1. Department of Computer Science & Engineering, University of Nebraska-Lincoln, Lincoln, USA
  2. Department of Psychology, University of Nebraska-Lincoln, Lincoln, USA
