Dealing with Missing Data: Algorithms Based on Fuzzy Set and Rough Set Theories

Li, Dan; Deogun, Jitender; Spaulding, William; Shuart, Bill

doi:10.1007/11574798_3

Part of the book series: Lecture Notes in Computer Science (TRS, volume 3700)

Abstract

Missing data, commonly encountered in many fields of study, introduce inaccuracy in the analysis and evaluation. Previous methods used for handling missing data (e.g., deleting cases with incomplete information, or substituting the missing values with estimated mean scores), though simple to implement, are problematic because these methods may result in biased data models. Fortunately, recent advances in theoretical and computational statistics have led to more flexible techniques to deal with the missing data problem. In this paper, we present missing data imputation methods based on clustering, one of the most popular techniques in Knowledge Discovery in Databases (KDD). We combine clustering with soft computing, which tends to be more tolerant of imprecision and uncertainty, and apply fuzzy and rough clustering algorithms to deal with incomplete data. The experiments show that a hybridization of fuzzy set and rough set theories in missing data imputation algorithms leads to the best performance among our four algorithms, i.e., crisp K-means, fuzzy K-means, rough K-means, and rough-fuzzy K-means imputation algorithms.
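
To make the clustering-based idea in the abstract concrete, the following is a minimal sketch of fuzzy K-means imputation in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `fuzzy_kmeans_impute`, the use of `None` to mark a missing value, the fuzzifier `m`, and the choice to fit centroids on complete rows only are all hypothetical. Each missing entry is filled with the membership-weighted average of the centroid values for that attribute, with memberships computed from the observed attributes alone.

```python
import math
import random


def fuzzy_kmeans_impute(rows, k=2, m=2.0, n_iter=50, seed=0):
    """Fill None entries with membership-weighted centroid values.

    rows: list of equal-length lists of floats; None marks a missing value.
    m:    fuzzifier (> 1); values close to 1 behave almost like crisp K-means.
    Assumption of this sketch: centroids are fitted on complete rows only.
    """
    rng = random.Random(seed)
    n_attr = len(rows[0])
    complete = [r for r in rows if None not in r]
    if len(complete) < k:
        raise ValueError("need at least k complete rows to seed the centroids")
    centroids = [list(r) for r in rng.sample(complete, k)]

    def memberships(row):
        # Distances use only the attributes that are observed in this row.
        obs = [j for j in range(n_attr) if row[j] is not None]
        d = [math.sqrt(sum((row[j] - c[j]) ** 2 for j in obs)) for c in centroids]
        d_min = min(d)
        if d_min == 0.0:
            # Row coincides with one or more centroids: share membership among them.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            return [h / sum(hits) for h in hits]
        # Standard fuzzy membership, scaled by d_min to avoid overflow.
        power = 2.0 / (m - 1.0)
        inv = [(d_min / x) ** power for x in d]
        total = sum(inv)
        return [v / total for v in inv]

    for _ in range(n_iter):
        U = [memberships(r) for r in complete]
        for i in range(k):
            w = [u[i] ** m for u in U]
            total = sum(w)
            if total > 0.0:
                # Centroid = weighted mean of complete rows, weights u^m.
                centroids[i] = [
                    sum(wi * r[j] for wi, r in zip(w, complete)) / total
                    for j in range(n_attr)
                ]

    filled = []
    for r in rows:
        u = memberships(r)
        filled.append([
            r[j] if r[j] is not None
            else sum(u[i] * centroids[i][j] for i in range(k))
            for j in range(n_attr)
        ])
    return filled


# Toy data: two well-separated groups, with one missing value in each of the last two rows.
data = [
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
    [1.0, None], [None, 5.0],
]
filled = fuzzy_kmeans_impute(data, k=2)
```

A crisp K-means variant would instead copy the value from the single nearest centroid. The rough and rough-fuzzy variants named in the abstract additionally work with lower and upper approximations of each cluster, which this sketch does not attempt to model.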

This work was supported, in part, by a grant from NSF (EIA-0091530), a cooperative agreement with USDA FCIC/RMA (2IE08310228), and an NSF EPSCOR Grant (EPS-0091900).

Author information

Authors and Affiliations

Department of Computer Science & Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588-0115, USA
Dan Li & Jitender Deogun
Department of Psychology, University of Nebraska-Lincoln, Lincoln, NE, 68588-0308, USA
William Spaulding & Bill Shuart

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, University of Manitoba, R3T 5V6, Winnipeg, Manitoba, Canada
James F. Peters
Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Andrzej Skowron

Cite this paper

Li, D., Deogun, J., Spaulding, W., Shuart, B. (2005). Dealing with Missing Data: Algorithms Based on Fuzzy Set and Rough Set Theories. In: Peters, J.F., Skowron, A. (eds) Transactions on Rough Sets IV. Lecture Notes in Computer Science, vol 3700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11574798_3

DOI: https://doi.org/10.1007/11574798_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29830-4
Online ISBN: 978-3-540-32016-6
eBook Packages: Computer Science, Computer Science (R0)
