A Three-Way Decisions Clustering Algorithm for Incomplete Data

  • Conference paper
Rough Sets and Knowledge Technology (RSKT 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8818)

Abstract

Clustering is one of the most widely used approaches in data mining for discovering latent structure in data. However, real data sets often contain missing values, for reasons such as the difficulties and limitations of data acquisition and random noise. Most clustering methods cannot be applied directly to incomplete data sets. This paper therefore proposes a three-way decisions clustering algorithm for incomplete data based on attribute significance and miss rate. Three-way decisions with interval sets naturally partition a cluster into a positive region, a boundary region and a negative region, which is an advantage when dealing with soft clustering. First, the data set is divided into four parts (sufficient data, valuable data, inadequate data and invalid data) according to domain knowledge about attribute significance and miss rate. Second, different strategies based on three-way decisions are devised to handle the four types. Preliminary experimental results on several data sets show the effectiveness of the proposed algorithm.
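The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives neither the formulas nor the thresholds, so the weighted miss rate, the cut-off values `LOW_MISS_RATE` and `HIGH_MISS_RATE`, and the function names below are all assumptions made for illustration. The only part taken from the general three-way decisions literature is the interval-set reading of a cluster, where the lower bound is the positive region, the difference between upper and lower bound is the boundary region, and everything outside the upper bound is the negative region.

```python
from typing import Optional, Sequence, Set, Tuple

# Hypothetical thresholds: the abstract does not state the paper's values.
LOW_MISS_RATE = 0.2
HIGH_MISS_RATE = 0.6


def weighted_miss_rate(row: Sequence[Optional[float]],
                       significance: Sequence[float]) -> float:
    """Share of total attribute significance that is missing in one object.

    Assumption: missing values are encoded as None and each attribute has a
    non-negative significance weight supplied by domain knowledge.
    """
    total = sum(significance)
    missing = sum(w for value, w in zip(row, significance) if value is None)
    return missing / total


def categorize(row: Sequence[Optional[float]],
               significance: Sequence[float]) -> str:
    """Place an object into one of the four parts named in the abstract.

    The rule itself (thresholding a weighted miss rate) is an assumption.
    """
    rate = weighted_miss_rate(row, significance)
    if rate == 0:
        return "sufficient"
    if rate <= LOW_MISS_RATE:
        return "valuable"
    if rate <= HIGH_MISS_RATE:
        return "inadequate"
    return "invalid"


def three_way_regions(universe: Set[int], lower: Set[int],
                      upper: Set[int]) -> Tuple[Set[int], Set[int], Set[int]]:
    """Split the universe into positive, boundary and negative regions of a
    cluster represented by an interval set [lower, upper]."""
    if not lower <= upper <= universe:
        raise ValueError("expected lower <= upper <= universe")
    return set(lower), upper - lower, universe - upper


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]  # hypothetical attribute significance
    for obj in ([1.0, 2.0, 3.0], [1.0, 2.0, None],
                [1.0, None, 3.0], [None, None, 3.0]):
        print(obj, "->", categorize(obj, weights))

    pos, bnd, neg = three_way_regions({1, 2, 3, 4, 5}, {1, 2}, {1, 2, 3})
    print("positive:", pos, "boundary:", bnd, "negative:", neg)
```

In this sketch an object in the boundary region is one that may belong to the cluster but is not certain to, which is what makes the representation suitable for soft clustering. How the paper actually treats each of the four data types is not stated in the abstract and is left open here.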



Author information

Corresponding author

Correspondence to Hong Yu.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Yu, H., Su, T., Zeng, X. (2014). A Three-Way Decisions Clustering Algorithm for Incomplete Data. In: Miao, D., Pedrycz, W., Ślȩzak, D., Peters, G., Hu, Q., Wang, R. (eds) Rough Sets and Knowledge Technology. RSKT 2014. Lecture Notes in Computer Science, vol 8818. Springer, Cham. https://doi.org/10.1007/978-3-319-11740-9_70

  • DOI: https://doi.org/10.1007/978-3-319-11740-9_70

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11739-3

  • Online ISBN: 978-3-319-11740-9

  • eBook Packages: Computer Science, Computer Science (R0)
