A Novel Rough Set Based Clustering Approach for Streaming Data

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 236)

Abstract

Clustering is an important data mining task. Clustering streaming data is challenging because the data cannot be scanned multiple times and new concepts may keep evolving over time. The inherent uncertainty in real-world data streams further magnifies the challenge. Rough set theory is a soft computing technique that can be used to deal with uncertainty in cluster analysis. In this paper, we propose a novel rough set based clustering method for streaming data. It describes a cluster as a pair consisting of a lower approximation and an upper approximation. The lower approximation comprises the data objects that can be assigned with certainty to the respective cluster, whereas the upper approximation contains, along with the elements of the lower approximation, those data objects whose membership in the various clusters is not crisp. Uncertainty in assigning a data object to a cluster is captured by allowing the upper approximations to overlap. The proposed method therefore generates soft clusters. In view of the challenges of streaming data, the proposed method is incremental and adaptive to evolving concepts. Experimental results on synthetic and real-world data sets show that the proposed approach outperforms the Leader clustering algorithm in terms of classification accuracy. The proposed method generates more natural clusters than k-means clustering and is robust to outliers. Its performance is also analyzed in terms of the correctness and accuracy of rough clustering.
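
The abstract does not spell out the assignment rule, so the following is only a minimal sketch of how lower and upper approximations might be maintained in a single pass over a stream. It is not the authors' algorithm. The names `RoughCluster`, `assign`, `new_cluster_dist`, and `overlap_margin` are hypothetical, as is the choice to fix each cluster's representative at the first object that opens it (a leader-style simplification). An object that is far from every representative opens a new cluster, which is one simple way to react to an evolving concept. An object that is about equally close to several representatives is placed only in their upper approximations.

```python
import math


class RoughCluster:
    """A cluster described by a lower and an upper approximation (sketch)."""

    def __init__(self, leader):
        self.leader = leader  # fixed representative (assumption of this sketch)
        self.lower = []       # objects assigned to this cluster with certainty
        self.upper = []       # lower objects plus objects with uncertain membership


def assign(clusters, x, new_cluster_dist=6.0, overlap_margin=1.0):
    """Process one streamed object x exactly once (both thresholds are assumed)."""
    dists = [math.dist(x, c.leader) for c in clusters]

    # No cluster is close enough: open a new cluster with x as its leader.
    if not clusters or min(dists) > new_cluster_dist:
        c = RoughCluster(x)
        c.lower.append(x)
        c.upper.append(x)
        clusters.append(c)
        return

    d_min = min(dists)
    # Candidate clusters: close enough, and nearly as close as the nearest one.
    candidates = [
        c for c, d in zip(clusters, dists)
        if d <= new_cluster_dist and d - d_min <= overlap_margin
    ]

    if len(candidates) == 1:
        # Membership is crisp: x goes into the lower (and hence upper) approximation.
        candidates[0].lower.append(x)
        candidates[0].upper.append(x)
    else:
        # Membership is uncertain: x goes only into the overlapping upper approximations.
        for c in candidates:
            c.upper.append(x)


# Usage: a small one-dimensional stream embedded in the plane.
clusters = []
for point in [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.4, 0.0), (5.1, 0.0)]:
    assign(clusters, point)
```

In this toy stream the last object lies roughly midway between the two representatives, so it ends up in both upper approximations and in neither lower approximation. A fuller version would also update the representatives and age out stale clusters, which the abstract implies but does not detail.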

Keywords

Clustering, Streaming data, Cluster approximation, Rough set

Copyright information

© Springer India 2014

Authors and Affiliations

  1. Indian Institute of Technology, Roorkee, India
