MicroGRID: An Accurate and Efficient Real-Time Stream Data Clustering with Noise

  • Z. Tari
  • A. Thompson
  • N. Almusalam
  • P. Bertok
  • A. Mahmood
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10938)

Abstract

Data stream clustering aims to produce clusters from a data stream in real time. Many existing algorithms, however, focus on solving a single problem and leave anomalous noise in data streams by the wayside. This paper describes MicroGRID, an approach for clustering single data streams that handles noisy data, accurately identifying and separating noise-affected data points from outlier points. In particular, MicroGRID combines micro-cluster and grid-based perspectives, a combination that has not previously been attempted for clustering data streams. The experimental results show that MicroGRID significantly outperforms the baseline methods: it is up to 87% faster and produces clustering outputs that are up to 80% more accurate.
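
The abstract does not give algorithmic details, so the following is only a minimal sketch of what combining a grid with micro-cluster summaries can look like in general. It is not the MicroGRID algorithm. All names (`MicroCluster`, `GridMicroClusterer`, `cell_size`, `min_points`) and the density rule are assumptions made for illustration: each incoming point is hashed to a grid cell, each cell keeps a running summary (count and per-dimension sum), and cells with few points are flagged as candidates for noise or outliers.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from math import floor


@dataclass
class MicroCluster:
    """Running summary of the points that fell into one grid cell (hypothetical)."""
    n: int = 0
    linear_sum: list = field(default_factory=list)

    def add(self, point):
        # Lazily size the sum vector on the first point.
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(point)
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x

    def centroid(self):
        return [s / self.n for s in self.linear_sum]


class GridMicroClusterer:
    """Illustrative grid index over micro-cluster summaries; not the paper's method."""

    def __init__(self, cell_size=1.0, min_points=3):
        self.cell_size = cell_size    # assumed grid resolution
        self.min_points = min_points  # assumed density threshold
        self.cells = defaultdict(MicroCluster)

    def cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(floor(x / self.cell_size) for x in point)

    def insert(self, point):
        # One dictionary lookup and one summary update per arriving point.
        self.cells[self.cell_of(point)].add(point)

    def dense_cells(self):
        return {c: mc for c, mc in self.cells.items() if mc.n >= self.min_points}

    def sparse_cells(self):
        # Cells below the threshold: candidates for noise or outliers.
        return {c: mc for c, mc in self.cells.items() if mc.n < self.min_points}


stream = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.9), (5.5, 5.5), (0.8, 0.1)]
clusterer = GridMicroClusterer(cell_size=1.0, min_points=3)
for p in stream:
    clusterer.insert(p)

print(sorted(clusterer.dense_cells()))
print(sorted(clusterer.sparse_cells()))
```

In this sketch the per-point cost is constant because no distance computations against existing clusters are needed; a real stream clusterer would also need to age out old data and merge neighbouring dense cells, which is omitted here.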

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Z. Tari (1), corresponding author
  • A. Thompson (1)
  • N. Almusalam (1)
  • P. Bertok (1)
  • A. Mahmood (2)

  1. School of Science, RMIT University, Melbourne, Australia
  2. School of Engineering and Mathematical Science, La Trobe University, Melbourne, Australia
