
DistClusTree: A Framework for Distributed Stream Clustering

  • Zhinoos Razavi Hesabi
  • Timos Sellis
  • Kewen Liao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10837)

Abstract

In this paper, we investigate the problem of clustering distributed multidimensional data streams. We devise a distributed clustering framework, DistClusTree, that extends the centralized ClusTree approach. The main difficulty in distributed clustering is balancing communication cost against clustering quality. DistClusTree tackles this by combining spatial index summaries with online tracking for efficient local and global incremental clustering. Extensive experiments demonstrate the efficacy of the framework in terms of communication cost and approximate clustering quality.
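The abstract does not spell out the protocol, so the following Python sketch is only an illustration of the trade-off it describes, under stated assumptions. Each remote site keeps CF-style micro-cluster summaries (count, linear sum, squared sum, as in ClusTree, but without its time decay or tree index) and applies a simple threshold-based tracking rule: a centroid is re-sent to the coordinator only after it drifts more than `delta` from the value last reported. The class names `MicroCluster`, `Site` and `Coordinator` and the parameters `radius` and `delta` are hypothetical; this is not DistClusTree's actual algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """CF summary: point count n, per-dimension linear sum ls and squared sum ss.
    (Hypothetical simplification: ClusTree's time decay is omitted.)"""
    n: int
    ls: list
    ss: list

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), [x * x for x in p])

    def insert(self, p):
        self.n += 1
        for i, x in enumerate(p):
            self.ls[i] += x
            self.ss[i] += x * x

    def centroid(self):
        return [s / self.n for s in self.ls]


class Coordinator:
    """Global view: the last centroid each site reported for each micro-cluster."""

    def __init__(self):
        self.view = {}      # (site_id, mc_index) -> centroid
        self.messages = 0   # site-to-coordinator updates received

    def receive(self, site_id, mc_index, centroid):
        self.view[(site_id, mc_index)] = centroid
        self.messages += 1


class Site:
    """Remote site that summarises its local stream into micro-clusters and
    reports a centroid only when it drifts more than `delta` (hypothetical rule)."""

    def __init__(self, site_id, radius, delta, coordinator):
        self.site_id = site_id
        self.radius = radius            # max distance for absorbing a point
        self.delta = delta              # drift tolerance before re-reporting
        self.coordinator = coordinator
        self.clusters = []              # local MicroCluster summaries
        self.reported = []              # centroid last sent, per micro-cluster

    def observe(self, p):
        # Absorb p into the nearest micro-cluster within `radius`, else open a new one.
        best, best_d = None, math.inf
        for i, mc in enumerate(self.clusters):
            d = math.dist(p, mc.centroid())
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= self.radius:
            self.clusters[best].insert(p)
        else:
            self.clusters.append(MicroCluster.from_point(p))
            self.reported.append(None)
            best = len(self.clusters) - 1
        # Send an update only if the centroid moved beyond the tolerance.
        c = self.clusters[best].centroid()
        last = self.reported[best]
        if last is None or math.dist(c, last) > self.delta:
            self.coordinator.receive(self.site_id, best, c)
            self.reported[best] = c


# Usage: four points, two micro-clusters, only two messages with delta = 0.5.
coord = Coordinator()
site = Site("s1", radius=1.0, delta=0.5, coordinator=coord)
for p in [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]:
    site.observe(p)
print(coord.messages)  # → 2
```

Raising `delta` lowers communication at the price of a staler global view at the coordinator; with `delta = 0` the same four points would trigger four messages.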

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Zhinoos Razavi Hesabi (1)
  • Timos Sellis (2)
  • Kewen Liao (2)

  1. School of Computer Science and IT, RMIT University, Melbourne, Australia
  2. Department of Computer Science and Software Engineering, Swinburne University, Hawthorn, Australia