Drifted Data Stream Clustering Based on ClusTree Algorithm

  • Jakub Zgraja
  • Michał Woźniak
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10870)


Correct recognition of possible changes in data streams, called concept drifts, plays a crucial role in constructing an appropriate model learning strategy. This paper focuses on unsupervised learning for non-stationary data streams and presents two significant modifications of the ClusTree algorithm. They allow the clustering model to adapt to the changes caused by concept drift. An experimental study conducted on a set of benchmark data streams demonstrates the usefulness of the proposed solutions.


Concept drift, Data streams, ClusTree, On-line clustering



This work was supported by the Statutory Fund of the Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Electronics, Department of Systems and Computer Networks, Wroclaw University of Science and Technology, Wrocław, Poland
