Stream Clustering of Chat Messages with Applications to Twitch Streams

  • Matthias Carnein
  • Dennis Assenmacher
  • Heike Trautmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10651)


This paper proposes a new stream clustering algorithm for text streams. The algorithm combines concepts from stream clustering and text analysis to incrementally maintain a number of text droplets that represent topics within the stream. It adapts to changes of topic over time and handles noise and outliers gracefully by decaying the importance of irrelevant clusters. We demonstrate the performance of our approach using more than one million real-world texts from the video streaming platform Twitch.
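The general idea of incrementally maintained topic summaries whose importance decays over time can be illustrated with a short sketch. This is not the algorithm from the paper: the class names, the whitespace tokenisation, the cosine similarity measure, the base-2 exponential decay and all parameter values are assumptions chosen only for illustration.

```python
import math
from collections import Counter


class FadingTextCluster:
    """Hypothetical topic summary: decayed term counts plus a decayed weight."""

    def __init__(self, tokens, now):
        self.terms = Counter(tokens)
        self.weight = 1.0
        self.last_update = now

    def fade(self, now, decay_rate):
        # Assumed fading function: 2 ** (-decay_rate * elapsed_time).
        factor = 2.0 ** (-decay_rate * (now - self.last_update))
        self.weight *= factor
        for term in self.terms:
            self.terms[term] *= factor
        self.last_update = now

    def absorb(self, tokens):
        # Merge a new message into this summary.
        self.terms.update(tokens)
        self.weight += 1.0


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FadingTextClusterer:
    """Illustrative stream clusterer; all thresholds are assumed values."""

    def __init__(self, decay_rate=0.01, min_similarity=0.3, min_weight=0.1):
        self.decay_rate = decay_rate
        self.min_similarity = min_similarity
        self.min_weight = min_weight
        self.clusters = []

    def insert(self, text, now):
        tokens = text.lower().split()
        if not tokens:
            return None
        # Decay every summary, then drop those that became irrelevant.
        for cluster in self.clusters:
            cluster.fade(now, self.decay_rate)
        self.clusters = [c for c in self.clusters if c.weight >= self.min_weight]
        # Assign the message to the most similar summary or open a new one.
        message = Counter(tokens)
        best = max(self.clusters, key=lambda c: cosine(message, c.terms), default=None)
        if best is not None and cosine(message, best.terms) >= self.min_similarity:
            best.absorb(tokens)
        else:
            best = FadingTextCluster(tokens, now)
            self.clusters.append(best)
        return best
```

In this sketch a message either reinforces the most similar existing summary or starts a new one, and summaries that receive no new messages lose weight until they are discarded, which is one simple way to forget noise and outdated topics.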


Data stream · Stream clustering · Text analysis · Text clustering



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Matthias Carnein (1)
  • Dennis Assenmacher (1)
  • Heike Trautmann (1)
  1. University of Münster, Münster, Germany
