On Social Network-Based Algorithms for Data Stream Clustering

  • Jean Paul Barddal (email author)
  • Heitor Murilo Gomes
  • Fabrício Enembreck
Chapter
Part of the Studies in Big Data book series (SBD, volume 41)

Abstract

Extracting useful patterns from data is a challenging task that machine learning researchers and practitioners have investigated extensively for decades. The task becomes even harder when data arrives as a potentially unbounded sequence, a so-called data stream. Although most research on data stream mining focuses on supervised learning, the assumption that labels are available for learning cannot be verified in most streaming scenarios. Consequently, several data stream clustering algorithms have been proposed in recent decades to extract meaningful patterns from streams. In this study, we present three recent data stream clustering algorithms, based on insights from social network theory, that achieve results competitive with the state of the art. Two characteristics distinguish these algorithms: (1) they do not rely on a hyper-parameter to define the number of clusters to be found; and (2) they do not require batch processing during the offline steps. The algorithms are detailed and compared against existing work in the area, showing their efficiency in terms of clustering quality, processing time, and memory usage.

Keywords

Clustering Data Streams; Clustering Algorithm Parameters; Outlier Micro-clusters (OMCs); DenStream; Rewiring Procedure

Notes

Acknowledgements

This research was financially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) through the Programa de Suporte à Pós-Graduação de Instituições de Ensino Particulares (PROSUP) program and Fundação Araucária.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Jean Paul Barddal (1), email author
  • Heitor Murilo Gomes (2)
  • Fabrício Enembreck (1)

  1. Graduate Program in Informatics (PPGIa), Pontifícia Universidade Católica do Paraná, Curitiba, Brazil
  2. Institut Mines-Télécom, Department of Computer Science and Networks (INFRES), Université Paris-Saclay, Paris, France
