Data Mining and Knowledge Discovery, Volume 26, Issue 1, pp 1–26

A single pass algorithm for clustering evolving data streams based on swarm intelligence

  • Agostino Forestiero
  • Clara Pizzuti
  • Giandomenico Spezzano

Abstract

Existing density-based data stream clustering algorithms use a two-phase scheme consisting of an online phase, in which raw data is processed to gather summary statistics, and an offline phase, which generates the clusters from the summary data. In this article we propose a data stream clustering method based on a multi-agent system that uses a decentralized, bottom-up, self-organizing strategy to group similar data points. Data points are associated with agents and deployed onto a 2D space, where the agents work simultaneously by applying a heuristic strategy based on a bio-inspired model known as the flocking model. Agents move in the space for a fixed time and, when they encounter other agents within a predefined visibility range, they can decide to form a flock if they are similar. Flocks can join to form swarms of similar groups. Because a swarm represents a cluster, this strategy merges the two phases of density-based approaches and thus avoids the computationally demanding offline cluster computation. Experimental results show that the bio-inspired approach can obtain very good results on real and synthetic data sets.
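The flock-forming idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it replaces the flocking model's steering rules with a plain random walk, processes a fixed batch rather than a stream, and every name and parameter (flock_cluster, eps, visibility, size, steps) is a hypothetical choice made for illustration. It only shows how agents that meet within a visibility range and carry similar data points could merge into flocks, so that flocks act as clusters without a separate offline phase.

```python
import math
import random


def flock_cluster(points, eps=1.0, visibility=5.0, size=20.0, steps=200, seed=0):
    """Toy flocking-style clustering; returns one flock label per data point.

    Assumed parameters (not from the paper): eps is the similarity threshold
    in feature space, visibility is the sensing radius in the 2D agent space.
    """
    rng = random.Random(seed)
    n = len(points)
    # Each agent carries one data point and starts at a random 2D position.
    pos = [[rng.uniform(0, size), rng.uniform(0, size)] for _ in range(n)]
    parent = list(range(n))  # union-find forest: every agent is its own flock

    def find(i):
        # Follow parent links to the flock representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for _ in range(steps):
        # Random walk with wrap-around movement, a stand-in for steering rules.
        for p in pos:
            p[0] = (p[0] + rng.uniform(-1, 1)) % size
            p[1] = (p[1] + rng.uniform(-1, 1)) % size
        # Agents that see each other and carry similar data merge their flocks.
        for i in range(n):
            for j in range(i + 1, n):
                close = math.dist(pos[i], pos[j]) <= visibility
                similar = math.dist(points[i], points[j]) <= eps
                if close and similar:
                    parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


if __name__ == "__main__":
    data = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
    print(flock_cluster(data))
```

With a small visibility radius, which agents merge depends on which ones happen to meet during the fixed number of steps, so the result is approximate; points farther apart than eps in feature space are never merged directly.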

Keywords

  • Data streams
  • Density-based clustering
  • Bio-inspired flocking model

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Agostino Forestiero (1)
  • Clara Pizzuti (1)
  • Giandomenico Spezzano (1)
  1. National Research Council of Italy (CNR), Rende (CS), Italy
