A Data Stream Clustering Algorithm Based on Density and Extended Grid

  • Zheng Hua
  • Tao Du
  • Shouning Qu
  • Guodong Mou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10362)

Abstract

Building on the traditional grid-density clustering algorithm, this paper proposes A Data Stream Clustering Algorithm Based on Density and Extended Grid (DEGDS). The algorithm combines the advantages of grid-based and density-based clustering, overcomes the drawback of clustering parameters that must be set manually, and finds clusters of arbitrary shape. It uses the local density of each sample point and its distance from other sample points to determine the number of cluster centers in a grid, so that cluster centers are determined automatically; this avoids the influence that an improper choice of initial centroids has on the clustering result. The data are partitioned with the Spark parallel framework to parallelize the algorithm. For data points that fall outside the grid being clustered, the clustering within the grid is effectively expanded by extending the grid, which ensures clustering accuracy. Grids are merged using density estimation together with connectivity at grid boundaries, which saves memory. An attenuation factor is used to update grid densities incrementally, reflecting the evolution of the spatial data stream. Experimental results show that, compared with traditional clustering algorithms, DEGDS substantially improves accuracy and efficiency and can be applied effectively to large-scale data clustering.
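The abstract names several mechanisms without giving their formulas. The following is a minimal sketch of three of them, written under stated assumptions rather than taken from the paper. (1) Decayed grid density, assuming a D-Stream-style update in which a cell's stored density is multiplied by λ^(t − t_last) and then incremented by 1 when a point arrives. (2) Grid merging, assuming that dense cells sharing a face are connected. (3) Automatic center selection, assuming the density-peak formulation: ρ is the count of neighbours within a cutoff d_c, δ is the distance to the nearest denser point, and candidate centers are ranked by ρ·δ. All names (`DecayedGrid`, `merge_dense_cells`, `density_peak_centers`) and parameters (cell width, decay λ, density threshold, d_c) are hypothetical. The paper determines the number of centers automatically by a rule the abstract does not state, so this sketch takes `n_centers` as an input. Spark partitioning is not shown.

```python
import math
from collections import deque


def cell_of(point, width):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(math.floor(x / width) for x in point)


class DecayedGrid:
    """Grid cell densities updated incrementally with an attenuation factor.

    Each cell stores (density, time of last update). When a point arrives at
    time t, the stored density is first decayed by decay ** (t - t_last) and
    then incremented by 1, so older points contribute less and less.
    """

    def __init__(self, decay, width):
        self.decay = decay  # hypothetical lambda in (0, 1]
        self.width = width  # hypothetical cell side length
        self.cells = {}

    def insert(self, point, t):
        key = cell_of(point, self.width)
        density, t_last = self.cells.get(key, (0.0, t))
        self.cells[key] = (density * self.decay ** (t - t_last) + 1.0, t)
        return key

    def density(self, key, t):
        """Decayed density of a cell as seen at time t (nothing is stored)."""
        density, t_last = self.cells.get(key, (0.0, t))
        return density * self.decay ** (t - t_last)


def merge_dense_cells(grid, t, threshold):
    """Group dense cells that share a face into clusters (breadth-first search)."""
    dense = {k for k in grid.cells if grid.density(k, t) >= threshold}
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            component.append(cell)
            # Face neighbours differ by +/-1 in exactly one coordinate.
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(sorted(component))
    return clusters


def density_peak_centers(points, d_c, n_centers):
    """Return indices of the n_centers points with the largest rho * delta.

    rho[i]   = number of other points closer than the cutoff d_c
    delta[i] = distance to the nearest point ranked denser than i
               (the densest point gets its largest distance to any point)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Rank by density; ties are broken by index so delta is always defined.
    ranked = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for pos, i in enumerate(ranked):
        if pos == 0:
            delta[i] = max(dist[i])
        else:
            delta[i] = min(dist[i][j] for j in ranked[:pos])
    gamma = [rho[i] * delta[i] for i in range(n)]
    return sorted(range(n), key=lambda i: gamma[i], reverse=True)[:n_centers]
```

For example, with two well-separated plus-shaped blobs of five points each (centers at (0, 0) and (20, 20)) and d_c = 1.5, `density_peak_centers` returns the indices of the two blob centers. Each center has a high ρ and a large δ, while the surrounding points have a δ of at most 1. Computing densities lazily in `density` keeps each arrival O(1) and avoids touching cells that receive no new points.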

Keywords

Density clustering, Grid clustering, Data stream, Spark parallel

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Zheng Hua (1, 2)
  • Tao Du (1, 2)
  • Shouning Qu (1, 2)
  • Guodong Mou (1, 2)
  1. School of Information Science and Engineering, University of Jinan, Jinan, China
  2. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China