Abstract
We are now in the age of big data: huge amounts of data and information are generated continuously. Traditional data stream algorithms are suited to data streams with low dimensionality and simple structure. However, with the development of information technology, the data streams being produced are becoming increasingly complex. It is particularly important to study how to find new associations and patterns in complex data, so as to approach the cognitive and judgment abilities of the human brain. Clustering irregularly distributed data streams with mixed attributes is a major challenge in data mining. To address this problem, we present an adaptive density data stream clustering algorithm, ADStream. ADStream is based on the online-offline clustering framework. It automatically recognizes the initial clusters by passing messages between data points. A novel time-decay density clustering strategy is then designed to group and update the continuously arriving data streams. Comprehensive experimental results demonstrate that ADStream adapts to evolving data streams and can generate high-quality clusters at a fast processing rate.
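The abstract does not specify the time-decay density function, so the following is only an illustrative sketch of how a time-decay density update in an online phase might look. It assumes an exponential fading function (the weight halves every 1/lam time units), one-dimensional numeric points, and hypothetical names (`MicroCluster`, `update`, `radius`, `lam`, `min_weight`); none of these are taken from ADStream itself.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical summary of a dense region; not ADStream's actual structure."""
    center: float            # 1-D center, for brevity
    weight: float = 0.0      # time-decayed count of absorbed points
    last_update: float = 0.0

    def decay(self, now: float, lam: float) -> None:
        # Assumed fading function: weight halves every 1/lam time units.
        self.weight *= 2.0 ** (-lam * (now - self.last_update))
        self.last_update = now

    def absorb(self, x: float, now: float, lam: float) -> None:
        self.decay(now, lam)
        # Weighted mean of the decayed center and the new point (weight 1).
        self.center = (self.center * self.weight + x) / (self.weight + 1.0)
        self.weight += 1.0


def update(clusters, x, now, radius=1.0, lam=0.5, min_weight=0.1):
    """Online step: decay all clusters, drop faded ones, then place point x."""
    for c in clusters:
        c.decay(now, lam)
    live = [c for c in clusters if c.weight >= min_weight]
    nearest = min(live, key=lambda c: abs(c.center - x), default=None)
    if nearest is not None and abs(nearest.center - x) <= radius:
        nearest.absorb(x, now, lam)
    else:
        live.append(MicroCluster(center=x, weight=1.0, last_update=now))
    return live
```

Calling `update` once per arriving point keeps recent regions heavy and lets stale ones fade below `min_weight`, which is one simple way a clustering can follow an evolving stream. An offline phase would then group the surviving summaries into final clusters.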
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61379101) and the National Key Basic Research Program of China (No. 2013CB329502).
Cite this article
Ding, S., Zhang, J., Jia, H. et al. An Adaptive Density Data Stream Clustering Algorithm. Cogn Comput 8, 30–38 (2016). https://doi.org/10.1007/s12559-015-9342-z
DOI: https://doi.org/10.1007/s12559-015-9342-z