Abstract
Stream data applications have become increasingly prominent, and the requirements placed on stream clustering algorithms have grown accordingly. Because streams evolve continuously, it is crucial that an algorithm autonomously detect clusters of arbitrary shape and differing densities, and cope with a varying number of clusters. Although existing density-based stream clustering algorithms can detect clusters of arbitrary shape and varying number, they fail to adapt their thresholds to detect clusters of different densities. In this paper we propose a stream clustering algorithm called HASTREAM, which is based on a hierarchical density-based clustering model that automatically detects clusters of different densities. The density thresholds are independently adapted to the existing data without any user intervention. To reduce the high computational cost of the presented approach, techniques from graph theory are used to devise an incremental update of the underlying model. To show the effectiveness of HASTREAM and of hierarchical density-based approaches in general, several synthetic and real-world data sets are evaluated using various quality measures. The results show that the hierarchical property of the model improves the quality of density-based stream clusterings and enables HASTREAM to detect streaming clusters of different densities.
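The abstract does not say which graph-theoretic technique drives the incremental update, so the following is only an illustrative sketch under stated assumptions, not HASTREAM itself. It assumes the underlying model is a minimum spanning tree (MST) over the summarized points and uses a standard graph-theory fact: when a vertex is added, the new MST can be built from the old MST edges plus the edges incident to the new vertex, so the full pairwise graph need not be revisited. The function names `kruskal` and `insert_point` are hypothetical, and plain Euclidean distance stands in for whatever density-based distance the model actually uses.

```python
import math


def kruskal(num_nodes, edges):
    """Return the MST edges of a graph given as (weight, u, v) tuples."""
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            mst.append((w, u, v))
    return mst


def insert_point(points, mst, new_point):
    """Append new_point to points and return the updated MST.

    Assumption: edge weights are plain Euclidean distances, so weights
    between existing points do not change when a point arrives.
    Only the n - 1 old MST edges and the n edges to the new point are
    considered, instead of all n * (n + 1) / 2 pairs.
    """
    new_id = len(points)
    candidates = list(mst) + [
        (math.dist(p, new_point), i, new_id) for i, p in enumerate(points)
    ]
    points.append(new_point)
    return kruskal(len(points), candidates)


# Usage: feed points one at a time, as a stream would.
stream = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0), (2.0, 0.0)]
points, mst = [], []
for p in stream:
    mst = insert_point(points, mst, p)
```

A density-based distance that depends on each point's neighbourhood would also change existing edge weights when a point arrives, so a real incremental scheme has to handle that case as well; this sketch deliberately leaves it out.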
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hassani, M., Spaus, P., Seidl, T. (2014). Adaptive Multiple-Resolution Stream Clustering. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2014. Lecture Notes in Computer Science, vol 8556. Springer, Cham. https://doi.org/10.1007/978-3-319-08979-9_11
DOI: https://doi.org/10.1007/978-3-319-08979-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08978-2
Online ISBN: 978-3-319-08979-9
eBook Packages: Computer Science, Computer Science (R0)