Abstract
Advances in data acquisition have made it possible to collect millions of time-varying records in the form of data streams. The challenge is to process the stream data effectively with limited resources while retaining enough historical information to characterise changes and patterns over time. This paper describes an evidence-based approach that uses representative points to process stream data incrementally, using a graph-based method to cluster points based on connectivity and density. Critical cluster features are archived in repositories so that the algorithm can cope with recurrent information and can provide a rich history of relevant cluster changes if analysis of past data is required. We demonstrate our work with both synthetic and real-world data sets.
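As a rough illustration of clustering representative points by connectivity and density, the sketch below links sufficiently dense representatives that lie within a fixed radius of each other and reports the connected components of the resulting graph as clusters. It is not the algorithm from the paper: the function name `cluster_exemplars`, the `(coords, weight)` representation, and the `radius` and `min_weight` parameters are assumptions made for this example only, and the incremental update and repository steps are left out.

```python
from collections import defaultdict
from math import dist


def cluster_exemplars(points, radius, min_weight):
    """Group representative points into clusters via graph connectivity.

    points: list of (coords, weight) pairs; weight is a hypothetical count
    of stream records absorbed by that representative point.
    Returns a list of clusters, each a sorted list of indices into points.
    """
    # Density filter: keep only representatives with enough absorbed records.
    dense = [i for i, (_, w) in enumerate(points) if w >= min_weight]

    # Connectivity: link every pair of dense representatives within radius.
    adj = defaultdict(set)
    for pos, a in enumerate(dense):
        for b in dense[pos + 1:]:
            if dist(points[a][0], points[b][0]) <= radius:
                adj[a].add(b)
                adj[b].add(a)

    # Each connected component of the graph is reported as one cluster.
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical data: two dense groups plus one sparse outlier.
example = [
    ((0.0, 0.0), 5),
    ((0.5, 0.0), 4),
    ((5.0, 5.0), 6),
    ((5.2, 5.1), 3),
    ((9.0, 9.0), 1),
]
result = cluster_exemplars(example, radius=1.0, min_weight=2)
```

The pairwise distance loop is quadratic in the number of dense representatives, which is acceptable for a small summary of the stream. A spatial index would be the usual replacement if the number of representatives grew large.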
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lühr, S., Lazarescu, M. (2008). Connectivity Based Stream Clustering Using Localised Density Exemplars. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_62
DOI: https://doi.org/10.1007/978-3-540-68125-0_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0