Abstract
Consider the problem of identifying dense subgroups of data points that exhibit strong correlations in a data stream. Such correlation-connected clusters are meaningful in many applications. However, the inherent sparsity of high-dimensional space means that correlations are local to specific subspaces; moreover, the correlation itself can follow an arbitrarily oriented direction, which blinds most traditional methods. We present ACID, a framework that effectively detects correlation-connected clusters in high-dimensional streams. It scales well with both the size of the stream and the dimensionality of the data, and it is robust against noise. Experiments on synthetic and real datasets demonstrate its effectiveness and efficiency.
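To make the notion of a correlation along an arbitrary direction concrete, the following minimal sketch may help. It is not the ACID framework, whose internals are not described here. It only illustrates, on hypothetical two-dimensional toy data lying near the line y = x, why axis-parallel statistics miss such a cluster: the variance along each coordinate axis is large, yet the smallest eigenvalue of the covariance matrix is tiny, which reveals a direction along which the points are tightly packed. All function names and parameters are assumptions introduced for illustration.

```python
import math
import random


def covariance_2d(points):
    """Return (var_x, var_y, cov_xy) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / n
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return var_x, var_y, cov_xy


def eigenvalues_2d(var_x, var_y, cov_xy):
    """Closed-form eigenvalues (largest first) of the symmetric matrix
    [[var_x, cov_xy], [cov_xy, var_y]]."""
    half_trace = (var_x + var_y) / 2.0
    spread = math.hypot((var_x - var_y) / 2.0, cov_xy)
    return half_trace + spread, half_trace - spread


def make_correlated_points(n, noise, seed=0):
    """Hypothetical toy data: points near the line y = x with Gaussian jitter."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        points.append((t + rng.gauss(0.0, noise), t + rng.gauss(0.0, noise)))
    return points


points = make_correlated_points(n=500, noise=0.02)
var_x, var_y, cov_xy = covariance_2d(points)
largest, smallest = eigenvalues_2d(var_x, var_y, cov_xy)

# Neither axis shows a tight cluster, but the rotated direction does.
print(f"variance along x: {var_x:.4f}, along y: {var_y:.4f}")
print(f"eigenvalues: {largest:.4f} (spread), {smallest:.6f} (correlation)")
```

In this toy setting, a method that inspects only axis-parallel projections sees two widely spread attributes, while the eigen-decomposition exposes the near-degenerate direction. In higher dimensions the same idea applies to the covariance matrix of a local neighborhood, at the cost of a full eigen-decomposition.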
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, T. (2006). Generalized Projected Clustering in High-Dimensional Data Streams. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_72
DOI: https://doi.org/10.1007/11610113_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science (R0)