Dynamic Incremental Data Summarization for Hierarchical Clustering

Liu, Bing; Shi, Yuliang; Wang, Zhihui; Wang, Wei; Shi, Baile

doi:10.1007/11775300_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4016)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

In many real-world applications, databases undergo frequent insertions and deletions, so the ability of a data mining technique to detect and react quickly to dynamic changes in the data distribution and clustering structure over time is highly desirable. Data summarizations (e.g., data bubbles) have been proposed to compress large databases into representative points suitable for subsequent hierarchical cluster analysis. In this paper, we thoroughly investigate the quality measure of incremental data bubbles, the data summarization index. We show which factors affect the mean and standard deviation of the data summarization index when the database is updated, and which do not. Based on these results, we propose a fully dynamic scheme to maintain data bubbles incrementally. An extensive experimental evaluation confirms our analysis and shows that fully dynamic incremental data bubbles effectively preserve the quality of the data summarization for hierarchical clustering.
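To make the abstract's terms concrete, the following is a minimal Python sketch of a data bubble that supports incremental insertions and deletions. It is not the scheme proposed in this paper; it assumes the data-bubble model as we understand it from earlier data-bubble work: a bubble is kept as sufficient statistics (point count n, linear sum LS, sum of squared norms SS), so each single-point insertion or deletion is an O(d) update; its representative is LS/n; its extent is the root-mean-square pairwise distance, sqrt((2·n·SS − 2·‖LS‖²) / (n(n−1))); and the data summarization index of a bubble is β = n/N, where N is the total number of points. The names `DataBubble`, `summarization_indices`, and `mean_std` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class DataBubble:
    """Hypothetical data bubble stored as sufficient statistics (n, LS, SS)."""
    dim: int
    n: int = 0
    ls: List[float] = field(default_factory=list)  # linear sum of member points
    ss: float = 0.0  # sum of squared Euclidean norms of member points

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim

    def insert(self, p: Sequence[float]) -> None:
        """Absorb one point: O(d) update of n, LS and SS."""
        self.n += 1
        for i, v in enumerate(p):
            self.ls[i] += v
        self.ss += sum(v * v for v in p)

    def delete(self, p: Sequence[float]) -> None:
        """Remove one point that was previously inserted into this bubble."""
        if self.n == 0:
            raise ValueError("cannot delete from an empty bubble")
        self.n -= 1
        for i, v in enumerate(p):
            self.ls[i] -= v
        self.ss -= sum(v * v for v in p)

    def rep(self) -> List[float]:
        """Representative point: the mean LS / n."""
        if self.n == 0:
            raise ValueError("an empty bubble has no representative")
        return [v / self.n for v in self.ls]

    def extent(self) -> float:
        """Root-mean-square distance over ordered pairs of member points."""
        if self.n < 2:
            return 0.0
        ls_sq = sum(v * v for v in self.ls)
        num = 2 * self.n * self.ss - 2 * ls_sq
        # Clamp tiny negative values caused by floating-point cancellation.
        return math.sqrt(max(num, 0.0) / (self.n * (self.n - 1)))


def summarization_indices(bubbles: Sequence[DataBubble]) -> List[float]:
    """Assumed data summarization index beta = n_B / N for each bubble."""
    total = sum(b.n for b in bubbles)
    return [b.n / total for b in bubbles]


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Population mean and standard deviation of a list of indices."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)


if __name__ == "__main__":
    a, b = DataBubble(dim=2), DataBubble(dim=2)
    for p in ([0.0, 0.0], [2.0, 0.0]):
        a.insert(p)
    for p in ([5.0, 5.0], [6.0, 5.0], [7.0, 5.0]):
        b.insert(p)
    b.delete([7.0, 5.0])  # a deletion touches only one bubble's statistics
    betas = summarization_indices([a, b])
    print(a.rep(), a.extent(), b.extent(), mean_std(betas))
```

Under this assumed definition the indices always sum to 1, so their mean over s bubbles is fixed at 1/s; only the standard deviation reflects how unevenly the N points are spread across bubbles as insertions and deletions accumulate.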

Author information

Authors and Affiliations

Department of Computing and Information Technology, Fudan University, Shanghai, China
Bing Liu, Yuliang Shi, Zhihui Wang, Wei Wang & Baile Shi

Editor information

Editors and Affiliations

Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Department of Computing, Hong Kong Polytechnic University, Hong Kong
Hong Va Leong

About this paper

Cite this paper

Liu, B., Shi, Y., Wang, Z., Wang, W., Shi, B. (2006). Dynamic Incremental Data Summarization for Hierarchical Clustering. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_35

DOI: https://doi.org/10.1007/11775300_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35225-9
Online ISBN: 978-3-540-35226-6
eBook Packages: Computer Science, Computer Science (R0)