Dynamic Incremental Data Summarization for Hierarchical Clustering

  • Bing Liu
  • Yuliang Shi
  • Zhihui Wang
  • Wei Wang
  • Baile Shi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4016)


In many real-world applications, databases undergo frequent insertions and deletions, so a data mining technique that can detect and react quickly to dynamic changes in the data distribution and the clustering over time is highly desirable. Data summarizations (e.g., data bubbles) have been proposed to compress large databases into representative points suitable for subsequent hierarchical cluster analysis. In this paper, we thoroughly investigate the quality measure (the data summarization index) of incremental data bubbles. We show which factors affect the mean and standard deviation of the data summarization index when a database is updated, and which do not. Based on these findings, we propose a fully dynamic scheme to maintain data bubbles incrementally. An extensive experimental evaluation confirms our findings and shows that the fully dynamic incremental data bubbles are effective in preserving the quality of the data summarization for hierarchical clustering.
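
As a rough illustration of what maintaining a summary under insertions and deletions can look like, the sketch below keeps the sufficient statistics (point count, linear sum, sum of squares) commonly used for data bubbles and updates them in place. It is not the scheme proposed in the paper. The abstract does not define the data summarization index, so the code assumes it is the fraction of all points held by a bubble; the names `Bubble` and `summarization_stats` are hypothetical.

```python
import math


class Bubble:
    """Sufficient statistics (n, LS, SS) for one group of points."""

    def __init__(self, dim):
        self.n = 0                 # number of points summarized
        self.ls = [0.0] * dim      # per-dimension linear sum
        self.ss = 0.0              # sum of squared norms

    def insert(self, p):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.ss += sum(x * x for x in p)

    def delete(self, p):
        # Assumes p was previously inserted into this bubble.
        self.n -= 1
        self.ls = [a - b for a, b in zip(self.ls, p)]
        self.ss -= sum(x * x for x in p)

    def representative(self):
        # Mean of the summarized points; requires n > 0.
        return [a / self.n for a in self.ls]


def summarization_stats(bubbles):
    """Return (indices, mean, std) of the per-bubble index.

    ASSUMPTION: the index of a bubble is n_i / N, its share of all points.
    """
    total = sum(b.n for b in bubbles)
    idx = [b.n / total for b in bubbles]
    mean = sum(idx) / len(idx)
    var = sum((v - mean) ** 2 for v in idx) / len(idx)
    return idx, mean, math.sqrt(var)


a, b = Bubble(2), Bubble(2)
a.insert([0.0, 0.0])
a.insert([2.0, 2.0])
b.insert([10.0, 10.0])
before = summarization_stats([a, b])
a.delete([2.0, 2.0])
after = summarization_stats([a, b])
```

Because each update touches only the statistics of one bubble, the mean and standard deviation of the index can be recomputed after every change without rescanning the database; a bubble whose index drifts far from the mean would be a candidate for rebuilding under this assumed measure.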


Keywords: Cluster Algorithm, Hierarchical Cluster, Cluster Structure, Hierarchical Cluster Algorithm, Incremental Data
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Bing Liu (1)
  • Yuliang Shi (1)
  • Zhihui Wang (1)
  • Wei Wang (1)
  • Baile Shi (1)
  1. Department of Computing and Information Technology, Fudan University, Shanghai, China
