Dynamic Agglomerative-Divisive Clustering of Clickthrough Data for Collaborative Web Search

  • Kenneth Wai-Ting Leung
  • Dik Lun Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5981)


In this paper, we model clickthroughs as a tripartite graph involving users, queries and concepts embodied in the clicked pages. We develop the Dynamic Agglomerative-Divisive Clustering (DADC) algorithm for clustering the tripartite clickthrough graph to identify groups of similar users, queries and concepts to support collaborative web search. Since the clickthrough graph is updated frequently, DADC clusters the graph incrementally, whereas most traditional agglomerative methods recluster the whole graph from scratch. Moreover, clickthroughs are usually noisy and reflect the diverse interests of users, so traditional agglomerative clustering methods tend to generate large clusters when the clickthrough graph is large. DADC avoids generating large clusters by interleaving two phases: an agglomerative phase and a divisive phase. The agglomerative phase iteratively merges similar clusters to avoid generating sparse clusters. The divisive phase, on the other hand, iteratively splits large clusters into smaller ones to maintain cluster coherence, and restructures the existing clusters so that DADC can incrementally update the affected clusters as new clickthrough data arrives.
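The abstract describes DADC only at a high level. The paper's similarity measures, thresholds, and its joint clustering of users, queries and concepts are not given here. The Python sketch below is therefore only a rough illustration of the general idea under stated assumptions. It builds a tripartite clickthrough graph from hypothetical `(user, query, concept)` click triples and clusters only the query nodes. An agglomerative pass merges clusters and a divisive pass splits oversized ones, and the two passes are interleaved. An incremental update re-clusters only the clusters that share a user or concept with new clicks. The following are all assumptions and not the authors' method: Jaccard similarity over neighbour sets, the parameters `merge_threshold` and `max_size`, the farthest-pair split rule, and every function name (for example `dadc_like`).

```python
# Illustrative sketch only: NOT the DADC algorithm from the paper.
# Assumptions: only query nodes are clustered, similarity is Jaccard over each
# query's user/concept neighbours, and merge_threshold, max_size and the
# farthest-pair split rule are hypothetical choices.
from collections import defaultdict
from itertools import combinations


def add_click(neighbors, user, query, concept):
    # Hypothetical encoding of the tripartite graph: each query node keeps the
    # set of user nodes and concept nodes it is linked to.
    neighbors[query].add(("user", user))
    neighbors[query].add(("concept", concept))


def build_tripartite(clicks):
    neighbors = defaultdict(set)
    for user, query, concept in clicks:
        add_click(neighbors, user, query, concept)
    return neighbors


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_features(cluster, neighbors):
    # A cluster is described by the union of its members' neighbours.
    return set().union(*(neighbors[q] for q in cluster))


def agglomerate(clusters, neighbors, merge_threshold):
    """Merge the most similar pair of clusters until no pair reaches the threshold."""
    clusters = [set(c) for c in clusters]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            s = jaccard(cluster_features(clusters[i], neighbors),
                        cluster_features(clusters[j], neighbors))
            if s >= merge_threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves index i valid
        del clusters[j]


def divide(clusters, neighbors, max_size):
    """Split clusters larger than max_size (must be >= 1) around their two least similar members."""
    result, stack = [], [set(c) for c in clusters]
    while stack:
        c = stack.pop()
        if len(c) <= max_size:
            result.append(c)
            continue
        members = sorted(c)
        a, b = min(combinations(members, 2),
                   key=lambda p: jaccard(neighbors[p[0]], neighbors[p[1]]))
        left, right = {a}, {b}
        for q in members:
            if q not in (a, b):
                closer_to_a = (jaccard(neighbors[q], neighbors[a])
                               >= jaccard(neighbors[q], neighbors[b]))
                (left if closer_to_a else right).add(q)
        stack.extend([left, right])  # both halves are non-empty and strictly smaller
    return result


def dadc_like(clusters, neighbors, merge_threshold=0.5, max_size=3, rounds=3):
    # Interleave the two phases; ending on divide keeps every cluster <= max_size.
    for _ in range(rounds):
        clusters = divide(agglomerate(clusters, neighbors, merge_threshold),
                          neighbors, max_size)
    return clusters


def incremental_update(clusters, neighbors, new_clicks, merge_threshold=0.5, max_size=3):
    """Re-cluster only clusters that share a user or concept with the new clicks."""
    touched = set()
    for user, query, concept in new_clicks:
        add_click(neighbors, user, query, concept)
        touched.add(query)
    touched_feats = set().union(*(neighbors[q] for q in touched))
    affected = [c for c in clusters if cluster_features(c, neighbors) & touched_feats]
    untouched = [c for c in clusters if not cluster_features(c, neighbors) & touched_feats]
    fresh = [{q} for q in sorted(touched - set().union(*clusters))]
    return untouched + dadc_like(affected + fresh, neighbors, merge_threshold, max_size)
```

A typical use is to start from singleton clusters, `dadc_like([{q} for q in sorted(g)], g)`, and then call `incremental_update` on each new batch of clicks. Clusters unrelated to the batch are returned unchanged rather than recomputed. This mirrors, in a simplified form, the abstract's contrast with methods that recluster the whole graph.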






Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Kenneth Wai-Ting Leung (1)
  • Dik Lun Lee (1)
  1. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
