Web Communities Identification from Random Walks

  • Jiayuan Huang
  • Tingshao Zhu
  • Dale Schuurmans
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)


We propose a technique for identifying latent Web communities based solely on the hyperlink structure of the WWW, via random walks. Although the topology of the Directed Web Graph encodes important information about the content of individual Web pages, it also reveals useful meta-level information about user communities. Random walk models are capable of propagating local link information throughout the Web Graph, which can be used to reveal hidden global relationships between different regions of the graph. Variations of these random walk models are shown to be effective at identifying latent Web communities and revealing link topology. To efficiently extract these communities from the stationary distribution defined by a random walk, we exploit a computationally efficient form of directed spectral clustering. The performance of our approach is evaluated in real Web applications, where the method is shown to effectively identify latent Web communities based on link topology only.
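The abstract names the ingredients (a random walk on the directed Web graph, its stationary distribution, and a spectral split) without giving the construction. As a minimal sketch only, assuming one commonly used formulation of spectral clustering on directed graphs and not necessarily the authors' exact procedure, the Python below builds a teleporting random walk P, finds its stationary distribution π by power iteration, forms the symmetrised operator Θ = (Π^{1/2} P Π^{-1/2} + Π^{-1/2} Pᵀ Π^{1/2}) / 2, and splits pages by the sign of the eigenvector for the second-largest eigenvalue of Θ. The function names, the teleport parameter `eta = 0.85`, the uniform jump from dangling pages, and the toy eight-page graph are all assumptions made for illustration.

```python
import math
import random


def teleporting_walk(adj, eta=0.85):
    """Row-stochastic transition matrix of a teleporting random walk."""
    n = len(adj)
    P = []
    for row in adj:
        out_degree = sum(row)
        if out_degree == 0:
            # Dangling page: jump uniformly (an assumption of this sketch).
            P.append([1.0 / n] * n)
        else:
            P.append([eta * a / out_degree + (1.0 - eta) / n for a in row])
    return P


def stationary_distribution(P, tol=1e-13, max_iter=10000):
    """Approximate pi with pi = pi P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(nxt, pi)) < tol
        pi = nxt
        if converged:
            break
    total = sum(pi)
    return [p / total for p in pi]


def two_communities(adj, eta=0.85, iters=500, seed=0):
    """Split pages in two by the sign of the second eigenvector of theta."""
    n = len(adj)
    P = teleporting_walk(adj, eta)
    pi = stationary_distribution(P)
    s = [math.sqrt(p) for p in pi]  # unit-norm leading eigenvector of theta
    theta = [[0.5 * (s[i] * P[i][j] / s[j] + s[j] * P[j][i] / s[i])
              for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Deflation: remove the component along the leading eigenvector.
        dot = sum(a * b for a, b in zip(v, s))
        v = [a - dot * b for a, b in zip(v, s)]
        # Apply (theta + I) / 2, which shifts the spectrum into [0, 1]
        # while keeping the ordering of the eigenvalues.
        v = [0.5 * (v[i] + sum(theta[i][j] * v[j] for j in range(n)))
             for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    side_a = {i for i in range(n) if v[i] >= 0.0}
    side_b = set(range(n)) - side_a
    return side_a, side_b


def demo_graph():
    """Toy graph: two fully linked groups of four pages, joined by two links."""
    n = 8
    adj = [[0] * n for _ in range(n)]
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i][j] = 1
    adj[3][4] = 1  # single link from the first group to the second
    adj[7][0] = 1  # single link back
    return adj


if __name__ == "__main__":
    print(two_communities(demo_graph()))
```

On the toy graph the two sides of the split coincide with the two densely linked groups; which group is returned first depends on the arbitrary sign of the eigenvector. Splitting into more than two communities would need further eigenvectors or recursive bisection, which this sketch does not attempt.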


Random Walk; Singular Value Decomposition; Directed Graph; Spectral Cluster; Random Walk Model
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiayuan Huang (1, 2)
  • Tingshao Zhu (2)
  • Dale Schuurmans (2)
  1. University of Waterloo, Waterloo, Canada
  2. University of Alberta, Edmonton, Canada
