Web Page Grouping Based on Parameterized Connectivity

  • Tomonari Masada
  • Atsuhiro Takasu
  • Jun Adachi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2973)


We propose a novel method for Web page grouping based only on hyperlink information. Given the explosive growth of the Web, page grouping is expected to provide a general grasp of the Web that supports effective Web search and net surfing. The Web can be regarded as a gigantic digraph in which pages are vertices and links are arcs. Our method generalizes the decomposition into strongly connected components: each group is constructed as a subset of a strongly connected component, and group sizes can be controlled by a parameter, called the threshold parameter. We call the resulting groups parameterized connected components. The algorithm is simple and admits parallelization. Notably, our method applies Dijkstra’s shortest path algorithm.
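The abstract does not spell out the grouping procedure, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm. We assume arcs carry non-negative weights (unit weights for plain hyperlinks). A hypothetical function `threshold_groups` greedily picks an ungrouped center vertex c, runs Dijkstra's algorithm on the graph and on its reverse, and puts every ungrouped vertex v with round-trip distance d(c, v) + d(v, c) ≤ threshold into c's group. Any such v both reaches c and is reached from c, so every group lies inside a single strongly connected component. With an infinite threshold the groups coincide with the strongly connected components; smaller thresholds split them into smaller groups.

```python
import heapq
from math import inf

def dijkstra(adj, source):
    """Shortest-path distances from source.
    adj maps a vertex to a list of (neighbor, non-negative weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse(adj):
    """Reverse every arc, so distances from c in the result are distances to c."""
    rev = {}
    for u, edges in adj.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev

def threshold_groups(adj, threshold):
    """Hypothetical threshold grouping (illustration only, not the paper's method)."""
    vertices = set(adj)
    for edges in adj.values():
        vertices.update(v for v, _ in edges)
    rev = reverse(adj)
    ungrouped = set(vertices)
    groups = []
    for c in sorted(vertices):  # deterministic choice of center vertices
        if c not in ungrouped:
            continue
        fwd = dijkstra(adj, c)  # fwd[v] = d(c, v)
        bwd = dijkstra(rev, c)  # bwd[v] = d(v, c) in the original graph
        # v must reach c and be reachable from c, so the group stays inside c's SCC
        group = {v for v in ungrouped
                 if v in fwd and v in bwd and fwd[v] + bwd[v] <= threshold}
        ungrouped -= group
        groups.append(group)
    return groups

if __name__ == "__main__":
    # Two strongly connected components {a, b} and {c, d}, joined by the arc b -> c.
    g = {"a": [("b", 1)], "b": [("a", 1), ("c", 1)],
         "c": [("d", 1)], "d": [("c", 1)]}
    print(threshold_groups(g, inf))  # same partition as the SCCs
    print(threshold_groups(g, 1))    # every vertex ends up in its own group
```

Each center's two Dijkstra runs depend only on the graph, which hints at why a scheme of this kind can be parallelized; the greedy center order used here is an arbitrary choice made for the sketch.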







Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Tomonari Masada (1)
  • Atsuhiro Takasu (2)
  • Jun Adachi (2)

  1. Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
  2. The National Institute of Informatics, Chiyoda-ku, Tokyo, Japan