Discovery of Web Communities from Positive and Negative Examples

  • Tsuyoshi Murata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2843)


Several attempts have been made at Web structure mining, whose goals are to discover Web communities or to rank important pages based on the graph structure of hyperlinks. Discovery of Web communities, which are groups of related Web pages sharing common interests, is important for assisting users' information retrieval from the Web. Web communities overlap and exist at several different granularities, which makes it difficult to identify their objective boundaries. This paper proposes a method for discovering Web communities from given positive and negative examples. Since the boundary of a Web community is hard to define from positive examples alone, negative examples are used to limit the boundary from the outside of the Web community. Experimental results are shown and the effectiveness of our new method is discussed.


Search Engine; Bipartite Graph; Graph Structure; Edge Betweenness; Topic Drift
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Tsuyoshi Murata (1, 2)
  1. National Institute of Informatics, Tokyo, Japan
  2. Japan Science and Technology Corporation, Tokyo, Japan
