Discovery of Web Communities from Positive and Negative Examples
Several approaches to Web structure mining have been proposed. Their goal is to discover Web communities or to rank important pages based on the graph structure of hyperlinks. Discovering Web communities, which are groups of related Web pages that share common interests, is important for helping users retrieve information from the Web. Web communities exist at several different granularities and overlap one another, which makes it difficult to identify objective community boundaries. This paper proposes a method for discovering Web communities from given positive and negative examples. Because the boundary of a Web community is hard to define from positive examples alone, negative examples are used to limit the boundary from outside the community. Experimental results are presented, and the effectiveness of the proposed method is discussed.
Keywords: Search Engine, Bipartite Graph, Graph Structure, Edge Betweenness, Topic Drift
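The abstract does not spell out how negative examples limit the community boundary. The sketch below is one plausible reading and is an assumption, not the paper's confirmed procedure. It uses the edge-betweenness method of Girvan and Newman, which the "Edge Betweenness" keyword suggests: repeatedly delete the hyperlink with the highest betweenness until no positive example remains connected to a negative example, then return every page still connected to a positive example. Collecting the hyperlink graph (for example, through a search engine) is omitted. The function names `edge_betweenness`, `component_of`, and `separate_community` are hypothetical.

```python
from __future__ import annotations

from collections import deque


def edge_betweenness(graph: dict[str, set[str]]) -> dict[frozenset, float]:
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    scores = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair of pages was counted once from each endpoint.
    return {edge: b / 2 for edge, b in scores.items()}


def component_of(graph: dict[str, set[str]], start: str) -> set[str]:
    """All pages reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        for w in graph[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def separate_community(edges, positives, negatives) -> set[str]:
    """Cut high-betweenness links until positives and negatives are disconnected."""
    if set(positives) & set(negatives):
        raise ValueError("a page cannot be both a positive and a negative example")
    graph: dict[str, set[str]] = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    for page in [*positives, *negatives]:
        graph.setdefault(page, set())

    def reachable_from_positives() -> set[str]:
        return set().union(*(component_of(graph, p) for p in positives))

    # While some negative is reachable, a connecting edge exists to remove.
    while not reachable_from_positives().isdisjoint(negatives):
        scores = edge_betweenness(graph)  # recomputed after every removal
        u, v = tuple(max(scores, key=scores.get))
        graph[u].discard(v)
        graph[v].discard(u)
    return reachable_from_positives()


if __name__ == "__main__":
    # Two triangles of pages joined by a single bridging link c-d.
    links = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    print(sorted(separate_community(links, {"a"}, {"f"})))  # prints ['a', 'b', 'c']
```

In the usage example the bridge c-d lies on all nine shortest paths between the two triangles, so it is the first link removed. Once it is gone, the positive page `a` and the negative page `f` fall into different components. Recomputing betweenness after each removal follows Girvan and Newman. Stopping as soon as the examples are separated, instead of choosing a split level by hand, is how the negative examples set the boundary in this sketch.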