Extracting Research Communities by Improved Maximum Flow Algorithm

  • Toshihiko Horiike
  • Youhei Takahashi
  • Tetsuji Kuboyama
  • Hiroshi Sakamoto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5712)

Abstract

In this paper we propose an algorithm for extracting research communities from bibliography data; it improves on the web community identification method of [1]. The web graph is a huge graph structure consisting of nodes and edges, which represent web pages and hyperlinks, respectively. A web community is considered to be a set of web pages sharing a common topic; in other words, it is a dense subgraph of the web graph. Such subgraphs obtained by the max-flow algorithm [1] are called max-flow communities. We improve this algorithm by introducing a strategy for selecting community nodes. The effectiveness of our improvement is shown by experiments on finding research communities in CiteSeer bibliography data.
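To make the notion of a max-flow community concrete, the following is a minimal Python sketch of the baseline idea from [1], not the improved algorithm proposed in this paper. It rests on several assumptions that the abstract does not state: every edge is treated as undirected with capacity equal to the number of seed nodes, a virtual source is connected to each seed with unbounded capacity, every non-seed node is connected to a virtual sink with capacity 1, and the community is taken to be the set of nodes still reachable from the source in the residual graph after a maximum flow has been found. The function name `max_flow_community`, the virtual node labels, and the toy graph are all hypothetical.

```python
from collections import defaultdict, deque


def max_flow_community(edges, seeds, edge_capacity=None):
    """Return the nodes on the source side of a minimum cut.

    Assumed construction (a sketch, not the paper's exact method):
    graph edges get capacity k in both directions, a virtual source
    feeds each seed with unbounded capacity, and every non-seed node
    drains into a virtual sink with capacity 1.
    """
    seeds = set(seeds)
    k = edge_capacity if edge_capacity is not None else len(seeds)
    source, sink = "_source", "_sink"  # hypothetical virtual node labels
    inf = float("inf")

    # Residual capacities: cap[u][v] is what can still be pushed from u to v.
    cap = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for u, v in edges:
        cap[u][v] += k
        cap[v][u] += k
        nodes.update((u, v))
    for s in seeds:
        cap[source][s] = inf
    for n in nodes - seeds:
        cap[n][sink] = 1

    def reachable_from_source():
        """Breadth-first search over edges with positive residual capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp style loop: augment along shortest paths until none is left.
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        # Find the bottleneck capacity along the path found by the search.
        flow, v = inf, sink
        while parent[v] is not None:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        # Push the flow and update the residual capacities in both directions.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Nodes still reachable from the source form the community.
    return set(parent) - {source}


# Toy graph: two 4-node cliques joined by the single edge d-w.
clique_a = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
clique_b = [("w", "x"), ("w", "y"), ("w", "z"), ("x", "y"), ("x", "z"), ("y", "z")]
edges = clique_a + clique_b + [("d", "w")]
community = max_flow_community(edges, seeds=["a", "b"])
print(sorted(community))
```

In this toy graph the cheapest cut severs the bridge between the two cliques, so the seeds `a` and `b` pull in `c` and `d` but none of the second clique. How seeds and further community nodes are selected is exactly where the paper's improvement applies, and that step is not modeled here.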


References

  1. Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of web communities. In: KDD 2000, pp. 150–160 (2000)
  2. Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.: Self-organization and identification of web communities. IEEE Computer 35(3), 66–71 (2002)
  3. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the Web for Emerging Cyber-Communities. Computer Networks 31(11-16), 1481–1493 (1999)
  4. Chakrabarti, S., Dom, B., Raghavan, P., Rajagopalan, S., Gibson, D., Kleinberg, J.M.: Automatic resource compilation by analyzing hyperlink structure and associated text. Computer Networks 30(1-7), 65–74 (1998)
  5. Gibson, D., Kleinberg, J.M., Raghavan, P.: Inferring web communities from link topology. In: Hypertext 1998, pp. 225–234 (1998)
  6. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting Large-Scale Knowledge Bases from the Web. In: VLDB 1999, pp. 639–650 (1999)
  7. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. In: SODA 1998, pp. 668–677 (1998)
  8. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximal flow problem. In: STOC 1986, pp. 136–146 (1986)
  9. Ford Jr., L., Fulkerson, D.: Maximal flow through a network. Canadian Journal of Mathematics 8, 399–404 (1956)
  10. Edmonds, J., Karp, R.M.: Theoretical improvements in algorithmic efficiency for network flow problems. J. ACM 19(2), 248–264 (1972)
  11.
  12. Imafuji, N., Kitsuregawa, M.: Effects of maximum flow algorithm on identifying web community. In: WIDM 2002, pp. 43–48 (2002)
  13. Toyoda, M., Kitsuregawa, M.: Creating a Web community chart for navigating related communities. In: Hypertext 2001, pp. 103–112 (2001)
  14. Imafuji, N., Kitsuregawa, M.: Finding a web community by maximum flow algorithm with HITS score based capacity. In: DASFAA, pp. 101–106 (2003)
  15. Dean, J., Henzinger, M.R.: Finding Related Pages in the World Wide Web. Computer Networks 31(11-16), 1467–1479 (1999)
  16. Asano, Y., Nishizeki, T., Toyoda, M., Kitsuregawa, M.: Mining communities on the web using a max-flow and a site-oriented framework. IEICE Transactions 89-D(10), 2606–2615 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Toshihiko Horiike (1)
  • Youhei Takahashi (1)
  • Tetsuji Kuboyama (2)
  • Hiroshi Sakamoto (1)
  1. Kyushu Institute of Technology, Iizuka, Japan
  2. Gakushuin University, Tokyo, Japan
