
C&C: An Effective Algorithm for Extracting Web Community Cores

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6193)


Abstract

Communities are a significant pattern of the Web. A community is a group of pages related to a common topic. Web communities can be characterized by dense bipartite subgraphs, and each community almost surely contains at least one core, where a core is a complete bipartite graph (CBG). To extract such community cores from the Web, in this paper we propose C&C, an effective algorithm based on combination and consolidation that extracts all embedded cores in web graphs. Experiments on large real-world data collections demonstrate that C&C is efficient and effective for community core extraction because: 1) all the largest emerging cores can be identified; 2) a single pass of C&C identifies all the embedded cores of different sizes; 3) the extraction process requires no user-determined parameters.
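The abstract defines a core as a complete bipartite graph: a set of "fan" pages that all link to every page in a common set of "center" pages (fan/center is the terminology commonly used in this literature). As a minimal sketch, assuming the web graph is given as a Python dict mapping each page to the set of pages it links to, the code below checks that property and enumerates maximal cores by brute force. This is not the C&C algorithm, whose steps the abstract does not give; the names `is_core`, `maximal_cores`, `example_graph` and the `min_fans`/`min_centers` thresholds are hypothetical, and the exhaustive search is exponential, so it only suits toy graphs.

```python
from itertools import combinations

# Hypothetical helpers for illustration only; this is NOT the paper's C&C algorithm.
# The graph is assumed to be a dict: page -> set of pages it links to.


def is_core(graph, fans, centers):
    """Return True if every fan page links to every center page (a complete bipartite core)."""
    return all(set(centers) <= graph.get(f, set()) for f in fans)


def maximal_cores(graph, min_fans=2, min_centers=2):
    """Brute-force enumeration of maximal complete bipartite cores.

    For each subset of pages, the shared centers are the intersection of their
    out-links; the fan side is then closed by taking every page that links to
    all of those centers. Exponential in the number of pages.
    """
    pages = [p for p, out in graph.items() if out]
    found = set()
    for k in range(1, len(pages) + 1):
        for subset in combinations(pages, k):
            # Centers linked to by every page in the subset.
            centers = set.intersection(*(graph[p] for p in subset))
            if len(centers) < min_centers:
                continue
            # Close the fan side: all pages whose out-links cover these centers.
            fans = frozenset(p for p in pages if centers <= graph[p])
            if len(fans) >= min_fans:
                found.add((fans, frozenset(centers)))
    return found


# Hypothetical toy graph: pages a, b, c all link to x and y.
example_graph = {
    "a": {"x", "y", "z"},
    "b": {"x", "y"},
    "c": {"x", "y", "w"},
    "x": set(),
    "y": set(),
    "z": set(),
    "w": set(),
}

if __name__ == "__main__":
    for fans, centers in maximal_cores(example_graph):
        print(sorted(fans), "->", sorted(centers))
```

On the toy graph, the only core with at least two fans and two centers is fans {a, b, c} with centers {x, y}. Closing each candidate center set on the fan side is what keeps the reported cores maximal, so a smaller sub-core such as ({a, b}, {x, y}) is not listed separately.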

This work was partially supported by NSFC under grant No. 60873180, and by the start-up funding (#1600-893313) for newly appointed academic staff of Dalian University of Technology, China.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, X., Li, Y., Liang, W. (2010). C&C: An Effective Algorithm for Extracting Web Community Cores. In: Yoshikawa, M., Meng, X., Yumoto, T., Ma, Q., Sun, L., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 6193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14589-6_32


  • DOI: https://doi.org/10.1007/978-3-642-14589-6_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14588-9

  • Online ISBN: 978-3-642-14589-6

  • eBook Packages: Computer Science, Computer Science (R0)
