Massive Graph Management for the Web and Web 2.0

Chapter in: New Directions in Web Data Management 1

Part of the book series: Studies in Computational Intelligence (SCI, volume 331)

Abstract

The problem of efficiently managing massive datasets has gained increasing attention due to the abundance of data available from various sources, such as the Web. Moreover, Web 2.0 applications appear to be among the most fruitful sources of information, as they have attracted a large number of users who are eager to contribute new data online. Several Web 2.0 applications incorporate Social Tagging features, allowing users to upload and tag sets of online resources. This activity produces massive amounts of data on a daily basis, which can be represented by a tripartite graph structure that connects users, resources and tags. The analysis of Social Tagging Systems (STS) is emerging as a promising research field, enabling, among other things, the identification of common patterns in user behavior and of communities of semantically related tags and resources. The massive size of STS datasets calls for a robust underlying infrastructure for their storage and access.

This chapter surveys existing solutions to the problem of storing and managing massive graph data. It focuses particularly on the implications that the underlying technologies of such frameworks have for the operation of Web 2.0 applications that use them as back-end storage, as well as for the efficient execution of web mining tasks. Taking STS as an example of Web 2.0 applications, the chapter thoroughly discusses the requirements posed for the management of STS data. On the basis of these requirements, three frameworks have been developed, using state-of-the-art technologies as backbones. The results of benchmarks conducted on the developed frameworks are presented and discussed.
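As an illustration of the tripartite structure mentioned in the abstract, the following is a minimal in-memory sketch in which every tag assignment is stored as a (user, resource, tag) triple and indexed by each of its three parts. The class name `TaggingGraph` and its methods are hypothetical and chosen only for this example; they are not taken from the chapter and do not correspond to any of the three frameworks it benchmarks.

```python
from collections import defaultdict


class TaggingGraph:
    """Toy tripartite graph over users, resources and tags (illustrative only)."""

    def __init__(self):
        # Every tag assignment is one (user, resource, tag) triple.
        self.assignments = set()
        # One index per node type, mapping a node to the triples it takes part in.
        self.by_user = defaultdict(set)
        self.by_resource = defaultdict(set)
        self.by_tag = defaultdict(set)

    def add(self, user, resource, tag):
        """Record that `user` annotated `resource` with `tag`."""
        triple = (user, resource, tag)
        self.assignments.add(triple)
        self.by_user[user].add(triple)
        self.by_resource[resource].add(triple)
        self.by_tag[tag].add(triple)

    def tags_of_resource(self, resource):
        """All tags that any user attached to `resource`."""
        return {t for (_, _, t) in self.by_resource.get(resource, ())}

    def cooccurring_tags(self, tag):
        """Tags that share at least one resource with `tag`."""
        resources = {r for (_, r, _) in self.by_tag.get(tag, ())}
        related = set()
        for r in resources:
            related |= self.tags_of_resource(r)
        related.discard(tag)
        return related


# Hypothetical sample data.
graph = TaggingGraph()
graph.add("alice", "url1", "python")
graph.add("bob", "url1", "graphs")
graph.add("bob", "url2", "databases")
print(sorted(graph.cooccurring_tags("python")))
```

Tag co-occurrence queries of this kind are a simple starting point for finding related tags. A structure like this holds everything in memory, so it only illustrates the data model and says nothing about the storage and scalability concerns the chapter addresses.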

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Giatsoglou, M., Papadopoulos, S., Vakali, A. (2011). Massive Graph Management for the Web and Web 2.0. In: Vakali, A., Jain, L.C. (eds) New Directions in Web Data Management 1. Studies in Computational Intelligence, vol 331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17551-0_2

  • DOI: https://doi.org/10.1007/978-3-642-17551-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17550-3

  • Online ISBN: 978-3-642-17551-0

  • eBook Packages: Engineering (R0)
