Abstract
Efficient management of massive datasets has attracted increasing attention as data from sources such as the Web has become abundantly available. Web 2.0 applications are among the most fruitful of these sources, since they have attracted large numbers of users who are eager to contribute new data online. Several Web 2.0 applications incorporate Social Tagging features, which allow users to upload and tag online resources. This activity produces massive amounts of data on a daily basis, which can be represented by a tripartite graph that connects users, resources and tags. The analysis of Social Tagging Systems (STS) is emerging as a promising research field: it enables, among other things, the identification of common patterns in user behavior and of communities of semantically related tags and resources. The massive size of STS datasets calls for a robust underlying infrastructure for their storage and access.
This chapter surveys existing solutions for storing and managing massive graph data. It focuses on the implications that the underlying technologies of such frameworks have for Web 2.0 applications that use them as back-end storage, and for the efficient execution of web mining tasks. Taking STS as an example of Web 2.0 applications, the chapter discusses in detail the requirements for managing STS data. On the basis of these requirements, three frameworks have been developed using state-of-the-art technologies as backbones. The results of benchmarks conducted on these frameworks are presented and discussed.
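As an illustration of the tripartite structure described in the abstract, the following minimal sketch stores tag assignments as (user, resource, tag) triples held in memory and derives a tag co-occurrence projection from them. The class name `TaggingGraph`, its methods, and the sample data are hypothetical and chosen only for illustration; this is not one of the three frameworks developed in the chapter, and it makes no attempt to address massive-scale storage.

```python
from collections import defaultdict
from itertools import combinations


class TaggingGraph:
    """In-memory tripartite graph of users, resources and tags.

    Hypothetical illustration: each tag assignment is a
    (user, resource, tag) triple linking one node of each type.
    """

    def __init__(self):
        self.assignments = set()                # all (user, resource, tag) triples
        self.tags_by_user = defaultdict(set)    # user -> tags the user has applied
        self.tags_by_resource = defaultdict(set)  # resource -> tags applied to it

    def add(self, user, resource, tag):
        """Record that `user` annotated `resource` with `tag`."""
        self.assignments.add((user, resource, tag))
        self.tags_by_user[user].add(tag)
        self.tags_by_resource[resource].add(tag)

    def tag_cooccurrence(self):
        """Project the graph onto tags.

        The weight of a tag pair is the number of resources that
        carry both tags, regardless of which users applied them.
        """
        weights = defaultdict(int)
        for tags in self.tags_by_resource.values():
            # sorted() gives each unordered pair one canonical key
            for a, b in combinations(sorted(tags), 2):
                weights[(a, b)] += 1
        return dict(weights)


if __name__ == "__main__":
    g = TaggingGraph()
    g.add("alice", "r1", "python")
    g.add("alice", "r1", "graphs")
    g.add("bob", "r2", "python")
    g.add("bob", "r2", "graphs")
    print(g.tag_cooccurrence())
```

A projection of this kind is one common starting point for finding groups of related tags; a real STS dataset would need a persistent back end rather than Python sets.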
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Giatsoglou, M., Papadopoulos, S., Vakali, A. (2011). Massive Graph Management for the Web and Web 2.0. In: Vakali, A., Jain, L.C. (eds) New Directions in Web Data Management 1. Studies in Computational Intelligence, vol 331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17551-0_2
DOI: https://doi.org/10.1007/978-3-642-17551-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17550-3
Online ISBN: 978-3-642-17551-0
eBook Packages: Engineering, Engineering (R0)