A Discussion on the Design of Graph Database Benchmarks

  • David Dominguez-Sal
  • Norbert Martinez-Bazan
  • Victor Muntes-Mulero
  • Pere Baleta
  • Josep Lluis Larriba-Pey
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6417)

Abstract

Graph Database Management systems (GDBs) are gaining popularity. They are used to analyze the huge graph datasets that arise naturally in many application areas to model interrelated data. The objective of this paper is to raise a new topic of discussion in the benchmarking community and to give practitioners a set of basic guidelines for GDB benchmarking. We strongly believe that GDBs will become an important player in the data analysis market and that, as a result, their performance and capabilities will also become important. For this reason, we discuss the aspects that we consider important: the characteristics of the graphs to be included in the benchmark, the characteristics of the queries that matter in graph analysis applications, and the evaluation workbench.

Keywords

Social Network Analysis; Resource Description Framework; Betweenness Centrality; Large Graph; SPARQL Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • David Dominguez-Sal (1)
  • Norbert Martinez-Bazan (1)
  • Victor Muntes-Mulero (1)
  • Pere Baleta (2)
  • Josep Lluis Larriba-Pey (1)
  1. DAMA-UPC, Barcelona, Spain
  2. Sparsity Technologies, Barcelona, Spain
