Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark

  • D. Dominguez-Sal
  • P. Urbón-Bayes
  • A. Giménez-Vañó
  • S. Gómez-Villamor
  • N. Martínez-Bazán
  • J. L. Larriba-Pey
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6185)

Abstract

The analysis of relationships among data entities has led to modeling them as graphs. Since the size of datasets has grown significantly in recent years, it has become necessary to implement efficient graph databases that can load and manage these huge datasets.

In this paper, we evaluate the performance of four of the most scalable native graph database projects (Neo4j, Jena, HypergraphDB and DEX). We implement the full HPC Scalable Graph Analysis Benchmark and test the performance of each database for different typical graph operations and graph sizes, showing that, in their current state of development, DEX and Neo4j are the most efficient of the graph databases evaluated.
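The benchmark exercises a small set of typical graph operations. As we understand the v1.0 specification, its four kernels are graph construction, a scan for the edges with the largest weight, extraction of subgraphs by bounded traversal from those edges, and betweenness centrality. The following is a minimal in-memory sketch of those operations on a toy edge list, assuming a simple (source, target, weight) edge format. It is not the authors' implementation and does not reflect how any of the four databases stores or queries data; all function names are hypothetical. The betweenness routine is the standard Brandes algorithm for unweighted directed graphs, computed exactly over every source vertex, whereas the benchmark may specify an approximate variant (for example, one that samples source vertices).

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical toy formats: an edge is (source, target, weight);
# the adjacency list maps each vertex to its outgoing (target, weight) pairs.
Edge = Tuple[int, int, int]
Adj = Dict[int, List[Tuple[int, int]]]


def build_graph(edges: List[Edge]) -> Adj:
    """Construction step: load an edge list into an adjacency list."""
    adj: Adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, [])  # ensure vertices with no out-edges exist too
    return adj


def heaviest_edges(edges: List[Edge]) -> List[Edge]:
    """Scan step: keep every edge whose weight equals the maximum weight."""
    max_w = max(w for _, _, w in edges)
    return [e for e in edges if e[2] == max_w]


def extract_subgraph(adj: Adj, start: Edge, depth: int) -> Set[int]:
    """Traversal step: vertices reachable by paths of at most `depth` edges
    that begin with the start edge (the start edge counts as the first hop)."""
    u, v, _ = start
    seen = {u, v}
    frontier = deque([(v, 1)])
    while frontier:
        node, d = frontier.popleft()
        if d >= depth:
            continue  # path length limit reached; do not expand further
        for nxt, _ in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen


def betweenness(adj: Adj) -> Dict[int, float]:
    """Analysis step: exact Brandes betweenness, ignoring edge weights."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack: List[int] = []
        preds: Dict[int, List[int]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        dist = {v: -1 for v in adj}  # BFS distance from s (-1 = unvisited)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w, _ in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

For example, on the edge list `[(0, 1, 3), (1, 2, 5), (2, 3, 5), (3, 0, 1), (1, 3, 2)]`, `heaviest_edges` returns the two weight-5 edges, and `extract_subgraph(adj, (1, 2, 5), 2)` returns `{1, 2, 3}`. A database-backed version would replace the Python dictionaries with each engine's own storage and traversal calls.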



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • D. Dominguez-Sal, P. Urbón-Bayes, A. Giménez-Vañó, S. Gómez-Villamor, N. Martínez-Bazán, J. L. Larriba-Pey
    • DAMA-UPC, Universitat Politècnica de Catalunya, Barcelona, Spain
