Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark
Analyzing the relationships among data entities has led to modeling them as graphs. Since datasets have grown significantly in recent years, it has become necessary to implement efficient graph databases that can load and manage these huge datasets.
In this paper, we evaluate the performance of four of the most scalable native graph database projects (Neo4j, Jena, HypergraphDB and DEX). We implement the full HPC Scalable Graph Analysis Benchmark and test the performance of each database on a range of typical graph operations and graph sizes, showing that, at their current stage of development, DEX and Neo4j are the most efficient graph databases.