Automated Software Engineering, Volume 21, Issue 4, pp 509–533

Graph database benchmarking on cloud environments with XGDBench

Abstract

Online graph database service providers have started migrating their operations to public clouds because of the growing demand for low-cost, ubiquitous graph data storage and analysis. However, little support exists for benchmarking graph database systems in cloud environments. We describe XGDBench, a graph database benchmarking platform for cloud computing systems. XGDBench is designed as an extensible platform, which also makes it suitable for benchmarking future HPC systems. In creating XGDBench, we extend the Yahoo! Cloud Serving Benchmark (YCSB) to the area of graph database benchmarking. The platform is written in X10, a PGAS language intended for programming future HPC systems. We describe the architecture of XGDBench, explain how it differs from the current state of the art, and use XGDBench to evaluate the performance of five well-known graph data stores, AllegroGraph, Fuseki, Neo4j, OrientDB, and Titan, on the TSUBAME 2.0 HPC cloud environment.
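To make the YCSB-style design concrete, the sketch below illustrates the general pattern such a benchmark client follows: a driver issues a configurable mix of operations (vertex reads, updates, inserts, and a graph-specific one-hop traversal) against a pluggable data-store adapter and records per-operation latency. This is a minimal illustration in Java, not the actual XGDBench source (which is written in X10); the GraphClient interface, the runWorkload method, and the operation ratios are hypothetical names and values chosen for this example.

    import java.util.Random;

    /** Pluggable adapter; a real harness supplies one implementation per store. */
    interface GraphClient {
        void readVertex(long id);
        void updateVertex(long id, String property, String value);
        void insertVertex(long id);
        void traverseNeighbors(long id);   // one-hop traversal: the graph-specific operation
    }

    public class WorkloadDriverSketch {

        /** Issues opCount operations drawn from the given mix and reports mean latency. */
        static void runWorkload(GraphClient client, int opCount, double readRatio,
                                double traverseRatio, double updateRatio, long vertexCount) {
            Random rng = new Random(42);        // fixed seed keeps runs repeatable
            long nextNewId = vertexCount;       // ids for freshly inserted vertices
            long totalNanos = 0;
            for (int i = 0; i < opCount; i++) {
                long target = (long) (rng.nextDouble() * vertexCount);
                double p = rng.nextDouble();
                long start = System.nanoTime();
                if (p < readRatio) {
                    client.readVertex(target);
                } else if (p < readRatio + traverseRatio) {
                    client.traverseNeighbors(target);
                } else if (p < readRatio + traverseRatio + updateRatio) {
                    client.updateVertex(target, "name", "v" + i);
                } else {
                    client.insertVertex(nextNewId++);
                }
                totalNanos += System.nanoTime() - start;
            }
            // A real harness would also track latency percentiles, not just the mean.
            System.out.printf("ops=%d mean latency=%.1f us%n",
                              opCount, totalNanos / 1000.0 / opCount);
        }

        public static void main(String[] args) {
            // No-op stub so the sketch runs standalone; an actual run would plug in
            // an adapter for AllegroGraph, Fuseki, Neo4j, OrientDB, or Titan.
            GraphClient stub = new GraphClient() {
                public void readVertex(long id) {}
                public void updateVertex(long id, String property, String value) {}
                public void insertVertex(long id) {}
                public void traverseNeighbors(long id) {}
            };
            runWorkload(stub, 100_000, 0.50, 0.25, 0.15, 1_000_000);
        }
    }

Because the workload logic lives entirely in the driver, supporting a new target store only requires a new adapter, which is the kind of extensibility the platform aims for.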

Keywords

Cloud databases · Graph database systems · Benchmark testing · Network theory · System performance · Performance analysis

Acknowledgements

This research was supported by the Japan Science and Technology Agency’s CREST project titled “Development of System Software Technologies for post-Peta Scale High Performance Computing”.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
  2. Department of Computer Science, Tokyo Institute of Technology / IBM Research-Tokyo, Tokyo, Japan
