WGB: Towards a Universal Graph Benchmark

  • Khaled Ammar
  • M. Tamer Özsu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8585)

Abstract

Graph data are of growing importance in many recent applications, and many systems for graph processing and analysis have been proposed in the last decade. Unfortunately, with the exception of RDF stores, every system uses different datasets and queries to assess its scalability and efficiency, which makes a meaningful comparison challenging (and sometimes impossible). Our aim is to close this gap by introducing the Waterloo Graph Benchmark (WGB), a benchmark for graph processing systems whose efficient generator creates dynamic graphs with properties similar to those of real-life graphs. WGB includes the basic graph queries that are used for building graph applications.

Keywords

Graph System, Graph Query, Separator Character, Online Query, Reachability Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

This research was partially supported by a fellowship from IBM Centre for Advanced Studies (CAS), Toronto.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada