Cluster Computing

, Volume 18, Issue 3, pp 1189–1213 | Cite as

Large scale graph processing systems: survey and an experimental evaluation

  • Omar Batarfi
  • Radwa El Shawi
  • Ayman G. Fayoumi
  • Reza Nouri
  • Seyed-Mehdi-Reza Beheshti
  • Ahmed Barnawi
  • Sherif SakrEmail author


Graph is a fundamental data structure that captures relationships between different data entities. In practice, graphs are widely used for modeling complicated data in different application domains such as social networks, protein networks, transportation networks, bibliographical networks, knowledge bases and many more. Currently, graphs with millions and billions of nodes and edges have become very common. In principle, graph analytics is an important big data discovery technique. Therefore, with the increasing abundance of large graphs, designing scalable systems for processing and analyzing large scale graphs has become one of the most timely problems facing the big data research community. In general, scalable processing of big graphs is a challenging task due to their size and the inherent irregular structure of graph computations. Thus, in recent years, we have witnessed an unprecedented interest in building big graph processing systems that attempted to tackle these challenges. In this article, we provide a comprehensive survey over the state-of-the-art of large scale graph processing platforms. In addition, we present an extensive experimental study of five popular systems in this domain, namely, GraphChi, Apache Giraph, GPS, GraphLab and GraphX. In particular, we report and analyze the performance characteristics of these systems using five common graph processing algorithms and seven large graph datasets. Finally, we identify a set of the current open research challenges and discuss some promising directions for future research in the domain of large scale graph processing.


Big graph Graph processing Experimental evaluation 



This work was supported by King Abdulaziz City for Science and Technology (KACST) Project 11-INF1990-03.


  1. 1.
    Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D.J., Rasin, A., Silberschatz, A.: HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. PVLDB 2(1), 922–933 (2009)Google Scholar
  2. 2.
    Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinländer, A., Sax, M.J., Schelter, S., Höger, M., Tzoumas, K., Warneke, D.: The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)CrossRefGoogle Scholar
  3. 3.
    Barnawi, A., Batarfi, O., Elshawi, R., Fayoumi, A., Nouri, R., Sakr, S.: On characterizing the performance of distributed graph computation platforms. In: Proceedings of the TPC Technology Conference, TPCTC. Springer, Berlin (2014)Google Scholar
  4. 4.
    Borkar, V.R., Carey, M.J., Grover, R., Onose, N., Vernica, R.: Hyracks: a flexible and extensible foundation for data-intensive computing. In: Proceedings of the international conference on Data Engineering, ICDE, pp. 1151–1162. IEEE (2011)Google Scholar
  5. 5.
    Bu, Y., Howe, B., Balazinska, M., Ernst, M.D.: The HaLoop approach to large-scale iterative data analysis. VLDB J. 21(2), 169–190 (2012)CrossRefGoogle Scholar
  6. 6.
    Bu, Y., Borkar, V.R., Jia, J., Carey, M.J., Condie, T.: Pregelix: Big(ger) graph analytics on a dataflow engine. PVLDB 8(2), 161–172 (2014)Google Scholar
  7. 7.
    Chattopadhyay, B., Lin, L., Liu, W., Mittal, S., Aragonda, P., Lychagina, V., Kwon, Y., Wong, M.: Tenzing a SQL implementation on the MapReduce framework. PVLDB 4(12), 1318–1327 (2011)Google Scholar
  8. 8.
    Chen, R., Weng, X., He, B., Yang, M.: Large graph processing in the cloud. In: Proceedings of the SIGMOD, pp. 1123–1126. ACM (2010)Google Scholar
  9. 9.
    Clinger, W.D.: Foundations of Actor Semantics. Technical Report, Cambridge, MA (1981)Google Scholar
  10. 10.
    Dean, J., Ghemawa, S.: MapReduce: simplified data processing on large clusters. OSDI 1, 137–150 (2004)Google Scholar
  11. 11.
    Ediger, D., Bader, D.A.: Investigating graph algorithms in the BSP model on the cray XMT. In: Proceedings of the IPDPS workshops (2013)Google Scholar
  12. 12.
    Ekanayake, J., Li, H., Zhang, B., Gunarathne, T., Bae, S.-H., Qiu, J., Fox, G.: Twister: a runtime for iterative MapReduce. In: Proceedings of the High Performance Distributed Computing, HPDC, pp. 810–818. ACM (2010)Google Scholar
  13. 13.
    Fard, A., Nisar, M.U., Ramaswamy, L., Miller, J.A., Saltz, M.: A distributed vertex-centric approach for pattern matching in massive graphs. In: Proceedings of the BigData conference, pp. 403–411 (2013)Google Scholar
  14. 14.
    Friedman, E., Pawlowski, P.M., Cieslewicz, J.: SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions. PVLDB 2(2), 1402–1413 (2009)Google Scholar
  15. 15.
    Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the Operating Systems Design and Implementation, OSDI, pp. 17–30 (2012)Google Scholar
  16. 16.
    Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the OSDI, pp. 599–613 (2014)Google Scholar
  17. 17.
    Guo, Y., Biczak, M., Varbanescu, A.L., Iosup, A., Martella, C., Willke, T.L.: How well do graph-processing platforms perform? An empirical performance evaluation and analysis. In: Proceedings of the International Parallel and Distributed Processing Symposiumm, IPDPS, pp. 395–404 (2014)Google Scholar
  18. 18.
    Guo, Y., Varbanescu, A.L., Iosup, A., Martella, C., Willke, T.L.: Benchmarking graph-processing platforms: a vision. In: Proceedings of the International Conference on Performance Engineering, ICPE, pp. 289–292 (2014)Google Scholar
  19. 19.
    Han, W., Lee, S., Park, K., Lee, J., Kim, M., Kim, J., Yu, H.: TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the KDD, pp. 77–85 (2013)Google Scholar
  20. 20.
    Han, M., Daudjee, K., Ammar, K., Özsu, M.T., Wang, X., Jin, T.: An experimental comparison of Pregel-like graph processing systems. PVLDB 7(12), 1047–1058 (2014)Google Scholar
  21. 21.
    Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: a Self-tuning system for big data analytics. In: Proceedings of the Conference on Innovative Data Systems Research, CIDR, pp. 261–272 (2011)Google Scholar
  22. 22.
    Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: a peta-scale graph mining system. In: Proceedings of the International Conference on Data Mining, ICDM, pp. 229–238 (2009)Google Scholar
  23. 23.
    Kang, U., Meeder, B., Faloutsos, C.: Spectral analysis for billion-scale graphs: discoveries and implementation. In: Proceedings of the PAKDD, pp. 13–25 (2011)Google Scholar
  24. 24.
    Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: mining peta-scale graphs. Knowl. Inf. Syst. 27(2), 303–325 (2011)CrossRefGoogle Scholar
  25. 25.
    Kang, U., Tong, H., Sun, J., Lin, C.-Y., Faloutsos, C.: GBASE: a scalable and general graph management system. In: Proceedings of the international conference on Knowledge Discovery and Data Mining, KDD, pp. 1091–1099 (2011)Google Scholar
  26. 26.
    Khayyat, Z., Awara, K., Alonazi, A., Jamjoom, H., Williams, D., Kalnis, P.: Mizan: a system for dynamic load balancing in large-scale graph processing. In: Proceedings of the European Conference on Computer Systems, EuroSys, pp. 169–182. ACM (2013)Google Scholar
  27. 27.
    Kyrola, A., Blelloch, G.E., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. In: Proceedings of the OSDI, pp. 31–46 (2012)Google Scholar
  28. 28.
    Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)Google Scholar
  29. 29.
    Lu, Y., Cheng, J., Yan, D., Wu, H.: Largescale distributed graph computing systems: an experimental evaluation. PVLD 8(3), 281–292 (2014)Google Scholar
  30. 30.
    Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the SIGMOD conference, pp. 135–146 (2010)Google Scholar
  31. 31.
    Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Technical Report 1999–66, Stanford InfoLab, November 1999. Previous number = SIDL-WP-1999-0120Google Scholar
  32. 32.
    Sakr, S.: GraphREL: a decomposition-based and selectivity-aware relational framework for processing sub-graph queries. In: Proceedings of the DASFAA, pp. 123–137 (2009)Google Scholar
  33. 33.
    Sakr, S., Al-Naymat, G.: Efficient relational techniques for processing graph queries. J. Comput. Sci. Technol. 25(6), 1237–1255 (2010)CrossRefGoogle Scholar
  34. 34.
    Sakr, S., Al-Naymat, G.: Graph indexing and querying: a review. IJWIS 6(2), 101–120 (2010)Google Scholar
  35. 35.
    Sakr, S., Pardede, E. (ed.): Graph Data Management: Techniques and Applications. IGI Global, Hershey (2011)Google Scholar
  36. 36.
    Sakr, S., Elnikety, S., He, Y.: G-SPARQL: a hybrid engine for querying large attributed graphs. In: Proceedings of the Conference on Information and Knowledge Management, CIKM (2012)Google Scholar
  37. 37.
    Sakr, S., Liu, A., Fayoumi, A.G.: The family of mapreduce and large-scale data processing systems. ACM Comput. Surv. 46(1), 11 (2013)CrossRefGoogle Scholar
  38. 38.
    Salihoglu, S., Widom, J.: GPS: a graph processing system. In: Proceedings of the SSDBM, p. 22. ACM (2013)Google Scholar
  39. 39.
    Schad, J., Dittrich, J., Quiané-Ruiz, J.-A.: Runtime measurements in the cloud: observing, analyzing, and reducing variance. PVLDB 3(1), 460–471 (2010)Google Scholar
  40. 40.
    Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 505–516 (2013)Google Scholar
  41. 41.
    Simmen, D.E., Schnaitter, K., Davis, J., He, Y., Lohariwala, S., Mysore, A., Shenoi, V., Tan, M., Xiao, Y.: Large-scale graph analytics in aster 6: bringing context to big data discovery. PVLDB 7(13), 1405–1416 (2014)Google Scholar
  42. 42.
    Stutz, P., Bernstein, A., Cohen, W.W.: Signal/collect: graph algorithms for the (semantic) web. Int. Semant. Web Conf. 1, 764–780 (2010)Google Scholar
  43. 43.
    Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7(3), 193–204 (2013)Google Scholar
  44. 44.
    Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRefGoogle Scholar
  45. 45.
    Wang, G., Xie, W., Demers, A., Gehrke, J.: Asynchronous large-scale graph processing made easy. In CIDR (2013)Google Scholar
  46. 46.
    Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the HotCloud (2010)Google Scholar
  47. 47.
    Zhang, Y., Gao, Q., Gao, L., Wang, C.: iMapReduce: a distributed computing framework for iterative computation. J. Grid Comput. 10(1), 47–68 (2012)CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Omar Batarfi
    • 1
  • Radwa El Shawi
    • 2
  • Ayman G. Fayoumi
    • 1
  • Reza Nouri
    • 3
  • Seyed-Mehdi-Reza Beheshti
    • 3
  • Ahmed Barnawi
    • 1
  • Sherif Sakr
    • 3
    • 4
    Email author
  1. 1.King Abdulaziz UniversityJeddahSaudi Arabia
  2. 2.Princess Nourah Bint Abdulrahman UniversityRiyadhSaudi Arabia
  3. 3.University of New South WalesSydneyAustralia
  4. 4.King Saud bin Abdulaziz University for Health SciencesRiyadhSaudi Arabia

Personalised recommendations