An empirical comparison of Big Graph frameworks in the context of network analysis

Koch, Jannis; Staudt, Christian L.; Vogel, Maximilian; Meyerhenke, Henning

doi:10.1007/s13278-016-0394-1

Original Article
Published: 22 September 2016

Social Network Analysis and Mining, volume 6, article number 84 (2016)

Abstract

Complex networks are heterogeneous relational data sets with nontrivial substructures and statistical properties. They are typically represented as graphs consisting of vertices and edges. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frameworks (GraphLab, Apache Giraph, Giraph++ and Apache Flink) are used to implement algorithms for four representative problems: connected components, community detection, PageRank and clustering coefficients. The implementations are executed on a computer cluster to evaluate the frameworks’ suitability in practice and to compare their performance to that of the single-machine, shared-memory parallel network analysis package NetworKit. Among the distributed frameworks, GraphLab and Apache Giraph generally show the best performance. In our experiments, a cluster of eight computers running Apache Giraph enables the analysis of a network with approximately 2 billion edges, which is too large for a single machine of the same type. However, for networks that fit into the memory of one machine, the performance of the shared-memory parallel implementation is usually far better than that of the distributed ones. The study provides experimental evidence for selecting the appropriate framework depending on the task and data volume.
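
To make the vertex-centric programming model concrete, the following is a minimal single-machine sketch of how connected components, one of the benchmark problems named above, might be expressed in synchronous supersteps. It is an illustration under stated assumptions, not the authors' implementation and not the API of any of the compared frameworks: the function name `connected_components`, the edge-list input and the minimum-label propagation scheme are all chosen here for clarity.

```python
from collections import defaultdict


def connected_components(edges):
    """Simulate vertex-centric minimum-label propagation in synchronous supersteps.

    Hypothetical helper for illustration only; it runs on one machine and
    only knows vertices that appear in at least one edge.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Every vertex starts with its own ID as its component label.
    label = {v: v for v in neighbors}
    # In the first superstep all vertices are active.
    active = set(neighbors)

    while active:
        # Message phase: each active vertex sends its label to its neighbors.
        inbox = defaultdict(list)
        for v in active:
            for w in neighbors[v]:
                inbox[w].append(label[v])

        # Compute phase: a vertex adopts the smallest label it received
        # and stays active only if its label changed.
        active = set()
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)

    return label


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

When no vertex changes its label, no messages are sent and the loop ends; all vertices in one component then share the smallest vertex ID of that component. In a distributed setting the message phase would cross machine boundaries, which is one source of the overhead that a comparison with a shared-memory package has to account for.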

Notes

NetworKit: http://networkit.iti.kit.edu

References

Apache (2014) Giraph++ patch for Apache Giraph. https://issues.apache.org/jira/browse/GIRAPH-818. Accessed 31 July 2014
Apache (2015a) Website of the framework Apache Flink. https://flink.apache.org/
Apache (2015b) Website of the framework Apache Giraph. http://giraph.apache.org/
Apache (2015c) Website of the research project Stratosphere. http://stratosphere.eu/
Apache (2016) Website of GraphX. https://spark.apache.org/graphx/
Ching A (2013) Scaling Apache Giraph to a trillion edges. https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920. Accessed 30 July 2014
Battré D, Ewen S, Hueske F, Kao O, Markl V, Warneke D (2010) Nephele/pacts: a programming model and execution framework for web-scale analytical processing. In: Proceedings of 1st ACM symposium on cloud computing, SoCC ’10. ACM, New York, pp 119–130
Boldi P, Vigna S (2004) The WebGraph framework I: compression techniques. In: Proceedings of the thirteenth international World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp 595–601
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. In: Computer networks and ISDN systems. Elsevier Science Publishers B. V, Amsterdam, pp 107–117
Cha M, Haddadi H, Benevenuto F, Gummadi KP (2010) Measuring user influence in Twitter: the million follower fallacy. In: Proceedings of the 4th international AAAI conference on Weblogs and Social Media (ICWSM)
Costa LdF, Oliveira ON, Travieso G, Rodrigues FA, Villas Boas PR, Antiqueira L, Viana MP, Correa Rocha LE (2011) Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv Phys 60(3):329–412
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX conference on operating systems design and implementation, OSDI’12. USENIX Association, Berkeley, CA, USA, pp 17–30
Karloff H, Suri S, Vassilvitskii S (2010) A model of computation for mapreduce. In: Proceedings of the twenty-first annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 938–948
Koch J, Staudt CL, Vogel M, Meyerhenke H (2015) Complex network analysis on distributed systems: an empirical comparison. In: Pei J, Silvestri F, Tang J (eds) Proceedings of 2015 IEEE/ACM international conference on advances in social networks analysis and mining, ASONAM 2015. ACM, pp 1169–1176
Kunegis J (2013) Konect: the koblenz network collection. In: Proceedings of 22nd international conference on World Wide Web companion. International World Wide Web Conferences Steering Committee, pp 1343–1350
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: WWW ’10: Proceedings of the 19th international conference on World wide web. ACM, New York, NY, USA, pp 591–600
Lin J, Dyer C (2010) Data-intensive text processing with MapReduce. G-Reference, Information and Interdisciplinary Subjects Series. Morgan & Claypool
Lin J, Schatz M (2010) Design patterns for efficient graph algorithms in mapreduce. In: Proceedings of the eighth workshop on mining and learning with graphs, MLG ’10. ACM, New York, NY, USA, pp 78–85
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning in the cloud. CoRR, abs/1204.6078
Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, pp 135–146
McColl RC, Ediger D, Poovey J, Campbell D, Bader DA (2014) A performance evaluation of open source graph databases. In: Proceedings of 1st workshop on parallel programming for analytics applications, PPAA ’14. ACM, New York, NY, USA, pp 11–18
Meyerhenke H, Sanders P, Schulz C (2014) Partitioning complex networks via size-constrained clustering. In: Proceedings of 13th international symposium on experimental algorithms (SEA 2014), vol 8504 of LNCS. Springer, Berlin, pp 351–363
Newman M (2010) Networks: an introduction. Oxford University Press, Oxford
Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106
Satish N, Sundaram N, Patwary MMA, Seo J, Park J, Hassaan MA, Sengupta S, Yin Z, Dubey P (2014) Navigating the maze of graph analytics frameworks using massive graph datasets. In: Proceedings 2014 ACM SIGMOD international conference on management of data, SIGMOD ’14. ACM, New York, NY, USA, pp 979–990
Schank T, Wagner D (2005) Approximating clustering-coefficient and transitivity. J Gr Algorithm Appl 9(2):265–275
Slota GM, Madduri K, Rajamanickam S (2014) Pulp: scalable multi-objective multi-constraint partitioning for small-world networks. In: Lin J, Pei J, Hu X, Chang W, Nambiar R, Aggarwal C, Cercone N, Honavar V, Huan J, Mobasher B, Pyne S (eds) 2014 IEEE international conference on big data, Big Data 2014, pp 481–490
Staudt CL, Sazonovs A, Meyerhenke H (2016) NetworKit: a tool suite for large-scale complex network analysis. Netw Sci, To Appear
Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph”. PVLDB 7(3):193–204
Turi (2016) Website of the company distributing GraphLab
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111
Zhang Y, Gao Q, Gao L, Wang C (2012) Accelerate large-scale iterative computation through asynchronous accumulative updates. In: Proceedings of the 3rd workshop on scientific cloud computing date. ACM, pp 13–22

Acknowledgments

This work is partially supported by the German Research Foundation (DFG) grant ME 3619/3-1 within the Priority Programme 1736 Algorithms for Big Data.

Author information

Authors and Affiliations

Department of Informatics, Karlsruhe Institute of Technology (KIT), Am Fasanengarten 5, 76131, Karlsruhe, Germany
Jannis Koch, Christian L. Staudt, Maximilian Vogel & Henning Meyerhenke

Corresponding author

Correspondence to Henning Meyerhenke.

Additional information

Parts of this paper have been published in preliminary form as Koch et al. (2015).

About this article

Cite this article

Koch, J., Staudt, C.L., Vogel, M. et al. An empirical comparison of Big Graph frameworks in the context of network analysis. Soc. Netw. Anal. Min. 6, 84 (2016). https://doi.org/10.1007/s13278-016-0394-1

Received: 21 December 2015
Revised: 07 September 2016
Accepted: 13 September 2016
Published: 22 September 2016
DOI: https://doi.org/10.1007/s13278-016-0394-1
