Abstract
Big graph processing has been widely used in various computational domains, ranging from language modeling to social networks. Graph-parallel systems have been proposed to process such big graphs on clusters with up to hundreds of nodes. However, the size of a big graph often exceeds the available main memories in a small cluster. As a consequence, task failures happen frequently. To address this problem, we propose SGraph, a distributed streaming graph processing system built on top of Spark. SGraph introduces a streaming data model to avoid loading all of the graph data which may exceed the available RAM space. In addition, SGraph leverages an edge-centric scatter-gather computing model that can be used to conveniently implement graph algorithms. Experiments demonstrate that SGraph can process graphs with up to 1.5 billion edges on small clusters with several low-cost commodity PCs, whereas existing systems may require up to tens or hundreds of high-end machines. Furthermore, SGraph is up to 2.3 times faster than existing systems.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
http://newsroom.fb.com/News/457/One-Billion-People-on-Facebook
Avery, C.: Giraph: large-scale graph processing infrastructure on Hadoop. In: Proceedings of the Hadoop Summit, Santa Clara (2011)
Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: OSDI, vol. 12, p. 2 (2012)
Jain, N., Liao, G., Willke, T.L.: Graphbuilder: scalable graph ETL framework. In: First International Workshop on Graph Data Management Experiences and Systems, p. 4. ACM (2013)
Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600. ACM (2010)
Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146. ACM (2010)
Page, L., Brin, S., Motwani, R., Winograd, T.: The Pagerank Citation Ranking: Bringing Order to the Web (1999)
Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472–488. ACM (2013)
Xin, R.S., Crankshaw, D., Dave, A., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: Unifying Data-Parallel and Graph-Parallel Analytics (2014). arXiv preprint arXiv:1402.2394
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, p. 2. USENIX Association (2012)
Acknowledgement
We appreciate the reviewers’s comments and the efforts of open-source contributors. This paper is supported by National Natural Science Foundation of China-Guangdong Government Joint Funding (2nd) for Super Computer Application Research and the Hong Kong GRF 2150851.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, C., Wu, H., Zhao, D.J., Yan, D., Cheng, J. (2016). SGraph: A Distributed Streaming System for Processing Big Graphs. In: Wang, Y., Yu, G., Zhang, Y., Han, Z., Wang, G. (eds) Big Data Computing and Communications. BigCom 2016. Lecture Notes in Computer Science(), vol 9784. Springer, Cham. https://doi.org/10.1007/978-3-319-42553-5_24
Download citation
DOI: https://doi.org/10.1007/978-3-319-42553-5_24
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42552-8
Online ISBN: 978-3-319-42553-5
eBook Packages: Computer ScienceComputer Science (R0)