Skip to main content

SGraph: A Distributed Streaming System for Processing Big Graphs

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9784))

Abstract

Big graph processing has been widely used in various computational domains, ranging from language modeling to social networks. Graph-parallel systems have been proposed to process such big graphs on clusters with up to hundreds of nodes. However, the size of a big graph often exceeds the available main memories in a small cluster. As a consequence, task failures happen frequently. To address this problem, we propose SGraph, a distributed streaming graph processing system built on top of Spark. SGraph introduces a streaming data model to avoid loading all of the graph data which may exceed the available RAM space. In addition, SGraph leverages an edge-centric scatter-gather computing model that can be used to conveniently implement graph algorithms. Experiments demonstrate that SGraph can process graphs with up to 1.5 billion edges on small clusters with several low-cost commodity PCs, whereas existing systems may require up to tens or hundreds of high-end machines. Furthermore, SGraph is up to 2.3 times faster than existing systems.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. http://newsroom.fb.com/News/457/One-Billion-People-on-Facebook

  2. http://snap.stanford.edu/data/soc-LiveJournal1.html

  3. Avery, C.: Giraph: large-scale graph processing infrastructure on Hadoop. In: Proceedings of the Hadoop Summit, Santa Clara (2011)

    Google Scholar 

  4. Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: OSDI, vol. 12, p. 2 (2012)

    Google Scholar 

  5. Jain, N., Liao, G., Willke, T.L.: Graphbuilder: scalable graph ETL framework. In: First International Workshop on Graph Data Management Experiences and Systems, p. 4. ACM (2013)

    Google Scholar 

  6. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600. ACM (2010)

    Google Scholar 

  7. Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146. ACM (2010)

    Google Scholar 

  8. Page, L., Brin, S., Motwani, R., Winograd, T.: The Pagerank Citation Ranking: Bringing Order to the Web (1999)

    Google Scholar 

  9. Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472–488. ACM (2013)

    Google Scholar 

  10. Xin, R.S., Crankshaw, D., Dave, A., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: Unifying Data-Parallel and Graph-Parallel Analytics (2014). arXiv preprint arXiv:1402.2394

  11. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, p. 2. USENIX Association (2012)

    Google Scholar 

Download references

Acknowledgement

We appreciate the reviewers’s comments and the efforts of open-source contributors. This paper is supported by National Natural Science Foundation of China-Guangdong Government Joint Funding (2nd) for Super Computer Application Research and the Hong Kong GRF 2150851.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hejun Wu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, C., Wu, H., Zhao, D.J., Yan, D., Cheng, J. (2016). SGraph: A Distributed Streaming System for Processing Big Graphs. In: Wang, Y., Yu, G., Zhang, Y., Han, Z., Wang, G. (eds) Big Data Computing and Communications. BigCom 2016. Lecture Notes in Computer Science(), vol 9784. Springer, Cham. https://doi.org/10.1007/978-3-319-42553-5_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-42553-5_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42552-8

  • Online ISBN: 978-3-319-42553-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics