Advertisement

Frontiers of Computer Science

, Volume 12, Issue 6, pp 1076–1089 | Cite as

IncPregel: an incremental graph parallel computation model

  • Qiang Liu
  • Xiaoshe Dong
  • Heng ChenEmail author
  • Yinfeng Wang
Research Article
  • 25 Downloads

Abstract

Large-scale graph computation is often required in a variety of emerging applications such as social network computation and Web services. Such graphs are typically large and frequently updated with minor changes. However, re-computing an entire graph when a few vertices or edges are updated is often prohibitively expensive. To reduce the cost of such updates, this study proposes an incremental graph computation model called IncPregel, which leverages the non-after-effect property of the first-order Markov chain and provides incremental programming abstractions to avoid redundant computation and message communication. This is accomplished by employing an efficient and fine-grained reuse mechanism. We implemented this model on Hama, a popular open source framework based on Pregel, to construct an incremental graph processing system called IncHama. IncHama automatically detects changes in input in order to recognize “changed vertices” and to exchange reusable data by means of shuffling. The evaluation results on large-scale graphs show that, compared with Hama, IncHama is 1.1–2.7 times faster and can reduce communication messages by more than 50% when the incremental edges increase in number from 0.1 to 100k.

Keywords

graph computation Pregel cloud computing 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61572394), National Key Research and Development Program of China (2016YFB1000303), and Shenzhen Scientific Plan (JSGG20140519141854753).

Supplementary material

11704_2016_6109_MOESM1_ESM.ppt (282 kb)
Supplementary material, approximately 282 KB.

References

  1. 1.
    Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Communications of the ACM, 2008, 51(1): 107–113CrossRefGoogle Scholar
  2. 2.
    Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: a system for large-scale graph processing. In: Proceedings of ACM SIGMOD International Conference on Management of Data. 2010, 135–146Google Scholar
  3. 3.
    Low Y C, Gonzalez J, Kyrola A, Bickson D, Guestrin C E, Hellerstein J M. GraphLab: a new framework for parallel machine learning. In: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. 2010, 340–349Google Scholar
  4. 4.
    Low Y C, Bickson D, Gonzalez J, Guestrin C E, Kyrola A, Hellerstein J M. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the Very Large Data Base Endowment, 2012, 5(8): 716–727Google Scholar
  5. 5.
    Power R, Li J Y. Piccolo: building fast, distributed programs with partitioned tables. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation. 2010, 1–14Google Scholar
  6. 6.
    Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488Google Scholar
  7. 7.
    Wilson C, Sala A, Puttaswamy K P N, Zhao B Y. Beyond social graphs: user interactions in online social networks and their implications. ACM Transactions on the Web, 2012, 6(4): 17CrossRefGoogle Scholar
  8. 8.
    Fan WF, Wang X, Wu Y H. Incremental graph pattern matching. ACM Transactions on Database Systems, 2013, 38(3): 18MathSciNetCrossRefzbMATHGoogle Scholar
  9. 9.
    Logothetis D, Olston C, Reed B, Webb K C, Yocum K. Stateful bulk processing for incremental analytics. In: Proceedings of the 1st ACM Symposium on Cloud Computing. 2010, 51–62CrossRefGoogle Scholar
  10. 10.
    Bhatotia P, Wieder A, Rodrigues R, Acar U A, Pasquin R. Incoop: MapReduce for incremental computations. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. 2011Google Scholar
  11. 11.
    Sagharichian M, Naderi H, Haghjoo M. ExPregel: a new computational model for large-scale graph processing. Concurrency and Computation: Practice and Experience, 2015, 27(17): 4954–4969CrossRefGoogle Scholar
  12. 12.
    Brin S, Page L. Reprint of: the anatomy of a large-scale hypertextual Web search engine. Computer Networks, 2012, 56(18): 3825–3833CrossRefGoogle Scholar
  13. 13.
    Gyöngyi Z, Garcia-Molina H, Pedersen J. Combating Web spam with trustrank. In: Proceedings of the 30th International Conference on Very Large Data Base. 2004, 576–587Google Scholar
  14. 14.
    Bu Y Y, Howe B, Balazinska M, Ernst MD. HaLoop: efficient iterative data processing on large clusters. Proceedings of the Very Large Data Base Endowment, 2010, 3(1–2): 285–296Google Scholar
  15. 15.
    Kang U, Tsourakakis C E, Faloutsos C. Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 9th IEEE International Conference on Data Mining. 2009, 229–238Google Scholar
  16. 16.
    Kang U, Tsourakakis C E, Appel A P, Faloutsos C, Leskovec J. Hadi: mining radii of large graphs. ACM Transactions on Knowledge Discovery from Data, 2011, 5(2): 8CrossRefGoogle Scholar
  17. 17.
    Valiant L G. A bridging model for parallel computation. Communications of the ACM, 1990, 33(8): 103–111CrossRefGoogle Scholar
  18. 18.
    Prabhakaran V, Wu M, Weng X T, McSherry F, Zhou L D, Haridasan M. Managing large graphs on multi-cores with graph awareness. In: Proceedings of USENIX Annual Technical Conference. 2012, 41–52Google Scholar
  19. 19.
    Gonzalez J E, Xin R S, Dave A, Crankshaw D, Franklin M J, Stoica I. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation. 2014, 599–613Google Scholar
  20. 20.
    Zaharia M, Chowdhury M, Franklin M J, Shenker S, Stoica I. Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. 2010Google Scholar
  21. 21.
    Gonzalez J E, Low Y C, Gu H J, Bickson D, Guestrin C. PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. 2012, 17–30Google Scholar
  22. 22.
    Chen R, Ding X, Wang P, Chen H B, Zang B Y, GuanH B. Computation and communication efficient graph processing with distributed immutable view. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed computing. 2014, 215–226Google Scholar
  23. 23.
    Desikan P, Pathak N, Srivastava J,Kumar V. Incremental page rank computation on evolving graphs. In: Proceedings of Special Interest Tracks and Posters of the 14th International Conference onWorldWide Web. 2005, 1094–1095Google Scholar
  24. 24.
    Chien S, Dwork C, Kumar R, Simon D R, Sivakumar D. Link evolution: analysis and algorithms. Internet Mathematics, 2004, 1(3): 277–304MathSciNetCrossRefzbMATHGoogle Scholar
  25. 25.
    Popa L, Budiu M, Yu Y, Isard M. DryadInc: reusing work in large-scale computations. In: Proceedings of Conference on Hot Topics in Cloud Computing. 2009Google Scholar
  26. 26.
    Peng D, Dabek F. Large-scale incremental processing using distributed transactions and notifications. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation. 2010, 1–15Google Scholar
  27. 27.
    Cheng R, Hong J, Kyrola A, Miao Y S, Weng X T, Wu M, Yang F, Zhou L D, Zhao F, Chen E H. Kineograph: taking the pulse of a fastchanging and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems. 2012, 85–98Google Scholar
  28. 28.
    Lovász L. Random walks on graphs: a survey. Combinatorics, Paul Erdos is Eighty, 1993, 2: 1–46Google Scholar
  29. 29.
    Puterman M L. Markov Decision Processes: Discrete Dynamic Stochastic Programming. New York: John Wiley & Sons, 1994CrossRefzbMATHGoogle Scholar
  30. 30.
    Shao Y X, Cui B, Ma L. PAGE: a partition aware engine for parallel graph computation. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(2): 518–530CrossRefGoogle Scholar

Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  • Qiang Liu
    • 1
  • Xiaoshe Dong
    • 1
  • Heng Chen
    • 1
    Email author
  • Yinfeng Wang
    • 2
  1. 1.School of Electronics and Information EngineeringXi’an Jiaotong UniversityXi’anChina
  2. 2.Shenzhen Institute of Information TechnologyShenzhenChina

Personalised recommendations