Skip to main content
Log in

iGraph: an incremental data processing system for dynamic graph

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

With the popularity of social network, the demand for real-time processing of graph data is increasing. However, most of the existing graph systems adopt a batch processing mode, therefore the overhead of maintaining and processing of dynamic graph is significantly high. In this paper, we design iGraph, an incremental graph processing system for dynamic graph with its continuous updates. The contributions of iGraph include: 1) a hash-based graph partition strategy to enable fine-grained graph updates; 2) a vertexbased graph computing model to support incremental data processing; 3) detection and rebalance methods of hotspot to address the workload imbalance problem during incremental processing. Through the general-purpose API, iGraph can be used to implement various graph processing algorithms such as PageRank. We have implemented iGraph on Apache Spark, and experimental results show that for real life datasets, iGraph outperforms the original GraphX in respect of graph update and graph computation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Shao Y, Cui B, Ma L. PAGE: a partition aware engine for parallel graph computation. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(2): 518–530

    Article  Google Scholar 

  2. Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: a system for large-scale graph processing. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2010, 135–146

    Google Scholar 

  3. Salihoglu S, Widom J. GPS: a graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management. 2013, 22:1–22:12

    Google Scholar 

  4. Power R, Li J Y. Piccolo: building fast, distributed programs with partitioned tables. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2010, 293–306

    Google Scholar 

  5. Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein J M. Graphlab: a new framework for parallel machine learning. In: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. 2010, 340–349

    Google Scholar 

  6. Pearce R A, Gokhale M, Amato N M. Multithreaded asynchronous graph traversal for in-memory and semi-external memory. In: Proceedings of the ACM/IEEE International Conference for High Performance Computing Networking, Storage and Analysis. 2010, 1–11

    Google Scholar 

  7. Kang U, Tsourakakis C E, Faloutsos C. PEGASUS: a peta-scale graph mining system. In: Proceedings of the 9th IEEE International Conference on Data Mining. 2009, 229–238

    Google Scholar 

  8. Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 17–30

    Google Scholar 

  9. Ching A, Edunov S, Kabiljo M, Logothetis D, Muthukrishnan S. One trillion edges: graph processing at Facebook-scale. Proceedings of the VLDB Endowment, 2015, 8(12): 1804–1815

    Article  Google Scholar 

  10. Yan D, Cheng J, Lu Y, Ng W. Blogel: a block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment, 2014, 7(14): 1981–1992

    Article  Google Scholar 

  11. Zhang Y, Liao X F, Jin H, Lin L, Lu F. An adaptive switching scheme for iterative computing in the cloud. Frontiers of Computer Science, 2014, 8(6): 872–884

    Article  MathSciNet  Google Scholar 

  12. Zheng X L, Zhong Y G, Zeng D, Wang F Y. Social influence and spread dynamics in social networks. Frontiers of Computer Science, 2012, 6(5): 611–620

    MathSciNet  Google Scholar 

  13. Kumar R, Novak J, Tomkins A. Structure and evolution of online social networks. In: Philip S Y, Han J, Faloutsos C, eds. Link Mining: Models, Algorithms, and Applications. New York: Springer, 2010, 337–357

    Chapter  Google Scholar 

  14. Yan D, Cheng J, Lu Y, Ng W. Effective techniques for message reduction and load balancing in distributed graph computation. In: Proceedings of the 24th International Conference on World Wide Web. 2015, 1307–1317

    Chapter  Google Scholar 

  15. Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauly M, Franklin M J, Shenker S, Stoica I. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation. 2012, 15–28

    Google Scholar 

  16. Gonzalez J E, Xin R S, Dave A, Crankshaw D, Franklin M J, Stoica I. Graphx: graph processing in a distributed dataflow framework. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2014, 599–613

    Google Scholar 

  17. Ma S, Li J, Hu C M, Lin X L, Huai J P. Big graph search: challenges and techniques. Frontiers of Computer Science, 2016, 10(3): 387–398

    Article  Google Scholar 

  18. Cheatham T, Fahmy A F, Stefanescu D C, Valiant L G. Bulk synchronous parallel computing—a paradigm for transportable software. In: Proceedings of Annual Hawaii International Conference on System Sciences. 1995, 268–275

    Google Scholar 

  19. Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein J M. Distributed graphlab: a framework for machine learning in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716–727

    Article  Google Scholar 

  20. Pujol J M, Erramilli V, Siganos G, Yang X, Laoutaris N, Chhabra P, Rodriguez P. The little engine(s) that could: scaling online social networks. ACM SIGCOMM Computer Communication Review, 2011, 41(4): 375–386

    Google Scholar 

  21. Mondal J, Deshpande A. Managing large dynamic graphs efficiently. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2012, 145–156

    Google Scholar 

  22. Yang S, Yan X, Zong B, Khan A. Towards effective partition management for large graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2012, 517–528

    Google Scholar 

  23. Bu Y, Howe B, Balazinska M, Ernst M D. Haloop: efficient iterative data processing on large clusters. Proceedings of the VLDB Endowment, 2010, 3(1): 285–296

    Article  Google Scholar 

  24. Logothetis D, Olston C, Reed B, Webb K C, Yocum K. Stateful bulk processing for incremental analytics. In: Proceedings of the 1st ACM Symposium on Cloud Computing. 2010, 51–62

    Chapter  Google Scholar 

  25. Popa L, Budiu M, Yu Y, Isard M. Dryadinc: reusing work in largescale computations. In: Proceedings of Workshop on Hot Topics in Cloud Computing. 2009

    Google Scholar 

  26. Gunda P K, Ravindranath L, Thekkath C A, Yu Y, Zhuang L. Nectar: automatic management of data and computation in datacenters. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2010, 75–88

    Google Scholar 

  27. Bhatotia P, Wieder A, Rodrigues R, Acar U A, Pasquin R. Incoop: MapReduce for incremental computations. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. 2011

    Google Scholar 

  28. Peng D, Dabek F. Large-scale incremental processing using distributed transactions and notifications. In: Proceedings of USENIX Symposium on Operating Systems Design and Implementation. 2010, 251–264

    Google Scholar 

  29. Murray D G, McSherry F, Isaacs R, Isard M, Barham P, Abadi M. Naiad: a timely dataflow system. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 439–455

    Chapter  Google Scholar 

  30. Roy A, Mihailovic I, Zwaenepoel W. X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 472–488

    Chapter  Google Scholar 

  31. Kyrola A, Blelloch G E, Guestrin C. Graphchi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 2012, 31–46

    Google Scholar 

  32. Cheng R, Hong J, Kyrola A, Miao Y, Weng X, Wu M, Yang F, Zhou L, Zhao F, Chen E. Kineograph: taking the pulse of a fast-changing and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems. 2012, 85–98

    Google Scholar 

  33. Zaharia M, Das T, Li H, Hunter T, Shenker S, Stoica I. Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles. 2013, 423–438

    Chapter  Google Scholar 

  34. Çatalyürek Ü V, Aykanat C, Uçar B. On two-dimensional sparse matrix partitioning: models, methods, and a recipe. SIAM Journal on Scientific Computing, 2010, 32(2): 656–683

    Article  MathSciNet  MATH  Google Scholar 

  35. Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: bringing order to the web. Technical Report. 1999

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jianxin Li.

Additional information

Wuyang Ju received the BE degree from the School of Mathematics and Systems Science at Beihang University, China in 2013. He is currently a master candidate in the Department of Computer Science, Beihang University. His research interests include distributed storage system and graph processing.

Jianxin Li is an associate professor at the School of Computer Science and Engineering, Beihang University, China. He received his PhD degree from Beihang University in 2008. He was a visiting scholar in machine learning department of Carnegie Mellon University, USA in 2015, and a visiting researcher of MSRA in 2011. His current research interests include data analysis and processing, distributed systems, and system virtualization.

Weiren Yu received the BE degree from the School of Advanced Engineering at Beihang University, China in 2011. He is currently a PhD candidate in the Department of Computer Science, Beihang University since 2011. His research interests include distributed machine learning systems, scalable graphical models and graph mining models for emerging event detection on social media.

Richong Zhang received his BS Degree and MASc degree from Jilin University, China in 2001 and 2004, respectively. In 2006, he received his MS degree from Dalhousie University, Canada. He received his PhD form the School of Information Technology and Engineering, University of Ottawa, Canada in 2011. He is currently an associate professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include artificial intelligence and data mining.

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ju, W., Li, J., Yu, W. et al. iGraph: an incremental data processing system for dynamic graph. Front. Comput. Sci. 10, 462–476 (2016). https://doi.org/10.1007/s11704-016-5485-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11704-016-5485-7

Keywords

Navigation