Skip to main content
Log in

GraphMap: scalable iterative graph processing using NoSQL

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Despite having several distributed graph processing frameworks, scalable iterative processing of large graphs is a challenging problem since the graph and intermediate data need a global view of the graph topology in distributed memory. Although some systems support out-of-core iterative computations, they use a single machine and often require fast storage. In this paper, we present a new distributed iterative graph computation framework, called GraphMap, that utilizes a disk-based NoSQL database system for scalable graph processing while ensuring competitive performance. Extensive experiments on several real-world graphs show that GraphMap is more scalable and often faster than existing distributed memory-based systems for various graph processing workloads.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  1. Apache Giraph. http://giraph.apache.org/. Accessed 12 Mar 2019

  2. Apache Hama. https://hama.apache.org/. Accessed 12 Mar 2019

  3. Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’06. ACM, New York, NY, USA, pp 44–54. https://doi.org/10.1145/1150402.1150412

  4. Boldi P, Vigna S (2004) The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp 595–601

  5. Bu Y, Borkar V, Jia J, Carey MJ, Condie T (2014) Pregelix: Big(Ger) graph analytics on a dataflow engine. Proc VLDB Endow 8(2):161–172

    Article  Google Scholar 

  6. Facebook Reports Third Quarter 2019 Results. https://investor.fb.com/investor-news/press-release-details/2019/Facebook-Reports-Third-Quarter-2019-Results/default.aspx. Accessed 12 Mar 2019

  7. Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, USA, pp 17–30. http://dl.acm.org/citation.cfm?id=2387880.2387883. Accessed 12 Mar 2019

  8. Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, USA, pp 599–613. http://dl.acm.org/citation.cfm?id=2685048.2685096. Accessed 12 Mar 2019

  9. Goswami S, Das AK, Płatania R, Lee K, Park SJ (2016) Lazer: Distributed memory-efficient assembly of large-scale genomes. In: 2016 IEEE International Conference on Big Data (Big Data), pp 1171–1181. https://doi.org/10.1109/BigData.2016.7840721

  10. Han WS, Lee S, Park K, Lee JH, Kim MS, Kim J, Yu H (2013) TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, USA, pp 77–85. https://doi.org/10.1145/2487575.2487581

  11. Hundt R (2011) Loop recognition in C++/Java/go/scala. Proc Scala Days 2011:38

    Google Scholar 

  12. Jia Z, Kwon Y, Shipman G, McCormick P, Erez M, Aiken A (2017) A distributed multi-GPU system for fast graph processing. Proc VLDB Endow 11(3):297–310

    Article  Google Scholar 

  13. Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392. https://doi.org/10.1137/S1064827595287997

    Article  MathSciNet  MATH  Google Scholar 

  14. Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, WWW’10. ACM, New York, NY, USA, pp 591–600. https://doi.org/10.1145/1772690.1772751

  15. Kyrola A, Blelloch G, Guestrin C (2012) GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12. USENIX Association, Berkeley, CA, USA, pp 31–46. http://dl.acm.org/citation.cfm?id=2387880.2387884

  16. Lee K, Liu L (2013) Scaling queries over Big RDF graphs with semantic hash partitioning. Proc VLDB Endow 6(14):1894–1905

    Article  Google Scholar 

  17. Lee K, Liu L, Schwan K, Pu C, Zhang Q, Zhou Y, Yigitoglu E, Yuan P (2015) Scaling iterative graph computations with GraphMap. In: SC15: International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–12. https://doi.org/10.1145/2807591.2807604

  18. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD’05). ACM, New York, NY, USA, pp 177–187. https://doi.org/10.1145/1081870.1081893

  19. Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc VLDB Endow 5(8):716–727

    Article  Google Scholar 

  20. Maass S, Min C, Kashyap S, Kang W, Kumar M, Kim T (2017) Mosaic: processing a trillion-edge graph on a single machine. In: Proceedings of the Twelfth European Conference on Computer Systems, EuroSys’17. ACM, New York, NY, USA, pp 527–543. https://doi.org/10.1145/3064176.3064191

  21. Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, New York, NY, USA, pp 135–146. https://doi.org/10.1145/1807167.1807184

  22. Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07), San Diego, CA

  23. Pan Y, Wang Y, Wu Y, Yang C, Owens JD (2017) Multi-GPU graph analytics. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, pp 479–490

  24. Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W (2015) Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles. ACM, pp 410–424

  25. Roy A, Mihailovic I, Zwaenepoel W (2013) X-Stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP’13). ACM, New York, NY, USA, pp 472–488. https://doi.org/10.1145/2517349.2522740

  26. Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, USA, pp 505–516. https://doi.org/10.1145/2463676.2467799

  27. Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph. Proc VLDB Endow 7(3):193–204

    Article  Google Scholar 

  28. Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111. https://doi.org/10.1145/79173.79181

    Article  Google Scholar 

  29. White B, Lepreau J, Stoller L, Ricci R, Guruprasad S, Newbold M, Hibler M, Barb C, Joglekar A (2002) An integrated experimental environment for distributed systems and networks. In: Proceedings of the 5th Symposium on Operating Systems Design and implementation (OSDI’02). ACM, New York, NY, USA, pp 255–270. https://doi.org/10.1145/1060289.1060313

  30. Yan D, Huang Y, Liu M, Chen H, Cheng J, Wu H, Zhang C (2018) Graphd: distributed vertex-centric graph processing beyond the memory limit. IEEE Trans Parallel Distrib Syst 29(1):99–114

    Article  Google Scholar 

  31. Yuan P, Zhang W, Xie C, Jin H, Liu L, Lee K (2014) Fast iterative graph computation: a path centric approach. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, USA, pp 401–412. https://doi.org/10.1109/SC.2014.38

  32. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821–829

    Article  Google Scholar 

  33. Zhang Y, Liao X, Jin H, He B, Liu H, Gu L (2019) Digraph: an efficient path-based iterative directed graph processing system on multiple GPUs. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, pp 601–614

  34. Zheng D, Mhembere D, Burns R, Vogelstein J, Priebe CE, Szalay AS (2015) FlashGraph: processing billion-node graphs on an array of commodity SSDs. In: 13th USENIX Conference on File and Storage Technologies (FAST 15). USENIX Association, Santa Clara, CA, pp 45–58. https://www.usenix.org/conference/fast15/technical-sessions/presentation/zheng. Accessed 12 Mar 2019

  35. Zhou Y, Liu L, Lee K, Zhang Q (2015) Graphtwist: fast iterative graph computation with two-tier optimizations. Proc VLDB Endow 8(11):1262–1273. https://doi.org/10.14778/2809974.2809987

    Article  Google Scholar 

Download references

Acknowledgements

Funding was provided by Louisiana Board of Regents (Grant No. LEQSF(2016-19)-RD-A-08) and National Science Foundation (Grant No. IBSS-L-1620451 and RAPID-1762600).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sayan Goswami.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Goswami, S., Pokhrel, A., Lee, K. et al. GraphMap: scalable iterative graph processing using NoSQL. J Supercomput 76, 6619–6647 (2020). https://doi.org/10.1007/s11227-019-03097-w

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-019-03097-w

Keywords

Navigation