Advertisement

GraphMap: scalable iterative graph processing using NoSQL

  • Sayan GoswamiEmail author
  • Ayam Pokhrel
  • Kisung Lee
  • Ling Liu
  • Qi Zhang
  • Yang Zhou
Article
  • 18 Downloads

Abstract

Despite having several distributed graph processing frameworks, scalable iterative processing of large graphs is a challenging problem since the graph and intermediate data need a global view of the graph topology in distributed memory. Although some systems support out-of-core iterative computations, they use a single machine and often require fast storage. In this paper, we present a new distributed iterative graph computation framework, called GraphMap, that utilizes a disk-based NoSQL database system for scalable graph processing while ensuring competitive performance. Extensive experiments on several real-world graphs show that GraphMap is more scalable and often faster than existing distributed memory-based systems for various graph processing workloads.

Keywords

Graph processing Distributed systems NoSQL 

Notes

Acknowledgements

Funding was provided by Louisiana Board of Regents (Grant No. LEQSF(2016-19)-RD-A-08) and National Science Foundation (Grant No. IBSS-L-1620451 and RAPID-1762600).

References

  1. 1.
    Apache Giraph. http://giraph.apache.org/. Accessed 12 Mar 2019
  2. 2.
    Apache Hama. https://hama.apache.org/. Accessed 12 Mar 2019
  3. 3.
    Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’06. ACM, New York, NY, USA, pp 44–54.  https://doi.org/10.1145/1150402.1150412
  4. 4.
    Boldi P, Vigna S (2004) The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp 595–601Google Scholar
  5. 5.
    Bu Y, Borkar V, Jia J, Carey MJ, Condie T (2014) Pregelix: Big(Ger) graph analytics on a dataflow engine. Proc VLDB Endow 8(2):161–172CrossRefGoogle Scholar
  6. 6.
  7. 7.
    Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, USA, pp 17–30. http://dl.acm.org/citation.cfm?id=2387880.2387883. Accessed 12 Mar 2019
  8. 8.
    Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI’14). USENIX Association, Berkeley, CA, USA, pp 599–613. http://dl.acm.org/citation.cfm?id=2685048.2685096. Accessed 12 Mar 2019
  9. 9.
    Goswami S, Das AK, Płatania R, Lee K, Park SJ (2016) Lazer: Distributed memory-efficient assembly of large-scale genomes. In: 2016 IEEE International Conference on Big Data (Big Data), pp 1171–1181.  https://doi.org/10.1109/BigData.2016.7840721
  10. 10.
    Han WS, Lee S, Park K, Lee JH, Kim MS, Kim J, Yu H (2013) TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). ACM, New York, NY, USA, pp 77–85.  https://doi.org/10.1145/2487575.2487581
  11. 11.
    Hundt R (2011) Loop recognition in C++/Java/go/scala. Proc Scala Days 2011:38Google Scholar
  12. 12.
    Jia Z, Kwon Y, Shipman G, McCormick P, Erez M, Aiken A (2017) A distributed multi-GPU system for fast graph processing. Proc VLDB Endow 11(3):297–310CrossRefGoogle Scholar
  13. 13.
    Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392.  https://doi.org/10.1137/S1064827595287997 MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, WWW’10. ACM, New York, NY, USA, pp 591–600.  https://doi.org/10.1145/1772690.1772751
  15. 15.
    Kyrola A, Blelloch G, Guestrin C (2012) GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12. USENIX Association, Berkeley, CA, USA, pp 31–46. http://dl.acm.org/citation.cfm?id=2387880.2387884
  16. 16.
    Lee K, Liu L (2013) Scaling queries over Big RDF graphs with semantic hash partitioning. Proc VLDB Endow 6(14):1894–1905CrossRefGoogle Scholar
  17. 17.
    Lee K, Liu L, Schwan K, Pu C, Zhang Q, Zhou Y, Yigitoglu E, Yuan P (2015) Scaling iterative graph computations with GraphMap. In: SC15: International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1–12.  https://doi.org/10.1145/2807591.2807604
  18. 18.
    Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD’05). ACM, New York, NY, USA, pp 177–187.  https://doi.org/10.1145/1081870.1081893
  19. 19.
    Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc VLDB Endow 5(8):716–727CrossRefGoogle Scholar
  20. 20.
    Maass S, Min C, Kashyap S, Kang W, Kumar M, Kim T (2017) Mosaic: processing a trillion-edge graph on a single machine. In: Proceedings of the Twelfth European Conference on Computer Systems, EuroSys’17. ACM, New York, NY, USA, pp 527–543.  https://doi.org/10.1145/3064176.3064191
  21. 21.
    Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, New York, NY, USA, pp 135–146.  https://doi.org/10.1145/1807167.1807184
  22. 22.
    Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07), San Diego, CAGoogle Scholar
  23. 23.
    Pan Y, Wang Y, Wu Y, Yang C, Owens JD (2017) Multi-GPU graph analytics. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, pp 479–490Google Scholar
  24. 24.
    Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W (2015) Chaos: scale-out graph processing from secondary storage. In: Proceedings of the 25th Symposium on Operating Systems Principles. ACM, pp 410–424Google Scholar
  25. 25.
    Roy A, Mihailovic I, Zwaenepoel W (2013) X-Stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP’13). ACM, New York, NY, USA, pp 472–488.  https://doi.org/10.1145/2517349.2522740
  26. 26.
    Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, USA, pp 505–516.  https://doi.org/10.1145/2463676.2467799
  27. 27.
    Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph. Proc VLDB Endow 7(3):193–204CrossRefGoogle Scholar
  28. 28.
    Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111.  https://doi.org/10.1145/79173.79181 CrossRefGoogle Scholar
  29. 29.
    White B, Lepreau J, Stoller L, Ricci R, Guruprasad S, Newbold M, Hibler M, Barb C, Joglekar A (2002) An integrated experimental environment for distributed systems and networks. In: Proceedings of the 5th Symposium on Operating Systems Design and implementation (OSDI’02). ACM, New York, NY, USA, pp 255–270.  https://doi.org/10.1145/1060289.1060313
  30. 30.
    Yan D, Huang Y, Liu M, Chen H, Cheng J, Wu H, Zhang C (2018) Graphd: distributed vertex-centric graph processing beyond the memory limit. IEEE Trans Parallel Distrib Syst 29(1):99–114CrossRefGoogle Scholar
  31. 31.
    Yuan P, Zhang W, Xie C, Jin H, Liu L, Lee K (2014) Fast iterative graph computation: a path centric approach. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Piscataway, NJ, USA, pp 401–412.  https://doi.org/10.1109/SC.2014.38
  32. 32.
    Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821–829CrossRefGoogle Scholar
  33. 33.
    Zhang Y, Liao X, Jin H, He B, Liu H, Gu L (2019) Digraph: an efficient path-based iterative directed graph processing system on multiple GPUs. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, pp 601–614Google Scholar
  34. 34.
    Zheng D, Mhembere D, Burns R, Vogelstein J, Priebe CE, Szalay AS (2015) FlashGraph: processing billion-node graphs on an array of commodity SSDs. In: 13th USENIX Conference on File and Storage Technologies (FAST 15). USENIX Association, Santa Clara, CA, pp 45–58. https://www.usenix.org/conference/fast15/technical-sessions/presentation/zheng. Accessed 12 Mar 2019
  35. 35.
    Zhou Y, Liu L, Lee K, Zhang Q (2015) Graphtwist: fast iterative graph computation with two-tier optimizations. Proc VLDB Endow 8(11):1262–1273.  https://doi.org/10.14778/2809974.2809987 CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Louisiana State UniversityShreveportUSA
  2. 2.Louisiana State UniversityBaton RougeUSA
  3. 3.Georgia Institute of TechnologyAtlantaUSA
  4. 4.IBM Thomas J. Watson Research CenterNew YorkUSA
  5. 5.Auburn UniversityAuburnUSA

Personalised recommendations