Parallel Processing of Graphs

  • Bin Shao
  • Yatao Li
Part of the Data-Centric Systems and Applications book series (DCSA)


Graphs play an indispensable role in a wide range of application domains. Graph processing at scale, however, faces challenges at all levels, from system architectures to programming models. In this chapter, we review the challenges of parallel processing of large graphs, representative graph processing systems, general principles for designing large graph processing systems, and various graph computation paradigms. Graph processing covers a wide range of topics, and graphs can be represented in different forms; different graph representations lead to different computation paradigms and system architectures. This chapter therefore also briefly introduces a few alternative forms of graph representation besides the adjacency list.





Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Microsoft Research Asia, Beijing, China
