Advertisement

A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems

  • Hamilton Wilfried Yves AdoniEmail author
  • Tarik Nahhal
  • Moez Krichen
  • Brahim Aghezzaf
  • Abdeltif Elbyed
Article
  • 55 Downloads

Abstract

One of the concepts that attracts attention since entering of big data era is the graph-structured data. Suitable frameworks to handle such data would face several constraints, especially scalability, partitioning challenges, processing complexity and hardware configurations. Unfortunately, although several works deal with big data issues, there is a lack of literature review concerning the challenges related to query answering on large-scale graph data. In this survey paper, we review current problems related to the partitioning and processing of graph-structured data. We discuss existing graph processing systems and provide some insights to know how to choose the right system for parallel and distributed processing of large-scale graph data. Finally, we survey current open challenges in this field.

Keywords

Large-scale graph Big Data Graph processing system Graph partitioning Distributed computing 

Notes

References

  1. 1.
    Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: LinkBench: a database benchmark based on the facebook social graph. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pp. 1185–1196. ACM, New York (2013)Google Scholar
  2. 2.
    Watts, D.J., Strogatz, S.H.: Collective dynamics of ’small-world’ networks. Nature 393(6684), 440–442 (1998)zbMATHCrossRefGoogle Scholar
  3. 3.
    Travers, J., Milgram, S.: An experimental study of the small world problem. Sociometry 32(4), 425–443 (1969)CrossRefGoogle Scholar
  4. 4.
    Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)MathSciNetzbMATHCrossRefGoogle Scholar
  5. 5.
    Albert, R., Barabási, A.-L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74(1), 47 (2002)MathSciNetzbMATHCrossRefGoogle Scholar
  6. 6.
    Abeywickrama, T., Cheema, M.A., Taniar, D.: K-nearest neighbors on road networks: a journey in experimentation and in-memory implementation. Proc. VLDB Endow. 9(6), 492–503 (2016)CrossRefGoogle Scholar
  7. 7.
    Beutel, A.: User behavior modeling with large-scale graph analysis. PhD thesis, University of Trento (2016)Google Scholar
  8. 8.
    Czerepicki, A.: Application of graph databases for transport purposes. Bull. Pol. Acad. Sci. Tech. Sci. 64(3), 457–466 (2016)Google Scholar
  9. 9.
    Miler, M., Medak, D., Odobašióc, D.: The shortest path algorithm performance comparison in graph and relational database on a transportation network. Promet Traffic Transp. 26(1), 75–82 (2014)Google Scholar
  10. 10.
    Have, C.T., Jensen, L.J.: Are graph databases ready for bioinformatics? Bioinformatics 29(24), 3107–3108 (2013)CrossRefGoogle Scholar
  11. 11.
    Yoon, B.-H., Kim, S.-K., Kim, S.-Y.: Use of graph database for the integration of heterogeneous biological data. Genomics Inform. 15(1), 19–27 (2017)CrossRefGoogle Scholar
  12. 12.
    Adoni, W.Y.H., Nahhal, T., Aghezzaf, B., Elbyed, A.: MRA*: parallel and distributed path in large-scale graph using MapReduce-A* based approach. In: Ubiquitous Networking, Lecture Notes in Computer Science, pp. 390–401. Springer, Cham (2017)CrossRefGoogle Scholar
  13. 13.
    Aridhi, S., d’Orazio, L., Maddouri, M., Mephu, N.E.: Density-based data partitioning strategy to approximate large-scale subgraph mining. Inf. Syst. 48, 213–223 (2015)CrossRefGoogle Scholar
  14. 14.
    Lakhotia, K., Kannan, R., Prasanna, V.: Accelerating pagerank using partition-centric processing. In: 2018 USENIX Annual Technical Conference (USENIX ATC 18). USENIX Association, Boston (2018)Google Scholar
  15. 15.
    Plimpton, S.J., Devine, K.D.: MapReduce in MPI for large-scale graph algorithms. Parallel Comput. 37(9), 610–632 (2011)CrossRefGoogle Scholar
  16. 16.
    Kleinberg, J.M., Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.S.: The web as a graph: measurements, models, and methods. In: Computing and Combinatorics, Lecture Notes in Computer Science, pp. 1–17. Springer, Berlin (1999)Google Scholar
  17. 17.
    Maillo, J., Ramírez, S., Triguero, I., Herrera, F.: kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data. Knowl. Based Syst. 117, 3–15 (2016)CrossRefGoogle Scholar
  18. 18.
    Guo, K., Guo, W., Chen, Y., Qiu, Q., Zhang, Q.: Community discovery by propagating local and global information based on the MapReduce model. Inf. Sci. 323, 73–93 (2015)MathSciNetCrossRefGoogle Scholar
  19. 19.
    Moon, S., Lee, J.-G., Kang, M., Choy, M., Lee, J.-W.: Parallel community detection on large graphs with MapReduce and GraphChi. Data Knowl. Eng. 104, 17–31 (2016)CrossRefGoogle Scholar
  20. 20.
    Dhillon, I.S.: Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’01, pp. 269–274. ACM, New York (2001)Google Scholar
  21. 21.
    Junghanns, M., Petermann, A., Neumann, M., Rahm, E.: Management and analysis of big graph data: current systems and open challenges. In: Handbook of Big Data Technologies, pp. 457–505. Springer (2017)Google Scholar
  22. 22.
    Skhiri, S., Jouili, S.: Large graph mining: recent developments, challenges and potential solutions. In: European Business Intelligence Summer School, pp. 103–124. Springer (2012)Google Scholar
  23. 23.
    Adoni, W.Y.H., Nahhal, T., Aghezzaf, B., Elbyed, A.: The MapReduce-based approach to improve the shortest path computation in large-scale road networks: the case of A* algorithm. J. Big Data 5(1), 16 (2018)CrossRefGoogle Scholar
  24. 24.
    Cossalter, M., Mengshoel, O., Selker, T.: Visualizing and understanding large-scale Bayesian networks. In: Proceedings of the 17th AAAI Conference on Scalable Integration of Analytics and Visualization, AAAIWS’11-17, pp. 12–21. AAAI Press, Menlo Park (2011)Google Scholar
  25. 25.
    Gantz, J., Reinsel, D.: Extracting value from chaos. IDC iView 1142(2011), 1–12 (2011)Google Scholar
  26. 26.
    Alekseev, V.E., Boliac, R., Korobitsyn, D.V., Lozin, V.V.: NP-hard graph problems and boundary classes of graphs. Theor. Comput. Sci. 389(1), 219–236 (2007)MathSciNetzbMATHCrossRefGoogle Scholar
  27. 27.
    Cameron, K., Eschen, E.M., Hoáng, C.T., Sritharan, R.: The complexity of the list partition problem for graphs. SIAM J. Discret. Math. 21(4), 900–929 (2008)MathSciNetzbMATHCrossRefGoogle Scholar
  28. 28.
    Yan, D., Tian, Y., Cheng, J.: Systems for Big Graph Analytics. Springer Briefs in Computer Science. Springer, Cham (2017)CrossRefGoogle Scholar
  29. 29.
    Goel, A.: Neo4j Cookbook Harness the Power of Neo4j to Perform Complex Data Analysis over the Course of 75 Easy-to-Follow Recipes. Packt Publishing, Birmingham (2015)Google Scholar
  30. 30.
    Guerrieri, A.: Distributed computing for large-scale graphs. PhD thesis, University of Trento (2015)Google Scholar
  31. 31.
    Yan, D., Huang, L., Jordan, M.I.: Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pp. 907–916. ACM, New York (2009)Google Scholar
  32. 32.
    Martin, C.H.: Spectral clustering: a quick overview. PhD thesis (2012)Google Scholar
  33. 33.
    Filippone, M., Camastra, F., Masulli, F., Rovetta, S.: A survey of kernel and spectral methods for clustering. Pattern Recognit. 41(1), 176–190 (2008)zbMATHCrossRefGoogle Scholar
  34. 34.
    Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, NIPS’01, pp. 849–856. MIT Press, Cambridge (2001)Google Scholar
  35. 35.
    Kong, H., Akakin, H.C., Sarma, S.E.: A generalized Laplacian of Gaussian filter for Blob detection and its applications. IEEE Trans. Cybern. 43(6), 1719–1733 (2013)CrossRefGoogle Scholar
  36. 36.
    Kamvar, S.D., Klein, D., Manning, C.D.: Spectral learning. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI’03, pp. 561–566. Morgan Kaufmann Publishers Inc., San Francisco (2003)Google Scholar
  37. 37.
    Dhillon, I.S., Guan, Y., Kulis, B.: Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pp. 551–556. ACM, New York (2004)Google Scholar
  38. 38.
    Qiu, Y., Li, R., Li, J., Qiao, S., Wang, G., Yu, J.X., Mao, R.: Efficient structural clustering on probabilistic graphs. IEEE Trans. Knowl. Data Eng. 31, 1555–1568 (2018)Google Scholar
  39. 39.
    Aggarwal, C.C., Wang, H.: A survey of clustering algorithms for graph data. In: Aggarwal, C.C., Wang, H. (eds.) Managing and Mining Graph Data, vol. 40, pp. 275–301. Springer, Boston (2010)zbMATHCrossRefGoogle Scholar
  40. 40.
    Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)zbMATHCrossRefGoogle Scholar
  41. 41.
    Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. In: Proceedings of the 19th Design Automation Conference, DAC ’82, pp. 175–181. IEEE Press, Piscataway (1982)Google Scholar
  42. 42.
    Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 359–392 (1998)MathSciNetzbMATHCrossRefGoogle Scholar
  43. 43.
    Karypis, G., Kumar, V.: Multilevel algorithms for multi-constraint graph partitioning. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, SC ’98, pp. 1–13. IEEE Computer Society, Washington, DC (1998)Google Scholar
  44. 44.
    Karypis, G., Kumar, V.: Multilevel K-way hypergraph partitioning. In: Proceedings of the 36th Annual ACM/IEEE Design Automation Conference, DAC ’99, pp. 343–348. ACM, New York (1999)Google Scholar
  45. 45.
    Schloegel, K., Karypis, G., Kumar, V.: Parallel multilevel algorithms for multi-constraint graph partitioning. In: Euro-Par 2000 Parallel Processing. Lecture Notes in Computer Science, pp. 296–310. Springer, Berlin (2000)CrossRefGoogle Scholar
  46. 46.
    Apache Spark-Lightning-Fast Cluster Computing. https://spark.apache.org/
  47. 47.
    Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10(10), 95 (2010)Google Scholar
  48. 48.
    Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 31–46. USENIX Association, Berkeley (2012)Google Scholar
  49. 49.
    Johnson, D.S., Aragon, C.R., McGeoch, L.A., Schevon, C.: Optimization by simulated annealing: an experimental evaluation; part I, graph partitioning. Oper. Res. 37(6), 865–892 (1989)zbMATHCrossRefGoogle Scholar
  50. 50.
    Rolland, E., Pirkul, H., Glover, F.: Tabu search for graph partitioning. Ann. Oper. Res. 63, 209–232 (1996)zbMATHCrossRefGoogle Scholar
  51. 51.
    Bui, T.N., Strite, L.C.: An ant system algorithm for graph bisection. In: Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, GECCO’02, pp. 43–51. Morgan Kaufmann Publishers Inc., San Francisco (2002)Google Scholar
  52. 52.
    Maini, H., Mehrotra, K., Mohan, C., Ranka, S.: Genetic algorithms for graph partitioning and incremental graph partitioning. In: Proceedings of the 1994 ACM/IEEE Conference on Supercomputing, Supercomputing ’94, pp. 449–457. IEEE Computer Society Press, Los Alamitos (1994)Google Scholar
  53. 53.
    Kim, J., Hwang, I., Kim, Y.-H., Moon, B.-R.: Genetic approaches for graph partitioning: a survey. In: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, GECCO ’11, pp. 473–480. ACM, New York (2011)Google Scholar
  54. 54.
    Chen, R., Weng, X., He, B., Choi, B., Yang, M.: Network Performance Aware Graph Partitioning for Large Graph Processing Systems in the Cloud. Nanyang Technological University, Singapore (2014)CrossRefGoogle Scholar
  55. 55.
    Aggarwal, C.C., Zhao, Y., Yu, P.S.: A framework for clustering massive graph streams. Stat. Anal. Data Min. 3(6), 399–416 (2010)MathSciNetCrossRefGoogle Scholar
  56. 56.
    Tsourakakis, C., Gkantsidis, C., Radunovic, B., Vojnovic, M.: FENNEL: streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM ’14, pp. 333–342. ACM, New York (2014)Google Scholar
  57. 57.
    Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 17–30. USENIX Association, Berkeley (2012)Google Scholar
  58. 58.
    Rahimian, F., Payberah, A.H., Girdzijauskas, S., Jelasity, M., Haridi, S.: A distributed algorithm for large-scale graph partitioning. ACM Trans. Auton. Adapt. Syst. 10(2), 1–24 (2015)CrossRefGoogle Scholar
  59. 59.
    Rahimian, F., Payberah, A.H., Girdzijauskas, S., Haridi, S.: Distributed vertex-cut partitioning. In: IFIP International Conference on Distributed Applications and Interoperable Systems, pp. 186–200. Springer (2014)Google Scholar
  60. 60.
    Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pp. 1222–1230. ACM, New York (2012)Google Scholar
  61. 61.
    Tashkova, K., Koros̆ec, P., S̆ilc, J.: A distributed multilevel ant-colony algorithm for the multi-way graph partitioning. Int. J. Bio-Inspired Comput. 3(5), 286–296 (2011)CrossRefGoogle Scholar
  62. 62.
    White, T.: Hadoop: The Definitive Guide, 3rd edn. O’Reilly, Beijing (2012)Google Scholar
  63. 63.
    Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRefGoogle Scholar
  64. 64.
    Vavilapalli, V.K., Seth, S., Saha, B., Curino, C., O’Malley, O., Radia, S., Reed, B., Baldeschwieler, E., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H.: Apache Hadoop YARN: Yet Another Resource Negotiator, pp. 1–16. ACM Press, Santa Clara (2013)Google Scholar
  65. 65.
    Al hajj Hassan, M., Bamha, M.: Handling Limits of High Degree Vertices in Graph Processing Using MapReduce and Pregel, Research Report. Université Orléans, INSA Centre Val de Loire, LIFO EA 4022, Orléans (2017)Google Scholar
  66. 66.
    Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRefGoogle Scholar
  67. 67.
    Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. Proc. VLDB Endow. 7(3), 193–204 (2013)CrossRefGoogle Scholar
  68. 68.
    Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pp. 135–146. ACM, New York (2010)Google Scholar
  69. 69.
    Sagharichian, M., Naderi, H., Haghjoo, M.: ExPregel: a new computational model for large-scale graph processing. Concur. Comput. Pract. Exp. 27(17), 4954–4969 (2015)CrossRefGoogle Scholar
  70. 70.
    Ching, A.: Giraph: large-scale graph processing infrastructure on Hadoop. In: Proceedings of the Hadoop Summit, Vol. 11 of 3, Santa Clara, pp. 5–9 (2011)Google Scholar
  71. 71.
    Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.: GraphLab: A new framework for parallel machine learning. In: Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI’10, pp. 340–349. AUAI Press, Arlington (2010)Google Scholar
  72. 72.
    Salihoglu, S., Widom, J.: Gps: a graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management, SSDBM, vol. 22, pp. 1–12. ACM, New York (2013)Google Scholar
  73. 73.
    Yan, D., Cheng, J., Lu, Y., Ng, W.: Blogel: a block-centric framework for distributed computation on real-world graphs. Proc. VLDB Endow. 7(14), 1981–1992 (2014)CrossRefGoogle Scholar
  74. 74.
    Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: Graphx: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES ’13, pp. 1–6. ACM, New York (2013)Google Scholar
  75. 75.
    Junghanns, M., Petermann, A., Gómez, K., Rahm, E.: GRADOOP: Scalable Graph Data Management and Analytics with Hadoop. CoRR abs/1506.00548Google Scholar
  76. 76.
    Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google file system. In: ACM SIGOPS Operating Systems Review, vol. 37, pp. 29–43. ACM, New York (2003)CrossRefGoogle Scholar
  77. 77.
    Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: Nikoletseas, S.E. (ed.) Experimental and Efficient Algorithms. Lecture Notes in Computer Science, pp. 606–609. Springer, Berlin (2005)zbMATHCrossRefGoogle Scholar
  78. 78.
    Jain, N., Liao, G., Willke, T.L.: Graphbuilder: scalable graph ETL framework. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES ’13, pp. 1–6. ACM, New York (2013)Google Scholar
  79. 79.
    Chonbodeechalermroong, A., Hewett, R.: Towards visualizing big data with large-scale edge constraint graph drawing. Big Data Res. 10, 21–32 (2017)CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Faculty of sciencesHassan II University of CasablancaCasablancaMorocco
  2. 2.Faculty of CSITAlbaha UniversityAlbahaSaudi Arabia
  3. 3.ReDCAD LaboratoryUniversity of SfaxSfaxTunisia

Personalised recommendations