OLPGP: An Optimized Label Propagation-Based Distributed Graph Partitioning Algorithm

  • Conference paper
Data Mining and Big Data (DMBD 2022)

Abstract

Graph-structured data has attracted growing attention in the big data era, and distributed systems for graph analysis are widely used to process large graphs. Graph partitioning is critical in parallel and distributed graph processing systems because it balances the computational load and reduces the communication load. An efficient graph partitioning algorithm can therefore significantly improve the performance of large-scale graph data analysis and processing. In this paper, we propose a new Optimized Label Propagation-based distributed Graph Partitioning algorithm (OLPGP). OLPGP optimizes the label propagation algorithm and takes the differences between nodes into account. To improve computational efficiency, we implement OLPGP on the open-source distributed graph processing framework Spark GraphX. Experiments on real-world networks indicate that OLPGP is scalable and achieves higher partition quality than state-of-the-art label propagation-based graph partitioning algorithms.
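The abstract does not give OLPGP's update rules, so the following is only a minimal single-machine sketch of generic balanced label propagation partitioning, the family of methods OLPGP builds on. It is not the authors' algorithm. The function names, the `slack` capacity factor, and the round-robin initialization are illustrative assumptions. Each node adopts the partition label most common among its neighbors, provided the target partition still has spare capacity, and the edge cut is used as a simple proxy for communication load.

```python
from collections import Counter


def edge_cut(edges, part):
    """Count edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def label_propagation_partition(edges, num_nodes, k, max_iters=20, slack=1.1):
    """Generic balanced label propagation (illustrative, not OLPGP).

    edges     : list of undirected (u, v) pairs with 0 <= u, v < num_nodes
    k         : number of partitions
    slack     : assumed balance factor; each partition may hold at most
                int(slack * num_nodes / k) + 1 nodes
    """
    # Build an adjacency list for the undirected graph.
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    capacity = int(slack * num_nodes / k) + 1
    part = [v % k for v in range(num_nodes)]  # round-robin initial labels
    sizes = Counter(part)

    for _ in range(max_iters):
        moved = False
        for v in range(num_nodes):
            counts = Counter(part[u] for u in adj[v])
            current = part[v]
            best, best_score = current, counts.get(current, 0)
            # Move only on a strict gain, and only into a partition
            # that is still below its capacity.
            for label in sorted(counts):
                if (label != current
                        and sizes[label] < capacity
                        and counts[label] > best_score):
                    best, best_score = label, counts[label]
            if best != current:
                sizes[current] -= 1
                sizes[best] += 1
                part[v] = best
                moved = True
        if not moved:  # no node changed its label: converged
            break
    return part
```

On a toy graph of two triangles joined by a single edge, calling `label_propagation_partition(edges, 6, 2)` separates the triangles, leaving only the bridging edge in the cut. A distributed version would exchange label counts as messages between workers instead of reading a shared `part` list.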

Author information

Corresponding author

Correspondence to Bin Wu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ren, H., Wu, B. (2022). OLPGP: An Optimized Label Propagation-Based Distributed Graph Partitioning Algorithm. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_10

  • DOI: https://doi.org/10.1007/978-981-19-9297-1_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-9296-4

  • Online ISBN: 978-981-19-9297-1

  • eBook Packages: Computer Science, Computer Science (R0)
