Abstract
One of the concepts that have attracted attention since entering the big data era is graph-structured data. Distributed systems for graph analysis are widely used to process large graphs. Graph partitioning is critical in parallel and distributed graph processing systems because it can balance the computational load and reduce communication load. An efficient graph partitioning algorithm can significantly improve the performance of large-scale graph data analysis and processing. In this paper, we propose a new Optimized Label Propagation-based distributed Graph Partitioning algorithm (OLPGP). OLPGP optimizes the label propagation algorithm and considers the differences between nodes. To improve computational efficiency, we implement OLPGP on the open-source distributed graph processing framework Spark GraphX. Conducted experiments on real-world networks indicate that OLPGP is scalable and achieves higher partition quality than the state-of-the-art label propagation-based graph partitioning algorithms.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Apache Giraph Project. https://giraph.apache.org/. Accessed 6 Jan 2022
Adoni, H.W.Y., Nahhal, T., Krichen, M., Aghezzaf, B., Elbyed, A.: A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems. Distrib. Parallel Databases 38(2), 495–530 (2020)
Akhremtsev, Y., Sanders, P., Schulz, C.: High-quality shared-memory graph partitioning. IEEE Trans. Parallel Distrib. Syst. 31(11), 2710–2722 (2020)
Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74(1), 47 (2002)
Awadelkarim, A., Ugander, J.: Prioritized restreaming algorithms for balanced graph partitioning. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1877–1887 (2020)
Bui, T.N., Jones, C.: Finding good approximate vertex and edge partitions is NP-hard. Inf. Process. Lett. 42(3), 153–159 (1992)
Chen, R., Shi, J., Chen, Y., Zang, B., Guan, H., Chen, H.: PowerLyra: differentiated graph computation and partitioning on skewed graphs. ACM Trans. Parallel Comput. (TOPC) 5(3), 1–39 (2019)
Chevalier, C., Pellegrini, F.: PT-scotch: a tool for efficient parallel graph ordering. Parallel Comput. 34(6–8), 318–331 (2008)
El Moussawi, A., Seghouani, N.B., Bugiotti, F.: B-GRAP: balanced graph partitioning algorithm for large graphs. J. Data Intell. 2(2), 116–135 (2021)
Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete problems. In: Proceedings of the Sixth Annual ACM Symposium on Theory of Computing, pp. 47–63 (1974)
Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 2012), pp. 17–30 (2012)
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: graph processing in a distributed dataflow framework. In: 11th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 2014), pp. 599–613 (2014)
Gregory, S.: Finding overlapping communities in networks by label propagation. New J. Phys. 12(10), 103018 (2010)
Jafari, N., Selvitopi, O., Aykanat, C.: Fast shared-memory streaming multilevel graph partitioning. J. Parallel Distrib. Comput. 147, 140–151 (2021)
Karypis, G., Kumar, V.: Multilevel graph partitioning schemes. In: ICPP (3), pp. 113–122 (1995)
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
Karypis, G., Kumar, V.: A parallel algorithm for multilevel graph partitioning and sparse matrix ordering. J. Parallel Distrib. Comput. 48(1), 71–95 (1998)
Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)
Leskovec, J., Huttenlocher, D., Kleinberg, J.: Signed networks in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1361–1370 (2010)
Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 177–187 (2005)
Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6(1), 29–123 (2009)
Leskovec, J., Sosič, R.: SNAP: a general-purpose network analysis and graph-mining library. ACM Trans. Intell. Syst. Technol. (TIST) 8(1), 1 (2016)
Low, Y., Gonzalez, J.E., Kyrola, A., Bickson, D., Guestrin, C.E., Hellerstein, J.: Graphlab: a new framework for parallel machine learning. arXiv preprint arXiv:1408.2041 (2014)
Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146 (2010)
Martella, C., Logothetis, D., Loukas, A., Siganos, G.: Spinner: scalable graph partitioning in the cloud. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1083–1094. IEEE (2017)
Mayer, C., et al.: Adwise: adaptive window-based streaming edge partitioning for high-speed graph processing. In: 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 685–695. IEEE (2018)
Mayer, R., Jacobsen, H.A.: Hybrid edge partitioner: partitioning large power-law graphs under memory constraints. In: Proceedings of the 2021 International Conference on Management of Data, pp. 1289–1302 (2021)
Meyerhenke, H., Sanders, P., Schulz, C.: Parallel graph partitioning for complex networks. IEEE Trans. Parallel Distrib. Syst. 28(9), 2625–2638 (2017)
Mofrad, M.H., Melhem, R., Hammoud, M.: Revolver: vertex-centric graph partitioning using reinforcement learning. In: 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), pp. 818–821. IEEE (2018)
Pellegrini, F., Roman, J.: Scotch: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds.) HPCN-Europe 1996. LNCS, vol. 1067, pp. 493–498. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61142-8_588
Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007)
Sajjad, H.P., Payberah, A.H., Rahimian, F., Vlassov, V., Haridi, S.: Boosting vertex-cut partitioning for streaming graphs. In: 2016 IEEE International Congress on Big Data (BigData Congress), pp. 1–8. IEEE (2016)
Sanders, P., Schulz, C.: Engineering multilevel graph partitioning algorithms. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 469–480. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23719-5_40
Sanders, P., Schulz, C.: Distributed evolutionary graph partitioning. In: 2012 Proceedings of the Fourteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 16–29. SIAM (2012)
Slota, G.M., Madduri, K., Rajamanickam, S.: PuLP: scalable multi-objective multi-constraint partitioning for small-world networks. In: 2014 IEEE International Conference on Big Data (Big Data), pp. 481–490. IEEE (2014)
Slota, G.M., Rajamanickam, S., Devine, K., Madduri, K.: Partitioning trillion-edge graphs in minutes. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 646–655. IEEE (2017)
Stanton, I.: Streaming balanced graph partitioning algorithms for random graphs. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1287–1301. SIAM (2014)
Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1222–1230 (2012)
Tsourakakis, C., Gkantsidis, C., Radunovic, B., Vojnovic, M.: Fennel: streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pp. 333–342 (2014)
Van Laarhoven, P.J., Aarts, E.H.: Simulated annealing. In: Van Laarhoven, P.J., Aarts, E.H (eds.) Simulated Annealing: Theory and Applications, pp. 7–15. Springer, Dordrecht (1987). https://doi.org/10.1007/978-94-015-7744-1_2
Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015)
Zafarani, R., Liu, H.: Social computing data repository at ASU (2009). http://socialcomputing.asu.edu
Zafarani, R., Liu, H.: Social computing data repository at ASU (2009)
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 2010) (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ren, H., Wu, B. (2022). OLPGP: An Optimized Label Propagation-Based Distributed Graph Partitioning Algorithm. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_10
Download citation
DOI: https://doi.org/10.1007/978-981-19-9297-1_10
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer ScienceComputer Science (R0)