Abstract
In many big data sets, such as social network data, relations among data items can be modeled as graphs. This modeling produces big graphs with many vertices and edges. Balanced k-way graph partitioning is a common problem on such graphs and has applications in many fields. Many approximate solutions exist for this problem; however, most of them do not scale to big graphs and cannot be executed in a distributed manner. The vertex-centric model has recently been introduced as a scalable distributed processing method for big graphs, and a few graph partitioning methods are based on it. Existing approaches consider only the one-step neighbors of a vertex and ignore neighbors that are more steps away. In this paper, a distributed method based on the vertex-centric model is introduced for balanced k-way graph partitioning. The method uses the personalized PageRank vectors of vertices and partitions to decide which partition each vertex joins. It has been implemented in the Giraph system and evaluated on several synthetic and real graphs. Experimental results show that the method scales to big graphs. They also show that it produces higher-quality partitions than state-of-the-art stream-based methods and distributed methods based on the vertex-centric programming model, and that its results are close to those of the Metis method.
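The abstract does not spell out the assignment rule, so the following is only a minimal, centralized Python sketch of the general idea under stated assumptions; it is not the authors' vertex-centric Giraph algorithm. It approximates personalized PageRank (PPR) by power iteration with restart probability `alpha`, uses one hypothetical seed set per partition as that partition's PPR source, and greedily assigns each vertex to the highest-scoring partition that still has room under a capacity of ceil(n/k). The helper names (`personalized_pagerank`, `assign_balanced`, `edge_cut`), the seeding, and the greedy capacity rule are all illustrative assumptions.

```python
from math import ceil

# Hypothetical sketch: NOT the paper's Giraph implementation.


def personalized_pagerank(adj, seeds, alpha=0.15, iters=100):
    """Approximate a personalized PageRank vector by power iteration.

    adj:   dict mapping each vertex to a list of its neighbours (undirected).
    seeds: vertices the random walk restarts to (uniformly).
    alpha: restart (teleport) probability.
    """
    restart = {v: 0.0 for v in adj}
    for s in seeds:
        restart[s] += 1.0 / len(seeds)
    p = dict(restart)
    for _ in range(iters):
        # Mass sitting on vertices with no neighbours is sent back to the seeds.
        dangling = sum(p[u] for u, nbrs in adj.items() if not nbrs)
        nxt = {v: (alpha + (1 - alpha) * dangling) * restart[v] for v in adj}
        for u, nbrs in adj.items():
            if nbrs:
                share = (1 - alpha) * p[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
        p = nxt
    return p


def assign_balanced(adj, seeds_per_part, alpha=0.15, iters=100):
    """Assign every vertex to one of k partitions using PPR scores.

    seeds_per_part: list of k seed-vertex lists, one per partition (an assumed
    seeding; the paper's own seeding and update rules are not reproduced here).
    Each partition holds at most ceil(n / k) vertices.
    """
    k = len(seeds_per_part)
    capacity = ceil(len(adj) / k)
    scores = [personalized_pagerank(adj, s, alpha, iters) for s in seeds_per_part]
    # All (score, vertex, partition) triples, strongest affinity first.
    candidates = sorted(
        ((scores[i][v], v, i) for i in range(k) for v in adj),
        key=lambda t: t[0],
        reverse=True,
    )
    assignment, sizes = {}, [0] * k
    for _, v, i in candidates:
        if v not in assignment and sizes[i] < capacity:
            assignment[v] = i
            sizes[i] += 1
    return assignment


def edge_cut(adj, assignment):
    """Count undirected edges whose endpoints lie in different partitions."""
    cut = sum(1 for u, nbrs in adj.items() for v in nbrs
              if assignment[u] != assignment[v])
    return cut // 2  # each undirected edge is seen from both endpoints


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    parts = assign_balanced(g, [[0], [5]])
    print(parts, "edge cut:", edge_cut(g, parts))
```

Because PPR mass spreads along multi-step random walks, a vertex's score reflects vertices several hops away rather than only its direct neighbors, which is the property the abstract emphasizes. On the two-triangle example, each triangle should land in its own partition, giving an edge cut of 1. Note that some libraries use `alpha` for the damping (continue) probability rather than the restart probability used here.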
Ethics declarations
Funding
No funding was received by the authors.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Mazaheri Soudani, N., Fatemi, A. & Nematbakhsh, M. PPR-partitioning: a distributed graph partitioning algorithm based on the personalized PageRank vectors in vertex-centric systems. Knowl Inf Syst 61, 847–871 (2019). https://doi.org/10.1007/s10115-019-01328-3