An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment

Liu, Wenjie; Li, Zhanhuai

doi:10.1007/s11704-018-7167-0

Research Article
Published: 16 July 2019

Volume 13, pages 1309–1325, (2019)

Frontiers of Computer Science

Wenjie Liu¹ &
Zhanhuai Li¹

Abstract

N-hop neighborhood information is useful in many analytic tasks on large-scale graphs, such as finding cliques in a social network, recommending friends or advertising links according to a user's interests, and predicting links among websites. To obtain N-hop neighborhood information on a large graph, such as a web graph or a Twitter social graph, the most straightforward method is to run a breadth-first search (BFS) on a parallel distributed graph processing framework such as Pregel or GraphLab. However, because of the massive volume of messages transferred, the BFS method incurs high communication cost and is inefficient.
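To make the baseline concrete, the following is a minimal single-machine sketch of a depth-bounded BFS that collects the N-hop neighborhood of one vertex. It is an illustration only, not the distributed Pregel or GraphLab implementation discussed in the article; the function name `n_hop_neighborhood`, the adjacency-dictionary representation, and the toy graph are assumptions made for this example.

```python
from collections import deque


def n_hop_neighborhood(adj, source, n):
    """Return the set of vertices reachable from `source` in at most `n` hops.

    `adj` is assumed to be a dict mapping each vertex to an iterable of its
    neighbors. The source vertex itself is excluded from the result.
    """
    dist = {source: 0}          # hop distance of every vertex discovered so far
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == n:
            continue            # do not expand beyond n hops
        for v in adj.get(u, ()):
            if v not in dist:   # first visit gives the shortest hop distance
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]
    return set(dist)


# Hypothetical toy graph: a path 1-2-3-4-5 plus an extra edge 2-6.
toy_graph = {
    1: [2],
    2: [1, 3, 6],
    3: [2, 4],
    4: [3, 5],
    5: [4],
    6: [2],
}

two_hop_of_1 = n_hop_neighborhood(toy_graph, 1, 2)
```

In a vertex-centric distributed setting, each expansion step of this loop would roughly correspond to a superstep in which frontier vertices send messages to their neighbors, which is where the communication cost described above would arise.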

In this work, we propose a key/value-based method, KVB, which fits well into prevailing parallel graph processing frameworks and efficiently computes N-hop neighborhoods on a large-scale graph. Unlike the BFS method, our method does not need to transfer large amounts of neighborhood information, and thus significantly reduces the overhead of both communication and intermediate results in the distributed framework. We formalize N-hop neighborhood query processing as an optimization problem based on a set of quantitative cost metrics for parallel graph processing. Moreover, we propose a solution that efficiently loads only the relevant neighborhoods for computation. Specifically, we prove that the optimal partial neighborhood load problem is NP-hard and design a heuristic strategy for it. We have implemented our algorithm on a distributed graph framework, Spark GraphX, and validated our solution with extensive experiments over a number of real-world and synthetic large graphs on a modest in-house cluster. Experiments show that our solution generally achieves an order-of-magnitude speedup compared with the state-of-the-art BFS implementation.

Acknowledgements

This work is supported by the Natural Science Basic Research Plan in Shaanxi Province of China (2017JM6104), the National Natural Science Foundation of China (Grant Nos. 61303037, 61472321, 61732014), the National Key Research and Development Program of China (2018YFB1003403), the National Basic Research Program (973 Program) of China (2012CB316203), and the National High Technology Research and Development Program (863 Program) of China (2012AA011004).

Author information

Authors and Affiliations

School of Computer, Northwestern Polytechnical University, Xi’an, 710072, China
Wenjie Liu & Zhanhuai Li

Authors

Wenjie Liu
Zhanhuai Li

Corresponding author

Correspondence to Wenjie Liu.

Additional information

Wenjie Liu is an associate professor. She obtained her master's degree in 2003 and her doctoral degree in computer science from Northwestern Polytechnical University, China, in December 2009. Since 2003, she has taught at this university, working in the Department of Computer Software and Theories. In 2014, she was a visiting researcher at the database lab of the Department of Computer Science and Engineering, Hong Kong University of Science and Technology (HKUST), China, where she worked on cloud computing and big data processing. Her research interests include cloud computing, distributed databases, and massive data management.

Zhanhuai Li is a professor in the Department of Computer Science and Software, School of Computer, Northwestern Polytechnical University, China. He is a doctoral supervisor, a CCF fellow, and a Database Committee fellow of China. His research interests include stream data management, data mining, massive data management, and cloud data storage.

Electronic supplementary material

Supplementary material, approximately 180 KB.

About this article

Cite this article

Liu, W., Li, Z. An efficient parallel algorithm of N-hop neighborhoods on graphs in distributed environment. Front. Comput. Sci. 13, 1309–1325 (2019). https://doi.org/10.1007/s11704-018-7167-0

Received: 08 May 2017
Accepted: 03 November 2017
Published: 16 July 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11704-018-7167-0
