Efficient graph computation on hybrid CPU and GPU systems

Zhang, Tao; Zhang, Jingjie; Shu, Wei; Wu, Min-You; Liang, Xiaoyao

doi:10.1007/s11227-015-1378-z

Efficient graph computation on hybrid CPU and GPU systems

Published: 21 January 2015

Volume 71, pages 1563–1586, (2015)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

Tao Zhang¹,
Jingjie Zhang²,
Wei Shu¹,
Min-You Wu¹ &
…
Xiaoyao Liang¹

886 Accesses
20 Citations
Explore all metrics

Abstract

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale, graph-structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, users need to be skilled at managing and tuning a distributed system in a cluster, which is a non-trivial job for ordinary users. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. Several recent works proposed non-distributed graph processing platforms as complements to distributed platforms. In fact, efficient non-distributed platforms require less hardware resource and can achieve better energy efficiency than distributed ones. GraphChi is a representative non-distributed platform that is disk-based and can process billions of edges on CPUs in a single PC. However, the design drawbacks of GraphChi on I/O and computation model have limited its parallelism and performance. In this paper, we propose a general, disk-based graph engine called gGraph to process billion-scale graphs efficiently by utilizing both CPUs and GPUs in a single PC. GGraph exploits full parallelism and full overlap of computation and I/O processing as much as possible. Experiment results show that gGraph outperforms GraphChi and PowerGraph. In addition, gGraph achieves the best energy efficiency among all evaluated platforms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Large Scale Graph Processing in a Distributed Environment

OneGraph: a cross-architecture framework for large-scale graph computing on GPUs based on oneAPI

Article 09 November 2023

GPUGraphX: A GPU-Aided Distributed Graph Processing System

References

Addario-Berry L, Kennedy WS, King AD, Li Z, Reed B (2010) Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph. Discret Appl Math 158(7):765–770
Article MATH MathSciNet Google Scholar
Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–192
Article Google Scholar
Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. In: Arnold D, Ruiter Bd (eds.) Computer graphics forum, vol. 8. Wiley Online Library, pp 3–11
Avery C (2011) Giraph: large-scale graph processing infrastruction on Hadoop. In: Proceedings of Hadoop Summit
Boldi P, Santini M, Vigna S (2008) A large time-aware web graph. In: ACM SIGIR forum, vol 42. ACM, pp 33–38
Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: SIAM international conference on data mining (SDM), vol 4. SIAM, pp 442–446
Chan A, Dehne F, Taylor R (2005) Cgmgraph/cgmlib: implementing and testing cgm graph algorithms on PC clusters and shared memory machines. Int J High Perform Comput Appl 19(1):81–97
Article Google Scholar
Che S, Beckmann B, Reinhardt S, Skadron K (2013) Pannotia: understanding irregular GPGPU graph applications. In: IEEE international symposium on workload characterization (IISWC), pp 185–195
Fackbook (2015) Facebook statistics. http://www.statisticbrain.com/facebook-statistics/
Fang J, Varbanescu AL, Sips H (2011) A comprehensive performance comparison of CUDA and OpenCL. In: 2011 international conference on parallel processing (ICPP). IEEE, pp 216–225
Ferrer R, Planas J, Bellens P, Duran A, Gonzalez M, Martorell X, Badia RM, Ayguade E, Labarta J (2011) Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL. In: Cooper K, Mellor-Crummey J, Sarkar V (eds.) Languages and compilers for parallel computing. Springer, Berlin, pp 215–229
Gharaibeh A, Costa LB, Santos-Neto E, Ripeanu M (2013) On graphs, GPUs, and blind dating: a workload to processor matchmaking quest. In: 2013 IEEE 27th international symposium on parallel and distributed processing (IPDPS). IEEE, pp 851–862
Gharaibeh A, Santos-Neto E, Costa LB, Ripeanu M (2013) Efficient large-scale graph processing on hybrid CPU and GPU systems. arXiv:1312.3018
Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX symposium on operating systems design and implementation (OSDI), pp 17–30
Graph 500 Steering Committee and others: Graph 500 benchmark. http://www.graph500.org/
Gregor D, Lumsdaine A (2005) The parallel bgl: a generic library for distributed graph computations. In: Parallel Object-Oriented Scientific Computing (POOSC)
Harish P, Vineet V, Narayanan P (2009) Large graph algorithms for massively multithreaded architectures. Centre for Visual Information Technology, I. Institute of Information Technology, Hyderabad, India, Tech. Rep. IIIT/TR/2009/74
ILK Research Group, Tilburg University (2015) The size of the world wide web (the internet). http://www.worldwidewebsize.com
Isard M, Budiu M, Yu Y, Birrell A, Fetterly D (2007) Dryad: distributed data-parallel programs from sequential building blocks. ACM SIGOPS Oper Syst Rev 41(3):59–72
Article Google Scholar
Kang U, Tong H, Sun J, Lin CY, Faloutsos C (2012) Gbase: an efficient analysis platform for large graphs. VLDB J 21(5):637–650
Article Google Scholar
Kang U, Tsourakakis CE, Faloutsos C (2009) Pegasus: a peta-scale graph mining system implementation and observations. In: Ninth IEEE international conference on data mining (ICDM). IEEE, pp 229–238
Karypis G, Kumar V (1999) Parallel multilevel series k-way partitioning scheme for irregular graphs. SIAM Rev 41(2):278–300
Article MATH MathSciNet Google Scholar
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on World wide web. ACM, pp 591–600
Kyrola A, Blelloch G, Guestrin C (2012) Graphchi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX symposium on operating systems design and implementation (OSDI), vol 8, pp 31–46
Lee J, Han WS, Kasperovics R, Lee JH (2012) An in-depth comparison of subgraph isomorphism algorithms in graph databases. In: Proceedings of the 39th international conference on very large data bases. VLDB Endowment, pp 133–144
Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein JM (2012) Distributed graphlab: a framework for machine learning and data mining in the cloud. In: Proceedings of the very large data bases (VLDB) endowment, vol 5, issue 8, pp 716–727
Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM international conference on management of data (SIGMOD). ACM, pp 135–146
Maserrat H, Pei J (2010) Neighbor query friendly compression of social networks. In: Proceedings of the 16th ACM international conference on Knowledge discovery and data mining (SIGKDD). ACM, pp 533–542
Merrill D, Garland M, Grimshaw A (2012) Scalable GPU graph traversal. In: ACM SIGPLAN notices, vol 47. ACM, pp 117–128
Neo Technology, inc. (2015) Learning neo4j. http://neo4j.com/book-learning-neo4j/
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web
Pearce R, Gokhale M, Amato NM (2010) Multithreaded asynchronous graph traversal for in-memory and semi-external memory. In: Proceedings of the 2010 ACM/IEEE international conference for high performance computing, networking, storage and analysis. IEEE Computer Society, pp 1–11
Salihoglu S, Widom J (2013) Gps: a graph processing system. In: Proceedings of the 25th international conference on scientific and statistical database management. ACM, p 22
Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 international conference on Management of data. ACM, pp 505–516
Shen J, Fang J, Sips H, Varbanescu AL (2012) Performance gaps between OpenMP and OpenCL for multi-core CPUs. In: 2012 41st international conference on parallel processing workshops (ICPPW). IEEE, pp 116–125
Shiloach Y, Vishkin U (1982) An O(log(n)) parallel connectivity algorithm. J Algorithms 3(1):57–67
Article MATH MathSciNet Google Scholar
Shun J, Blelloch GE (2013) Ligra: a lightweight graph processing framework for shared memory. In: Proceedings of the 18th ACM symposium on principles and practice of parallel programming (SIGPLAN). ACM, pp 135–146
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111
Article Google Scholar
Vineet V, Harish P, Patidar S, Narayanan P (2009) Fast minimum spanning tree for large graphs on the GPU. In: Proceedings of the conference on high performance graphics 2009. ACM, pp 167–171
Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: Proceedings of the ACM SIGKDD workshop on mining data semantics. ACM, p 3

Download references

Acknowledgments

The authors would like to thank anonymous reviewers for their fruitful feedback and comments that have helped them improve the quality of this work. This work is partly supported by the National Natural Science Foundation of China (Grant No. 61202026, No. 61332001 and No. 61373155) and Program of China National 1000 Young Talent Plan.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Min Hang District, Shanghai, 200240, China
Tao Zhang, Wei Shu, Min-You Wu & Xiaoyao Liang
Department of Electrical Engineering, Fudan University, Shanghai, China
Jingjie Zhang

Authors

Tao Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jingjie Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Wei Shu
View author publications
You can also search for this author in PubMed Google Scholar
Min-You Wu
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoyao Liang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tao Zhang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, T., Zhang, J., Shu, W. et al. Efficient graph computation on hybrid CPU and GPU systems. J Supercomput 71, 1563–1586 (2015). https://doi.org/10.1007/s11227-015-1378-z

Download citation

Published: 21 January 2015
Issue Date: April 2015
DOI: https://doi.org/10.1007/s11227-015-1378-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient graph computation on hybrid CPU and GPU systems

Abstract

Access this article

Similar content being viewed by others

Large Scale Graph Processing in a Distributed Environment

OneGraph: a cross-architecture framework for large-scale graph computing on GPUs based on oneAPI

GPUGraphX: A GPU-Aided Distributed Graph Processing System

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Efficient graph computation on hybrid CPU and GPU systems

Abstract

Access this article

Similar content being viewed by others

Large Scale Graph Processing in a Distributed Environment

OneGraph: a cross-architecture framework for large-scale graph computing on GPUs based on oneAPI

GPUGraphX: A GPU-Aided Distributed Graph Processing System

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation