Computer Science - Research and Development, Volume 28, Issue 2, pp 193–201

Understanding parallelism in graph traversal on multi-core clusters

Authors

  • H. Lv
    • State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
    • Graduate School of Chinese Academy of Sciences
  • Guangming Tan
    • State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
  • Mingyu Chen
    • State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
  • Ninghui Sun
    • State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Special Issue Paper

DOI: 10.1007/s00450-012-0207-3

Cite this article as:
Lv, H., Tan, G., Chen, M. et al. Comput Sci Res Dev (2013) 28: 193. doi:10.1007/s00450-012-0207-3

Abstract

There is an ever-increasing need to explore large-scale graph data sets in computational sciences, social networks, and business analytics. However, because of their irregular and memory-intensive nature, graph applications are notorious for poor performance on parallel computer systems. In this paper we propose a new hybrid MPI/Pthreads breadth-first search (BFS) algorithm that features (i) overlapping of computation and communication by separating them into multiple threads, (ii) maximized multi-threading parallelism on multi-cores, using massive numbers of threads to improve throughput, and (iii) pipeline parallelism exploited through lock-free queues for asynchronous communication. By comparing it with a traditional MPI-only BFS algorithm, we learned several valuable lessons that help in understanding and exploiting parallelism in graph traversal applications. Experiments show that our algorithm is 1.9× faster than the MPI-only version and can process 1.45 billion edges per second on a 32-node SMP cluster. At a large scale, with 6,144 cores, our algorithm is 1.49× faster than the MPI-only BFS algorithm in the Combinatorial BLAS Library.
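
The idea of separating computation and communication into different threads connected by a queue can be illustrated with a small single-process sketch. It is not the authors' implementation: the ranks are simulated in one process, the 1D partitioning rule and the names `owner`, `expand`, `deliver`, and `bfs` are hypothetical, and Python's lock-based `queue.Queue` merely stands in for the lock-free queues mentioned in the abstract.

```python
import queue
import threading

DONE = object()  # sentinel that tells the communication thread to stop


def owner(v, nranks):
    """Assumed 1D partitioning: vertex v lives on rank v % nranks."""
    return v % nranks


def expand(frontier, adj, nranks, outbox):
    """Compute side: scan one rank's frontier and queue (dest, vertex, parent)."""
    for u in frontier:
        for v in adj[u]:
            outbox.put((owner(v, nranks), v, u))
    outbox.put(DONE)


def deliver(outbox, inboxes):
    """Communication side: drain the queue while expand() is still producing."""
    while True:
        item = outbox.get()
        if item is DONE:
            return
        dest, v, u = item
        inboxes[dest].append((v, u))  # stands in for a message sent to rank dest


def bfs(adj, root, nranks=2):
    """Level-synchronous BFS over simulated ranks; returns a parent map."""
    parent = {root: root}
    frontiers = [[] for _ in range(nranks)]
    frontiers[owner(root, nranks)].append(root)
    while any(frontiers):
        inboxes = [[] for _ in range(nranks)]
        for rank in range(nranks):  # ranks run one after another in this sketch
            outbox = queue.Queue()
            comm = threading.Thread(target=deliver, args=(outbox, inboxes))
            comm.start()  # communication overlaps with the expansion below
            expand(frontiers[rank], adj, nranks, outbox)
            comm.join()
        # Each owner claims its unvisited vertices; they form the next frontier.
        frontiers = [[] for _ in range(nranks)]
        for rank in range(nranks):
            for v, u in inboxes[rank]:
                if v not in parent:
                    parent[v] = u
                    frontiers[rank].append(v)
    return parent
```

For an adjacency map such as `{0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}`, calling `bfs(adj, 0)` returns a parent map covering all five vertices. In a real hybrid MPI/Pthreads setting the ranks would run concurrently on separate nodes and the queue would need to be safe for concurrent access without locks.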

Keywords

Breadth-first search, Graph algorithms, Hybrid MPI/Pthreads programming, Lock-free queues

Copyright information

© Springer-Verlag 2012