HCC 2016: Human Centered Computing, pp 409-419
SSSP on GPU Without Atomic Operation
Abstract
Graphs are a general theoretical model in many large-scale data-driven applications. The single-source shortest path (SSSP) algorithm is a foundation for many important algorithms and applications. GPUs remain a mainstream platform in high-performance computing on heterogeneous architectures. Because GPU threads run with a high degree of parallelism, vertex distances on the GPU are updated with atomic operations to avoid read-write errors. Most of these atomic operations are unnecessary, since read-write conflicts are rare in large graphs. Without atomic operations, however, the accuracy of the result cannot be guaranteed, and with them a large part of the program's running time is spent on the atomic operations themselves. To improve the performance of SSSP on GPU, we propose an algorithm that uses data block iterations instead of atomic operations. The algorithm achieves a high speedup while still guaranteeing the accuracy of the result. Experimental results show that this SSSP algorithm achieves a speedup of three times over the serial algorithm on CPU and more than ten times over the parallel algorithm on GPU with atomic operations.
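The abstract does not spell out how the data block iterations work, so the following is only a minimal serial sketch of the general idea it relies on: if edge relaxations are repeated until a full pass changes no distance, the final distances are correct even without an atomic minimum on each update. The function names, the `block_size` parameter, and the way edges are split into blocks are assumptions made for illustration, not the authors' method. Each block stands in for a batch of work that a GPU kernel might process, and a textbook Dijkstra implementation serves as the reference, assuming non-negative edge weights.

```python
import heapq
import math


def sssp_block_iterations(num_vertices, edges, source, block_size=2):
    """Relax edges block by block until a full pass changes nothing.

    Hypothetical illustration: `edges` is a list of (u, v, weight) tuples,
    and `block_size` is an assumed tuning parameter, not from the paper.
    """
    dist = [math.inf] * num_vertices
    dist[source] = 0.0
    # Split the edge list into fixed-size blocks (assumed partitioning).
    blocks = [edges[i:i + block_size] for i in range(0, len(edges), block_size)]

    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for block in blocks:
            for u, v, w in block:
                candidate = dist[u] + w
                # Plain compare-and-write; no atomic minimum is used here.
                if candidate < dist[v]:
                    dist[v] = candidate
                    changed = True
    return dist


def dijkstra(num_vertices, edges, source):
    """Reference serial SSSP for non-negative weights."""
    adj = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        adj[u].append((v, w))
    dist = [math.inf] * num_vertices
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


# Small directed example graph (made up for this sketch).
EXAMPLE_EDGES = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
```

Calling `sssp_block_iterations(4, EXAMPLE_EDGES, 0)` and `dijkstra(4, EXAMPLE_EDGES, 0)` should give the same distance list. The sketch runs serially, so it cannot reproduce real GPU write conflicts; it only shows that the convergence check, rather than per-update atomicity, is what guarantees the final result.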
Keywords
Graph, SSSP, Atomic operation, Data block iteration
Notes
Acknowledgement
This work is supported by the Natural Science Foundation of China (61379048); the Special Project of National CAS Union: The High Performance Cloud Service Platform for Enterprise Creative Computing; and the Special Project of Hebei-CAS Union (Hebei, No. 14010015): The Performance Optimization of Key Algorithms of Health Data Processing in Senile Dementia Analysis.