Abstract
Graphics processing units (GPUs) are now widely used as a cost-effective platform for computer simulations and big-data processing. Large-scale applications require multiple GPUs to work together, but the efficiency obtained with clusters of GPUs is at times sub-optimal because the GPU features are not fully exploited. We describe how excellent efficiency can be achieved for applications in statistical mechanics, particle dynamics, and network analysis by using suitable memory access patterns together with mechanisms and tools such as CUDA streams and profilers. Similar concepts and techniques may also be applied to other problems, such as the solution of partial differential equations.
Cite this article
Bernaschi, M., Bisson, M. & Fatica, M. Colloquium: Large scale simulations on GPU clusters. Eur. Phys. J. B 88, 158 (2015). https://doi.org/10.1140/epjb/e2015-60180-8