Colloquium: Large scale simulations on GPU clusters
Abstract
Graphics processing units (GPUs) are currently used as a cost-effective platform for computer simulations and big-data processing. Large-scale applications require multiple GPUs to work together, but the efficiency obtained with clusters of GPUs is at times sub-optimal because the GPU features are not fully exploited. We describe how excellent efficiency can be achieved for applications in statistical mechanics, particle dynamics and network analysis by using suitable memory access patterns together with mechanisms and tools such as CUDA streams and profilers. Similar concepts and techniques may also be applied to other problems, such as the solution of partial differential equations.
Keywords
Statistical and Nonlinear Physics
Copyright information
© EDP Sciences, SIF, Springer-Verlag Berlin Heidelberg 2015