MIMD Interpretation on a GPU
Programming heterogeneous parallel computer systems is notoriously difficult, but MIMD models have proven to be portable across multi-core processors, clusters, and massively parallel systems. It would be highly desirable for GPUs (Graphics Processing Units) also to be able to leverage algorithms and programming tools designed for MIMD targets. Unfortunately, most GPU hardware implements a very restrictive multi-threaded SIMD-based execution model.
This paper presents a compiler, assembler, and interpreter system that allows a GPU to implement a richly featured MIMD execution model that supports shared-memory communication, recursion, etc. Through a variety of careful design choices and optimizations, reasonable efficiency is obtained on NVIDIA CUDA GPUs. The discussion covers both the methods used and the motivation in terms of the relevant aspects of GPU architecture.
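The abstract does not describe the interpreter's internals, so the following is only a toy illustration of the general idea of interpreting MIMD programs on SIMD-style hardware, not the system the paper presents. It assumes a hypothetical five-opcode stack machine (`push`, `add`, `dup`, `jnz`, `halt`) invented for this sketch. Each processing element keeps its own program counter, and on every cycle the interpreter handles one opcode type at a time while elements holding a different opcode sit idle.

```python
from dataclasses import dataclass, field

# Hypothetical toy opcodes, invented for this sketch only.
PUSH, ADD, DUP, JNZ, HALT = "push", "add", "dup", "jnz", "halt"
OPCODES = (PUSH, ADD, DUP, JNZ, HALT)


@dataclass
class PE:
    """One virtual MIMD processing element with a private program and stack."""
    program: list
    pc: int = 0
    stack: list = field(default_factory=list)
    halted: bool = False


def step(pes):
    """Run one interpreter cycle across all processing elements."""
    # Fetch: every live PE reads its own next instruction.
    fetched = {}
    for i, pe in enumerate(pes):
        if not pe.halted:
            fetched[i] = pe.program[pe.pc]
            pe.pc += 1
    # Execute: handle one opcode type at a time, as a SIMD machine would,
    # with PEs holding a different opcode masked off for that pass.
    for opcode in OPCODES:
        for i, (op, arg) in fetched.items():
            if op != opcode:
                continue
            pe = pes[i]
            if op == PUSH:
                pe.stack.append(arg)
            elif op == ADD:
                b, a = pe.stack.pop(), pe.stack.pop()
                pe.stack.append(a + b)
            elif op == DUP:
                pe.stack.append(pe.stack[-1])
            elif op == JNZ:
                # Pop the top of stack and branch if it is nonzero.
                if pe.stack.pop() != 0:
                    pe.pc = arg
            elif op == HALT:
                pe.halted = True


def run(pes):
    """Step until every PE has halted; return the number of cycles used."""
    cycles = 0
    while not all(pe.halted for pe in pes):
        step(pes)
        cycles += 1
    return cycles


def demo():
    """Two PEs running different programs under the same interpreter loop."""
    straight = [(PUSH, 2), (PUSH, 3), (ADD, None), (HALT, None)]
    countdown = [(PUSH, 3), (PUSH, -1), (ADD, None), (DUP, None),
                 (JNZ, 1), (HALT, None)]
    pes = [PE(straight), PE(countdown)]
    return pes, run(pes)
```

Calling `demo()` runs a straight-line program and a countdown loop side by side. The two elements follow different control flow, yet the interpreter itself only ever executes one opcode type at a time, which is the property that lets such a loop map onto SIMD hardware. The cost is that each cycle pays for every opcode type present, one reason the design choices mentioned in the abstract matter for efficiency.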
Keywords: Graphic Processing Unit, Processing Element, Global Memory, Instruction Type, Graphic Processing Unit Architecture