MIMD Interpretation on a GPU

  • Henry G. Dietz
  • B. Dalton Young
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5898)


Programming heterogeneous parallel computer systems is notoriously difficult, but MIMD models have proven to be portable across multi-core processors, clusters, and massively parallel systems. It would be highly desirable for GPUs (Graphics Processing Units) also to be able to leverage algorithms and programming tools designed for MIMD targets. Unfortunately, most GPU hardware implements a very restrictive multi-threaded SIMD-based execution model.

This paper presents a compiler, assembler, and interpreter system that allows a GPU to implement a richly featured MIMD execution model that supports shared-memory communication, recursion, etc. Through a variety of careful design choices and optimizations, reasonable efficiency is obtained on NVIDIA CUDA GPUs. The discussion covers both the methods used and the motivation in terms of the relevant aspects of GPU architecture.


Graphic Processing Unit Processing Element Global Memory Instruction Type Graphic Processing Unit Architecture 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    NVIDIA, NVIDIA CUDA compute unified device architecture programming guide version 1.0 (June 2007)Google Scholar
  2. 2.
    ATI, ATI stream SDK user guide v1.3-beta (December 2008)Google Scholar
  3. 3.
    ClearSpeed. ClearSpeed whitepaper: CSX processor architecture, ClearSpeed Technology plc, vol. PN-1110-0702 (2007)Google Scholar
  4. 4.
    Blank, T.: The maspar mp-1 architecture. In: 35th IEEE Computer Society International Conference (COMPCON) (February 1990)Google Scholar
  5. 5.
    Wilsey, P., Hensgen, D., Slusher, C., Abu-Ghazaleh, N., Hollinden, D.: Exploiting simd computers for mutant program execution, Technical Report No. TR 133-11- 91, Department of Electrical and Computer Engineering, University of Cincinnati, Cincinnati, Ohio (November 1991)Google Scholar
  6. 6.
    Dietz, H.G., Cohen, W.E.: A massively parallel mimd implemented by SIMD hardware, Purdue University School of Electrical Engineering Technical Report TR-EE 92-4, 28 pages (January 1992)Google Scholar
  7. 7.
    Thinking Machines Corporation, Connection machine model cm-2 technical sum- mary, version 5.1 (May 1989)Google Scholar
  8. 8.
    Siegel, H., Nation, W., Allemang, M.: The organization of the PASM: Reconfigurable parallel processing system. In: Ohio State Parallel Computing Workshop, March 1990, pp. 1–12 (1990)Google Scholar
  9. 9.
    Nilsson, M., Tanaka, H.: MIMD Execution by SIMD Computers. Journal of Information Processing. Information Processing Society of Japan 13(1), 58–61 (1990)Google Scholar
  10. 10.
    Langdon, W.B., Banzhaf, W.: A SIMD interpreter for genetic programming on GPU graphics cards. In: O’Neill, M., Vanneschi, L., Gustafson, S., Esparcia Alcazar, A.I., De Falco, I., Della Cioppa, A., Tarantino, E. (eds.) EuroGP 2008. LNCS, vol. 4971, pp. 73–85. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  11. 11.
    Dietz, H.G., Cohen, W.E.: A control-parallel programming model implemented on simd hardware. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds.) LCPC 1993. LNCS, vol. 768, pp. 96–114. Springer, Heidelberg (1994)Google Scholar
  12. 12.
    Abu-ghazaleh, N.B., Wilsey, P.A., Fan, X., Hensgen, D.A.: Synthesizing variable instruction issue interpreters for implementing functional parallelism on SIMD computers. IEEE Transactions on Parallel and Distributed Systems (1997)Google Scholar
  13. 13.
    Khronos OpenCL Working Group, The OpenCL specification version 1.0 (December 2008)Google Scholar
  14. 14.
    Lipchak, B., et al.: Arb fragment program, OpenGL Extension Registry (August 2002),
  15. 15.
  16. 16.
    Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40(9), 1098–1101 (1952)CrossRefGoogle Scholar
  17. 17.
    Dietz, H.G.: Common subexpression induction. In: 1992 International Conference on Parallel Processing, Saint Charles, Illinois, August 1992, vol. II (1992)Google Scholar
  18. 18.
    Hou, Q., Zhou, K., Guo, A.: Debugging gpu stream programs through automatic data ow recording and visualization (May 2009)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Henry G. Dietz
    • 1
  • B. Dalton Young
    • 1
  1. 1.Electrical and Computer EngineeringUniversity of Kentucky 

Personalised recommendations