Control Flow Emulation on Tiled SIMD Architectures

Lashari, Ghulam; Lhoták, Ondřej; McCool, Michael

doi:10.1007/978-3-540-78791-4_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4959)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

Heterogeneous multi-core and streaming architectures such as the GPU, Cell, ClearSpeed, and Imagine processors have better power/performance ratios and memory bandwidth than traditional architectures. These types of processors are increasingly being used to accelerate compute-intensive applications. Their performance advantage is achieved by using multiple SIMD processor cores but limiting the complexity of each core, and by combining this with a simplified memory system. In particular, these processors generally avoid the use of cache coherency protocols and may even omit general-purpose caches, opting for restricted caches or explicitly managed local memory.

We show how control flow can be emulated on such tiled SIMD architectures and how memory access can be organized to avoid the need for a general-purpose cache and to tolerate long memory latencies. Our technique uses streaming execution and multipass partitioning. Our prototype targets GPUs. On GPUs the memory system is deeply pipelined and caches for read and write are not coherent, so reads and writes may not use the same memory locations simultaneously. This requires the use of double-buffered streaming. We emulate general control flow in a way that is transparent to the programmer and include specific optimizations in our approach that can deal with double-buffering.
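The idea of emulating control flow with multiple passes over double-buffered streams can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical program made of two basic blocks (a loop header and a loop body) that counts how many times each input can be halved before reaching 1. Each stream element carries its own emulated program counter; each pass applies one basic block to the elements waiting at that block, reads only from one buffer, writes only to the other, and then swaps them. The names `header`, `body`, `run_pass`, and `emulate` are invented for this illustration.

```python
from typing import Callable, Dict, List, Tuple

# Per-element state: (pc, x, n), where pc is the emulated program counter.
State = Tuple[int, int, int]
HALT = -1  # pc value meaning "this element has finished"


def header(s: State) -> State:
    """Block 0 (loop header): continue to the body while x > 1, else halt."""
    _, x, n = s
    return (1 if x > 1 else HALT, x, n)


def body(s: State) -> State:
    """Block 1 (loop body): halve x, count the iteration, jump to the header."""
    _, x, n = s
    return (0, x // 2, n + 1)


# Hypothetical two-block program; a compiler would derive this from source.
BLOCKS: Dict[int, Callable[[State], State]] = {0: header, 1: body}


def run_pass(block_id: int, kernel: Callable[[State], State],
             src: List[State], dst: List[State]) -> None:
    """One streaming pass: read only from src, write only to dst.

    Elements not waiting at block_id are copied through unchanged, so
    every slot of dst is written and no location is read and written
    in the same pass.
    """
    for i, s in enumerate(src):
        dst[i] = kernel(s) if s[0] == block_id else s


def emulate(inputs: List[int]) -> Tuple[List[int], int]:
    """Run the program on every input; return the counts and pass total."""
    front: List[State] = [(0, x, 0) for x in inputs]
    back: List[State] = list(front)  # second buffer of the same size
    passes = 0
    while any(s[0] != HALT for s in front):
        for block_id, kernel in BLOCKS.items():
            # Skip blocks that no element is currently waiting at.
            if any(s[0] == block_id for s in front):
                run_pass(block_id, kernel, front, back)
                front, back = back, front  # swap the two buffers
                passes += 1
    return [n for (_, _, n) in front], passes


if __name__ == "__main__":
    counts, passes = emulate([1, 2, 8])
    print(counts, passes)
```

In this sketch elements that finish early are still copied on every later pass, which is exactly the kind of overhead that optimizations aware of double-buffering would aim to reduce.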

Author information

Authors and Affiliations

D. R. Cheriton School of Computer Science, University of Waterloo,
Ghulam Lashari, Ondřej Lhoták & Michael McCool

Editor information

Laurie Hendren

Cite this paper

Lashari, G., Lhoták, O., McCool, M. (2008). Control Flow Emulation on Tiled SIMD Architectures. In: Hendren, L. (eds) Compiler Construction. CC 2008. Lecture Notes in Computer Science, vol 4959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78791-4_7

DOI: https://doi.org/10.1007/978-3-540-78791-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78790-7
Online ISBN: 978-3-540-78791-4
eBook Packages: Computer Science, Computer Science (R0)
