Abstract
Heterogeneous multi-core and streaming architectures such as the GPU, Cell, ClearSpeed, and Imagine processors have better power/ performance ratios and memory bandwidth than traditional architectures. These types of processors are increasingly being used to accelerate compute-intensive applications. Their performance advantage is achieved by using multiple SIMD processor cores but limiting the complexity of each core, and by combining this with a simplified memory system. In particular, these processors generally avoid the use of cache coherency protocols and may even omit general-purpose caches, opting for restricted caches or explictly managed local memory.
We show how control flow can be emulated on such tiled SIMD architectures and how memory access can be organized to avoid the need for a general-purpose cache and to tolerate long memory latencies. Our technique uses streaming execution and multipass partitioning. Our prototype targets GPUs. On GPUs the memory system is deeply pipelined and caches for read and write are not coherent, so reads and writes may not use the same memory locations simultaneously. This requires the use of double-buffered streaming. We emulate general control flow in a way that is transparent to the programmer and include specific optimizations in our approach that can deal with double-buffering.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Lefohn, J.K.A.E., Purcell, T.J.: A survey of general-purpose computation on graphics hardware. In: Eurographics 2005: State of the Art Reports, pp. 21–51 (2005)
Das, A., Dally, W.J., Mattson, P.: Compiling for stream processing. In: PACT 2006: Parallel Architectures and Compilation Techniques, pp. 33–42. ACM, New York (2006)
McCool, M.D.: Scalable Programming Models for Massively Multi-Core Processors. In: Proc. IEEE (January 2008)
Chan, E., Ng, R., Sen, P., Proudfoot, K., Hanrahan, P.: Efficient partitioning of fragment shaders for multipass rendering on programmable graphics hardware. In: HWWS 2002: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pp. 69–78 (2002)
Foley, T., Houston, M., Hanrahan, P.: Efficient partitioning of fragment shaders for multiple-output hardware. In: HWWS 2004: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pp. 45–53 (2004)
Riffel, A., Lefohn, A.E., Vidimce, K., Leone, M., Owens, J.D.: Mio: fast multipass partitioning via priority-based instruction scheduling. In: HWWS 2004: ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pp. 35–44 (2004)
Heirich, A.: Optimal automatic multi-pass shader partitioning by dynamic programming. In: HWWS 2005: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pp. 91–98 (2005)
Purcell, T.J., Buck, I., Mark, W.R., Hanrahan, P.: Ray tracing on programmable graphics hardware. ACM Transactions on Graphics 21(3), 703–712 (2002)
Cooper, D.C.: Böhm and Jacopini’s reduction of flow charts. Commun. ACM 10(8), 463 (1967)
Harel, D.: On folk theorems. Commun. ACM 23(7), 379–389 (1980)
Knuth, D.E.: Structured programming with go to statements. ACM Comput. Surv. 6(4), 261–301 (1974)
Kapasi, U.J., Dally, W.J., Rixner, S., Mattson, P.R., Owens, J.D., Khailany, B.: Efficient conditional operations for data-parallel architectures. In: 33rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 159–170 (2000)
Popa, T.S.: Compiling Data Dependent Control Flow on SIMD GPUs. Master’s thesis, University of Waterloo (2004)
Marlowe, T.J., Ryder, B.G.: Properties of data flow frameworks: a unified model. Acta Inf. 28(2), 121–163 (1990)
Garey, M.R., Johnson, D.S.: Computers and Intractability. W. H. Freeman and Company, San Francisco (1979)
Sahni, S., Gonzalez, T.: P-complete approximation problems. Journal of the ACM 23(3), 555–565 (1976)
McCool, M.D., Qin, Z., Popa, T.S.: Shader Metaprogramming. In: Proc. Graphics Hardware, September 2002, pp. 57–68 (2002)
McCool, M.D.: Data-Parallel Programming on the Cell BE and the GPU using the RapidMind Development Platform. In: Proc. GSPx Multicore Applications Conference (October–November 2006)
Buck, I.: BrookGPU (2003), http://graphics.stanford.edu/projects/-brookgpu/
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lashari, G., Lhoták, O., McCool, M. (2008). Control Flow Emulation on Tiled SIMD Architectures. In: Hendren, L. (eds) Compiler Construction. CC 2008. Lecture Notes in Computer Science, vol 4959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78791-4_7
Download citation
DOI: https://doi.org/10.1007/978-3-540-78791-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78790-7
Online ISBN: 978-3-540-78791-4
eBook Packages: Computer ScienceComputer Science (R0)