Spill Code Placement for SIMD Machines

Sampaio, Diogo Nunes; Gedeon, Elie; Pereira, Fernando Magno Quintão; Collange, Caroline

doi:10.1007/978-3-642-33182-4_3

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7554)

Abstract

The Single Instruction, Multiple Data (SIMD) execution model has been receiving renewed attention recently. This attention stems from the rise of graphics processing units (GPUs) as a powerful alternative for parallel computing. Many compiler optimizations have recently been proposed for this hardware, but register allocation remains largely unexplored. In this context, this paper describes a register spiller for SIMD machines that capitalizes on the opportunity to share identical data between threads. The spiller provides two benefits. First, it uses less memory, because more spilled values are shared among threads. Second, it improves the access times to spilled values. We have implemented our proposed allocator in the Ocelot open source compiler, and have been able to speed up the code produced by this framework by 21%. Although we have designed our algorithm on top of a linear scan register allocator, we claim that our ideas can be easily adapted to fit the needs of other register allocators.
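
The memory benefit mentioned in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the paper's algorithm: the `SpilledValue` type, the `spill_footprint` function, the variable names, and the group size of 32 threads are all hypothetical, and it simply assumes that some earlier analysis has already marked which spilled values are identical across threads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpilledValue:
    name: str
    size_bytes: int
    # Assumption: a prior analysis proved every thread holds the same value.
    uniform: bool


def spill_footprint(values, threads):
    """Return (baseline, shared) spill memory in bytes for one thread group.

    baseline: every spilled value gets one slot per thread.
    shared:   values marked uniform get a single slot for the whole group.
    """
    baseline = sum(v.size_bytes * threads for v in values)
    shared = sum(v.size_bytes * (1 if v.uniform else threads) for v in values)
    return baseline, shared


# Hypothetical spilled values for illustration only.
values = [
    SpilledValue("base_ptr", 8, uniform=True),
    SpilledValue("loop_bound", 4, uniform=True),
    SpilledValue("thread_acc", 4, uniform=False),
]

baseline, shared = spill_footprint(values, threads=32)
print(baseline, shared)  # 512 140
```

Under these assumptions, the two uniform values need one slot each instead of 32, so the group's spill area shrinks from 512 to 140 bytes.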

Author information

Authors and Affiliations

Departamento de Ciência da Computação, UFMG, Brazil
Diogo Nunes Sampaio, Elie Gedeon, Fernando Magno Quintão Pereira & Caroline Collange

Editor information

Editors and Affiliations

Universidade Federal do Ceará, Departamento de Computação, Campus Universitário do Pici, Bloco 910, 60356-001, Fortaleza, Brazil
Francisco Heron de Carvalho Junior
Departamento de Informática, Universidade do Minho, Campus de Gualtar, 4710-057, Braga, Portugal
Luis Soares Barbosa

About this paper

Cite this paper

Sampaio, D.N., Gedeon, E., Pereira, F.M.Q., Collange, C. (2012). Spill Code Placement for SIMD Machines. In: de Carvalho Junior, F.H., Barbosa, L.S. (eds) Programming Languages. Lecture Notes in Computer Science, vol 7554. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33182-4_3

DOI: https://doi.org/10.1007/978-3-642-33182-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33181-7
Online ISBN: 978-3-642-33182-4
eBook Packages: Computer Science, Computer Science (R0)
