Skip to main content

Spill Code Placement for SIMD Machines

  • Conference paper
Programming Languages

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 7554))

Abstract

The Single Instruction, Multiple Data (SIMD) execution model has been receiving renewed attention recently. This awareness stems from the rise of graphics processing units (GPUs) as a powerful alternative for parallel computing. Many compiler optimizations have been recently proposed for this hardware, but register allocation is a field yet to be explored. In this context, this paper describes a register spiller for SIMD machines that capitalizes on the opportunity to share identical data between threads. It provides two different benefits: first, it uses less memory, as more spilled values are shared among threads. Second, it improves the access times to spilled values. We have implemented our proposed allocator in the Ocelot open source compiler, and have been able to speedup the code produced by this framework by 21%. Although we have designed our algorithm on top of a linear scan register allocator, we claim that our ideas can be easily adapted to fit the necessities of other register allocators.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 54.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 72.00
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools, 2nd edn. Addison Wesley (2006)

    Google Scholar 

  2. Backus, J.: The history of fortran i, ii, and iii. SIGPLAN Not. 13(8), 165–180 (1978)

    Article  MathSciNet  Google Scholar 

  3. Baghsorkhi, S.S., Delahaye, M., Patel, S.J., Gropp, W.D., Hwu, W.M.W.: An adaptive performance modeling tool for GPU architectures. In: PPoPP, pp. 105–114. ACM (2010)

    Google Scholar 

  4. Belady, L.A.: A study of replacement algorithms for a virtual storage computer. IBM Systems Journal 5(2), 78–101 (1966)

    Article  Google Scholar 

  5. Bouchez, F.: Allocation de Registres et Vidage en Mémoire. Master’s thesis, ENS Lyon (October 2005)

    Google Scholar 

  6. Briggs, P., Cooper, K.D., Torczon, L.: Rematerialization. In: PLDI, pp. 311–321. ACM (1992)

    Google Scholar 

  7. Carrillo, S., Siegel, J., Li, X.: A control-structure splitting optimization for GPGPU. In: Computing Frontiers, pp. 147–150. ACM (2009)

    Google Scholar 

  8. Chaitin, G.J., Auslander, M.A., Chandra, A.K., Cocke, J., Hopkins, M.E., Markstein, P.W.: Register allocation via coloring. Computer Languages 6, 47–57 (1981)

    Article  Google Scholar 

  9. Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Lee, S.H., Skadron, K.: Rodinia: A benchmark suite for heterogeneous computing. In: IISWC, pp. 44–54. IEEE (2009)

    Google Scholar 

  10. Coutinho, B., Sampaio, D., Pereira, F.M.Q., Meira, W.: Divergence analysis and optimizations. In: PACT. IEEE (2011)

    Google Scholar 

  11. Cytron, R., Ferrante, J., Rosen, B.K., Wegman, M.N., Zadeck, F.K.: Efficiently computing static single assignment form and the control dependence graph. TOPLAS 13(4), 451–490 (1991)

    Article  Google Scholar 

  12. Diamos, G., Kerr, A., Yalamanchili, S., Clark, N.: Ocelot, a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems. In: PACT, pp. 354–364 (2010)

    Google Scholar 

  13. Farach-colton, M., Liberatore, V.: On local register allocation. Journal of Algorithms 37(1), 37–65 (2000)

    Article  MathSciNet  Google Scholar 

  14. Garland, M.: Parallel computing experiences with CUDA. IEEE Micro 28, 13–27 (2008)

    Article  Google Scholar 

  15. Garland, M., Kirk, D.B.: Understanding throughput-oriented architectures. Commun. ACM 53, 58–66 (2010)

    Article  Google Scholar 

  16. Golumbic, M.C.: Algorithmic Graph Theory and Perfect Graphs, 1st edn. Elsevier (2004)

    Google Scholar 

  17. Han, T.D., Abdelrahman, T.S.: Reducing branch divergence in GPU programs. In: GPGPU-4, pp. 3:1–3:8. ACM (2011)

    Google Scholar 

  18. Harris, M.: The parallel prefix sum (scan) with CUDA. Tech. Rep. Initial release on February 14, 2007, NVIDIA (2008)

    Google Scholar 

  19. Lee, V.W., Kim, C., Chhugani, J., Deisher, M., Kim, D., Nguyen, A.D., Satish, N., Smelyanskiy, M., Chennupaty, S., Hammarlund, P., Singhal, R., Dubey, P.: Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In: ISCA, pp. 451–460. ACM (2010)

    Google Scholar 

  20. Nickolls, J., Dally, W.J.: The GPU computing era. IEEE Micro 30, 56–69 (2010)

    Article  Google Scholar 

  21. Nickolls, J., Kirk, D.: Graphics and Computing GPUs. In: Patterson, Hennessy (eds.) Computer Organization and Design, 4th edn., ch. A, pp. A.1–A.77. Elsevier (2009)

    Google Scholar 

  22. Pereira, F.M.Q., Palsberg, J.: Register Allocation After Classical SSA Elimination is NP-Complete. In: Aceto, L., Ingólfsdóttir, A. (eds.) FOSSACS 2006. LNCS, vol. 3921, pp. 79–93. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  23. Poletto, M., Sarkar, V.: Linear scan register allocation. TOPLAS 21(5), 895–913 (1999)

    Article  Google Scholar 

  24. Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D.B., Hwu, W.M.W.: Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In: PPoPP, pp. 73–82. ACM (2008)

    Google Scholar 

  25. Sampaio, D., Martins, R., Collange, C., Pereira, F.M.Q.: Divergence analysis with affine constraints. Tech. rep., École normale supérieure de Lyon (2011)

    Google Scholar 

  26. Sethi, R.: Complete register allocation problems. In: 5th annual ACM Symposium on Theory of Computing, pp. 182–195. ACM Press (1973)

    Google Scholar 

  27. Sreedhar, V.C., Gao, G.R.: A linear time algorithm for placing f-nodes. In: POPL, pp. 62–73. ACM (1995)

    Google Scholar 

  28. Wegman, M.N., Zadeck, F.K.: Constant propagation with conditional branches. TOPLAS 13(2) (1991)

    Google Scholar 

  29. Zhang, E.Z., Jiang, Y., Guo, Z., Tian, K., Shen, X.: On-the-fly elimination of dynamic irregularities for GPU computing. In: ASPLOS, pp. 369–380. ACM (2011)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sampaio, D.N., Gedeon, E., Pereira, F.M.Q., Collange, C. (2012). Spill Code Placement for SIMD Machines. In: de Carvalho Junior, F.H., Barbosa, L.S. (eds) Programming Languages. Lecture Notes in Computer Science, vol 7554. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33182-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-33182-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33181-7

  • Online ISBN: 978-3-642-33182-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics