Skip to main content

Advertisement

SpringerLink
Log in
Menu
Find a journal Publish with us
Search
Cart
Book cover

International Conference on Compiler Construction

CC 2012: Compiler Construction pp 1–20Cite as

  1. Home
  2. Compiler Construction
  3. Conference paper
Improving Performance of OpenCL on CPUs

Improving Performance of OpenCL on CPUs

  • Ralf Karrenberg17 &
  • Sebastian Hack17 
  • Conference paper
  • 1612 Accesses

  • 38 Citations

  • 3 Altmetric

Part of the Lecture Notes in Computer Science book series (LNTCS,volume 7210)

Abstract

Data-parallel languages like OpenCL and CUDA are an important means to exploit the computational power of today’s computing devices. In this paper, we deal with two aspects of implementing such languages on CPUs: First, we present a static analysis and an accompanying optimization to exclude code regions from control-flow to data-flow conversion, which is the commonly used technique to leverage vector instruction sets. Second, we present a novel technique to implement barrier synchronization. We evaluate our techniques in a custom OpenCL CPU driver which is compared to itself in different configurations and to proprietary implementations by AMD and Intel. We achieve an average speedup factor of 1.21 compared to naïve vectorization and additional factors of 1.15–2.09 for suited kernels due to the optimizations enabled by our analysis. Our best configuration achieves an average speedup factor of 2.5 against the Intel driver.

Keywords

  • OpenCL
  • SIMD
  • Vectorization
  • Data Parallelism
  • Code Generation
  • Synchronization
  • Divergent Control Flow

Download conference paper PDF

References

  1. Allen, J.R., Kennedy, K., Porterfield, C., Warren, J.: Conversion of control dependence to data dependence. In: POPL, pp. 177–189. ACM (1983)

    Google Scholar 

  2. Allen, R., Kennedy, K.: Automatic translation of FORTRAN programs to vector form. ACM Trans. Program. Lang. Syst. 9(4), 491–542 (1987)

    CrossRef  MATH  Google Scholar 

  3. AMD: AMD APP SDK v2.5 (March 2011)

    Google Scholar 

  4. Apodaca, A., Mantle, M.: RenderMan: Pursuing the Future of Graphics. IEEE Computer Graphics & Applications 10(4), 44–49 (1990)

    CrossRef  Google Scholar 

  5. Cheong, G., Lam, M.: An Optimizer for Multimedia Instruction Sets. In: Second SUIF Compiler Workshop (1997)

    Google Scholar 

  6. Darte, A., Robert, Y., Vivien, F.: Scheduling and Automatic Parallelization. Birkhauser, Boston (2000)

    CrossRef  MATH  Google Scholar 

  7. Fritz, N., Lucas, P., Slusallek, P.: CGiS, a New Language for Data-Parallel GPU Programming. In: VMV, pp. 241–248 (2004)

    Google Scholar 

  8. Gummaraju, J., Morichetti, L., Houston, M., Sander, B., Gaster, B.R., Zheng, B.: Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors. In: PACT, pp. 205–216. ACM, New York (2010)

    CrossRef  Google Scholar 

  9. Hormati, A.H., Choi, Y., Woh, M., Kudlur, M., Rabbah, R., Mudge, T., Mahlke, S.: Macross: macro-simdization of streaming applications. In: ASPLOS, pp. 285–296. ACM, New York (2010)

    Google Scholar 

  10. Intel: Intel OpenCL SDK 1.1 (June 2011)

    Google Scholar 

  11. Jaskelainen, P.O., de La Lama, C.S., Huerta, P., Takala, J.: OpenCL-based design methodology for application-specific processors. In: SAMOS 2010, pp. 223–230 (July 2010)

    Google Scholar 

  12. Karrenberg, R., Hack, S.: Whole Function Vectorization. In: CGO, pp. 141–150 (2011)

    Google Scholar 

  13. Khronos Group: OpenCL 1.1 Specification (June 2011)

    Google Scholar 

  14. Lattner, C., Adve, V.: LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In: CGO (March 2004)

    Google Scholar 

  15. Newburn, C.J., So, B., Liu, Z., McCool, M.D., Ghuloum, A.M., Toit, S.D., Wang, Z.G., Du, Z., Chen, Y., Wu, G., Guo, P., Liu, Z., Zhang, D.: Intel’s Array Building Blocks: A retargetable, dynamic compiler and embedded language. In: CGO, pp. 224–235 (2011)

    Google Scholar 

  16. Ngo, V.: Parallel loop transformation techniques for vector-based multiprocessor systems. Ph.D. thesis, University of Minnesota-Twin Cities (May 1994)

    Google Scholar 

  17. Nuzman, D., Henderson, R.: Multi-platform auto-vectorization. In: CGO, pp. 281–294 (2006)

    Google Scholar 

  18. Nuzman, D., Zaks, A.: Outer-loop vectorization: revisited for short simd architectures. In: PACT, pp. 2–11. ACM (2008)

    Google Scholar 

  19. NVIDIA: CUDA Programming Guide (2009)

    Google Scholar 

  20. Parker, S., et al.: RTSL: A Ray Tracing Shading Language. In: IEEE Symposium on Interactive Ray Tracing (2007)

    Google Scholar 

  21. Pharr, M.: Intel SPMD Program Compiler (June 2011)

    Google Scholar 

  22. Shin, J.: Introducing Control Flow into Vectorized Code. In: PACT, pp. 280–291. IEEE Computer Society (2007)

    Google Scholar 

  23. Sreraman, N., Govindarajan, R.: A vectorizing compiler for multimedia extensions. Int. J. Parallel Program. 28(4), 363–400 (2000)

    CrossRef  Google Scholar 

  24. Steckelmacher, D.: An OpenCL State Tracker for Gallium based on Clover (August 2011), http://people.freedesktop.org/~steckdenis/clover

  25. Stratton, J.A., Stone, S.S., Hwu, W.-m.W.: MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs. In: Amaral, J.N. (ed.) LCPC 2008. LNCS, vol. 5335, pp. 16–30. Springer, Heidelberg (2008)

    CrossRef  Google Scholar 

  26. The Portland Group, Inc.: PGI CUDA-x86 (June 2011)

    Google Scholar 

  27. Touati, S.A.A., Worms, J., Briais, S.: The Speedup Test. Rapport de recherche (2010), http://hal.inria.fr/inria-00443839/en/

Download references

Author information

Authors and Affiliations

  1. Saarland University, Germany

    Ralf Karrenberg & Sebastian Hack

Authors
  1. Ralf Karrenberg
    View author publications

    You can also search for this author in PubMed Google Scholar

  2. Sebastian Hack
    View author publications

    You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

  1. School for Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, Edinburgh, UK

    Michael O’Boyle

Rights and permissions

Reprints and Permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Karrenberg, R., Hack, S. (2012). Improving Performance of OpenCL on CPUs. In: O’Boyle, M. (eds) Compiler Construction. CC 2012. Lecture Notes in Computer Science, vol 7210. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28652-0_1

Download citation

  • .RIS
  • .ENW
  • .BIB
  • DOI: https://doi.org/10.1007/978-3-642-28652-0_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28651-3

  • Online ISBN: 978-3-642-28652-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Share this paper

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Search

Navigation

  • Find a journal
  • Publish with us

Discover content

  • Journals A-Z
  • Books A-Z

Publish with us

  • Publish your research
  • Open access publishing

Products and services

  • Our products
  • Librarians
  • Societies
  • Partners and advertisers

Our imprints

  • Springer
  • Nature Portfolio
  • BMC
  • Palgrave Macmillan
  • Apress
  • Your US state privacy rights
  • Accessibility statement
  • Terms and conditions
  • Privacy policy
  • Help and support

167.114.118.210

Not affiliated

Springer Nature

© 2023 Springer Nature