Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 7312)

Abstract

To obtain maximum performance, many applications need to extend parallelism from multi-threading to the instruction-level (SIMD) parallelism available in many current (and future) multi-core architectures. While auto-vectorization technology has been used to exploit this SIMD level, it is not always sufficient because of OpenMP semantics and the limitations of compiler technology. In those cases, programmers must resort to low-level intrinsics or vendor-specific directives. We propose a new OpenMP directive: the simd directive. This directive allows programmers to guide the vectorization process, enabling more productive and portable exploitation of the SIMD level. Our performance results show significant improvements over the current auto-vectorizing technology of the Intel® Composer XE 2011.
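The abstract does not show the proposed syntax, so the following is only a minimal sketch of what a user-guided vectorization directive can look like. It assumes the `#pragma omp simd` spelling that was later standardized in OpenMP 4.0, which may differ in detail from the paper's proposal. The function names `saxpy` and `dot` are hypothetical illustrations and do not come from the paper.

```c
/* Hypothetical kernels (not from the paper) illustrating a simd directive.
   Assumption: OpenMP 4.0 syntax; compilers without OpenMP support simply
   ignore the pragmas and compile the loops as ordinary scalar code. */

/* y[i] = a * x[i] + y[i]; `restrict` promises that x and y do not overlap. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* Ask the compiler to vectorize this loop instead of relying on
       its own auto-vectorization analysis. */
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Dot product: the reduction clause tells the compiler that partial sums
   may be kept in separate vector lanes and combined after the loop. */
float dot(int n, const float *x, const float *y)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC or Clang, such pragmas are typically enabled with `-fopenmp` or `-fopenmp-simd`. Because the directive is a hint attached to standard C, the same source still compiles and gives the same results when the pragmas are ignored, which is the portability advantage over intrinsics that the abstract alludes to.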

Keywords

  • Vector Length
  • Loop Iteration
  • Chunk Size
  • OpenMP Directive
  • SIMD Architecture

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Klemm, M., Duran, A., Tian, X., Saito, H., Caballero, D., Martorell, X. (2012). Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds) OpenMP in a Heterogeneous World. IWOMP 2012. Lecture Notes in Computer Science, vol 7312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30961-8_5

  • DOI: https://doi.org/10.1007/978-3-642-30961-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30960-1

  • Online ISBN: 978-3-642-30961-8

  • eBook Packages: Computer Science, Computer Science (R0)