Abstract
To obtain maximum performance, many applications need to extend parallelism from multi-threading to the instruction-level (SIMD) parallelism available in many current (and future) multi-core architectures. While auto-vectorization technology has been used to exploit this SIMD level, it is not always sufficient because of OpenMP semantics and limitations in compiler technology. In those cases, programmers must resort to low-level intrinsics or vendor-specific directives. We propose a new OpenMP directive: the simd directive. This directive allows programmers to guide the vectorization process, enabling more productive and portable exploitation of the SIMD level. Our performance results show significant improvements over the current auto-vectorizing technology of the Intel® Composer XE 2011.
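As an illustration of the kind of loop such a directive targets, the sketch below annotates a simple kernel so that the compiler is told the iterations may be executed in SIMD lanes. This is an assumption-laden example, not the syntax from the paper: the abstract does not show the proposed directive, so the spelling `#pragma omp simd` is taken from the later OpenMP 4.0 specification, and the function name `saxpy` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
 * x and y are plain pointers, so a compiler generally cannot prove that
 * they do not overlap and may decline to auto-vectorize the loop. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* Assumption: OpenMP 4.0 spelling of the simd construct. The pragma
     * asserts that the iterations are safe to execute in SIMD lanes;
     * compilers without OpenMP SIMD support simply ignore it. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC or Clang, the pragma is honored when compiling with `-fopenmp` or `-fopenmp-simd`; without either flag the loop still compiles and runs as ordinary scalar code, which is what keeps this approach portable compared with intrinsics. The caller remains responsible for the assertion the pragma makes: here, that `x` and `y` do not overlap in a way that creates a dependence between iterations.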
Keywords
- Vector Length
- Loop Iteration
- Chunk Size
- OpenMP Directive
- SIMD Architecture
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klemm, M., Duran, A., Tian, X., Saito, H., Caballero, D., Martorell, X. (2012). Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds) OpenMP in a Heterogeneous World. IWOMP 2012. Lecture Notes in Computer Science, vol 7312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30961-8_5
DOI: https://doi.org/10.1007/978-3-642-30961-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30960-1
Online ISBN: 978-3-642-30961-8
eBook Packages: Computer Science, Computer Science (R0)
