Scout: A Source-to-Source Transformator for SIMD-Optimizations
We present Scout, a configurable source-to-source transformation tool designed to automatically vectorize C source code. Scout provides the means to vectorize loops using SIMD instructions at source level. Our main focus during the development of Scout has been maximum flexibility in two respects: vectorizing a wide range of loop constructs and targeting various modern SIMD architectures. Scout supports several SIMD instruction sets, such as SSE and AVX, and is easily extensible to upcoming ones.
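To illustrate what vectorizing a loop "at source level" means, the following minimal sketch contrasts a scalar C loop with a hand-written SSE version of the same loop. It is not Scout's actual output: the function names `saxpy_scalar` and `saxpy_simd`, the loop body, and the choice of unaligned loads are assumptions made for illustration only. The intrinsics (`_mm_set1_ps`, `_mm_loadu_ps`, `_mm_mul_ps`, `_mm_add_ps`, `_mm_storeu_ps`) are the standard SSE ones from `xmmintrin.h`.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#endif

/* Scalar original (hypothetical example loop): y[i] = a * x[i] + y[i] */
void saxpy_scalar(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Source-level vectorized variant: four floats per iteration with SSE,
 * followed by a scalar loop for the remaining n % 4 elements.
 * Without SSE support, only the scalar loop is compiled. */
void saxpy_simd(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE
    const __m128 va = _mm_set1_ps(a);        /* broadcast a to all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);     /* unaligned load of x[i..i+3] */
        __m128 vy = _mm_loadu_ps(y + i);     /* unaligned load of y[i..i+3] */
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    for (; i < n; ++i)                       /* residual (or fallback) loop */
        y[i] = a * x[i] + y[i];
}
```

Because the vectorized result is still ordinary C, it can be inspected, edited, and compiled with any compiler that provides the intrinsics. Targeting a different instruction set such as AVX would, in this sketch, mean swapping the vector type and intrinsics and changing the vector size from 4 to 8.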
In the second part of the paper, we present results of applying Scout's vectorizing capabilities to CFD production codes of the German Aerospace Center. The complex loops in these codes often defeat the automatic vectorization of conventional C compilers. In contrast, Scout is able to vectorize most of these loops. We measured the resulting speedup on SSE and AVX platforms.
Keywords: Vector Size, German Aerospace, Abstract Syntax Tree, SIMD Instruction, Graphical User Inter