Scout: A Source-to-Source Transformator for SIMD-Optimizations

  • Olaf Krzikalla
  • Kim Feldhoff
  • Ralph Müller-Pfefferkorn
  • Wolfgang E. Nagel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7156)


We present Scout, a configurable source-to-source transformation tool designed to automatically vectorize C source code. Scout provides the means to vectorize loops using SIMD instructions at source level. Our main focus during the development of Scout has been maximum flexibility in two respects: the ability to vectorize a wide range of loop constructs and the ability to target various modern SIMD architectures. Scout supports several SIMD instruction sets, such as SSE and AVX, and is easily extensible to upcoming ones.
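To illustrate what vectorizing a loop with SIMD instructions at source level can look like, the following is a minimal hand-written sketch. It is not output produced by Scout, and the function names `saxpy_scalar` and `saxpy_simd` are hypothetical. It assumes an x86 target with SSE and falls back to the plain scalar loop elsewhere.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_*_ps */
#endif

/* Scalar reference loop: y[i] = a * x[i] + y[i]. */
void saxpy_scalar(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* The same loop vectorized by hand at source level (illustrative only).
 * With SSE, each iteration of the main loop processes four floats. */
void saxpy_simd(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(a);        /* broadcast a to all four lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);     /* unaligned load of x[i..i+3] */
        __m128 vy = _mm_loadu_ps(y + i);     /* unaligned load of y[i..i+3] */
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    /* Residual loop: handles the remaining n % 4 elements,
     * or all elements when SSE is not available. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Because the vectorized loop is still ordinary C that calls intrinsics, it can be read, reviewed, and compiled like any other source file. Targeting a wider instruction set such as AVX would mainly change the vector type, the intrinsic names, and the step width.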

In the second part of the paper, we present results of applying Scout’s vectorizing capabilities to CFD production codes of the German Aerospace Center. The complex loops in these codes often inhibit automatic vectorization by common C compilers. In contrast, Scout is able to vectorize most of these loops. We measured the resulting speedup on SSE and AVX platforms.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Olaf Krzikalla (1)
  • Kim Feldhoff (1)
  • Ralph Müller-Pfefferkorn (1)
  • Wolfgang E. Nagel (1)
  1. Technische Universität Dresden, Germany
