Scout: A Source-to-Source Transformator for SIMD-Optimizations

  • Olaf Krzikalla
  • Kim Feldhoff
  • Ralph Müller-Pfefferkorn
  • Wolfgang E. Nagel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7156)

Abstract

We present Scout, a configurable source-to-source transformation tool designed to automatically vectorize C source code. Scout provides the means to vectorize loops using SIMD instructions at source level. Our main focus during the development of Scout has been maximum flexibility in two respects: vectorizing a wide range of loop constructs and targeting various modern SIMD architectures. Scout supports several SIMD instruction sets, such as SSE and AVX, and is easily extensible to upcoming ones.
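To make the idea of vectorizing a loop at source level concrete, the following minimal sketch shows a scalar C loop next to a hand-written SIMD counterpart that uses SSE intrinsics. It illustrates the general technique only: the function names `saxpy_scalar` and `saxpy_simd` are hypothetical, and the code is not output produced by Scout. On targets without SSE, the sketch falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_*_ps */
#endif

/* Scalar reference loop: y[i] = a * x[i] + y[i]. */
void saxpy_scalar(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hand-vectorized variant (illustrative assumption, not Scout output):
 * processes four floats per iteration, then handles the remainder. */
void saxpy_simd(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(a);        /* broadcast a to all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);     /* unaligned load of x[i..i+3] */
        __m128 vy = _mm_loadu_ps(y + i);     /* unaligned load of y[i..i+3] */
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    /* Residual loop for the elements that do not fill a full vector
     * (or the whole range when SSE is unavailable). */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Because the vectorized form is still ordinary C, it can be read, reviewed, and compiled by any C compiler that provides the intrinsics header; retargeting to a wider instruction set such as AVX would change the vector type, the intrinsics, and the step width.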

In the second part of the paper we present results of applying Scout’s vectorizing capabilities to CFD production codes of the German Aerospace Center. The complex loops used in these codes often inhibit the automatic vectorization performed by common C compilers. In contrast, Scout is able to vectorize most of these loops. We measured the resulting speedup on SSE and AVX platforms.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Olaf Krzikalla (1)
  • Kim Feldhoff (1)
  • Ralph Müller-Pfefferkorn (1)
  • Wolfgang E. Nagel (1)

  1. Technische Universität Dresden, Germany
