Auto-Vectorization of Loops on Intel 64 and Intel Xeon Phi: Analysis and Evaluation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10421)


This paper evaluates the auto-vectorizing capabilities of modern optimizing compilers (Intel C/C++, GCC C/C++, LLVM/Clang, and PGI C/C++) on Intel 64 and Intel Xeon Phi architectures. We use the Extended Test Suite for Vectorizing Compilers, which consists of 151 loops. We estimate speedup by running the loops in scalar and vector modes for different data types, and we determine the loop classes that the compilers in the study fail to vectorize. Our experiments use a dual-CPU NUMA system (2 x Intel Xeon E5-2620v4, Intel Broadwell microarchitecture) with an Intel Xeon Phi 3120A coprocessor.
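As a rough illustration of the measurement described in the abstract, the sketch below contrasts a loop with independent iterations against a loop with a loop-carried dependence, and computes speedup as the ratio of scalar to vector run time. The kernels and the names `add_scaled`, `prefix_sum`, and `speedup` are hypothetical and are not taken from the test suite. The GCC flags mentioned in the comments are one possible way to build the two modes, not necessarily the configuration used in the paper.

```c
#include <stddef.h>

/* Hypothetical kernel with independent iterations. The restrict
 * qualifiers promise the compiler that the arrays do not overlap,
 * which usually makes such a loop a candidate for auto-vectorization. */
void add_scaled(float *restrict a, const float *restrict b,
                const float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + 2.0f * c[i];
}

/* Hypothetical kernel with a loop-carried dependence: iteration i
 * reads the value written by iteration i - 1, so the loop cannot be
 * vectorized in this form without restructuring it. */
void prefix_sum(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* Speedup of the vector build over the scalar build of one loop.
 * Assumed setup: the same source is compiled twice, for example with
 * GCC as "-O3 -fno-tree-vectorize" (scalar) and "-O3" (vector), and
 * each build is timed separately. */
double speedup(double t_scalar, double t_vector)
{
    return t_scalar / t_vector;
}
```

With GCC, adding `-fopt-info-vec` to the vector build prints the loops that were vectorized, which is one way to check whether a measured speedup actually comes from vectorization.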



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Rzhanov Institute of Semiconductor Physics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia