
PostSLP: Cross-Region Vectorization of Fully or Partially Vectorized Code

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11998)


Abstract

Modern optimizing compilers rely on auto-vectorization algorithms for generating high-performance code. Both loop and straight-line code vectorization algorithms generate SIMD vector instructions out of scalar code, with no intervention from the programmer.

In this work, we show that existing auto-vectorization algorithms operate on restricted code regions and therefore miss vectorization opportunities: they either generate narrower vectors than the target architecture supports, or fail entirely and leave some of the code in scalar form. We demonstrate the need for a specialized post-processing re-vectorization pass, called PostSLP, which can span multiple regions and generate more effective vector code. PostSLP is designed to convert already-vectorized or partially vectorized code into wider forms that perform better on the target architecture. We implemented PostSLP in LLVM, and our evaluation shows significant performance improvements on SPEC CPU2006.
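To make the widening idea concrete, here is a minimal sketch (not taken from the paper; the `vec_add` helper and the data are hypothetical) of the transformation PostSLP targets: two independent narrow vector operations over adjacent lanes are merged into one wide operation, assuming the target machine supports the wider width.

```python
# Hypothetical model of PostSLP-style widening: two 2-wide SIMD adds over
# adjacent data are merged into a single 4-wide add.

def vec_add(a, b):
    """Model a SIMD add of two equal-width 'vectors' (Python lists)."""
    return [x + y for x, y in zip(a, b)]

a, b = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]

# Before: partially vectorized code, two narrow (2-wide) operations.
lo = vec_add(a[0:2], b[0:2])   # models a <2 x float> add
hi = vec_add(a[2:4], b[2:4])   # models a <2 x float> add
narrow_result = lo + hi

# After: one wide (4-wide) operation over the concatenated lanes.
wide_result = vec_add(a, b)    # models a <4 x float> add

assert narrow_result == wide_result
```

The two versions compute identical lane values; the payoff of the real transformation is fewer, wider machine instructions, not a different result.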

V. Porpodas—Currently at Google.

Notes

  1. The shuffle instructions in these examples are similar to LLVM's shufflevector instruction.

  2. In LLVM, we use a shufflevector instruction when the output is a vector, or an extractelement instruction when the output is a scalar.
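Since the notes lean on LLVM's shuffle semantics, a small model may help: `shufflevector` selects lanes by index from the concatenation of its two input vectors, and `extractelement` reads a single scalar lane. The Python functions below are illustrative stand-ins, not LLVM's actual API.

```python
def shufflevector(v1, v2, mask):
    """Model LLVM's shufflevector: each index in `mask` selects a lane from
    the concatenation of v1 and v2 (indices 0..len(v1)-1 come from v1,
    the remaining indices from v2)."""
    concat = list(v1) + list(v2)
    return [concat[i] for i in mask]

def extractelement(v, idx):
    """Model LLVM's extractelement: read one scalar lane of a vector."""
    return v[idx]

# Concatenate two 2-wide vectors into one 4-wide vector:
wide = shufflevector([1, 2], [3, 4], [0, 1, 2, 3])   # -> [1, 2, 3, 4]

# Extract a single lane when the consumer is scalar code:
lane = extractelement(wide, 2)                        # -> 3
```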


Author information

Corresponding author: Vasileios Porpodas.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Porpodas, V., Ratnalikar, P. (2021). PostSLP: Cross-Region Vectorization of Fully or Partially Vectorized Code. In: Pande, S., Sarkar, V. (eds.) Languages and Compilers for Parallel Computing. LCPC 2019. Lecture Notes in Computer Science, vol. 11998. Springer, Cham. https://doi.org/10.1007/978-3-030-72789-5_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-72789-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72788-8

  • Online ISBN: 978-3-030-72789-5

  • eBook Packages: Computer Science, Computer Science (R0)
