Affine Parallelization of Loops with Run-Time Dependent Bounds from Binaries

Kotha, Aparna; Anand, Kapil; Creech, Timothy; ElWazeer, Khaled; Smithson, Matthew; Barua, Rajeev

doi:10.1007/978-3-642-54833-8_29

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8410)

Included in the following conference series:

European Symposium on Programming Languages and Systems

Abstract

An automatic parallelizer is a tool that converts serial code to parallel code. Such a tool is important because most hardware today is parallel, and manually rewriting the vast repository of serial code is tedious and error-prone. We build an automatic parallelizer for binary code, i.e., a tool that converts a serial binary to a parallel binary. Working from binaries is important because (i) most serial legacy code has no source code available, and (ii) a binary-level tool is compatible with all compilers and languages.

Binary automatic parallelization techniques have been developed in the past, and researchers have presented results on small kernels from PolyBench. These techniques are a good start; however, they are far from parallelizing the larger codes in the SPEC2006 and OMP2001 benchmark suites, which are representative of real-world codes. The main limitation of past techniques is that they assume loop bounds are statically known when calculating loop dependencies. In larger codes, however, loop bounds are known only at run-time; hence loop dependencies calculated statically are overly conservative, making binary parallelization ineffective.

In this paper we present a novel algorithm that significantly enhances past techniques by guessing the most likely loop bounds using only the memory expressions present in the loop. It then inserts run-time checks that test whether these guesses were correct; if so, the parallel version of the loop executes, otherwise the serial version executes. These techniques are applied to the large affine benchmarks in SPEC2006 and OMP2001, and, unlike with previous methods, the speedups from binaries are as good as those from source. We also present results on the number of loops parallelized directly from a binary with and without this algorithm. Across the 8 affine benchmarks in these suites, the best existing binary parallelization method achieves an average speedup of 1.74X, whereas our method achieves 3.38X. This is close to the speedup from source code of 3.15X.
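
The guess-then-check idea can be illustrated with a minimal sketch. This is not the authors' implementation; it only assumes a loop nest whose memory expression is `i * stride + j`, so the coefficient `stride` suggests the guess that the inner bound `m` does not exceed `stride`. The names `update_row`, `run_serial`, and `run_guarded`, and the use of a Python thread pool in place of a rewritten binary, are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def update_row(a, i, m, stride):
    # Body of one outer iteration: touches a[i*stride .. i*stride + m - 1].
    base = i * stride
    for j in range(m):
        a[base + j] = a[base + j] * 2 + i


def run_serial(a, n, m, stride):
    # Original loop nest, always correct.
    for i in range(n):
        update_row(a, i, m, stride)


def run_guarded(a, n, m, stride, workers=4):
    # Guess taken from the memory expression i*stride + j: the inner
    # bound m most likely satisfies m <= stride. If so, different outer
    # iterations touch disjoint index ranges and can run in any order.
    if 0 <= m <= stride:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(lambda i: update_row(a, i, m, stride), range(n)))
        return "parallel"
    # Guess failed at run-time: rows may overlap, so fall back to serial.
    run_serial(a, n, m, stride)
    return "serial"
```

The check costs one comparison per loop entry, which is why a wrong guess only loses the parallel speedup and never affects correctness in this sketch.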

Author information

Authors and Affiliations

University of Maryland, College Park, MD, 20742, USA
Aparna Kotha, Kapil Anand, Timothy Creech, Khaled ElWazeer, Matthew Smithson & Rajeev Barua

Editor information

Editors and Affiliations

Yale University, New Haven, CT, USA
Zhong Shao

About this paper

Cite this paper

Kotha, A., Anand, K., Creech, T., ElWazeer, K., Smithson, M., Barua, R. (2014). Affine Parallelization of Loops with Run-Time Dependent Bounds from Binaries. In: Shao, Z. (eds) Programming Languages and Systems. ESOP 2014. Lecture Notes in Computer Science, vol 8410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54833-8_29

DOI: https://doi.org/10.1007/978-3-642-54833-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54832-1
Online ISBN: 978-3-642-54833-8
eBook Packages: Computer Science, Computer Science (R0)