Abstract
General-purpose GPUs provide massive compute power, but are notoriously difficult to program. In this paper we present a complete compilation strategy to exploit GPUs for the parallelisation of sequential legacy code. Using hybrid data dependence analysis combining static and dynamic information, our compiler automatically detects suitable parallelism and generates parallel OpenCL code from sequential programs. We exploit the fact that dependence profiling provides us with parallel loop candidates that are highly likely to be genuinely parallel, but cannot be statically proven so. For the efficient GPU parallelisation of those probably parallel loop candidates, we propose a novel software speculation scheme, which ensures correctness for the unlikely, yet possible case of dynamically detected dependence violations. Our scheme operates in place and supports speculative read and write operations. We demonstrate the effectiveness of our approach in detecting and exploiting parallelism using sequential codes from the NAS benchmark suite. We achieve an average speedup of 3.2x, and up to 99x, over the sequential baseline. On average, this is 1.42 times faster than state-of-the-art speculation schemes and corresponds to 99% of the performance level of a manual GPU implementation developed by independent expert programmers.
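The core idea of in-place speculation with validation and rollback can be sketched sequentially in Python. This is a minimal illustration, not the paper's OpenCL implementation: the `(reads, writes, body)` iteration interface and the `run_speculative` name are hypothetical, and the validation rule is an LRPD-style check (safe only if no element is written by more than one iteration or read by an iteration other than its writer).

```python
from copy import deepcopy

def run_speculative(a, iterations):
    """Speculatively execute loop iterations in place, then validate.

    'iterations' is a hypothetical interface for this sketch: a list of
    (reads, writes, body) tuples, where reads/writes are the element
    indices an iteration touches and body(a) applies its update in place.
    """
    checkpoint = deepcopy(a)              # saved state for rollback
    read_by, written_by = {}, {}
    for i, (reads, writes, body) in enumerate(iterations):
        for r in reads:
            read_by.setdefault(r, set()).add(i)
        for w in writes:
            written_by.setdefault(w, set()).add(i)
        body(a)                           # speculative in-place update
    # LRPD-style check: the loop was safe to run in parallel iff every
    # written element was written by exactly one iteration and read only
    # by that same iteration (no cross-iteration dependence).
    ok = all(len(ws) == 1 and read_by.get(e, set()) <= ws
             for e, ws in written_by.items())
    if not ok:                            # misspeculation detected:
        a[:] = checkpoint                 # roll back to the checkpoint
        for _, _, body in iterations:     # and re-execute sequentially
            body(a)
    return ok
```

A loop with disjoint per-iteration accesses passes validation and keeps its speculative results; a loop where iteration `i+1` reads what iteration `i` wrote fails the check, is rolled back, and is re-executed sequentially, so the final array always matches sequential semantics.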
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Wang, Z., Powell, D., Franke, B., O’Boyle, M. (2014). Exploitation of GPUs for the Parallelisation of Probably Parallel Legacy Code. In: Cohen, A. (eds) Compiler Construction. CC 2014. Lecture Notes in Computer Science, vol 8409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54807-9_9
Print ISBN: 978-3-642-54806-2
Online ISBN: 978-3-642-54807-9