A Practical Approach to DOACROSS Parallelization

Unnikrishnan, Priya; Shirako, Jun; Barton, Kit; Chatterjee, Sanjay; Silvera, Raul; Sarkar, Vivek

doi:10.1007/978-3-642-32820-6_23

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7484)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Loops with cross-iteration dependences (doacross loops) often contain significant amounts of parallelism that can potentially be exploited on modern manycore processors. However, most production-strength compilers focus their automatic parallelization efforts on doall loops, and consider doacross parallelism to be impractical due to the space inefficiencies and the synchronization overheads of past approaches. This paper presents a novel and practical approach to automatically parallelizing doacross loops for execution on manycore-SMP systems. We introduce a compiler-and-runtime optimization called dependence folding that bounds the number of synchronization variables allocated per worker thread (processor core) to be at most the maximum depth of a loop nest being considered for automatic parallelization. Our approach has been implemented in a development version of the IBM XL Fortran V13.1 commercial parallelizing compiler and runtime system. For four benchmarks where automatic doall parallelization was largely ineffective (speedups of under 2×), our implementation delivered speedups of 6.5×, 9.0×, 17.3×, and 17.5× on a 32-core IBM Power7 SMP system, thereby showing that doacross parallelization can be a valuable technique to complement doall parallelization.
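To make the notion of a doacross loop concrete, the following is a minimal sketch of point-to-point synchronization on a two-dimensional wavefront loop, where each element depends on its upper and left neighbours. It is not the dependence folding algorithm of the paper. The function names, the cyclic row distribution, and the single progress counter per worker are assumptions chosen for illustration; the only idea borrowed from the abstract is that the number of synchronization variables is tied to the number of worker threads rather than to the number of iterations.

```python
import threading


def sequential_wavefront(n, m):
    """Reference version: a[i][j] depends on a[i-1][j] and a[i][j-1]."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def doacross_wavefront(n, m, num_workers=4):
    """Doacross version: rows are dealt cyclically to workers (assumed scheme).

    Each worker owns exactly one synchronization variable, progress[w],
    which encodes the last cell (i, j) it completed as i * m + j.
    """
    a = [[1] * m for _ in range(n)]
    progress = [0] * num_workers          # one counter per worker
    cond = threading.Condition()          # guards all progress counters

    def worker(w):
        pred = (w - 1) % num_workers      # worker that owns row i - 1
        for i in range(1 + w, n, num_workers):
            for j in range(1, m):
                if i > 1:
                    # Wait until the predecessor has produced a[i-1][j].
                    needed = (i - 1) * m + j
                    with cond:
                        cond.wait_for(lambda: progress[pred] >= needed)
                a[i][j] = a[i - 1][j] + a[i][j - 1]
                # Post: publish this worker's progress to its successor.
                with cond:
                    progress[w] = i * m + j
                    cond.notify_all()

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a


if __name__ == "__main__":
    assert doacross_wavefront(8, 8, 3) == sequential_wavefront(8, 8)
```

Because each worker processes its rows and columns in increasing order, its counter only ever grows, so a single comparison is enough for the successor to decide whether the value it needs is ready. In CPython this sketch shows the synchronization pattern only and should not be expected to yield the speedups reported for the compiled Fortran implementation.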

Author information

Authors and Affiliations

IBM Toronto Laboratory, Canada
Priya Unnikrishnan, Kit Barton & Raul Silvera
Department of Computer Science, Rice University, USA
Jun Shirako, Sanjay Chatterjee & Vivek Sarkar

Editor information

Editors and Affiliations

University of Patras, Computer Technology Institute and Press “Diophantus”, N. Kazantzaki, 26504, Rio, Greece
Christos Kaklamanis
University of Patras, University Building B, 26504, Rio, Greece
Theodore Papatheodorou
Computer Technology Institute and Press “Diophantus”, University of Patras, N. Kazantzaki, 26504, Rio, Greece
Paul G. Spirakis

Cite this paper

Unnikrishnan, P., Shirako, J., Barton, K., Chatterjee, S., Silvera, R., Sarkar, V. (2012). A Practical Approach to DOACROSS Parallelization. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_23

DOI: https://doi.org/10.1007/978-3-642-32820-6_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)
