GTS: Extracting full parallelism out of DO loops
In this paper we present a new method for extracting the maximum parallelism out of DO loops with tight recurrences in a sequential programming language. We have named the method Graph Traverse Scheduling (GTS). It is devised for producing code for shared memory multiprocessors. Hardware support for fast synchronization is assumed.
Based on the dependence graph of a loop, we first show how its parallelism can be evaluated. Then we apply GTS to distribute the iterations of a recurrence among tasks. With this method, some dependences are satisfied by the sequential execution of each task, while the others must be explicitly synchronized. A method to minimize the number of explicit synchronizations is also presented.
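As an illustration of how distributing iterations can fold a dependence into the sequential order of a task, consider the simplest case: a one-statement loop `A(I) = A(I-d) + I` with a single dependence of distance `d`. The Python sketch below is not taken from the paper. The function names and the modulo-`d` assignment of iterations to tasks are assumptions chosen for this example, and loops with several statements or several dependences need more than this.

```python
def sequential(n, d):
    """Reference execution: a[i] depends on a[i - d] (dependence distance d)."""
    a = [0] * n
    for i in range(d, n):
        a[i] = a[i - d] + i
    return a


def task_iterations(n, d):
    """Hypothetical distribution: task t runs iterations d+t, 2d+t, 3d+t, ...

    Every dependence (i - d) -> i then links two iterations of the same task,
    so it is satisfied by that task's sequential order and needs no
    explicit synchronization.
    """
    return [list(range(d + t, n, d)) for t in range(d)]


def by_tasks(n, d, task_order=None):
    """Run the tasks one after another in an arbitrary order.

    If the tasks are truly independent, the order must not affect the result.
    """
    a = [0] * n
    tasks = task_iterations(n, d)
    order = range(d) if task_order is None else task_order
    for t in order:
        for i in tasks[t]:
            a[i] = a[i - d] + i
    return a


if __name__ == "__main__":
    print(task_iterations(10, 3))
    print(by_tasks(10, 3, task_order=[2, 0, 1]) == sequential(10, 3))
```

With distance `d`, this toy loop yields `d` independent tasks, so its parallelism is `d`. No dependence crosses a task boundary here. When some do, as in recurrences spanning several statements, they are the dependences that require explicit synchronization.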
The parallelism of the loop is first evaluated under the assumption that every statement in the loop has the same execution time. The evaluation is then extended to the case where statements have different execution times.
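One common way to estimate the parallelism of a loop from its dependence graph is the recurrence bound familiar from loop scheduling: a dependence cycle whose statements take total time T and whose dependence distances sum to D limits execution to one iteration every T/D time units. The Python sketch below applies that bound, first with unit statement times and then with different ones. It is an assumption-laden illustration, not the evaluation procedure of the paper: the graph encoding, the function names, and the example graph are hypothetical, statement times are taken to be integers, and synchronization cost is ignored.

```python
from fractions import Fraction


def simple_cycles(edges):
    """Enumerate elementary cycles of a small dependence graph.

    edges: list of (source, destination, distance) tuples.
    Returns each cycle once, as a list of edge indices, by rooting it at
    its smallest node. Exhaustive search: only meant for small graphs.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    out = {n: [] for n in nodes}
    for k, (s, _, _) in enumerate(edges):
        out[s].append(k)
    cycles = []

    def dfs(start, node, path, visited):
        for k in out[node]:
            nxt = edges[k][1]
            if nxt == start:
                cycles.append(path + [k])
            elif nxt not in visited and nxt > start:
                dfs(start, nxt, path + [k], visited | {nxt})

    for s in nodes:
        dfs(s, s, [], {s})
    return cycles


def parallelism_bound(times, edges):
    """Asymptotic upper bound on speedup imposed by the recurrences.

    times: dict mapping statement name to integer execution time.
    Returns sum(times) / max over cycles of (cycle time / cycle distance),
    or None when the graph has no cycle (no recurrence limits the loop).
    """
    worst = Fraction(0)
    for cycle in simple_cycles(edges):
        cycle_time = sum(times[edges[k][0]] for k in cycle)
        cycle_distance = sum(edges[k][2] for k in cycle)
        if cycle_distance == 0:
            raise ValueError("cycle with zero total distance: invalid loop")
        worst = max(worst, Fraction(cycle_time, cycle_distance))
    if worst == 0:
        return None
    return Fraction(sum(times.values())) / worst


# Hypothetical three-statement loop with two recurrences sharing S2.
EDGES = [("S1", "S2", 0), ("S2", "S1", 2), ("S2", "S3", 0), ("S3", "S2", 1)]

if __name__ == "__main__":
    print(parallelism_bound({"S1": 1, "S2": 1, "S3": 1}, EDGES))  # 3/2
    print(parallelism_bound({"S1": 4, "S2": 1, "S3": 1}, EDGES))  # 12/5
```

In this example the limiting recurrence changes with the statement times. With unit times the cycle through S2 and S3 is critical (2 time units per iteration), while with a slow S1 the cycle through S1 and S2 becomes critical (5/2 time units per iteration).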
GTS can be easily adapted for VLIW machines as well as for vector processors.