Determining asynchronous pipeline execution times
Asynchronous pipelining is a form of parallelism in which processors execute different loop tasks (loop statements) rather than different loop iterations. An asynchronous pipeline schedule for a loop is an assignment of loop tasks to processors, plus an order on instances of tasks assigned to the same processor. This variant of pipelining is particularly relevant in distributed memory systems (since pipeline control may be distributed across processors), but may also be used in shared memory systems.
Accurate estimation of the execution time of a pipeline schedule is needed to determine if pipelining is appropriate for a loop, and to compare alternative schedules. Pipeline execution of n iterations of a loop requires time at most a + bn, for some constants a and b. The coefficient b is the iteration interval of the pipeline schedule, and is the primary measure of the performance of a schedule. The startup time a is a secondary performance measure.
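To illustrate why the iteration interval b is the primary measure and the startup time a only secondary, the following sketch evaluates the bound a + bn for two schedules. The function name and all numeric values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def pipeline_time_bound(a, b, n):
    """Upper bound a + b*n on the time to execute n loop iterations."""
    return a + b * n


# Two hypothetical schedules for the same loop, given as (a, b):
# X has a long startup but a short iteration interval, Y the reverse.
schedule_x = (40, 5)
schedule_y = (10, 6)

for n in (10, 30, 100):
    bound_x = pipeline_time_bound(*schedule_x, n)
    bound_y = pipeline_time_bound(*schedule_y, n)
    print(n, bound_x, bound_y)
```

With these assumed values, Y has the smaller bound for short loops, the two bounds coincide at n = 30, and X has the smaller bound for every larger n, because the b*n term eventually outweighs any fixed difference in a.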
We generalize previous work on determining if a pipeline schedule will deadlock, and generalize Reiter's well-known formula for determining the iteration interval b of a deadlock-free schedule, to account for nonzero communication times (easy) and the assignment of multiple tasks to processors (nontrivial). Two key components of our generalization are the use of pipeline scheduling edges, and the notion of negative data dependence distances (in a single unnested loop). We also discuss implementation of an asynchronous pipeline schedule at runtime, derive bounds on the startup time a, and discuss evaluation of the iteration interval formula, including development of a new algorithm.
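The abstract does not state the generalized formula. As a rough sketch, the classical formula in its commonly cited form takes the iteration interval to be the maximum, over all cycles of the dependence graph, of total time on the cycle divided by total dependence distance on the cycle. The code below evaluates that ratio by brute-force cycle enumeration on a tiny graph. Everything specific here is an assumption made for illustration: the function names, the edge encoding (source, destination, distance, communication time), the way same-processor ordering is modeled as extra edges, the treatment of any cycle with non-positive total distance as a deadlock, and the example values. It is not the paper's algorithm.

```python
from fractions import Fraction


def simple_cycles(nodes, edges):
    """Yield each simple cycle once, as a list of edges (small graphs only)."""
    adj = {u: [] for u in nodes}
    for e in edges:
        adj[e[0]].append(e)
    order = {u: i for i, u in enumerate(nodes)}

    def dfs(start, node, path, visited):
        for e in adj[node]:
            v = e[1]
            if v == start:
                yield path + [e]
            elif v not in visited and order[v] > order[start]:
                # Only visit nodes ordered after start, so each cycle
                # is reported from its lowest-ordered node exactly once.
                yield from dfs(start, v, path + [e], visited | {v})

    for s in nodes:
        yield from dfs(s, s, [], {s})


def iteration_interval(task_time, edges):
    """Maximum cycle ratio time/distance, or None if a cycle cannot progress.

    edges: tuples (src, dst, distance, comm_time) with integer values.
    Assumption: a cycle whose total distance is <= 0 means deadlock.
    """
    best = Fraction(0)
    for cycle in simple_cycles(list(task_time), edges):
        distance = sum(e[2] for e in cycle)
        if distance <= 0:
            return None
        time = sum(task_time[e[0]] + e[3] for e in cycle)
        best = max(best, Fraction(time, distance))
    return best


# Hypothetical example: tasks A and B share one processor, C has its own.
task_time = {"A": 3, "B": 2, "C": 4}
edges = [
    ("A", "B", 0, 0),  # same-processor order: A before B in an iteration
    ("B", "A", 1, 0),  # same-processor order: B before A of next iteration
    ("C", "C", 1, 0),  # C's instances run one after another
    ("B", "C", 0, 1),  # data dependence, communication time 1
    ("C", "A", 2, 1),  # loop-carried dependence, distance 2
]
print(iteration_interval(task_time, edges))  # prints 11/2
```

In this example the cycle A, B, C, A has total time 11 and total distance 2, giving the ratio 11/2, which exceeds the ratio 5 for the two tasks sharing a processor and the ratio 4 for task C alone. Enumerating cycles is exponential in general, so a sketch like this is only suitable for very small graphs.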