Determining asynchronous pipeline execution times

  • Val Donaldson
  • Jeanne Ferrante
Compiler Algorithms for Fine-Grain Parallelism
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1239)


Asynchronous pipelining is a form of parallelism in which processors execute different loop tasks (loop statements) as opposed to different loop iterations. An asynchronous pipeline schedule for a loop is an assignment of loop tasks to processors, plus an order on instances of tasks assigned to the same processor. This variant of pipelining is particularly relevant in distributed memory systems (since pipeline control may be distributed across processors), but may also be used in shared memory systems.

Accurate estimation of the execution time of a pipeline schedule is needed to determine if pipelining is appropriate for a loop, and to compare alternative schedules. Pipeline execution of n iterations of a loop requires time at most a + bn, for some constants a and b. The coefficient b is the iteration interval of the pipeline schedule, and is the primary measure of the performance of a schedule. The startup time a is a secondary performance measure.
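The role of the two coefficients can be illustrated with a small sketch. The function name and the two schedules below are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows why b is the primary measure: for large n, the schedule with the smaller iteration interval has the smaller bound, even if its startup time is larger.

```python
def pipeline_time_bound(a, b, n):
    """Upper bound a + b*n on the time for n iterations of a pipelined loop."""
    return a + b * n

# Hypothetical schedules, given as (startup time a, iteration interval b).
schedule_x = (50, 4)   # slower to start, shorter iteration interval
schedule_y = (10, 5)   # faster to start, longer iteration interval

for n in (10, 100):
    x = pipeline_time_bound(*schedule_x, n)
    y = pipeline_time_bound(*schedule_y, n)
    print(n, x, y)
# prints "10 90 60" then "100 450 510"
```

With these illustrative numbers, the second schedule has the smaller bound for a short loop (n = 10), while the first has the smaller bound once n is large enough for the b term to dominate.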

We generalize previous work on determining if a pipeline schedule will deadlock, and generalize Reiter's well-known formula [19] for determining the iteration interval b of a deadlock-free schedule, to account for nonzero communication times (easy) and the assignment of multiple tasks to processors (nontrivial). Two key components of our generalization are the use of pipeline scheduling edges, and the notion of negative data dependence distances (in a single unnested loop). We also discuss implementation of an asynchronous pipeline schedule at runtime; derive bounds on the startup time a; and discuss evaluation of the iteration interval formula, including development of a new algorithm.
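As a rough illustration of the kind of quantity involved, the iteration interval in Reiter-style analyses is commonly stated as a maximum cycle ratio: over all cycles of the dependence graph, the largest value of total time on the cycle divided by total dependence distance on the cycle. The sketch below is a minimal brute-force version under stated assumptions, not the algorithm developed in the paper. The function name, the edge tuple layout, integer times, and the rule that a cycle with nonpositive total distance means deadlock are all assumptions made for this example. It enumerates simple cycles directly, which is only practical for small graphs.

```python
from fractions import Fraction

def iteration_interval(task_time, edges):
    """Maximum over simple cycles of (total time) / (total distance).

    task_time: dict mapping task name -> integer execution time.
    edges: list of (src, dst, comm_time, distance) tuples; distance may be
        negative. Extra edges can stand in for scheduling constraints.
    Returns None if some cycle has total distance <= 0 (assumed here to
    indicate deadlock), and Fraction(0) if the graph has no cycles.
    """
    succ = {}
    for u, v, c, d in edges:
        succ.setdefault(u, []).append((v, c, d))

    order = sorted(task_time)
    rank = {t: i for i, t in enumerate(order)}
    best = Fraction(0)

    def dfs(start, node, time, dist, on_path):
        # Extend a simple path from start; each cycle is found exactly once,
        # rooted at its lowest-ranked task.
        nonlocal best
        for nxt, c, d in succ.get(node, []):
            if nxt == start:
                total_d = dist + d
                if total_d <= 0:
                    return False  # assumed deadlock condition
                best = max(best, Fraction(time + c, total_d))
            elif rank[nxt] > rank[start] and nxt not in on_path:
                on_path.add(nxt)
                ok = dfs(start, nxt, time + c + task_time[nxt], dist + d, on_path)
                on_path.discard(nxt)
                if not ok:
                    return False
        return True

    for s in order:
        if not dfs(s, s, task_time[s], 0, {s}):
            return None
    return best

# Hypothetical three-task loop: A -> B -> C within an iteration, C -> A
# carried to the next iteration, and B -> A carried two iterations with
# one unit of communication time.
times = {"A": 2, "B": 3, "C": 1}
deps = [("A", "B", 0, 0), ("B", "C", 0, 0), ("C", "A", 0, 1), ("B", "A", 1, 2)]
print(iteration_interval(times, deps))  # prints 6
```

In this example the cycle A, B, C has ratio 6/1 and the cycle A, B has ratio 6/2, so the first cycle determines the result. Exhaustive enumeration can take exponential time in general, which is one reason efficient evaluation of such formulas is treated as a problem in its own right.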




References

  1. Alexander Aiken and Alexandru Nicolau. Optimal loop parallelization. Proc. SIGPLAN '88 Conference on Programming Language Design and Implementation, Atlanta, GA, June 1988, pp. 308–317.
  2. Sati Banerjee, Takeo Hamada, Paul M. Chau, and Ronald D. Fellman. Macro pipelining based scheduling on high performance heterogeneous multiprocessor systems. IEEE Transactions on Signal Processing 43:8 (June 1995), pp. 1468–1484.
  3. Steven M. Burns. Performance analysis and optimization of asynchronous circuits. Ph.D. Thesis, California Institute of Technology, Pasadena, California, 1991.
  4. F. Commoner, A. W. Holt, S. Even, and A. Pnueli. Marked directed graphs. Journal of Computer and System Sciences 5:5 (October 1971), pp. 511–523.
  5. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.
  6. Val Donaldson and Jeanne Ferrante. Determining asynchronous acyclic pipeline execution times. Proc. 10th International Parallel Processing Symposium, Honolulu, HI, April 1996, pp. 568–572.
  7. Val Donaldson and Jeanne Ferrante. Determining asynchronous pipeline execution times. Technical Report CS96-481, Computer Science and Engineering Dept., University of California, San Diego, La Jolla, CA, April 1996.
  8. Franco Gasperoni and Uwe Schwiegelshohn. Scheduling loops on parallel processors: a simple algorithm with close to optimum performance. Second Joint International Conference on Vector and Parallel Processing (Parallel Processing: CONPAR 92-VAPP V), Lyon, France, September 1992, pp. 625–636.
  9. Mark Hartmann and James B. Orlin. Finding minimum cost to time ratio cycles with small integral transit times. Networks 23:6 (September 1993), pp. 567–574.
  10. Phu D. Hoang and Jan M. Rabaey. Scheduling of DSP programs onto multiprocessors for maximum throughput. IEEE Transactions on Signal Processing 41:6 (June 1993), pp. 2225–2235.
  11. Donald B. Johnson. Finding all the elementary circuits of a directed graph. SIAM Journal on Computing 4:1 (March 1975), pp. 77–84.
  12. Peter M. Kogge. The Architecture of Pipelined Computers. Hemisphere Publishing, New York, 1981.
  13. S. Y. Kung, P. S. Lewis, and S. C. Lo. Performance analysis and optimization of VLSI dataflow arrays. Journal of Parallel and Distributed Computing 4:6 (December 1987), pp. 592–618.
  14. Monica Lam. Software pipelining: an effective scheduling technique for VLIW machines. Proc. SIGPLAN '88 Conference on Programming Language Design and Implementation, Atlanta, GA, June 1988, pp. 318–328.
  15. Eugene L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, and Winston, New York, 1976.
  16. F. Thomson Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, San Mateo, CA, 1992.
  17. David A. Padua and Michael J. Wolfe. Advanced compiler optimizations for supercomputers. Communications of the ACM 29:12 (December 1986), pp. 1184–1201.
  18. C. V. Ramamoorthy and Gary S. Ho. Performance evaluation of asynchronous concurrent systems using Petri nets. IEEE Transactions on Software Engineering SE-6:5 (September 1980), pp. 440–449.
  19. Raymond Reiter. Scheduling parallel computations. Journal of the ACM 15:4 (October 1968), pp. 590–599.
  20. Vivek Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. MIT Press, Cambridge, MA, 1989.
  21. Tao Yang, Cong Fu, Apostolos Gerasoulis, and Vivek Sarkar. Mapping iterative task graphs on distributed memory machines. Proc. 24th International Conference on Parallel Processing, Oconomowoc, WI, August 1995, Vol II, pp. 151–158.
  22. Tao Yang and Apostolos Gerasoulis. DSC: scheduling parallel tasks on an unbounded number of processors. IEEE Transactions on Parallel and Distributed Systems 5:9 (September 1994), pp. 951–967.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Val Donaldson (1)
  • Jeanne Ferrante (1)
  1. University of California, San Diego
