Acceleration of first and higher order recurrences on processors with instruction level parallelism

  • Michael Schlansker
  • Vinod Kathail
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 768)


This paper describes parallelization techniques for accelerating a broad class of recurrences on processors with instruction level parallelism. We introduce a new technique, called blocked back-substitution, which has lower operation count and higher performance than previous methods. The blocked back-substitution technique requires unrolling and non-symmetric optimization of innermost loop iterations. We present metrics to characterize the performance of software-pipelined loops and compare these metrics for a range of height reduction techniques and processor architectures.


recurrences parallelism software pipeline loop optimization height reduction back-substitution blocked back-substitution 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    H. S. Stone. Parallel Tridiagonal Equation Solvers. ACM Trans Math. Softw 1, 4 (1975), 289–307.CrossRefGoogle Scholar
  2. 2.
    S.-C. Chen and D. J. Kuck. Time and Parallel Processor Bounds for Linear Recurrence Systems. IEEE Transactions on Computers C-24 7 (1975), 701–717.Google Scholar
  3. 3.
    D. Heller. Some Aspects of the Cyclic Reduction Algorithm for Block Tridiagonal Linear Systems. SIAM J. Numerical Anal. 13, 4 (1976), 484–496.CrossRefGoogle Scholar
  4. 4.
    A. H. Sameh and R. P. Brent. Solving Triangular Systems on a Parallel Computer. SIAM J. Numerical Analysis 14, 6 (1977), 1101–1113.CrossRefGoogle Scholar
  5. 5.
    D. J. Kuck. The structure of Computers and Computations. (Wiley, New York, 1978).Google Scholar
  6. 6.
    S. C. Chen, D. J. Kuck, and A. H. Sameh. Practical Parallel Band Triangular System Solvers. ACM Transactions on Mathematical Software 4, (1978), 270–277.CrossRefGoogle Scholar
  7. 7.
    H. H. Wang. A Parallel Method for Tridiagonal Equations. ACM Transactions on Mathematical Software 7, 2 (1981), 170–183.Google Scholar
  8. 8.
    H. A. Van Der Vorst and K. Dekker. Vectorization of Linear Recurrence Relations. SIAM Journal on Scientific and Statistical Computing 10, 1 (1989), 27–35.Google Scholar
  9. 9.
    Y. Tanaka, et al. Compiling Techniques for First-Order Linear Recurrences on a Vector Computer. Proceedings of the Supercomputing Conference (Orlando, FL., 1988), 174–180.Google Scholar
  10. 10.
    D. Callahan. Recognizing and Parallelizing Bounded Recurrences. Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Processing (Santa Clara, CA, 1991).Google Scholar
  11. 11.
    J. A. Fisher. Trace Scheduling: A Technique for Global Microcode Compaction. IEEE Transactions on Computers C-30. (1981), 478–490.Google Scholar
  12. 12.
    G. Lowney, et al. The Multiflow Trace Scheduling Compiler. The Journal of Supercomputing 7. 1/2 (1993), 51–142.Google Scholar
  13. 13.
    W.-M. W. Hwu, et al. The Superblock: An Effective Technique for VLIW and Superscalar Compilation. The Journal of Supercomputing 7, 1/2 (1993), 229–248.CrossRefGoogle Scholar
  14. 14.
    B. R. Rau and C. D. Glaeser. Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing. Proceedings of the Fourteenth Annual Workshop on Microporgramming (1981), 183–198.Google Scholar
  15. 15.
    M. S.-L. Lam, A Systolic Array Optimizing Compiler. 1987, Carnegie Mellon UniversityGoogle Scholar
  16. 16.
    B. Rau. Data Flow and Dependence Analysis for Instruction Level Parallelism. Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing (Santa Clara, CA, 1992), 236–250.Google Scholar
  17. 17.
    B. R. Rau, M. S. Schlansker, and P. P. Tirumalai. Code Generation Schemas for Modulo Scheduled DO-Loops and WHILE-Loops. Proceedings of the 25th Annual International Symposium on Micro architecture (Portland, Oregon, 1992), 158–169.Google Scholar
  18. 18.
    J. C. Dehnert and R. A. Towle. Compiling for the Cydra 5. The Journal of Supercomputing 7, 1/2 (1993), 181–227.Google Scholar
  19. 19.
    M. Schlansker and V. Kathail, Acceleration of Algebraic Recurrences on Processors with Instruction Level Parallelism. Technical Report HPL-93-55. Hewlett-Packard Laboratories, 1993.Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Michael Schlansker
    • 1
  • Vinod Kathail
    • 1
  1. 1.Hewlett-Packard LaboratoriesPalo Alto

Personalised recommendations