Aggregate operation movement: A min-cut approach to global code motion

  • Raymond Lo
  • Sun Chan
  • Jim Dehnert
  • Ross Towle
Workshop 20: Instruction-Level Parallelism
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1124)

Abstract

This paper describes a novel alternative to trace scheduling and other global scheduling techniques that attempt to boost instruction-level parallelism by moving operations beyond basic block boundaries. We quantify the relative benefits of moving operations from one basic block to another with respect to critical path length, register pressure, and avoiding interlocks from long-latency operations. The benefits are encoded as flow capacities in a network, and a min-cut algorithm is used to select the set of operations to move. Unlike other approaches, our method is applied before register allocation and scheduling. Our experiments on a superscalar processor show that significant speedup can be obtained for both integer and floating-point benchmarks using this method, even in the presence of an excellent software pipeliner.
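The abstract compresses the construction into two sentences, so a concrete sketch may help. The following minimal Python illustration is in the spirit of Stone's classic two-terminal min-cut assignment formulation, which this style of global code motion builds on: each operation gets an edge from the source weighted by the estimated penalty of leaving it in place, an edge to the sink weighted by the estimated penalty of moving it, and a dependence edge that penalizes (or, with a huge capacity, forbids) moving a use without its definition. A minimum s-t cut then partitions the operations into a "move" set and a "stay" set. The function name min_cut_partition, the operation names, the capacities, and the three-operation example are all hypothetical, and the benefit model here is far cruder than the paper's, which weighs critical path length, register pressure, and long-latency interlocks; Edmonds-Karp is used only as one convenient textbook max-flow routine.

    from collections import defaultdict, deque

    def min_cut_partition(edges, s, t):
        """Edmonds-Karp max-flow; returns the nodes on the source side
        of a minimum s-t cut.  edges: iterable of (u, v, capacity)."""
        cap = defaultdict(lambda: defaultdict(int))   # residual capacities
        for u, v, c in edges:
            cap[u][v] += c
        while True:
            # Breadth-first search for a shortest augmenting path.
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, c in cap[u].items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:        # no augmenting path remains;
                return set(parent)     # nodes still reachable from s
            # Push the bottleneck capacity along the path found.
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            bottleneck = min(cap[u][v] for u, v in path)
            for u, v in path:
                cap[u][v] -= bottleneck
                cap[v][u] += bottleneck   # grow the residual reverse edge

    # Hypothetical three-operation block: c uses a value defined by b.
    # s -> op capacity: penalty if op stays put (foregone benefit of moving);
    # op -> t capacity: penalty if op is moved (e.g. added register pressure);
    # c -> b with a huge capacity forbids moving the use c without its def b.
    BIG = 10 ** 9
    network = [
        ("s", "a", 5), ("a", "t", 1),   # a: cheap to move, costly to keep
        ("s", "b", 2), ("b", "t", 4),   # b: better left where it is
        ("s", "c", 3), ("c", "t", 3),
        ("c", "b", BIG),                # dependence constraint
    ]
    moved = min_cut_partition(network, "s", "t") - {"s"}
    print(sorted(moved))                # -> ['a']

Running the example moves only a, whose move penalty (1) is far below its stay penalty (5), while b and c are cheaper to leave in place; replacing BIG with a finite capacity would turn the dependence constraint into a soft penalty for separating the pair.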

Keywords

global scheduling · global code motion · software pipelining · superscalar · network flow · minimum cut

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Raymond Lo (1)
  • Sun Chan (1)
  • Jim Dehnert (1)
  • Ross Towle (1)

  1. Silicon Graphics Computer Systems, Mountain View