International Journal of Parallel Programming

Volume 32, Issue 6, pp 447–474

Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures

  • Javier Zalamea
  • Josep Llosa
  • Eduard Ayguadé
  • Mateo Valero
Article

Abstract

High-performance microprocessors are currently designed to exploit instruction-level parallelism (ILP). The techniques used in their design, together with the aggressive scheduling techniques used to exploit this ILP, tend to increase the register requirements of loops. This paper reviews hardware and software techniques that alleviate the high register demands of aggressive scheduling heuristics on VLIW cores. From the software point of view, instruction scheduling can shorten lifetimes and thereby reduce register pressure. If more registers are required than the architecture provides, some action (such as the injection of spill code) has to be taken to reduce this pressure, at the expense of some performance degradation. From the hardware point of view, this degradation could be reduced if a high-capacity register file were included without negatively affecting the design of the processor (cycle time, area and power dissipation). Novel register file organizations based on clustering and hierarchical organization are necessary to meet the technology constraints. This paper proposes the use of a clustered organization, together with an aggressive instruction scheduling technique that minimizes the negative effect of the limitations imposed by the register file organization.

Modulo scheduling, register requirements, spill code, register file organization, clustered organization



Copyright information

© Springer Science+Business Media, Inc. 2004

Authors and Affiliations

  • Javier Zalamea (1)
  • Josep Llosa (1)
  • Eduard Ayguadé (1)
  • Mateo Valero (1)
  1. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
