Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures

International Journal of Parallel Programming

Abstract

High-performance microprocessors are currently designed to exploit instruction-level parallelism (ILP). The techniques used in their design, together with the aggressive scheduling techniques used to exploit this ILP, tend to increase the register requirements of loops. This paper reviews hardware and software techniques that alleviate the high register demands of aggressive scheduling heuristics on VLIW cores. From the software point of view, instruction scheduling can shorten lifetimes and thereby reduce register pressure. If more registers are required than the architecture provides, some action (such as the injection of spill code) has to be taken to reduce this pressure, at the expense of some performance degradation. From the hardware point of view, this degradation could be reduced if a high-capacity register file were included without a negative impact on the design of the processor (cycle time, area and power dissipation). Novel register file organizations based on clustering and hierarchical organization are necessary to meet the technology constraints. This paper proposes the use of a clustered organization together with an aggressive instruction scheduling technique that minimizes the negative effect of the limitations imposed by the register file organization.
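As a rough illustration of how value lifetimes in a software-pipelined loop translate into register pressure, the sketch below computes MaxLive, the peak number of simultaneously live values in the steady state, which is a commonly used lower bound on the registers a schedule needs. It is a minimal sketch under stated assumptions and not the scheduling technique proposed in the paper: the function names `max_live` and `needs_spill`, the half-open `(start, end)` lifetime representation, and the example lifetimes are all hypothetical.

```python
from collections import Counter


def max_live(lifetimes, ii):
    """Peak number of simultaneously live values (MaxLive) in the steady state.

    lifetimes: list of (start_cycle, end_cycle) pairs, end exclusive,
               measured within the schedule of a single iteration.
    ii:        initiation interval (cycles between consecutive iterations).
    """
    live = Counter()
    for start, end in lifetimes:
        # In the steady state iterations overlap, so each cycle of a
        # lifetime is folded onto a slot modulo the initiation interval.
        for cycle in range(start, end):
            live[cycle % ii] += 1
    return max(live.values(), default=0)


def needs_spill(lifetimes, ii, num_registers):
    """True if MaxLive alone already exceeds the available registers."""
    return max_live(lifetimes, ii) > num_registers


# Hypothetical loop body with three values and II = 2.
loose = [(0, 6), (1, 5), (2, 4)]  # consumers scheduled late
tight = [(0, 3), (1, 3), (2, 4)]  # same values, consumers scheduled earlier

print(max_live(loose, 2))        # 6
print(max_live(tight, 2))        # 4
print(needs_spill(loose, 2, 4))  # True
print(needs_spill(tight, 2, 4))  # False
```

With four registers available, the first schedule would need some action such as spill code, while the second fits; the only difference between them is how long each value stays live.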



Cite this article

Zalamea, J., Llosa, J., Ayguadé, E. et al. Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures. International Journal of Parallel Programming 32, 447–474 (2004). https://doi.org/10.1023/B:IJPP.0000042082.31819.6d

  • DOI: https://doi.org/10.1023/B:IJPP.0000042082.31819.6d
