Code Positioning for VLIW Architectures
Several studies have considered reducing instruction cache misses and branch penalty stall cycles by means of various forms of code placement. Most proposed approaches rearrange procedures or basic blocks to speed up execution on sequential architectures with branch prediction. Moreover, most of this work focuses on instruction cache performance and disregards execution cycles. To the best of our knowledge, no work has specifically addressed statically scheduled ILP machines such as VLIWs, which have control-transfer delay slots. We propose a new code positioning algorithm designed specifically for VLIW-style architectures, which makes it possible to trade off a tighter schedule against program locality. Our measurements indicate that code positioning, through a tighter program schedule and the removal of unconditional jumps, can reduce the number of execution cycles by up to 21% while improving program locality and instruction cache performance.
Keywords: Basic Block, Cache Size, Cache Line, Instruction Cache, Code Position
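To make the idea of removing jumps through block placement concrete, the following is a minimal sketch of greedy profile-guided basic-block chaining, in the spirit of the Pettis and Hansen approach. It is not the VLIW-specific algorithm proposed in this paper, which additionally weighs schedule tightness and control-transfer delay slots. The function names `chain_blocks` and `jump_weight`, the edge-count dictionary format, and the example control-flow graph are all assumptions made for illustration.

```python
def chain_blocks(edges, entry):
    """Greedily chain basic blocks so that hot edges become fall-throughs.

    edges: {(src, dst): execution_count} (hypothetical profile format).
    entry: name of the entry block, which must stay first in the layout.
    """
    # Every block starts in its own single-element chain.
    chain_of = {entry: [entry]}
    for src, dst in edges:
        chain_of.setdefault(src, [src])
        chain_of.setdefault(dst, [dst])

    # Visit the hottest edges first; ties are broken by block names.
    for (src, dst), _count in sorted(edges.items(), key=lambda e: (-e[1], e[0])):
        a, b = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts a different one.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a

    # Emit the entry chain first, then the remaining chains by block name.
    layout, emitted = [], set()
    for blk in [entry] + sorted(chain_of):
        chain = chain_of[blk]
        if id(chain) not in emitted:
            emitted.add(id(chain))
            layout.extend(chain)
    return layout


def jump_weight(layout, edges):
    """Total count of executed edges that are NOT fall-throughs in layout."""
    pos = {blk: i for i, blk in enumerate(layout)}
    return sum(c for (s, d), c in edges.items() if pos[d] != pos[s] + 1)


# Hypothetical diamond-shaped CFG: the A -> B -> D path is hot.
edges = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10}
source_order = ["A", "C", "B", "D"]
layout = chain_blocks(edges, "A")
print(layout, jump_weight(source_order, edges), jump_weight(layout, edges))
```

In this toy graph the chained layout places the hot path contiguously, so fewer executed control transfers need an explicit jump than in the assumed source order. On a machine with delay slots, each jump removed this way may also free slots that the scheduler would otherwise have to fill, which is the effect the abstract attributes to a tighter schedule.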