Effective Code Generation for Distributed and Ping-Pong Register Files: A Case Study on PAC VLIW DSP Cores
The compiler is generally regarded as the most important software component in making a processor design successful. This paper describes how we applied the Open Research Compiler infrastructure to a novel VLIW DSP (known as the PAC DSP core), and presents the code generation design specific to its register file architecture. The PAC DSP combines port-restricted, distributed, and partitioned register file structures with a heterogeneous clustered data-path architecture to attain low power consumption and a smaller die size. To overcome the new code generation challenges that this architecture poses, we have developed a new register allocation scheme and other retargeting optimization phases that allow high-quality code to be generated effectively. Our preliminary experimental results indicate that the resulting compiler can efficiently utilize the features of the PAC DSP's register file architecture. Our experience in designing compiler support for the PAC VLIW DSP, with its irregular resource constraints, may also be of interest to those developing compilers for similar architectures.
Keywords: compiler, ping-pong register files, VLIW DSP, clustering, parallel processing