
Journal of Signal Processing Systems, Volume 51, Issue 3, pp 269–288

Effective Code Generation for Distributed and Ping-Pong Register Files: A Case Study on PAC VLIW DSP Cores

  • Yung-Chia Lin
  • Chia Han Lu
  • Chung-Ju Wu
  • Chung-Lin Tang
  • Yi-Ping You
  • Ya-Chaio Moo
  • Jenq-Kuen Lee
Article

Abstract

The compiler is generally regarded as the most important software component in determining whether a processor design succeeds. This paper describes our application of the Open Research Compiler infrastructure to a novel VLIW DSP (known as the PAC DSP core) and the specific design of code generation for its register file architecture. The PAC DSP uses port-restricted, distributed, and partitioned register file structures in addition to a heterogeneous clustered data-path architecture to attain low power consumption and a smaller die. To overcome the new code generation challenges posed by the PAC DSP, we have developed a new register allocation scheme and other retargeting optimization phases that allow the effective generation of high-quality code. Our preliminary experimental results indicate that the resulting compiler can efficiently utilize the features of the specific register file architectures in the PAC DSP. Our experience in designing compiler support for the PAC VLIW DSP with irregular resource constraints may also be of interest to those developing compilers for similar architectures.

Keywords

compiler; ping-pong register files; VLIW DSP; clustering; parallel processing



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Yung-Chia Lin (1)
  • Chia Han Lu (1)
  • Chung-Ju Wu (1)
  • Chung-Lin Tang (1)
  • Yi-Ping You (1)
  • Ya-Chaio Moo (1)
  • Jenq-Kuen Lee (1)
  1. Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
