Copy Propagation Optimizations for VLIW DSP Processors with Distributed Register Files

  • Chung-Ju Wu
  • Sheng-Yuan Chen
  • Jenq-Kuen Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4382)


High-performance, low-power VLIW DSP processors are increasingly deployed in embedded devices to process video and multimedia applications. To reduce power and cost in VLIW DSP designs, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports in register files. This presents new challenges for devising compiler optimization schemes for such architectures. In our research work, we address compiler optimization issues for the PAC architecture, a 5-way issue DSP processor with distributed register files. We show how to support an important class of compiler optimizations, known as copy propagation, on such an architecture. We illustrate that a naive deployment of copy propagation on embedded VLIW DSP processors with distributed register files can result in performance anomalies. In our proposed scheme, we derive a communication cost model from cluster distance, register port pressure, and the movement type of register sets. This cost model is used to guide the data flow analysis that supports copy propagation on the PAC architecture. Experimental results show that our schemes are effective in preventing performance anomalies when copy propagation is applied to embedded VLIW DSP processors with distributed register files.
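The idea of letting a communication cost decide whether a copy is propagated can be illustrated with a small sketch. This is not the authors' cost model or the PAC compiler's implementation: the `Reg` type, the `comm_cost` and `should_propagate` functions, the linear cost formula, and the weights are all assumptions made for illustration. Only cluster distance and port pressure are modeled here; the movement type of register sets mentioned in the abstract is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    """A register and the (hypothetical) cluster index of its register file."""
    name: str
    cluster: int


def comm_cost(reg: Reg, user_cluster: int, port_pressure: int,
              per_hop: int = 1, per_port: int = 1) -> int:
    """Assumed cost of reading `reg` from an instruction in `user_cluster`.

    A local read is free; a remote read pays for the cluster distance plus
    the current pressure on the register file's ports. The linear form and
    the weights are illustrative assumptions, not taken from the paper.
    """
    distance = abs(reg.cluster - user_cluster)
    if distance == 0:
        return 0
    return per_hop * distance + per_port * port_pressure


def should_propagate(copy_src: Reg, copy_dst: Reg,
                     user_cluster: int, port_pressure: int) -> bool:
    """Given `copy_dst = copy_src`, decide whether a later use of `copy_dst`
    in `user_cluster` should be rewritten to read `copy_src` directly.

    Propagation is allowed only when it does not make the read costlier.
    """
    return (comm_cost(copy_src, user_cluster, port_pressure)
            <= comm_cost(copy_dst, user_cluster, port_pressure))


# Copy `d1 = a0`, where a0 lives in cluster 0 and d1 in cluster 1.
a0 = Reg("a0", cluster=0)
d1 = Reg("d1", cluster=1)

# A use in cluster 1 already reads d1 locally, so propagating a0 would add
# an inter-cluster read: the naive optimization would hurt here.
use_in_cluster_1 = should_propagate(a0, d1, user_cluster=1, port_pressure=2)

# A use in cluster 0 benefits, because a0 is local to it.
use_in_cluster_0 = should_propagate(a0, d1, user_cluster=0, port_pressure=2)
```

In this toy setting the same copy is propagated to one use and not to the other, which is the kind of per-use decision a cost-guided data flow analysis would need to make.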





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Chung-Ju Wu (1)
  • Sheng-Yuan Chen (1)
  • Jenq-Kuen Lee (1)
  1. Department of Computer Science, National Tsing-Hua University, Hsinchu 300, Taiwan
