Energy-Aware Compiler Optimization for VLIW-DSP Cores

  • Yung-Cheng Ma
  • Tse-An Liu
  • Wen-Shih Chao
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 21)


VLIW-DSP processor cores are widely used in embedded SoCs, and improving energy efficiency has become a key issue in designing them. This paper proposes compiler optimization algorithms to reduce register file power in a VLIW-DSP processor. The optimization targets VLIW processors in which each execution slot is associated with a low-power local register file. Instruction scheduling and register allocation algorithms are proposed to direct operand accesses to the local register files. We propose an energy-aware list scheduling algorithm that reduces cross-slot data dependencies without affecting program execution time. Constrained by the instruction scheduling result, energy-aware register allocation is then performed through weighted graph coloring. Evaluation with the MiBench benchmark suite shows that our approach reduces data transfer energy by over 50% with low hardware cost. This research shows a cost-effective way to design an energy-efficient VLIW-DSP processor.
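
The abstract does not spell out the allocation algorithm, so the following is only a minimal sketch of the general idea of weighted graph coloring applied to local register files, not the authors' method. It assumes each value has an access count (its weight) and a home execution slot fixed by the scheduler, and that a local register is cheaper to access than a shared one. The function name `allocate`, its parameters, and the greedy heaviest-first ordering are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def allocate(weights, home_slot, interference, n_local, n_global):
    """Greedy weighted coloring (illustrative assumption, not the paper's algorithm).

    weights:      value name -> number of operand accesses
    home_slot:    value name -> execution slot chosen by the scheduler
    interference: pairs of values that are live at the same time
    n_local:      registers in each slot's local register file
    n_global:     registers in the shared register file
    """
    neighbors = defaultdict(set)
    for a, b in interference:
        neighbors[a].add(b)
        neighbors[b].add(a)

    assignment = {}
    # Visit values from most to least accessed so that the heaviest ones
    # get first pick of the cheap local registers; ties are broken by name.
    for v in sorted(weights, key=lambda v: (-weights[v], v)):
        taken = {assignment[n] for n in neighbors[v] if n in assignment}
        local = [("local", home_slot[v], i) for i in range(n_local)]
        shared = [("global", None, i) for i in range(n_global)]
        # Prefer a free local register, then a free shared one, else spill.
        choice = next((r for r in local + shared if r not in taken), None)
        assignment[v] = choice if choice is not None else ("spill", None, 0)
    return assignment


example = allocate(
    weights={"a": 10, "b": 6, "c": 3},
    home_slot={"a": 0, "b": 0, "c": 0},
    interference=[("a", "b"), ("b", "c"), ("a", "c")],
    n_local=1,
    n_global=1,
)
```

In this toy input all three values interfere and share one slot, so the most frequently accessed value `a` takes the single local register, `b` falls back to the shared file, and `c` is spilled. A production allocator would also weigh spill cost and copy insertion, which this sketch ignores.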


energy-aware instruction scheduling, weighted graph coloring, register allocation, VLIW-DSP processor





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Computer Science and Information Engineering, Chang-Gung University, Taoyuan, Taiwan
