Journal of Signal Processing Systems, Volume 84, Issue 3, pp 435–446

Improving Code Density with Variable Length Encoding Aware Instruction Scheduling

  • Heikki Kultala
  • Timo Viitanen
  • Pekka Jääskeläinen
  • Janne Helkala
  • Jarmo Takala


Variable length encoding can considerably decrease code size in VLIW processors by reducing the number of bits wasted on encoding No Operations (NOPs). A processor may have different instruction templates in which different execution slots are implicitly NOPs, but the templates may not support every combination of NOPs. The efficiency of the NOP encoding can be improved if the compiler places NOPs in such a way that the use of implicit NOPs is maximized. Two methods of optimizing the use of the implicit NOP slots are evaluated: (a) prioritizing function units that have fewer implicit NOPs associated with them and (b) a post-pass to the instruction scheduler that exploits the slack of the schedule, rescheduling operations with slack into different instruction words so that the available instruction templates are better utilized. Three methods for selecting the basic blocks to which function unit (FU) prioritization is applied are also analyzed: always, always outside inner loops, and outside inner loops only in those basic blocks where testing showed that it decreased code size. The post-pass optimizer alone saved an average of 2.4 % and a maximum of 10.5 % of instruction memory, without performance loss. Prioritizing function units in only those basic blocks where it helped gave best-case instruction memory savings of 10.7 % and average savings of 3.0 %, in exchange for an average slowdown of 0.3 %. Applying both optimizations together gave a best-case code size decrease of 12.2 % and an average decrease of 5.4 %, while performance decreased on average by 0.1 %.
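The post-pass idea in (b) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-slot machine, the slot widths, the template set, the fixed slack windows, and the names `word_bits`, `program_bits`, and `post_pass` are all hypothetical. The sketch greedily moves an operation within its slack window to another instruction word whenever that reduces the total encoded size.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical machine: bit width of each execution slot (assumed values).
SLOT_BITS: Dict[str, int] = {"alu": 12, "lsu": 12, "mul": 12}
TEMPLATE_SELECT_BITS = 2  # assumed width of the template selector field

# Each template lists the slots that are explicitly encoded.
# Every slot missing from a template is an implicit NOP and costs no bits.
TEMPLATES: List[FrozenSet[str]] = [
    frozenset({"alu", "lsu", "mul"}),
    frozenset({"alu"}),
    frozenset({"lsu", "mul"}),
    frozenset(),
]

def word_bits(used: FrozenSet[str]) -> int:
    """Bits needed by the cheapest template that covers all used slots."""
    return TEMPLATE_SELECT_BITS + min(
        sum(SLOT_BITS[s] for s in t) for t in TEMPLATES if used <= t
    )

def program_bits(schedule: List[Dict[str, str]]) -> int:
    """Total encoded size; each word maps slot name -> operation name."""
    return sum(word_bits(frozenset(word)) for word in schedule)

def post_pass(schedule: List[Dict[str, str]],
              slack: Dict[str, Tuple[int, int]]) -> List[Dict[str, str]]:
    """Greedily move operations within their (assumed fixed) slack windows.

    A move is kept only if it strictly reduces the total size, so the loop
    terminates. The schedule length, and hence the cycle count, is unchanged.
    """
    schedule = [dict(w) for w in schedule]  # work on a copy
    improved = True
    while improved:
        improved = False
        for cycle, word in enumerate(schedule):
            for slot, op in list(word.items()):
                lo, hi = slack.get(op, (cycle, cycle))
                for target in range(lo, hi + 1):
                    if target == cycle or slot in schedule[target]:
                        continue
                    before = program_bits(schedule)
                    del word[slot]
                    schedule[target][slot] = op
                    if program_bits(schedule) < before:
                        improved = True
                        break
                    # Not an improvement: undo the move.
                    del schedule[target][slot]
                    word[slot] = op
    return schedule

# Two instruction words; only "ld" has slack and may sit in cycle 0 or 1.
schedule = [{"alu": "add", "lsu": "ld"}, {"mul": "mul"}]
slack = {"ld": (0, 1)}
optimized = post_pass(schedule, slack)
print(program_bits(schedule), program_bits(optimized))  # prints "64 40"
```

In this toy case the first word initially needs the full template because no smaller template covers both `alu` and `lsu`. Moving the load into the second word lets both words use narrower templates. A real scheduler would recompute slack from the dependence graph after each move rather than treat the windows as fixed.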


Code density; Variable length instructions; VLIW; TTA; Instruction scheduling; Code optimization; Instruction templates



This work was funded by the Academy of Finland (funding decision 253087), the Finnish Funding Agency for Technology and Innovation (project "Parallel Acceleration", funding decision 40115/13), and the ARTEMIS Joint Undertaking under grant agreement no. 621439 (ALMARVI).



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Heikki Kultala (1)
  • Timo Viitanen (1)
  • Pekka Jääskeläinen (1)
  • Janne Helkala (1)
  • Jarmo Takala (1)

  1. Tampere University of Technology, Tampere, Finland
