Impact of Address Generation on Multimedia Embedded VLIW Processors

  • Guillermo Talavera
  • Antoni Portero
  • Francky Catthoor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11127)


Embedded multimedia devices must become ever more energy efficient while handling applications of increasing complexity. These applications are characterised by complex array index manipulation and a large number of data accesses, and they require high-performance, application-specific computation at low energy consumption because of limited battery life.

In many cases, the principal component of such systems is a programmable processor, often a Very Long Instruction Word (VLIW) processor (alone or integrated with other processor cores). A VLIW processor appears to be a good solution, providing enough performance at low power with sufficient programmability, but optimising access to the data is crucial for the success of these devices. Some modern embedded architectures include a dedicated unit that works in parallel with the central computing elements and ensures that data is efficiently fed to and stored from the data path: the Address Generation Unit (AGU).
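To make the cost of array index manipulation concrete, the following minimal sketch compares two ways of producing the address stream for a rectangular block inside a row-major image. It is an illustration only, not the framework or the measurements of this paper: the function names, the parameters and the per-access operation counts are assumptions chosen for the example, and operation counts stand in very roughly for the work an AGU would take off the data path.

```python
def addresses_full(base, width, y0, x0, rows, cols):
    """Recompute base + (y0+i)*width + (x0+j) from scratch for every access."""
    addrs, ops = [], 0
    for i in range(rows):
        for j in range(cols):
            addrs.append(base + (y0 + i) * width + (x0 + j))
            ops += 5  # assumed cost model: 4 additions and 1 multiplication
    return addrs, ops


def addresses_incremental(base, width, y0, x0, rows, cols):
    """AGU-style update: keep a running address, one addition per access."""
    addrs, ops = [], 0
    addr = base + y0 * width + x0   # one-off setup, not counted per access
    row_step = width - cols + 1     # from the last element of a row to the next row
    for _ in range(rows):
        for j in range(cols):
            addrs.append(addr)
            addr += 1 if j < cols - 1 else row_step
            ops += 1                # assumed cost model: 1 addition
    return addrs, ops


if __name__ == "__main__":
    full, full_ops = addresses_full(1000, 176, 2, 3, 8, 8)
    inc, inc_ops = addresses_incremental(1000, 176, 2, 3, 8, 8)
    print(full == inc, full_ops, inc_ops)
```

Both functions yield the same address sequence. Under the assumed cost model, an 8x8 block takes 320 counted operations with full recomputation and 64 with the incremental scheme, which is the kind of regular update a dedicated address unit can perform alongside the main computation.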

In this paper, we present experimental work that shows, on real and complete applications and benchmarks, the impact of address generation in VLIW-like processor architectures. We show that address generation in multimedia embedded systems contributes very significantly to the energy budget, and that careful analysis and optimisation are needed to extend battery life as much as possible while keeping enough performance to satisfy quality-of-service requirements. We also present the framework used to create and evaluate the impact of address generation on the overall system.


Keywords: Address generation, VLIW processors, Energy optimisation



This work is supported by the Ministry of Education, Youth and Sports under the National Programme for Sustainability II (NPU II), project “IT4Innovations excellence in science – LQ1602”, and by the EC under grant HARPA FP7-612069.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institut de Bioenginyeria de Catalunya, Barcelona, Spain
  2. IT4Innovations National Supercomputing Center, VSB-Technical University of Ostrava, Ostrava-Poruba, Czech Republic
  3. imec, Leuven, Belgium
