Energy Consumption Breakdown and Requirements for an Embedded Platform

  • Francky Catthoor
  • Praveen Raghavan
  • Andy Lambrechts
  • Murali Jayapala
  • Angeliki Kritikakou
  • Javed Absar
Chapter

Abstract

Current embedded systems are built from many interacting components. While optimizing the system, it is important to track the impact of the different parts, and of their interaction, on the global optimality metrics. This chapter presents a representative case study that estimates and compares the most important parts of an embedded platform: the processor, the data memory hierarchy, the instruction memory organization and the communication network. The experiment uses a realistic driver application (an MPEG-2 video encoder/decoder chain) to estimate the relative importance of the different parts for the final performance and energy consumption of the system. The main objectives of this case study are to gain better insight into platform energy estimation in general, to highlight the bottlenecks for this representative platform, and to track the effects of local changes on the other parts so that a local improvement does not lead to a globally worse design point. It thereby provides context for the optimizations that are presented in the rest of this book. High-level requirements are also derived for the entire platform.

Keywords

Memory Hierarchy, Instruction Memory, Instruction Level Parallelism, Architecture Exploration, VLIW Processor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Francky Catthoor (1)
  • Praveen Raghavan (1)
  • Andy Lambrechts (1)
  • Murali Jayapala (1)
  • Angeliki Kritikakou (2)
  • Javed Absar (3)

  1. Interuniversity MicroElectronics Center (IMEC), Leuven, Belgium
  2. VLSI Design Lab, Univ. Patras, Patras, Greece
  3. Samsung India Software Operations Pvt. Ltd, Bangalore, India