Overall Framework for Exploration
This chapter presents one of the core contributions of the book, which also forms one of the main stepping stones for the remaining chapters. It presents the compilation, simulation and energy estimation framework used to model the large architecture design space of single-core platforms for low-power embedded systems. For multi-core platforms, the intra-core space can be reused, but the inter-core data memory and communication network organisation must be added. The framework is designed to be consistent and complete with respect to the architecture space commonly used in embedded systems, including possible extensions needed for future evolution. The chapter also presents exploration results obtained with the framework, including a few counter-intuitive trends that emerge from the exploration.
Keywords: Design Space, Data Memory, Design Space Exploration, Instruction Memory, Pareto Point