Abstract
Resource usage in embedded system platforms depends on application workload characteristics, the desired quality of service, and environmental conditions. In general, system workload is highly non-stationary because of the heterogeneous nature of the information content. Quality of service depends on user requirements, which may change over time. In addition, both workload and quality of service can be affected by environmental conditions such as network congestion and wireless link quality.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bouyssounouse, B., Sifakis, J. (2005). Low Power Engineering. In: Embedded Systems Design. Lecture Notes in Computer Science, vol 3436. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31973-3_30
DOI: https://doi.org/10.1007/978-3-540-31973-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25107-1
Online ISBN: 978-3-540-31973-3
eBook Packages: Computer Science (R0)