Energy Optimization of Cache Hierarchy in Multicore Real-Time Systems
Single-core processors have hit the power wall in their pursuit of higher performance. Chip multiprocessor (CMP) architectures, which integrate multiple processing units on a single integrated circuit (IC), have been widely adopted by major vendors such as Intel, AMD, IBM and ARM in both general-purpose computers (e.g., ) and embedded systems (e.g., [2, 83]). Multicore processors can run multiple threads in parallel at lower power dissipation per unit of performance. Despite these inherent advantages, energy conservation remains a primary concern in multicore system optimization. While power consumption is a key concern in designing any computing device, energy efficiency is especially critical for embedded systems. Real-time systems, which run applications with timing constraints, require unique considerations. Owing to the ever-growing demand for parallel computing, real-time systems now commonly employ multicore processors [140, 148].
- 2. ARM. ARM11MPCore Processor. http://www.arm.com/, 2007.
- 15. B. Bui, M. Caccamo, L. Sha, and J. Martinez. Impact of cache partitioning on multi-tasking real time embedded systems. In Proceedings of International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pages 101–110, 2008.
- 40. M. Guthaus, J. Ringenberg, D. Ernst, T. Austin, T. Mudge, and R. Brown. Mibench: A free, commercially representative embedded benchmark suite. In Proceedings of International Workshop on Workload Characterization (WWC), pages 3–14, 2001.
- 48. Intel. Core i7 processor. http://www.intel.com/, 2008.
- 62. D. Kaseridis, J. Stuecheli, and L. John. Bank-aware dynamic cache partitioning for multicore architectures. In Proceedings of International Conference on Parallel Processing (ICPP), pages 18–25, 2009.
- 63. S. Kaxiras, Z. Hu, and M. Martonosi. Cache decay: exploiting generational behavior to reduce cache leakage power. In Proceedings of International Symposium on Computer Architecture (ISCA), pages 240–251, 2001.
- 67. S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proceedings of International Conference on Parallel Architecture and Compilation Techniques (PACT), pages 111–122, 2004.
- 69. A. KleinOsowski and D. Lilja. Minnespec: A new SPEC benchmark workload for simulation-based computer architecture research. IEEE Computer Architecture Letters, 1(1):7, 2002.
- 74. J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA), pages 367–378, 2008.
- 83. MIPS. MIPS32 1004K. http://www.mips.com/, 2008.
- 93. M. Powell, S.-H. Yang, B. Falsafi, K. Roy, and T. N. Vijaykumar. Gated-vdd: a circuit technique to reduce leakage in deep-submicron cache memories. In Proceedings of International Symposium on Low Power Electronics and Design (ISLPED), pages 90–95, 2000.
- 97. G. Quan and X. S. Hu. Energy efficient DVS schedule for fixed-priority real-time systems. ACM Transactions on Design Automation of Electronic Systems, 6:1–30, 2007.
- 98. M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of International Symposium on Microarchitecture (Micro), pages 423–432, 2006.
- 100. R. Reddy and P. Petrov. Eliminating inter-process cache interference through cache reconfigurability for real-time and low-power embedded multi-tasking systems. In Proceedings of International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), pages 198–207, 2007.
- 106. A. Settle, D. Connors, and E. Gibert. A dynamically reconfigurable cache for multithreaded processors. Journal of Embedded Computing, 2:221–233, 2006.
- 116. SPEC. SPEC CPU2000. http://www.spec.org/, 2000.
- 119. G. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In Proceedings of International Symposium on High-Performance Computer Architecture (HPCA), pages 117–128, 2002.
- 140. Y.-H. Wei, C.-Y. Yang, T.-W. Kuo, S.-H. Hung, and Y.-H. Chu. Energy-efficient real-time scheduling of multimedia tasks on multi-core processors. In Proceedings of ACM Symposium on Applied Computing (SAC), pages 258–262, 2010.
- 148. C.-Y. Yang, J.-J. Chen, T.-W. Kuo, and L. Thiele. An approximation scheme for energy-efficient scheduling of real-time tasks in heterogeneous multiprocessor systems. In Proceedings of Design, Automation and Test Conference in Europe (DATE), pages 694–699, 2009.
- 151. C. Yu and P. Petrov. Off-chip memory bandwidth minimization through cache partitioning for multi-core platforms. In Proceedings of Design Automation Conference (DAC), pages 132–137, 2010.