Phase distance mapping: a phase-based cache tuning methodology for embedded systems

Abstract

Networked embedded systems typically use a collection of low-power embedded systems (nodes) to collaboratively execute applications that span diverse domains (e.g., video, image processing, and communication) and have diverse requirements. The individual nodes must operate under stringent constraints (e.g., energy and memory) and should be specialized to the varying applications’ requirements in order to adhere to these constraints. Phase-based tuning specializes a system’s tunable parameters to the varying runtime requirements of an application’s different phases of execution to meet optimization goals. Since the design space for tunable systems can be very large, one of the major challenges in phase-based tuning is determining the best configuration for each phase without incurring significant tuning overhead (e.g., in energy and/or performance) during design space exploration. In this paper, we propose phase distance mapping, which directly determines the best configuration for a phase, thereby eliminating design space exploration. Phase distance mapping uses the correlation between a known phase’s characteristics and its best configuration to determine a new phase’s best configuration from the new phase’s characteristics. Experimental results verify that our phase distance mapping approach, when applied to cache tuning, determines cache configurations within 1% of the optimal configurations on average and yields average energy delay product savings of 27%.
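
The general idea of mapping a new phase to a configuration from a known phase can be illustrated with a minimal Python sketch. This is not the algorithm from the paper, whose details the abstract does not give. The phase characteristic (cache miss rate), the distance metric (a signed difference), the distance windows, the candidate cache sizes, and all function names are hypothetical choices made only for illustration. The energy delay product is computed as energy multiplied by delay, and the savings helper shows how a percentage such as the one reported above would be derived relative to a base configuration.

```python
from dataclasses import dataclass

# Hypothetical cache sizes in KB, ordered smallest to largest (not from the paper).
CACHE_SIZES_KB = [2, 4, 8]

# Hypothetical distance windows: (upper bound on |phase distance|, size steps to move).
DISTANCE_WINDOWS = [(0.02, 0), (0.10, 1), (float("inf"), 2)]


@dataclass(frozen=True)
class Phase:
    name: str
    miss_rate: float  # phase characteristic assumed for this sketch


def phase_distance(base: Phase, new: Phase) -> float:
    """Signed difference between the new and base phase characteristics."""
    return new.miss_rate - base.miss_rate


def map_configuration(base: Phase, base_size_kb: int, new: Phase) -> int:
    """Derive a cache size for `new` from the base phase's known best size.

    No design space exploration is performed: the distance selects a window,
    and the window says how far to move from the base configuration.
    """
    d = phase_distance(base, new)
    steps = next(s for bound, s in DISTANCE_WINDOWS if abs(d) <= bound)
    # Assumption: a higher miss rate suggests a larger cache, a lower one a smaller cache.
    direction = 1 if d > 0 else -1
    idx = CACHE_SIZES_KB.index(base_size_kb) + direction * steps
    idx = max(0, min(idx, len(CACHE_SIZES_KB) - 1))  # clamp to the available sizes
    return CACHE_SIZES_KB[idx]


def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """Energy delay product: energy multiplied by execution time."""
    return energy_j * delay_s


def edp_savings_percent(base_edp: float, tuned_edp: float) -> float:
    """Percentage EDP reduction of a tuned configuration relative to a base one."""
    return 100.0 * (base_edp - tuned_edp) / base_edp


if __name__ == "__main__":
    base = Phase("base", miss_rate=0.05)
    for p in (Phase("similar", 0.06), Phase("worse", 0.12), Phase("better", 0.01)):
        print(p.name, map_configuration(base, 4, p), "KB")
```

In this sketch, a phase whose miss rate is close to the base phase's keeps the base configuration, while larger distances move the cache size by more steps. A real tuner would use whichever characteristics and window definitions the methodology specifies.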

Acknowledgements

This work was supported by the National Science Foundation (CNS-0953447). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Author information

Corresponding author

Correspondence to Tosiron Adegbija.

About this article

Cite this article

Adegbija, T., Gordon-Ross, A. & Munir, A. Phase distance mapping: a phase-based cache tuning methodology for embedded systems. Des Autom Embed Syst 18, 251–278 (2014). https://doi.org/10.1007/s10617-014-9127-8

Keywords

  • Cache tuning
  • Configurable architectures
  • Configurable hardware
  • Dynamic reconfiguration
  • Phase-based tuning
  • Energy delay product