Real-Time Systems, Volume 53, Issue 5, pp 673–708

Addressing isolation challenges of non-blocking caches for multicore real-time systems

  • Prathap Kumar Valsan
  • Heechul Yun
  • Farzad Farshchi
Part of the following topical collections:
  1. Special Issue on Mixed-Criticality, Multi-Core, and Micro-Kernels


In multicore real-time systems, cache partitioning is commonly used to achieve isolation among different cores. We show, however, that the space isolation achieved by cache partitioning does not necessarily guarantee predictable cache access timing on modern COTS multicore platforms, which use non-blocking caches. We find that special hardware registers in non-blocking caches, known as miss status holding registers (MSHRs), which track the status of outstanding cache misses, can be a significant source of contention that conventional cache partitioning does not address. We propose a collaborative hardware and system software (OS) approach to efficiently eliminate MSHR contention in multicore real-time systems. Our approach includes a low-cost hardware extension that lets the OS dynamically control per-core memory-level parallelism (MLP). Using the hardware extension, the OS scheduler globally controls each core's MLP so as to eliminate MSHR contention and maximize the overall throughput of the system. We implement the hardware extension in a cycle-accurate full-system simulator and the scheduler modification in the Linux 3.14 kernel. Extensive experimental results demonstrate the significance of the MSHR contention problem and the effectiveness of the proposed solution.
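The idea of globally controlling per-core MLP can be illustrated with a minimal sketch. It assumes, as one plausible reading of the abstract rather than the authors' actual scheduler, that contention is avoided when the per-core MLP budgets sum to no more than the number of MSHRs in the shared cache. The function name, its parameters, and the even split among best-effort cores are all hypothetical.

```python
def assign_mlp_budgets(shared_mshrs, reserved, best_effort_cores):
    """Return a per-core MLP budget whose total never exceeds shared_mshrs.

    shared_mshrs      -- MSHR count of the shared cache (hypothetical parameter)
    reserved          -- dict: core id -> MLP reserved for a real-time core
    best_effort_cores -- core ids that split whatever capacity is left

    Illustrative policy only; not the scheduler described in the paper.
    """
    total_reserved = sum(reserved.values())
    if total_reserved > shared_mshrs:
        raise ValueError("reservations exceed the shared MSHR count")

    budgets = dict(reserved)
    leftover = shared_mshrs - total_reserved
    if best_effort_cores:
        # Split the leftover evenly; the first `extra` cores get one more.
        # A budget of 0 would stall a core, so a real policy would likely
        # enforce a minimum of 1 per core.
        share, extra = divmod(leftover, len(best_effort_cores))
        for i, core in enumerate(best_effort_cores):
            budgets[core] = share + (1 if i < extra else 0)
    return budgets


if __name__ == "__main__":
    # Core 0 runs a real-time task with 6 reserved MSHRs; cores 1-3 are best effort.
    print(assign_mlp_budgets(12, {0: 6}, [1, 2, 3]))
```

Under this assumption, the invariant to check is simply that the budgets sum to at most the shared MSHR count; in the proposed system, the values would be written to the per-core hardware control that the abstract describes.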


Keywords: Non-blocking cache, Multicore, Real-time, Isolation



This research is supported in part by NSF CNS 1302563.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Intel, Hillsboro, USA
  2. University of Kansas, Lawrence, USA
