Real-Time Systems, Volume 52, Issue 3, pp 356–395

Bounding and reducing memory interference in COTS-based multi-core systems

  • Hyoseung Kim
  • Dionisio de Niz
  • Björn Andersson
  • Mark Klein
  • Onur Mutlu
  • Ragunathan Rajkumar

Abstract

In multi-core systems, main memory is a major shared resource among processor cores. A task running on one core can be delayed by other tasks running simultaneously on other cores due to interference in the shared main memory system. Such memory interference delay can be large and highly variable, thereby posing a significant challenge for the design of predictable real-time systems. In this paper, we present techniques to reduce this interference and provide an upper bound on the worst-case interference on a multi-core platform that uses a commercial-off-the-shelf (COTS) DRAM system. We explicitly model the major resources in the DRAM system, including banks, buses, and the memory controller. By considering their timing characteristics, we analyze the worst-case memory interference delay imposed on a task by other tasks running in parallel. We find that memory interference can be significantly reduced by (i) partitioning DRAM banks, and (ii) co-locating memory-intensive tasks on the same processing core. Based on these observations, we develop a memory interference-aware task allocation algorithm for reducing memory interference. We evaluate our approach on a COTS-based multi-core platform running Linux/RK. Experimental results show that the predictions made by our approach are close to the measured worst-case interference under workloads with both high and low memory contention. In addition, our memory interference-aware task allocation algorithm provides a significant improvement in task schedulability over previous work, with as much as 96 % more tasksets being schedulable.

Keywords

Memory interference · DRAM · Bank partitioning · Memory controller · Multi-core · Task allocation

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Hyoseung Kim (1)
  • Dionisio de Niz (2)
  • Björn Andersson (2)
  • Mark Klein (2)
  • Onur Mutlu (1)
  • Ragunathan Rajkumar (1)

  1. Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
  2. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, USA