Abstract
In multi-core systems, main memory is a major shared resource among processor cores. A task running on one core can be delayed by other tasks running simultaneously on other cores due to interference in the shared main memory system. Such memory interference delay can be large and highly variable, thereby posing a significant challenge for the design of predictable real-time systems. In this paper, we present techniques to reduce this interference and provide an upper bound on the worst-case interference on a multi-core platform that uses a commercial-off-the-shelf (COTS) DRAM system. We explicitly model the major resources in the DRAM system, including banks, buses, and the memory controller. By considering their timing characteristics, we analyze the worst-case memory interference delay imposed on a task by other tasks running in parallel. We find that memory interference can be significantly reduced by (i) partitioning DRAM banks, and (ii) co-locating memory-intensive tasks on the same processing core. Based on these observations, we develop a memory interference-aware task allocation algorithm for reducing memory interference. We evaluate our approach on a COTS-based multi-core platform running Linux/RK. Experimental results show that the predictions made by our approach are close to the measured worst-case interference under workloads with both high and low memory contention. In addition, our memory interference-aware task allocation algorithm provides a significant improvement in task schedulability over previous work, with as much as 96 % more tasksets being schedulable.
Notes
JEDEC. DDR3 SDRAM Standard. http://www.jedec.org.
The physical structure of priority queues, bank schedulers, and the channel scheduler depends on the implementation. They can be implemented as a single hardware structure (Nesbit et al. 2006) or as multiple decoupled structures (Mutlu and Moscibroda 2007, 2008; Ausavarungnirun et al. 2012).
The effect of REF (\(E_{R}\)) on memory interference delay can be roughly estimated as \(E_{R}^{k+1}=\lceil \{(\text{total delay from analysis})+E_{R}^{k}\}/t_{REFI}\rceil \cdot t_{RFC}\), where \(E_{R}^{0}=0\). For DDR3-1333 with 2 Gb density below 85 \(^{\circ}\)C, \(t_{RFC}/t_{REFI}\) is \(160\,\text{ns}/7.8\,\mu\text{s}\approx 0.02\), so REF adds only about 2 % to the total memory interference delay. A more detailed analysis of REF can be found in Bhat and Mueller (2010).
Micron 2Gb DDR3 Component: MT41J256M8-15E. http://download.micron.com/pdf/datasheets/dram/ddr3/2Gb_DDR3_SDRAM.pdf.
OSEK/VDX OS. http://portal.osek-vdx.org/files/pdf/specs/os223.pdf.
Windriver VxWorks. http://www.windriver.com.
An arbitrary tie-breaking rule can be used to assign a unique priority to each task.
These assumptions will be relaxed in future work.
This assumption is required to bound the re-ordering effect of the memory controller, which will be described in Sect. 4.1.
Note that the write-buffer draining does not completely block read requests until all the write requests are serviced. In a memory controller with write batching, read requests are always exposed to the memory controller, but write requests are exposed to and scheduled by the memory controller only when the write buffer is close to full (Lee et al. 2010). Hence, even when the write buffer is being drained, a read request can be scheduled if its commands are ready with respect to DRAM timing constraints (e.g., read and write requests to different banks).
This is why the DRAM address mapping in Fig. 1c does not have a bit for channel selection.
Linux/RK is available at https://rtml.ece.cmu.edu/redmine/projects/rk.
McCalpin JD. STREAM: Sustainable memory bandwidth in high performance computers. http://www.cs.virginia.edu/stream.
Software cache partitioning also divides the entire physical memory space into as many partitions as there are cache partitions. Therefore, the spatial memory requirement of a task determines the minimum number of cache partitions for that task (Kim et al. 2013).
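The rough REF estimate given in the note on refresh above is a fixed-point iteration, and a short worked example may make the "about 2 %" figure easier to check. The sketch below is illustrative only: the function name `refresh_delay_ns`, the default parameter values (taken from the DDR3-1333 figures quoted in that note), and the 100 µs sample delay are assumptions, not part of the paper's analysis.

```python
def refresh_delay_ns(total_delay_ns, t_refi_ns=7800, t_rfc_ns=160, max_iter=1000):
    """Iterate E_R^{k+1} = ceil((D + E_R^k) / tREFI) * tRFC from E_R^0 = 0.

    total_delay_ns: interference delay D obtained from the analysis (assumed input).
    t_refi_ns, t_rfc_ns: refresh interval and refresh cycle time in nanoseconds.
    Returns the converged refresh-induced delay in nanoseconds.
    """
    e_r = 0
    for _ in range(max_iter):
        window = total_delay_ns + e_r
        # Integer ceiling division avoids floating-point rounding.
        refreshes = -(-window // t_refi_ns)
        next_e_r = refreshes * t_rfc_ns
        if next_e_r == e_r:
            return e_r  # fixed point reached
        e_r = next_e_r
    raise RuntimeError("refresh estimate did not converge")


if __name__ == "__main__":
    delay = 100_000  # hypothetical 100 us interference delay from the analysis
    extra = refresh_delay_ns(delay)
    print(extra, extra / delay)
```

For the hypothetical 100 µs delay, the iteration settles at 14 refresh operations, i.e. 2240 ns, which is roughly 2 % of the delay and in line with the ratio \(t_{RFC}/t_{REFI}\) quoted in the note.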
References
Akesson B, Goossens K, Ringhofer M (2007) Predator: a predictable SDRAM memory controller. In: IEEE/ACM international conference on hardware/software codesign and system synthesis (CODES+ISSS), 2007
Altmeyer S, Davis R, Maiza C (2011) Cache related pre-emption delay aware response time analysis for fixed priority pre-emptive systems. In: IEEE real-time systems symposium (RTSS), 2011
Andersson B, Easwaran A, Lee J (2010) Finding an upper bound on the increase in execution time due to contention on the memory bus in COTS-based multicore systems. SIGBED Rev 7(1):4
Ausavarungnirun R, Chang KK-W, Subramanian L, Loh GH, Mutlu O (2012) Staged memory scheduling: achieving high performance and scalability in heterogeneous systems. In: International symposium on computer architecture (ISCA), 2012
Bhat B, Mueller F (2010) Making DRAM refresh predictable. In: Euromicro conference on real-time systems (ECRTS), 2010
Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: Characterization and architectural implications. In: International conference on parallel architectures and compilation techniques (PACT), 2008
Dasari D, Andersson B, Nelis V, Petters SM, Easwaran A, Lee J (2011) Response time analysis of COTS-based multicores considering the contention on the shared memory bus. In: IEEE international conference on trust, security and privacy in computing and communications, 2011
de Niz D, Rajkumar R (2006) Partitioning bin-packing algorithms for distributed real-time systems. Int J Embed Syst 2(3):196–208
Ebrahimi E, Lee CJ, Mutlu O, Patt YN (2010) Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. In: International conference on architectural support for programming languages and operating systems (ASPLOS), 2010
Eswaran A, Rajkumar R (2005) Energy-aware memory firewalling for QoS-sensitive applications. In: Euromicro conference on real-time systems (ECRTS), 2005
Jeong MK, Yoon DH, Sunwoo D, Sullivan M, Lee I, Erez M (2012) Balancing DRAM locality and parallelism in shared memory CMP systems. In: IEEE international symposium on high-performance computer architecture (HPCA), 2012
Johnson DS, Demers A, Ullman JD, Garey MR, Graham RL (1974) Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J Comput 3(4):299–325
Joseph M, Pandya PK (1986) Finding response times in a real-time system. Comput J 29(5):390–395
Kim H, de Niz D, Andersson B, Klein M, Mutlu O, Rajkumar RR (2014) Bounding memory interference delay in COTS-based multi-core systems. In: IEEE real-time technology and applications symposium (RTAS)
Kim Y, Han D, Mutlu O, Harchol-Balter M (2010) ATLAS: a scalable and high-performance scheduling algorithm for multiple memory controllers. In: IEEE international symposium on high-performance computer architecture (HPCA), 2010
Kim H, Kandhalu A, Rajkumar R (2013) A coordinated approach for practical OS-level cache management in multi-core real-time systems. In: Euromicro conference on real-time systems (ECRTS), 2013
Kim H, Kim J, Rajkumar RR (2012) A profiling framework in Linux/RK and its application. In: Open demo session of IEEE real-time systems symposium (RTSS@Work), 2012
Kim Y, Papamichael M, Mutlu O, Harchol-Balter M (2010) Thread cluster memory scheduling: exploiting differences in memory access behavior. In: IEEE/ACM international symposium on microarchitecture (MICRO), 2010
Kim H, Rajkumar R (2012) Shared-page management for improving the temporal isolation of memory reservations in resource kernels. In: IEEE conference on embedded and real-time computing systems and applications (RTCSA), 2012
Krishnapillai Y, Wu ZP, Pellizzoni R (2014) A rank-switching, open-row DRAM controller for mixed-criticality systems. In: Euromicro conference on real-time systems (ECRTS), 2014
Lakshmanan K, de Niz D, Rajkumar R, Moreno G (2010) Resource allocation in distributed mixed-criticality cyber-physical systems. In: IEEE international conference on distributed computing systems (ICDCS), 2010
Lakshmanan K, Rajkumar R, Lehoczky JP (2009) Partitioned fixed-priority preemptive scheduling for multi-core processors. In: Euromicro conference on real-time systems (ECRTS), 2009
Lee CJ, Narasiman V, Ebrahimi E, Mutlu O, Patt YN (2010) DRAM-aware last-level cache writeback: Reducing write-caused interference in memory systems. Technical Report TR-HPS-2010-002, UT Austin, 2010
Li Y, Akesson B, Goossens K (2014) Dynamic command scheduling for real-time memory controllers. In: Euromicro conference on real-time systems (ECRTS), 2014
Liu L, Cui Z, Xing M, Bao Y, Chen M, Wu C (2012) A software memory partition approach for eliminating bank-level interference in multicore systems. In: International conference on parallel architectures and compilation techniques (PACT), 2012
Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard-real-time environment. J ACM 20(1):46–61
Lv M, Nan G, Yi W, Yu G (2010) Combining abstract interpretation with model checking for timing analysis of multicore software. In: IEEE real-time systems symposium (RTSS), 2010
Moscibroda T, Mutlu O (2007) Memory performance attacks: denial of memory service in multi-core systems. In: USENIX security symposium, 2007
Muralidhara SP, Subramanian L, Mutlu O, Kandemir M, Moscibroda T (2011) Reducing memory interference in multicore systems via application-aware memory channel partitioning. In: IEEE/ACM international symposium on microarchitecture (MICRO), 2011
Mutlu O, Moscibroda T (2007) Stall-time fair memory access scheduling for chip multiprocessors. In: IEEE/ACM International symposium on microarchitecture (MICRO), 2007
Mutlu O, Moscibroda T (2008) Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In: International symposium on computer architecture (ISCA), 2008
Nesbit KJ, Aggarwal N, Laudon J, Smith JE (2006) Fair queuing memory systems. In: IEEE/ACM international symposium on microarchitecture (MICRO), 2006
Oikawa S, Rajkumar R (1998) Linux/RK: a portable resource kernel in Linux. In: IEEE real-time systems symposium (RTSS) Work-In-Progress, 1998
Paolieri M, Quiñones E, Cazorla F, Valero M (2010) An analyzable memory controller for hard real-time CMPs. IEEE Embed Syst Lett 1(4):86–90
Paolieri M, Quiñones E, Cazorla F, Davis R, Valero M (2011) IA\(^{3}\): an interference aware allocation algorithm for multicore hard real-time systems. In: IEEE real-time technology and applications symposium (RTAS), 2011
Pellizzoni R, Schranzhofer A, Chen J, Caccamo M, Thiele L (2010) Worst case delay analysis for memory interference in multicore systems. In: Design, automation and test in Europe conference and exhibition (DATE), 2010
Rajkumar R, Juvva K, Molano A, Oikawa S (1998) Resource kernels: A resource-centric approach to real-time and multimedia systems. In: SPIE/ACM conference on multimedia computing and networking, 1998
Reineke J, Liu I, Patel HD, Kim S, Lee EA (2011) PRET DRAM controller: Bank privatization for predictability and temporal isolation. In: IEEE/ACM international conference on hardware/software codesign and system synthesis (CODES+ISSS), 2011
Rixner S, Dally WJ, Kapasi UJ, Mattson P, Owens JD (2000) Memory access scheduling. In: International symposium on computer architecture (ISCA), 2000
Rosén J, Andrei A, Eles P, Peng Z (2007) Bus access optimization for predictable implementation of real-time applications on multiprocessor systems-on-chip. In: IEEE real-time systems symposium (RTSS), 2007
Schliecker S, Negrean M, Ernst R (2010) Bounding the shared resource load for the performance analysis of multiprocessor systems. In: Design, automation and test in Europe conference and exhibition (DATE), 2010
Seshadri V, Bhowmick A, Mutlu O, Gibbons PB, Kozuch M, Mowry TC, et al. (2014) The dirty-block index. In: International symposium on computer architecture (ISCA), 2014
Subramanian L, Lee D, Seshadri V, Rastogi H, Mutlu O (2014) The blacklisting memory scheduler: achieving high performance and fairness at low cost. In: IEEE international conference on computer design (ICCD), 2014
Subramanian L, Seshadri V, Ghosh A, Khan S, Mutlu O (2015) The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory. In: IEEE/ACM international symposium on microarchitecture (MICRO), 2015
Subramanian L, Seshadri V, Kim Y, Jaiyen B, Mutlu O (2013) MISE: providing performance predictability and improving fairness in shared main memory systems. In: IEEE international symposium on high-performance computer architecture (HPCA), 2013
Suzuki N, Kim H, de Niz D, Andersson B, Wrage L, Klein M, Rajkumar RR (2013) Coordinated bank and cache coloring for temporal protection of memory accesses. In: IEEE international conference on embedded software and systems (ICESS), 2013
Wilhelm R, Grund D, Reineke J, Schlickling M, Pister M, Ferdinand C (2009) Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems. IEEE Trans Comput Aided Des Integr Circuits Syst 28(7):966–978
Wu ZP, Krish Y, Pellizzoni R (2013) Worst case analysis of DRAM latency in multi-requestor systems. In: IEEE real-time systems symposium (RTSS), 2013
Xie M, Tong D, Huang K, Cheng X (2014) Improving system throughput and fairness simultaneously in CMP systems via dynamic bank partitioning. In: IEEE international symposium on high-performance computer architecture (HPCA), 2014
Yun H, Mancuso R, Wu Z-P, Pellizzoni R (2014) PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms. In: IEEE real-time technology and applications symposium (RTAS), 2014
Yun H, Yao G, Pellizzoni R, Caccamo M, Sha L (2012) Memory access control in multiprocessor for real-time systems with mixed criticality. In: Euromicro conference on real-time systems (ECRTS), 2012
Zhang X, Dwarkadas S, Shen K (2009) Hardware execution throttling for multi-core resource management. In: USENIX annual technical conference (USENIX ATC), 2009
Zuravleff W, Robinson T (1997) Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order. US Patent Number 5,630,096, 1997
Additional information
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. This material has been approved for public release and unlimited distribution. Carnegie Mellon® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University. DM-0001596.
About this article
Cite this article
Kim, H., de Niz, D., Andersson, B. et al. Bounding and reducing memory interference in COTS-based multi-core systems. Real-Time Syst 52, 356–395 (2016). https://doi.org/10.1007/s11241-016-9248-1
DOI: https://doi.org/10.1007/s11241-016-9248-1