Cache-Aware Scheduling

Guan, Nan

doi:10.1007/978-3-319-27198-9_8

Abstract

The major obstacle to using multicores for real-time applications is that the real-time properties of embedded software on such platforms are hard to predict and guarantee; in particular, the way on-chip shared resources such as the L2 cache are handled can have a significant impact on timing predictability.

Notes

1. We focus on the interference caused by the shared L2 cache; other sources of interference between simultaneously running tasks may exist. However, we believe the scheduling algorithm and analysis techniques in this chapter are a necessary step towards completely avoiding interference between tasks running on multicores, and that they can be integrated with performance-isolation techniques for other shared resources, for instance the work in [167] on avoiding interference caused by the shared on-chip bus.
2. Note that this is just an example of how cache partitioning can be achieved; virtual memory is by no means necessary for the results presented in this chapter.

References

V. Suhendra, T. Mitra, Exploring locking and partitioning for predictable shared caches on multi-cores, in DAC, 2008
R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. Whalley, G. Bernat, C. Ferdinand, R. Heckmann, T. Mitra, F. Mueller, I. Puaut, P. Puschner, J. Staschulat, P. Stenström, The worst-case execution-time problem: overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7(3), 36:1–36:53 (2008)
J. Carpenter, S. Funk, P. Holman, A. Srinivasan, J. Anderson, S. Baruah, A categorization of real-time multiprocessor scheduling problems and algorithms, in Handbook of Scheduling - Algorithms, Models, and Performance Analysis (2004). http://www.crcnetbase.com/doi/abs/10.1201/9780203489802.ch30
B. Andersson, S. Baruah, J. Jonsson, Static-priority scheduling on multiprocessors, in RTSS, 2001
T.P. Baker, Multiprocessor edf and deadline monotonic schedulability analysis, in RTSS, 2003
M. Bertogna, M. Cirinei, G. Lipari, Improved schedulability analysis of edf on multiprocessor platforms, in ECRTS, 2005
J. Yan, W. Zhang, Wcet analysis for multi-core processors with shared l2 instruction caches, in RTAS, 2008
M. Berkelaar, lp_solve: (mixed integer) linear programming problem solver (2003). Available from ftp://ftp.es.ele.tue.nl/pub/lp_solve
N. Guan, W. Yi, Z. Gu, Q. Deng, G. Yu, New schedulability test conditions for non-preemptive scheduling on multiprocessor platforms, in RTSS, 2008
T.P. Baker, A comparison of global and partitioned edf schedulability tests for multiprocessors. Technical Report, Department of Computer Science, Florida State University, FL, 2005
H. Leontyev, J.H. Anderson, A unified hard/soft real-time schedulability test for global edf multiprocessor scheduling, in Proceedings of the 29th IEEE Real-Time Systems Symposium (RTSS), 2008
B.K. Bershad, B.J. Chen, D. Lee, T.H. Romer, Avoiding conflict misses dynamically in large direct mapped caches, in ASPLOS, 1994
J. Herter, J. Reineke, R. Wilhelm, Cama: cache-aware memory allocation for wcet analysis, in ECRTS, 2008
J. Rosen, A. Andrei, P. Eles, Z. Peng, Bus access optimization for predictable implementation of real-time applications on multiprocessor systems-on-chip, in RTSS, 2007
A. Fedorova, M. Seltzer, C. Small, D. Nussbaum, Throughput-oriented scheduling on chip multithreading systems. Technical Report, Harvard University, 2005
D. Chandra, F. Guo, S. Kim, Y. Solihin, Predicting inter-thread cache contention on a multi-processor architecture, in HPCA, 2005
J.H. Anderson, J.M. Calandrino, U.C. Devi, Real-time scheduling on multicore platforms, in RTAS, 2006
J.M. Calandrino, J.H. Anderson, Cache-aware real-time scheduling on multicore platforms: heuristics and a case study, in ECRTS, 2008
K. Danne, M. Platzner, An edf schedulability test for periodic tasks on reconfigurable hardware devices, in LCTES, 2006
N. Guan, Q. Deng, Z. Gu, W. Xu, G. Yu, Schedulability analysis of preemptive and non-preemptive edf on partial runtime-reconfigurable fpgas, in ACM Transactions on Design Automation of Electronic Systems, vol. 13, no. 4 (2008)
N. Fisher, J. Anderson, S. Baruah, Task partitioning upon memory-constrained multiprocessors, in RTCSA, 2005, p. 1
V. Suhendra, C. Raghavan, T. Mitra, Integrated scratchpad memory optimization and task scheduling for mpsoc architectures, in CASES, 2006
H. Salamy, J. Ramanujam, A framework for task scheduling and memory partitioning for multi-processor system-on-chip, in HiPEAC, 2009
A. Wolfe, Software-based cache partitioning for real-time applications. J. Comput. Softw. Eng. 2(3), 315–327 (1994). http://dl.acm.org/citation.cfm?id=200781.200792
B.D. Bui, M. Caccamo, L. Sha, J. Martinez, Impact of cache partitioning on multi-tasking real time embedded systems, in RTCSA, 2008
D. Chiou, S. Devadas, L. Rudolph, B.S. Ang, Dynamic cache partitioning via columnization. Technical Report, MIT, 1999
D. Tam, R. Azimi, M. Stumm, L. Soares, Managing shared l2 caches on multicore systems in software, in WIOSCA, 2007
J. Liedtke, H. Hartig, M. Hohmuth, Os-controlled cache predictability for real-time systems, in RTAS, 1997
N. Guan, M. Stigge, W. Yi, G. Yu, Cache-aware scheduling and analysis for multicores. Technical Report, Uppsala University, (http://user.it.uu.se/yi), 2009
C. Kim, D. Burger, S.W. Keckler, An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches, in ASPLOS, 2002

Author information

Authors and Affiliations

Department of Computing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Nan Guan (Assistant Professor)

Appendix: Improving the Interference Computation

The computation of $I_{k}^{i}$, an upper bound on the interference caused by $\tau_i$ to $J_k$, in Eq. (8.3) (Sect. 8.5.1) is overly pessimistic. In the following we present a more precise computation of $I_{k}^{i}$ by carefully identifying the worst-case scenario of $\tau_i$'s interference.

Recall that the problem window $[r_{k},l_{k}]$ is a time interval of given length $S_{k} = l_{k} - r_{k}$, for which we want to bound how much interference the jobs of a task $\tau_i$ can cause, possibly preventing $J_k$ from running. We can compute $I_{k}^{i}$ using the following lemma:

Lemma 8.2.

An upper bound on the interference contributed by $\tau_i$ in the problem window of length $S_k$ can be computed by

$$\displaystyle{ I_{k}^{i} = \left \{\begin{array}{ll} S_{k} &i <k \wedge S_{k} <C_{i} \\ \left \lfloor \frac{S_{k}-C_{i}} {T_{i}} \right \rfloor C_{i} + C_{i}+\omega &i <k \wedge S_{k} \geq C_{i} \\ 0 &i = k \\ \min (C_{i},S_{k}) &i> k \end{array} \right. }$$

(8.10)

where

$$\displaystyle{ \omega =\min \Big (C_{i},\max \big(0,(S_{k} - C_{i})\bmod T_{i} - (T_{i} - D_{i})\big)\Big) }$$

(8.11)
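As a minimal sketch, Eqs. (8.10) and (8.11) can be evaluated directly. The function and parameter names below are illustrative assumptions rather than part of the chapter, time values are assumed to be non-negative integers, and a smaller task index is taken to mean a higher priority, as in the case analysis of the proof.

```python
def omega(S_k: int, C_i: int, T_i: int, D_i: int) -> int:
    """Carry-in term of Eq. (8.11); only meaningful when S_k >= C_i."""
    return min(C_i, max(0, (S_k - C_i) % T_i - (T_i - D_i)))


def interference_bound(i: int, k: int, S_k: int,
                       C_i: int, T_i: int, D_i: int) -> int:
    """Upper bound I_k^i of Eq. (8.10).

    i, k : task indices (smaller index = higher priority, an assumption
           matching the case split i < k, i = k, i > k)
    S_k  : length of the problem window
    C_i, T_i, D_i : WCET, period and relative deadline of tau_i
    """
    if i == k:
        # Other jobs of the analyzed task cannot interfere.
        return 0
    if i > k:
        # At most one lower-priority job can interfere.
        return min(C_i, S_k)
    if S_k < C_i:
        # A single job of tau_i can already fill the whole window.
        return S_k
    body_jobs = (S_k - C_i) // T_i
    # Body jobs + carry-out job + carry-in contribution.
    return body_jobs * C_i + C_i + omega(S_k, C_i, T_i, D_i)
```

For instance, with $C_i = 2$, $T_i = 5$, $D_i = 4$ and $S_k = 14$, a higher-priority task ($i < k$) has two body jobs, one carry-out job, and $\omega = \min(2, \max(0, 12 \bmod 5 - 1)) = 1$, so `interference_bound(1, 2, 14, 2, 5, 4)` evaluates to 7.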

Proof.

The lemma is proved in the following cases:

1. $i < k$, i.e., $\tau_i$'s priority is higher than $\tau_k$'s.

If $S_{k} < C_{i}$, i.e., a single job of $\tau_i$ can execute for longer than $J_k$'s slack, then $I_{k}^{i} = S_{k}$ is trivially a safe bound.

If $S_{k} \geq C_{i}$, the worst case for $I_{k}^{i}$ occurs when
(a) one of $\tau_i$'s jobs is released at $l_{k} - C_{i}$,
(b) all jobs are released with period $T_i$, and
(c) the carry-in job executes as late as possible.

See Fig. 8.8. To see that this is indeed the worst case, imagine shifting the release times of $\tau_i$'s jobs leftwards by a distance $\epsilon^{l} < T_{i} - C_{i}$ or rightwards by a distance $\epsilon^{r} < C_{i}$, and check whether $I_{k}^{i}$ can increase. (Shifting the releases further in either direction creates a situation equivalent to one of these two cases, because the release pattern repeats with period $T_i$. Further, $I_{k}^{i}$ cannot increase if the number of $\tau_i$'s jobs in $[r_{k},l_{k}]$ decreases, so we only need to consider the scenario in which all jobs are released periodically.)

If the releases are shifted leftwards by $\epsilon^{l}$, $\tau_i$'s interference cannot increase at either the left or the right end of the interval $[r_{k},l_{k}]$, so shifting leftwards by a distance $\epsilon^{l} < T_{i} - C_{i}$ does not increase the interference. On the other hand, shifting rightwards by $\epsilon^{r}$ increases the interference by no more than $\epsilon^{r}$ at the left end but decreases it by $\epsilon^{r}$ at the right end, so shifting rightwards by a distance $\epsilon^{r} < C_{i}$ does not increase the interference either. In summary, starting from the scenario in Fig. 8.8, $I_{k}^{i}$ cannot be increased no matter how we shift the release times of $\tau_i$.

In this worst-case scenario, the carry-out job contributes $C_i$, there are $\lfloor (S_{k} - C_{i})/T_{i}\rfloor$ body jobs (each contributing $C_i$), and the contribution of the carry-in job is bounded both by $C_i$ and by the distance between $r_k$ and the carry-in job's deadline. Thus, for each task $\tau_i$ with $i < k \wedge S_{k} \geq C_{i}$, we can compute $I_{k}^{i}$ by
$$\displaystyle{ I_{k}^{i} = \left \lfloor \frac{S_{k} - C_{i}} {T_{i}} \right \rfloor C_{i} + C_{i}+\omega }$$
(8.12)
where $\omega$ is defined as in Eq. (8.11).

Fig. 8.8 Computation of $I_{k}^{i}$ if $i < k$ and $S_{k} \geq C_{i}$
2. $i = k$, i.e., $\tau_i$ is the analyzed task. Since $D_{k} \leq T_{k}$ holds for each task $\tau_k$, the other jobs of $\tau_k$ cannot interfere with $J_k$, so in this case we have
$$\displaystyle{ I_{k}^{i} = 0 }$$
(8.13)
3. $i > k$, i.e., $\tau_i$'s priority is lower than $\tau_k$'s.

In FP_CA, a job $J_{i}^{h}$ with lower priority than $J_k$ can interfere with $J_k$ only if it is released earlier than $r_k$. Therefore, $\tau_i$ can interfere with $J_k$ through at most one job, so its interference is bounded by $C_i$. The interference is also bounded by the length $S_k$ of the problem window. Thus, for $i > k$, we can compute $I_{k}^{i}$ by
$$\displaystyle{ I_{k}^{i} =\min (C_{i},S_{k}) }$$
(8.14)
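The shifting argument for the case $i < k$ can be cross-checked numerically on small instances. The sketch below is an illustration under stated assumptions, not part of the chapter's analysis: time is integer, $C_i \leq D_i \leq T_i$, releases are strictly periodic, and each job is allowed to place up to $C_i$ units of execution anywhere between its release and its deadline. The function names are hypothetical.

```python
def bound_higher_priority(S: int, C: int, T: int, D: int) -> int:
    """Closed-form bound of Eq. (8.10) for the case i < k."""
    if S < C:
        return S
    carry_in = min(C, max(0, (S - C) % T - (T - D)))
    return (S - C) // T * C + C + carry_in


def max_demand_by_enumeration(S: int, C: int, T: int, D: int) -> int:
    """Largest execution demand tau_i can place inside the window [0, S],
    maximised over all integer release offsets of a periodic pattern.
    Assumes C <= D <= T."""
    best = 0
    for offset in range(T):
        total = 0
        # Start one period to the left so that a carry-in job is included;
        # any earlier job has its deadline at or before time 0.
        release = offset - T
        while release < S:
            # Part of [release, release + D] that lies inside the window.
            overlap = min(S, release + D) - max(0, release)
            total += min(C, max(0, overlap))
            release += T
        best = max(best, total)
    return best
```

Under these assumptions the enumerated maximum coincides with the closed form; for example, both functions return 7 for `S=14, C=2, T=5, D=4`, with the maximum attained when one job is released $C_i$ time units before the end of the window.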

About this chapter

Cite this chapter

Guan, N. (2016). Cache-Aware Scheduling. In: Techniques for Building Timing-Predictable Embedded Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-27198-9_8

DOI: https://doi.org/10.1007/978-3-319-27198-9_8
Published: 04 February 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27196-5
Online ISBN: 978-3-319-27198-9
eBook Packages: Engineering, Engineering (R0)
