
The OMLP family of optimal multiprocessor real-time locking protocols

Design Automation for Embedded Systems

Abstract

This paper presents the first suspension-based multiprocessor real-time locking protocols with asymptotically optimal blocking bounds (under certain analysis assumptions). These protocols can be applied under any global, clustered, or partitioned job-level fixed-priority scheduler and support mutual exclusion, reader-writer exclusion, and k-exclusion constraints. Notably, the reader-writer and k-exclusion protocols are the first analytically-sound suspension-based multiprocessor real-time locking protocols of their kind. To formalize a notion of “optimal blocking,” precise definitions of what constitutes “blocking” in a multiprocessor real-time system are given and a simple complexity metric for real-time locking protocols, called maximum priority-inversion blocking (pi-blocking), is introduced. It is shown that, in a system with m processors, Ω(m) maximum pi-blocking is unavoidable. This bound is shown to be asymptotically tight with the introduction of the O(m) multiprocessor locking protocol (OMLP) family presented herein, which includes protocols that ensure an upper bound on maximum pi-blocking that is approximately within a factor of two of the lower bound. In addition to the coarse-grained asymptotic bounds, detailed blocking bounds suitable for schedulability analysis are derived using holistic blocking analysis. Based on the detailed bounds, the proposed locking protocols are compared with each other and with previously-proposed protocols in an empirical schedulability study involving more than one billion task sets. In this study, the OMLP was found to perform better than two variants of the classic (but non-optimal) multiprocessor priority-ceiling protocol (MPCP).


Notes

  1. A protocol has a resource augmentation factor x if any feasible task set that is not schedulable under it is guaranteed to be schedulable on an x-times faster processor.

  2. Without loss of generality, we assume uniform cluster sizes and \(\frac{m}{c}\in \mathbb {N}\). Non-uniform cluster sizes could be trivially integrated into the presented analysis at the expense of additional notation.

  3. For the sake of simplicity, we assume that jobs require a processor for the entirety of each critical section. This is accurate for shared data structures, but may be somewhat pessimistic when accessing devices. The assumption could be relaxed at the expense of additional notation by splitting each request length parameter into a processor component and a suspension component.

  4. Interestingly, in the uniprocessor case, the PCP [43, 47] and the SRP [3] both ensure O(1) maximum pi-blocking regardless of the number of requests, which is possible due to the lack of concurrency (after a job has acquired a resource once, lower-priority jobs cannot lock it again while higher-priority jobs are ready). In the multiprocessor case, resources may be repeatedly locked by concurrently-scheduled remote jobs, which implies that a job may incur pi-blocking each time that it issues a request.

  5. The initial description of the OMLP [17] contained a variant for partitioned scheduling. This special case is not considered herein because, from an analytical point of view, it has since been superseded by the OMLP’s mutex protocol for clustered scheduling.

  6. If WQ q and CQ q are both empty, then DQ q is necessarily empty, too, as any readers in the draining queue would have had to enqueue when it was still the collecting queue (Rule R1) and the roles of CQ q and DQ q are only switched when a writer is waiting (Rules W1 and W3).

  7. It is unknown whether N i ⋅(m−1) is a tight lower bound in absolute terms (i.e., non-asymptotically).

  8. Bootstrapping is a standard technique for estimating sampling statistics for unknown population distributions (e.g., see [27]). Given a sample vector X=(x 1,x 2,…,x s ) consisting of s observations, N bootstrap sample vectors \(Y^{i} = (y_{1}^{i},y_{2}^{i},\ldots,y^{i}_{s})\), where i∈{1,…,N}, are constructed by uniformly choosing each \(y^{i}_{k} \in\{x_{1}, x_{2}, \ldots, x_{s}\}\) (i.e., each Y i is drawn from X with replacement). The distribution of a statistic f(X) can then be estimated by applying f to each Y i; an estimate of the 95% confidence interval of f(X) can be obtained from the 2.5th and 97.5th percentiles of the histogram of f(Y i). In our experiments, each x k , where 1≤k≤s=1,000, is a schedulability test result (i.e., x k ∈{0,1}) and the computed statistic is the sample mean (i.e., the fraction of schedulable task sets). Bootstrapping is well-suited to schedulability experiments since it does not make any assumptions about the underlying population distribution. We used N=10,000 bootstrap samples.
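The bootstrap procedure described in Note 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_ci`, the fixed seed, the nearest-rank percentile rule, and the example data (700 schedulable task sets out of 1,000) are hypothetical and are not taken from the paper's experimental setup.

```python
import random

def bootstrap_ci(samples, num_resamples=10_000, seed=0):
    """Estimate a 95% confidence interval of the sample mean by bootstrapping."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    s = len(samples)
    means = []
    for _ in range(num_resamples):
        # Draw s observations uniformly from the sample, with replacement.
        resample = rng.choices(samples, k=s)
        means.append(sum(resample) / s)
    means.sort()
    # Nearest-rank approximation of the 2.5th and 97.5th percentiles.
    lower = means[int(0.025 * (num_resamples - 1))]
    upper = means[int(0.975 * (num_resamples - 1))]
    return lower, upper

# Hypothetical input: 1,000 schedulability test results (1 = schedulable).
results = [1] * 700 + [0] * 300
# Fewer resamples than N = 10,000 so that the example runs quickly.
low, high = bootstrap_ci(results, num_resamples=2_000)
print(low, high)
```

The interval brackets the observed fraction of schedulable task sets; raising `num_resamples` to 10,000 matches the value of N reported in the note at the cost of a longer run time.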

References

  1. Andersson B, Easwaran A (2010) Provably good multiprocessor scheduling with resource sharing. Real-Time Syst 46(2):153–159

  2. Audsley N, Burns A, Richardson M, Tindell K, Wellings A (1993) Applying new scheduling theory to static priority pre-emptive scheduling. Softw Eng J 8(5):284–292

  3. Baker T (1991) Stack-based scheduling for realtime processes. Real-Time Syst 3(1):67–99

  4. Baker T (2005) A comparison of global and partitioned EDF schedulability tests for multiprocessors. Tech Rep TR-051101, Florida State University

  5. Baker T, Baruah S (2007) Schedulability analysis of multiprocessor sporadic task systems. In: Handbook of real-time and embedded systems. Chapman Hall/CRC, London

  6. Baker T, Baruah S (2009) Sustainable multiprocessor scheduling of sporadic task systems. In: Proceedings of the 21st Euromicro conference on real-time systems, pp 141–150

  7. Baruah S (2007) Techniques for multiprocessor global schedulability analysis. In: Proceedings of the 28th IEEE real-time systems symposium, pp 119–128

  8. Baruah S, Burns A (2006) Sustainable scheduling analysis. In: Proceedings of the 27th IEEE real-time systems symposium, pp 159–168

  9. Bastoni A (2011) Towards the integration of theory and practice in multiprocessor real-time scheduling. Ph.D. thesis, Università degli Studi di Roma “Tor Vergata”

  10. Bastoni A, Brandenburg B, Anderson J (2010) An empirical comparison of global, partitioned, and clustered multiprocessor EDF schedulers. In: Proceedings of the 31st IEEE real-time systems symposium, pp 14–24

  11. Bastoni A, Brandenburg B, Anderson J (2011) Is semi-partitioned scheduling practical? In: Proceedings of the 23rd Euromicro conference on real-time systems, pp 125–135

  12. Bertogna M, Cirinei M (2007) Response-time analysis for globally scheduled symmetric multiprocessor platforms. In: Proceedings of the 28th IEEE real-time systems symposium, pp 149–160

  13. Block A, Leontyev H, Brandenburg B, Anderson J (2007) A flexible real-time locking protocol for multiprocessors. In: Proceedings of the 13th IEEE conference on embedded and real-time computing systems and applications, pp 47–57

  14. Brandenburg B (2011) Scheduling and locking in multiprocessor real-time operating systems. Ph.D. thesis, The University of North Carolina at Chapel Hill

  15. Brandenburg B, Anderson J (2008) A comparison of the M-PCP, D-PCP, and FMLP on LITMUS^RT. In: Proceedings of the 12th international conference on principles of distributed systems. LNCS, vol 5401. Springer, Berlin, pp 105–124

  16. Brandenburg B, Anderson J (2008) An implementation of the PCP, SRP, D-PCP, M-PCP, and FMLP real-time synchronization protocols in LITMUS^RT. In: Proceedings of the 14th IEEE real-time and embedded technology and applications symposium, pp 185–194

  17. Brandenburg B, Anderson J (2010) Optimality results for multiprocessor real-time locking. In: Proceedings of the 31st real-time systems symposium, pp 49–60

  18. Brandenburg B, Anderson J (2010) Spin-based reader-writer synchronization for multiprocessor real-time systems. Real-Time Syst 46(1):25–87

  19. Brandenburg B, Anderson J (2011) Real-time resource-sharing under clustered scheduling: mutex, reader-writer, and k-exclusion locks. In: Proceedings of the 9th ACM international conference on embedded software

  20. Brandenburg B, Calandrino J, Block A, Leontyev H, Anderson J (2008) Synchronization on real-time multiprocessors: to block or not to block, to suspend or spin? In: Proceedings of the 14th IEEE real-time and embedded technology and applications symposium, pp 342–353

  21. Calandrino J, Anderson J, Baumberger D (2007) A hybrid real-time scheduling approach for large-scale multicore platforms. In: Proceedings of the 19th Euromicro conference on real-time systems, pp 247–256

  22. Calandrino J, Leontyev H, Block A, Devi U, Anderson J (2006) LITMUS^RT: a testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE real-time systems symposium, pp 111–123

  23. Carpenter J, Funk S, Holman P, Srinivasan A, Anderson J, Baruah S (2004) A categorization of real-time multiprocessor scheduling problems and algorithms. In: Handbook of scheduling: algorithms, models, and performance analysis. Chapman Hall/CRC, London

  24. Chen C, Tripathi S (1994) Multiprocessor priority ceiling based protocols. Tech Rep CS-TR-3252, Univ of Maryland

  25. Chen M, Lin K (1991) A priority ceiling protocol for multiple-instance resources. In: Proceedings of the 12th IEEE real-time system symposium, pp 140–149

  26. Courtois P, Heymans F, Parnas D (1971) Concurrent control with “readers” and “writers”. Commun ACM 14(10):667–668

  27. Davison A, Hinkley D (1997) Bootstrap methods and their application. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge

  28. Devi U, Leontyev H, Anderson J (2006) Efficient synchronization under global EDF scheduling on multiprocessors. In: Proceedings of the 18th Euromicro conference on real-time systems, pp 75–84

  29. Easwaran A, Andersson B (2009) Resource sharing in global fixed-priority preemptive multiprocessor scheduling. In: Proceedings of the 30th IEEE real-time systems symposium, pp 377–386

  30. Elliott G, Anderson J (2011) An optimal k-exclusion real-time locking protocol motivated by multi-GPU systems. In: Proceedings of the 19th international conference on real-time and network systems

  31. Faggioli D, Lipari G, Cucinotta T (2010) The multiprocessor bandwidth inheritance protocol. In: Proceedings of the 22nd Euromicro conference on real-time systems, pp 90–99

  32. Gai P, di Natale M, Lipari G, Ferrari A, Gabellini C, Marceca P (2003) A comparison of MPCP and MSRP when sharing resources in the Janus multiple processor on a chip platform. In: Proceedings of the 9th IEEE real-time and embedded technology application symposium, pp 189–198

  33. Goossens J, Funk S, Baruah S (2003) Priority-driven scheduling of periodic task systems on multiprocessors. Real-Time Syst 25(2–3):187–205

  34. Hsiu PC, Lee DN, Kuo TW (2011) Task synchronization and allocation for many-core real-time systems. In: Proceedings of the 9th ACM international conference on embedded software. ACM, New York, pp 79–88

  35. Joseph M, Pandya P (1986) Finding response times in a real-time system. Comput J 29(5):390–395

  36. Lakshmanan K, Niz D, Rajkumar R (2009) Coordinated task scheduling, allocation and synchronization on multiprocessors. In: Proceedings of the 30th IEEE real-time systems symposium, pp 469–478

  37. Liu C, Layland J (1973) Scheduling algorithms for multiprogramming in a hard real-time environment. J ACM 20(1):46–61

  38. Liu J (2000) Real-time systems. Prentice Hall, New York

  39. Macariu G, Cretu V (2011) Limited blocking resource sharing for global multiprocessor scheduling. In: Proceedings of the 23rd Euromicro conference on real-time systems, pp 262–271

  40. Nemati F, Behnam M, Nolte T (2011) Independently-developed real-time systems on multi-cores with shared resources. In: Proceedings of the 23rd Euromicro conference on real-time systems, pp 251–261

  41. Nemati F, Nolte T, Behnam M (2010) Partitioning real-time systems on multiprocessors with shared resources. In: Proceedings of the 14th international conference on principles of distributed systems. LNCS, vol 6490, pp 253–269

  42. Rajkumar R (1990) Real-time synchronization protocols for shared memory multiprocessors. In: Proceedings of the 10th international conference on distributed computing systems, pp 116–123

  43. Rajkumar R (1991) Synchronization in real-time systems: a priority inheritance approach. Kluwer Academic, Dordrecht

  44. Rajkumar R, Sha L, Lehoczky J (1988) Real-time synchronization protocols for multiprocessors. In: Proceedings of the 9th IEEE real-time systems symposium, pp 259–269

  45. Ridouard F, Richard P, Cottet F (2004) Negative results for scheduling independent hard real-time tasks with self-suspensions. In: Proceedings of the 25th IEEE real-time systems symposium, pp 47–56

  46. Schliecker S, Negrean M, Ernst R (2009) Response time analysis on multicore ECUs with shared resources. IEEE Trans Ind Informatics 5(4):402–413

  47. Sha L, Rajkumar R, Lehoczky J (1990) Priority inheritance protocols: an approach to real-time synchronization. IEEE Trans Comput 39(9):1175–1185


Acknowledgements

We thank Tomas Kalibera for suggesting the use of bootstrapping in schedulability experiments. Work supported by the Max Planck Society; NSF grants CNS 0834270, CNS 0834132, and CNS 1016954; ARO grant W911NF-09-1-0535; AFOSR grant FA9550-09-1-0549; and AFRL grant FA8750-11-1-0033. The first author was supported in part by a UNC Dissertation Completion Fellowship.

Author information

Corresponding author

Correspondence to Björn B. Brandenburg.

Appendix: Schedulability analysis

In this appendix, we first introduce a generic framework for expressing bounds on pi-blocking and then apply it to bound pi-blocking under each of the locking protocols presented in this paper. The blocking analysis presented in the following is essential for deriving safe blocking bounds suitable for schedulability analysis. However, such bounds tend to be somewhat technical in nature and are primarily required only for implementing schedulability tests (as used in the schedulability experiments presented in Sect. 5); the casual reader may safely skip this appendix and consult the overview presented in Sect. 4 instead.

The framework presented in the following is generic in the sense that it is not tied to any particular locking protocol. It serves two purposes. First, it avoids redundancy in the subsequent analysis of the locking protocols, which have structurally similar blocking bounds. Second, it takes a holistic approach to reduce the pessimism inherent in analyzing requests individually. That is, it is intended to be applied to each job as a whole and bounds blocking across all requests that a job issues, instead of bounding delays on a request-by-request basis.

The presented holistic analysis approach was first used to analyze the FMLP under P-FP scheduling [16], and subsequently further developed to analyze RW spinlocks [18]. The version presented herein has been somewhat simplified compared to the previous variants. We next explain the intuition underlying the approach, which we then formalize in Sect. A.2 below.

A.1 Holistic blocking analysis

In the following, let J i denote an arbitrary job of the task T i for which a bound on maximum blocking is being derived. The main idea of the holistic approach is to avoid accounting for any individual possibly-blocking request more than once, and to avoid accounting for requests that cannot possibly interfere with J i ’s requests. In particular, when a job requests the same resource more than once, the holistic approach can avoid substantial pessimism compared to analyzing each resource request in isolation, and especially so if long requests occur much less frequently than short requests (i.e., if there are large differences among the tasks’ L i,q , N i,q , and p i parameters).

Example 9

To illustrate possible pessimism when analyzing requests individually, consider the following scenario (in this and the following examples, the use of the clustered OMLP’s mutex variant is assumed). Suppose a task T i shares a serially-reusable resource q with another task T x . Further, suppose J i requests q up to N i,q =20 times and that jobs of T x hold q for at most L x,q =10 time units. Finally, suppose jobs of T x require q at most once while any J i is pending. When analyzing each of J i ’s many requests individually (i.e., when bounding the maximum pi-blocking incurred by a single request), T x ’s sole interfering request is effectively considered to block each of J i ’s requests since T x ’s request might delay any of the requests (but not all at once). Consequently, J i ’s overall bound on pi-blocking due to requests for q would be N i,q ⋅L x,q =200 time units, whereas the actual maximum possible delay is only L x,q =10 time units: when considering J i ’s entire execution, a job of T x delays J i with at most one blocking request for q , and not with up to 20.
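The arithmetic of Example 9 can be replayed with a short Python sketch. The function names and the parameter `k_x` (the number of requests that jobs of T x issue while J i is pending) are hypothetical names chosen for illustration; they are not notation from the paper.

```python
def per_request_bound(n_iq, l_xq):
    # Request-by-request analysis: T_x's request is charged to every
    # one of J_i's n_iq requests.
    return n_iq * l_xq

def holistic_bound(n_iq, l_xq, k_x):
    # Holistic analysis: T_x blocks J_i with at most k_x requests overall,
    # and with at most one request per request issued by J_i.
    return min(n_iq, k_x) * l_xq

# Values from Example 9: N_iq = 20, L_xq = 10, one request by T_x.
print(per_request_bound(20, 10))   # 200
print(holistic_bound(20, 10, 1))   # 10
```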

This example demonstrates that maximum contention should be analyzed as a whole across all of J i ’s requests for a particular resource. (Since we assume that requests for resources are not nested, blocking bounds for individual resources are independent from each other and can be derived individually.) The extent to which J i is blocked due to requests for a resource q in the worst case is limited by the following constraints:

  1. Maximum number of requests issued by other jobs. As discussed above in Example 9, if jobs of T x issue at most k requests while any J i is pending, then J i will be blocked by at most k requests of jobs of T x , regardless of the number of requests issued by J i .

  2. Maximum number of interfering requests per request issued by J i . Suppose J i requests a serially-reusable resource q only once, that m=4, and that q is requested by other jobs up to k=100 times while J i is pending. In this case, J i is delayed by at most m−1=3 competing requests, irrespective of the total number of requests k for q , since priority donation limits the maximum queue length to m jobs.

  3. Maximum number of interfering requests per task. For example, suppose q is shared among three tasks T i , T x , and T y . If J i issues only one request, then it is blocked by at most one request from T x and one request from T y , irrespective of the total number of requests issued by these tasks, and irrespective of the number of processors. Due to the FIFO ordering in the wait queue FQ q , each task can precede J i at most once per request.

  4. Task locality. For example, suppose T i shares a resource with tasks T x and T y under partitioned scheduling (c=1), and that T i and T x are assigned to processor 1, whereas T y is assigned to processor 2. Jobs of T y can cause J i to incur acquisition delay because they can issue conflicting requests while J i is scheduled. In contrast, jobs of T x cannot cause J i to incur acquisition delay because jobs of T x are not scheduled while J i is executing; however, a job J x can cause J i to incur pi-blocking if J i must serve as J x ’s priority donor upon release.
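As a rough illustration of how Constraints 1 to 3 limit the number of blocking requests, consider the following Python sketch. Combining the constraints by taking minima is a simplifying assumption made here for illustration (Constraint 4, task locality, is ignored), and the function and parameter names are hypothetical; the formal treatment is given in the sections that follow.

```python
def max_blocking_requests(n_i, m, requests_by_task):
    """Upper-bound the number of requests that can delay J_i.

    n_i: number of requests that J_i issues for the resource.
    m: number of processors.
    requests_by_task: maps each competing task to the number of requests
        its jobs issue while J_i is pending.
    """
    # Constraints 1 and 3: a task contributes at most one request per
    # request of J_i, and never more requests than it actually issues.
    per_task = sum(min(k, n_i) for k in requests_by_task.values())
    # Constraint 2: at most m - 1 competing requests per request of J_i.
    return min(per_task, n_i * (m - 1))

# Two competing tasks, one request by J_i (scenario of Constraint 3).
print(max_blocking_requests(1, 4, {"T_x": 100, "T_y": 100}))  # 2
# Five competing tasks on m = 4 processors (Constraint 2 applies).
print(max_blocking_requests(1, 4, {f"T_{j}": 20 for j in range(5)}))  # 3
```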

We formalize these four constraints next.

A.2 Interference sets

We begin with Constraint 1 by bounding the maximum resource requirements of competing tasks. In the task model assumed in this paper, a task T i ’s resource requirements are characterized by the parameters N i,q and L i,q . The main advantages of this model are that it is general enough to reflect many possible job behaviors (e.g., no particular request order or minimum separation of requests is assumed) and that the required information can be obtained as part of worst-case execution time analysis (or empirically bounded if such analysis is not available). However, it is possible that more detailed knowledge is available for specific applications.

For example, it could be the case that jobs of a task T i access a resource q twice, and that the second access is always much shorter than the first access. In this case, using a single upper bound L i,q for both requests is needlessly pessimistic. A similar concern arises with resources that are not accessed by every job of T i . For example, to reduce overheads, an application T a could be programmed to record status information in a shared log l only once every five jobs. Assuming that each J a requests access to l would needlessly overestimate contention for l . However, explicitly incorporating all such considerations yields a task model that is overly complicated for our goal (which is to study the underlying algorithmic properties of the protocols).

We instead use an abstraction called task interference bound to achieve a separation of concerns between the modeling of resource requirements and the actual analysis of locking protocols, which is structurally independent from model considerations. A task’s interference bound (for non-processor resources) is similar to a demand bound function (for processor time) in that it “upper bounds” a task’s worst-case resource requirement during some interval. The actual blocking analysis is expressed in terms of task interference, which can be defined to take advantage of detailed application-specific resource usage information. The primary benefit of this approach is that derived blocking terms can be reused to derive less pessimistic bounds when additional information in the form of a more-detailed task model is available.

In the following, to achieve the desired separation of concerns, we formalize a task’s “interference bound” as a set of requests that safely approximates a task’s “actual contention” for a resource. Recall from Sect. 2.2 that \(\mathcal {R}_{i,q,v}\) denotes the vth request for resource q issued by any J i , and that \({\mathcal{L}}_{i,q,v}\) denotes the request length of \(\mathcal {R}_{i,q,v}\). This allows us to formalize the concept of a task’s “contention for a resource.”

Definition 6

Suppose jobs of a task T i execute k resource requests for a resource q during an interval [t 0,t 1). In a concrete, fixed schedule, the contention due to T i during [t 0,t 1) is the set of requests

$$C_{i,q}(t_0, t_1) \triangleq \{ \mathcal {R}_{i,q,v}, \mathcal {R}_{i,q,(v+1)}, \ldots, \mathcal {R}_{i,q,(v+k-1)} \} $$

such that \(\mathcal {R}_{i,q,v}\) is the first request and \(\mathcal {R}_{i,q,(v+k-1)}\) is the last request issued by any J i during [t 0,t 1).

In general, v and k are unknown prior to the execution of T i , as is the length of each request in C i,q (t 0,t 1). To enable a priori analysis, a generic notion of worst-case contention is required. The purpose of T i ’s request interference bound, given next, is to define a set of generic requests (i.e., virtual requests defined for analysis purposes) that upper-bound the worst-case contention during any interval of length t 1 −t 0. That is, the interference bound for an interval of length t 1 −t 0 contains at least as many requests as T i issues in any interval of length t 1 −t 0 in any actual schedule, and each generic request is at least as long as a corresponding actual one. This can be formalized as follows.

Definition 7

The task interference bound for an interval of length t, denoted tif(T i , q ,t), is a set of generic requests that satisfies the following two properties.

  1. For any C i,q (t 0,t 1) with regard to some actual schedule, one can choose a set of corresponding generic requests \(C'_{i,q} \subseteq\mathit{tif}(T_{i}, \ell _{q}, t_{1} - t_{0})\) that satisfies

    $$\bigl \vert C'_{i,q} \bigr \vert = \bigl \vert C_{i,q}(t_0, t_1) \bigr \vert \quad \text{and} \quad \sum_{\mathcal {R}_{i,q,v} \in C_{i,q}(t_0, t_1)} {\mathcal{L}}_{i,q,v} \leq\sum _{\mathcal {R}_{i,q,w} \in C'_{i,q}} {\mathcal{L}}_{i,q,w}. $$

  2. Interference bounds are inclusive:

    $$t \leq t' \quad \Rightarrow\quad \mathit{tif}(T_i, \ell _q, t) \subseteq\mathit {tif}\bigl(T_i, \ell _q, t'\bigr). $$

Property 1 ensures that the task interference bound does not underestimate the number and length of requests in any actual execution of T i , and Property 2 ensures that a derived bound remains valid when analyzing a larger-than-necessary interval (i.e., when over-estimating a job’s response time). In the case of RW constraints, we analogously define task T i ’s read interference bound, denoted as rif(T i , q ,t), with respect to read requests for q , and T i ’s write interference bound, denoted as wif(T i , q ,t) with respect to write requests for q .

These definitions serve as an interface that allows the analysis of specific lock types presented in the following sections to be seamlessly integrated with more-refined task and resource models. Next, we provide a suitable definition of tif(T i , q ,t) for the model assumed in this paper. To this end, we require the following well-known bound on the maximum number of jobs that can execute requests in a given interval. Recall from Sect. 2 that p i denotes T i ’s period and r i denotes T i ’s maximum response time.

Lemma 16

At most \(\lceil \frac{t + r_{i}}{p_{i}} \rceil\) distinct jobs of a task T i can execute in any interval of length t (without proof, see e.g. [14, 18]).

It follows from Lemma 16 and the definition of N i,q that jobs of T i issue at most ⌈(t+r i )/p i ⌉⋅N i,q requests for q over any interval of length t. In the worst case, each request for q is of length L i,q . This yields the following interference bound for the task model assumed herein.

Definition 8

The request interference bound for task T i with respect to resource q over any interval of length t is the set of requests

$$\mathit{tif}(T_i, \ell _q, t) \triangleq \biggl\{ \mathcal {R}_{i,q,v}\ |\ 1 \leq v \leq N_{i,q} \cdot \biggl\lceil \frac{t + r_i}{p_i} \biggr\rceil \biggr\}, $$

where \({\mathcal{L}}_{i,q,v} = L_{i,q}\) for each \(\mathcal {R}_{i,q,v}\). If T i does not access a given resource q , then tif(T i , q ,t)=∅ for all t. We analogously define task T i ’s read and write interference bounds as rif(T i , q ,t) and wif(T i , q ,t), respectively.
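To make Lemma 16 and Definition 8 concrete, the following Python sketch builds the interference bound for one task and one resource. It rests on assumptions that are not part of the paper: time is measured in integer units, a set of generic requests is represented as a list of request lengths, and the function names `max_jobs` and `tif` as well as the parameter values are chosen for illustration only.

```python
def max_jobs(t, r_i, p_i):
    # Lemma 16: at most ceil((t + r_i) / p_i) distinct jobs of T_i can
    # execute in any interval of length t (integer ceiling division).
    return -(-(t + r_i) // p_i)

def tif(n_iq, l_iq, r_i, p_i, t):
    # Definition 8: N_iq * ceil((t + r_i) / p_i) generic requests,
    # each of the maximum request length L_iq.
    return [l_iq] * (n_iq * max_jobs(t, r_i, p_i))

# Hypothetical task: N_iq = 2, L_iq = 3, r_i = 8, p_i = 10.
bound = tif(2, 3, 8, 10, t=25)
print(len(bound), sum(bound))  # 8 24

# A task that does not access the resource has an empty bound.
print(tif(0, 3, 8, 10, t=25))  # []
```

With N_iq = 2, r_i = 8, and p_i = 10, an interval of length 25 admits ⌈33/10⌉ = 4 jobs and hence 8 generic requests. Because the ceiling term is non-decreasing in t, a shorter interval never yields more requests, which mirrors Property 2 of Definition 7.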

Based on per-task interference bounds, we next introduce a generic, parametrized “aggregate interference bound” for use in the subsequent analysis of specific locking protocols. We first define three convenience functions over sets of requests, which serve to simplify the expression of “aggregate interference” and protocol-specific bounds on blocking.

Definition 9

Given a set of requests S, we let S k denote the kth longest request in S, where 1≤k≤|S| (with ties broken arbitrarily but consistently). Formally, if 1≤k≤l≤|S| and \(S_{k} = \mathcal {R}_{a,b,c}\) and \(S_{l} = \mathcal {R}_{x,y,z}\), then \({\mathcal{L}}_{a,b,c} \geq {\mathcal{L}}_{x,y,z}\).

Definition 10

Given a set of requests S, we denote the set of the l longest requests in S as top(l,S)≜{S k | 1≤k≤min(l,|S|)} and their total duration as \(\mathit{total}(l, S) \triangleq\sum_{\mathcal {R}_{i,q,v} \in\ \mathit{top}(l, S)} {\mathcal{L}}_{i,q,v}\). If l=0 or S=∅, then total(l,S)=0.
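The convenience functions of Definitions 9 and 10 map directly onto sorting and slicing. The sketch below assumes, for illustration only, that a set of requests is represented as a Python list of request lengths; the example lengths are hypothetical.

```python
def top(l, requests):
    # The l longest requests in S, or all of S if l exceeds |S|.
    return sorted(requests, reverse=True)[:max(l, 0)]

def total(l, requests):
    # Cumulative duration of the l longest requests in S.
    # Evaluates to 0 if l == 0 or S is empty.
    return sum(top(l, requests))

S = [4, 9, 2, 7]
print(top(2, S))     # [9, 7]
print(total(2, S))   # 16
print(total(10, S))  # 22
print(total(0, S))   # 0
```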

A task interference bound limits the maximum contention from jobs of a single task. Using the above definitions, we can formalize the notion of contention from a set of tasks. Recall Constraint 3 from Sect. A.1 above, namely that the number of requests per task that can possibly cause J i to incur acquisition delay is limited if jobs wait in FIFO order. If a task T x can delay J i with at most l requests, then it is sufficient to consider only the l longest requests in T x ’s interference bound. We therefore define the aggregate interference bound with a per-task “interference limit” parameter.

Definition 11

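The aggregate interference bound of a set of tasks τ with respect to a resource q over any interval of length t and subject to an interference limit l is given by

$$\mathit{tifs}(\tau, \ell _q, t, l) \triangleq \bigcup_{T_x \in \tau} \mathit{top}\bigl(l, \mathit{tif}(T_x, \ell _q, t)\bigr). $$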

A task set’s aggregate read interference and aggregate write interference, denoted as rifs(τ, q ,t,l) and wifs(τ, q ,t,l), respectively, are defined analogously with respect to read and write interference.

Given an interference limit l, tifs(τ, q ,t,l) contains the l longest requests in each task’s interference bound for q and t. In the task model assumed in this paper, each request in tif(T x , q ,t) is in fact of the same length \({\mathcal{L}}_{x,q,v} = L_{x,q}\) (see Definition 8). We define tifs(τ, q ,t,l) with additional generality to accommodate more-expressive task models for which tif(T x , q ,t) may contain non-uniform request lengths.
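The aggregate bound described above, which pools the l longest requests of each task, can be sketched as follows. The dictionary-based representation (task name mapped to a list of request lengths, standing in for each task's interference bound for a fixed resource and interval) and the non-uniform example lengths are assumptions made for illustration.

```python
def tifs(interference_by_task, l):
    """Pool the l longest requests from each task's interference bound."""
    pooled = []
    for lengths in interference_by_task.values():
        # Keep only the l longest requests of this task.
        pooled.extend(sorted(lengths, reverse=True)[:max(l, 0)])
    return pooled

# Hypothetical per-task interference bounds with non-uniform lengths.
bounds = {"T_x": [5, 5, 5], "T_y": [1, 8], "T_z": []}
print(tifs(bounds, 2))  # [5, 5, 8, 1]
print(tifs(bounds, 0))  # []
```

Tasks that do not access the resource (here the hypothetical `T_z`) contribute nothing, and an interference limit of 0 yields an empty aggregate bound.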

The holistic blocking analysis framework incorporates Constraints 1 and 3 from Sect. A.1 in a generic fashion. The remaining Constraints 2 and 4 are easier to incorporate on a protocol-by-protocol basis, which we do next to derive concrete, non-asymptotic bounds for the locking protocols presented in this paper.

A.3 The global OMLP for mutual exclusion

We begin with the global OMLP for mutex constraints under s-oblivious schedulability analysis. Since the global OMLP uses a hybrid queue that consists of a FIFO queue FQ q (which holds at most m jobs) and of a priority queue PQ q (which is only used if at least m+1 jobs are queued), maximum s-oblivious pi-blocking under the global OMLP depends on how many tasks share a given resource.

Definition 12

In the following, let A q ≜|{T i | T i ∈τ ∧ N i,q >0}| denote the number of tasks that access resource q .

If A q ≤m+1, then at most m jobs are waiting to acquire q at any time, which implies that at most one job is queued in PQ q . In this case, the global OMLP reduces to a simple FIFO protocol.

Lemma 17

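Under the global OMLP, if A q ≤m+1, then a job J i incurs at most

$$\mathit{total}\bigl( (A_q - 1) \cdot N_{i,q},\ \mathit{tifs}\bigl(\tau\setminus\{T_i\}, \ell _q, r_i, N_{i,q}\bigr) \bigr) $$

s-oblivious pi-blocking due to requests for resource q .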

Proof

J i ’s response time r i upper-bounds the duration of the interval during which other jobs can issue conflicting requests; that is, the aggregate task interference bound tifs(τ∖{T i }, q ,r i ,N i,q ) for any interval of length r i is a sufficient approximation of the resource demands of competing tasks. If J i is never enqueued in PQ q , then the lemma follows trivially.

Otherwise, if J i is enqueued in PQ q , then m jobs are already enqueued in FQ q at the time of J i ’s request. Since A q ≤m+1, this implies that no other job is enqueued in PQ q . As soon as the head of FQ q releases q , J i is moved to FQ q . Hence there is at most one job in PQ q at any time, and the ordering of PQ q is irrelevant.

The FIFO ordering of FQ q implies that each of J i ’s requests is preceded by at most one request from each other task that accesses q . The per-task interference limit is hence N i,q . Since q is shared among only A q ≤m+1 tasks, one of which is T i , no more than (A q −1)⋅N i,q requests pi-block J i in total. Priority inheritance ensures that the resource-holding job is scheduled whenever J i incurs s-oblivious pi-blocking; the cumulative duration of the (A q −1)⋅N i,q longest requests for q by tasks other than T i thus bounds maximum s-oblivious pi-blocking. □

In the case of A q >m+1, higher-priority jobs of some other task T x may “skip ahead” of J i repeatedly while J i waits in PQ q . However, the per-task interference limit is still bounded by 2⋅N i,q ; that is, it is only doubled even if jobs of T x “skip ahead” an arbitrary number of times.

Lemma 18

Let T x denote some task other than T i that accesses ℓ q (i.e., T i ≠ T x and N x,q > 0). Under the global OMLP, jobs of T x cause J i to incur s-oblivious pi-blocking for at most the duration of two requests each time that J i requests ℓ q .

Proof

In order to pi-block J i , a request issued by some J x must precede J i ’s request in FQ q (i.e., J x enters FQ q before J i does). If A q ≤ m+1, the bound follows analogously to Lemma 17 since FQ q is FIFO-ordered.

Hence assume A q >m+1. In this case, jobs of T x may enter FQ q repeatedly while J i waits in PQ q . Let t a denote the first time that a job of T x , denoted J x,a , enters FQ q , and let t b denote the second time that a job of T x , denoted J x,b , enters FQ q while J i is continuously waiting in PQ q . Since tasks are sequential, J x,b necessarily issued its request after J i issued its request (this is not necessarily the case with J x,a ).

Further, let t 1 denote the time that J i enters FQ q (as indicated in Fig. 10). If t 1 does not exist (i.e., if J i never enters FQ q ), then either FQ q is continuously populated with higher-priority jobs and J i does not incur s-oblivious pi-blocking, or some request fails to complete (which is not possible since each L i,q is presumed finite). Therefore assume t 1 exists.

J i does not incur s-oblivious pi-blocking during [t b ,t 1). Since J i is waiting in PQ q at time t a , J x,a is necessarily preceded by m−1 other jobs in FQ q , which must complete before J x,a ’s request is satisfied. Since tasks are sequential, J x,a has completed its request before J x,b enters FQ q at time t b . Therefore, at least m higher-priority jobs must have entered FQ q during [t a ,t b ); otherwise, J i would no longer be waiting in PQ q at time t b . The presence of m higher-priority pending jobs rules out s-oblivious pi-blocking after t b (until J i enters FQ q at time t 1).

Therefore, at most one of the requests issued by jobs of T x after J i issued its request pi-blocks J i . Since sporadic tasks are sequential, at most one request of T x that was issued prior to J i ’s request is incomplete when J i issues its request. Hence, at most two requests of T x cause J i to incur pi-blocking. □

As a result, the per-task interference limit in the case of A q >m+1 is 2⋅N i,q . This yields the following bound.

Lemma 19

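Under the global OMLP, if A q >m+1, then a job J i incurs at most

$$b_{i,q} \triangleq \mathit{total}\bigl((2 \cdot m - 1) \cdot N_{i,q},\ \mathit{tifs}(\tau \setminus \{T_i\}, \ell_q, r_i, 2 \cdot N_{i,q})\bigr) $$

s-oblivious pi-blocking due to requests for resource ℓ q .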

Proof

By Lemma 14, J i incurs s-oblivious pi-blocking for the combined duration of at most 2⋅m−1 requests each time that it requests q , which implies that J i is delayed by at most (2⋅m−1)⋅N i,q requests in total. Lemma 18 implies an interference limit of 2⋅N i,q . Priority inheritance ensures that the resource-holding job is scheduled whenever J i incurs pi-blocking. The bound follows. □

This yields the following overall bound on maximum s-oblivious pi-blocking.

Theorem 4

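Under the global OMLP, a job J i incurs s-oblivious pi-blocking for at most

$$b_i \triangleq \sum_{q} \mathit{total}\bigl(x_q \cdot N_{i,q},\ \mathit{tifs}(\tau \setminus \{T_i\}, \ell_q, r_i, l_q \cdot N_{i,q})\bigr) $$

time units, where x q ≜ A q −1 and l q ≜ 1 if A q ≤ m+1, and x q ≜ 2⋅m−1 and l q ≜ 2 if A q >m+1.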

Proof

Follows from Lemmas 17 and 19, since resource requests are not nested, and since J i does not incur s-oblivious pi-blocking under the global OMLP while not requesting resources. □
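The two cases of the analysis can be combined in a short Python sketch. It follows the counts used in the proofs of Lemmas 17 and 19: at most (A q −1)⋅N i,q blocking requests with a per-task limit of N i,q if A q ≤ m+1, and at most (2⋅m−1)⋅N i,q blocking requests with a per-task limit of 2⋅N i,q otherwise. The input layout (one dictionary per resource with the keys `A_q`, `N_iq`, and `tif_others`) is a hypothetical choice for illustration, and the interference bounds of the other tasks are assumed to be precomputed for an interval of length r i .

```python
from typing import Dict, List, Tuple


def interference_params(a_q: int, m: int) -> Tuple[int, int]:
    """Return (requests per own request, per-task limit factor)."""
    if a_q <= m + 1:
        return a_q - 1, 1       # FIFO case (Lemma 17)
    return 2 * m - 1, 2         # hybrid-queue case (Lemmas 18 and 19)


def global_omlp_bound(resources: List[Dict], m: int) -> float:
    """Sum the per-resource bounds over all resources that J_i accesses."""
    bound = 0.0
    for res in resources:
        x_q, l_q = interference_params(res["A_q"], m)
        n_iq = res["N_iq"]
        pool: List[float] = []
        # Per-task interference limit: l_q * N_iq longest requests per task.
        for lengths in res["tif_others"].values():
            pool.extend(sorted(lengths, reverse=True)[: l_q * n_iq])
        # Overall limit: x_q * N_iq longest requests across all other tasks.
        bound += sum(sorted(pool, reverse=True)[: x_q * n_iq])
    return bound


few_sharers = {"A_q": 3, "N_iq": 2,
               "tif_others": {"T2": [4.0, 4.0, 4.0], "T3": [1.0, 1.0]}}
many_sharers = {"A_q": 5, "N_iq": 1,
                "tif_others": {t: [2.0, 2.0, 2.0]
                               for t in ("T2", "T3", "T4", "T5")}}
b_few = global_omlp_bound([few_sharers], m=2)    # 4 + 4 + 1 + 1
b_many = global_omlp_bound([many_sharers], m=2)  # 3 requests of length 2
```

Because r i appears on the input side, a complete analysis would iterate this computation together with the response-time bound; the sketch shows a single evaluation only.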

This concludes the analysis of the global OMLP. Next, we consider the clustered OMLP from Sects. 4.2–4.4, which uses priority donation instead of priority inheritance.

A.4 The clustered OMLP for mutual exclusion

A job J i is subject to two sources of s-oblivious pi-blocking under the clustered OMLP. J i can be delayed each time it issues requests for shared resources, and additionally once upon release if it serves as a priority donor. We begin with the mutex variant of the clustered OMLP, which is the simplest of the three protocols based on priority donation. Recall from Sect. 4.2 that each resource q is protected by a simple FIFO queue FQ q .

Lemma 20

Under the clustered OMLP’s mutex protocol, a job J i incurs at most

pi-blocking due to requests for resource q issued by jobs of tasks assigned to the jth cluster.

Proof

By Lemma 4, priority donation ensures that at most c requests are incomplete at any time in each cluster; therefore, at most c requests from each cluster C j precede J i in FQ q each time that it issues a request. The strict FIFO ordering in FQ q ensures a per-task interference limit of N i,q . Due to priority donation, resource-holding jobs are always scheduled (Lemma 2). In the case of J i ’s local cluster (i.e., if j=P i ), only c−1 requests can interfere since J i ’s own request counts towards the limit of c concurrent requests imposed by priority donation. Since jobs and tasks are sequential, J i is not delayed by requests of (other) jobs of T i . □
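The counting argument in this proof can be sketched in Python. The sketch assumes that the interference bounds of the tasks in one cluster are given as lists of request lengths (with T i itself excluded by the caller for the local cluster); the function name and the input layout are hypothetical.

```python
from typing import Dict, List


def cluster_blocking(tif_by_task: Dict[str, List[float]],
                     n_iq: int, c: int, local: bool) -> float:
    """Blocking caused by one cluster under strict FIFO queueing.

    Each of J_i's n_iq requests is preceded by at most c requests from a
    remote cluster, or by at most c - 1 requests from J_i's own cluster.
    """
    per_request = c - 1 if local else c
    pool: List[float] = []
    # FIFO ordering: per-task interference limit of n_iq requests.
    for lengths in tif_by_task.values():
        pool.extend(sorted(lengths, reverse=True)[:n_iq])
    return sum(sorted(pool, reverse=True)[: n_iq * per_request])


remote = {"T4": [3.0, 3.0, 3.0], "T5": [2.0], "T6": [1.0, 1.0]}
local_others = {"T2": [5.0, 5.0, 5.0]}
b_remote = cluster_blocking(remote, n_iq=2, c=2, local=False)
b_local = cluster_blocking(local_others, n_iq=2, c=2, local=True)
```

In the remote case, the four longest admissible requests (3, 3, 2, 1) are counted; in the local case, only two requests of T2 are counted because J i ’s own request occupies one of the c slots.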

When bounding the maximum pi-blocking due to priority donation, we only need to consider the set of tasks that could have released a lower-priority job prior to J i ’s arrival since priorities are only donated to jobs with lower base priority. This set of tasks necessarily depends on the specific scheduling policy.

Definition 13

We let lower(T i ) denote the set of local tasks that could potentially require one of T i ’s jobs to serve as a priority donor upon release. Under EDF-based schedulers, lower(T i ) includes only tasks with longer relative deadlines. Under FP-based schedulers, lower(T i ) includes tasks with lower priorities.

Lemma 21

Under the clustered OMLP’s mutex protocol, a job J i incurs at most \(b_{i}^{D}\) s-oblivious pi-blocking upon release while serving as a priority donor, where

Proof

By Lemma 3, maximum s-oblivious pi-blocking due to priority donation is limited to one request span. Analogously to Lemma 20, \(b_{i}^{D}\) bounds the maximum request span of any local, potentially lower-priority job J x by considering the c longest requests in each remote cluster that could cause J x to incur acquisition delay, and the c−1 longest requests in J x ’s local cluster. □

Theorem 5

Under the clustered OMLP’s mutex protocol, a job J i incurs at most

$$b_i \triangleq b_{i}^{D} + \sum_{q} \sum_{j = 1}^{m/c} b_{i,q,j} $$

s-oblivious pi-blocking due to requests for shared resources, where b i,q,j and \(b_{i}^{D}\) are defined as in Lemmas 20 and 21, respectively.

Proof

Follows from Lemmas 20 and 21, and the assumptions that resource requests are not nested and that tasks do not migrate across cluster boundaries. □

A.5 The clustered OMLP for RW exclusion

The bounds on maximum pi-blocking under the OMLP’s RW protocol are structurally similar to the bounds on maximum spin-blocking and pi-blocking under non-preemptive phase-fair RW spinlocks that we previously presented in [18]. This is because the OMLP implements phase-fairness, and because priority donation allows at most c concurrent requests in each cluster, which has an effect that is equivalent to non-preemptive execution.

We begin by considering the set of potentially blocking write requests. Since write requests are satisfied in FIFO order with respect to other write requests, maximum acquisition delay incurred by a writer due to earlier-issued write requests is the same under the OMLP’s mutex and RW variants. However, since reader and writer phases alternate, the maximum acquisition delay incurred by a reader due to earlier-issued write requests is limited to one critical section. That is, at most \(N^{W}_{i,q} \cdot c + N^{R}_{i,q}\) write requests issued by jobs of a remote cluster can block J i under the clustered OMLP’s RW variant. In the case of J i ’s local cluster, if c>1, then the same reasoning applies and no more than \(N^{W}_{i,q} \cdot(c - 1) + N^{R}_{i,q}\) write requests block J i . In the special case of c=1, local jobs cannot cause J i to incur acquisition delay since they are not scheduled while J i waits. These considerations lead to the following definition of the set of possibly-interfering write requests.

Definition 14

In the following, let \(x^{\mathit{rem}} = N^{W}_{i,q} \cdot c + N^{R}_{i,q}\) and \(x^{\mathit{loc}} =N^{W}_{i,q} \cdot(c - 1) + N^{R}_{i,q}\), and define the sets of possibly-interfering write requests from jobs in the jth cluster, denoted as W(T i ,j, q ), as follows.

Further, let W i,q denote the union of all possibly-interfering write requests across all clusters, and let w i,q denote the maximum number of blocking write requests.

$$W_{i,q} = \bigcup_{j = 1}^{m/c} W(T_i, j, \ell _q) \quad w_{i,q} = \vert W_{i,q} \vert $$

Next, we consider the set of potentially blocking read requests. The defining property of an RW lock is that readers do not directly block other readers. That is, in the absence of any writers, a reader is not delayed in RW locks regardless of the number of concurrent read requests. Intuitively, a reader phase can only transitively block another read request if said phase is “assisted” by an also-blocking, interspersed write request. This intuition can be formalized to characterize acquisition delay due to interfering read requests in terms of the number of interfering write requests.

Lemma 22

(From [14, 18])

Let J i denote a job that issues at most \(N_{i,q}^{W}\) write requests for a resource q , let w denote the number of write requests that cause J i ’s write requests for q to incur acquisition delay, and let r denote the number of reader phases that cause J i ’s write requests for q to incur acquisition delay. If q is protected by a phase-fair RW lock, then \(r \leq w + N_{i,q}^{W}\).

Similarly, a writer that is not delayed by other writers incurs acquisition delay for the duration of at most one read request regardless of the number of blocking readers. For example, if m−1 readers hold a resource q when J i issues a write request for q , then all m−1 readers proceed in parallel and J i incurs acquisition delay only for the duration of the longest earlier-issued read request. Therefore, J i incurs acquisition delay due to interfering read requests for the combined duration of at most \(N^{R}_{i,q} + (m-1) \cdot N^{W}_{i,q}\) read requests (recall Lemmas 8 and 9). Taken together, this leads to the following definition.

Definition 15

Let \(r_{i,q} = \min(w_{i,q} + N^{W}_{i,q},\ N^{R}_{i,q} + (m-1) \cdot N^{W}_{i,q})\), and define the sets of possibly-interfering read requests from jobs in the jth cluster, denoted as R(T i ,j, q ), as follows.

Analogously to W i,q , let R i,q denote the set of all possibly interfering read requests across all clusters.

$$R_{i,q} = \bigcup_{j = 1}^{m/c} R(T_i, j, \ell _q). $$
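The request counts used in Definitions 14 and 15 can be evaluated with a few lines of Python. The sketch below is an illustration under assumptions: the function names are hypothetical, the special case c=1 (no local interference) is encoded directly in the local limit, and the cap on w i,q assumes that every cluster actually contains enough write requests to reach its limit.

```python
from typing import Tuple


def write_limits(n_w: int, n_r: int, c: int) -> Tuple[int, int]:
    """Return (x_rem, x_loc): blocking write requests per cluster."""
    x_rem = n_w * c + n_r
    # With c = 1, local jobs are not scheduled while J_i waits.
    x_loc = 0 if c == 1 else n_w * (c - 1) + n_r
    return x_rem, x_loc


def reader_phase_limit(w: int, n_w: int, n_r: int, m: int) -> int:
    """r = min(w + N^W, N^R + (m - 1) * N^W)."""
    return min(w + n_w, n_r + (m - 1) * n_w)


m, c = 4, 2
n_w, n_r = 1, 2
x_rem, x_loc = write_limits(n_w, n_r, c)
w_cap = (m // c - 1) * x_rem + x_loc   # one local and m/c - 1 remote clusters
r = reader_phase_limit(w_cap, n_w, n_r, m)
```

For these example values, x_rem is 4 and x_loc is 3, so at most 7 write requests block J i . The reader-phase limit is then min(8, 5) = 5, which shows that the second term of the minimum can be the binding one.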

With these definitions in place, we can state the following bound on pi-blocking due to requests for a given resource.

Lemma 23

Under the clustered OMLP’s RW protocol, a job J i incurs pi-blocking due to its read and write requests for resource ℓ q for at most b i,q =total(w i,q , W i,q )+total(r i,q , R i,q ) time units.

Proof

Analogously to Lemma 20. Each time that J i issues a write request, it can be preceded by up to c other write requests in each cluster since the writer queue WQ q is FIFO ordered, and because priority donation allows at most c concurrent requests per cluster. Also due to the FIFO order, each other task can block each of J i ’s write requests with at most one request. Each time that J i issues a read request, it is blocked by at most one write request since the OMLP implements phase-fairness. Therefore, the per-task interference with regard to write requests is \(N^{W}_{i,q}+N^{R}_{i,q}\), and in total J i ’s \(N^{R}_{i,q}\) read requests and \(N^{W}_{i,q}\) write requests are blocked by at most \(N^{R}_{i,q}+N^{W}_{i,q} \cdot c\) write requests in the case of a remote cluster, and by at most \(N^{R}_{i,q}+N^{W}_{i,q} \cdot(c-1)\) requests in the case of J i ’s local cluster. The definitions of W(T i ,j, ℓ q ) and W i,q follow.

By Lemma 22, the upper bound on the total number of blocking writes w i,q implies an upper bound of \(w_{i,q} + N^{W}_{i,q}\) on the number of blocking reader phases. The total number of blocking reader phases is also limited to \(N^{R}_{i,q} + (m-1) \cdot N^{W}_{i,q}\): due to priority donation and because reader and writer phases alternate in a phase-fair RW lock, each of J i ’s read requests is transitively blocked by at most one reader phase, and each of J i ’s write requests is blocked by at most m−1 interspersed reader phases (since at most m−1 write requests block each of J i ’s write requests). The lesser of the two bounds limits the total number of blocking reader phases r i,q . The definitions of R(T i ,j, q ) and R i,q follow.

Since J i is blocked by at most w i,q writer phases and r i,q reader phases, total s-oblivious pi-blocking is bounded by the w i,q longest requests in W i,q and the r i,q longest requests in R i,q . □

Since the clustered OMLP uses priority donation, a job may also incur s-oblivious pi-blocking when serving as a priority donor. The duration of priority donation depends on the request span of the priority recipient’s request, which may be either a write or a read. The maximum acquisition delay of a single write request for resource q issued by job J i can be bounded by instantiating Definitions 14 and 15 assuming \(N^{R}_{i,q}=0\) and \(N^{W}_{i,q}=1\). Similarly, the maximum acquisition delay of a single read request for q can be bounded by instantiating said definitions assuming \(N^{R}_{i,q}=1\) and \(N^{W}_{i,q}=0\). To avoid needless repetition, we use the following definitions to denote these two special cases.

Definition 16

Let \(W'_{i,q}\) and \(w'_{i,q}\) denote the values of W i,q and w i,q , respectively, that result when assuming \(N^{R}_{i,q}=0\) and \(N^{W}_{i,q}=1\) in Definition 14 above. Similarly, let \(W''_{i,q}\) and \(w''_{i,q}\) denote the values of W i,q and w i,q , respectively, that result when assuming \(N^{R}_{i,q}=1\) and \(N^{W}_{i,q}=0\) in Definition 14 above.

Definition 17

Let \(R'_{i,q}\) and \(r'_{i,q}\) denote the values of R i,q and r i,q , respectively, that result when assuming \(N^{R}_{i,q}=0\) and \(N^{W}_{i,q}=1\) in Definition 15 above. Similarly, let \(R''_{i,q}\) and \(r''_{i,q}\) denote the values of R i,q and r i,q , respectively, that result when assuming \(N^{R}_{i,q}=1\) and \(N^{W}_{i,q}=0\) in Definition 15 above.

With these special cases in place, we can express the maximum request span. Recall from Definition 13 that we let lower(T i ) denote the set of tasks local to T i that could potentially cause J i to incur pi-blocking upon release.

Lemma 24

Under the clustered OMLP’s RW protocol, a job J i incurs at most \(b_{i}^{D} = \max(b_{i}', b_{i}'')\) s-oblivious pi-blocking upon release while serving as a priority donor, where

bounds the case of a writing priority recipient, and where

bounds the case of a reading priority recipient.

Proof

Follows analogously to Lemma 21 since J i serves as a priority donor at most once and at most for the duration of one request span. The maximum request span of a lower-priority write request is bounded by \(b'_{i}\); the maximum request span of a lower-priority read request is bounded by \(b''_{i}\). The maximum of either scenario bounds maximum s-oblivious pi-blocking due to priority donation under the clustered OMLP for RW exclusion. □

This yields the following bound on s-oblivious pi-blocking.

Theorem 6

Under the clustered OMLP’s RW protocol, a job J i incurs at most

$$b_i \triangleq b_{i}^{D} + \sum_{q} b_{i,q} $$

s-oblivious pi-blocking due to read and write requests for shared resources, where b i,q and \(b_{i}^{D}\) are defined as in Lemmas 23 and 24, respectively.

Proof

Follows from Lemmas 23 and 24, and the assumptions that resource requests are not nested and that tasks do not migrate across cluster boundaries. □

A.6 The clustered OMLP for k-exclusion

In this section, we establish a bound on s-oblivious pi-blocking under the clustered OMLP for k-exclusion, which is presented in Sect. 4.4. The presented analysis is reasonably tight if blocking requests are relatively uniform in duration. However, if request lengths are heavily skewed (i.e., if there are some infrequent, long-running requests, but most requests are short), then a more accurate bound could be obtained by applying multiprocessor response-time analysis for non-preemptive global FIFO scheduling to each resource. In the following simpler analysis, which suffices for our purposes, some pessimism arises because Lemma 11, which implicitly lower-bounds the request completion rate, does not take non-uniform request lengths into account.

Lemma 25

Under the clustered OMLP’s k-exclusion protocol, a job J i incurs at most b i,q s-oblivious pi-blocking due to requests for resource ℓ q , where

Proof

By Lemma 4, priority donation ensures that at most c requests are incomplete at any time in each cluster; therefore, at most c requests in each cluster precede J i in KQ q or hold a replica of ℓ q at the time that J i issues a request. The FIFO ordering of jobs in KQ q ensures a per-task interference limit of N i,q . Therefore, the set of the N i,q ⋅c longest requests issued by jobs in the jth cluster, denoted b i,q,j , bounds the worst-case interference from jobs in that cluster. In the case of J i ’s local cluster, only c−1 requests can interfere since J i ’s own request counts towards the limit of c concurrent requests imposed by priority donation.

Lemma 11 implies that J i holds a replica of ℓ q after at most ⌈(m−k q )/k q ⌉ prior requests for ℓ q complete. Therefore, across all N i,q requests, J i is pi-blocked at most for the cumulative duration of the N i,q ⋅⌈(m−k q )/k q ⌉ longest requests issued by jobs in any cluster. □

To bound maximum s-oblivious pi-blocking due to priority donation, we again require a bound for a single request. Such a bound can be obtained by applying Lemma 25 above to a single request.

Definition 18

Let \(b'_{i,q}\) denote the value of b i,q computed assuming N i,q =1 in Lemma 25 above.

Recall from Definition 13 that we let lower(T i ) denote the set of tasks local to T i that could potentially cause J i to incur pi-blocking upon release.

Lemma 26

Under the clustered OMLP’s k-exclusion protocol, a job J i incurs at most

s-oblivious pi-blocking upon release while serving as a priority donor.

Proof

Follows analogously to Lemma 21 and Lemma 24. □

Theorem 7

Under the clustered OMLP’s k-exclusion protocol, a job J i incurs at most

$$b_i \triangleq b_{i}^{D} + \sum_{q} b_{i,q} $$

s-oblivious pi-blocking due to requests for shared resources, where b i,q and \(b_{i}^{D}\) are defined as in Lemmas 25 and 26, respectively.

Proof

Follows from Lemmas 25 and 26, and since resource requests are not nested. □

A.7 Schedulability test

Having derived bounds on maximum s-oblivious pi-blocking, any sustainable [6, 8] locking-unaware schedulability test can be used to establish schedulability under the OMLP. In short, we require a sustainable schedulability test because each task’s parameter b i is only an upper bound (i.e., it is not exact); therefore the employed schedulability test must be resilient to execution cost decreases at runtime.

Recall that b i was derived assuming that suspended higher-priority jobs are accounted for as demand. Thus, each per-job execution time must be inflated by b i before applying existing schedulability tests that assume tasks to be independent.

Theorem 8

Let \(\mathcal{T}\) denote a sustainable schedulability test for independent tasks for the employed JLFP scheduling policy. A task set τ is schedulable under the OMLP if \(\tau' \triangleq \{ T'_{i}(e_{i} + b_{i}, p_{i}) \ \vert \ T_{i} \in\tau\}\) is deemed schedulable by \(\mathcal{T}\).

Note that the derivation of b i itself does not depend on the actual scheduling policy or \(\mathcal{T}\); the OMLP can thus be applied to any JLFP scheduling policy and any corresponding sustainable schedulability test.
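The inflation step of Theorem 8 can be sketched in Python as follows. The schedulability test is passed in as a parameter because the theorem leaves it open; the utilization-based test shown here (EDF on a single processor with implicit deadlines) is only a simple stand-in chosen for illustration, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Task = Tuple[float, float]   # (execution cost e_i, period p_i)


def inflate(tasks: List[Task], blocking: List[float]) -> List[Task]:
    """Build tau' by adding each task's blocking bound b_i to its cost e_i."""
    return [(e + b, p) for (e, p), b in zip(tasks, blocking)]


def edf_uniprocessor_test(tasks: List[Task]) -> bool:
    """Stand-in test: total utilization at most 1 (implicit deadlines)."""
    return sum(e / p for e, p in tasks) <= 1.0


def schedulable_with_blocking(tasks: List[Task], blocking: List[float],
                              test: Callable[[List[Task]], bool]) -> bool:
    """Apply a locking-unaware test to the inflated task set."""
    return test(inflate(tasks, blocking))


tasks = [(1.0, 4.0), (2.0, 8.0)]
ok = schedulable_with_blocking(tasks, [1.0, 2.0], edf_uniprocessor_test)
too_much = schedulable_with_blocking(tasks, [2.0, 2.0], edf_uniprocessor_test)
```

In the first call the inflated utilization is exactly 1, so the stand-in test accepts the task set; in the second call it is 1.25 and the test rejects it. Any other sustainable test for the employed JLFP policy could be substituted for `edf_uniprocessor_test`.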


Cite this article

Brandenburg, B.B., Anderson, J.H. The OMLP family of optimal multiprocessor real-time locking protocols. Des Autom Embed Syst 17, 277–342 (2013). https://doi.org/10.1007/s10617-012-9090-1
