Optimal scheduling of measurement-based parallel real-time tasks

In this work we consider a measurement-based model for parallel real-time tasks represented by the work and span parameters of directed acyclic graphs, with different bounds for nominal and overload scenarios. We address the corresponding real-time scheduling problem and propose an optimal scheduling strategy with a derived tight bound on the maximum response time of a task.


Introduction
Task models based upon directed acyclic graphs (DAGs) are widely used for representing recurrent real-time processes in a manner that exposes their internal parallelism, thereby enabling the exploitation of such parallelism upon multiprocessor and multicore platforms. These task models typically represent pieces of sequential (i.e., non-parallelizable) computation via vertices and their dependencies as edges between vertices; hence constructing such a model for a recurrent process requires detailed knowledge of the internal control-flow structure of the process.

Such knowledge is not always available. Furthermore, even when available, conservative estimates of the computational demands of individual vertices, e.g., via worst-case execution time (WCET) parameters, can result in severe underutilization of computational resources during run-time. To ameliorate these problems, a measurement-based model was recently proposed (Agrawal and Baruah 2018). This model deals with the lack of knowledge of the internal structure by representing the computation of a DAG with just two parameters: work (the cumulative computation of all the vertices in the DAG) and span (the maximum cumulative computation of any precedence-constrained sequence of vertices). It deals with the potential pessimism by requiring that two estimates be provided for each parameter: $\mathit{work}_O$ and $\mathit{span}_O$ are very conservative upper bounds (safe even under overload conditions), while $\mathit{work}_N$ and $\mathit{span}_N$ are nominal upper bounds (i.e., upper bounds under "typical" circumstances) on the values of the work and span parameters, respectively. It is assumed that $\mathit{work}_N \le \mathit{work}_O$ and $\mathit{span}_N \le \mathit{span}_O$.
Definition 1 (The scheduling problem) Suppose we are given a task represented by the four parameters $\mathit{work}_N$, $\mathit{span}_N$, $\mathit{work}_O$ and $\mathit{span}_O$, a deadline $D$, and two processor counts $m_N$ and $m_O$, where $m_N \le m_O$. The scheduling problem is to finish the task with a makespan (response time) no larger than the deadline $D$. We may use at most $m_N$ processors to do so, unless it is observed during the execution that at least one of the nominal parameters $\mathit{work}_N$ and $\mathit{span}_N$ does not provide a valid upper bound for the current invocation of the task. If this is observed, we may switch to using up to $m_O$ processors for the remainder of the execution, but we must still meet the original deadline $D$ even if the computational demands of the task invocation turn out to be as high as $\mathit{work}_O$ and $\mathit{span}_O$. The scheduler does not know anything more about the internal details of the task than what can be deduced from the given parameters. ◻
The approach presented by Agrawal and Baruah (2018) is a scheduling strategy that precomputes an upper bound $D_N$ on the maximum makespan that is possible when executing a task with total work at most $\mathit{work}_N$ and span at most $\mathit{span}_N$ upon $m_N$ processors using any greedy (work-conserving) scheduler (Graham 1969). It then starts to execute the given task upon $m_N$ processors greedily, and after $D_N$ time units checks whether the task has completed. If not, at least one of $\mathit{work}_N$ or $\mathit{span}_N$ must have been exceeded, and so it activates the additional $(m_O - m_N)$ processors and continues the greedy execution until completion.
The new approach in this paper is also to begin executing the task greedily upon $m_N$ processors, but rather than checking the progress of the task at a precomputed time point $D_N$, it instead monitors the total amount of execution occurring across all the $m_N$ processors. If the invocation does not complete before the total execution equals the nominal work parameter $\mathit{work}_N$, then it activates the additional $(m_O - m_N)$ processors and continues executing the task greedily until completion.
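The work-monitoring strategy can be illustrated with a small discrete-time simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: time advances in unit steps, vertex execution times are positive integers, and the DAG is given as a dictionary. The names `simulate` and `EXAMPLE_DAG` are hypothetical. Because time is discretized, the switch to `m_o` processors happens at the end of the step in which the executed work reaches `work_n`, which is slightly coarser than the continuous-time rule described above.

```python
from typing import Dict, List, Tuple

# Hypothetical example DAG: vertex -> (execution time, predecessors).
# Total work is 12 and the span (longest path a -> b -> e) is 6.
EXAMPLE_DAG: Dict[str, Tuple[int, List[str]]] = {
    "a": (2, []),
    "b": (3, ["a"]),
    "c": (3, ["a"]),
    "d": (3, ["a"]),
    "e": (1, ["b", "c", "d"]),
}


def simulate(dag: Dict[str, Tuple[int, List[str]]],
             work_n: int, m_n: int, m_o: int) -> Tuple[int, bool]:
    """Greedy unit-step simulation; returns (makespan, overload_detected)."""
    remaining = {v: cost for v, (cost, _) in dag.items()}
    executed = 0          # total execution observed across all processors
    time = 0
    processors = m_n      # start with the nominal processor count
    while True:
        # A vertex is ready once all of its predecessors have finished.
        ready = [v for v, (_, preds) in dag.items()
                 if remaining[v] > 0
                 and all(remaining[p] == 0 for p in preds)]
        if not ready:
            raise ValueError("no ready vertex: the graph is empty or cyclic")
        # Greedy (work-conserving): never idle a processor while work is ready.
        for v in ready[:processors]:
            remaining[v] -= 1
            executed += 1
        time += 1
        if all(r == 0 for r in remaining.values()):
            return time, processors == m_o
        # The invocation is unfinished although work_n units have executed,
        # so the nominal work bound does not hold: activate extra processors.
        if processors == m_n and executed >= work_n:
            processors = m_o


print(simulate(EXAMPLE_DAG, work_n=12, m_n=2, m_o=4))
print(simulate(EXAMPLE_DAG, work_n=4, m_n=2, m_o=4))
```

In the first call the nominal work bound holds, so the task runs on two processors throughout. In the second call the bound of 4 is reached while vertices are still pending, and the remaining work runs on up to four processors.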

Contributions and comparisons
The approach of Agrawal and Baruah (2018) only requires that the runtime detect whether the task has completed by time $D_N$.
In contrast, our approach requires the capability to monitor the total progress on the work, that is, the amount of execution done across the processors. Assuming this capability is available, we will show below that our approach is, in fact, optimal: no other scheduler can guarantee to meet the deadline $D$ under the constraints of the scheduling problem specified above if this approach cannot also do so. Note that our approach also has the advantage that it only needs three parameters, $\mathit{work}_N$, $\mathit{work}_O$, and $\mathit{span}_O$, since it does not need to monitor whether the span exceeds $\mathit{span}_N$. In contrast, the approach by Agrawal and Baruah (2018) needs all four parameters. In addition, Expression (1) of Theorem 2 is a tight bound on the maximum makespan with this new scheduling approach. Besides serving as a schedulability test, this expression can be used to, e.g., minimize the processor counts $m_N$ and $m_O$ needed to meet the deadline. This is exactly what we want to do if the task is periodically or sporadically activated and we want to schedule it in a federated manner similar to Li et al. (2016).
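The use of the makespan bound as a schedulability test and for sizing processor counts can be sketched as follows. This is a hedged illustration, assuming that Expression (1) has the two-case form suggested by Lemmas 3 and 4: `(work_o - span_o)/m_n + span_o` when `work_n > work_o - span_o`, and `work_n/m_n + (work_o - work_n - span_o)/m_o + span_o` otherwise. The function names and the search limit `m_max` are hypothetical, and parameters are assumed to be integers so that exact rational arithmetic can be used.

```python
from fractions import Fraction
from typing import Optional, Union

Number = Union[int, Fraction]


def makespan_bound(work_n: int, work_o: int, span_o: int,
                   m_n: int, m_o: int) -> Fraction:
    """Assumed two-case form of the makespan bound of Theorem 2."""
    if not (1 <= m_n <= m_o):
        raise ValueError("need 1 <= m_n <= m_o")
    if not (0 <= work_n <= work_o and 0 <= span_o <= work_o):
        raise ValueError("need work_n <= work_o and span_o <= work_o")
    if work_n > work_o - span_o:
        # Case of Lemma 3: the extra processors are not relied upon.
        return Fraction(work_o - span_o, m_n) + span_o
    # Case of Lemma 4: work_n units on m_n processors, the rest on m_o.
    return (Fraction(work_n, m_n)
            + Fraction(work_o - work_n - span_o, m_o)
            + span_o)


def is_schedulable(work_n: int, work_o: int, span_o: int,
                   m_n: int, m_o: int, deadline: Number) -> bool:
    """Schedulability test: compare the bound with the deadline."""
    return makespan_bound(work_n, work_o, span_o, m_n, m_o) <= deadline


def min_overload_processors(work_n: int, work_o: int, span_o: int, m_n: int,
                            deadline: Number, m_max: int) -> Optional[int]:
    """Smallest m_o in [m_n, m_max] meeting the deadline, or None."""
    for m_o in range(m_n, m_max + 1):
        if is_schedulable(work_n, work_o, span_o, m_n, m_o, deadline):
            return m_o
    return None


print(makespan_bound(work_n=4, work_o=12, span_o=6, m_n=2, m_o=4))
print(min_overload_processors(4, 12, 6, 2, Fraction(17, 2), m_max=8))
```

A linear search is enough here because the assumed bound never increases as `m_o` grows. The same pattern could be applied to `m_n`, or to both counts jointly, when allocating processors in a federated setting.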

Schedulability conditions
We use a well-known result about scheduling DAG tasks characterized by single work and span parameters (i.e., where we don't separate nominal and overload scenarios).
Theorem 1 (Graham (1969)) The maximum makespan of a given DAG executed on $m$ processors by a greedy (work-conserving) scheduler is no larger than
$$M = \frac{\mathit{work} - \mathit{span}}{m} + \mathit{span}.$$ ◻
In the following, we derive a tight bound on the makespan for our new scheduling approach for DAG tasks that are characterized by parameters $\mathit{work}_N$, $\mathit{span}_N$, $\mathit{work}_O$ and $\mathit{span}_O$ for nominal and overload scenarios. Comparing this bound with a deadline is a sufficient schedulability condition for our proposed strategy and also a necessary condition for any scheduler following the rules of the scheduling problem described in Definition 1.
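As a quick numeric illustration of Theorem 1, the bound can be evaluated directly. This is a minimal sketch: the function name `graham_bound` is hypothetical, and integer parameters are assumed so that the result is an exact rational number.

```python
from fractions import Fraction


def graham_bound(work: int, span: int, m: int) -> Fraction:
    """Upper bound (work - span)/m + span on the greedy makespan."""
    if m < 1:
        raise ValueError("need at least one processor")
    if not (0 <= span <= work):
        raise ValueError("need 0 <= span <= work")
    return Fraction(work - span, m) + span


# A DAG with work 12 and span 6 on two processors.
print(graham_bound(12, 6, 2))
```

For example, a DAG with work 12 and span 6 finishes within (12 - 6)/2 + 6 = 9 time units on two processors under any greedy schedule.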

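Theorem 2 Our proposed scheduling strategy will execute a task with a makespan that is no larger than
$$\begin{cases} \dfrac{\mathit{work}_O - \mathit{span}_O}{m_N} + \mathit{span}_O, & \text{if } \mathit{work}_N > \mathit{work}_O - \mathit{span}_O, \\[2ex] \dfrac{\mathit{work}_N}{m_N} + \dfrac{\mathit{work}_O - \mathit{work}_N - \mathit{span}_O}{m_O} + \mathit{span}_O, & \text{otherwise.} \end{cases} \tag{1}$$
In addition, no scheduler can guarantee a smaller makespan. ◻
Theorem 2 follows directly from Lemmas 1 to 4, proven below. We start with Lemmas 1 and 2, which demonstrate that no scheduler can guarantee a smaller makespan bound. Recall from Definition 1 that schedulers are assumed to not know anything more about the internal details of the task than what can be deduced from the given parameters.
We now show with Lemmas 3 and 4 that our proposed scheduling strategy can finish within a makespan no larger than the one specified in Theorem 2.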

Lemma 3
If $\mathit{work}_N > \mathit{work}_O - \mathit{span}_O$, then our proposed scheduling strategy will complete the task with a makespan no larger than
$$\frac{\mathit{work}_O - \mathit{span}_O}{m_N} + \mathit{span}_O.$$
Proof Follows from using Theorem 1 with the more conservative task parameters $\mathit{work}_O$ and $\mathit{span}_O$ and the smaller number of processors $m_N$ that we are always guaranteed. ◻

Lemma 4
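If $\mathit{work}_N \le \mathit{work}_O - \mathit{span}_O$, then our proposed scheduling strategy will complete the task with a makespan no larger than
$$\frac{\mathit{work}_N}{m_N} + \frac{\mathit{work}_O - \mathit{work}_N - \mathit{span}_O}{m_O} + \mathit{span}_O.$$
Proof We separately consider the cases where the nominal parameter $\mathit{work}_N$ holds or not during the execution of the task invocation.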

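Case 1 (The total workload of the current invocation is no larger than $\mathit{work}_N$):
In this case the extra processors will never be activated. Because the span of the invocation is at most $\mathit{span}_O$, by Theorem 1 the makespan is no larger than
$$\frac{\mathit{work}_N - \mathit{span}_O}{m_N} + \mathit{span}_O \le \frac{\mathit{work}_N}{m_N} + \frac{\mathit{work}_O - \mathit{work}_N - \mathit{span}_O}{m_O} + \mathit{span}_O,$$
where the inequality holds because $\mathit{work}_N \le \mathit{work}_O - \mathit{span}_O$.
Case 2 (The total workload of the current invocation is larger than $\mathit{work}_N$):
In this case, the extra $m_O - m_N$ processors will get activated by our proposed approach, say after $t$ time units. Let $t_{busy}$ denote the total amount of time before $t$ where all $m_N$ processors are busy, and let $t_{idle} = t - t_{busy}$ denote the total time during which at least one processor is idling. Let $\mathit{work}'$ and $\mathit{span}'$ denote the actual remaining work and span after the first $t$ time units and note that $\mathit{work}' \le \mathit{work}_O - \mathit{work}_N$ and $\mathit{span}' \le \mathit{span}_O$.
Because a greedy scheduler never idles all processors unless the invocation completes and we have completed exactly $\mathit{work}_N$ units of execution after $t$ time units, we have $\mathit{work}_N \ge t_{busy} \times m_N + t_{idle}$, which implies that $t_{busy} \le (\mathit{work}_N - t_{idle})/m_N$. Note that the first vertex in any path is always available for execution, and so if any processor is idle we know that all critical paths must currently be executing and therefore the remaining span is also being shortened. We must then have $\mathit{span}' \le \mathit{span}_O - t_{idle}$, which implies $t_{idle} \le \mathit{span}_O - \mathit{span}'$. Thus,
$$t = t_{busy} + t_{idle} \le \frac{\mathit{work}_N - t_{idle}}{m_N} + t_{idle} \le \frac{\mathit{work}_N}{m_N} + \left(1 - \frac{1}{m_N}\right)\left(\mathit{span}_O - \mathit{span}'\right). \tag{2}$$
Using Eq. (2) and Theorem 1 we see that the total makespan cannot be larger than
$$\begin{aligned} t + \frac{\mathit{work}' - \mathit{span}'}{m_O} + \mathit{span}' &\le \frac{\mathit{work}_N}{m_N} + \left(1 - \frac{1}{m_N}\right)\left(\mathit{span}_O - \mathit{span}'\right) + \frac{\mathit{work}_O - \mathit{work}_N - \mathit{span}'}{m_O} + \mathit{span}' \\ &= \frac{\mathit{work}_N}{m_N} + \frac{\mathit{work}_O - \mathit{work}_N}{m_O} + \left(1 - \frac{1}{m_N}\right)\mathit{span}_O + \left(\frac{1}{m_N} - \frac{1}{m_O}\right)\mathit{span}' \\ &\le \frac{\mathit{work}_N}{m_N} + \frac{\mathit{work}_O - \mathit{work}_N - \mathit{span}_O}{m_O} + \mathit{span}_O, \end{aligned}$$
where the last step uses $\mathit{span}' \le \mathit{span}_O$ and $m_N \le m_O$, which finishes the proof. ◻

Acknowledgements
Open access funding provided by Uppsala University. This research was supported by NSF Grants CCF-1733873, CCF-1618802, CCF-1439062, CNS-1814739, CPS-1932530, CNS-1911460, and CNS-1948457.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.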