As the real-time model is a generalization of the classic model, the set of systems covered by the classic model is a strict subset of the systems covered by the real-time model. More precisely, every system in the classic model \((n, [\underline{\delta }^-, \underline{\delta }^+])\) can be specified in terms of a real-time model \((n, [\delta ^-_{}, \delta ^+_{}], [\mu ^-_{}, \mu ^+_{}])\) with \(\delta ^-_{} = \underline{\delta }^-\), \(\delta ^+_{} = \underline{\delta }^+\) and \(\mu ^-_{} = \mu ^+_{} = 0\). Thus, every result (correctness or impossibility) for some classic system also holds in the corresponding real-time system with (a) the same message delay bounds, (b) \(\mu ^-_{(\ell )} = \mu ^+_{(\ell )} = 0\) for all \(\ell \), and (c) an admission control component that does not drop any messages. Intuition suggests that impossibility results also hold in the general case: since the additional delays do not provide the algorithm with any useful information, an impossibility result for some classic system \((n, [\underline{\delta }^-, \underline{\delta }^+])\) should hold for all real-time systems \((n, [\delta ^-_{}, \delta ^+_{}], [\mu ^-_{}, \mu ^+_{}])\) with \(\delta ^-_{} \le \underline{\delta }^-\), \(\delta ^+_{} \ge \underline{\delta }^+\) and arbitrary \(\mu ^-_{}, \mu ^+_{}\).
As it turns out, this conjecture is true: This section will present a simulation (Algorithm 1) that allows us to use an algorithm designed for the real-time model in the classic model—and, thus, to transfer impossibility results from the classic to the real-time model (see Sect. 7.1 for an example)—provided the following conditions hold:
Definition 3
We define \(gstates(tr)\) to be the (ordered) set of global states in some st-trace \(tr\). For some state \(s\) and some set \(\mathcal {V}\), let \(s|_\mathcal {V}\) denote \(s\) restricted to variable names contained in the set \(\mathcal {V}\). For example, if \(s = \{(a, 1), (b, 2), (c, 3)\}\), then \(s|_{\{a, b\}} = \{(a, 1), (b, 2)\}\). Likewise, let \(gstates(tr)|_\mathcal {V}\) denote \(gstates(tr)\) where all local states \(s\) have been replaced by \(s|_\mathcal {V}\).
A problem \(\mathcal {P}\) is simulation-invariant if there exists a finite set \(\mathcal {V}\) of variable names such that \(\mathcal {P}\) can be specified as a predicate on \(gstates(tr)|_\mathcal {V}\) and the sequence of \(input\) st-events (which usually takes the form \(Pred_1(input\text { st-events of }tr) \Rightarrow Pred_2(gstates(tr)|_\mathcal {V})\)).
Informally, this means that adding variables to some algorithm or changing its message pattern does not influence its ability to solve some problem \(\mathcal {P}\), as long as the state transitions of the “relevant” variables \(\mathcal {V}\) still occur in the same way at the same time.
For example, the classic clock synchronization problem specifies conditions on the adjusted clock values of the processors, i.e., the hardware clock values plus the adjustment values, at any given real time. The problem cares neither about additional variables the algorithm might use nor about the number or contents of messages exchanged.
The advantage of such a problem specification is that algorithms can be run in a (time-preserving) simulation environment and still solve the problem: As long as the algorithm’s state transitions are the same and occur at the same time, the simulator may add its own variables and change the way information is exchanged. On the other hand, a problem specification that restricts either the type of messages that might be sent or the size of the local state would not be simulation-invariant.
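Since the restriction operator \(s|_\mathcal {V}\) is central to Definition 3, a minimal Python sketch may help; the dictionary-based representation and the variable names are ours, not the paper's:

```python
def restrict(state, names):
    """Return s|_V: the local state restricted to variable names in V."""
    return {var: val for var, val in state.items() if var in names}

# Example from Definition 3: s = {(a,1), (b,2), (c,3)}, V = {a, b}.
s = {"a": 1, "b": 2, "c": 3}
print(restrict(s, {"a", "b"}))  # {'a': 1, 'b': 2}

# A simulation-invariant predicate only inspects the restricted states,
# so adding simulation-only variables (e.g., 'queue', 'idle') leaves it
# unaffected:
extended = {**s, "queue": [], "idle": True}
assert restrict(s, {"a", "b"}) == restrict(extended, {"a", "b"})
```

This is exactly why a simulator may extend the state with bookkeeping variables: the problem predicate cannot observe them.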
-
Cond2
The delay bounds in the classic system must be at least as restrictive as those in the real-time system. As long as \(\delta ^-_{(\ell )} \le \underline{\delta }^-\) and \(\delta ^+_{(\ell )} \ge \underline{\delta }^+\) hold (for all \(\ell \)), any message delay of the simulating execution (\(\underline{\delta }\in [\underline{\delta }^-{}, \underline{\delta }^+{}]\)) can be directly mapped to a message delay in the simulated rt-run (\(\delta = \underline{\delta }\)), such that \(\delta \in [\delta ^-_{(\ell )}, \delta ^+_{(\ell )}]\) is satisfied, cf. Fig. 6a. Thus, a simulated message corresponds directly to a simulation message with the same message delay.
-
Cond3
Hardware clock drift must be reasonably low. Assume a system with very inaccurate hardware clocks, combined with very accurate processing delays: In that case, timing information might be gained from the processing delay, for example, by increasing a local variable by \((\mu ^-_{} + \mu ^+_{})/2\) during each computing step. If \(\rho \), the hardware clock drift bound, is very large and \(\mu ^+_{} - \mu ^-_{}\) is very small, the precision of this simple “clock” might be better than that of the hardware clock. Thus, algorithms might in fact benefit from the processing delay, as opposed to the zero step-time situation.
To avoid such effects, the hardware clock must be “accurate enough” to define (via a time-out) a time span that is guaranteed to lie within \([\mu ^-_{}, \mu ^+_{}]\), which requires \(\rho \le \frac{\mu ^+_{(\ell )} - \mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}\). In this case, the classic system can simulate a delay within \(\mu ^-_{(\ell )}\) and \(\mu ^+_{(\ell )}\) real-time units by waiting for \(\tilde{\mu }_{(\ell )}{} = 2\frac{\mu ^+_{(\ell )}\mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}\) hardware clock time units.
Lemma 4
If \(\rho \le \frac{\mu ^+_{(\ell )} - \mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}\) holds, \(\tilde{\mu }_{(\ell )}\) hardware clock time units correspond to a real-time interval of \([\mu ^-_{(\ell )}, \mu ^+_{(\ell )}]\) on a non-Byzantine processor.
Proof
Since drift is bounded, \((1+\rho ) \ge \frac{HC_p(t')-HC_p(t)}{t'-t} \ge (1-\rho )\) for all \(t < t'\). Since \(HC_p\) is an unbounded, strictly increasing continuous function (cf. EX4), an inverse function \(HC^{-1}_p\), mapping hardware clock time to real time, exists. Thus, \( \forall T < T': \frac{1}{1+\rho } \le \frac{HC_p^{-1}(T') - HC_p^{-1}(T)}{T' - T} \le \frac{1}{1-\rho }\).
Choose \(T\) and \(T'\) such that \(T' - T = \tilde{\mu }_{(\ell )}{}\):
$$\begin{aligned} \frac{\tilde{\mu }_{(\ell )}{}}{1+\rho } \le HC_p^{-1}(T + \tilde{\mu }_{(\ell )}{}) - HC_p^{-1}(T) \le \frac{\tilde{\mu }_{(\ell )}{}}{1-\rho }. \quad (*) \end{aligned}$$
Since \(\rho \le \frac{\mu ^+_{(\ell )} - \mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}\) holds,
$$\begin{aligned} \frac{\tilde{\mu }_{(\ell )}{}}{1+\frac{\mu ^+_{(\ell )} - \mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}} \le \quad (*) \quad \le \frac{\tilde{\mu }_{(\ell )}{}}{1-\frac{\mu ^+_{(\ell )} - \mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}}. \end{aligned}$$
Applying the definition of \(\tilde{\mu }_{(\ell )}{}\) yields \(\mu ^-_{(\ell )} \le HC_p^{-1}(T + \tilde{\mu }_{(\ell )}{}) - HC_p^{-1}(T) \le \mu ^+_{(\ell )}\). \(\square \)
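The bound in Lemma 4 is tight: at \(\rho = \frac{\mu ^+ - \mu ^-}{\mu ^+ + \mu ^-}\), waiting \(\tilde{\mu }\) hardware clock units covers exactly \([\mu ^-, \mu ^+]\) real-time units. A quick numeric sanity check in Python, using invented values for \(\mu ^-\) and \(\mu ^+\), illustrates this:

```python
# Sanity check for Lemma 4 with illustrative values mu_minus, mu_plus.
mu_minus, mu_plus = 2.0, 6.0

# Cond3: maximal admissible hardware clock drift.
rho = (mu_plus - mu_minus) / (mu_plus + mu_minus)          # = 0.5

# Waiting time in hardware clock units (harmonic-mean-style formula).
mu_tilde = 2 * mu_plus * mu_minus / (mu_plus + mu_minus)   # = 3.0

# Real-time interval covered by waiting mu_tilde clock units at drift rho:
slowest = mu_tilde / (1 + rho)   # clock runs fast -> least real time passes
fastest = mu_tilde / (1 - rho)   # clock runs slow -> most real time passes

assert abs(slowest - mu_minus) < 1e-9   # exactly mu^-
assert abs(fastest - mu_plus) < 1e-9    # exactly mu^+
```

Any smaller \(\rho \) shrinks the covered interval strictly inside \([\mu ^-, \mu ^+]\), so Cond3 is sufficient.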
Overview
The following theorem, which hinges on a formal transformation from executions to rt-runs, represents one of the main results of this paper in a slightly simplified version.
Theorem 5
Let \(\underline{s}= (n, [\underline{\delta }^-, \underline{\delta }^+])\) be a classic system. If
-
\(\mathcal {P}\) is a simulation-invariant problem (Cond1),
-
the algorithm \(\mathcal {A}\) solves problem \(\mathcal {P}\) in some real-time system \(s= (n, [\delta ^-_{}, \delta ^+_{}], [\mu ^-_{}, \mu ^+_{}])\) with some scheduling/admission policy \(pol\) under failure model \(f\)-\(f'\)-\(\rho \),
-
\(\forall \ell : \delta ^-_{(\ell )} \le \underline{\delta }^-\) and \(\delta ^+_{(\ell )} \ge \underline{\delta }^+\)
(Cond2), and
-
\(\forall \ell : \rho \le \frac{\mu ^+_{(\ell )} - \mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}\)
(Cond3),
then the algorithm \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) solves \(\mathcal {P}\) in \(\underline{s}\) under failure model \(f\)-\(f'\)-\(\rho \).
For didactic reasons, the following structure will be used in this section: First, the simulation algorithm, the transformation and a sketch of the correctness proof for Theorem 5 will be presented. Afterwards, we show how Cond2 can be weakened, followed by a full formal proof of correctness.
Cond2: \(\forall \ell : \delta ^-_{(\ell )} \le \underline{\delta }^-\wedge \delta ^+_{(\ell )} \ge \underline{\delta }^+\) is a very strong requirement, since \([\underline{\delta }^-, \underline{\delta }^+]\) must lie within all intervals \([\delta ^-_{(1)}, \delta ^+_{(1)}]\), \([\delta ^-_{(2)}, \delta ^+_{(2)}]\), .... In some cases, such an interval \([\underline{\delta }^-, \underline{\delta }^+]\) might not exist: Consider, e.g., the case in the bottom half of Fig. 6b, where \([\delta ^-_{(1)}, \delta ^+_{(1)}]\) and \([\delta ^-_{(2)}, \delta ^+_{(2)}]\) do not overlap. After the sketch of Theorem 5’s proof, we will show that it is possible to weaken Cond2 while retaining correctness, although this modification adds complexity to the transformation as well as to the algorithm and the proof.
Algorithm
Algorithm \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) (\(=\)Algorithm 1), designed for the classic model, allows us to simulate a real-time system, and, thus, to use an algorithm \(\mathcal {A}\) designed for the real-time model to solve problems in a classic system. The algorithm essentially simulates queuing, scheduling, and execution of real-time model jobs of some duration within \(\mu ^-_{(\ell )}\) and \(\mu ^+_{(\ell )}\); it is parameterized with some real-time algorithm \(\mathcal {A}\), some scheduling/admission policy \(pol\) and the waiting time \(\tilde{\mu }_{(\ell )}{} = 2\frac{\mu ^+_{(\ell )}\mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}\). We define \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) to have the same initial states as \(\mathcal {A}\), with the set of variables extended by a \(queue\) and a flag \(idle\).
All actions occurring on a non-Byzantine processor within an execution \(ex\) of \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) fall into one of the following five groups:
-
(a)
an algorithm message arriving, which is immediately processed,
-
(b)
an algorithm message arriving, which is enqueued,
-
(c)
a (finished-processing) timer message arriving, causing some message from the queue to be processed,
-
(d)
a (finished-processing) timer message arriving when no messages are in the queue (or all messages in the queue get dropped),
-
(e)
an algorithm message arriving, which is immediately dropped.
Figure 3 illustrates state transitions (a)–(e) in the simulation algorithm: At every point in time, the simulated processor is either idle (variable \(idle = true\)) or busy (\(idle = false\)). Initially, the processor is idle. As soon as the first algorithm message (i.e., a message other than the internal (finished-processing) timer message) arrives [type (a) action], the processor becomes busy and waits for \(\tilde{\mu }_{(\ell )}{}\) hardware clock time units (\(\ell \) being the number of ordinary messages sent during that computing step), unless the message gets dropped by the scheduling/admission policy immediately [type (e) action], which would mean that the processor stays idle. All algorithm messages arriving while the processor is busy are enqueued [type (b) action]. After these \(\tilde{\mu }_{(\ell )}{}\) hardware clock time units have passed (modeled as a (finished-processing) timer message arriving), the queue is checked and a scheduling/admission decision is made (possibly dropping messages). If it is empty, the processor returns to its idle state [type (d) action]; otherwise, the next message is processed [type (c) action].
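The idle/busy state machine described above can be sketched as follows. This is a simplified Python rendition for illustration only, not Algorithm 1 itself: the policy class `FifoPolicy` and the `process` callback are invented stubs, and the hardware clock timer mechanism is abstracted into a return value.

```python
class FifoPolicy:
    """Toy admission/scheduling policy (invented for this sketch):
    admit every message, process queued messages in FIFO order."""
    def admit(self, msg):
        return True
    def choose(self, queue):
        return queue.pop(0) if queue else None

class SimulatedProcessor:
    """Sketch of the idle/busy state machine. `process(msg)` stands in
    for one computing step of A; its return value models setting the
    (finished-processing) timer to fire mu_tilde clock units later."""
    def __init__(self, policy, process):
        self.idle, self.queue = True, []
        self.policy, self.process = policy, process

    def on_algorithm_message(self, msg):
        if not self.idle:
            self.queue.append(msg)            # type (b): enqueue while busy
            return None
        if not self.policy.admit(msg):
            return None                       # type (e): dropped, stays idle
        self.idle = False                     # type (a): process immediately
        return self.process(msg)

    def on_finished_processing_timer(self):
        nxt = self.policy.choose(self.queue)  # may drop queued messages
        if nxt is None:
            self.idle = True                  # type (d): back to idle
            return None
        return self.process(nxt)              # type (c): process next message

# Demo: m1 processed immediately, m2 queued, then processed, then idle.
p = SimulatedProcessor(FifoPolicy(), process=lambda msg: "timer-set")
p.on_algorithm_message("m1")      # type (a) -> busy
p.on_algorithm_message("m2")      # type (b) -> queued
p.on_finished_processing_timer()  # type (c) -> processes m2
p.on_finished_processing_timer()  # type (d) -> idle again
assert p.idle and p.queue == []
```

The demo run visits the action types (a), (b), (c) and (d) in the same order as the example of Fig. 5.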
The transformation \(T_{C\rightarrow R}\) from executions to rt-runs
As shown in Fig. 4, the first step of the proof that this simulation is correct consists of transforming every execution \(ex\) of \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) into a corresponding rt-run of \(\mathcal {A}\). By showing that this rt-run is an admissible rt-run of \(\mathcal {A}\) and that the execution and the rt-run have (roughly) the same state transitions, the fact that the execution satisfies \(\mathcal {P}\) will be derived from the fact that the rt-run satisfies \(\mathcal {P}\).
The transformation \(ru= T_{C\rightarrow R}(ex)\) constructs an rt-run \(ru\). We set \(HC^{ru}_p = HC^{ex}_p\) for all \(p\), such that both \(ex\) and \(ru\) have the same hardware clocks. Depending on the type of action, a corresponding receive event, job and/or drop event in \(ru\) is constructed for each action \(ac\) on a fault-free processor.
-
Type (a): This action is mapped to a receive event \(R\) and a subsequent job \(J\) in \(ru\). The job’s duration equals the time required for the (finished-processing) message to arrive.
-
Type (b): This action is mapped to a receive event \(R\) in \(ru\). There is one special (technical) case where the action is instead mapped to a receive event at a different time, see Sect. 5.4 for details.
-
Type (c): This action is mapped to a job \(J\) in \(ru\), processing the algorithm message of the corresponding type (b) action (i.e., the message chosen by applying the scheduling policy to variable \(queue\)). The job’s duration equals the time required for the (finished-processing) message to arrive. In addition, for every message dropped from \(queue\) (if any), a drop event \(D\) is created right before \(J\).
-
Type (d): Similar to type (c) actions, a drop event \(D\) is created for every message removed from \(queue\) (if any).
-
Type (e): This action is mapped to a receive event \(R\) and a subsequent drop event \(D\) in \(ru\), both with the same parameters.
The state transitions of the jobs created by the transformation conform to those of the corresponding actions with the simulation variables (\(queue\), \(idle\)) removed. To illustrate this transformation, Fig. 5 shows an example with actions of types (a), (b) (twice), (c), (d) and (e) occurring in \(ex\) (in this order), the actions taken by the simulation algorithm and the resulting rt-run \(ru\).
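The per-action rules above can be condensed into a small dispatch function. The following Python sketch is purely illustrative: the action representation and field names are invented, and the crash/Byzantine rules as well as the timer-message special case of Sect. 5.4 are omitted.

```python
def transform_action(ac):
    """Hypothetical core of T_{C->R}: map one action of the simulating
    execution to receive/job/drop events of the simulated rt-run.
    An action is a dict with invented fields: 'time', 'type' (a-e),
    'msg', 'finished_at' (arrival of the finished-processing message),
    'next_msg' (message chosen by the policy), 'dropped' (messages
    removed from the queue)."""
    t, kind = ac["time"], ac["type"]
    events = []
    if kind == "a":                      # receive event + job
        events += [("receive", t, ac["msg"]),
                   ("job", t, ac["msg"], ac["finished_at"] - t)]
    elif kind == "b":                    # receive event only (enqueued)
        events += [("receive", t, ac["msg"])]
    elif kind == "c":                    # drop events, then job for next msg
        events += [("drop", t, m) for m in ac.get("dropped", [])]
        events += [("job", t, ac["next_msg"], ac["finished_at"] - t)]
    elif kind == "d":                    # drop events only
        events += [("drop", t, m) for m in ac.get("dropped", [])]
    elif kind == "e":                    # receive + immediate drop
        events += [("receive", t, ac["msg"]), ("drop", t, ac["msg"])]
    return events

# A type (a) action at time 5 whose finished-processing message
# arrives at time 8 becomes a receive event plus a job of duration 3:
print(transform_action({"time": 5, "type": "a",
                        "msg": "m", "finished_at": 8}))
```

Note how the job duration is derived from the (finished-processing) delivery time, matching the rules for type (a) and (c) actions.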
Crashing processors: When a processor crashes in \(ex\), there is some action \(ac^{last}\) that might execute only part of its state transition sequence and that is followed only by actions with “NOP” transitions. All actions up to \(ac^{last}\) are mapped according to the rules above. If \(ac^{last}\) was a type (a) or (c) action that did not succeed in sending out its (finished-processing) message, we will, for the purposes of the transformation, assume that such a (finished-processing) message with a real-time delay of \(\mu ^-_{(\ell )}\) had been sent; this allows us to construct the corresponding job \(J^{last}\) (Footnote 6). If \(ac^{last}\) was not a type (a) or (c) action, let \(J^{last}\) be the job corresponding to the last type (a) or (c) action before \(ac^{last}\) (if such an action exists).
Clearly, all actions on \(ex\) occurring between \(begin(J^{last})\) and \(end(J^{last})\) are (possibly partial) type (b) actions (before the crash) or NOP actions (after the crash). All of these actions are treated as type (b) actions w.r.t. the transformation, i.e., they are transformed into simple receive events. After \(J^{last}\) has finished, all messages still in \(queue\) plus all messages received during \(J^{last}\) are dropped, i.e., a drop event is created in \(ru\) for each of these messages at time \(end(J^{last})\).
Every action after \(end(J^{last})\) on this processor (which must be a NOP action) is treated like a type (e) action: It is mapped to a receive event immediately followed by a drop event.
Byzantine processors: On Byzantine processors, every action in the execution is simply mapped to a corresponding receive event and a zero-time job, sending the same messages and performing the same state transitions. Since jobs on Byzantine nodes do not need to obey any timing restrictions, it is perfectly legal to model them as taking zero time.
Special case: timer messages
There is a subtle difference between the classic and the real-time model with respect to the \(arrives\_timely(m_t)\) predicate of \(f\)-\(f'\)-\(\rho \): In an rt-run, a timer message \(m_t\) sent during some job \(J\) arrives at the end of the job (\(end(J)\)) if the desired arrival hardware clock time (\(sHC(m_t)\)) occurs while \(J\) is still in progress. On the other hand, in an execution, the timer message always arrives at \(sHC(m_t)\).
For \(T_{C\rightarrow R}\) this means that the transformation rule for type (b) actions changes: If the type (b) action \(ac\) for timer message \(m_t = msg(ac)\) occurs at some time \(t = time(ac)\) while the (finished-processing) message corresponding to the simulated job that sent \(m_t\) is still in transit, then the corresponding receive event \(R\) does not occur at \(t\) but rather at \(t' = time(ac')\), with \(ac'\) denoting the type (c) or (d) action where the (finished-processing) message arrives.
This change ensures that the receive event in the simulated rt-run occurs at the correct time, i.e., no earlier than at the end of the job sending the timer message. One inconsistency still remains, though: The order of the messages in the queue might differ between the simulated queue in the execution (i.e., variable \(queue\)) and the queue in the rt-run constructed by \(T_{C\rightarrow R}\): In the execution, \(m_t\) is added to \(queue\) at time \(t\), whereas in the rt-run, \(m_t\) is added to the real-time queue at time \(t'\). This could make a difference, for example, when another message arrives between \(t\) and \(t'\).
Since \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) “knows” about \(\mathcal {A}\), it is obviously possible for the simulation algorithm to detect such cases and reorder \(queue\) accordingly. We have decided not to include these details in Algorithm 1, since the added complexity might make it more difficult to understand the main structure of the simulation algorithm. For the remainder of this section, we will assume that such a reordering takes place.
Observations on algorithm \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) and transformation \(T_{C\rightarrow R}\)
The following can be asserted for every fault-free or not-yet-crashed processor:
Observation 6
Every type (c) action has a corresponding type (b) action where the algorithm message being processed in the type (c) action (Line 17) is enqueued (Line 8). More generally, every message removed from \(queue\) by \(pol\) in a type (c) or (d) action has been received earlier by a corresponding type (b) action.
Observation 7
Every type (a) and every type (c) action sending \(\ell \) ordinary messages also sends one (finished-processing) timer message, which arrives \(\tilde{\mu }_{(\ell )}{} := 2\frac{\mu ^+_{(\ell )}\mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}\) hardware clock time units later (Line 19).
Lemma 8
Initially and directly after executing some action \(ac\) with \(proc(ac) = p\), processor \(p\) is in one of two well-defined states:
-
State 1 (idle): \(newstate(ac).idle = true\), \(newstate(ac).queue\,= empty\), and there is no (finished-processing) timer message to \(p\) in transit,
-
State 2 (busy): \(newstate(ac).idle = false\) and there is exactly one (finished-processing) timer message to \(p\) in transit.
Proof
By induction. Initially (replace \(newstate(ac)\) with the initial state), every processor is in state 1. If a message is received while the processor is in state 1, it is added to the queue. Then, the message is either dropped, causing the processor to stay in state 1 [type (e) action], or the message is processed, \(idle\) is set to \(false\) and a (finished-processing) timer message is sent, i.e., the processor switches to state 2 [type (a) action]. If a message is received during state 2, one of two things can happen:
-
The message is a (finished-processing) timer message. If the queue was empty or all messages got dropped (Line 13; recall that \(next = \bot \) implies \(queue = empty\), since we assume a non-idling scheduler), the processor switches to state 1 [type (d) action]. Otherwise, a new (finished-processing) timer message is generated. Thus, the processor stays in state 2 [type (c) action].
-
The message is an algorithm message. The message is added to the queue and the processor stays in state 2 [type (b) action].\(\square \)
The following observation follows directly from this lemma and the design of the algorithm:
Observation 9
Type (a) and (e) actions can only occur in idle state, type (b), (c) and (d) actions only in busy state. Type (a) and (d) actions change the state (from idle to busy and from busy to idle, respectively), all other actions keep the state (see Fig. 3).
Lemma 10
After a type (a) or (c) action \(ac\) sending \(\ell \) ordinary messages occurred at hardware clock time \(T\) on processor \(p\) in \(ex\), the next type (a), (c), (d) or (e) action on \(p\) can occur no earlier than at hardware clock time \(T + \tilde{\mu }_{(\ell )}{}\), when the (finished-processing) message sent by \(ac\) has arrived.
Proof
Since \(ac\) is a type (a) or (c) action, \(newstate(ac).idle = false\), which, by Lemma 8, cannot change until no more (finished-processing) messages are in transit. By Observation 7, this cannot happen earlier than at hardware clock time \(T + \tilde{\mu }_{(\ell )}{}\). Lemma 8 also states that no second (finished-processing) message can be in transit simultaneously.
Thus, between \(T\) and \(T + \tilde{\mu }_{(\ell )}{}\), \(idle = false\) and only algorithm messages arrive at \(p\), which means that only type (b) actions can occur. \(\square \)
Lemma 11
On non-Byzantine processors, there is a one-to-one correspondence between (finished-processing) messages in \(ex\) and jobs in \(ru\): A job \(J\) exists in \(ru\) if, and only if, there is a corresponding (finished-processing) message \(m\) in \(ex\), with \(begin(J) = time(ac)\) of the action \(ac\) sending \(m\) and \(end(J) = time(ac')\) of the action \(ac'\) receiving \(m\).
Proof
(finished-processing) \(\rightarrow \) job: Note that (finished-processing) messages in \(ex\) are only sent in type (a) and (c) actions. \(T_{C\rightarrow R}\) ensures that for both kinds of actions a job exists in \(ru\) that ends exactly at the time at which the (finished-processing) message arrives in \(ex\).
job \(\rightarrow \) (finished-processing): Follows from the fact that, due to the rules of \(T_{C\rightarrow R}\), jobs only exist in \(ru\) if there is a corresponding type (a) or (c) action in \(ex\). These actions send (finished-processing) messages, and the mapping of the job length to the delivery time of the (finished-processing) message ensures that these messages do not arrive until the job has completed. \(\square \)
Correctness proof (sketch)
This section will sketch the proof idea for Theorem 5, following the outline of Fig. 4. Its main purpose is to prepare the reader for the more intricate proof of Theorem 16.
As defined in Theorem 5, let \(\underline{s}= (n, [\underline{\delta }^-, \underline{\delta }^+])\) be a classic system and \(\mathcal {P}\) be a simulation-invariant problem (Cond1). Let \(\mathcal {A}\) be an algorithm solving problem \(\mathcal {P}\) in some real-time system \(s= (n, [\delta ^-_{}, \delta ^+_{}], [\mu ^-_{}, \mu ^+_{}])\) with some scheduling/admission policy \(pol\) under failure model \(f\)-\(f'\)-\(\rho \). Let \(\forall \ell : \delta ^-_{(\ell )} \le \underline{\delta }^-\) and \(\delta ^+_{(\ell )} \ge \underline{\delta }^+\) (Cond2), and \(\forall \ell : \rho \le (\mu ^+_{(\ell )} - \mu ^-_{(\ell )})/(\mu ^+_{(\ell )} + \mu ^-_{(\ell )})\) (Cond3). As shown in Lemma 4, Cond3 ensures that the simulation algorithm can simulate a real-time delay between \(\mu ^-_{(\ell )}\) and \(\mu ^+_{(\ell )}\).
For each execution \(ex\) of \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) in \(\underline{s}\) conforming to failure model \(f\)-\(f'\)-\(\rho \), we create the corresponding rt-run \(ru\) according to transformation \(T_{C\rightarrow R}\). Applying the formal definitions of a valid rt-run and of failure model \(f\)-\(f'\)-\(\rho \), it can be shown that \(ru\) is an admissible rt-run of algorithm \(\mathcal {A}\) in system \(s\).
Since (a) \(ru\) is an admissible rt-run of algorithm \(\mathcal {A}\) in \(s\), and (b) \(\mathcal {A}\) is an algorithm solving \(\mathcal {P}\) in \(s\), it follows that \(ru\) satisfies \(\mathcal {P}\). Choose any st-trace \(tr^{ru}\) of \(ru\) where all state transitions are performed at the beginning of the job. Since \(ru\) satisfies \(\mathcal {P}\), \(tr^{ru} \in \mathcal {P}\). Transformation \(T_{C\rightarrow R}\) ensures that exactly the same state transitions are performed in \(ex\) and \(ru\) (omitting the simulation variables \(queue\) and \(idle\)). Since (i) \(\mathcal {P}\) is a simulation-invariant problem, (ii) \(tr^{ru} \in \mathcal {P}\), and (iii) every st-trace \(tr^{ex}\) of \(ex\) performs the same state transitions on algorithm variables as some \(tr^{ru}\) of \(ru\) at the same time, it follows that \(tr^{ex} \in \mathcal {P}\) and, thus, \(ex\) satisfies \(\mathcal {P}\).
By applying this argument to every admissible execution \(ex\) of \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) in \(\underline{s}\), we see that every such execution satisfies \(\mathcal {P}\). Thus, \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) solves \(\mathcal {P}\) in \(\underline{s}\) under failure model \(f\)-\(f'\)-\(\rho \).
Generalizing Cond2
Cond2 can be weakened to \(\delta ^-_{(1)} \le \underline{\delta }^-\wedge \delta ^+_{(1)} \ge \underline{\delta }^+\) by simulating the additional delay with a timer message (see Fig. 6b). This weaker condition, denoted Cond2’, suffices if Cond3 is slightly strengthened as follows (denoted Cond3’):
$$\begin{aligned} \forall \ell : \rho \le \frac{\mu ^+_{(\ell )} - \mu ^-_{(\ell )}}{\mu ^+_{(\ell )} + \mu ^-_{(\ell )}}\text { and } \rho \le \frac{(\delta ^+_{(\ell )} - \delta ^+_{(1)}) - (\delta ^-_{(\ell )} - \delta ^-_{(1)})}{(\delta ^+_{(\ell )} - \delta ^+_{(1)}) + (\delta ^-_{(\ell )} - \delta ^-_{(1)})}\end{aligned}$$
First, note that \(\delta ^+_{(1)} \ge \underline{\delta }^+\Leftrightarrow \forall \ell : \delta ^+_{(\ell )} \ge \underline{\delta }^+\), due to \(\delta ^+_{(\ell )}\) being non-decreasing with respect to \(\ell \) (cf. Sect. 4.2). Thus, the generalization mainly allows \(\delta ^-_{(\ell )}\) to be greater than \(\underline{\delta }^-\) for \(\ell > 1\). Since the message delay uncertainty \(\varepsilon _{(\ell )} (= \delta ^+_{(\ell )} - \delta ^-_{(\ell )})\) is non-decreasing in \(\ell \) as well, \(\varepsilon _{(\ell )}\ge {\varepsilon _{(1)}}\) holds. Therefore, even though the real message delay might be smaller than \(\delta ^-_{(\ell )}\), we can ensure that the simulated message delays lie within \(\delta ^-_{(\ell )}\) and \(\delta ^+_{(\ell )}\) by introducing an artificial, additional message delay within the interval \([\delta ^-_{(\ell )} - \delta ^-_{(1)}, \delta ^+_{(\ell )} - \delta ^+_{(1)}]\) upon receiving a message. The restriction on \(\rho \) in Cond3’ ensures that such a delay can be realized by the algorithm despite hardware clock drift.
Lemma 12
If Cond3’ holds, \(\tilde{\delta }_{(\ell )}:= 2\frac{(\delta ^+_{(\ell )} - \delta ^+_{(1)})(\delta ^-_{(\ell )} - \delta ^-_{(1)})}{(\delta ^+_{(\ell )} - \delta ^+_{(1)}) + (\delta ^-_{(\ell )} - \delta ^-_{(1)})}\) hardware clock time units correspond to a real-time interval of \([\delta ^-_{(\ell )} - \delta ^-_{(1)}, \delta ^+_{(\ell )} - \delta ^+_{(1)}]\).
Proof
Analogous to Lemma 4. \(\square \)
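As with Lemma 4, the bound is tight. A numeric sanity check in Python, using invented delay bounds for \(\ell = 1\) and some \(\ell > 1\), illustrates Lemma 12:

```python
# Sanity check for Lemma 12 with illustrative delay bounds
# (values invented for this example).
d1_minus, d1_plus = 1.0, 4.0    # [delta^-_(1), delta^+_(1)]
dl_minus, dl_plus = 2.0, 8.0    # [delta^-_(l), delta^+_(l)] for some l > 1

lo = dl_minus - d1_minus        # 1.0: minimal artificial delay
hi = dl_plus - d1_plus          # 4.0: maximal artificial delay

# Cond3' (second part): drift bound for the additional-delay timeout.
rho = (hi - lo) / (hi + lo)             # = 0.6

# Timeout in hardware clock units (same shape as mu_tilde in Lemma 4).
delta_tilde = 2 * hi * lo / (hi + lo)   # = 1.6

# Real time covered by delta_tilde clock units at maximal drift rho:
assert abs(delta_tilde / (1 + rho) - lo) < 1e-9   # exactly the lower bound
assert abs(delta_tilde / (1 - rho) - hi) < 1e-9   # exactly the upper bound
```

Note that the uncertainty condition \(\varepsilon _{(\ell )} \ge \varepsilon _{(1)}\) guarantees \(hi \ge lo\), so the target interval is well-formed.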
Of course, being able to add this delay implies that each algorithm message is wrapped into a simulation message that also includes the value \(\ell \). The right-hand side of Fig. 6 illustrates the principle of this extended algorithm (Algorithm 2), denoted \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\), and the transformation of an execution of \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) into an rt-run.
Interestingly, for \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) to work, Cond1 needs to be strengthened as well. Recall that processors can only send messages during an action or during a job, which, in turn, must be triggered by the reception of a message – this is the exact reason why we need input messages to boot the system! This restriction applies to Byzantine processors as well.
Consider Fig. 6b and assume that (1) the first action/job on the first processor boots the system and that (2) the second processor is Byzantine. Note that messages \((m,2)\) (in the execution) and \(m\) (in the rt-run) are received at different times. Since Byzantine processors can make arbitrary state transitions and send arbitrary messages, in the classic model, the second processor could send out a message \(m'\) right after receiving \((m,2)\). Let us assume that this happens, and let us call this execution \(ex'\).
Mapping \(ex'\) to an rt-run \(ru'\), however, causes a problem: We cannot map \(m'\) to \(ru'\), since, in the real-time model, the second processor has not received any message yet. Thus, it has not booted – there is no corresponding job that could send \(m'\) (Footnote 7).
Note that this is only an issue during booting: Afterwards, arbitrary jobs could be constructed on the Byzantine processor due to its ability to send timer messages to itself. Since booting is modeled through input messages, we strengthen Cond1 as follows:
This allows us to map \(ex'\) to an rt-run \(ru'\) in which the second processor receives an input message right before sending \(m'\).
Transformation \(T_{C\rightarrow R}\) revisited
\(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) adds an additional layer: The actions of \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\) previously triggered by incoming ordinary messages are now caused by an (additional-delay, \(m\)) message instead. Two new types of actions, (f) and (g), can occur: A type (f) action receives an \((m, \ell )\) pair and sends an (additional-delay, \(m\)) message (possibly with delay \(0\), if \(\ell = 1\)), and a type (g) action ignores a malformed message. For example, the first action on the second processor in Fig. 6b would be a type (f) action. Since \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) modifies neither \(queue\) nor \(idle\), Observations 6, 7 and 9 as well as Lemmas 8, 10 and 11 still hold.
In the transformation, actions of type (f) and (g) are ignored—this also holds for NOP actions on crashed processors that would have been type (f) or (g) actions before the crash. Apart from that, the transformation rules of Sect. 5.3 still apply, with the following exceptions. Let a valid ordinary message be a message that would trigger Line 31 in Algorithm 2 after reaching a fault-free recipient (which includes all messages sent by non-Byzantine processors).
-
1.
Valid ordinary messages received by a fault-free processor are “unwrapped”:
-
Sending side: A message \((m, \ell )\) in \(trans(ac)\) in \(ex\) is mapped to simply \(m\) in \(trans(J)\) of the corresponding job in \(ru\).
-
Receiving side: A message (additional-delay, \(m\)) in \(msg(ac)\) is replaced by \(m\) in \(msg(JD)\) of the corresponding job or drop event \(JD\) in \(ru\).
Note that \(T_{C\rightarrow R}\) removes the reception of \((m, \ell )\) and the sending of (additional-delay, \(m\)), since type (f) actions are ignored. Basically, the transformation ensures that the \(m \rightarrow (m, \ell ) \rightarrow \) (additional-delay, \(m\)) \(\rightarrow m\) chain is condensed to a simple transmission of message \(m\) (cf. Fig. 7, the message from \(p_2\) to \(p_1\)).
-
2.
Valid ordinary messages received by a crashing processor \(p\) are unwrapped as well. On the sending side, \((m, \ell )\) is replaced by \(m\). As long as the receiving processor \(p\) has not crashed, the remainder of the transformation does not differ from the fault-free case. After (or during) the crash, the receiving type (f) action no longer generates an (additional-delay) timer message. In this case, we add a receive event and a drop event for message \(m\) at \(t + \delta ^-_{(\ell )}\) on \(p\), with \(t\) denoting the sending time of the message. Analogous to Sect. 5.3, the drop event happens at the end of \(J^{last}\) instead, if the arrival time \(t + \delta ^-_{(\ell )}\) lies within \(begin(J^{last})\) and \(end(J^{last})\). Since type (f) actions are ignored in the transformation, we have effectively replaced the transmission of \((m, \ell )\) in \(ex\), taking \([\delta ^-_{(1)}, \delta ^+_{(1)}]\) time units, with a transmission of \(m\) in \(ru\), taking \(\delta ^-_{(\ell )}\) time units.
3. Valid ordinary messages received by some Byzantine processor \(p\) are unwrapped as well. Note, however, that on \(p\) all actions are transformed to (zero-time) jobs—there is no separation into types (a)–(g), since the processor does not need to execute the correct algorithm. In this case, the “unwrapping” just substitutes \((m, \ell )\) with \(m\) on both the sender and the receiver side and adds a receiving job \(J'_R\) (and a matching receive event) for \(m\) with a NOP transition sequence on the Byzantine processor at \(t + \delta ^-_{(\ell )}\), with \(t\) denoting the sending time of the message. \(msg(J_R)\) and \(msg(R_R)\), the triggering messages of the job and the receive event corresponding to the action receiving the message in \(ex\), are changed to some new dummy timer message, sent by adding it to some earlier job on \(p\). If \(R_R\) is the first receive event on \(p\), Cond1’ allows us to insert a new input message into \(ru\) that triggers \(R_R\). Adding \(J'_R\) guarantees that the message delays of all messages stay between \(\delta ^-_{(\ell )}\) and \(\delta ^+_{(\ell )}\) in \(ru\). On the other hand, keeping \(J_R\) is required to ensure that any (Byzantine) actions performed by \(ac_R\) can be mapped to the rt-run and happen at the same time.
4. Invalid ordinary messages (which can only be sent by Byzantine processors) are removed from the transition sequence of the sending job. To ensure message consistency, we also need to make sure that the message does not appear on the receiving side: If the receiving processor is non-Byzantine, a type (g) action is triggered on the receiver. Since type (g) actions are not mapped to the rt-run, we are done. If the receiver is Byzantine, let \(J_R\) be the job corresponding to \(ac_R\), the action receiving the message. As in rule 3, we replace \(msg(J_R)\) (and the message of the corresponding receive event) with a timer message sent by an earlier job or with an additional input message.
Figure 7 shows an example of valid ordinary messages sent to a non-Byzantine (\(p_1\)) as well as to a Byzantine (\(p_3\)) processor. Note that these modifications to \(T_{C\rightarrow R}\) do not invalidate Lemma 11.
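The condensation performed by rule 1 can be sketched in code. The following Python fragment uses a hypothetical event-tuple representation (the tags `send_wrapped`, `recv_wrapped`, `send_delay` and `recv_delay` are illustrative names, not notation from the paper) to show how the four-event chain in \(ex\) collapses to a single send/receive pair in \(ru\):

```python
# Hypothetical sketch of rule 1: condense the chain
#   m -> (m, l) -> (additional-delay, m) -> m
# into one plain transmission of m, as T_{C->R} does for valid
# ordinary messages between fault-free processors.

def unwrap_fault_free(ex_events):
    """Map classic-model message events to rt-run message events.

    ex_events: list of (kind, msg, time) tuples with kind in
      'send_wrapped'  -- sending (m, l)                    -> send m
      'recv_wrapped'  -- type (f) action receiving (m, l)  -> ignored
      'send_delay'    -- sending (additional-delay, m)     -> ignored
      'recv_delay'    -- receiving (additional-delay, m)   -> recv m
    """
    ru_events = []
    for kind, msg, t in ex_events:
        if kind == 'send_wrapped':      # sending side: (m, l) becomes m
            ru_events.append(('send', msg, t))
        elif kind == 'recv_delay':      # receiving side: (add-delay, m) becomes m
            ru_events.append(('recv', msg, t))
        # 'recv_wrapped' and 'send_delay' belong to the ignored type (f)
        # action and therefore produce no event in the rt-run.
    return ru_events

chain = [('send_wrapped', 'm', 0.0),
         ('recv_wrapped', 'm', 1.0),    # type (f) action
         ('send_delay',   'm', 1.0),
         ('recv_delay',   'm', 2.5)]
print(unwrap_fault_free(chain))  # [('send', 'm', 0.0), ('recv', 'm', 2.5)]
```

Only the endpoints of the chain survive: \(m\) is sent when \((m, \ell )\) was sent and received when (additional-delay, \(m\)) arrived, matching the message from \(p_2\) to \(p_1\) in Fig. 7.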
Validity of the constructed rt-run
Lemma 13
If \(ex\) is a valid execution of \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) under failure model \(f\)-\(f'\)-\(\rho \), then \(ru= T_{C\rightarrow R}(ex)\) is a valid rt-run of \(\mathcal {A}\).
Proof
Let \(red(s)\) be defined as state \(s\) without the simulation variables \(queue\) and \(idle\). We will show that conditions RU1–8 defined in Sect. 4.2 are satisfied:
- RU1: Applying the \(T_{C\rightarrow R}\) transformation rules to all actions \(ac\) in \(ex\) in sequential order (except for the special timer message case discussed in Sect. 5.4) ensures non-decreasing begin times in \(ru\). RU1 also requires message causality: Sending message \(m\) in \(ru\) occurs at the same time as sending message \((m, \ell )\) in \(ex\), and receiving message \(m\) in \(ru\) occurs at the same time as receiving message (additional-delay, \(m\)) in \(ex\) (or at the sending time plus \(\delta ^-_{}\), in the case of a Byzantine recipient, cf. Fig. 7). Since there is a causal chain \((m, \ell ) \rightarrow \) some type (f) action \( \rightarrow \) (additional-delay, \(m\)) in \(ex\), it is not hard to see that a message \(m\) violating message causality (by being sent after being received) can only exist in \(ru\) if either \((m, \ell )\) or (additional-delay, \(m\)) violates message causality, which is prohibited by EX1. W.r.t. jobs and drop events, the correct order on Byzantine processors follows directly from the transformation. For other processors, consider the different types of actions. Type (a): \(J\) is created right after \(R\). Type (b), (f) and (g): No job or drop event is created. Type (c) and (d): By Observation 6, every message removed from \(queue\) (= every message for which a job or drop event is created by \(T_{C\rightarrow R}\)) has been received before by a type (b) action. By \(T_{C\rightarrow R}\), a receive event has been created for this message. Type (e): \(D\) is created right after \(R\).
- RU2: Assume by way of contradiction that there are two subsequent jobs \(J\) and \(J'\) on the same processor \(p\) such that \(newstate(J) \ne oldstate(J')\). If the processor is Byzantine, every action is mapped to a job with the same \(oldstate\) and \(newstate\). In addition, jobs are added upon receiving a message, but those jobs have NOP state transitions, i.e., their (equivalent) \(oldstate\) and \(newstate\) are chosen to match the previous and the subsequent job. Thus, on a Byzantine processor, RU2 can only be violated if EX2 does not hold. On fault-free or crashing processors, \(J\) corresponds to some type (a) or (c) action \(ac\) and \(red(newstate(ac)) = newstate(J)\). The same holds for \(J'\), which corresponds to some type (a) or (c) action \(ac'\) with \(red(oldstate(ac')) = oldstate(J')\). Since \(newstate(J) \ne oldstate(J')\), \(red(newstate(ac)) \ne red(oldstate(ac'))\). As EX2 holds in \(ex\), there must be some action \(ac''\) in between \(ac\) and \(ac'\) such that \(red(oldstate(ac'')) \ne red(newstate(ac''))\). This yields two cases, both of which lead to a contradiction: (1) \(ac''\) is a type (a) or (c) action. In that case, there would be some corresponding job \(J''\) with \(J\prec J'' \prec J'\) in \(ru\), contradicting the assumption that \(J\) and \(J'\) are subsequent jobs. (2) \(ac''\) is a type (b), (d), (e), (f) or (g) action. Since these kinds of actions only change \(queue\) and \(idle\), this contradicts \(red(oldstate(ac'')) \ne red(newstate(ac''))\).
- RU3: On Byzantine processors, RU3 follows directly from EX3 due to the tight relationship between actions and jobs. On the other hand, on every non-Byzantine processor \(p\), \(oldstate(J)\) of the first job \(J\) on \(p\) in \(ru\) is equal to \(red(oldstate(ac))\) of the first type (a) or (c) action \(ac\) on \(p\) in \(ex\). Following the same reasoning as in the previous point, we can argue that \(red(oldstate(ac)) = red(oldstate(ac'))\), with \(ac'\) being the first (any type) action on \(p\) in \(ex\). Since the set of initial states of \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) equals that of \(\mathcal {A}\) (extended with \(queue = empty\) and \(idle = true\)), RU3 follows from EX3.
- RU4: Follows easily from \(HC^{ru}_p = HC^{ex}_p\), the transformation rules of \(T_{C\rightarrow R}\) and the fact that EX4 holds in \(ex\).
- RU5: At most one job sending \(m\): Follows from the fact that, on non-Byzantine processors, every action \(ac\) is mapped to at most one job \(J\), \(trans(J)\) is an (unwrapped) subset of \(trans(ac)\), and EX5 holds in \(ex\). On Byzantine processors, every action \(ac\) is mapped to at most one non-NOP job \(J\) sending the same messages plus newly-introduced (unique) dummy timer messages. At most one receive event receiving \(m\): This follows from the fact that on non-Byzantine processors, every action \(ac\) is mapped to at most one receive event \(R\) in \(ru\) receiving the same message (unwrapped) and EX5 holds in \(ex\). On Byzantine processors, every action \(ac\) is mapped to at most one receive event receiving the same message as \(ac\) plus at most one receive event receiving a newly-introduced (unique) dummy timer message. At most one job or drop event processing/dropping \(m\): Since EX5 holds in \(ex\), every message received in \(ex\) is unique. On Byzantine processors, the action receiving the message is transformed to exactly one job processing it plus at most one job processing some dummy timer message. On other processors, every message gets unwrapped and put into \(queue\) at most once and, since \(pol\) is a valid scheduling/admission policy, every message is removed from \(queue\) at most once. Transformation \(T_{C\rightarrow R}\) is designed such that a job or drop event with \(msg(J/D) = m\) is created in \(ru\) if, and only if, \(m\) gets removed from \(queue\) in the corresponding action. Correct processor specified in the message: Follows from the fact that EX5 holds in \(ex\) and that \(T_{C\rightarrow R}\) does not change the processor at which messages are sent, received, processed or dropped.
- RU6: Assume that there is some message \(m\) that has been received but not sent. Due to the rules of \(T_{C\rightarrow R}\), neither (finished-processing) nor (additional-delay) messages are received in \(ru\). The construction also ensures that dummy timer messages on Byzantine processors are sent before being received. Thus, \(m\) must be an algorithm message. If \(m\) is a timer message, no unwrapping takes place, so there must be a corresponding action receiving \(m\) in \(ex\). Since EX6 holds in \(ex\), there must be an action \(ac\) sending \(m\). As \(m\) is an algorithm message and all actions sending algorithm timer messages (type (a) and (c), or actions on Byzantine processors) are transformed to jobs sending the same timer messages at the same time, we have a contradiction. If \(m\) is an ordinary message received by a non-Byzantine processor, it has been unwrapped in the transformation, i.e., there is a corresponding (additional-delay, \(m\)) message in \(ex\), created by a type (f) action. This type (f) action has been triggered by an \((m, \ell )\) message, which—according to EX6—must have been sent in \(ex\). As in the previous case, we can argue that an action sending an algorithm message must be of type (a), (c) or from a Byzantine processor. Thus, it is transformed into a job in \(ru\), and the transformation ensures that the action sending \((m, \ell )\) is replaced by a job sending \(m\)—a contradiction. Likewise, if \(m\) is received by a Byzantine processor, there is a corresponding action receiving \((m, \ell )\) in \(ex\) and the same line of reasoning can be applied.
- RU7: Consider two jobs \(J\prec J'\) on the same non-Byzantine processor \(proc(J) = p = proc(J')\). \(T_{C\rightarrow R}\) ensures that there is a corresponding type (a) or (c) action for every job in \(ru\). Let \(ac\) and \(ac'\) be the actions corresponding to \(J\) and \(J'\) and note that \(time(ac) = begin(J)\) and \(time(ac') = begin(J')\). Lemma 10 implies that \(ac'\) cannot occur until the (finished-processing) message sent by \(ac\) has arrived. Since \(duration(J)\) is set to the delivery time of the (finished-processing) message in \(T_{C\rightarrow R}\), \(J'\) cannot start before \(J\) has finished. On Byzantine processors, jobs cannot overlap since they all have a duration of zero.
- RU8: Drop events occur in \(ru\) only when there is a corresponding type (c), (d) or (e) action on a non-Byzantine processor in \(ex\). Type (c) and (d) actions are triggered by a (finished-processing) message arriving; thus, by Lemma 11, there is a job in \(ru\) finishing at that time. W.r.t. type (e) actions, Observation 9 shows that \(p\) is idle in \(ex\) when a type (e) action occurs, which, by Lemma 8, means that no (finished-processing) message is in transit and, thus, by Lemma 11, there is no job active in \(ru\). Therefore \(p\) is idle in \(ru\) and \(T_{C\rightarrow R}\) ensures that a receive event occurs at the time of the type (e) action.\(\square \)
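To make the RU2 and RU7 arguments above concrete, both conditions can be phrased as a small predicate over a per-processor job list. This is an illustrative sketch with a made-up job record format, not part of the transformation:

```python
# Check RU2 (state continuity) and RU7 (no overlapping jobs) for one
# processor: consecutive jobs J, J' must satisfy
#   newstate(J) = oldstate(J')  and  end(J) <= begin(J').

def check_ru2_ru7(jobs):
    """jobs: chronological list of job records for one processor."""
    for j, j_next in zip(jobs, jobs[1:]):
        if j['newstate'] != j_next['oldstate']:
            return False                                  # RU2 violated
        if j['begin'] + j['duration'] > j_next['begin']:
            return False                                  # RU7 violated
    return True

jobs_p = [{'oldstate': 's0', 'newstate': 's1', 'begin': 0.0, 'duration': 0.5},
          {'oldstate': 's1', 'newstate': 's2', 'begin': 1.0, 'duration': 0.4}]
print(check_ru2_ru7(jobs_p))  # True
```

On a Byzantine processor all durations are zero, so the overlap part of this check is trivially satisfied, mirroring the last sentence of the RU7 argument.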
Failure model compatibility
Lemma 14
Let \(\underline{s}\) and \(s\) be a classic and a real-time system, let \(\mathcal {A}\) be a real-time model algorithm, let \(pol\) be a scheduling/admission policy, and let \(ex\) be an execution of \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) in \(\underline{s}\) under failure model \(f\)-\(f'\)-\(\rho \).
If Cond1’, Cond2’ and Cond3’ hold, \(ru= T_{C\rightarrow R}(ex)\) conforms to failure model \(f\)-\(f'\)-\(\rho \) in system \(s\) with scheduling/admission policy \(pol\).
Proof
Lemma 13 has shown that \(ru\) is a valid rt-run of \(\mathcal {A}\). The following conditions of \(f\)-\(f'\)-\(\rho \), as specified in Sect. 4.3, are satisfied:
- \(\forall m_o: is\_timely\_msg(m_o, \delta ^-_{}, \delta ^+_{})\): Every ordinary algorithm message \(m_o\) in \(ru\) is sent at the same time as its corresponding message \((m_o, \ell )\) in \(ex\). On a fault-free or not-yet-crashed recipient, \(m_o\) is received at the same time as its corresponding message (additional-delay, \(m_o\)) in \(ex\). (additional-delay, \(m_o\)) is a timer message sent by the action triggered by the arrival of \((m_o, \ell )\) and takes \(\tilde{\delta }_{(\ell )}\) hardware clock time units—corresponding to a real-time interval of \([\delta ^-_{(\ell )} - \delta ^-_{(1)}, \delta ^+_{(\ell )} - \delta ^+_{(1)}]\) (recall Lemma 12). Since the transmission of \((m_o, \ell )\) requires between \(\underline{\delta }^-\) and \(\underline{\delta }^+\) time units, a total of \([\underline{\delta }^-+ (\delta ^-_{(\ell )} - \delta ^-_{(1)}), \underline{\delta }^++ (\delta ^+_{(\ell )} - \delta ^+_{(1)})]\) time units elapse between the sending of \((m_o, \ell )\) (corresponding to the sending of \(m_o\) in \(ru\)) and the reception of (additional-delay, \(m_o\)) (corresponding to the reception of \(m_o\) in \(ru\)). Since, by Cond2’, \(\delta ^-_{(1)} \le \underline{\delta }^-\) and \(\delta ^+_{(1)} \ge \underline{\delta }^+\), this interval lies within \([\delta ^-_{(\ell )}, \delta ^+_{(\ell )}]\) and \(m_o\) is timely. If the receiving processor is Byzantine or has crashed, the message takes exactly \(\delta ^-_{(\ell )}\) time units, see transformation rule 3 in Sect. 5.8.
- \(\forall m_t: arrives\_timely(m_t) \vee [proc(m_t) \in F']\): Algorithm timer messages in \(ex\) sent for some hardware clock value \(T\) on some non-Byzantine processor \(p\) cause a type (a), (b) or (e) action \(ac\) at some time \(t\) with \(HC(ac) = T\) when they are received. As all of these actions are mapped to receive events \(R\) with \(msg(R) = msg(ac)\) and \(time(R) = t\) (or \(time(R) = end(J)\) of the job \(J\) sending the timer, see Sect. 5.4), and the hardware clocks are the same in \(ru\) and \(ex\), timer messages arrive at the correct time in \(ru\).
- Relationship of \(ac^{last}\) and \(J^{last}\): The following observation follows directly from the transformation rules for crashing processors in Sect. 5.3.
Observation 15
Fix some processor \(p \in F\) and let \(ac^{last}\) be the first action \(ac\) on \(p\) for which \(is\_last(ac)\) holds. If \(ac^{last}\) is a type (a) or (c) action, \(is\_last(J)\) holds for the job \(J\) corresponding to \(ac^{last}\). Otherwise, \(is\_last(J)\) holds for the job \(J\) corresponding to the last type (a) or (c) action on \(p\) before \(ac^{last}\).
In the following, let \(J^{last}\) for some fixed processor \(p\) denote the job \(J\) for which \(is\_last(J)\) holds.
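Observation 15 amounts to a simple backward search from \(ac^{last}\) to the nearest type (a) or (c) action. The sketch below uses a hypothetical list of action records and assumes that some action with \(is\_last\) exists:

```python
# Find the index of the action whose job J satisfies is_last(J):
# either ac_last itself (if of type (a) or (c)) or the last
# type (a)/(c) action before it.

def last_job_action(actions):
    """actions: chronological list of dicts with 'type' and 'is_last'."""
    ac_last = next(i for i, a in enumerate(actions) if a['is_last'])
    for i in range(ac_last, -1, -1):    # walk backwards, ac_last included
        if actions[i]['type'] in ('a', 'c'):
            return i
    return None                          # no type (a)/(c) action on p at all

acts = [{'type': 'a', 'is_last': False},
        {'type': 'c', 'is_last': False},
        {'type': 'b', 'is_last': True}]  # ac_last is of type (b)
print(last_job_action(acts))  # 1 -> the job of the preceding type (c) action
```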
- Correct processors: Observe that, due to the design of \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) and \(T_{C\rightarrow R}\), variable \(queue\) in \(ex\) represents the queue state of \(ru\). Every receive event in \(ru\) occurring while the processor is idle corresponds to either a type (a) or a type (e) action. In every such action, a scheduling decision according to \(pol\) is made (Line 11) and \(T_{C\rightarrow R}\) ensures that either a drop event (type (e) action) or a job (type (a) action) according to the output of that scheduling decision is created. Crashing processors: Fix some processor \(p \in F\) and let \(ac^{last}\) be the first action \(ac\) on \(p\) satisfying \(is\_last(ac)\). For all actions on \(p\) up to (and including) \(ac^{last}\) (or for all actions, if no such \(ac^{last}\) exists), the transformation rules are equivalent to those for correct processors and, thus, the above reasoning applies for all receive events on \(p\) prior to \(J^{last}\) (cf. Observation 15). The transformation rules for messages received on crashing processors (Sect. 5.8) ensure that all receive events satisfy either \(obeys\_pol(R)\) (if received during \(J^{last}\): no scheduling decision—neither job start nor message drop—is made) or \(arrives\_after\_crash(R)\) and \(drops\_msg(R)\) (if received after \(J^{last}\) has finished processing: the message is dropped immediately).
- Correct processors: The same reasoning as in the previous point applies: Every job in \(ru\) finishing corresponds to a type (c) or (d) action in \(ex\) in which the (finished-processing) message representing that job arrives. Both of these actions cause a scheduling decision (Line 11) to be made on \(queue\) (which corresponds to \(ru\)’s queue state), and corresponding drop events and/or a corresponding job (only type (c) actions) are created by \(T_{C\rightarrow R}\). Crashing processors: For all jobs before \(J^{last}\), the same reasoning as for correct processors applies. The transformation rules ensure that all messages that have not been processed or dropped before get dropped at \(end(J^{last})\).
- Correct processors: Let \(ac\) be the type (a) or (c) action corresponding to \(J\). \(ac\) executes all state transitions of \(\mathcal {A}\) (Line 17) for either \(msg(ac)\) (type (a) action) or some message from the queue (type (c) action) and the current hardware clock time, plus some additional operations that only affect variables \(queue\) and \(idle\) and (finished-processing) messages. Thus, \(T_{C\rightarrow R}\)’s choice of \(HC(J)\), \(msg(J)\) and \(trans(J)\) ensures that \(trans(J)\) conforms to algorithm \(\mathcal {A}\). Crashing processors: For all jobs before \(J^{last}\), the same reasoning as for correct processors applies. Since \(J^{last}\) corresponds to either \(ac^{last}\) (which also satisfies \(follows\_alg\_partially\)) or to some earlier type (a) or (c) action (which satisfies \(follows\_alg\)), \(follows\_alg\_partially(J^{last})\) is satisfied.
- \(\forall J: is\_timely\_job(J, \mu ^-_{}, \mu ^+_{}) \vee [proc(J) \in F']\): Correct processors: \(T_{C\rightarrow R}\) ensures that \(duration(J)\) equals the transmission time of the (finished-processing) message sent by the action \(ac\) corresponding to job \(J\). Since \(arrives\_timely(m_t)\) holds for (finished-processing) messages \(m_t\) in \(ex\), there are exactly \(\tilde{\mu }_{(\ell )}\) hardware clock time units between the sending and the reception of the (finished-processing) message sent by \(ac\) (see Line 19 of \(\underline{\mathcal {S}}_{\mathcal {A}, pol, \mu {}}\)). By Lemma 4, this corresponds to some real-time interval within \([\mu ^-_{(\ell )}, \mu ^+_{(\ell )}]\). Since \(\ell \) equals the number of ordinary messages sent in \(J\) (see Line 18 of the algorithm and the transformation rules for type (a) and (c) actions in \(T_{C\rightarrow R}\)), \(is\_timely\_job(J, \mu ^-_{}, \mu ^+_{})\) holds. Crashing processors: For all jobs before \(J^{last}\), the same reasoning as for correct processors applies. If \(ac\), the action corresponding to \(J^{last}\), was able to successfully send a (finished-processing) message, the above reasoning holds for \(J^{last}\) as well. Otherwise, the transformation rules (Sect. 5.3) ensure that \(J^{last}\) takes exactly \(\mu ^-_{(\ell )}\) time units, with \(\ell \) denoting the number of ordinary messages that would have been sent in the non-crashing case, as required by \(is\_timely\_job\).
- \(\forall p: bounded\_drift(p, \rho ) \vee [p \in F']\): Follows from the definition \(HC^{ru}_p = HC^{ex}_p\) and the fact that the corresponding \(bounded\_drift\) condition holds in \(ex\). \(\square \)
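The interval argument used for \(is\_timely\_msg\) above can be replayed with concrete numbers. The bounds below are arbitrary values chosen only to satisfy Cond2’:

```python
# End-to-end delay of an ordinary message in ru: transmission of (m, l)
# takes between d_minus and d_plus time units, and the
# (additional-delay, m) timer adds a real-time delay in
# [dl_minus - d1_minus, dl_plus - d1_plus].

d_minus, d_plus = 1.0, 2.0      # classic bounds (underlined delta)
d1_minus, d1_plus = 0.8, 2.5    # real-time bounds for l = 1
dl_minus, dl_plus = 1.5, 4.0    # real-time bounds for some l > 1

assert d1_minus <= d_minus and d1_plus >= d_plus        # Cond2'

total_lo = d_minus + (dl_minus - d1_minus)              # = 1.7
total_hi = d_plus + (dl_plus - d1_plus)                 # = 3.5

# The total delay interval lies within [dl_minus, dl_plus]: m_o is timely.
assert dl_minus <= total_lo and total_hi <= dl_plus
print(f"[{total_lo}, {total_hi}] within [{dl_minus}, {dl_plus}]")
```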
Transformation proof
Theorem 16
Let \(\underline{s}= (n, [\underline{\delta }^-, \underline{\delta }^+])\) be a classic system, and let \(\mathcal {P}\) be a simulation-invariant problem. If
- the algorithm \(\mathcal {A}\) solves problem \(\mathcal {P}\) in some real-time system \(s= (n, [\delta ^-_{}, \delta ^+_{}], [\mu ^-_{}, \mu ^+_{}])\) with some scheduling/admission policy \(pol\) under failure model \(f\)-\(f'\)-\(\rho \) [A1] (see Footnote 8), and
- conditions Cond1’, Cond2’ and Cond3’ (see Sect. 5.7) hold,
then the algorithm \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) solves \(\mathcal {P}\) in \(\underline{s}\) under failure model \(f\)-\(f'\)-\(\rho \).
Proof
Let \(ex\) be an arbitrary execution of \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) in \(\underline{s}\) under failure model \(f\)-\(f'\)-\(\rho \) [D1]. By Lemmas 13 and 14 as well as conditions Cond1’, Cond2’ and Cond3’, \(ru= T_{C\rightarrow R}(ex)\) is a valid rt-run of \(\mathcal {A}\) in \(s\) with scheduling/admission policy \(pol\) under failure model \(f\)-\(f'\)-\(\rho \) [L1].
As \(\mathcal {A}\) is an algorithm solving \(\mathcal {P}\) in \(s\) with policy \(pol\) under failure model \(f\)-\(f'\)-\(\rho \) ([A1]) and \(ru\) is a valid rt-run of \(\mathcal {A}\) in \(s\) with policy \(pol\) conforming to failure model \(f\)-\(f'\)-\(\rho \) ([L1]), \(ru\) satisfies \(\mathcal {P}\) (cf. Sect. 4.4) [L2].
To show that \(ex\) satisfies \(\mathcal {P}\), we must show that \(tr' \in \mathcal {P}\) holds for every st-trace \(tr'\) of \(ex\). Let \(tr'\) be an st-trace of \(ex\), and let \(tr'/t\) be the list of all \(transition\) st-events in \(tr'\) [D2]. We will construct some \(transition\) list \(tr/t\) from \(tr'/t\) by sequentially performing these operations for the \(transition\) st-events of all non-Byzantine processors:
1. Remove the variables \(queue\) and \(idle\) from all states.
2. Remove any \(transition\) st-events that only manipulate \(queue\) and/or \(idle\). Note that, due to the previous step, these st-events satisfy \(oldstate = newstate\).
Since \(\mathcal {P}\) is a simulation-invariant problem, there is some finite set \(\mathcal {V}\) of variable names, such that \(\mathcal {P}\) is a predicate on global states restricted to \(\mathcal {V}\) and the sequence of \(input\) st-events (cf. Definition 3). Since variables \(queue\) and \(idle\) in algorithm \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) could be renamed arbitrarily, we can assume w.l.o.g. that \(queue \not \in \mathcal {V}\) and \(idle \not \in \mathcal {V}\). Examining the list of operations in the definition of \(tr\) reveals that \(gstates(tr')|_\mathcal {V}= gstates(tr)|_\mathcal {V}\), for every \(tr\) having the same \(transition\) st-events as \(tr/t\) [L3].
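Steps 1 and 2 of the \(tr'/t \rightarrow tr/t\) construction can be sketched as follows; the dictionary-based state representation is a simplifying assumption for illustration:

```python
# Step 1: drop the simulation variables queue and idle from every state.
# Step 2: drop transition st-events that only changed those variables.

SIM_VARS = frozenset({'queue', 'idle'})

def restrict(state, hidden=SIM_VARS):
    """s restricted to all variables except the hidden ones (s|_V)."""
    return {k: v for k, v in state.items() if k not in hidden}

def reduce_trace(events):
    """events: list of (oldstate, newstate) transition st-events."""
    out = []
    for old, new in events:
        old_r, new_r = restrict(old), restrict(new)
        if old_r == new_r and old != new:
            continue                    # only queue/idle changed: remove
        out.append((old_r, new_r))      # keep, with sim variables removed
    return out

trace = [({'x': 0, 'queue': (), 'idle': True},
          {'x': 0, 'queue': ('m',), 'idle': True}),     # simulation-only step
         ({'x': 0, 'queue': ('m',), 'idle': True},
          {'x': 1, 'queue': (), 'idle': False})]        # real state transition
print(reduce_trace(trace))  # [({'x': 0}, {'x': 1})]
```

Because \(queue \not \in \mathcal {V}\) and \(idle \not \in \mathcal {V}\), restricting the resulting global states to \(\mathcal {V}\) yields the same sequence as for the original trace, which is exactly the \(gstates(tr')|_\mathcal {V}= gstates(tr)|_\mathcal {V}\) argument.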
We now show that \(tr/t\) is the \(transition\) sequence of some st-trace of \(ru\) where all transitions happen at the very beginning of each job.
- Every job in \(ru\) on a non-Byzantine processor is correctly mapped to \(transition\) st-events in \(tr/t\): Every job \(J\) in \(ru\) is based on either a type (a) or a type (c) action \(ac\) in \(ex\). According to Sect. 4, the \(transition\) st-events produced by mapping \(ac\) are the same as the st-events produced by mapping \(J\), except that the st-events mapped by \(ac\) contain the simulation variables. However, they have been removed by the transformation from \(tr'/t\) to \(tr/t\).
- Every \(transition\) st-event in \(tr/t\) on a non-Byzantine processor corresponds to a job in \(ru\): Every st-event in \(tr'/t\) is based on an action \(ac\) in \(ex\). Since the transformation \(tr'/t \rightarrow tr/t\) does not add any st-events, every st-event in \(tr/t\) is based on an action \(ac\) in \(ex\) as well. Since all st-events only modifying \(queue\) and \(idle\) have been removed, \(tr/t\) only contains the st-events corresponding to some type (a) or (c) action in \(ex\). The st-events in \(tr'/t\) contain the \(transition\) st-events of \(\mathcal {A}\)-process_message(msg, current_hc) and additional steps taken by the simulation algorithm. The transformation from \(tr'/t\) to \(tr/t\) ensures that these additional steps (and only these) are removed. Thus, the remaining st-events in \(tr/t\) correspond to the job \(J\) corresponding to \(ac\).
- For Byzantine processors, recall (Sect. 5.3) that the actions in \(ex\) and their corresponding jobs in \(ru\) perform exactly the same state transitions.
Following the rules in Sect. 4.4, an st-trace \(tr\) for \(ru\) where all transitions happen at the very beginning of each job must exist. Thus, we can conclude that \(tr/t\) is the \(transition\) sequence of some st-trace \(tr\) of \(ru\) [L4]. W.r.t. \(input\) st-events, note that the same \(input\) st-events occur in \(tr\) and \(tr'\), except for Byzantine processors, which might receive dummy input messages in \(ru\) (and, thus, in \(tr\)) that are missing in \(ex\) (and, thus, in \(tr'\)), cf. Sect. 5.8. Cond1’, however, ensures that \(\mathcal {P}\) does not care about input messages sent to Byzantine processors.
As \(\mathcal {A}\) solves \(\mathcal {P}\) in \(s\) with policy \(pol\) under failure model \(f\)-\(f'\)-\(\rho \) ([A1]), \(ru\) is a valid rt-run of \(\mathcal {A}\) in \(s\) with policy \(pol\) conforming to failure model \(f\)-\(f'\)-\(\rho \) ([L1]), and \(tr\) is an st-trace of \(ru\), it follows that \(tr\in \mathcal {P}\) [L5]. Since \(gstates(tr')|_\mathcal {V}= gstates(tr)|_\mathcal {V}\) ([L3,L4]), \(tr\in \mathcal {P}\) ([L5]), and \(\mathcal {P}\) is a simulation-invariant problem, \(tr' \in \mathcal {P}\) [L6].
As this ([L6]) holds for every st-trace \(tr'\) of every execution \(ex\) of \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) in \(\underline{s}\) under failure model \(f\)-\(f'\)-\(\rho \) ([D1,D2]), \(\underline{\mathcal {S'}}_{\mathcal {A}, pol, \delta {}, \mu {}}\) solves \(\mathcal {P}\) in \(\underline{s}\) under failure model \(f\)-\(f'\)-\(\rho \). \(\square \)