Reconciling fault-tolerant distributed algorithms and real-time computing

We present generic transformations that allow translating classic fault-tolerant distributed algorithms and their correctness proofs into a real-time distributed computing model (and vice versa). Owing to the non-zero-time, non-preemptible state transitions employed in our real-time model, scheduling and queuing effects (which are inherently abstracted away in classic zero step-time models, sometimes leading to overly optimistic time complexity results) can be modeled accurately. Our results thus make fault-tolerant distributed algorithms amenable to a sound real-time analysis, without sacrificing the wealth of algorithms and correctness proofs established in classic distributed computing research. By means of an example, we demonstrate that real-time algorithms generated by transforming classic algorithms can be competitive even w.r.t. optimal real-time algorithms, despite their comparatively simple real-time analysis.

…distributed state machine. The progress of time is solely reflected by the time intervals between steps. Owing to this assumption, it does not make a difference, for example, whether messages arrive at a processor simultaneously or nicely staggered in time: Conceptually, the messages are processed instantaneously in a step at the receiver when they arrive. The zero step-time abstraction is hence very convenient for analysis, and a wealth of distributed algorithms, correctness proofs, impossibility results and lower bounds have been developed for models that employ this assumption [15].
In real systems, however, computing steps are neither instantaneous nor arbitrarily preemptible: A computing step triggered by a message arriving in the middle of the execution of some other computing step is delayed until the current computation is finished. This results in queuing phenomena, which depend not only on the actual message arrival pattern, but also on the queuing/scheduling discipline employed. Real-time systems research has established powerful techniques for analyzing those effects [3,32], such that worstcase response times and even end-to-end delays [34] can be computed.
Our real-time model for message-passing systems [20,22] reconciles the distributed computing and the real-time systems perspectives: By replacing zero-time steps with non-zero-time steps, it allows reasoning about queuing effects and puts scheduling in the proper perspective. In sharp contrast to the classic model, the end-to-end delay of a message is no longer a model parameter, but results from a real-time analysis based on job durations and communication delays.
Apart from making distributed algorithms amenable to real-time analysis, the real-time model also allows us to address the interesting question of whether (and which) properties of real systems are inaccurately or even wrongly captured when resorting to classic zero step-time models. For example, it turned out [20] that no n-processor clock synchronization algorithm with constant running time can achieve optimal precision, but that Ω(n) running time is required for this purpose. Since an O(1) algorithm is known for the classic model [13], this is an instance of a problem where the standard distributed computing analysis yields overly optimistic results.
In view of the wealth of distributed computing results, determining the properties that are preserved when moving from the classic zero step-time model to the real-time model is important: This transition should facilitate a real-time analysis without invalidating classic distributed computing analysis techniques and results. We developed powerful general transformations [24,26], which showed that a system adhering to some particular instance of the real-time model can simulate a system that adheres to some instance of the classic model (and vice versa). All the transformations presented in [26] were based on the assumption of a fault-free system, however.

Contributions:
In this paper, we generalize our transformations to the fault-tolerant setting: Processors are allowed to either crash or even behave arbitrarily (Byzantine) [11], and hardware clocks can drift. We define (mild) conditions on problems, algorithms and system parameters, which allow re-using classic fault-tolerant distributed algorithms in the real-time model, and employing classic correctness proof techniques for fault-tolerant distributed algorithms designed for the real-time model. As our transformations are generic, i.e., work for any algorithm adhering to our conditions, proving their correctness was already a non-trivial exercise in the fault-free case [26], and becomes considerably harder in the presence of failures. We apply our transformation to the well-known problem of Byzantine agreement and analyze the timing properties of the resulting real-time algorithm.
Roadmap: Section 2 gives a brief, informal summary of the computing models and the fundamental problem of real-time analysis, which is followed by a review of related work in Sect. 3. Section 4 restates the formal definitions of the system models and presents the fault-tolerant extensions novel to this paper. The new, fault-tolerant system model transformations and their proofs can be found in Sects. 5 and 6, while Sect. 7 illustrates these transformations by applying them to well-known distributed computing problems.

Informal overview
A distributed system consists of a set of processors and some means for communication. In this paper, we will assume that a processor is a state machine running some kind of algorithm and that communication is performed via message-passing over point-to-point links between pairs of processors.
The algorithm specifies the state transitions that the processor may carry out. In distributed algorithms research, the common assumption is that state transitions are performed in zero time. The question remains, however, as to when these transitions are performed. In conjunction with bounds on message transmission delays, the answer to this question determines the synchrony of the computing model: The time required for one message to be sent, transmitted and received can either be constant (lock-step synchrony), bounded (synchrony or partial synchrony), or finite but unbounded (asynchrony). Note that, when computation times are zero, transmission delay bounds typically represent end-to-end delay bounds: All kinds of delays are abstracted away in one system parameter.

Computing models
The transformations introduced in this paper will relate two different distributed computing models:

1. In what we call the classic synchronous model, processors execute zero-time steps (called actions) and the only model parameters are lower and upper bounds on the end-to-end delays [δ−, δ+]. Note that this assumption does not rule out end-to-end delays that are composed of communication delays plus inter-step time bounds [7].
2. In the real-time model, the zero-time assumption is dropped, i.e., the end-to-end delay bounds are split into bounds on the transmission time of a message (which we will call message delay) [δ−, δ+] and on the actual processing time [μ−, μ+]. In contrast to the actions of the classic model, we call the non-zero-time computing steps in the real-time model jobs. Contrary to the notion of a task in the classic real-time analysis literature, a job in our setting does not represent a significant piece of code but rather a (few) simple machine operation(s).

Figure 1 illustrates the real-time model: p and q are two processors. Processor p receives a message (from some other processor not shown in the diagram) at time 0, represented by an incoming arrow. The box from time 0 to 3 corresponds to the time p requires to process the message, to perform state transitions and to send out messages in response. One of those messages, m, represented by the dotted arrow, is sent to q. It arrives at processor q at time 4, while q is still busy executing the jobs triggered by two messages that arrived earlier. At time 7, q is idle again and can start processing m, represented by the dotted box.
The figure explicitly shows the major timing-related parameters of the real-time model, namely, message delay (δ), queuing delay (ω), end-to-end delay (Δ = δ + ω), and processing delay (μ) for the message m. The bounds on the message delay δ and the processing delay μ are part of the system model (but need not be known to the algorithm). Bounds on the queuing delay ω and the end-to-end delay Δ, however, are not parameters of the system model, in sharp contrast to the classic model. Rather, those bounds (if they exist) must be derived from the system parameters [δ−, δ+], [μ−, μ+] and the message pattern of the algorithm. Depending on the algorithm, this can be a non-trivial problem, and a generic solution to this issue is outside the scope of this paper. The following subsection gives a high-level overview of the problem; the examples in Sect. 7 will illustrate how such a real-time analysis can be performed for simple algorithms by deriving an upper bound on the queuing delay.
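As a plain arithmetic illustration, the delay decomposition Δ = δ + ω can be traced through the Fig. 1 scenario:

```python
# Message m from Fig. 1: the sending job at p starts at time 0,
# m arrives at q at time 4, and q stays busy until time 7.
send_job_start = 0
arrival = 4
busy_until = 7

delta = arrival - send_job_start   # message delay: delta = 4
omega = busy_until - arrival       # queuing delay: omega = 3
Delta = delta + omega              # end-to-end delay: Delta = 7
```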

Real-time analysis
Consider the application of distributed algorithms in real-time systems, where both safety properties (like consistency of replicated data) and timeliness properties (like a bound on the maximum response time for a computation triggered by some event) must be satisfied. In order to assess some algorithm's feasibility for a given application, bounds on the maximum (and minimum) end-to-end delay [Δ−, Δ+] are instrumental: Any relevant time complexity measure obviously depends on end-to-end delays, and even the correctness of synchronous and partially synchronous distributed algorithms [7] may rest on their ability to reliably timeout messages (explicitly or implicitly, via synchronized communication rounds).
Unfortunately, determining [Δ−, Δ+] is difficult in practice: End-to-end delays include queuing delays, i.e., the time a delivered message waits until the processor is idle and ready to process it. The latter depends not only on the computing step times ([μ−, μ+]) and the communication delays ([δ−, δ+]) of the system, but also on the message pattern of the algorithm: The more messages arrive simultaneously at the same destination processor, the larger the queuing delay. In order to compute [Δ−, Δ+], a proper worst-case response time analysis (like in [34]) must be conducted for the end-to-end delays, which has to take into account the worst-case message pattern, computing requirements, failure patterns, etc.
Computing worst-case end-to-end delays is relatively easy in the case of round-based synchronous distributed algorithms, like the Byzantine Generals algorithm [11] analyzed in Sect. 7.2: If one can rely on the lock-step round assumption, i.e., that only round-k messages are sent and received by the processors in round k, their maximum number and hence the resulting queuing and processing delays can be determined easily. Choosing a round duration larger than or equal to the computed maximum end-to-end delay Δ+ is then sufficient to guarantee the lock-step round assumption in the system.
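For instance, under the lock-step round assumption a receiver gets at most n − 1 round-k messages, so a crude bound on the worst-case end-to-end delay (and hence a sufficient round duration) can be computed as follows. This is a simplified sketch of such an analysis, not the exact derivation from Sect. 7.2:

```python
def min_round_duration(n, delta_plus, mu_plus):
    """Crude sufficient round duration for a full message exchange round:
    a message may find up to n - 2 other round messages queued ahead of it
    plus one job already in progress, so its queuing delay is at most
    (n - 1) * mu_plus; the end-to-end delay bound is delta+ plus omega+."""
    omega_plus = (n - 1) * mu_plus     # worst-case queuing delay bound
    return delta_plus + omega_plus     # Delta+ = delta+ + omega+
```

For example, with n = 4, δ+ = 10 and μ+ = 2, any round duration of at least 16 time units preserves the lock-step round assumption under this (pessimistic) analysis.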
In the case of general distributed algorithms, the worst-case response time analysis is further complicated by a circular dependency: The message pattern and computing load generated by some algorithm (and hence the bounds on the end-to-end delays computed in the analysis) may depend on the actual end-to-end delays. In the case of partially synchronous processors [7], for example, the number of new messages generated by a fast processor while some slow message m is still in transit obviously depends on m's end-to-end delay. These new messages can cause queuing delays for m at the receiver processor, however, which in turn affect its end-to-end delay [35]. As a consequence, worst-case response time analyses typically involve solving a fixed point equation [3,34].
Recast in our setting, the following real-time analysis problem (termed worst-case end-to-end delay analysis in the sequel) needs to be solved: Given some algorithm A under failure model C, scheduling policy pol and assumed end-to-end delay bounds [Δ−, Δ+], where the latter are considered as (still) unvalued parameters, and some real system with computing step times [μ−, μ+] and communication delays [δ−, δ+] in which A shall run, develop a fixed point equation [Δ−, Δ+] = F([Δ−, Δ+]) for the end-to-end delay bounds (or show that no such function F(·) can exist, which could happen e.g. if unbounded queuing could develop). Solving this equation provides a feasible assignment of values for the end-to-end delays [Δ−, Δ+] for the algorithm A in the given system, which is sufficient for guaranteeing its correctness: It will never happen that, during any run, any message will experience an end-to-end delay outside [Δ−, Δ+]. Since A is guaranteed to work correctly under this assumption, it will only generate message patterns that do not violate the assumptions made in the analysis leading to [Δ−, Δ+].
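When F is monotone and eventually contracting, such a fixed point equation can be solved by simple iteration. The following sketch, including the made-up load function F, is our illustration, not the paper's analysis:

```python
def solve_end_to_end_bound(F, initial=0.0, tol=1e-9, max_iter=100000):
    """Iterate Delta+ = F(Delta+) until a fixed point is reached; returns
    None if no bounded fixed point is found within max_iter iterations,
    e.g. because queuing grows without bound."""
    delta = initial
    for _ in range(max_iter):
        nxt = F(delta)
        if abs(nxt - delta) <= tol:
            return nxt
        delta = nxt
    return None

# Hypothetical load function: base end-to-end delay of 10 time units, plus
# 2 extra time units of queuing for every 5 time units a message is in transit.
F = lambda d: 10 + 2 * (d / 5)
bound = solve_end_to_end_bound(F)   # converges to 10 / (1 - 0.4)
```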
Note carefully that, once a feasible assignment for [Δ−, Δ+] is known, there is no need to consider the system parameters [δ−, δ+] and [μ−, μ+] further. By "removing" the dependency on the real system's characteristics in this way, the real-time model facilitates a sound real-time analysis without sacrificing compatibility with classic distributed computing analysis techniques and results. Recall that, in the classic model, the end-to-end delays [δ−, δ+] were part of the system model and hence essentially had to be correctly guessed. By virtue of the transformations introduced in the later sections, all that is needed to employ some classic fault-tolerant distributed algorithm in the real-time model is to conduct an appropriate worst-case end-to-end delay analysis and to compute a feasible end-to-end delay assignment.

Related work
All the work on the time complexity of distributed algorithms we are aware of considers end-to-end delays as a model parameter in a zero step-time model. Hence, queuing and scheduling do not occur at all, even in more elaborate examples, e.g., [30]. Papers that assume non-zero step times often consider them sufficiently small to completely ignore queuing effects [27] or assume shared-memory access instead of a message-passing network [1,2].
The only work in the area of fault-tolerant distributed computing we are aware of that explicitly addresses queuing and scheduling is [8]. It introduces the Time Immersion ("late binding") approach, where the real-time properties of an asynchronous or partially synchronous distributed algorithm, e.g. for consensus, are just "inherited" from the underlying system. Nevertheless, somewhat contrary to intuition, guaranteed timing bounds can be determined by a suitable real-time analysis. Their work does not rest on a formal distributed computing model, however.
There are also a few approaches in real-time systems research that aim at an integrated schedulability analysis in distributed systems [17,28,33,34]. However, contrary to the execution of many distributed algorithms, they assume very simple interaction patterns of the processors in the system, and do not consider failures.
Hence, our real-time model seems to be the first attempt to rigorously bridge the gap between fault-tolerant distributed algorithms and real-time systems that does not sacrifice the strengths of the individual views. Our real-time model, the underlying low-level st-traces and our general transformations between real-time model and classic model have been introduced in [20,22] and extended in [24,26]; [20] and [21] analyze clock synchronization issues in this model. The present paper finally adds failures to the picture.
Given that systems with real-time requirements have also been an important target for formal verification for decades, it is appropriate to relate our approach to some important results of verification-related research as well. In fact, verification tools like Kronos [6] or Uppaal [12] based on timed automata [4] have been used successfully for model-checking real-time properties in many different application domains. On the other hand, there are also modeling and analysis frameworks based on various IO automata [9,14,16,18,31], which primarily use interactive (or manual) theorem-proving for verifying implementation correctness via simulation relations.
Essentially, all these frameworks provide the capabilities needed for modeling and analyzing distributed algorithms at the level of our st-traces (see Sect. 4.4). However, to the best of our knowledge, none of these frameworks provides a convenient abstraction comparable to our rt-runs, which allows reasoning about real-time scheduling and queuing effects explicitly and independently of correctness issues: State-based specifications suitable e.g. for Uppaal tightly intertwine the control flow of the algorithms with execution constraints and scheduling policies. This not only leads to very complex specifications, but also rules out the separation of correctness proofs (using classic distributed algorithms results) and real-time analysis (using worst-case response time analysis techniques) made possible by our transformations.

System models
Since the fault-free variants of the classic and the real-time model have already been introduced [24,26], we only restate the most important properties and the fault-tolerant extensions here.

Classic system model
We consider a network of n processors, which communicate by passing unique messages. Each processor p is equipped with a CPU, some local memory, a read-only hardware clock, and reliable, non-FIFO links to all other processors.
The hardware clock HC p : R+ → R+ is an invertible function that maps dense real-time to dense clock-time; it can be read but not changed by its processor. It starts with some initial value HC p (0) and then increases strictly, continuously and without bound.
An algorithm defines initial states and a transition function. The transition function takes the processor index p, one incoming message, the receiver processor's current local state and hardware clock reading as input, and yields a list of states and messages to be sent, e.g. [oldstate, int.st. 1, int.st. 2, msg. m to q, msg. m′ to q′, int.st. 3, newstate], as output. The list must start with the processor's current local state and end with a state. Thus, the single-element list [oldstate = newstate] is also valid.
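To make the shape of such a transition function concrete, here is a toy sketch in Python; the PING/PONG protocol and all names are ours, purely for illustration:

```python
# States are modeled as dicts of (variable name, value) pairs; a message
# to be sent is a (message, destination) pair.
def transition(p, message, state, hc):
    kind, sender = message
    if kind == "PING":
        intermediate = {**state, "pings": state["pings"] + 1}  # int. state
        reply = (("PONG", p), sender)      # message to be sent back
        newstate = {**intermediate, "last_hc": hc}
        # The list starts with the current state and ends with a state:
        return [state, intermediate, reply, newstate]
    return [state]  # the single-element list [oldstate = newstate]

out = transition("p", ("PING", "q"), {"pings": 0}, 17)
```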
If the CPU is unable to perform the transition from oldstate to newstate in an atomic manner, intermediate states (int.st. 1/2/3 in our example) might be present for a short period of time. Since, in the classic model, this time is abstracted away and the state transition from oldstate to newstate is assumed to be instantaneous, these intermediate states are usually neglected. We explicitly model them to retain compatibility with the real-time model, where they will become important.
Formally, we consider a state to be a set of (variable name, value) pairs, containing no variable name more than once. We do not restrict the domain or type of those values, which might range, e.g., from simple Boolean values to lists or complex data structures.
A "message to be sent" (m and m′ in our example) is specified as a pair consisting of the message itself and the destination processor the message will be sent to.
Every message reception immediately causes the receiver processor to change its state and send out all messages according to the transition function (=an action). The complete action (message arrival, processing and sending messages) is performed instantly in zero time.
Actions can be triggered by ordinary, timer or input messages:

- Ordinary messages (m o ) are transmitted over the links. Let δ m denote the difference between the real-time of the action sending some ordinary message m and the real-time of the action receiving it. The classic model defines a lower and an upper bound [δ−, δ+] on δ m , for all m. Since the time required to process a message is zero in the classic model, which also means that no queuing effects can occur, δ m represents both the message (transmission) delay and the end-to-end delay.
- Timer messages (m t ) are used for modeling time(r)-driven execution in our message-driven setting: A processor setting a timer is modeled as sending a timer message m (to itself) in an action, and timer expiration is represented by the reception of a timer message. Timer messages are received when the hardware clock reaches (or has already reached) the time specified in the message.
- Input messages (m i ) arrive from outside the system and can be used to model booting and starting the algorithm, as well as interaction with elements (e.g., users, interfaces) outside the distributed system.

Executions
An execution in the classic model is a sequence ex of actions and an associated set of n hardware clocks HC ex = {HC ex p , HC ex q , . . .}. (We will omit the superscript of HC ex p if the associated execution is clear from context.) An action ac occurring at real-time t at processor p is a 5-tuple, consisting of the processor index proc(ac) = p, the received message msg(ac), the occurrence real-time time(ac) = t, the hardware clock value HC(ac) = HC p (t) and the state transition sequence trans(ac) = [oldstate, . . . , newstate] (including messages to be sent).
A valid execution ex of an algorithm A must satisfy the following properties:

EX1 ex must be a sequence of actions with a well-defined total order ≺. time(ac) must be non-decreasing. Message sending and receiving must be in the correct causal order, i.e., msg(ac′) ∈ trans(ac) ⇒ ac ≺ ac′.
EX2 Processor states can only change during an action, i.e., newstate(ac 1 ) = oldstate(ac 2 ) must hold for two consecutive actions ac 1 and ac 2 on the same processor.
EX3 The first action ac at every processor p must occur in an initial state of A.
EX4 The hardware clock readings must increase strictly (∀t, t′, p : t < t′ ⇒ HC p (t) < HC p (t′)), continuously and without bound.
EX5 Messages must be unique, i.e., there is at most one action sending some message m and at most one action receiving it. Messages can only be sent by and processed by the processors specified in the message.
EX6 Every non-input message that is received must have been sent.
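Some of these conditions are mechanically checkable; the following sketch (our own, with actions as plain dicts) tests the non-decreasing times of EX1, the state continuity of EX2, and the at-most-one-reception part of EX5:

```python
def check_execution(ex):
    """Check a totally ordered list of actions against (parts of) EX1, EX2
    and EX5; each action is a dict with keys time, proc, msg and trans."""
    last_time = float("-inf")
    last_state = {}          # newstate of the previous action, per processor
    received = set()         # messages already received by some action
    for ac in ex:
        if ac["time"] < last_time:                 # EX1: non-decreasing times
            return False
        last_time = ac["time"]
        p = ac["proc"]
        if p in last_state and ac["trans"][0] != last_state[p]:
            return False                           # EX2: oldstate continuity
        last_state[p] = ac["trans"][-1]
        if ac["msg"] in received:                  # EX5: unique reception
            return False
        received.add(ac["msg"])
    return True
```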
Note that further conditions (such as adherence to the bounds on the message delay or the state transitions of the algorithm) will be added by the failure model in Sect. 4.3.
A classic system s is a system adhering to the classic model, parameterized by the system size n and the interval [δ − , δ + ] specifying bounds on the message delay.

Real-time model
The real-time model extends the classic model in the following way: A computing step in a real-time system is executed non-preemptively within system-wide bounds [μ − , μ + ], which may depend on the number of messages sent in a computing step. In order to clearly distinguish a computing step in the real-time model from a zero-time action in the classic model, we use the term job to refer to the former. We consider jobs as the unit of preemption in the real-time model, i.e., a running job cannot be interrupted by the scheduler.
This simple extension makes the real-time model more realistic but also more complex. In particular, queuing and scheduling effects must be taken into account:

- We must now distinguish two modes of a processor at any point in real-time t: idle and busy (i.e., currently executing a job). Since jobs cannot be interrupted, a queue is needed that stores messages arriving while the processor is busy.
- Contrary to the classic model, the state transitions oldstate → · · · → newstate in a single computing step typically occur at different times during the job, allowing an intermediate state to be valid on a processor for some non-zero duration.
- Some scheduling policy is used to select a new message from the queue whenever processing of a job has been completed. To ensure liveness, we assume that the scheduling policy is non-idling. Note that the scheduling policy can also be used for implementing non-preemptible tasks consisting of multiple jobs, if required.
- We assume that the hardware clock can only be read at the beginning of a job. This models the fact that real clocks cannot usually be read arbitrarily fast, i.e., with zero access time. This restriction, in conjunction with our definition of message delays, allows us to define transition functions in exactly the same way as in the classic model. After all, the transition function just defines the "logical" semantics of a transition, but not its timing.
- If a timer set during some job J expires earlier than end(J), the timer message will arrive at time end(J), when J has completed.
- In the classic zero step-time model, a faulty processor can send an arbitrary number of messages to all other processors. This is not an issue when assuming zero step times, but could cause problems in the real-time model: It would allow a malicious node to create a huge number of jobs at any of its peers. Consequently, we must ensure that messages from faulty processors do not endanger the liveness of the algorithm at correct processors. To protect against such "babbling" faulty nodes, each processor is equipped with an admission control component, allowing the scheduler to drop certain messages instead of processing them.
Both the scheduling and the admission control policy are represented by a single function pol : (queue, alg. state, HC reading) → (msg, queue new ), with queue new ⊆ queue, msg ∉ queue new and msg ∈ queue ∪ {⊥}. The non-idling requirement can be formalized as msg = ⊥ ⇒ queue new = ∅.
This function is used whenever a scheduling decision is made, i.e., (a) at the end of a job and (b) whenever the queue is empty, the processor is idle, and a new message just arrived. If msg = ⊥, the scheduling decision causes msg to be processed. "alg. state" refers to the newstate of the job that just finished or last finished, corresponding to cases (a) and (b), respectively, or the initial state, if no job has been executed on that processor yet.
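As a concrete instance of pol, the sketch below combines FIFO scheduling with a simple admission rule; the per-sender budget kept in the algorithm state is our own example of an admission control policy, not something mandated by the model:

```python
def pol(queue, alg_state, hc_reading):
    """Return (msg, queue_new): the message scheduled next (None stands
    for the bottom symbol) and the new queue contents; messages appearing
    in neither are dropped."""
    budget = alg_state.get("budget", {})
    # Admission control: drop messages from senders whose budget is
    # exhausted, protecting against "babbling" faulty nodes.
    admitted = [m for m in queue if budget.get(m["sender"], 1) > 0]
    if not admitted:
        return None, []   # msg = bottom forces queue_new to be empty
    # FIFO scheduling: process the oldest admitted message first.
    return admitted[0], admitted[1:]
```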
Since we assume non-preemptive scheduling of jobs, a message received while the processor is currently busy will be neither scheduled nor dropped until the current job has finished. "Delaying" the admission control decision in such a way has the advantage that no intermediate states can ever be used for admission control decisions.

System parameters
Like the processing delay, the message delay and hence the bounds [δ−, δ+] may depend on the number of messages sent in the sending job: For example, δ+(3) is the upper bound on the message delay of messages sent by a job sending three messages in total. Formally, the interval boundaries δ−, δ+, μ− and μ+ can be seen as functions {0, . . . , n−1} → R+, representing a mapping from the number ℓ of destination processors to which ordinary messages are sent during that computing step to the actual message or processing delay bound. We assume that δ−(ℓ), δ+(ℓ), μ−(ℓ) and μ+(ℓ) as well as the message delay uncertainty ε(ℓ) = δ+(ℓ) − δ−(ℓ) are non-decreasing w.r.t. ℓ. In addition, sending ℓ messages at once must not be more costly than sending those messages in multiple steps; formally, ∀i, j ≥ 1 : μ+(i + j) ≤ μ+(i) + μ+(j).

The delay of a message δ ∈ [δ−, δ+] is measured from the real-time of the start of the job sending the message to the arrival real-time at the destination processor (where the message will be enqueued or, if the processor is idle, the corresponding job starts immediately). This might seem counterintuitive at first glance. However, this was not a technical requirement but rather a deliberate choice: The message delays are in fact bounds on the sum of (a) the time between the start of the job and the actual sending of the message and (b) the actual transmission delay.
Defining the message delay this way makes the model more flexible: If information about the actual sending time of the messages is known (e.g., always approximately in the middle of the job), this information can be used to make the bounds [δ−, δ+] more realistic. Adding (a) to the message delay is justified, since this is a more-or-less constant value, in stark contrast to the queuing delay, which, depending on the system load, might vary between zero and multiple processing delays.
Thus, as a trade-off between accuracy and simplicity, we chose the option where messages are "sent" at the start of processing a job, since it allows at least some information about the actual sending times to be incorporated into the model, without adding additional parameters or making the transition function more complex.
In addition, it is important to note that our model naturally supports a fine-grained modeling of standard "tasks" used in classic real-time analysis papers: Instead of modeling a job as a significant piece of code, a job in our setting can be thought of as consisting of a few simple machine operations: A classic task is then made up of several jobs, which are executed consecutively (and may of course be preempted at job boundaries). Hence, a job involving the sending of a message can be anywhere within the sequence of jobs making up a task.

Real-time runs
A real-time run (rt-run) corresponds to an execution in the classic model. An rt-run consists of a sequence ru of receive events, jobs and drop events, and of an associated set of n hardware clocks HC ru = {HC ru p , HC ru q , . . .}. (Again, the superscript will be omitted if clear from context).
A receive event R for a message arriving at p at real-time t is a triple consisting of the processor index proc(R) = p, the message msg(R), and the arrival real-time time(R) = t. Note that t is the receiving/enqueuing time in Fig. 1.
A job J starting at real-time t on p is a 6-tuple, consisting of the processor index proc(J) = p, the message being processed msg(J), the start time begin(J) = t, the job processing time duration(J), the hardware clock reading HC(J) = HC p (t), and the state transition sequence trans(J) = [oldstate, . . . , newstate]. We define end(J) = begin(J) + duration(J). Figure 1 provides an example of an rt-run containing three receive events and three jobs on the second processor. Note that neither the actual state transition times nor the actual sending times of the sent messages are modeled in a job.
A drop event D at real-time t on processor p consists of the processor index proc(D) = p, the message msg(D), and the dropping real-time time(D) = t. These events represent messages being dropped by the admission control component rather than processed by a job.
Formally, an rt-run ru of some algorithm A must satisfy the following properties:

RU1 ru must be a sequence of receive events, drop events and jobs with a well-defined total order ≺. The begin times (begin(J) for jobs, time(R) and time(D) for receive events and drop events) must be non-decreasing. Message sending, receiving and processing/dropping must be in the correct causal order, i.e., the job sending a message must precede the receive event for that message, which in turn must precede the job processing it or the drop event dropping it.
RU2 Processor states can only change during a job, i.e., newstate(J 1 ) = oldstate(J 2 ) must hold for two consecutive jobs J 1 and J 2 on the same processor.
RU3 The first job J at every processor p must occur in an initial state of A.
RU4 The hardware clock readings must increase strictly, continuously and without bound.
RU5 Messages must be unique, i.e., there is at most one job sending some message m, at most one receive event receiving it, and at most one job processing it or drop event dropping it. Messages must only be sent by and received/processed/dropped by the processors specified in the message.
RU6 Every non-input message that is received must have been sent. Every message that is processed or dropped must have been received.
RU7 Jobs on the same processor do not overlap: If J ≺ J′ and proc(J) = proc(J′), then end(J) ≤ begin(J′).
RU8 Drop events can only occur when a scheduling decision is made, i.e., immediately after a receive event when the processor is idle, or immediately after a job has finished processing.

Failures and admissibility
A failure model indicates whether a given execution or rt-run is admissible w.r.t. a given system running a given algorithm.
In this work, we restrict our attention to the f-f-ρ failure model, which is a hybrid failure model [5,19,35] that incorporates both crash and Byzantine faulty processors. Of the n processors in the system,
– at most f ≥ 0 may crash, and
– at most f ≥ 0 may be arbitrarily faulty ("Byzantine").
All other processors are called correct.
A given execution (resp. rt-run) conforms to the f-f-ρ failure model if all message delays are within [δ−, δ+] and the following conditions hold:
– All timer messages arrive at their designated hardware clock time.
– On all non-Byzantine processors, clocks drift by at most ρ.
– All correct processors make state transitions as specified by the algorithm. In the real-time model, they obey the scheduling/admission policy, and all of their jobs take between μ− and μ+ time units.
– A crashing processor behaves like a correct one until it crashes. In the classic model, the state transition sequence of all actions after the crash contains only the one-element "NOP sequence" [s], i.e., s = oldstate(ac) = newstate(ac). In the real-time model, after a processor has crashed, all messages in its queue are dropped, and every new message arriving will be dropped immediately rather than being processed. Unclean crashes are allowed: the last action/job on a processor might execute only a prefix of its state transition sequence.
In the analysis and the transformation proofs, we will examine given executions and rt-runs. Therefore, we know which processors behaved in a correct, crashing or Byzantine faulty manner. Note, however, that this information is only available during analysis; the algorithms themselves, including the simulation algorithms presented in the following sections, do not know which of the other processors are faulty. The same holds for timing information: While, during analysis, we can say that an event occurred at some exact real time t, the only information available to the algorithm is the local hardware clock reading at the beginning of the job.
Formally, failure models can be specified as predicates on executions and rt-runs. Let Π denote the set of n processors. f-f-ρ is defined as follows; predicates involving faulty processors are underlined.
The predicates obeys_pol(R) and obeys_pol(J) refer to the scheduling and the admission control policy. They are satisfied if the scheduling decision made at the receive event R or at the end of job J causes messages to be dropped and/or a job to be started according to the chosen policy pol.
The table in Fig. 2 formalizes the other predicates used in the definition of f-f-ρ. In Sect. 6, two variants of failure model f-f-ρ will be considered in the real-time model, each adding a further restriction; these variants will be explained in detail there.

State transition traces
The global state of a system is composed of the real time t and the local state s_p of every processor p. Rt-runs do not allow a well-defined notion of global states, since they do not fix the exact time of state transitions in a job. Thus, we use the "microscopic view" of state-transition traces (st-traces) introduced in [24,26] to assign real times to all atomic state transitions.

Definition 1 A state transition event (st-event) represents a change in the global state or the arrival of an input message.
It is
– a tuple (transition: t, p, s, s′), indicating that, at time t, processor p changes its internal state from s to s′, or
– a tuple (input: t, m), indicating that, at time t, input message m arrives from an external source.

Example 2 Let J with trans(J) = [oldstate, msg. m to q, int.st.1, newstate] and proc(J) = p be a job in a real-time run ru. If tr is an st-trace of ru, then it contains two corresponding st-events ev and ev′, one for each of the two state transitions in trans(J).

An st-trace tr contains the set of st-events, the processors' hardware clock readings HC_tr (= HC_ex or HC_ru), and, for every time t, at least one global state g = (s1(g), …, sn(g)).
Note carefully that tr may contain more than one g with time(g) = t. For example, if both transitions in the previous example occur at the same time t, three different global states at time t would be present in the st-trace, with s_p(g) being oldstate, int.st.1 and newstate, respectively. The relation ≺ must also preserve the causality of state transitions connected by a message: For example, if one job has a transition sequence of [s1, s2, msg, s3] and the receipt of msg spawns a job with a transition sequence of [s4, s5] on another processor, the switch from s1 to s2 must occur before the switch from s4 to s5, since there is a causal chain between these two transitions. Clearly, there are multiple possible st-traces for a single rt-run. Executions in the classic model have corresponding st-traces as well, with t = time(ac) for the time t of all st-events corresponding to some action ac.
A problem P is defined as a set of (or a predicate on) st-traces. An execution or an rt-run satisfies a problem if tr ∈ P holds for all its st-traces. If all st-traces of all admissible rt-runs (or executions) of some algorithm in some system satisfy P, we say that this algorithm solves P in the given system.

Running real-time algorithms in the classic model
As the real-time model is a generalization of the classic model, the set of systems covered by the classic model is a strict subset of the systems covered by the real-time model: every system in the classic model corresponds to a real-time system with zero processing times. Thus, every result (correctness or impossibility) for some classic system also holds in the corresponding real-time system with (a) the same message delay bounds, (b) μ−(ℓ) = μ+(ℓ) = 0 for all ℓ, and (c) an admission control component that does not drop any messages. Intuition tells us that impossibility results also hold for the general case, i.e., that an impossibility result for some classic system (n, [δ−, δ+]) holds for all real-time systems whose delay interval contains [δ−, δ+], with arbitrary μ−, μ+, as well, because the additional delays do not provide the algorithm with any useful information.
As it turns out, this conjecture is true: This section will present a simulation (Algorithm 1) that allows us to use an algorithm designed for the real-time model in the classic model and, thus, to transfer impossibility results from the classic to the real-time model (see Sect. 7.1 for an example), provided the following conditions hold:

Cond1
Problems must be simulation-invariant. A problem P is simulation-invariant if there exists a finite set V of variable names such that P can be specified as a predicate on gstates(tr)|V and the sequence of input st-events (which usually takes the form Pred1(input st-events of tr) ⇒ Pred2(gstates(tr)|V)).
Informally, this means that adding variables to some algorithm or changing its message pattern does not influence its ability to solve some problem P, as long as the state transitions of the "relevant" variables V still occur in the same way at the same time.
For example, the classic clock synchronization problem specifies conditions on the adjusted clock values of the processors, i.e., the hardware clock values plus the adjustment values, at any given real time. The problem cares neither about additional variables the algorithm might use nor about the number or contents of messages exchanged.
The advantage of such a problem specification is that algorithms can be run in a (time-preserving) simulation environment and still solve the problem: As long as the algorithm's state transitions are the same and occur at the same time, the simulator may add its own variables and change the way information is exchanged. On the other hand, a problem specification that restricts either the type of messages that might be sent or the size of the local state would not be simulation invariant.
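To make the notion of simulation invariance concrete, the projection gstates(tr)|V can be sketched as follows. This is a minimal illustration of ours, not from the paper: the dict-based state representation and the name project_states are assumptions.

```python
def project_states(gstates, V):
    """Restrict each global state (modeled here as a dict mapping
    processor -> dict of variable name -> value) to the relevant
    variable set V.  A simulation-invariant problem is a predicate
    on this projection (plus the input st-events) only, so any extra
    variables a simulator adds simply vanish under the projection."""
    return [
        {p: {v: s[v] for v in V if v in s} for p, s in g.items()}
        for g in gstates
    ]
```

Under this view, an execution of the simulator and the simulated rt-run yield identical projected state sequences, which is exactly why the simulator still solves P.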

Cond2
The delay bounds in the classic system must be at least as restrictive as those in the real-time system. As long as δ−(ℓ) ≤ δ− and δ+(ℓ) ≥ δ+ holds (for all ℓ), any message delay of the simulating execution (δ ∈ [δ−, δ+]) can be directly mapped to a message delay in the simulated rt-run (Fig. 6a). Thus, a simulated message corresponds directly to a simulation message with the same message delay.

Cond3

Hardware clock drift must be reasonably low. Assume a system with very inaccurate hardware clocks, combined with very accurate processing delays: In that case, timing information might be gained from the processing delay, for example, by increasing a local variable by (μ− + μ+)/2 during each computing step. If ρ, the hardware clock drift bound, is very large and μ+ − μ− is very small, the precision of this simple "clock" might be better than that of the hardware clock. Thus, algorithms might in fact benefit from the processing delay, as opposed to the zero step-time situation.
To avoid such effects, the hardware clock must be "accurate enough" to define (time out) a time span that is guaranteed to lie within μ− and μ+, which requires a suitably small ρ. In this case, the classic system can simulate a delay within μ−(ℓ) and μ+(ℓ) real-time units by waiting for μ̃(ℓ) hardware clock time units.
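Under the common assumption that hardware clock rates stay within [1 − ρ, 1 + ρ] (the paper's exact drift bound and formula for μ̃ are not reproduced here, so this is a sketch under that assumption), a candidate wait time can be derived: waiting T hardware clock units takes between T/(1 + ρ) and T/(1 − ρ) real-time units, so any T with μ−(1 + ρ) ≤ T ≤ μ+(1 − ρ) works, which is feasible iff ρ ≤ (μ+ − μ−)/(μ+ + μ−).

```python
def hc_wait_time(mu_minus, mu_plus, rho):
    """Return a hardware-clock wait duration T such that, for any clock
    rate in [1 - rho, 1 + rho], the real-time wait lands in
    [mu_minus, mu_plus]; return None if no such T exists.

    Waiting T hardware units takes real time in [T/(1+rho), T/(1-rho)],
    so we need T >= mu_minus*(1+rho) and T <= mu_plus*(1-rho)."""
    lo = mu_minus * (1 + rho)
    hi = mu_plus * (1 - rho)
    if lo > hi:
        return None          # clocks too inaccurate for this [mu-, mu+]
    return (lo + hi) / 2     # any value in [lo, hi] works; pick the midpoint
```

The infeasible case (lo > hi) is exactly the situation described above: the clock is too inaccurate to time out a span guaranteed to lie within [μ−, μ+].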
Since HC_p is an unbounded, strictly increasing continuous function (cf. EX4), an inverse function HC_p^−1, mapping hardware clock time to real time, exists.

Overview
The following theorem, which hinges on a formal transformation from executions to rt-runs, represents one of the main results of this paper in a slightly simplified version.
For didactic reasons, the following structure will be used in this section: First, the simulation algorithm, the transformation and a sketch of the correctness proof for Theorem 5 will be presented. Afterwards, we show how Cond2 can be weakened, followed by a full formal proof of correctness.
Cond2 requires delay bounds [δ−, δ+] in the classic system that are at least as restrictive as those in the real-time system. In some cases, such an interval [δ−, δ+] might not exist: consider, e.g., the case in the bottom half of Fig. 6b. After the sketch of Theorem 5's proof, we will show that it is possible to weaken Cond2 while retaining correctness, although this modification adds complexity to the transformation as well as to the algorithm and the proof.

Algorithm
Algorithm S_A,pol,μ̃ (= Algorithm 1), designed for the classic model, allows us to simulate a real-time system and, thus, to use an algorithm A designed for the real-time model to solve problems in a classic system.

Algorithm 1 Simulation algorithm S_A,pol,μ̃, which allows us to simulate the execution of an algorithm designed for the real-time model in the classic model.

The algorithm essentially simulates queuing, scheduling, and execution of real-time model jobs of some duration within μ−(ℓ) and μ+(ℓ); it is parameterized with some real-time algorithm A, some scheduling/admission policy pol and the waiting time μ̃. We define S_A,pol,μ̃ to have the same initial states as A, with the set of variables extended by a queue and a flag idle.
All actions occurring on a non-Byzantine processor within an execution ex of S_A,pol,μ̃ fall into one of the following five groups:
(a) an algorithm message arriving, which is immediately processed,
(b) an algorithm message arriving, which is enqueued,
(c) a (finishedprocessing) timer message arriving, causing some message from the queue to be processed,
(d) a (finishedprocessing) timer message arriving when no messages are in the queue (or all messages in the queue get dropped),
(e) an algorithm message arriving, which is immediately dropped.

Figure 3 illustrates state transitions (a)-(e) in the simulation algorithm: At every point in time, the simulated processor is either idle (variable idle = true) or busy (idle = false). Initially, the processor is idle. As soon as the first algorithm message (i.e., a message other than the internal (finishedprocessing) timer message) arrives [type (a) action], the processor becomes busy and waits for μ̃(ℓ) hardware clock time units (ℓ being the number of ordinary messages sent during that computing step), unless the message gets dropped by the scheduling/admission policy immediately [type (e) action], which would mean that the processor stays idle. All algorithm messages arriving while the processor is busy are enqueued [type (b) action]. After these μ̃(ℓ) hardware clock time units have passed (modeled as a (finishedprocessing) timer message arriving), the queue is checked and a scheduling/admission decision is made (possibly dropping messages). If the queue is empty, the processor returns to its idle state [type (d) action]; otherwise, the next message is processed [type (c) action]. As illustrated in Fig. 4, the first step of the proof that this simulation is correct consists of transforming every execution ex of S_A,pol,μ̃ into a corresponding rt-run of A.
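The control flow of the five action types can be sketched as follows. This is a simplified illustration only: the callbacks pol, process, set_timer and mu_tilde are stand-ins of ours for the admission/scheduling policy, the simulated algorithm step, the (finishedprocessing) timer, and the waiting time μ̃(ℓ); they are not the paper's concrete interfaces.

```python
class Simulator:
    """Sketch of the simulation's control flow (action types (a)-(e)).
    set_timer(d) stands for sending (finishedprocessing) to oneself
    after d hardware-clock time units; process(m) simulates one job of
    A and returns the number of ordinary messages it sent."""

    def __init__(self, pol, process, set_timer, mu_tilde):
        self.queue, self.idle = [], True
        self.pol, self.process = pol, process
        self.set_timer, self.mu_tilde = set_timer, mu_tilde

    def on_algorithm_message(self, m):
        if not self.idle:
            self.queue.append(m)             # type (b): enqueue while busy
            return
        if not self.pol(m, self.queue):      # type (e): dropped, stay idle
            return
        self.idle = False                    # type (a): process immediately
        n_sent = self.process(m)
        self.set_timer(self.mu_tilde(n_sent))

    def on_finishedprocessing(self):
        # scheduling/admission decision: possibly drop queued messages
        self.queue = [m for m in self.queue if self.pol(m, self.queue)]
        if not self.queue:
            self.idle = True                 # type (d): back to idle
        else:
            m = self.queue.pop(0)            # type (c): process next message
            n_sent = self.process(m)
            self.set_timer(self.mu_tilde(n_sent))
```

The pop(0) above assumes a FIFO policy for illustration; the actual ordering is whatever pol dictates.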
By showing that this rt-run is an admissible rt-run of A and that the execution and the rt-run have (roughly) the same state transitions, the fact that the execution satisfies P will be derived from the fact that the rt-run satisfies P.
The transformation ru = T_C→R(ex) constructs an rt-run ru. We set HC_p^ru = HC_p^ex for all p, such that both ex and ru have the same hardware clocks. Depending on the type of action, a corresponding receive event, job and/or drop event in ru is constructed for each action ac on a fault-free processor.

Crashing processors: When a processor crashes in ex, there is some action ac_last that might execute only part of its state transition sequence and that is followed only by actions with "NOP" transitions. All actions up to ac_last are mapped according to the rules above. If ac_last was a type (a) or (c) action that did not succeed in sending out its (finishedprocessing) message, we will, for the purposes of the transformation, assume that such a (finishedprocessing) message with a real-time delay of μ−(ℓ) had been sent; this allows us to construct the corresponding job J_last. If ac_last was not a type (a) or (c) action, let J_last be the job corresponding to the last type (a) or (c) action before ac_last (if such an action exists).
Clearly, all actions in ex occurring between begin(J_last) and end(J_last) are (possibly partial) type (b) actions (before the crash) or NOP actions (after the crash). All of these actions are treated as type (b) actions w.r.t. the transformation, i.e., they are transformed into simple receive events. After J_last has finished, all messages still in the queue plus all messages received during J_last are dropped, i.e., a drop event is created in ru for each of these messages at time end(J_last).
Every action after end(J_last) on this processor (which must be a NOP action) is treated like a type (e) action: It is mapped to a receive event immediately followed by a drop event.
Byzantine processors: On Byzantine processors, every action in the execution is simply mapped to a corresponding receive event and a zero-time job, sending the same messages and performing the same state transitions. Since jobs on Byzantine nodes do not need to obey any timing restrictions, it is perfectly legal to model them as taking zero time.

Special case: timer messages
There is a subtle difference between the classic and the real-time model with respect to the arrives_timely(m_t) predicate of f-f-ρ: In an rt-run, a timer message m_t sent during some job J arrives at the end of the job (end(J)) if the desired arrival hardware clock time (sHC(m_t)) occurs while J is still in progress. On the other hand, in an execution, the timer message always arrives at sHC(m_t).
For T_C→R this means that the transformation rule for type (b) actions changes: If the type (b) action ac for timer message m_t = msg(ac) occurs at some time t = time(ac) while the (finishedprocessing) message corresponding to the simulated job that sent m_t is still in transit, then the corresponding receive event R does not occur at t but rather at t′ = time(ac′), with ac′ denoting the type (c) or (d) action where the (finishedprocessing) message arrives.
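In the rt-run, this rule amounts to taking a simple maximum (a trivial sketch; the function name timer_arrival is our own):

```python
def timer_arrival(scheduled_time, sending_job_end):
    """A timer message sent by job J cannot arrive in the rt-run before
    J itself has finished: if its scheduled arrival time falls within J,
    its receive event is deferred to end(J)."""
    return max(scheduled_time, sending_job_end)
```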
This change ensures that the receive event in the simulated rt-run occurs at the correct time, i.e., no earlier than at the end of the job sending the timer message. One inconsistency still remains, though: The order of the messages in the queue might differ between the simulated queue in the execution (i.e., variable queue) and the queue in the rt-run constructed by T_C→R: In the execution, m_t is added to queue at time t, whereas in the rt-run, m_t is added to the real-time queue at time t′. This could make a difference, for example, when another message arrives between t and t′.
Since S_A,pol,μ̃ "knows" about A, it is obviously possible for the simulation algorithm to detect such cases and reorder queue accordingly. We have decided not to include these details in Algorithm 1, since the added complexity might make it more difficult to understand the main structure of the simulation algorithm. For the remainder of this section, we will assume that such a reordering takes place.

Observations on algorithm S_A,pol,μ̃ and transformation T_C→R
The following can be asserted for every fault-free or not-yet-crashed processor: The following observation follows directly from this lemma and the design of the algorithm:

Proof Since ac is a type (a) or (c) action, newstate(ac).idle = false, which, by Lemma 8, cannot change until no more (finishedprocessing) messages are in transit. By Observation 7, this cannot happen earlier than at hardware clock time T + μ̃(ℓ). Lemma 8 also states that no second (finishedprocessing) message can be in transit simultaneously. Thus, between T and T + μ̃(ℓ), idle = false and only algorithm messages arrive at p, which means that only type (b) actions can occur.

job → (finishedprocessing): Follows from the fact that, due to the rules of T_C→R, jobs only exist in ru if there is a corresponding type (a) or (c) action in ex. These actions send (finishedprocessing) messages, and the mapping of the job length to the delivery time of the (finishedprocessing) message ensures that these messages do not arrive until the job has completed.

Correctness proof (sketch)
This section will sketch the proof idea for Theorem 5, following the outline of Fig. 4. Its main purpose is to prepare the reader for the more intricate proof of Theorem 16.
As defined in Theorem 5, let s = (n, [δ−, δ+]) be a classic system and P be a simulation-invariant problem (Cond1). Let A be an algorithm solving problem P in some real-time system (n, [δ−, δ+], [μ−, μ+]) with some scheduling/admission policy pol under the corresponding failure model. As shown in Lemma 4, Cond3 ensures that the simulation algorithm can simulate a real-time delay between μ−(ℓ) and μ+(ℓ). For each execution ex of S_A,pol,μ̃ in s conforming to failure model f-f-ρ, we create the corresponding rt-run ru according to transformation T_C→R. Applying the formal definitions of a valid rt-run and of failure model f-f-ρ, it can be shown that ru is an admissible rt-run of algorithm A in that real-time system.
Since (a) ru is an admissible rt-run of algorithm A in the real-time system, and (b) A is an algorithm solving P in that system, it follows that ru satisfies P. Choose any st-trace tr_ru of ru where all state transitions are performed at the beginning of the job. Since ru satisfies P, tr_ru ∈ P. Transformation T_C→R ensures that exactly the same state transitions are performed in ex and ru (omitting the simulation variables queue and idle). Since (i) P is a simulation-invariant problem, (ii) tr_ru ∈ P, and (iii) every st-trace tr_ex of ex performs the same state transitions on algorithm variables as some tr_ru of ru at the same time, it follows that tr_ex ∈ P and, thus, ex satisfies P. By applying this argument to every admissible execution ex of S_A,pol,μ̃ in s, we see that every such execution satisfies P. Thus, S_A,pol,μ̃ solves P in s under failure model f-f-ρ.
Proof Analogous to Lemma 4.
Of course, being able to add this delay implies that each algorithm message is wrapped into a simulation message that also includes the value ℓ. The right-hand side of Fig. 6 illustrates the principle of this extended algorithm (Algorithm 2), denoted S_A,pol,δ,μ̃, and the transformation of an execution of S_A,pol,δ,μ̃ into an rt-run.
Interestingly, for S_A,pol,δ,μ̃ to work, Cond1 needs to be strengthened as well. Recall that processors can only send messages during an action or during a job, which, in turn, must be triggered by the reception of a message; this is the exact reason why we need input messages to boot the system! This restriction applies to Byzantine processors as well.
Consider Fig. 6b and assume that (1) the first action/job on the first processor boots the system and that (2) the second processor is Byzantine. Note that messages (m, 2) (in the execution) and m (in the rt-run) are received at different times. Since Byzantine processors can make arbitrary state transitions and send arbitrary messages, in the classic model, the second processor could send out a message m′ right after receiving (m, 2). Let us assume that this happens, and let us call this execution ex′.
Mapping ex′ to an rt-run ru′, however, causes a problem: We cannot map m′ to ru′, since, in the real-time model, the second processor has not received any message yet. Thus, it has not booted; there is no corresponding job that could send m′. Note that this is only an issue during booting: Afterwards, arbitrary jobs could be constructed on the Byzantine processor due to its ability to send timer messages to itself. Since booting is modeled through input messages, we strengthen Cond1 as follows:

Transformation T_C→R revisited
S_A,pol,δ,μ̃ adds an additional layer: The actions of S_A,pol,μ̃ previously triggered by incoming ordinary messages are now caused by an (additionaldelay, m) message instead. Two new types of actions, (f) and (g), can occur: A type (f) action receives a (m, ℓ) pair and sends an (additionaldelay, m) message (possibly with delay 0), and a type (g) action ignores a malformed message. For example, the first action on the second processor in Fig. 6b would be a type (f) action. Since S_A,pol,δ,μ̃ modifies neither queue nor idle, note that Observations 6, 7 and 9 as well as Lemmas 8, 10 and 11 still hold.
In the transformation, actions of type (f) and (g) are ignored; this also holds for NOP actions on crashed processors that would have been type (f) or (g) actions before the crash. Apart from that, the transformation rules of Sect. 5.3 still apply, with the following exceptions. Let a valid ordinary message be a message that would trigger Line 31 in Algorithm 2 after reaching a fault-free recipient (which includes all messages sent by non-Byzantine processors). Note that T_C→R removes the reception of (m, ℓ) and the sending of (additionaldelay, m), since type (f) actions are ignored. Valid ordinary messages sent to a Byzantine processor p are unwrapped as well. Note, however, that on p all actions are transformed to (zero-time) jobs; there is no separation into types (a)-(g), since the processor does not need to execute the correct algorithm. In this case, the "unwrapping" just substitutes (m, ℓ) with m on both the sender and the receiver sides and adds a receiving job J_R (and a matching receive event) for m with a NOP transition sequence on the Byzantine processor at t + δ−(ℓ), with t denoting the sending time of the message. msg(J_R) and msg(R_R), the triggering message of the job and the receive event corresponding to the action receiving the message in ex, are changed to some new dummy timer message, sent by adding it to some earlier job on p. If R_R is the first receive event on p, Cond1' allows us to insert a new input message into ru that triggers R_R. Adding J_R guarantees that the message delays of all messages stay between δ−(ℓ) and δ+(ℓ) in ru. On the other hand, keeping J_R is required to ensure that any (Byzantine) actions performed by ac_R can be mapped to the rt-run and happen at the same time.

Invalid ordinary messages (which can only be sent by Byzantine processors) are removed from the transition sequence of the sending job. To ensure message consistency, we also need to make sure that the message does not appear on the receiving side: If the receiving processor is non-Byzantine, a type (g) action is triggered on the receiver. Since type (g) actions are not mapped to the rt-run, we are done. If the receiver is Byzantine, let J_R be the job corresponding to ac_R, the action receiving the message. As in rule 3, we replace msg(J_R) (and the message of the corresponding receive event) with a timer message sent by an earlier job or with an additional input message. Figure 7 shows an example of valid ordinary messages sent to a non-Byzantine (p1) as well as to a Byzantine (p3) processor. Note that these modifications to T_C→R do not invalidate Lemma 11.

Lemma 13 If ex is a valid execution of S_A,pol,δ,μ̃ under failure model f-f-ρ, then ru = T_C→R(ex) is a valid rt-run of A.
Proof Let red(s) be defined as state s without the simulation variables queue and idle. We will show that conditions RU1-8 defined in Sect. 4.2 are satisfied:

RU1 Applying the T_C→R transformation rules to all actions ac in ex in sequential order (except for the special timer message case discussed in Sect. 5.4) ensures nondecreasing begin times in ru. RU1 also requires message causality: Sending message m in ru occurs at the same time as sending message (m, ℓ) in ex, and receiving message m in ru occurs at the same time as receiving message (additionaldelay, m) in ex (or at the sending time plus δ−, in the case of a Byzantine recipient, cf. Fig. 7).

RU2 On fault-free or crashing processors, J corresponds to some type (a) or (c) action ac with red(newstate(ac)) = newstate(J). The same holds for J′, which corresponds to some type (a) or (c) action ac′ with red(oldstate(ac′)) = oldstate(J′). Assume, for the sake of contradiction, that newstate(J) ≠ oldstate(J′), i.e., red(newstate(ac)) ≠ red(oldstate(ac′)). As EX2 holds in ex, there must be some action ac″ in between ac and ac′ such that red(oldstate(ac″)) ≠ red(newstate(ac″)). This yields two cases, both of which lead to a contradiction: (1) ac″ is a type (a) or (c) action. In that case, there would be some corresponding job J″ with J ≺ J″ ≺ J′ in ru, contradicting the assumption that J and J′ are consecutive jobs. (2) ac″ is a type (b), (d), (e), (f) or (g) action. Since these kinds of actions only change queue and idle, this contradicts red(oldstate(ac″)) ≠ red(newstate(ac″)).

RU3 On Byzantine processors, RU3 follows directly from EX3 due to the tight relationship between actions and jobs. On the other hand, on every non-Byzantine processor p, oldstate(J) of the first job J on p in ru is equal to red(oldstate(ac)) of the first type (a) or (c) action ac on p in ex. Following the same reasoning as in the previous point, we can argue that red(oldstate(ac)) = red(oldstate(ac′)), with ac′ being the first action (of any type) on p in ex.
Since the set of initial states of S_A,pol,δ,μ̃ equals that of A (extended with queue = empty and idle = true), RU3 follows from EX3.

RU4 Follows easily from HC_p^ru = HC_p^ex, the transformation rules of T_C→R and the fact that EX4 holds in ex.

RU5 At most one job sending m: Follows from the fact that, on non-Byzantine processors, every action ac is mapped to at most one job J, trans(J) is an (unwrapped) subset of trans(ac), and EX5 holds in ex. On Byzantine processors, every action ac is mapped to at most one non-NOP job J sending the same messages plus newly-introduced (unique) dummy timer messages. At most one receive event receiving m: This follows from the fact that on non-Byzantine processors, every action ac is mapped to at most one receive event R in ru receiving the same message (unwrapped) and EX5 holds in ex. On Byzantine processors, every action ac is mapped to at most one receive event receiving the same message as ac plus at most one receive event receiving a newly-introduced (unique) dummy timer message. At most one job or drop event processing/dropping m: Since EX5 holds in ex, every message received in ex is unique. On Byzantine processors, the action receiving the message is transformed to exactly one job processing it plus at most one job processing some dummy timer message. On other processors, every message gets unwrapped and put into queue at most once and, since pol is a valid scheduling/admission policy, every message is removed from queue at most once. Transformation T_C→R is designed such that a job or drop event with msg(J/D) = m is created in ru if, and only if, m gets removed from queue in the corresponding action.
Correct processor specified in the message: Follows from the fact that EX5 holds in ex and that T_C→R does not change the processor at which messages are sent, received, processed or dropped.

RU6 Assume that there is some message m that has been received but not sent. Due to the rules of T_C→R, neither (finishedprocessing) nor (additionaldelay) messages are received in ru. The construction also ensures that dummy timer messages on Byzantine processors are sent before being received. Thus, m must be an algorithm message. If m is a timer message, no unwrapping takes place, so there must be a corresponding action receiving m in ex. Since EX6 holds in ex, there must be an action ac sending m. As m is an algorithm message and all actions sending algorithm timer messages (type (a) and (c), or actions on Byzantine processors) are transformed to jobs sending the same timer messages at the same time, we have a contradiction. If m is an ordinary message received by a non-Byzantine processor, it has been unwrapped in the transformation, i.e., there is a corresponding (additionaldelay, m) message in ex, created by a type (f) action. This type (f) action has been triggered by a (m, ℓ) message, which, according to EX6, must have been sent in ex. As in the previous case, we can argue that an action sending an algorithm message must be of type (a), (c) or from a Byzantine processor. Thus, it is transformed into a job in ru, and the transformation ensures that the action sending (m, ℓ) is replaced by a job sending m, a contradiction. Likewise, if m is received by a Byzantine processor, there is a corresponding action receiving (m, ℓ) in ex and the same line of reasoning can be applied.

RU7 Consider two jobs J ≺ J′ on the same non-Byzantine processor proc(J) = p = proc(J′). T_C→R ensures that there is a corresponding type (a) or (c) action for every job in ru. Let ac and ac′ be the actions corresponding to J and J′ and note that time(ac) = begin(J) and time(ac′) = begin(J′). Lemma 10 implies that ac′ cannot occur until the (finishedprocessing) message sent by ac has arrived. Since duration(J) is set to the delivery time of the (finishedprocessing) message in T_C→R, J′ cannot start before J has finished. On Byzantine processors, jobs cannot overlap since they all have a duration of zero.

RU8 Drop events occur in ru only when there is a corresponding type (c), (d) or (e) action on a non-Byzantine processor in ex. Type (c) and (d) actions are triggered by a (finishedprocessing) message arriving; thus, by Lemma 11, there is a job in ru finishing at that time. W.r.t.
type (e) actions, Observation 9 shows that p is idle in ex when a type (e) action occurs, which, by Lemma 8, means that no (finishedprocessing) message is in transit and, thus, by Lemma 11, there is no job active in ru. Therefore p is idle in ru and T_C→R ensures that a receive event occurs at the time of the type (e) action (cf. Lemma 12). Since the transmission of (m_o, ℓ) requires between δ− and δ+ time units, a total of

Observation 15
Fix some processor p ∈ F, and let ac_last be the first action ac on p for which is_last(ac) holds. If ac_last is a type (a) or (c) action, is_last(J) holds for the job J corresponding to ac_last. Otherwise, is_last(J) holds for the job J corresponding to the last type (a) or (c) action on p before ac_last.
In the following, let J_last for some fixed processor p denote the job J for which is_last(J) holds.
Correct processors: Observe that, due to the design of S_A,pol,δ,μ and T_C→R, variable queue in ex represents the queue state of ru. Every receive event in ru occurring while the processor is idle corresponds to either a type (a) or a type (e) action. In every such action, a scheduling decision according to pol is made (Line 11), and T_C→R ensures that either a drop event (type (e) action) or a job (type (a) action) according to the output of that scheduling decision is created.
Crashing processors: Fix some processor p ∈ F and let ac_last be the first action ac on p satisfying is_last(ac). For all actions on p up to (and including) ac_last (or for all actions, if no such ac_last exists), the transformation rules are equivalent to those for correct processors and, thus, the above reasoning applies for all receive events on p prior to J_last (cf. Observation 15). The transformation rules for messages received on crashing processors (Sect. 5.8) ensure that all receive events satisfy either obeys_pol(R) (if received during J_last: no scheduling decision, neither job start nor message drop, is made) or arrives_after_crash(R) and drops_msg(R) (if received after J_last has finished processing: the message is dropped immediately).
Correct processors: The same reasoning as in the previous point applies: Every job in ru finishing corresponds to a type (c) or (d) action in ex in which the (finishedprocessing) message representing that job arrives. Both of these actions cause a scheduling decision (Line 11) to be made on queue (which corresponds to ru's queue state), and corresponding drop events and/or a corresponding job (only for type (c) actions) are created by T_C→R.
Crashing processors: For all jobs before J_last, the same reasoning as for correct processors applies. The transformation rules ensure that all messages that have not been processed or dropped before get dropped at end(J_last).
Correct processors: Let ac be the type (a) or (c) action corresponding to J. ac executes all state transitions of A (Line 17) for either msg(ac) (type (a) action) or some message from the queue (type (c) action) and the current hardware clock time, plus some additional operations that only affect the variables queue and idle and (finishedprocessing) messages. Thus, T_C→R's choice of HC(J), msg(J) and trans(J) ensures that trans(J) conforms to algorithm A.
Crashing processors: For all jobs before J_last, the same reasoning as for correct processors applies. Since J_last corresponds either to ac_last (which also satisfies follows_alg_partially) or to some earlier type (a) or (c) action (which satisfies follows_alg), follows_alg_partially(J_last) is satisfied.

Crashing processors: For all jobs before J_last, the same reasoning as for correct processors applies. If ac, the action corresponding to J_last, was able to successfully send a (finishedprocessing) message, the above reasoning holds for J_last as well. Otherwise, the transformation rules (Sect. 5.3) ensure that J_last takes exactly μ−(ℓ) time units, with ℓ denoting the number of ordinary messages that would have been sent in the non-crashing case, as required by is_timely_job.
Follows from the definition that HC^ru_p = HC^ex_p and the fact that the corresponding bounded_drift condition holds in ex. To show that ex satisfies P, we must show that tr′ ∈ P holds for every st-trace tr′ of ex. Let tr′ be an st-trace of ex, and let tr′/t be the list of all transition st-events in tr′ [D2]. We will construct some transition list tr/t from tr′/t by sequentially performing these operations for the transition st-events of all non-Byzantine processors:

Transformation proof
1. Remove the variables queue and idle from all states.

2. Remove any transition st-events that only manipulate queue and/or idle. Note that, due to the previous step, these st-events satisfy oldstate = newstate.
Since P is a simulation-invariant problem, there is some finite set V of variable names such that P is a predicate on global states restricted to V and the sequence of input st-events (cf. Definition 3). Since the variables queue and idle in algorithm S_A,pol,δ,μ could be renamed arbitrarily, we can assume w.l.o.g. that queue ∉ V and idle ∉ V. Examining the list of operations in the definition of tr reveals that gstates(tr′)|_V = gstates(tr)|_V, for every tr having the same transition st-events as tr/t [L3].
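The claim gstates(tr′)|_V = gstates(tr)|_V only involves the restriction of global states to the variable names in V. A minimal sketch, under our own assumption that states are modeled as Python dicts, of why removing the simulation variables queue and idle cannot affect this restriction:

```python
def restrict(state, V):
    """Restriction of a (global) state, modeled as a dict, to the
    variable names in V; everything outside V is invisible to P."""
    return {name: val for name, val in state.items() if name in V}

# Hypothetical problem-relevant variables; queue and idle are not in V.
V = {"x", "round"}
full = {"x": 3, "round": 1, "queue": ["m1"], "idle": False}
stripped = {"x": 3, "round": 1}  # after step 1 of the construction of tr/t

# The projections onto V coincide, so P cannot distinguish the traces.
assert restrict(full, V) == restrict(stripped, V)
```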
We now show that tr/t is the transition sequence of some st-trace of ru where all transitions happen at the very beginning of each job.
- Every job in ru on a non-Byzantine processor is correctly mapped to transition st-events in tr/t: Every job J in ru is based on either a type (a) or a type (c) action ac in ex. According to Sect. 4, the transition st-events produced by mapping ac are the same as the st-events produced by mapping J, except that the st-events mapped by ac contain the simulation variables. However, these have been removed by the transformation from tr′/t to tr/t.⁸

⁸ To aid the reader in following the arguments of this proof, we will label assumptions, definitions and lemmas used solely in this proof in bold face, e.g. [D2].

- Every transition st-event in tr/t on a non-Byzantine processor corresponds to a job in ru: Every st-event in tr′/t is based on an action ac in ex. Since the transformation tr′/t → tr/t does not add any st-events, every st-event in tr/t is based on an action ac in ex as well. Since all st-events only modifying queue and idle have been removed, tr/t only contains the st-events corresponding to some type (a) or (c) action in ex.
The st-events in tr′/t contain the transition st-events of A-process_message(msg, current_hc) and additional steps taken by the simulation algorithm. The transformation from tr′/t to tr/t ensures that these additional steps (and only these) are removed. Thus, the remaining st-events in tr correspond to the job J corresponding to ac.

Running classic algorithms in the real-time model
When running a real-time model algorithm in a classic system, as shown in the previous section, the st-traces of the simulated rt-run and the ones of the actual execution are very similar: Ignoring variables solely used by the simulation algorithm, it turns out that the same state transitions occur in the rt-run and in the corresponding execution.
Unfortunately, this is not the case for transformations in the other direction, i.e., running a classic model algorithm in a real-time system: The st-traces of a simulated execution are usually not the same as the st-traces of the corresponding rt-run. While all state transitions of some action ac at time t always occur at this time, the transitions of the corresponding job J take place at some arbitrary time between t and t + duration(J). Thus, there could be algorithms that solve a given problem in the classic model but fail to do so in the real-time model.

Algorithm 3 Simulation algorithm S_A, which allows running an algorithm for the classic model in the real-time model.
1: local state (= global variables of A)
2:
3: procedure S_A-process_message(msg, current_hc)
4:   A-process_message(msg, current_hc)

Fortunately, however, it is possible to show that if some algorithm solves some problem P in some classic system, the same algorithm can be used to solve a variant of P, denoted P*_μ+, in some corresponding real-time system, where the end-to-end delay bounds Δ− and Δ+ of the real-time system equal the message delay bounds δ− and δ+ of the simulated classic system. For the fault-free case, this has already been shown [26].
The key to this transformation is a very simple simulation: Let S_A (= Algorithm 3) be an algorithm for the real-time model, comprising exactly the same initial states and transition function as a given classic model algorithm A.
The major problem here is the circular dependency of the algorithm A on the real end-to-end delays and vice versa: On one hand, the classic model algorithm

Definition 17 [26] Let tr be an st-trace based on some execution ex or rt-run ru. A μ+-shuffle of tr is constructed by moving transition st-events in tr at most μ+(ℓ) time units into the future. These operations must preserve causality in a sense similar to the well-known happened-before relation [10]: Every st-event may be shifted by a different value v, 0 ≤ v ≤ μ+(ℓ). In addition, input st-events may be moved arbitrarily far into the past.⁹ P*_μ+ is the set of all μ+-shuffles of all st-traces of P.¹⁰

Example 18 Consider the classic Mutual Exclusion problem for P, and assume that there is some algorithm A solving this problem in the classic model. When running S_A in the real-time model, the situation depicted in Fig. 8 can occur: As the actual state transitions can occur at any time during a job (marked as ticks in the figure), it may happen that, at a certain time (marked as a dotted vertical line), p has entered the critical section although q has not left yet. This situation arises because P*_μ+ is a weaker problem than mutual exclusion; in other words, S_A only solves mutual exclusion with up to μ+-second overlap.
On the other hand, assume that P is the 3-second gap mutual exclusion problem, defined by the classic mutual exclusion properties and the additional requirement that all processors must have left the critical section for more than 3 seconds before the critical section can be entered again by some processor. In that case, P*_μ+ with μ+ = 2 seconds is the 1-second gap mutual exclusion problem. Thus, if A

⁹ For practical purposes, this condition can be weakened from "arbitrarily far" to "the length of the longest busy period".
¹⁰ Recall from Sect. 4.4 that a problem is defined as a set of st-traces.

Nevertheless, it turns out that most classic mutual exclusion algorithms work correctly in the real-time model. The reason is that these algorithms in fact solve a stronger problem: Let P be causal mutual exclusion, defined by the classic mutual exclusion properties and the additional requirement that every state transition in which a processor enters a critical section must causally depend on the last exit. Since shuffles must not violate causality, in this case P*_μ+ = P, and the same algorithm used for some classic system can also be used in a real-time system with a feasible end-to-end delay assignment.
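The per-event shift condition of a μ+-shuffle (Definition 17) can be checked mechanically. The sketch below is a minimal illustration under our own assumptions: events are (id, time) tuples and μ+ is a constant bound; causality preservation and the special treatment of input st-events are omitted.

```python
def is_transition_shift_valid(original, shuffled, mu_plus):
    """original/shuffled: lists of (event_id, time) for the transition
    st-events of a trace and a candidate shuffle of it, in matching
    order. Checks 0 <= v <= mu_plus for the shift v of every st-event.
    (Causality and input st-events are not checked in this sketch.)"""
    for (eid, t), (eid2, t2) in zip(original, shuffled):
        if eid != eid2:
            return False        # shuffles may shift events, not replace them
        v = t2 - t
        if not (0 <= v <= mu_plus):
            return False        # only shifts into the future, bounded by mu_plus
    return True

assert is_transition_shift_valid([("e1", 1.0), ("e2", 2.0)],
                                 [("e1", 1.5), ("e2", 2.0)], mu_plus=1.0)
# Shifting into the past is not a valid mu+-shuffle of a transition st-event:
assert not is_transition_shift_valid([("e1", 1.0)], [("e1", 0.5)], mu_plus=1.0)
```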

Conditions
Theorem 22 will show that the following conditions are sufficient for the transformation to work in the fault-tolerant case:

Cond1
There is a feasible end-to-end delay assignment.

Cond2 The scheduling/admission policy (a) only drops irrelevant messages and (b) schedules input messages in FIFO order. More specifically: (a) Only messages that would have caused a job J with a NOP state transition may be dropped. For example, these could be messages that obviously originate from a faulty sender or, in round-based algorithms, late messages from previous rounds. (b) If input messages m1 and m2 are in the queue and m1 has been received before m2, then m2 must not be dropped or processed before m1 has been dropped or processed.

Cond3 The algorithm tolerates late timer messages, and the scheduling policy ensures that timer messages get processed soon after being received. In the classic model, a timer message scheduled for hardware clock time T gets processed at time T. In the real-time model, on the other hand, the message arrives when the hardware clock reads T, but it might get queued if the processor is busy. Still, an algorithm designed for the classic model might depend on the message being processed exactly at hardware clock time T. Thus, either (a) the algorithm must be tolerant to timers being processed later than their designated arrival time, or (b) the scheduling policy must ensure that timer messages do not experience queuing delays, which might not be possible, since we assume a non-idling and non-preemptive scheduler.
Cond3 is a combination of those options: The algorithm tolerates timer messages being processed up to α real-time units after the hardware clock read T , and the scheduling policy ensures that no timer message experiences a queuing delay of more than α. Options (a) and (b) outlined above correspond to the extreme cases of α = ∞ and α = 0.
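To make the α in Cond3 concrete: on a non-idling, non-preemptive processor, the queuing delay of a timer message is the gap between its arrival and the start of its job. The following sketch (the event format is our own, hypothetical encoding) computes the worst-case timer queuing delay of a given arrival pattern, i.e., the smallest α the scheduler can guarantee for it:

```python
def max_timer_queuing_delay(messages):
    """messages: list of (arrival_time, processing_duration, kind) with
    kind in {"timer", "input"}. Simulates a non-idling, non-preemptive
    processor handling messages in arrival (FIFO) order and returns the
    largest queuing delay experienced by any timer message."""
    busy_until = 0.0
    worst = 0.0
    for arrival, duration, kind in sorted(messages):
        start = max(busy_until, arrival)   # non-idling: start as early as possible
        if kind == "timer":
            worst = max(worst, start - arrival)
        busy_until = start + duration      # non-preemptive: job runs to completion
    return worst

# A timer arriving while a long input job is running gets queued for 2.0 units:
assert max_timer_queuing_delay([(0.0, 3.0, "input"), (1.0, 0.5, "timer")]) == 2.0
```

This also illustrates why option (b) with α = 0 is generally unattainable under non-preemptive scheduling: a timer arriving mid-job is necessarily delayed.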
These requirements can be encoded in failure models: f-f-ρ+latetimers_α, a failure model on executions in the classic model, is weaker than f-f-ρ, since timer messages may arrive late by at most α seconds in the former. On the other hand, f-f-ρ+precisetimers_α, a failure model on rt-runs in the real-time model that restricts timer message queuing by the scheduler to at most α seconds, is stronger than f-f-ρ. See Sect. 4.3 for the formal definition of these models.

The transformation T R→C from rt-runs to executions
As shown in Fig. 9, the proof works by transforming every rt-run of S_A into a corresponding execution of A. By showing that (a) this execution is an admissible execution of A (w.r.t. f-f-ρ+latetimers_α) and (b) the execution and the rt-run have (roughly) the same state transitions, the fact that the rt-run satisfies P*_μ+ can be derived from the fact that the execution satisfies P. This transformation, ex = T_R→C(ru), works by

- mapping each job J in ru to an action ac in ex, with time(ac) = begin(J),
- mapping each drop event D in ru to a NOP action ac in ex,
- setting HC^ex_p = HC^ru_p for all p.

Receive events in ru are ignored.
The following sections will show the correctness of the transformation.
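The job/drop/receive mapping of T_R→C described above can be sketched directly. The dict-based event encoding below is our own illustration, not the paper's formalism; hardware clocks are simply copied by the caller.

```python
def transform_R_to_C(rt_run_events):
    """T_R→C: map an rt-run (list of event dicts) to an execution
    (list of action dicts). Jobs become actions at begin(J), drop
    events become NOP actions, and receive events are ignored."""
    actions = []
    for ev in rt_run_events:
        if ev["kind"] == "job":
            actions.append({"time": ev["begin"], "proc": ev["proc"],
                            "msg": ev["msg"], "trans": ev["trans"]})
        elif ev["kind"] == "drop":
            actions.append({"time": ev["time"], "proc": ev["proc"],
                            "msg": ev["msg"], "trans": "NOP"})
        # ev["kind"] == "receive": ignored by the transformation
    return sorted(actions, key=lambda a: a["time"])

ru = [{"kind": "job", "begin": 1.0, "proc": "p", "msg": "m1", "trans": "t1"},
      {"kind": "receive", "time": 1.2, "proc": "p", "msg": "m2"},
      {"kind": "drop", "time": 2.0, "proc": "p", "msg": "m2"}]
assert [a["trans"] for a in transform_R_to_C(ru)] == ["t1", "NOP"]
```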

Lemma 19 If ru is a valid rt-run of S_A, then ex = T_R→C(ru) is a valid execution of A.
Proof EX1-6 (cf. Sect. 4.1) are satisfied in ex: EX1 follows from RU1 by ordering the actions like their corresponding jobs and drop events. EX2 follows from RU2 and the fact that the order of jobs in ru corresponds to the order of actions in ex, that the transition sequence is not changed and that the "correct" state is chosen for actions corresponding to drop events. EX3 is a direct consequence of RU3 and the fact that both ru and ex run the same algorithm (i.e., use the same initial state). Since ru and ex use the same hardware clocks, RU4 suffices to satisfy EX4. EX5 follows directly from RU5, and EX6 follows from RU6. Thus, ex is a valid execution of A.

Lemma 20
For every message m in ex, the message delay δ_m is equal to the end-to-end delay Δ_m of its corresponding message m in ru.
Proof By construction of ex, the sending time of every message stays the same (time(ac) = begin(J), with ac and J being the sending action/job; recall that message delays are measured from the start of the sending job). For dropped messages, the drop time in ru equals the receiving/processing time in ex (time(ac) = time(D), with ac being the processing action and D being the drop event). For other messages, the processing time in ru equals the receiving/processing time in ex (time(ac) = begin(J), with ac being the processing action and J being the processing job).

Failure model compatibility

Before the processor crashes: The same arguments hold for all jobs J ≺ J_last on p and all drop events before J_last. Thus, (a) also holds for their corresponding actions.

During the crash: For J = J_last, the definition of follows_alg_partially(ac)/(J) directly translates to the corresponding action ac_last. Since there are no jobs J ≻ J_last on p, only actions based on drop events can occur on p after ac_last, causing ac_last to satisfy is_last(ac_last). Thus, ac_last satisfies (b).

After the processor crashes: By definition of is_last(J), no jobs occur in ru after a processor has crashed. Drop events occurring after a processor has crashed need not (and usually will not) obey the scheduling policy: Messages received and queued before the last job are dropped directly after that job (see predicate drops_all_queued(J)), and messages received afterwards are dropped immediately (see predicate arrives_after_crash(R)). Since ac_last ≺ ac holds for all actions ac corresponding to such drop events (on some processor p), (c), arrives_after_crash(ac), is satisfied.
Follows from the equivalent condition in f-f-ρ+precisetimers_α and the fact that T_R→C ensures that HC^ex_p = HC^ru_p for all p. To show that ru satisfies P*_μ+, we must show that every st-trace tr of ru is a μ+-shuffle of an st-trace tr′ of ex. Let tr be an st-trace of ru. Since pol ensures that input messages are processed in FIFO order (Cond2), the above operations are an inverse subset of the μ+-shuffle operations (see Definition 17); thus, tr is a μ+-shuffle of tr′ [L2]. Still, we need to show that tr′ is an st-trace of ex:

Transformation proof
- Every action in ex is correctly mapped to st-events in tr′: Every job J in ru is mapped to an action ac in ex and a sequence of transition st-events in tr (plus at most one input st-event corresponding to J's receive event). There are two differences between the mapping of some job J to st-events and the mapping of the corresponding action ac to st-events:
  - The transition st-events all occur at the same time time(ac) when mapping an action. The construction of tr′ ensures that this is the case.
  - If msg(ac) is an input message, the corresponding input st-event occurs at the same time as the action processing it. Since ru satisfies RU6, there is also such an input st-event in tr and, thus, in tr′. The construction of tr′ ensures that this input st-event has the correct position in tr′.

Examples
In previous work [26], the fault-free variants of the transformations were applied to the problem of terminating clock synchronization; the results are summarized in Sect. 7.1.
To illustrate the theorems established in this work, we apply them to the Byzantine Generals problem-a wellknown agreement problem that also incorporates failures. Section 7.2 will demonstrate that the comparatively simple worst-case end-to-end delay analysis made possible by our transformations is competitive with respect to the optimal solution.

Terminating clock synchronization
In the absence of clock drift and failures, clock synchronization is a one-shot problem: Once the clocks are synchronized to within some bound γ, they stay synchronized forever. In the classic system model, a tight bound of γ = (δ+ − δ−)(1 − 1/n) on the clock precision (also termed skew) is well known [13]. Applying our transformations to this problem yields the following results [26]:

Lower bound: The impossibility of achieving a precision better than (δ+ − δ−)(1 − 1/n) translates to an impossibility of a precision better than (δ+_(1) − δ−_(1))(1 − 1/n) in the real-time model (cf. Cond2' in Sect. 5 and Theorem 11 of [26]).
Informally speaking, the argument goes as follows: Assume by way of contradiction that an algorithm A achieving a precision better than (δ+_(1) − δ−_(1))(1 − 1/n) in the real-time model exists. We can now use the transformation presented in [26], which is essentially a simple, non-fault-tolerant variant of this paper's Sect. 5, to construct a classic algorithm S_A,pol,δ,μ achieving a precision better than (δ+ − δ−)(1 − 1/n). Since the latter is known to be impossible, no such algorithm A can exist.
Since A depends on δ− and δ+, S_A depends on Δ− and Δ+ (cf. Cond1 in Sect. 6). However, due to the simplicity of the algorithm, the message pattern created by A (and, thus, by S_A) does not depend on the actual values of δ− and δ+ (or Δ− and Δ+, respectively). When running S_A, the worst case with respect to queuing times occurs when n − 1 messages arrive simultaneously at one processor that has just started broadcasting its clock value. Thus, Δ+ can be bounded by δ+_(n−1) + μ+_(n−1) + (n − 2)μ+_(0) (cf. Theorem 10 of [26]). Since every action of A sends either 0 or n − 1 messages, Δ− in S_A turns out to be δ−_(n−1). Since (δ+ − δ−)(1 − 1/n) translates to (Δ+ − Δ−)(1 − 1/n) during the transformation, the resulting algorithm S_A can synchronize clocks to within (δ+_(n−1) + μ+_(n−1) + (n − 2)μ+_(0) − δ−_(n−1))(1 − 1/n).

Thus, applying these transformations leaves a gap relative to what was a tight bound in the classic model. As a consequence, more intricate algorithms are required to achieve optimal precision in the real-time model. In fact, [26] also shows that a tight precision bound of (δ+_(1) − δ−_(1))(1 − 1/n) can be obtained by using an algorithm specifically designed for the real-time model. On the other hand, the transformed algorithm is still quite competitive and much easier to obtain and to analyze.
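The two precision bounds above can be compared numerically. This is a sketch under our own assumptions: the message-count-dependent bounds δ±_(ℓ) and μ+_(ℓ) are supplied as Python functions, and the example values below are purely hypothetical.

```python
def transformed_precision(n, delta_minus, delta_plus, mu_plus):
    """Precision achieved by the transformed algorithm S_A:
    (δ+_(n−1) + μ+_(n−1) + (n−2)·μ+_(0) − δ−_(n−1)) · (1 − 1/n).
    delta_minus/delta_plus/mu_plus map a message count ℓ to a time bound."""
    Delta_plus = delta_plus(n - 1) + mu_plus(n - 1) + (n - 2) * mu_plus(0)
    Delta_minus = delta_minus(n - 1)
    return (Delta_plus - Delta_minus) * (1 - 1 / n)

def optimal_precision(n, delta_minus, delta_plus):
    """Tight bound achievable by an algorithm designed for the
    real-time model: (δ+_(1) − δ−_(1)) · (1 − 1/n)."""
    return (delta_plus(1) - delta_minus(1)) * (1 - 1 / n)

# Hypothetical bounds, growing linearly with the message count ℓ:
dm = lambda l: 1.0 + 0.1 * l
dp = lambda l: 2.0 + 0.2 * l
mp = lambda l: 0.1 + 0.05 * l

# The transformed bound is weaker (larger) than the optimal one, as expected:
assert transformed_precision(4, dm, dp, mp) > optimal_precision(4, dm, dp)
```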

The Byzantine generals
We consider the Byzantine Generals problem [11], which is defined as follows: A commanding general must send an order to his n − 1 lieutenant generals such that

IC1 All loyal lieutenants obey the same order.
IC2 If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
In the context of computer science, generals are processors, orders are binary values and loyal means fault-free. It is well known that f Byzantine faulty processors can be tolerated if n > 3f. The difficulty in solving this problem lies in the fact that a faulty processor might send out asymmetric information: The commander, for example, might send value 0 to the first lieutenant, value 1 to the second lieutenant and no message to the remaining lieutenants. Thus, the lieutenants (some of which might be faulty as well) need to exchange information afterwards to ensure that IC1 is satisfied.
Lamport et al. [11] present an "oral messages" algorithm, which we will call A: Initially (round 0), the value from the commanding general is broadcast. Afterwards, every round basically consists of broadcasting all information received in the previous round. After round f, the non-faulty processors have enough information to make a decision that satisfies IC1 and IC2.
What makes this algorithm interesting in the context of this paper is the fact that (a) it is a synchronous round-based algorithm and (b) the number of messages exchanged during each round increases exponentially: After receiving v from the commander in round 0, lieutenant p sends "p : v" to all other lieutenants in round 1 (and receives such messages from the others).¹¹ In round 2, it relays those messages, e.g., processor q would send "q : p : v", meaning "processor q says: (processor p said: (the commander said: v))", to all processors except p, q and the commander. More generally, in round r ≥ 2, every processor multicasts #_S = (n − 2)···(n − r) messages, each sent to n − r − 1 recipients, and receives #_R = (n − 2)···(n − r − 1) messages.¹²

Implementing synchronous rounds in the classic model is straightforward when the clock skew is bounded; for simplicity, we will hence assume that the hardware clocks are perfectly synchronized. At the beginning of a round (at some hardware clock time t), all processors perform some computation, send their messages and set a timer for time t + δ+, after which all messages for the current round have been received and processed and the next round can start.
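The per-round message counts follow directly from the formulas above; a small sketch:

```python
def num_sent(n, r):
    """#_S = (n−2)(n−3)···(n−r): messages multicast by each processor
    in round r >= 2 of the oral messages algorithm."""
    prod = 1
    for k in range(2, r + 1):
        prod *= n - k
    return prod

def num_received(n, r):
    """#_R = (n−2)···(n−r−1) = #_S · (n−r−1): messages received per
    processor in round r."""
    return num_sent(n, r) * (n - r - 1)

# n = 7 generals: in round 2, each processor multicasts 5 messages
# (to n − r − 1 = 4 recipients each) and receives 20 messages.
assert num_sent(7, 2) == 5
assert num_received(7, 2) == 20
```

The exponential growth is visible immediately: for n = 7, #_S grows from 5 in round 2 to 20 in round 3.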
We model these rounds as follows: The round start is triggered by a timer message. The triggered action, labeled C, (a) sets a timer for the next round start and (b) initiates the broadcasts (using a timer message that expires immediately). The broadcasts are modeled as #_S actions on each processor (labeled S), connected by timer messages that expire immediately. Likewise, the #_R actions receiving messages are labeled R.
Since the algorithm is simple, it is intuitively clear what needs to be done in order to make this algorithm work in the real-time model: We need to determine the longest possible round duration W (in the real-time model), i.e., the maximum time required for any one processor to execute all its C, S and R jobs, and change the delay of the "start next round" timer from δ+ to this value. Figure 10 shows examples of running a round of the algorithm in the real-time model.
Let us take a step back and examine the problem from a strictly formal point of view: Given algorithm A, we will try to satisfy Cond1, Cond2 and Cond3, so that the transformation of Sect. 6 can be applied.
For this example, let us restrict our failure model to a set of f processors that produce only benign message patterns, i.e., a faulty processor may crash or modify the message contents arbitrarily, but it must not send additional messages or send the messages at a different time (than a fault-free processor would).

Fig. 10 Rounds with durations W_a, W_b and W_c, critical path highlighted [25]. (a) W_a: message transmission faster than total send job processing. (b) W_b: message transmission slower than total send job. (c) W_c: fast receive job processing.

All processors transmit and process all pending round r messages by choosing a scheduling policy that favors C jobs before S jobs before R jobs. Formally, this can be shown by a simple induction on r; for intuition, examine Fig. 10. Considering this scheduling policy and this lemma, it becomes apparent that α = μ+_(n−1) was indeed sufficient (see above): A timer for an S job is only delayed until the current C or S job has finished.

Competitive factor
Since the transformation is generic and does not exploit the round structure, the round duration is considerably larger than necessary: Theorem 22 requires one fixed "feasible assignment" for Δ+; thus, we had to choose #_S and #_R for the final round f, which are much larger than the counts of the early rounds. Define μ_C := μ+_(0), μ_R := μ+_(0) and μ_S := μ+_(n−r−1). Since the rounds are disjoint (no messages cross the "round barrier") and δ+/Δ+ are only required for determining the round duration, a careful analysis of the transformation proof reveals that the results still hold if α, the maximum delay of timer messages, and Δ+, the end-to-end delay, are fixed per round. This allows us to choose α = μ_S and Δ+ = δ+_(n−r−1) + μ_C + #_S · μ_S + (#_R − 1)μ_R, resulting in a round duration W_est.

This is already quite close to the optimal round duration: Let W_opt := max{W_a, W_b, W_c}. Moser [25] examined the round duration of the oral messages algorithm in the real-time model in detail and discovered a lower bound of W_opt, i.e., no scheduling algorithm can guarantee a worst-case round duration of less than W_opt, and a matching upper bound of W_opt, i.e., a scheduling algorithm that ensures that no more than W_opt time units are required per round. Figure 10 illustrates the three cases that can lead to worst-case durations of W_a, W_b and W_c. Note that, even though the round durations are quite large (they increase exponentially with the round number, cf. the definition of #_S and #_R), the duration obtained through our model transformation is only a constant factor away from the optimal value, e.g., W_est ≤ 4 W_opt. In conjunction with the fact that the transformed algorithm is much easier to obtain and to analyze than the optimal result, this reveals that our generic transformations are indeed a powerful tool for obtaining real-time algorithms.
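The per-round choices α = μ_S and Δ+ = δ+_(n−r−1) + μ_C + #_S·μ_S + (#_R − 1)·μ_R can be evaluated directly. A sketch under our own assumptions: δ+ and μ+ are supplied as functions of the message count, and the example bounds are hypothetical.

```python
def per_round_bounds(n, r, delta_plus, mu_plus):
    """Per-round choices from the competitive-factor analysis:
    alpha = mu_S and Delta+ = delta+_(n−r−1) + mu_C + #_S*mu_S + (#_R − 1)*mu_R,
    with mu_C = mu_R = mu+_(0) and mu_S = mu+_(n−r−1)."""
    # message counts of round r (cf. Sect. 7.2): #_S = (n−2)...(n−r)
    num_s = 1
    for k in range(2, r + 1):
        num_s *= n - k
    num_r = num_s * (n - r - 1)            # #_R = #_S * (n − r − 1)
    mu_c = mu_plus(0)
    mu_r = mu_plus(0)
    mu_s = mu_plus(n - r - 1)
    alpha = mu_s
    Delta = delta_plus(n - r - 1) + mu_c + num_s * mu_s + (num_r - 1) * mu_r
    return alpha, Delta

# Hypothetical bounds, growing linearly with the number of messages sent:
dp = lambda l: 2.0 + 0.2 * l
mp = lambda l: 0.1 + 0.05 * l
alpha, Delta = per_round_bounds(n=7, r=2, delta_plus=dp, mu_plus=mp)
assert alpha == mp(4)   # alpha equals mu_S = mu+_(n−r−1)
assert Delta > dp(4)    # queuing makes Delta+ exceed the raw delay bound
```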

Conclusions
We introduced a real-time model for message-passing distributed systems with processors that may crash or even behave in a malicious (Byzantine) manner, and established simulations that allow running an algorithm designed for the classic zero step-time model in some instance of the real-time model (and vice versa). Precise conditions that guarantee the correctness of these transformations are also given. The real-time model thus indeed reconciles fault-tolerant distributed and real-time computing, by facilitating a worst-case response time analysis without sacrificing classic distributed computing knowledge. In particular, our transformations allow us to reuse existing classic fault-tolerant distributed algorithms and proof techniques in the real-time model, resulting in solutions that are competitive w.r.t. optimal real-time algorithms.
Part of our future research in this area is devoted to the development of advanced real-time analysis techniques for determining feasible end-to-end delay assignments for partially synchronous fault-tolerant distributed algorithms.