# Bounded phase analysis of message-passing programs


## Abstract

We describe a novel technique for bounded analysis of asynchronous message-passing programs with ordered message queues. Our bounding parameter does not limit the number of pending messages, nor the number of “context-switches” between processes. Instead, we limit the number of process communication cycles, in which an unbounded number of messages are sent to an unbounded number of processes across an unbounded number of contexts. We show that remarkably, despite the potential for such vast exploration, our bounding scheme gives rise to a simple and efficient program analysis by reduction to sequential programs. As our reduction avoids explicitly representing message queues, our analysis scales irrespectively of queue content and variation.

### Keywords

Concurrency, Verification, Analysis, Message-passing, Distributed

## 1 Introduction

Software is becoming increasingly concurrent: reactivity (e.g., for user interfaces, web servers), parallelization (e.g., in scientific computations), and decentralization (e.g., in web applications) necessitate asynchronous computation. Although shared-memory implementations are often possible, the burden of preventing unwanted thread interleavings without crippling performance is onerous. Many have instead adopted asynchronous programming models in which processes communicate by posting messages/tasks to others’ message/task queues; Miller et al. [34] discuss why such models provide good programming abstractions. Single-process systems such as the JavaScript page-loading engine of modern web browsers [22], and the highly scalable Node.js asynchronous web server [13], execute a series of short-lived tasks one-by-one, each task potentially queueing additional tasks to be executed later. This programming style ensures that the overall system responds quickly to incoming events (e.g., user input, connection requests). In the multi-process setting, languages such as Erlang and Scala have adopted message-passing as a fundamental construct with which highly scalable and highly reliable distributed systems are built.

Despite the increasing popularity of such programming models, little is known about precise algorithmic reasoning. This is perhaps not without good reason: decision problems such as state-reachability for programs communicating with unbounded reliable queues are undecidable [11], even when there is only a single finite-state process (posting messages to itself). Furthermore, the known algorithms for decidable under-approximations (e.g., bounding the size of queues) represent queues explicitly, and are thus doomed to combinatorial explosion as the size and variability of queue content increases.

Some have proposed analyses which abstract message arrival order [18, 23, 39], or assume messages can be arbitrarily lost [1, 2]. Such analyses do not suffice when correctness arguments rely on reliable messaging—we discuss in Sect. 6.2 a fairly realistic case study in which correctness arguments can rely on reliable messaging—and several systems specifically do ensure the ordered delivery of messages, including Scala’s runtime system [20], and recent web-browser specifications [22]. Others have proposed analyses which compute finite symbolic representations of queue contents [6, 9]. Known bounded analyses which model queues precisely either bound the maximum capacity of message-queues, ignoring executions which exceed the bound, or bound the total number of process “contexts” [21, 27, 36], where each context involves a single process sending and receiving messages. For each of these bounding schemes there are trivial systems which cannot be adequately explored, e.g., by sending more messages than the allowed queue-capacity, having more processes than contexts, or by alternating message-sends to two processes—we discuss such examples in Sect. 3. All of the above techniques explicitly maintain some (perhaps symbolic) representation of queue contents, and face combinatorial explosion as queue content and variation increase.

In this work we propose a novel technique for bounded analysis of asynchronous message-passing programs with reliable, ordered message queues. Our bounding parameter, introduced in Sect. 3, is not sensitive to the capacity nor content of message queues, nor the number of process contexts. Instead, we bound the number of process communication cycles by labeling each message with a monotonically increasing phase number. Each time a message chain visits the same process, the phase number must increase. For a given parameter \(k\), we only explore behaviors of up to \(k\) phases—though \(k\) phases can go a long way. In the leader election distributed protocol [41] for example, each election round occurs in \(2\) phases: in the first phase each process sends *capture* messages to the others; in the second phase some processes receive *accept* messages, and those that find themselves majority-winners broadcast *elected* messages. In these two phases an unbounded number of messages are sent to an unbounded number of processes across an unbounded number of process contexts.

We demonstrate the strength of phase-bounding by showing in Sects. 4 and 5 that the bounded phase executions of a message-passing program can be concisely encoded as a non-deterministic sequential program, in which message-queues are not explicitly represented. Our so-called “sequentialization” offers hope for scalable analyses of message-passing programs. In a small set of simple experiments (Sect. 4), we demonstrate that our phase-bounded encoding scales far beyond known explicit-queue encodings as queue-content increases, and even remains competitive as queue-content is fixed while the number of phases grows. In Sect. 6 we present a case study using a prototype implementation of our sequentialization to quickly discover bugs in typical textbook asynchronous network algorithms. By reducing to sequential programs, we leverage highly developed sequential program analysis tools for the algorithmic analysis of message-passing programs.

A shorter version of this paper appears in the LNCS proceedings of TACAS 2012 [42]. The current version contains additional complexity results, complete proofs of theorems and lemmas, and the experimental case study of Sect. 6.

## 2 Asynchronous message-passing programs

We consider a simple multi-processor programming model in which each processor is equipped with a procedure stack and a queue of pending tasks. Initially all processors are idle. When an idle processor’s queue is non-empty, the oldest task in its queue is removed and executed to completion. Each task executes essentially a recursive sequential program, which besides accessing its own processor’s global storage, can *post* tasks to the queues of any processor, including its own. When a task does complete, its processor again becomes idle, chooses the next pending task to execute to completion, and so on. The distinction between queues containing messages and queues containing tasks is mostly aesthetic, but in our task-based treatment queues are only read by idle processors; reading additional messages during a task’s execution is prohibited. While in principle many message-passing systems, e.g., in Erlang and Scala, allow reading additional messages at any program point, we have observed that common practice is to read messages only upon completing a task [42]. We thus make the *well-queueing* assumption [27] that only idle processors (i.e., those not currently executing a task, or in other words, those whose current procedure activation stacks are empty) can take new tasks from their task queues.
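The processor model described above can be illustrated with a small simulator. This is only a sketch under stated assumptions: the names `Processor` and `run`, the encoding of tasks as Python functions of the form `(g, arg, post) -> new g`, and the fixed sweep over processors are all hypothetical choices made for illustration; the model itself leaves the choice of which processor runs next non-deterministic.

```python
from collections import deque


class Processor:
    """One processor: a global valuation plus a FIFO queue of pending tasks."""

    def __init__(self, g):
        self.g = g
        self.queue = deque()


def run(procs, pids, g0, start_pid, start_task):
    """Run every pending task to completion and return the final globals.

    procs maps a procedure name to a function (g, arg, post) -> new g, where
    post(pid, name, arg) appends a task to the queue of processor pid.
    """
    system = {pid: Processor(g0) for pid in pids}
    system[start_pid].queue.append(start_task)

    def post(pid, name, arg):
        system[pid].queue.append((name, arg))

    # Assumed scheduler: sweep over the processors in a fixed order.
    while any(p.queue for p in system.values()):
        for pid in pids:
            p = system[pid]
            if p.queue:
                name, arg = p.queue.popleft()      # oldest task first
                p.g = procs[name](p.g, arg, post)  # task runs to completion
    return {pid: p.g for pid, p in system.items()}


def main(g, arg, post):
    post("B", "log", 1)
    post("B", "log", 2)
    return g


def log(g, arg, post):
    return g + (arg,)


final = run({"main": main, "log": log}, ["A", "B"], (), "A", ("main", None))
```

Here processor `B` records the arguments of its tasks in the order in which they were posted, which reflects the ordered-queue assumption; a task never reads its queue while running, which reflects the well-queueing assumption.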

Though similar to the model of asynchronous programs of Sen and Viswanathan [39], the model we consider has two important distinctions. First, tasks execute across potentially several processors, rather than only one, each processor having its own global state and pending tasks. Second, the tasks of each processor are executed in exactly the order they are posted. In the case of single-processor programs, the model of Sen and Viswanathan [39] can be seen as an abstraction of the model we consider, since there the task chosen to execute next when a processor is idle is chosen non-deterministically among all pending tasks.

### 2.1 Program syntax

*asynchronous message-passing programs*. We intentionally leave the syntax of expressions \(e\) unspecified, though we do insist Vals contains **true** and **false**, and Exprs contains Vals and the (*nullary*) *choice operator* \(\mathtt{\star } \).

Each program \(P\) declares a single global variable g and a procedure sequence, each \(p \in \)Procs having a single parameter l and top-level statement denoted \(s_{p}\); as statements are built inductively by composition with control-flow statements, \(s_{p}\) describes the entire body of \(p\). The set of program statements \(s\) is denoted Stmts. Intuitively, a **post** \(\rho \ p\ e\) statement is an asynchronous call to a procedure \(p\) with argument \(e\) to be executed on the processor identified by \(\rho \); a *self-post* to one’s own processor is made by setting \(\rho \) to _. A program in which all **post** statements are self-posts is called a *single-processor program*, and a program without **post** statements is called a *sequential program*. The **assume** \(e\) statement proceeds only when \(e\) evaluates to **true**; we use this statement to prevent undesired executions in later sections.

The programming language we consider is simple, yet very expressive, since the syntax of types and expressions is left free, and we lose no generality by considering only single global and local variables. Appendix A lists several syntactic extensions which we use in the source-to-source translations of the subsequent sections, e.g., multiple variables, and which easily reduce to the syntax of our grammar.

### 2.2 Single-processor semantics

A (*procedure*) *frame* \(f = \left\langle {\ell , s} \right\rangle \) is a current valuation \(\ell \in \)Vals to the procedure-local variable l, along with a statement \(s \in \)Stmts to be executed. (Here \(s\) describes the entire body of a procedure \(p\) that remains to be executed, and is initially set to \(p\)’s top-level statement \(s_p\); we refer to initial procedure frames \(t = \left\langle {\ell ,s_p} \right\rangle \) as *tasks*, to distinguish the frames that populate processor queues.) The set of all frames is denoted Frames.

A *processor configuration* \(\kappa = \left\langle {g,w,q} \right\rangle \) is a current valuation \(g \in \)Vals to the processor-global variable g, along with a procedure-frame stack \(w \in \)Frames\(^*\) and a pending-tasks queue \(q \in \)Frames\(^*\). A processor is idle when \(w = \varepsilon \). The set of all processor configurations is denoted Pconfigs. A processor configuration map \(\xi : \mathtt{{Pids}} \rightarrow \)Pconfigs maps each processor \(\rho \in \mathtt{{Pids}}\) to a processor configuration \(\xi (\rho )\). We write \(\xi ({\rho \shortmid \!\rightarrow \kappa })\) to denote the configuration \(\xi \) updated with the mapping \(({\rho \shortmid \!\rightarrow \kappa })\), i.e., the configuration \(\xi ^{\prime }\) such that \(\xi ^{\prime }(\rho ) = \kappa \), and \(\xi ^{\prime }(\rho ^{\prime }) = \xi (\rho ^{\prime })\) for all \(\rho ^{\prime } \in \mathtt{{Pids}} \backslash \{\rho \}\).

A *statement context* \(S\) is a term derived from the grammar \(S \, {::{=}}\, \diamond \ \mid \ S; s\), where \(s \in \)Stmts. We write \(S[{s}]\) for the statement obtained by substituting a statement \(s\) for the unique occurrence of \(\diamond \) in \(S\). Intuitively, a context filled with \(s\), e.g., \(S[s]\), indicates that \(s\) is the next statement to execute in the statement sequence \(S[s]\). Similarly, a *processor configuration context* \(C = \left\langle {g, \left\langle {\ell ,S} \right\rangle w, q} \right\rangle \) is a processor configuration whose top-most frame’s statement is replaced with a statement context, and we write \(C[{s}]\) to denote the processor configuration \(\left\langle {g, \left\langle {\ell ,S[{s}]} \right\rangle w, q} \right\rangle \). When \(e\) is an expression, we abbreviate \(e(C[{\mathbf{skip}}])\) by \(e(C)\).

**true**. The Assign statement stores the value of a given expression in either the local variable l or the global variable g. The If-Then and If-Else rules proceed to either the **then** or **else** branch, depending on the current valuation of the given expression \(e\). Similarly, the Loop-Do and Loop-End rules proceed to (re-)enter the loop when the given expression \(e\) evaluates to **true**, and step past the loop when \(e\) evaluates to **false**. More interestingly, the Call rule creates a new procedure frame \(f\) by evaluating the given argument \(e\), and places \(f\) at the top of the procedure-frame stack. The Return rule removes the top-most procedure frame from the stack, and substitutes the valuation of the return expression \(e\) into the assignment \(x:{=}\, \mathtt{\star } \) left below by the matching **call** statement. Note that the transition relation \(\rightarrow \) is non-deterministic, since the evaluation of an expression \(e\) can result in an arbitrary set of possible values.

### 2.3 Multi-processor semantics

*interleaving semantics*: at any moment only one processor executes. Intuitively, this is true since we consider that each message is *received* atomically. In order to later restrict processor interleaving, we make explicit the *scheduler* which arbitrates the possible interleavings. Formally, a *scheduler*

*deterministic* when \(|{ \mathtt{{enabled}}(m,\xi ) }| \le 1\) for all \(m\in D\) and \(\xi : \mathtt{{Pids}} \rightarrow \mathtt{{Pconfigs}}\), and is *non-blocking* when for all \(m\) and \(\xi \), if there is some \(\rho \in \mathtt{{Pids}}\) such that \(\xi (\rho )\) is either non-idle or has pending tasks, then there exists \(\rho ^{\prime } \in \mathtt{{Pids}}\) such that \(\rho ^{\prime } \in \mathtt{{enabled}}(m,\xi )\) and \(\xi (\rho ^{\prime })\) is either non-idle or has pending tasks. A *configuration* \(c = \left\langle {\rho ,\xi ,m} \right\rangle \) is a currently executing processor \(\rho \in \mathtt{{Pids}}\), along with a processor configuration map \(\xi \), and a scheduler object \(m\).

Until further notice, we assume \(M\) is a completely non-deterministic scheduler; i.e., all processors are always enabled. In Sect. 5 we discuss alternatives.

An \(M\)-*execution of a program* \(P\) (*from* \(c_0\) to \(c_j\)) is a configuration sequence \(c_0 c_1 \ldots c_j\) such that \(c_i \rightarrow c_{i+1}\) for \(0 \le i < j\). An *initial condition* \(\iota = \left\langle {\rho _0, g_0, \ell _0, p_0} \right\rangle \) is a processor identifier \(\rho _0\), along with a global-variable valuation \(g_0 \in \)Vals, a local-variable valuation \(\ell _0 \in \)Vals, and a procedure \(p_0 \in \mathtt{{Procs}}\). A configuration \(c = \left\langle {\rho _0, \xi , m} \right\rangle \) of a program \(P\) is \(\left\langle {\rho _0,g_0,\ell _0,p_0} \right\rangle \)-*initial* when \(m = \mathtt{{empty}}\), \(\xi (\rho _0) = \left\langle {g_0, \varepsilon , \left\langle {\ell _0, s_{p_0}} \right\rangle } \right\rangle \) and \(\xi (\rho ) = \left\langle {g_0, \varepsilon , \varepsilon } \right\rangle \) for all \(\rho \ne \rho _0\). A configuration \(\left\langle {\rho _0,\xi ,m} \right\rangle \) is *final* when \(\xi (\rho ) = \left\langle {g,\varepsilon ,\varepsilon } \right\rangle \) for some \(g \in \)Vals for all \(\rho \in \mathtt{{Pids}}\), and an execution to a final configuration is called *completed*. We say a global valuation \(g\) is *M-reachable* in \(P\) from \(\iota \) when there exists an \(M\)-execution^{1} of \(P\) from some \(c_0\) to some \(\left\langle {\rho ,\xi ,m} \right\rangle \) such that \(c_0\) is \(\iota \)-initial and \(\xi (\rho ^{\prime }) = \left\langle {g,w,q} \right\rangle \) for some \(\rho ^{\prime } \in \mathtt{{Pids}}\).

**Definition 1**

The *state-reachability problem* is to determine for an initial condition \(\iota \), valuation \(g\), and program \(P\), whether \(g\) is reachable in \(P\) from \(\iota \).

## 3 Phase-bounded execution

Because processors execute tasks precisely in the order which they are posted to their unbounded task-queues, our state-reachability problem is undecidable, even with only a single processor accessing finite-state data [11]. Since it is not algorithmically possible to consider every execution precisely, in what follows we present an incremental under-approximation. For a given bounding parameter \(k\), we consider a subset of execution (prefixes) precisely; as \(k\) increases, the set of considered executions increases, and in the limit as \(k\) approaches infinity, every execution of any program is considered—though for many programs, every execution is considered with a finite value of \(k\).

A *task-chain* \(t_1 t_2 \ldots t_i\) from \(t_1\) to \(t_i\) is a sequence of tasks^{2} such that the execution of each \(t_j\) posts \(t_{j+1}\), for \(0 < j < i\), and we say that \(t_1\) is an *ancestor* of \(t_i\). We characterize execution prefixes by labeling each task \(t\) posted in an execution with a *phase number* \(\varphi (t) \in \mathbb{N }\):

**Definition 2**

An execution is *k-phase* when \(\varphi (t) < k\) for each executed task \(t\).

The execution in Fig. 5a is a \(4\)-phase execution, since all tasks have phase \(< 4\). Despite there being an arbitrary number \(4n+1\) of posted tasks, the execution in Fig. 5b is \(1\)-phase, since there are no task-chains between same-processor tasks. Contrarily, the execution in Fig. 5c requires \(n\) phases to execute all \(2n\) tasks, since every other occurrence of an \(A_i\) task creates a task-chain between \(A\)-tasks.
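The following sketch shows one way phase numbers could be computed from a record of which task posted which. It assumes a particular reading of the labeling, taken from the informal description in the Introduction: an initially pending task has phase 0, and a posted task inherits its poster's phase unless the chain leading to it already contains a task of that phase on the target processor (the poster included), in which case the phase increases by one. The function name `phase_numbers` and the input format are hypothetical.

```python
def phase_numbers(tasks):
    """Label tasks with phase numbers under the assumed reading above.

    tasks is a list of (task_id, processor, poster_id or None), with every
    poster listed before the tasks it posts.
    """
    info = {}  # task_id -> (processor, poster_id, phase)
    for tid, proc, poster in tasks:
        if poster is None:
            info[tid] = (proc, None, 0)  # initially pending task
            continue
        poster_phase = info[poster][2]
        # Walk up the task-chain from the poster, looking for a task of the
        # poster's phase that ran on the target processor.
        revisits = False
        cur = poster
        while cur is not None:
            cur_proc, cur_poster, cur_phase = info[cur]
            if cur_phase == poster_phase and cur_proc == proc:
                revisits = True
                break
            cur = cur_poster
        info[tid] = (proc, poster, poster_phase + (1 if revisits else 0))
    return {tid: phase for tid, (_, _, phase) in info.items()}


# A root task fanning out through two processors to a fourth: one phase.
fan_out = phase_numbers([
    ("A1", "A", None), ("B1", "B", "A1"), ("C1", "C", "A1"),
    ("D1", "D", "B1"), ("D2", "D", "C1"),
])

# A chain bouncing between two processors: a new phase every other task.
ping_pong = phase_numbers([
    ("A1", "A", None), ("B1", "B", "A1"), ("A2", "A", "B1"),
    ("B2", "B", "A2"), ("A3", "A", "B2"),
])

# A chain of self-posts: every task in its own phase.
self_posts = phase_numbers([("T1", "P", None), ("T2", "P", "T1"), ("T3", "P", "T2")])
```

Under this reading, an execution is \(k\)-phase exactly when the largest computed label is below \(k\).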

Note that bounding the number of execution phases does not necessarily bound the total number of tasks executed, nor the maximum size of task queues, nor the amount of switching between processors. Instead, a bound \(k\) restricts the maximum length of task chains to \(k\cdot |{\mathtt{{Pids}}}|\). In fact, phase-bounding is incomparable to bounding the maximum size of task queues. On the one hand, every execution of a program in which one root task posts an arbitrary, unbounded number of tasks to other processors (e.g., in Fig. 5b) is explored with \(1\) phase, though no bound on the size of queues will capture all executions. On the other hand, all executions with a single arbitrarily long chain of tasks (e.g., in Fig. 5c) are explored with size \(1\) task queues, though no limited number of phases captures all executions. In the limit as the bounding parameter increases, either scheme does capture all executions.

In comparison to the context-bounded approach [21, 27], where each “context” involves a single process reading from its queue, and posting to the queues of other processes, each \(k\)-context execution is a \(k\)-phase execution, since the \(i\)th context may create messages of phase at most \(i\!+\!1\). However, as Fig. 5b illustrates, there are programs whose fixed-phase executions require an unbounded number of contexts to capture: a single phase captures the execution order \(D_1 D_2 D_3\ldots D_{2n}\) on processor \(D\), which requires \(2n+2\) contexts (specifically, of processors \(A (BC)^n D\)) to capture. We provide a more thorough comparison between phase-bounding and context-bounding in Sect. 7.

**Lemma 1**

(Completeness) For every execution \(h\) of a program \(P\), there exists \(k \in \mathbb{N }\) such that \(h\) is a \(k\)-phase execution.

*Proof*

This follows from the inductive definition of \(\varphi \). \(\square \)

## 4 Phase-bounding for single-processor programs

Characterizing executions by their phase-bound reveals a simple and efficient technique for bounded exploration. This seems remarkable, given that phase-bounding explores executions in which arbitrarily many tasks execute, making the task queue arbitrarily large. We demonstrate in Sect. 4.1 a succinct encoding of phase-bounded state-reachability into state-reachability in sequential programs. This reduction leads to an asymptotically optimal algorithm, in a certain sense, since we show in Sect. 4.2 that our EXPTIME reduction-based algorithm for finite-data programs decides an EXPTIME-hard problem. In Sect. 4.3 we demonstrate that our encoding can indeed be implemented efficiently.

### 4.1 A sequential encoding of phase-bounded exploration

Towards an encoding of phase-bounded executions into executions of sequential programs, the first key ingredient is that once the number of phases is bounded, each phase can be executed in isolation. For instance, consider again the execution of Fig. 5a. In phase \(1\), the tasks \(A_2,\;A_3\), and \(A_4\) pick up execution from the global valuation \(g_1\) which \(A_1\) left off at, and leave behind a global valuation \(g_2\) for the phase \(2\) tasks. In fact, given the sequence of tasks in each phase, the only other “communication” between phases is a single passed global valuation; executing that sequence of tasks on that global valuation is a faithful simulation of that phase.

The second key ingredient is that the ordered sequence of tasks executed in a given phase is exactly the ordered sequence of tasks posted in the previous phase. This is obvious, since tasks are executed in the order they are posted. However, combined with the first ingredient we have quite a powerful recipe. Supposing the global state \(g_i\) at the beginning of each phase \(i\) is known initially, we can simulate a \(k\)-phase execution by executing each task posted to phase \(i\) as soon as it is posted, with an independent virtual copy of the global state, initially set to \(g_i\). That is, our simulation will store a vector of \(k\) global valuations, one for each phase. Initially, the \(i\)th global valuation is set to the state \(g_i\) in which phase \(i\) begins; tasks of phase \(i\) then read from and write to the \(i\)th global valuation. It then only remains to ensure that the global valuations \(g_i\) used at the beginning of each phase \(i: 0 < i < k\) match the valuations reached at the end of phase \(i\!-\!1\).
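The two ingredients can be tried out on a toy program. The sketch below is not the translation \(((P))_k\) itself; it is a hypothetical Python rendering of the idea, in which procedures are functions `(g, arg, post) -> new g`, the guessed initial valuations are enumerated exhaustively over a small finite `domain` instead of being chosen non-deterministically, and a mismatching guess is simply discarded where the translated program would block on an **assume**. The function and procedure names are all assumptions made for illustration.

```python
from collections import deque
from itertools import product


def run_with_queue(procs, g0, main, k):
    """Reference run with an explicit FIFO queue, ignoring tasks of phase >= k.

    Returns the global valuation at the end of each phase 0..k-1.
    """
    g, queue, ends = g0, deque([(main, None, 0)]), [None] * k
    while queue:
        name, arg, phase = queue.popleft()

        def post(p, a, phase=phase):
            if phase + 1 < k:  # single processor: every post is a self-post
                queue.append((p, a, phase + 1))

        g = procs[name](g, arg, post)
        ends[phase] = g
    for i in range(1, k):  # a phase without tasks leaves g unchanged
        if ends[i] is None:
            ends[i] = ends[i - 1]
    return ends


def run_sequentialized(procs, g0, main, k, domain):
    """Queue-free run: one global copy per phase, posts become immediate calls."""
    solutions = []
    for guess in product(domain, repeat=k - 1):  # guessed g_1 .. g_{k-1}
        G = [g0, *guess]

        def run(name, arg, phase):
            def post(p, a):
                if phase + 1 < k:
                    run(p, a, phase + 1)  # execute now, on the next phase's copy

            G[phase] = procs[name](G[phase], arg, post)

        run(main, None, 0)
        # Keep the run only if each guess matches the end of the previous phase.
        if all(G[i - 1] == guess[i - 1] for i in range(1, k)):
            solutions.append(G)
    return solutions


def main(g, arg, post):
    for a in (1, 2, 3):
        post("inc", a)
    return 1


def inc(g, arg, post):
    if arg > 1:
        post("dbl", None)
    return (g + arg) % 8


def dbl(g, arg, post):
    return (g * 2) % 8


PROCS = {"main": main, "inc": inc, "dbl": dbl}
by_queue = run_with_queue(PROCS, 0, "main", 3)
by_calls = run_sequentialized(PROCS, 0, "main", 3, range(8))
```

On this deterministic example exactly one guess survives, and the surviving vector of per-phase valuations coincides with the valuations observed at the phase boundaries of the queue-based run, even though `run_sequentialized` never stores a queue.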

**Lemma 2**

A global-valuation \(g\) is reachable in a \(k\)-phase execution of a single-processor program \(P\) if and only if \(g\) is reachable in a completed execution of \(((P))_k\)—the \(k\)-phase sequential translation of \(P\).

*Proof*

This follows easily from our preceding development. Any \(g\) reachable in a \(k\)-phase execution is reachable by executing a sequence of task sequences \(\tau _0 \tau _1\ldots ,\) each sequence \(\tau _i\) containing tasks with phase number \(i\), which encounter the global valuations \(g_0 g_1\ldots \), each \(g_i\) encountered at the beginning of the \(i\)th sequence \(\tau _i\). Given correctly guessed values of \(g_0 g_1\ldots \), the sequential program \(((P))_k\) executes each task of phase \(i : 0 \le i < k\) in the order of \(\tau _i\), posting, in order, the tasks \(\tau _{i+1}\) to phase \(i\!+\!1\). As the **assume** statement of Line 23 ensures that each \(g_i : 0 < i < k\) matches the value reached at the end of the phase \(i\!-\!1\) tasks \(\tau _{i-1}\), the completed executions of \(((P))_k\) correspond exactly to the \(k\)-phase executions of \(P\). \(\square \)

Given any underlying sequential program model, e.g., programs with integer variables, our translation makes applicable any analysis tool for the said model to message-passing programs, since the values of the additional variables are either from the finite domain \(\{0,\ldots ,k-1\}\), or in the domain of the original program variables. When the underlying sequential program model has a decidable state-reachability problem, Lemma 2 yields a decision procedure for the phase-bounded state-reachability problem, by applying the decision procedure for the underlying model to the translated program. This allows us for instance to derive a decidability result for programs with finite data domains.

### 4.2 Complexity of phase-bounded exploration

Here we demonstrate that phase-bounded state-reachability of single-processor programs is EXPTIME-complete for programs with finite data domains. We show EXPTIME-hardness by reduction from the language emptiness problem of the intersection of a pushdown automaton and a set of finite-state automata, and EXPTIME-membership by leveraging state-reachability algorithms for the polynomial-sized sequential programs resulting from our phase-bounded sequential translation.

**Lemma 3**

The language emptiness problem for the intersection of a pushdown automaton and \(k \in \mathbb{N }\) finite state automata is polynomial-time reducible to the \((k+1)\)-phase state-reachability problem for finite-data programs.

*Proof*

Let \(\mathcal{A }_0\) be a pushdown automaton, and let \(\mathcal{A }_1, \ldots , \mathcal{A }_k\) be finite state automata. For simplicity suppose that the sets \(Q_i : 0 \le i \le k\) of \(\mathcal{A }_i\) states are disjoint. Let \(Q = \bigcup _i Q_i\), and let \(\Sigma \) and \(\delta \) be the sets of symbols and transitions of \(\bigcup _i \mathcal{A }_i\). Further suppose that each \(\mathcal{A }_i\) has a single initial state, and a single final state without outgoing transitions; this is without loss of generality since \(\mathcal{A }_i\) may be nondeterministic. Finally, we suppose \(\mathcal{A }_0\) accepts upon reaching its final state with an empty stack, and that each transition of \(\mathcal{A }_0\) changes the stack size by at most one.

We define a finite-data single-processor program \(P\) with a single global variable state, along with five procedures, in Fig. 7. Initially, the main procedure calls init(0), then pda($) (we use $ to denote the empty stack symbol), then last(0). Each invocation of init(i) (resp., last(i)) checks that the current state is equal to the initial (resp., final) state of automaton \(\mathcal{A }_i\), and posts the next instance of init(i+1) (resp., last(i+1)). The pda procedure simulates the pushdown automaton \(\mathcal{A }_0\), repeatedly and nondeterministically choosing some transition tx enabled given the current state and stack symbol (i.e., when state and ss are equal to src[tx] and top[tx]), updating the state to the transition’s target tgt[tx], and either simulating a push transition by calling pda recursively, a pop transition by returning from the topmost instance of pda, or an internal transition, by simply updating the top stack symbol ss; the stack symbols fst[tx] and snd[tx] are the first and second symbols transition tx puts on the stack, and \(\perp \) denotes no symbol. For each transition tx, pda also posts an instance of fsa with the symbol ltl[tx] it has read as an argument.

Subsequently, each automaton \(\mathcal{A }_i : 0<i\le k\) is simulated in each phase \(i\) by the sequence of pending tasks init(i) fsa(\(\sigma _1\)) ... fsa(\(\sigma _j\)) last(i), according to the word \(w = \sigma _1\ldots \sigma _j\) initially read by \(\mathcal{A }_0\). Similarly to the simulation of \(\mathcal{A }_0\), each \(\mathcal{A }_i\) is simulated by choosing nondeterministically a sequence of transitions, at each step ensuring that one transition follows from the state reached by the previous; the init(i) and last(i) procedures ensure \(\mathcal{A }_i\) has started in its initial state, and ends in its final state after reading \(w\). In this way, the valuation state = \(q_f\), where \(q_f\) is the final state of \(\mathcal{A }_k\), is reachable in a \(k+1\) phase execution of \(P\) if and only if \(w\) is accepted by each \(\mathcal{A }_i: 0\le i\le k\).

**Theorem 1**

The phase-bounded state-reachability problem for finite-data single-processor programs is EXPTIME-complete.

*Proof*

As the language emptiness problem for the intersection of a pushdown automaton and \(k \in \mathbb{N }\) finite state automata is EXPTIME-complete [17], the reduction of Lemma 3 proves EXPTIME-hardness. An EXPTIME algorithm is obtained by applying a polynomial-time state-reachability algorithm for pushdown systems [12, 38, 40] to the exponentially sized system obtained by representing explicitly the valuations of program variables in the \(k\)-phase translation \(((P))_k\) of a given finite-data program \(P\). \(\square \)

### 4.3 Feasibility of phase-bounded exploration

Note that our simulation of a \(k\)-phase execution does not explicitly store the unbounded task queue. Instead of storing a multitude of possible unbounded task sequences, our simulation stores exactly \(k\) global state valuations. Accordingly, our simulation is not doomed to the unavoidable combinatorial explosion encountered by storing (even bounded-size) task queues explicitly. To demonstrate this advantage, we measure the time to verify two fabricated yet illustrative examples, comparing our bounded-phase encoding with a bounded task-queue encoding. In the bounded task-queue encoding, we represent the task-queue explicitly by an array of integers, which stores the identifiers of posted procedures.^{3} When the control of the initial task completes, the program enters a loop which takes a procedure identifier from the head of the queue, and calls the associated procedure. When the queue reaches a given bound, any further posted tasks are ignored.
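For comparison, the bounded task-queue encoding just described might look roughly as follows. This is a hedged sketch, not the encoding used in the experiments: the function `run_bounded_queue`, the use of procedure names instead of integer identifiers, and the toy program built by `make_toy_program` (an initial task posting alternating `p` and `q` tasks over a Boolean global) are hypothetical. The sketch only terminates on programs whose tasks eventually stop posting.

```python
def run_bounded_queue(procs, initial, g0, bound):
    """Run the initial task, then dispatch queued procedures oldest first.

    procs maps a name to a function (g, post) -> new g. Posts made while the
    queue already holds `bound` entries are ignored, and counted.
    """
    queue = []   # identifiers of posted procedures, oldest first
    dropped = 0

    def post(name):
        nonlocal dropped
        if len(queue) >= bound:
            dropped += 1          # queue is full: ignore the post
        else:
            queue.append(name)

    g = procs[initial](g0, post)  # control of the initial task completes first
    while queue:
        name = queue.pop(0)       # take the identifier at the head
        g = procs[name](g, post)  # call the associated procedure
    return g, dropped


def make_toy_program(iterations):
    def init(b, post):
        for _ in range(iterations):
            post("p")
            post("q")
        return False

    def p(b, post):
        assert not b              # a q task must have run since the last p
        return True

    def q(b, post):
        return False

    return {"init": init, "p": p, "q": q}


result_4 = run_bounded_queue(make_toy_program(5), "init", None, 4)
result_3 = run_bounded_queue(make_toy_program(5), "init", None, 3)
```

With five loop iterations and a bound of four, only the first four of the ten posts survive, so any behavior that depends on the later tasks is never explored; this is the kind of loss the explicit encoding trades against queue size.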

**false** and set b to **true**, and \(i\) procedures named \(q_1,\ldots , q_i\) which set b to **false**. Initially, \(P_1(i)\) sets b to **false**, and enters a loop in which each iteration posts some \(p_j\) followed by some \(q_j\). Since a \(q_j\) task must be executed between each \(p_j\) task, each of the assertions is guaranteed to hold.

**false**, sets b to **true**, and posts \(p_2\), while \(p_2\) sets b to **false** and posts \(p_1\). Initially, the program \(P_2\) sets b to **false** and posts a single \(p_1\) task. Again here, since a \(p_2\) task must execute between each \(p_1\) task, each of the assertions is guaranteed to hold.

Figure 9b compares the time required to verify \(P_2\) for various bounds \(n\) on the number of tasks explored.^{4} Note that although every execution of \(P_2\) uses only size \(1\) task-queues, to explore all \(n\) tasks in any given execution, the number of phases must be at least \(n\), since each task must execute in its own phase. Although verification time for the bounded-phase encoding does increase with \(n\) faster than the bounded task-queue encoding—as expected—due to additional copies of the global valuation, and more deeply in-lined procedures, the verification time remains manageable. In particular, the time does not explode uncontrollably: even \(50\) tasks are explored in under \(20\) s.

## 5 Phase-bounding for multi-processor programs

Though state-reachability under a phase bound is immediately and succinctly reducible to sequential program analysis for single-processor programs, the multi-processor case is more complicated. The added complexity arises due to the many orders in which tasks on separate processors can contribute to others’ task-queues. As a simple example, consider the possible bounded-phase executions of Fig. 5b with four processors, \(A,\; B,\; C\), and \(D\). Though \(B\)’s tasks \(B_1, \ldots , B_n\) must be executed in order, and \(C\)’s tasks \(C_1, \ldots , C_n\) must also be executed in order, the order of \(D\)’s tasks is not pre-determined: the arrival order of \(D\)’s tasks depends on how \(B\)’s and \(C\)’s tasks *interleave*. Suppose for instance \(B_1\) executes to completion before \(C_1\), which executes to completion before \(B_2\), and so on. In this case \(D\)’s tasks arrive to \(D\)’s queue, and ultimately execute, in the index order \(D_1, D_2, \ldots \) as depicted. However, there exist executions for every possible order of \(D\)’s tasks respecting \(D_1 < D_3 < \ldots \) and \(D_2 < D_4 < \ldots \) (where \(<\) denotes an ordering constraint), which amounts to many possible orders indeed!
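To get a feel for how many arrival orders there are, the sketch below enumerates every merge of two ordered sequences. It assumes, consistently with the ordering constraints above, that the odd-indexed \(D\) tasks are posted by \(B\)'s tasks and the even-indexed ones by \(C\)'s tasks; the function name `arrival_orders` is hypothetical.

```python
from itertools import combinations
from math import comb


def arrival_orders(from_b, from_c):
    """All merges of two sequences that preserve the order within each one."""
    total = len(from_b) + len(from_c)
    orders = []
    # Choose which positions of the merged sequence are filled from from_b.
    for slots in combinations(range(total), len(from_b)):
        chosen = set(slots)
        b, c = iter(from_b), iter(from_c)
        orders.append([next(b) if i in chosen else next(c) for i in range(total)])
    return orders


orders = arrival_orders(["D1", "D3"], ["D2", "D4"])
count_for_n_5 = len(arrival_orders(list("abcde"), list("vwxyz")))
```

For two sequences of length \(n\) there are \(\binom{2n}{n}\) such merges, so the number of arrival orders at \(D\) already grows exponentially in \(n\) for this small example.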

Here we show, in Sect. 5.1, that due to the capability of such unbounded interleaving, the problem of state-reachability under a phase-bound is undecidable for even finite-data multi-processor programs. Wishing still for algorithmic analyses, we attempt to leverage existing strategies for dealing with the complexity arising from processor interleaving. For this we recall delay-bounded scheduling [15] in Sect. 5.2, following which in Sects. 5.3 and 5.4 we demonstrate an effective “depth-first” delay-bounded scheduler which gives rise to an efficient encoding, again into sequential program analysis.

### 5.1 Undecidability of multi-processor phase-bounding

**Theorem 2**

The phase-bounded state-reachability problem for finite-data multi-processor programs is undecidable.

Our reduction-based proof makes use of the syntactic extensions of Appendix A.

*Proof*

Initially, the main procedure is pending on processor \(\rho _0\). In each loop iteration, main chooses a branch corresponding to an index \(i \in \{1 \ldots n\}\) and posts each symbol of \(\alpha _i\) individually and in order to \(\rho _1\), and each symbol of \(\beta _i\) individually and in order to \(\rho _2\). In this way, main sends to \(\rho _1\) the sequence \(\alpha _{i_1} \ldots \alpha _{i_k}\), and to \(\rho _2\) the sequence \(\beta _{i_1} \ldots \beta _{i_k}\) in \(k\) loop iterations, each terminated by a last message. Each instance of the hold tasks which execute on \(\rho _1\) and \(\rho _2\) simply propagates its symbol to \(\rho _3\). Using the global variable turn, \(\rho _3\) ensures that it only sees symbols sent from \(\rho _1\) and \(\rho _2\) in alternating order, starting with \(\rho _1\). Using the global variable prev, \(\rho _3\) ensures that each symbol of \(\beta _{i_1} \ldots \beta _{i_k}\) sent from \(\rho _2\) matches the previous symbol of \(\alpha _{i_1} \ldots \alpha _{i_k}\) sent from \(\rho _1\). Finally, if \(\rho _3\) receives both terminating last messages from \(\rho _1\) and \(\rho _2\) before another L-turn, then it has successfully checked the equality of sequences \(\alpha _{i_1} \ldots \alpha _{i_k} = \beta _{i_1} \ldots \beta _{i_k}\). Thus, when done[L] and done[R] are both set to true, this means main was able to guess a solution \(i_1 \ldots i_k\) to the correspondence problem. \(\square \)
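The check performed by \(\rho _3\) can be sketched in Python. This is only an illustration under stated assumptions: the function names `rho3_accepts` and `bounded_pcp` are hypothetical, the index guessing done by main is replaced by an explicit bounded search, and the instance at the bottom is a textbook-style example chosen for this sketch rather than one from the paper. Since the correspondence problem is undecidable, no fixed bound on the search length suffices in general.

```python
from itertools import product


def rho3_accepts(alpha_stream, beta_stream):
    """Mimic the check attributed to rho_3: consume symbols alternately,
    starting with rho_1's stream, and require each symbol from rho_2 to equal
    the previously seen symbol from rho_1; both streams must end together."""
    if len(alpha_stream) != len(beta_stream):
        return False
    prev = None
    for a, b in zip(alpha_stream, beta_stream):
        prev = a           # L-turn: remember rho_1's symbol
        if b != prev:      # R-turn: rho_2's symbol must match it
            return False
    return True


def bounded_pcp(pairs, max_len):
    """Try index sequences of length <= max_len; return the first solution
    found, or None if there is none within the bound."""
    for k in range(1, max_len + 1):
        for idx in product(range(len(pairs)), repeat=k):
            alpha = "".join(pairs[i][0] for i in idx)
            beta = "".join(pairs[i][1] for i in idx)
            if rho3_accepts(alpha, beta):
                return idx
    return None


# Illustrative instance (alpha_i, beta_i), indices are zero-based.
pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(bounded_pcp(pairs, 4))
```

In the reduction itself no such search loop exists: the nondeterministic branch choices of main play the role of `product`, and the unbounded shuffling of the two streams in \(\rho _3\)'s queue is what allows the alternating comparison.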

Note that Theorem 2 holds independently of whether memory is shared between processors: the fact that a task-queue can store any possible (unbounded) shuffling of tasks posted by the two processors \(\rho _1\) and \(\rho _2\) lends the power to simulate Post’s correspondence problem [35]. Furthermore, while our proof used four processors \(\rho _0 \ldots \rho _3\) for clarity, by merging functionality, e.g., of \(\rho _0\) with \(\rho _1\) and \(\rho _2\) with \(\rho _3\), a two-processor reduction is also possible; the universal computing power arises from the order-preserving interleaving of as few as two processors.

### 5.2 Delay-bounded scheduling

Theorem 2 implies that phase-bounding alone will not lead to the elegant encoding to sequential programs which was possible for single-processor programs. If that were possible, then the translation from a finite-data program would lead to a finite-data sequential program, and thus a decidable state-reachability problem. Since a precise algorithmic solution to bounded-phase state-reachability is impossible for multi-processor programs, we resort to a further incremental yet orthogonal under-approximation, which limits the number of considered processor interleavings. The following development is based on delay-bounded scheduling [15].

Roughly speaking, the semantics of an asynchronous concurrent program is dependent on the timing with which processors take their respective steps. This concept of timing is often captured by considering a *scheduler* that decides which processor’s turn it is at any moment. Since the factors which determine the scheduler are often deemed too complex to model, many approaches to program analysis consider an abstract, nondeterministic scheduler, which allows *any* processor to take a step at any moment. Concordantly, the very large number of possible process/thread interleavings such analyses must consider is a key source of analysis complexity.

Delay-bounding [15] attempts to dodge much of the interleaving complexity by considering far fewer interleavings, systematically limiting the amount of nondeterminism added to a given deterministic scheduler \(M\). In the simplest case, when the bound on delays is zero, only a single schedule is explored, as \(M\) is deterministic.^{5} However, when \(M\) is augmented with \(k>0\) delays, \(M\) gets to choose at up to \(k\) arbitrary scheduling points throughout each execution to postpone its next-scheduled task until some later time, picking the following task instead. So long as the base deterministic scheduler \(M\) and delaying policy are chosen well, any execution possible with a completely nondeterministic scheduler will be possible with \(M\) using some number of delay operations.
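The effect of a delay budget can be seen on a toy scheduler. The Python sketch below assumes a deliberately simple base scheduler (run pending atomic tasks in list order) and a simple delaying policy (a delay moves the next-scheduled task to the end of the list); both are assumptions made for illustration and are not the scheduler defined later in this section.

```python
def schedules(tasks, k):
    """Return the set of execution orders reachable with at most k delays.

    Base scheduler (an assumption for this sketch): run pending tasks in list
    order. A delay postpones the next-scheduled task to the end of the list.
    """
    results = set()

    def explore(pending, done, delays_left):
        if not pending:
            results.add(tuple(done))
            return
        # No delay: run the next-scheduled task.
        explore(pending[1:], done + [pending[0]], delays_left)
        # Delay: postpone the next-scheduled task (pointless with one task left).
        if delays_left > 0 and len(pending) > 1:
            explore(pending[1:] + [pending[0]], done, delays_left - 1)

    explore(list(tasks), [], k)
    return results


for k in range(4):
    print(k, len(schedules("abc", k)))
```

With zero delays exactly one order is explored; raising the budget adds orders incrementally until, for this three-task example, all six permutations are covered at \(k = 3\).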

*delaying scheduler*

*delay (operation)*, saying that processor \(\rho \) is delayed. Note that a delay operation may or may not change the set of enabled processors in any given step, depending on the scheduler. Thinking of \(\mathtt{{delay}}\) operations as invoked outside of the scheduler, we say a delaying scheduler \(M\) is *deterministic* when \(M\)’s base scheduler (i.e., without the delay operation) is deterministic. A delaying scheduler is *delay-accessible* when for every configuration \(c_1\) and non-idle or task-pending processor \(\rho \), there exists a sequence \(c_1 \rightarrow \ldots \rightarrow c_j\) of Delay-steps such that \(\rho \) is enabled in \(c_j\). Given executions \(h_1\) and \(h_2\) of (delaying) schedulers \(M_1\) and \(M_2\) resp., we write \(h_1 \sim h_2\) when \(h_1\) and \(h_2\) are identical after projecting away delay operations.

**Definition 3**

An execution with at most \(k\) delay operations is called *k-delay*.

**Lemma 4**

(Completeness) Let \(M\) be any delay-accessible scheduler. For every execution \(h\) of a program \(P\), there exists an \(M\)-execution \(h^{\prime }\) and \(k \in \mathbb{N }\) such that \(h^{\prime }\) is a \(k\)-delay execution and \(h^{\prime } \sim h\).

Note that Lemma 4 holds for *any* delay-accessible scheduler \(M\)—even deterministic schedulers.

### 5.3 The multi-processor depth-first scheduler

As it turns out there is one particular scheduler \(M_\mathrm{dfs}\) for which we know a convenient sequential encoding. Here we define a deterministic, non-blocking, delay-accessible delaying scheduler \(M_\mathrm{dfs}\) which though perhaps odd from an operational point of view, has a very useful application: given a multi-processor message-passing program \(P\), the phase- and delay-bounded executions of \(P\) according to \(M_\mathrm{dfs}\) are simulated by executions of a sequential program \(P^{\prime }\); furthermore, \(P^{\prime }\) is obtained by a simple code-to-code translation of \(P\) which does not explicitly represent pending-task queues.

Let \(U\) be a set of identifiers uniquely identifying each task along an execution with a single initially pending task \(u_0 \in U\). Our scheduler keeps a monotonically increasing phase number \(i \in \mathbb{N }\), along with an ordered task-posting tree \(T\) over nodes \(U\), a completion-labeling \(\surd : U \rightarrow \mathbb B \), and a phase-labeling \(\Phi : U \rightarrow \mathbb{N }\). Initially the tree contains a single node \(u_0\), with \(\Phi (u_0) = 0\) and \(\surd (u_0) = \mathtt{{false}} \). As additional tasks are posted, we add them as children of the posting task, in the order they are posted. Normally, the scheduler allows tasks to execute to completion; when a task does complete, the scheduler marks it as completed. When choosing the next task to execute, our scheduler selects the non-completed task in the current phase that is smallest in depth-first order over the task-posting tree; if there are no non-completed tasks in the current phase, the scheduler moves to the next phase. In this way, the scheduler executes all tasks in phase order, and same-phase tasks in depth-first order of the task-posting tree.

To implement delaying, our scheduler also keeps a phase-delay counter \(\Delta (\rho ) \in \mathbb{N }\) for each processor \(\rho \). Supposing an executing task \(u\) has phase \(i\) on a processor whose phase-delay counter has current value \(j\), the task \(u\) is treated as though it is in phase \(i+j\). When a processor is delayed, its phase-delay counter is simply incremented; the effect is to shift all following tasks on the given processor one additional phase later. Delaying causes the currently executing task to be interrupted and resumed in the following phase.

*depth-first scheduler*

If \(\tau \) is a Complete-step of task \(u\), then \(\surd _2 = \surd _1(u \shortmid \!\rightarrow \mathtt{{true}})\); otherwise \(\surd _2 = \surd _1\).

If \(\tau \) is a Post- or Self-Post-step of task \(u\) posting task \(u^{\prime }\), then \(T_2\) is obtained from \(T_1\) by adding to \(u\) a new rightmost child \(u^{\prime }\), and \(\Phi _2 = \Phi _1(u^{\prime } \shortmid \!\rightarrow \Phi _1(u) + \Delta (\rho ))\); otherwise, \(T_2 = T_1\) and \(\Phi _2 = \Phi _1\).

If there no longer exists a non-completed task \(u\) on some processor \(\rho ^{\prime }\) such that \(\Phi _2(u) + \Delta (\rho ^{\prime }) = i_1\) then \(i_2 = i_1+1\); otherwise \(i_2=i_1\).

\(\Delta _2 = \Delta _1({\rho \shortmid \!\rightarrow \Delta _1(\rho )+1})\) increments \(\Delta _1\)’s mapping for processor \(\rho \).

If there no longer exists a non-completed task \(u\) on some processor \(\rho ^{\prime }\) such that \(\Phi (u) + \Delta _2(\rho ^{\prime }) = i_1\), then \(i_2 = i_1+1\); otherwise \(i_2=i_1\).

Note that \(M_\mathrm{dfs}\) is a deterministic delaying scheduler which executes all tasks of a given phase before any task of a subsequent phase. Since \(M_\mathrm{dfs}\) must pick an enabled task so long as there are pending tasks on some processor, \(M_\mathrm{dfs}\) is non-blocking. Finally, \(M_\mathrm{dfs}\) is delay-accessible, since for any \(i = \Phi (u) + \Delta (\rho )\), repeatedly delaying every other processor \(\rho ^{\prime } \ne \rho \) eventually increments \(\Delta (\rho ^{\prime })\) such that \(\Phi (u^{\prime }) + \Delta (\rho ^{\prime }) > i\) for any pending \(u^{\prime }\) on \(\rho ^{\prime }\).

On non-delaying executions, \(M_\mathrm{dfs}\) essentially performs a phase-by-phase depth-first traversal of the task-posting tree \(T\)—a tree which includes tasks across all processors.
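The selection rule can be sketched in Python under simplifying assumptions: the task-posting tree is taken as already built and flattened into its depth-first order, tasks are atomic, and the per-processor delay counters are fixed up front rather than incremented during execution. The function and parameter names (`run_dfs_scheduler`, `phase`, `proc`, `delay`) and the three-task example are hypothetical.

```python
def run_dfs_scheduler(dfs_order, phase, proc, delay):
    """Execute tasks phase by phase; within a phase, pick the first
    non-completed task in depth-first order. A task u on processor proc[u]
    is treated as being in phase phase[u] + delay[proc[u]]."""
    completed, schedule, current = set(), [], 0
    while len(completed) < len(dfs_order):
        ready = [u for u in dfs_order
                 if u not in completed and phase[u] + delay[proc[u]] == current]
        if ready:
            completed.add(ready[0])
            schedule.append((current, ready[0]))
        else:
            current += 1  # no pending task left in this phase: move on
    return schedule


# Hypothetical example: A1 posts B1 and then A2 (A2 has a same-processor ancestor).
dfs_order = ["A1", "B1", "A2"]
phase = {"A1": 0, "B1": 0, "A2": 1}
proc = {"A1": "A", "B1": "B", "A2": "A"}

print(run_dfs_scheduler(dfs_order, phase, proc, {"A": 0, "B": 0}))
print(run_dfs_scheduler(dfs_order, phase, proc, {"A": 0, "B": 2}))
```

Without delays the sketch runs `B1` before `A2`; shifting processor `B` by two phases pushes `B1` past `A2`, which is the kind of reordering the delay counters are meant to expose.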

### 5.4 A sequential encoding of the depth-first scheduler

the phase of a posted task is not necessarily incremented, since posted tasks may not have same-processor ancestors in the current phase; the ancestors array records for each processor \(\rho \) and phase \(i\), whether the current task was transitively posted by a phase \(i\) task on processor \(\rho \);

at any point, the currently executing task may increment a delay counter, causing all following tasks on the same processor to shift forward one additional phase.

Note that even though our sequential translation for the (zero-delay) depth-first scheduler executes same-phase tasks according to depth-first traversal of the task-posting tree, the resulting task execution order always corresponds to *some* valid order. Taking a slightly counterintuitive example, consider a 1-phase computation with processors \(A,\; B\), and \(C\) starting with task \(A_1\) which posts \(B_1\) and \(C_1\), where \(B_1\) posts \(C_2\). The sequentialization \(((P))_{k,0}^{\mathrm{def}}\) without delays simulates an execution where \(C_2\) executes before \(C_1\). Though at first sight this may seem like an invalid interleaving, it is in fact valid, since \(B_1\), running on a different processor than \(A_1\), once posted, can post \(C_2\) before \(A_1\) has the chance to post \(C_1\), thus \(C\)’s task-queue can legitimately contain \(C_2\) before \(C_1\).
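The example can be replayed with a few lines of Python. The sketch assumes the task-posting tree is given as a dictionary from each task to the list of tasks it posts, in posting order, and that all four tasks fall in the same phase (none has a same-processor ancestor); the name `dfs_order` is hypothetical.

```python
def dfs_order(posts, root):
    """Zero-delay, single-phase sketch: visit the task-posting tree in
    depth-first order, children in posting order."""
    order = []
    stack = [root]
    while stack:
        task = stack.pop()
        order.append(task)
        # Push children reversed so the first-posted child is visited first.
        stack.extend(reversed(posts.get(task, [])))
    return order


posts = {"A1": ["B1", "C1"], "B1": ["C2"]}
order = dfs_order(posts, "A1")
print(order)
```

The traversal visits `B1`'s subtree, and hence `C2`, before returning to `A1`'s second child `C1`.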

**Lemma 5**

A global valuation \(g\) is reachable in a \(k\)-phase \(d\)-delay \(M_\mathrm{dfs}\)-execution of a multi-processor program \(P\) if and only if \(g\) is reachable in a completed execution of \(((P))_{k,{d}}^{\mathrm{def}}\).

*Proof*

By extending Lemma 2 from the single-processor case, the case where \(d=0\) follows easily, since shift[\(\rho \)][\(i\)] is always zero, and the scheduler \(M_\mathrm{dfs}\) executes tasks of the same phase in depth-first order—the same order in which same-phase tasks execute in the sequential program \(((P))_{k,d}^{\mathrm{def}}\). In the presence of delays, our translation maintains the order of \(M_\mathrm{dfs}\) by incrementing the shift variable of a given processor, thus postponing to a later phase all subsequently posted tasks by said processor. As is done with the global values guessed at the beginning of each phase, the guessed total number of delays per-processor used during all previous phases is validated on Line 37. \(\square \)

As is the case for our single-processor translation, our simulation does not explicitly store the unbounded task-queues, and is not doomed to the combinatorial explosion faced by storing task-queues explicitly.

**Theorem 3**

The phase and delay-bounded state-reachability problem for finite-data programs is EXPTIME-complete.

*Proof*

As multi-processor programs subsume single-processor programs, the lower bound of Theorem 1 yields EXPTIME-hardness. Similarly, by the same reasoning used in the proof of Theorem 1, our translation \(((P))_{k,d}^{\mathrm{def}}\) reduces program state-reachability to pushdown reachability in exponentially sized pushdown systems. \(\square \)

## 6 An implementation of phase-bounded exploration

To demonstrate the effectiveness of our phase-bounded sequentialization, we have implemented a code-to-code translation following the translation of Sect. 5.4, using our translation as a key step in a bug-detection algorithm for asynchronous message-passing programs.

### 6.1 A bug-detection algorithm

The input to our bug-detection algorithm is a multiprocessor asynchronous message-passing program written in the Boogie intermediate verification language [5], extended with an asynchronous call statement to model procedure **post**ing. Each input program declares an uninterpreted **pid** type to encode processor identifiers, and each processor global state variable of type \(T\) becomes a map in the input Boogie program from **pid** to \(T\). Each procedure in the input program takes a self parameter of type **pid**, and we ensure that all global variable accesses are indexed by self; this ensures that the tasks of each processor only access its own processor’s global storage. Though it is possible to encode a statically configured network in the input message-passing program, our input programs each assume an arbitrary number of processors which send messages following an arbitrary relation between neighboring processors.

Following the translation of Sect. 5.4, our algorithm first constructs a sequential Boogie program from the given asynchronous program. For simplicity, our prototype implementation does not maintain the ancestors map of Fig. 14; instead we conservatively increment the phase of every single message sent in a message chain, regardless of whether or not there existed a same-phase ancestor of the given target processor. Thus, in principle our implementation will encode fewer behaviors than the translation of Fig. 14 for the same phase and delay bounds. Our translation ensures that the target sequential Boogie program can violate an assertion only if a state which violates an assertion in the source asynchronous Boogie program is reachable, according to the depth-first scheduler with given phase and delay bounds.

To search for assertion-violating executions in the resulting sequential Boogie program, we leverage the Corral reachability engine [32].

### 6.2 Case study: bug detection in network algorithms

**true**. Then, the process node propagates the Search message, sending it to all of its neighbors; we have modeled this broadcast with a nondeterministic loop which chooses arbitrary nodes, other than the current process node, its sender, and the root, such that each broadcast posts at most one message to each other process. The Parent procedure contains an assertion which is violated if the parent relation ever becomes cyclic. Note that the canonical correctness criterion for the network spanning tree calculation is precisely that a tree is constructed, and thus there are no cycles.
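The property guarded by the assertion can be stated as a small Python check. This is a sketch only: it assumes the parent relation is available as a dictionary from node to parent, whereas the Boogie model keeps it in per-processor state, and the name `has_parent_cycle` is hypothetical.

```python
def has_parent_cycle(parent):
    """Return True if following parent links from some node revisits a node."""
    for start in parent:
        seen = {start}
        node = parent.get(start)
        while node is not None:
            if node in seen:
                return True
            seen.add(node)
            node = parent.get(node)
    return False


print(has_parent_cycle({"p1": "root", "p2": "p1"}))  # tree-shaped relation
print(has_parent_cycle({"p1": "p2", "p2": "p1"}))    # p1 and p2 point at each other
```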

**if** branch) every Search message which it receives, following the first one, in which the parent link and reported bit are set synchronously. Crucial to this argument is that the Parent procedure is called synchronously, and it is not hard to imagine alternative implementations whose correctness is not so evident. Consider for example the implementation of the Search procedure of Fig. 16, in which the Parent procedure is called asynchronously.

Although the implementation of Fig. 16 also cannot violate the assertion when messages are processed in FIFO order, the correctness argument is more subtle. Note that if FIFO message order is not necessarily preserved, our model, like that of asynchronous programs [39], would consider that pending messages may be handled in any possible order. In this unordered setting, an assertion violation is indeed possible: a pending Parent call on some processor \(p_1\) may remain pending while a descendant \(p_2\) of \(p_1\)’s Search broadcast sends, cyclically, another Search message to \(p_1\). If the asynchronous Parent call initiated by \(p_2\)’s Search then executes before \(p_1\)’s initial Parent call, it sets \(p_1\)’s parent link to \(p_2\), thus introducing the capability of cycle formation on the parent link between \(p_1\) and \(p_2\).

However, while this behavior is possible under the premise that message buffers are unordered, it is prohibited when message buffers are ordered. The reason is quite simple: at any moment when processor \(p_1\) processes the Search message broadcast from some descendant \(p_2\), \(p_1\) is guaranteed to have already executed its initially pending Parent procedure, since it was already pending at the time that \(p_1\) broadcast its Search messages to its neighbors. In fact, one can prove that ordered message buffers rule out not only the previously mentioned behavior, but any behavior leading to an assertion violation in the implementation of Fig. 16.

Important to note here is the subtlety of asynchronous programming; even the simplest asynchronous network algorithms such as spanning tree calculation can be made correct or incorrect according to slight reordering in message sending and receiving.

### 6.3 Experimentation

We have applied the bug detection algorithm outlined in Sect. 6.1 to the erroneous spanning tree algorithm variations just described, as well as two more advanced algorithms, each with a subtle, injected bug: the Bellman–Ford algorithm for computing the distance of each node in a network from a given root, and the breadth-first spanning tree algorithm for computing a spanning tree in which each node is connected by a minimal-distance path from a given root [33]. The Boogie source of these examples is listed in full in Appendix 8. In each case we invoked the Corral reachability engine [32] on a \(5\)-phase sequential program translation with a recursion-depth bound of \(5\). All of our experiments used a delay bound of \(0\). Our algorithm discovers the assertion violations in \(20\), \(10\), and \(5\) s, resp., for the buggy spanning tree, Bellman–Ford, and breadth-first spanning tree algorithms. Additionally, after exploring all possible \(5\)-phase executions of the (correct for FIFO buffers) spanning tree variation of Fig. 16 up to the recursion depth of \(5\), for an arbitrarily connected network of arbitrarily many processors, our algorithm concludes that the assertion violation is not possible (up to recursion depth \(5\), and in \(5\) phases) after \(199\) s. Though not as effective at determining the absence of assertion violations, our prototype demonstrates that our phase-bounded sequentialization is a viable approach for discovering subtle programming bugs in asynchronous message-passing systems with ordered message buffers.

## 7 Related work

Our work follows the line of research on compositional reductions from concurrent to sequential programs. The initial so-called “sequentialization” [37] explored multi-threaded programs up to one context-switch between threads, and was later expanded to handle a parameterized amount of context-switches between a statically determined set of threads executing in round-robin order [31, 36]. Contrary to these approaches which rely on nondeterministically guessing the states reached by other threads, La Torre et al. [28] demonstrated a reduction from bounded-context round-robin to sequential program analysis which avoids guessing, and thus the exploration of unreachable states. La Torre et al. [30] later extended the approach to handle programs parameterized by an unbounded number of statically determined threads by the computation of linear interfaces [29], and shortly after, Emmi et al. [15] further extended these results to handle an unbounded amount of dynamically created tasks, which besides applying to multi-threaded programs, naturally handles asynchronous event-driven programs [39]. Bouajjani et al. [10] pushed these results even further to a sequentialization which attempts to explore as many behaviors as possible within a given analysis budget, and La Torre and Parlato [25] demonstrate a sequential reduction by *scope bounding*, which captures more concurrent behaviors than context bounding.

While the previously mentioned sequentializations provide reductions only from state-reachability problems (e.g., including violations to safety properties), Atig et al. [4] and Emmi and Lal [14] have recently demonstrated sequentializations which even reduce the detection of liveness property violations (e.g., non-termination) in multithreaded and asynchronous programs to sequential program state-reachability. All of these sequentializations necessarily do provide a bounding parameter which limits the amount of interleaving between threads or tasks, but none are capable of precisely exploring tasks in creation order, which is abstracted away from their program models [39]. While Kidd et al. [24] and Emmi et al. [16] have developed sequentializations which are sensitive to task priorities, their reductions assume a finite number of priorities, and thus cannot capture unbounded queues.

1. There are programs whose fixed-phase executions require an unbounded number of contexts to capture; Fig. 5b serves as an example: a single phase captures the execution order \(D_1 D_2 D_3 \ldots D_{2n}\) on processor \(D\), which requires \(n+2\) contexts (specifically, of processors \(A (BC)^n D\)) to capture.
2. While each \(k\)-context execution is a \(k\)-phase execution, since the \(i\)th context creates messages of phase at most \(i+1\), not all \(k\)-phase executions occur in \(k\) contexts, since, according to Theorem 2, \(k\)-phase bounded state-reachability is undecidable even for finite-data programs.
3. Any \(k\)-context execution of a program with \(n\) processors is a \((nk)^k\) delay execution according to our depth-first scheduler, since each context may occur across \(k\) phases, forcing each processor to delay its tasks at most \(k\) times, per context.
4. While the context-bounded state-reachability problem for finite-data programs is 2EXPTIME-complete [3, 26], Theorem 3 demonstrates that phase and delay-bounded state-reachability, with the depth-first scheduler, is only EXPTIME-complete. Note that since every \(k\)-context execution (and many more, following our first point) is captured by a \(k\)-phase \(\mathcal O (k^k)\)-delay execution, phase and delay-bounding subsume context-bounding, capturing a superset of behaviors, in the same 2EXPTIME worst-case complexity.
5. Since La Torre et al. [27] and Heußner et al. [21]’s setting does not allow a process to post messages to its own queue, simulating \(k\)-phase executions of single-processor programs requires at least \(2k\) contexts, i.e., of two separate processors emulating a single processor.
6. Though phase-bounding leads to a convenient sequential encoding with easily implementable analysis algorithms, we are unaware whether a similar encoding is possible for context-bounding systems communicating with message queues.

*asynchronous programs* with dynamic task-creation [14, 15, 16, 18, 19, 23, 39] can be seen as overapproximating FIFO task-buffer semantics, since no order between pending tasks is enforced.

## 8 Conclusion

By introducing a novel phase-based characterization of message-passing program executions, we enable bounded program exploration which is not limited by message-queue capacity nor the number of processors. We show that the resulting phase-bounded analysis problems can be solved by concise reduction to sequential program analysis. Preliminary evidence suggests our approach is at worst competitive with known task-order respecting bounded analysis techniques, and can easily scale where those techniques quickly explode.

## Footnotes

1. Since the **assume** statement may block executions depending on earlier nondeterministic choices, we consider only the values reached in completed executions as reachable; this avoids the problematic situation where a valuation is reached in an execution which cannot later complete because of finally non-realizable nondeterministic choices.
2. We assume each task in a given execution has implicitly a unique task-identifier.
3. For simplicity our examples do not pass arguments to tasks; in general, one should also store in the task-queue array the values of arguments passed to each posted procedure.
4. The number \(n\) of explored tasks is controlled by limiting the number of loop unrollings in the bounded task-queue encoding, and limiting the recursion depth, and phase-bound, in the bounded-phase encoding.
5. For simplicity, this description supposes the scheduler can be the only source of program nondeterminism.

## Notes

### Acknowledgments

We gratefully thank Constantin Enea, Cezara Dragoi, Pierre Ganty, Salvatore La Torre, and the anonymous TACAS and STTT reviewers for helpful feedback.

### References

- 1.Abdulla, P.A., Jonsson, B.: Verifying programs with unreliable channels. In: LICS ’93: Proceedings of the 8th Annual IEEE Symposium on Logic in Computer Science, pp. 160–170. IEEE Computer Society (1993)Google Scholar
- 2.Abdulla, P.A., Bouajjani, A., Jonsson, B.: On-the-fly analysis of systems with unbounded, lossy fifo channels. In: CAV ’98: Proceedings of the 10th International Conference on Computer Aided Verification, vol. 1427 of LNCS, pp. 305–318. Springer, Berlin (1998)Google Scholar
- 3.Atig, M.F., Bollig, B., Habermehl, P.: Emptiness of multi-pushdown automata is 2etime-complete. In: DLT ’08: Proceedings of the 12th International Conference on Developments in Language Theory, vol. 5257 of LNCS, pp. 121–133. Springer, Heidelberg (2008)Google Scholar
- 4.Atig, M.F., Bouajjani, A., Emmi, M., Lal, A.: Detecting fair non-termination in multithreaded programs. In: CAV ’12: Proceedings of the 24th International Conference on Computer Aided Verification, vol. 7358 of LNCS. Springer, Heidelberg (2012)Google Scholar
- 5.Barnett, M., Leino, K.R.M. Moskal, M., Schulte W.: Boogie: an intermediate verification language. http://research.microsoft.com/en-us/projects/boogie/. Accessed 1 Jan 2012
- 6.Boigelot, B., Godefroid, P.: Symbolic verification of communication protocols with infinite state spaces using QDDs. Form. Methods Syst. Design
**14**(3), 237–255 (1999)CrossRefGoogle Scholar - 7.Bouajjani, A., Emmi, M.: Bounded phase analysis of message-passing programs. In: TACAS ’12: Proceedings of the 18th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, vol. 7214 of LNCS, pp. 451–465. Springer, Heidelberg (2012)Google Scholar
- 8.Bouajjani, A., Habermehl, P.: Symbolic reachability analysis of fifo-channel systems with nonregular sets of configurations. Theor. Comput. Sci.
**221**(1–2), 211–250 (1999)CrossRefMATHMathSciNetGoogle Scholar - 9.Bouajjani, A., Habermehl, P., Vojnar, T.: Verification of parametric concurrent systems with prioritised FIFO resource management. Form. Methods Syst. Design
**32**(2), 129–172 (2008)CrossRefMATHGoogle Scholar - 10.Bouajjani, A. Emmi, M., Parlato G.: On sequentializing concurrent programs. In: SAS ’11: Proceedings of the 18th International Symposium on Static Analysis, vol. 6887 of LNCS, pp. 129–145. Springer, Heidelberg (2011)Google Scholar
- 11.Brand, D., Zafiropulo, P.: On communicating finite-state machines. J. ACM
**30**(2), 323–342 (1983)CrossRefMATHMathSciNetGoogle Scholar - 12.Chaudhuri, S.: Subcubic algorithms for recursive state machines. In: POPL ’08: Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 159–169. ACM, New York (2008)Google Scholar
- 13.Dahl, R.: Node.js: Evented I/O for V8 JavaScript. http://nodejs.org/. Accessed 1 Jan 2012
- 14.Emmi, M., Lal, A.: Finding non-terminating executions in distributed asynchronous programs. In: SAS ’12: Proceedings of the 19th International Static Analysis Symposium, LNCS, pp. 439–455. Springer, Berlin (2012)Google Scholar
- 15.Emmi, M. Qadeer, S., Rakamaric, Z.: Delay-bounded scheduling. In: POPL ’11: Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 411–422. ACM, New York (2011)Google Scholar
- 16.Emmi, M., Lal, A., Qadeer, S.: Asynchronous programs with prioritized task-buffers. In: FSE ’12: Proceedings of the 20th International Symposium on the Foundations of Software Engineering. ACM, New York (2012)Google Scholar
- 17.Esparza, J., Kucera, A., Schwoon, S.: Model checking ltl with regular valuations for pushdown systems. Inf. Comput.
**186**(2), 355–376 (2003)CrossRefMATHMathSciNetGoogle Scholar - 18.Ganty, P., Majumdar, R.: Algorithmic verification of asynchronous programs. ACM Trans. Program. Lang. Syst.
**34**(1), 6 (2012)Google Scholar - 19.Ganty, P., Majumdar, R., Rybalchenko, A.: Verifying liveness for asynchronous programs. In: POPL 09: Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 102–113. ACM, New York (2009)Google Scholar
- 20.Haller, P., Odersky, M.: Scala actors: unifying thread-based and event-based programming. Theor. Comput. Sci.
**410**(2–3), 202–220 (2009)CrossRefMATHMathSciNetGoogle Scholar - 21.Heußner, A., Leroux, J., Muscholl, A., Sutre, G.: Reachability analysis of communicating pushdown systems. In: FOSSACS ’10: Proceedings of the 13th International Conference on Foundations of Software Science and Computational Structures, vol. 6014 of LNCS, pp. 267–281. Springer, Heidelberg (2010)Google Scholar
- 22.HTML5: A vocabulary and associated APIs for HTML and XHTML. http://dev.w3.org/html5/spec/Overview.html. Accessed 1 Jan 2012
- 23.Jhala, R., Majumdar, R.: Interprocedural analysis of asynchronous programs. In: POPL ’07: Proceedings of the 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 339–350. ACM, New York (2007)
- 24.Kidd, N., Jagannathan, S., Vitek, J.: One stack to run them all: reducing concurrent analysis to sequential analysis under priority scheduling. In: SPIN ’10: Proceedings of the 17th International Workshop on Model Checking Software, vol. 6349 of LNCS, pp. 245–261. Springer, Heidelberg (2010)
- 25.La Torre, S., Parlato, G.: Scope-bounded multistack pushdown systems: fixed-point, sequentialization, and tree-width. In: FSTTCS ’12: Proceedings of the IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, vol. 18 of LIPIcs, pp. 173–184. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2012)
- 26.La Torre, S., Madhusudan, P., Parlato, G.: A robust class of context-sensitive languages. In: LICS ’07: Proceedings of the 22nd IEEE Symposium on Logic in Computer Science, pp. 161–170. IEEE Computer Society (2007)
- 27.La Torre, S., Madhusudan, P., Parlato, G.: Context-bounded analysis of concurrent queue systems. In: TACAS ’08: Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, vol. 4963 of LNCS, pp. 299–314. Springer, Heidelberg (2008)
- 28.La Torre, S., Madhusudan, P., Parlato, G.: Reducing context-bounded concurrent reachability to sequential reachability. In: CAV ’09: Proceedings of the 21st International Conference on Computer Aided Verification, vol. 5643 of LNCS, pp. 477–492. Springer, Heidelberg (2009)
- 29.La Torre, S., Madhusudan, P., Parlato, G.: Model-checking parameterized concurrent programs using linear interfaces. In: CAV ’10: Proceedings of the 22nd International Conference on Computer Aided Verification, vol. 6174 of LNCS, pp. 629–644. Springer, Heidelberg (2010)
- 30.La Torre, S., Madhusudan, P., Parlato, G.: Sequentializing parameterized programs. In: FIT ’12: Proceedings of the Fourth Workshop on Foundations of Interface Technologies, vol. 87 of EPTCS, pp. 34–47 (2012)
- 31.Lal, A., Reps, T.W.: Reducing concurrent analysis under a context bound to sequential analysis. Form. Methods Syst. Design **35**(1), 73–97 (2009)
- 32.Lal, A., Qadeer, S., Lahiri, S.K.: A solver for reachability modulo theories. In: CAV ’12: Proceedings of the 24th International Conference on Computer Aided Verification, vol. 7358 of LNCS, pp. 427–443. Springer, Heidelberg (2012)
- 33.Lynch, N.A.: Distributed algorithms. ISBN 1-55860-348-4. Morgan Kaufmann, San Francisco (1996)
- 34.Miller, M.S., Tribble, E.D., Shapiro, J.S.: Concurrency among strangers. In: TGC ’05: Proceedings of the International Symposium on Trustworthy Global Computing, vol. 3705 of LNCS, pp. 195–229. Springer, Heidelberg (2005)
- 35.Post, E.L.: A variant of a recursively unsolvable problem. Bull. Am. Math. Soc. **52**(4), 264–268 (1946)
- 36.Qadeer, S., Rehof, J.: Context-bounded model checking of concurrent software. In: TACAS ’05: Proceedings of the 11th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, vol. 3440 of LNCS, pp. 93–107. Springer, Heidelberg (2005)
- 37.Qadeer, S., Wu, D.: KISS: keep it simple and sequential. In: PLDI ’04: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 14–24. ACM, New York (2004)
- 38.Reps, T.W., Horwitz, S., Sagiv, S.: Precise interprocedural dataflow analysis via graph reachability. In: POPL ’95: Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 49–61. ACM, New York (1995)
- 39.Sen, K., Viswanathan, M.: Model checking multithreaded programs with asynchronous atomic methods. In: CAV ’06: Proceedings of the 18th International Conference on Computer Aided Verification, vol. 4144 of LNCS, pp. 300–314. Springer, Heidelberg (2006)
- 40.Sharir, M., Pnueli, A.: Two approaches to interprocedural data-flow analysis. In: Muchnick, S.S., Jones, N.D. (eds.) Program Flow Analysis: Theory and Applications, chapter 7, pp. 189–234. Prentice-Hall, Englewood Cliffs (1981)
- 41.Svensson, H., Arts, T.: A new leader election implementation. In: Erlang ’05: Proceedings of the 2005 ACM SIGPLAN Workshop on Erlang, pp. 35–39. ACM, New York (2005)
- 42.Trottier-Hebert, F.: Learn you some Erlang for great good! http://learnyousomeerlang.com/. Accessed 1 Jan 2012