From Nonpreemptive to Preemptive Scheduling Using Synchronization Synthesis
Keywords
Concurrent Program, Independence Relation, Preemptive Schedule, Abstract Semantic, Nondeterministic Finite Automaton
1 Introduction
Concurrent shared-memory programming is notoriously difficult and error-prone. Program synthesis for concurrency aims to mitigate this complexity by synthesizing synchronization code automatically [4, 5, 8, 11]. However, specifying the programmer’s intent may be a challenge in itself. Declarative mechanisms, such as assertions, suffer from the drawback that it is difficult to ensure that the specification is complete and fully captures the programmer’s intent.
We propose a solution where the specification is implicit. We observe that a core difficulty in concurrent programming originates from the fact that the scheduler can preempt the execution of a thread at any time. We therefore give the developer the option to program assuming a friendly, nonpreemptive, scheduler. Our tool automatically synthesizes synchronization code to ensure that every behavior of the program under preemptive scheduling is included in the set of behaviors produced under nonpreemptive scheduling. Thus, we use the nonpreemptive semantics as an implicit correctness specification.
The nonpreemptive scheduling model dramatically simplifies the development of concurrent software, including operating system (OS) kernels, network servers, database systems, etc. [13, 14]. In this model, a thread can only be descheduled by voluntarily yielding control, e.g., by invoking a blocking operation. Synchronization primitives may be used for communication between threads, e.g., a producer thread may use a semaphore to notify the consumer about availability of data. However, one does not need to worry about protecting accesses to shared state: a series of memory accesses executes atomically as long as the scheduled thread does not yield.
In defining behavioral equivalence between preemptive and nonpreemptive executions, we focus on externally observable program behaviors: two program executions are observationally equivalent if they generate the same sequences of calls to interfaces of interest. This approach facilitates modular synthesis where a module’s behavior is characterized in terms of its interaction with other modules. Given a multithreaded program \(\mathcal {C}\) and a synthesized program \(\mathcal {C}'\) obtained by adding synchronization to \(\mathcal {C}\), \(\mathcal {C}'\) is preemption-safe w.r.t. \(\mathcal {C}\) if for each execution of \(\mathcal {C}'\) under a preemptive scheduler, there is an observationally equivalent nonpreemptive execution of \(\mathcal {C}\). Our synthesis goal is to automatically generate a preemption-safe version of the input program.
We rely on abstraction to achieve efficient synthesis of multithreaded programs. We propose a simple, data-oblivious abstraction inspired by an analysis of synchronization patterns in OS code, which tend to be independent of data values. The abstraction tracks types of accesses (read or write) to each memory location while ignoring their values. In addition, the abstraction tracks branching choices. Calls to an external interface are modeled as writes to a special memory location, with independent interfaces modeled as separate locations. To the best of our knowledge, our proposed abstraction is yet to be explored in the verification and synthesis literature.
Two abstract program executions are observationally equivalent if they are equal modulo the classical independence relation I on memory accesses: accesses to different locations are independent, and accesses to the same location are independent iff they are both read accesses. Using this notion of equivalence, the notion of preemption-safety is extended to abstract programs.
Under abstraction, we model each thread as a nondeterministic finite automaton (NFA) over a finite alphabet, with each symbol corresponding to a read or a write to a particular variable. This enables us to construct NFAs N, representing the abstraction of the original program \(\mathcal {C}\) under nonpreemptive scheduling, and P, representing the abstraction of the synthesized program \(\mathcal {C}'\) under preemptive scheduling. We show that preemption-safety of \(\mathcal {C}'\) w.r.t. \(\mathcal {C}\) is implied by preemption-safety of the abstract synthesized program w.r.t. the abstract original program, which, in turn, is implied by language inclusion modulo I of NFAs P and N. While the problem of language inclusion modulo an independence relation is undecidable [2], we show that the antichain-based algorithm for standard language inclusion [9] can be adapted to decide a bounded version of language inclusion modulo an independence relation.
Our overall synthesis procedure works as follows: we run the algorithm for bounded language inclusion modulo I, iteratively increasing the bound, until it reports that the inclusion holds, or finds a counterexample, or reaches a timeout. In the first case, the synthesis procedure terminates successfully. In the second case, the counterexample is generalized to a set of counterexamples represented as a Boolean combination of ordering constraints over control-flow locations (as in [11]). These constraints are analyzed for patterns indicating the type of concurrency bug (atomicity, ordering violation) and the type of applicable fix (lock insertion, statement reordering). After applying the fix(es), the procedure is restarted from scratch; the process continues until we find a preemption-safe program, or reach a timeout.
We implemented our synthesis procedure in a new prototype tool called Liss (Language Inclusion-based Synchronization Synthesis) and evaluated it on a series of device driver benchmarks, including an Ethernet driver for Linux and the synchronization skeleton of a USB-to-serial controller driver. First, Liss was able to detect and eliminate all but two known race conditions in our examples; these included one race condition that we previously missed when synthesizing from explicit specifications [5], due to a missing assertion. Second, our abstraction proved highly efficient: Liss runs an order of magnitude faster on the more complicated examples than our previous synthesis tool based on the CBMC model checker. Third, our coarse abstraction proved surprisingly precise in practice: across all our benchmarks, we only encountered three program locations where manual abstraction refinement was needed to avoid the generation of unnecessary synchronization. Overall, our evaluation strongly supports the use of the implicit specification approach based on nonpreemptive scheduling semantics as well as the use of the data-oblivious abstraction to achieve practical synthesis for real-world systems code.
Contributions. First, we propose a new specification-free approach to synchronization synthesis. Given a program written assuming a friendly, nonpreemptive scheduler, we automatically generate a preemption-safe version of the program. Second, we introduce a novel abstraction scheme and use it to reduce preemption-safety to language inclusion modulo an independence relation. Third, we present the first language inclusion-based synchronization synthesis procedure and tool for concurrent programs. Our synthesis procedure includes a new algorithm for a bounded version of our inherently undecidable language inclusion problem. Finally, we evaluate our synthesis procedure on several examples. To the best of our knowledge, Liss is the first synthesis tool capable of handling realistic (albeit simplified) device driver code, while previous tools were evaluated on small fragments of driver code or on manually extracted synchronization skeletons.
Related Work. Synthesis of synchronization is an active research area [3, 4, 5, 6, 10, 11, 12, 15, 16]. Closest to our work is a recent paper by Bloem et al. [3], which uses implicit specifications for synchronization synthesis. While their specification is given by sequential behaviors, ours is given by nonpreemptive behaviors. This makes our approach applicable to scenarios where threads need to communicate explicitly. Further, correctness in [3] is determined by comparing values at the end of the execution. In contrast, we compare sequences of events, which serves as a more suitable specification for infinitely-looping reactive systems.
Many efforts in synthesis of synchronization focus on user-provided specifications, such as assertions (our previous work [4, 5, 11]). However, it is hard to determine if a given set of assertions represents a complete specification. In this paper, we are solving language inclusion, a computationally harder problem than reachability. However, due to our abstraction, our tool performs significantly better than tools from [4, 5], which are based on a mature model checker (CBMC [7]). Our abstraction is reminiscent of previously used abstractions that track reads and writes to individual locations (e.g., [1, 17]). However, our abstraction is novel as it additionally tracks some control-flow information (specifically, the branches taken), giving us higher precision with almost negligible computational cost. The synthesis part of our approach is based on [11].
In [16] the authors rely on assertions for synchronization synthesis and include iterative abstraction refinement in their framework. This is an interesting extension to pursue for our abstraction. In other related work, CFix [12] can detect and fix concurrency bugs by identifying simple bug patterns in the code.
2 Illustrative Example
Fig. 1a contains our running example. Consider the case where the procedures open_dev() and close_dev() are invoked in parallel, possibly multiple times (modeled as a nondeterministic while loop). The functions power_up() and power_down() represent calls to a device. For the nonpreemptive scheduler, the sequence of calls to the device will always be a repeating sequence of one call to power_up(), followed by one call to power_down(). Without additional synchronization, however, there could be two calls to power_up() in a row when executing it with a preemptive scheduler. Such a sequence is not observationally equivalent to any sequence that can be produced when executing with a nonpreemptive scheduler.
Fig. 1b contains the abstracted versions (we omit tracking of branching choices in the example) of the two procedures, open_dev_abs() and close_dev_abs(). For instance, the instruction open = open + 1 is abstracted to the two instructions labeled (C) and (D). The abstraction is coarse, but still captures the problem. Consider two threads T1 and T2 running the open_dev_abs() procedure. The following trace is possible under a preemptive scheduler, but not under a nonpreemptive scheduler: T1.A; T2.A; T1.B; T1.C; T1.D; T2.B; T2.C; T2.D. Moreover, the trace cannot be transformed by swapping independent events into any trace possible under a nonpreemptive scheduler. This is because instructions A and D are not independent. Hence, the abstract trace exhibits the problem of two successive calls to power_up() when executing with a preemptive scheduler. Our synthesis procedure finds this problem, and fixes it by introducing a lock in open_dev() (see Sect. 5).
3 Preliminaries and Problem Statement
Semantics. We begin by defining the semantics of a single thread in \(\mathcal W\), and then extend the definition to concurrent nonpreemptive and preemptive semantics. Note that in our work, reads and writes are assumed to execute atomically and further, we assume a sequentially consistent memory model.
Single-Thread Semantics. A program state is given by \(\langle \mathcal V, P \rangle \) where \(\mathcal V\) is a valuation of all program variables, and \( P \) is the statement that remains to be executed. Let us fix a thread identifier \( tid \).
The operational semantics of a thread executing in isolation is given in Fig. 3. A single execution step \(\langle \mathcal V, P \rangle \xrightarrow {\alpha } \langle \mathcal V', P' \rangle \) changes the program state from \(\langle \mathcal V, P \rangle \) to \(\langle \mathcal V', P' \rangle \) while optionally outputting an observable symbol \(\alpha \). The absence of a symbol is denoted using \(\epsilon \). Most rules from Fig. 3 are standard; the special rules are the Havoc, Input, and Output rules.
1. Havoc: Statement \(l: x:= \mathsf{havoc}\) assigns x a nondeterministic value (say k) and outputs the observable \(( tid , {\mathsf {havoc}}, k, x)\).
2. Input, Output: \(l: x:=\mathsf{input} (t)\) and \(l: \mathsf{output}(t,e)\) read a value from and write a value to the channel t, and output \(( tid , \mathsf{input}, k, t)\) and \(( tid , \mathsf{output}, k, t)\), respectively, where k is the value read or written.
Non-Preemptive Semantics. The nonpreemptive semantics of \(\mathcal W\) is presented in the full version [18]. The nonpreemptive semantics ensures that a single thread from the program keeps executing as detailed above until one of the following occurs: (a) the thread finishes execution, or it encounters (b) a yield statement, or (c) a lock statement and the lock is taken, or (d) an await statement and the condition variable is not set. In these cases, a context-switch is possible.
Preemptive Semantics. The preemptive semantics of a program is obtained from the nonpreemptive semantics by relaxing the condition on context-switches, and allowing context-switches at all program points (see full version [18]).
3.1 Problem Statement
A sequence \(\alpha _0\ldots \alpha _k\) is a nonpreemptive observation sequence of a program \(\mathcal {C}\) if there exist program states \(S_0^{pre}\), \(S_0^{post}\), ..., \(S_k^{pre}\), \(S_k^{post}\) such that according to the nonpreemptive semantics of \(\mathcal W\), we have: (a) for each \(0 \le i \le k\), \(\langle S_i^{pre} \rangle \xrightarrow {\alpha _i} \langle S_i^{post} \rangle \), (b) for each \(0 \le i < k\), \(\langle S_i^{post} \rangle \xrightarrow {\epsilon }^{*} \langle S_{i+1}^{pre} \rangle \), and (c) for the initial state \(S_\iota \) and a final state (i.e., where all threads have finished execution) \(S_f\), \(\langle S_\iota \rangle \xrightarrow {\epsilon }^{*} \langle S_0^{pre} \rangle \) and \(\langle S_k^{post} \rangle \xrightarrow {\epsilon }^{*} \langle S_f \rangle \). Similarly, a preemptive observation sequence of a program \(\mathcal {C}\) is a sequence \(\alpha _0\ldots \alpha _k\) as above, with the nonpreemptive semantics replaced with preemptive semantics. We denote the sets of nonpreemptive and preemptive observation sequences of a program \(\mathcal {C}\) by \([\![ \mathcal {C} ]\!]^{NP}\) and \([\![ \mathcal {C} ]\!]^P\), respectively.

Two observation sequences \(\alpha _0\ldots \alpha _k\) and \(\beta _0\ldots \beta _k\) are equivalent if:
1. The subsequences of \(\alpha _0\ldots \alpha _k\) and \(\beta _0\ldots \beta _k\) containing only symbols of the form \(( tid , \mathsf {input}, k, t)\) and \(( tid , \mathsf {output}, k, t)\) are equal, and
2. For each thread identifier \( tid \), the subsequences of \(\alpha _0\ldots \alpha _k\) and \(\beta _0\ldots \beta _k\) containing only symbols of the form \(( tid , \mathsf {havoc}, k, x)\) are equal.
Intuitively, observable sequences are equivalent if they have the same interaction with the interface, and the same nondeterministic choices in each thread. For sets of observable sequences \(\mathcal {O}_1\) and \(\mathcal {O}_2\), we write \(\mathcal {O}_1 \subseteq \mathcal {O}_2\) to denote that each sequence in \(\mathcal {O}_1\) has an equivalent sequence in \(\mathcal {O}_2\). Given a concurrent program \(\mathcal {C}\) and a synthesized program \(\mathcal {C}'\) obtained by adding synchronization to \(\mathcal {C}\), the program \(\mathcal {C}'\) is preemption-safe w.r.t. \(\mathcal {C}\) if \([\![ \mathcal {C}' ]\!]^{P} \subseteq [\![ \mathcal {C} ]\!]^{NP}\).
We are now ready to state our synthesis problem. Given a concurrent program \(\mathcal {C}\), the aim is to synthesize a program \(\mathcal {C}'\), by adding synchronization to \(\mathcal {C}\), such that \(\mathcal {C}'\) is preemption-safe w.r.t. \(\mathcal {C}\).
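The equivalence of concrete observation sequences and the induced inclusion on sets of sequences can be sketched in a few lines of Python. This is only an illustration: the `Obs` record, the string tags, and the function names are assumptions made for the sketch, and sets of sequences are taken to be finite so that they can be enumerated.

```python
from typing import Iterable, NamedTuple, Sequence

class Obs(NamedTuple):
    """One concrete observable symbol (tid, kind, k, target); names are illustrative."""
    tid: int
    kind: str    # "havoc", "input" or "output"
    value: int   # the value k that was chosen, read or written
    target: str  # variable x for havoc, channel t for input/output

def equivalent(alpha: Sequence[Obs], beta: Sequence[Obs]) -> bool:
    """Check the two equivalence conditions on observation sequences."""
    def io(seq):
        # Subsequence of interface interactions, across all threads.
        return [o for o in seq if o.kind in ("input", "output")]

    def havocs(seq, tid):
        # Subsequence of nondeterministic choices made by one thread.
        return [o for o in seq if o.kind == "havoc" and o.tid == tid]

    if io(alpha) != io(beta):
        return False
    tids = {o.tid for o in alpha} | {o.tid for o in beta}
    return all(havocs(alpha, t) == havocs(beta, t) for t in tids)

def included(o1: Iterable[Sequence[Obs]], o2: Iterable[Sequence[Obs]]) -> bool:
    """O1 is included in O2 if every sequence in O1 has an equivalent one in O2."""
    o2 = list(o2)
    return all(any(equivalent(a, b) for b in o2) for a in o1)
```

Under these assumptions, two sequences that differ only in the relative order of havoc symbols of different threads are reported as equivalent, while any change in the order or content of input/output symbols breaks equivalence.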
3.2 Language Inclusion Modulo an Independence Relation
We reduce the problem of checking if a synthesized solution is preemption-safe w.r.t. the original program to an automata-theoretic problem.
Abstract Semantics for \(\mathcal W\). We first define a single-thread abstract semantics for \(\mathcal W\) (Fig. 4), which tracks types of accesses (read or write) to each memory location while abstracting away their values. Inputs/outputs to an external interface are modeled as writes to a special memory location (dev). Even inputs are modeled as writes because in our applications we cannot assume that reads from the external interface are free of side-effects. Havocs become ordinary writes to the variable they are assigned to. Every branch is taken nondeterministically and tracked. The only constructs preserved are the lock and condition variables. The abstract program state consists of the valuations of the lock and condition variables and the statement that remains to be executed. In the abstraction, an observable is of the form \(( tid , \{\mathsf {read,write,exit,loop,then,else}\}, v, l)\) and observes the type of access (read/write) to variable v and records nondeterministic branching choices (exit/loop/then/else). The latter are not associated with any variable.
The abstract program semantics is the same as the concrete program semantics where the single-thread semantics is replaced by the abstract single-thread semantics. Locks, condition variables, and operations on them are not abstracted.
As with the concrete semantics of \(\mathcal W\), we can define the nonpreemptive and preemptive observable sequences for abstract semantics. For a concurrent program \(\mathcal {C}\), we denote the sets of abstract preemptive and nonpreemptive observable sequences by \([\![ \mathcal {C} ]\!]^{P}_{abs}\) and \([\![ \mathcal {C} ]\!]^{NP}_{abs}\), respectively.

Two abstract observation sequences \(\alpha _0\ldots \alpha _k\) and \(\beta _0\ldots \beta _k\) are equivalent if:
1. For each thread \( tid \), the subsequences of \(\alpha _0\ldots \alpha _k\) and \(\beta _0\ldots \beta _k\) containing only symbols of the form \(( tid ,a,v,l)\), with \(a\in \{\mathsf {read,write,exit,loop,then,else}\}\) are equal,
2. For each variable v, the subsequences of \(\alpha _0\ldots \alpha _k\) and \(\beta _0\ldots \beta _k\) containing only write symbols (of the form \(( tid , \mathsf {write}, v, l)\)) are equal, and
3. For each variable v, the multisets of symbols of the form \(( tid , \mathsf {read}, v, l)\) between any two write symbols, as well as before the first write symbol and after the last write symbol are identical.
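The three conditions on abstract observation sequences can be checked directly on finite sequences. The following Python sketch is illustrative only: the `AbsObs` record, its field names, and the use of `None` as the variable of a branching symbol are assumptions of the sketch, not part of the formal development.

```python
from collections import Counter
from typing import NamedTuple, Optional, Sequence

class AbsObs(NamedTuple):
    """Abstract observable (tid, a, v, l); branching symbols carry var=None."""
    tid: int
    action: str          # read, write, exit, loop, then, else
    var: Optional[str]
    loc: str

def read_blocks(seq: Sequence[AbsObs], var: str):
    """Multisets of reads of `var`, one per segment delimited by writes to `var`."""
    blocks = [Counter()]
    for o in seq:
        if o.var != var:
            continue
        if o.action == "write":
            blocks.append(Counter())   # a write starts a new segment
        elif o.action == "read":
            blocks[-1][o] += 1
    return blocks

def abs_equivalent(alpha: Sequence[AbsObs], beta: Sequence[AbsObs]) -> bool:
    both = [*alpha, *beta]
    tids = {o.tid for o in both}
    variables = {o.var for o in both if o.var is not None}
    # Condition 1: every thread sees the same sequence of its own symbols.
    same_threads = all(
        [o for o in alpha if o.tid == t] == [o for o in beta if o.tid == t]
        for t in tids)
    # Condition 2: per variable, the sequence of writes is the same.
    same_writes = all(
        [o for o in alpha if o.action == "write" and o.var == v]
        == [o for o in beta if o.action == "write" and o.var == v]
        for v in variables)
    # Condition 3: per variable, reads between consecutive writes match as multisets.
    same_reads = all(read_blocks(alpha, v) == read_blocks(beta, v) for v in variables)
    return same_threads and same_writes and same_reads
```

In this sketch, swapping two reads of the same variable by different threads preserves equivalence, whereas moving a read across a write to the same variable does not.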
We first show that the abstract semantics is sound w.r.t. preemption-safety (see full version for the proof [18]).
Theorem 1
Given concurrent program \(\mathcal {C}\) and a synthesized program \(\mathcal {C}'\) obtained by adding synchronization to \(\mathcal {C}\), \([\![ \mathcal {C}' ]\!]^{P}_{abs} \subseteq [\![ \mathcal {C} ]\!]^{NP}_{abs}\Rightarrow [\![ \mathcal {C}' ]\!]^{P} \subseteq [\![ \mathcal {C} ]\!]^{NP}\).
Abstract Semantics to Automata. An NFA \(\mathcal A\) is a tuple \((Q, \varSigma , \varDelta , Q_\iota , F)\) where \(\varSigma \) is a finite alphabet, \(Q,Q_\iota ,F\) are finite sets of states, initial states and final states, respectively and \(\varDelta \) is a set of transitions. A word \(\sigma _0\ldots \sigma _k \in \varSigma ^*\) is accepted by \(\mathcal A\) if there exists a sequence of states \(q_0\ldots q_{k+1}\) such that \(q_0\in Q_\iota \) and \(q_{k+1}\in F\) and \(\forall i:(q_i, \sigma _i, q_{i+1}) \in \varDelta \). The set of all words accepted by \(\mathcal A\) is called the language of \(\mathcal A\) and is denoted \(\mathcal {L}(\mathcal A)\).
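The acceptance condition can be made concrete with a small Python sketch. The `NFA` record and the three-state automaton in the usage note are hypothetical illustrations; as in the definition above, transitions are triples and no \(\epsilon \)-transitions are considered.

```python
from typing import NamedTuple

class NFA(NamedTuple):
    """An NFA (Q, Sigma, Delta, Q_iota, F) with Delta as a set of triples."""
    states: frozenset
    alphabet: frozenset
    delta: frozenset     # triples (q, symbol, q')
    initial: frozenset
    final: frozenset

def accepts(nfa: NFA, word) -> bool:
    """Track the set of states reachable after each symbol of the word."""
    current = set(nfa.initial)
    for symbol in word:
        current = {q2 for (q1, a, q2) in nfa.delta if q1 in current and a == symbol}
    # Accept if some run ends in a final state.
    return bool(current & nfa.final)

# Hypothetical automaton accepting exactly the words "ab" and "b".
example = NFA(
    states=frozenset({"q0", "q1", "q2"}),
    alphabet=frozenset({"a", "b"}),
    delta=frozenset({("q0", "a", "q1"), ("q1", "b", "q2"), ("q0", "b", "q2")}),
    initial=frozenset({"q0"}),
    final=frozenset({"q2"}),
)
```

With this automaton, `accepts(example, "ab")` and `accepts(example, "b")` hold, while `accepts(example, "ba")` does not.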
Given a program \(\mathcal {C}\), we can construct automata \(\mathcal A([\![ \mathcal {C} ]\!]^{NP}_{abs})\) and \(\mathcal A([\![ \mathcal {C} ]\!]^{P}_{abs})\) that accept exactly the observable sequences under the respective semantics. We describe their construction informally. Each automaton state is a program state of the abstract semantics and the alphabet is the set of abstract observable symbols. There is a transition from one state to another on an observable symbol (or on \(\epsilon \)) iff the program can execute one step under the corresponding semantics to reach the other state while outputting the observable symbol (respectively, without outputting any symbol).
Language Inclusion Modulo an Independence Relation. Let I be an irreflexive, symmetric binary relation over an alphabet \(\varSigma \). We refer to I as the independence relation and to elements of I as independent symbol pairs. We define a symmetric binary relation \(\approx \) over words in \(\varSigma ^*\): for all words \(\sigma , \sigma ' \in \varSigma ^*\) and \((\alpha , \beta ) \in I\), \((\sigma \cdot \alpha \beta \cdot \sigma ', \sigma \cdot \beta \alpha \cdot \sigma ') \in \, \approx \). Let \(\approx ^t\) denote the reflexive transitive closure of \(\approx \). Given a language \(\mathcal{L}\) over \(\varSigma \), the closure of \(\mathcal{L}\) w.r.t. I, denoted \(\mathrm {Clo}_I(\mathcal{L})\), is the set \(\{\sigma \in \varSigma ^* {:}\ \exists \sigma ' \in \mathcal L \text { with } (\sigma ,\sigma ') \in \, \approx ^t\}\). Thus, \(\mathrm {Clo}_I(\mathcal{L})\) consists of all words that can be obtained from some word in \(\mathcal{L}\) by repeatedly commuting adjacent independent symbol pairs from I.
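For a finite language, the closure \(\mathrm {Clo}_I(\mathcal{L})\) can be computed by exhaustively commuting adjacent independent symbols. The Python sketch below illustrates the definition under that finiteness assumption; the function name and the representation of words as tuples are choices made for the illustration.

```python
def closure(language, indep):
    """Clo_I(L) for a finite language L, given I as a set of symbol pairs."""
    # Make the relation symmetric so callers may list each pair once.
    sym = set(indep) | {(b, a) for (a, b) in indep}
    seen = {tuple(w) for w in language}
    frontier = list(seen)
    while frontier:
        w = frontier.pop()
        for i in range(len(w) - 1):
            if (w[i], w[i + 1]) in sym:
                # Commute the adjacent independent pair at positions i, i+1.
                v = w[:i] + (w[i + 1], w[i]) + w[i + 2:]
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
    return seen
```

The search terminates because every word reached is a permutation of some word in the input language. For the language consisting of the two words "ab" and "b" with a and b independent, the sketch returns the three words "ab", "ba" and "b".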
Definition 1
(Language Inclusion Modulo an Independence Relation). Given NFAs A, B over a common alphabet \(\varSigma \) and an independence relation I over \(\varSigma \), the language inclusion problem modulo I is: \(\mathcal L(\text{ A }) \subseteq \mathrm {Clo}_I(\mathcal L(\text{ B }))\)?
We reduce preemption-safety under the abstract semantics to language inclusion modulo an independence relation. The independence relation I we use is defined on the set of abstract observable symbols as follows: \((( tid , a,v, l), ( tid ', a',v', l')) \in I\) iff \( tid \ne tid '\), and one of the following holds: (a) \(v \ne v'\) or (b) \(a \ne \mathsf {write} \wedge a'\ne \mathsf {write}\).
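The definition of I translates directly into a predicate on pairs of abstract observable symbols. In the Python sketch below, symbols are plain 4-tuples `(tid, a, v, l)`; this encoding and the sample location names are assumptions made for illustration.

```python
def independent(x, y) -> bool:
    """Decide whether two abstract observables (tid, a, v, l) are in I."""
    tid1, a1, v1, _ = x
    tid2, a2, v2, _ = y
    if tid1 == tid2:
        return False  # symbols of the same thread never commute
    # (a) different variables, or (b) neither access is a write
    return v1 != v2 or (a1 != "write" and a2 != "write")
```

For instance, two reads of `open` by different threads are independent, whereas a read and a write of `open` by different threads are not.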
Proposition 1
Given concurrent programs \(\mathcal {C}\) and \(\mathcal {C}'\), \([\![ \mathcal {C}' ]\!]^{P}_{abs} \subseteq [\![ \mathcal {C} ]\!]^{NP}_{abs}\) iff \(\mathcal L(\mathcal A([\![ \mathcal {C}' ]\!]^{P}_{abs})) \subseteq \mathrm {Clo}_I(\mathcal L(\mathcal A([\![ \mathcal {C} ]\!]^{NP}_{abs})))\).
4 Checking Language Inclusion
We first focus on the problem of language inclusion modulo an independence relation (Definition 1). This question corresponds to preemption-safety (Theorem 1, Proposition 1) and its solution drives our synchronization synthesis (Sect. 5).
Theorem 2
For NFAs A, B over alphabet \(\varSigma \) and an independence relation \(I\subseteq \varSigma \times \varSigma \), \(\mathcal L(A)\subseteq \mathrm {Clo}_I(\mathcal L(B))\) is undecidable [2].
Fortunately, a bounded version of the problem is decidable. Recall the relation \(\approx \) over \(\varSigma ^*\) from Sect. 3.2. We define a symmetric binary relation \(\approx _i\) over \(\varSigma ^*\): \((\sigma , \sigma ') \in \, \approx _i\) iff \(\exists (\alpha ,\beta ) \in I\): \((\sigma , \sigma ') \in \, \approx \), \(\sigma [i] = \sigma '[i+1] = \alpha \) and \(\sigma [i+1] = \sigma '[i] = \beta \). Thus \(\approx _i\) consists of all pairs of words that can be obtained from each other by commuting the symbols at positions i and \(i+1\). We next define a symmetric binary relation \(\asymp \) over \(\varSigma ^*\): \((\sigma ,\sigma ') \in \, \asymp \) iff \(\exists \sigma _1,\ldots ,\sigma _t\): \((\sigma ,\sigma _1) \in \, \approx _{i_1},\ldots , (\sigma _{t},\sigma ') \in \, \approx _{i_{t+1}}\) and \(i_1 < \ldots < i_{t+1}\). The relation \(\asymp \) intuitively consists of pairs of words obtained from each other by making a single forward pass commuting multiple pairs of adjacent symbols. Let \(\asymp ^k\) denote the k-fold composition of \(\asymp \) with itself. Given a language \(\mathcal{L}\) over \(\varSigma \), we use \(\mathrm {Clo}_{k,I}(\mathcal{L})\) to denote the set \(\{\sigma \in \varSigma ^*: \exists \sigma ' \in \mathcal L \text { with } (\sigma ,\sigma ') \in \, \asymp ^{\scriptstyle k} \}\). In other words, \(\mathrm {Clo}_{k,I}(\mathcal{L})\) consists of all words which can be generated from \(\mathcal{L}\) using a finite-state transducer that remembers at most k symbols of its input words in its states.
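To build intuition for the relation \(\asymp \) and its k-fold composition, the following Python sketch enumerates, for a finite language, all words reachable by k forward passes of commutations at strictly increasing positions. It follows the relational definition literally rather than the transducer view, and it makes one assumption that should be read as such: a pass that performs no swap is allowed, so that every word is related to itself.

```python
def one_pass(word, indep):
    """All words reachable from `word` by swaps at strictly increasing positions."""
    sym = set(indep) | {(b, a) for (a, b) in indep}
    results = set()

    def go(w, start):
        results.add(w)  # assumption: the empty pass (no swap) is permitted
        for i in range(start, len(w) - 1):
            if (w[i], w[i + 1]) in sym:
                swapped = w[:i] + (w[i + 1], w[i]) + w[i + 2:]
                go(swapped, i + 1)  # later swaps must use larger positions

    go(tuple(word), 0)
    return results

def bounded_closure(language, indep, k):
    """Words related to some word of a finite language by k forward passes."""
    current = {tuple(w) for w in language}
    for _ in range(k):
        current = set().union(*(one_pass(w, indep) for w in current))
    return current
```

With a independent of both b and c, the word "bca" reaches "bac" in one pass but needs a second pass to reach "abc", since the second swap happens at a smaller position than the first. This illustrates why the bounded closure can be strictly smaller than \(\mathrm {Clo}_I\) for small k.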
Definition 2
(Bounded Language Inclusion Modulo an Independence Relation). Given NFAs \(A, B\) over \(\varSigma \), \(I\subseteq \varSigma \times \varSigma \) and a constant \(k>0\), the k-bounded language inclusion problem modulo I is: \(\mathcal L(\text{ A })\subseteq \mathrm {Clo}_{k,I}(\mathcal L(\text{ B }))\)?
Theorem 3
For NFAs \(A, B\) over \(\varSigma \), \(I\subseteq \varSigma \times \varSigma \) and a constant \(k>0\), \(\mathcal L(\text{ A }) \subseteq \mathrm {Clo}_{k,I}(\mathcal L(\text{ B }))\) is decidable.
We present an algorithm to check k-bounded language inclusion modulo I, based on the antichain algorithm for standard language inclusion [9].
Antichain Algorithm for Language Inclusion. Given a partial order \((X, \sqsubseteq )\), an antichain over X is a set of elements of X that are incomparable w.r.t. \(\sqsubseteq \). In order to check \(\mathcal L(A)\subseteq \mathcal L(B)\) for NFAs \(A = (Q_A,\varSigma ,\varDelta _A,Q_{\iota ,A},F_A)\) and \(B = (Q_B,\varSigma ,\varDelta _B,Q_{\iota ,B},F_B)\), the antichain algorithm proceeds by exploring \(A\) and \(B\) in lockstep. While \(A\) is explored nondeterministically, \(B\) is determinized on the fly for exploration. The algorithm maintains an antichain, consisting of tuples of the form \((s_A, S_B)\), where \(s_A\in Q_A\) and \(S_B\subseteq Q_B\). The ordering relation \(\sqsubseteq \) is given by \((s_A, S_B) \sqsubseteq (s'_A, S'_B)\) iff \(s_A= s'_A\) and \(S_B\subseteq S'_B\). The algorithm also maintains a frontier set of tuples yet to be explored.
Given state \(s_A\in Q_A\) and a symbol \(\alpha \in \varSigma \), let \(succ_\alpha (s_A)\) denote \(\{s_A' \in Q_A: (s_A,\alpha ,s_A') \in \varDelta _A\}\). Given set of states \(S_B\subseteq Q_B\), let \(succ_\alpha (S_B)\) denote \(\{s_B'\in Q_B: \exists s_B\in S_B:\ (s_B,\alpha ,s_B')\in \varDelta _B\}\). Given tuple \((s_A, S_B)\) in the frontier set, let \(succ_\alpha (s_A, S_B)\) denote \(\{(s'_A,S'_B): s'_A\in succ_\alpha (s_A), S'_B= succ_\alpha (S_B)\}\).

Each newly computed successor tuple \(p'\) is handled using the following two rules:
Rule 1: if there exists a tuple p in the antichain with \(p \sqsubseteq p'\), then \(p'\) is not added to the frontier set or antichain,
Rule 2: else, if there exist tuples \(p_1, \ldots , p_n\) in the antichain with \(p' \sqsubseteq p_1, \ldots , p_n\), then \(p_1, \ldots , p_n\) are removed from the antichain.
The algorithm terminates by either reporting a counterexample, or by declaring success when the frontier becomes empty.
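The following Python sketch illustrates the antichain exploration for standard language inclusion. It is a simplified rendering written for this exposition, not the implementation used in Liss: the `NFA` record, the breadth-first frontier, and the counterexample test (a tuple whose \(A\)-state is final while its set of \(B\)-states contains no final state) are assumptions of the sketch, and \(\epsilon \)-transitions are not handled.

```python
from collections import deque
from typing import NamedTuple

class NFA(NamedTuple):
    delta: frozenset     # transitions (q, symbol, q')
    initial: frozenset
    final: frozenset

def post(states, symbol, delta):
    """Successors of a set of states on one symbol."""
    return frozenset(q2 for (q1, a, q2) in delta if q1 in states and a == symbol)

def find_counterexample(A, B, alphabet):
    """Return a word in L(A) but not in L(B), or None if L(A) is included in L(B)."""
    antichain = set()
    frontier = deque()

    def insert(p, word):
        s_a, S_b = p
        # Rule 1: skip p if a smaller tuple is already in the antichain.
        if any(q_a == s_a and Q_b <= S_b for (q_a, Q_b) in antichain):
            return
        # Rule 2: drop tuples that p makes redundant.
        for q in [q for q in antichain if q[0] == s_a and S_b <= q[1]]:
            antichain.discard(q)
        antichain.add(p)
        frontier.append((p, word))

    for s_a in A.initial:
        insert((s_a, frozenset(B.initial)), ())
    while frontier:
        (s_a, S_b), word = frontier.popleft()
        if s_a in A.final and not (S_b & B.final):
            return word  # A accepts `word`, no run of B does
        for symbol in alphabet:
            S_b2 = post(S_b, symbol, B.delta)   # on-the-fly determinization of B
            for s_a2 in post({s_a}, symbol, A.delta):
                insert((s_a2, S_b2), word + (symbol,))
    return None
```

As a usage example, if `A` accepts the words "ab", "ba" and "b" while `B` accepts only "ab" and "b", the sketch reports the word "ba" as a counterexample; with the roles of the two automata exchanged it returns `None`.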
Antichain Algorithm for k-Bounded Language Inclusion Modulo I. This algorithm is essentially the same as the standard antichain algorithm, with the automaton \(B\) above replaced by an automaton \(B_{k,I}\) accepting \(\mathrm {Clo}_{k,I}(\mathcal L(\text{ B }))\). The set \(Q_{B_{k,I}}\) of states of \(B_{k,I}\) consists of triples \((s_B, \eta _1, \eta _2)\), where \(s_B\in Q_B\) and \(\eta _1, \eta _2\) are k-length words over \(\varSigma \). Intuitively, the words \(\eta _1\) and \(\eta _2\) store symbols that are expected to be matched later along a run. The set of initial states of \(B_{k,I}\) is \(\{(s_B,\varnothing ,\varnothing ): s_B\in Q_{\iota ,B}\}\). The set of final states of \(B_{k,I}\) is \(\{(s_B,\varnothing ,\varnothing ): s_B\in F_B\}\). The transition relation \(\varDelta _{B_{k,I}}\) is constructed by repeatedly applying the following rules, in order, for each state \((s_B, \eta _1, \eta _2)\) and each symbol \(\alpha \). In what follows, \(\eta [\setminus i]\) denotes the word obtained from \(\eta \) by removing its \(i^{th}\) symbol.
1. Pick new \(s'_B\) and \(\beta \in \varSigma \) such that \((s_B, \beta , s_B') \in \varDelta _B\)
2. (a) If \(\forall i\): \(\eta _1[i] \ne \alpha \) and \(\alpha \) is independent of all symbols in \(\eta _1\), \(\eta _2' \, \mathtt {:=}\, \eta _2\cdot \alpha \) and \(\eta _1' \, \mathtt {:=}\, \eta _1\), (b) else, if \(\exists i\): \(\eta _1[i] = \alpha \) and \(\alpha \) is independent of all symbols in \(\eta _1\) prior to i, \(\eta _1' \, \mathtt {:=}\, \eta _1[\setminus i]\) and \(\eta _2'\, \mathtt {:=}\, \eta _2\) (c) else, go to 1
3. (a) If \(\forall i\): \(\eta _2'[i] \ne \beta \) and \(\beta \) is independent of all symbols in \(\eta _2'\), \(\eta _1' \, \mathtt {:=}\eta _1' \,\cdot \beta \), (b) else, if \(\exists i\): \(\eta _2'[i] = \beta \) and \(\beta \) is independent of all symbols in \(\eta _2'\) prior to i, \(\eta _2' \, \mathtt {:=}\, \eta _2'[\setminus i]\) (c) else, go to 1
4. Add \(((s_B, \eta _1, \eta _2),\alpha ,(s'_B, \eta _1', \eta _2'))\) to \(\varDelta _{B_{k,I}}\) and go to 1.
Example 1
In Fig. 5, we have an NFA \(B\) with \(\mathcal L(\text{ B })= \{\alpha \beta , \beta \}\), \(I = \{(\alpha ,\beta )\}\) and \(k = 1\). The states of \(B_{k,I}\) are triples \((q, \eta _1, \eta _2)\), where \(q \in Q_B\) and \(\eta _1, \eta _2\in \{\varnothing ,\alpha ,\beta \}\). We explain the derivation of a couple of transitions of \(B_{k,I}\). The transition shown in bold from \((q_0, \varnothing ,\varnothing )\) on symbol \(\beta \) is obtained by applying the following rules once: 1. Pick \(q_1\) since \((q_0, \alpha , q_1) \in \varDelta _B\). 2(a). \(\eta _2'\ \mathtt {:=}\ \beta \), \(\eta _1'\ \mathtt {:=}\ \varnothing \). 3(a). \(\eta _1'\ \mathtt {:=}\ \alpha \). 4. Add \(((q_0, \varnothing , \varnothing ),\beta ,(q_1, \alpha , \beta ))\) to \(\varDelta _{B_{k,I}}\). The transition shown in bold from \((q_1, \alpha ,\beta )\) on symbol \(\alpha \) is obtained as follows: 1. Pick \(q_2\) since \((q_1, \beta , q_2) \in \varDelta _B\). 2(b). \(\eta _1'\ \mathtt {:=}\ \varnothing \), \(\eta _2'\ \mathtt {:=}\ \beta \). 3(b). \(\eta _2'\ \mathtt {:=}\ \varnothing \). 4. Add \(((q_1, \alpha , \beta ),\alpha ,(q_2, \varnothing , \varnothing ))\) to \(\varDelta _{B_{k,I}}\). It can be seen that \(B_{k,I}\) accepts the language \(\{\alpha \beta ,\beta \alpha ,\beta \} = \mathrm {Clo}_{k,I}(\mathcal L(\text{ B }))\).
Proposition 2
Given \(k>0\), NFA \(B_{k,I}\) described above accepts \(\mathrm {Clo}_{k,I}(\mathcal L(\text{ B }))\).
We develop a procedure to check language inclusion modulo I by iteratively increasing the bound k (see the full version [18] for the complete algorithm). The procedure is incremental: the check for \((k+1)\)-bounded language inclusion modulo I only explores paths along which the bound k was exceeded in the previous iteration.
5 Synchronization Synthesis
We now present our iterative synchronization synthesis procedure, which is based on the procedure in [11]. The reader is referred to [11] for further details. The synthesis procedure starts with the original program \(\mathcal {C}\) and in each iteration generates a candidate synthesized program \(\mathcal {C}'\). The candidate \(\mathcal {C}'\) is checked for preemption-safety w.r.t. \(\mathcal {C}\) under the abstract semantics, using our procedure for bounded language inclusion modulo I. If \(\mathcal {C}'\) is found preemption-safe w.r.t. \(\mathcal {C}\) under the abstract semantics, the synthesis procedure outputs \(\mathcal {C}'\). Otherwise, an abstract counterexample \(cex\) is obtained. The counterexample is analyzed to infer additional synchronization to be added to \(\mathcal {C}'\) for generating a new synthesized candidate.
The counterexample trace \(cex\) is a sequence of event identifiers: \( tid _0.l_0 ; \ldots ; tid _n.l_n\), where each \(l_i\) is a location identifier. We first analyze the neighborhood of \(cex\), denoted \(nhood(cex)\), consisting of traces that are permutations of the events in \(cex\). Note that each trace corresponds to an abstract observation sequence. Furthermore, note that preemption-safety requires the abstract observation sequence of any trace in \(nhood(cex)\) to be equivalent to that of some trace in \(nhood(cex)\) feasible under nonpreemptive semantics. Let bad traces refer to the traces in \(nhood(cex)\) that are feasible under preemptive semantics and do not meet the preemption-safety requirement. The goal of our counterexample analysis is to characterize all bad traces in \(nhood(cex)\) in order to enable inference of synchronization fixes.
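The overall shape of this loop can be sketched as follows. The sketch is deliberately schematic: `check` and `infer_fix` are hypothetical callbacks standing in for the bounded language inclusion check and the counterexample-driven fix inference, and an iteration limit plays the role of the timeout.

```python
def synthesize(original, check, infer_fix, max_iterations=10):
    """Iterate check-and-fix until the candidate is accepted or the budget runs out.

    check(candidate, original) returns None if the candidate is preemption-safe
    w.r.t. the original under the abstract semantics, else a counterexample.
    infer_fix(candidate, cex) returns a new candidate with added synchronization.
    Both callbacks are placeholders for the procedures described in the text.
    """
    candidate = original
    for _ in range(max_iterations):
        cex = check(candidate, original)
        if cex is None:
            return candidate
        candidate = infer_fix(candidate, cex)
    return None  # budget exhausted, analogous to a timeout
```

Note that every candidate is checked against the original program, since the nonpreemptive behaviors of the original serve as the specification throughout.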
In order to succinctly represent subsets of \(nhood(cex)\), we use ordering constraints. Intuitively, ordering constraints are of the following forms: (a) atomic constraints \(\varPhi = A < B\), where A and B are events from \(cex\); the constraint \(A < B\) represents the set of traces in \(nhood(cex)\) where event A is scheduled before event B; and (b) Boolean combinations of ordering constraints \(\varPhi _1 \wedge \varPhi _2\), \(\varPhi _1 \vee \varPhi _2\) and \(\lnot \varPhi _1\). We have that \(\varPhi _1 \wedge \varPhi _2\) and \(\varPhi _1 \vee \varPhi _2\) respectively represent the intersection and union of the sets of traces represented by \(\varPhi _1\) and \(\varPhi _2\), and that \(\lnot \varPhi _1\) represents the complement (with respect to \(nhood(cex)\)) of the set of traces represented by \(\varPhi _1\).
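The set-theoretic reading of ordering constraints can be checked on a small example. The sketch below is an illustration under simplifying assumptions: constraints are encoded as Python predicates over traces, the neighborhood is taken to be all permutations of three hypothetical events, and the helper names (`before`, `conj`, `disj`, `neg`, `traces_of`) are not part of the paper or the tool.

```python
from itertools import permutations


def before(a, b):
    """Atomic constraint a < b: event a is scheduled before event b."""
    return lambda trace: trace.index(a) < trace.index(b)


def conj(c1, c2):
    return lambda trace: c1(trace) and c2(trace)


def disj(c1, c2):
    return lambda trace: c1(trace) or c2(trace)


def neg(c):
    return lambda trace: not c(trace)


def traces_of(constraint, events):
    """Set of permutations of `events` represented by `constraint`."""
    return {p for p in permutations(events) if constraint(p)}


EVENTS = ("A", "B", "C")  # hypothetical events of a tiny counterexample
NHOOD = set(permutations(EVENTS))

a_lt_b = before("A", "B")
b_lt_c = before("B", "C")

# Conjunction, disjunction and negation behave as intersection,
# union and complement with respect to the neighborhood.
both = traces_of(conj(a_lt_b, b_lt_c), EVENTS)
either = traces_of(disj(a_lt_b, b_lt_c), EVENTS)
not_a_lt_b = traces_of(neg(a_lt_b), EVENTS)
print(len(both), len(either), len(not_a_lt_b))
```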
Nonpreemptive Neighborhood. First, we generate all traces in \(nhood(cex)\) that are feasible under nonpreemptive semantics. We represent a single trace \(\pi \) using an ordering constraint \(\varPhi (\pi )\) that captures the ordering between nonindependent accesses to variables in \(\pi \). We represent all traces in \(nhood(cex)\) that are feasible under nonpreemptive semantics using the expression \(\varPhi = \bigvee _{\pi } \varPhi (\pi )\), where \(\pi \) ranges over these traces. The expression \(\varPhi \) acts as the correctness specification for traces in \(nhood(cex)\).
Example. Recall the counterexample trace from the running example in Sect. 2: \(cex= \mathtt {T1.A; T2.A; T1.B; T1.C; T1.D; T2.B; T2.C; T2.D}\). There are two traces in \(nhood(cex)\) that are feasible under nonpreemptive semantics: \(\pi _1=\mathtt {T1.A;T1.B;T1.C;T1.D;T2.A;T2.B;T2.C;T2.D}\) and \(\pi _2=\mathtt {T2.A;T2.B;T2.C;T2.D;T1.A;T1.B;T1.C;T1.D}\). We represent \(\pi _1\) as \(\varPhi (\pi _1)= \{\mathtt {T1.A,T1.C,T1.D}\} <\mathtt {T2.D} \wedge \mathtt {T1.D}<\{\mathtt {T2.A,T2.C,T2.D}\} \wedge \mathtt {T1.B}<\mathtt {T2.B}\) and \(\pi _2\) as \(\varPhi (\pi _2) = \mathtt {T2.D} < \{\mathtt {T1.A,T1.C,T1.D}\} \wedge \{\mathtt {T2.A,T2.C,T2.D}\} < \mathtt {T1.D} \wedge \mathtt {T2.B}<\mathtt {T1.B}\). The correctness specification is \(\varPhi = \varPhi (\pi _1) \vee \varPhi (\pi _2)\).
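The claim that exactly two traces in the neighborhood are nonpreemptive can be reproduced with a short enumeration. The sketch makes two assumptions that the text does not state explicitly: only permutations that preserve each thread's program order are considered, and none of the events A to D yields, so a thread may be descheduled only after its last event. The function names are illustrative.

```python
def interleavings(threads):
    """Yield every merge of the per-thread event lists that keeps program order."""
    if all(not events for events in threads.values()):
        yield ()
        return
    for tid, events in threads.items():
        if events:
            rest = dict(threads)
            rest[tid] = events[1:]
            for tail in interleavings(rest):
                yield (f"{tid}.{events[0]}",) + tail


def is_nonpreemptive(trace, threads):
    """True if every context switch happens after the running thread finished."""
    remaining = {tid: len(events) for tid, events in threads.items()}
    current = None
    for event in trace:
        tid = event.split(".")[0]
        if current is not None and tid != current and remaining[current] > 0:
            return False  # the running thread was preempted mid-way
        remaining[tid] -= 1
        current = tid
    return True


THREADS = {"T1": ["A", "B", "C", "D"], "T2": ["A", "B", "C", "D"]}

all_traces = list(interleavings(THREADS))
nonpreemptive = [t for t in all_traces if is_nonpreemptive(t, THREADS)]

for trace in nonpreemptive:
    print(";".join(trace))
```

Under these assumptions the two surviving traces are \(\pi _1\) and \(\pi _2\) from the example, and the counterexample trace itself is rejected because T1 is preempted after its first event.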
Counterexample Generalization. We next build a quantifierfree first order formula \(\varPsi \) over the event identifiers in \(cex\) such that any model of \(\varPsi \) corresponds to a bad trace in \(nhood(cex)\). We iteratively enumerate models \(\pi \) of \(\varPsi \), building a constraint \(\rho = \varPhi (\pi )\) for each model \(\pi \), and generalizing each \(\rho \) into \(\rho _g\) to represent a larger set of bad traces.
Example. Our trace cex from Sect. 2 would be generalized to \(\mathtt {T2.A}<\mathtt {T1.D} \wedge \mathtt {T1.D}<\mathtt {T2.D}\). Any trace that fulfills this constraint is bad.
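As a sanity check, one can evaluate the generalized constraint on the traces named in the running example. The sketch below only tests membership; it does not implement the generalization step itself, and the helper `satisfies` is a hypothetical name.

```python
def satisfies(trace, constraints):
    """True if `trace` satisfies every atomic constraint (a, b), read as a < b."""
    return all(trace.index(a) < trace.index(b) for a, b in constraints)


# Generalized constraint from the example: T2.A < T1.D and T1.D < T2.D.
RHO_G = [("T2.A", "T1.D"), ("T1.D", "T2.D")]

cex = ("T1.A", "T2.A", "T1.B", "T1.C", "T1.D", "T2.B", "T2.C", "T2.D")
pi1 = ("T1.A", "T1.B", "T1.C", "T1.D", "T2.A", "T2.B", "T2.C", "T2.D")
pi2 = ("T2.A", "T2.B", "T2.C", "T2.D", "T1.A", "T1.B", "T1.C", "T1.D")

cex_is_covered = satisfies(cex, RHO_G)
pi1_is_covered = satisfies(pi1, RHO_G)
pi2_is_covered = satisfies(pi2, RHO_G)
print(cex_is_covered, pi1_is_covered, pi2_is_covered)
```

The counterexample satisfies the constraint, while neither nonpreemptive trace does: in \(\pi _1\) event T1.D precedes T2.A, and in \(\pi _2\) event T2.D precedes T1.D.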
Inferring Fixes. From each generalized formula \(\rho _g\) described above, we infer possible synchronization fixes to eliminate all bad traces satisfying \(\rho _g\). The key observation we exploit is that common concurrency bugs often show up in our formulas as simple patterns of ordering constraints between events. For example, the pattern \( tid _1.l_1 < tid _2.l_2 \; \wedge \; tid _2.l'_2 < tid _1.l'_1\) indicates an atomicity violation and can be rewritten into \(\mathtt {lock}( tid _1.[l_1:l'_1], tid _2.[l_2:l'_2])\). The complete list of such rewrite rules is in the full version [18]. This list includes inference of locks and reordering of notify statements. The set of patterns we use for synchronization inference is not complete, i.e., there might be generalized formulae \(\rho _g\) that are not matched by any pattern. In practice, we found our current set of patterns to be adequate for most common concurrency bugs, including all bugs from the benchmarks in this paper. Our technique and tool can be easily extended with new patterns.
Example. The generalized constraint \(\mathtt {T2.A < T1.D \; \wedge \; T1.D<T2.D}\) matches the lock rule and yields \(\mathtt {lock(T2.[A:D],T1.[D:D])}\). Since the lock involves events in the same function, the lock is merged into a single lock around instructions \(\mathtt {A}\) and \(\mathtt {D}\) in open_dev_abs. This lock is not sufficient to make the program preemptionsafe. Another iteration of the synthesis procedure generates another counterexample for analysis and synchronization inference.
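The lock rewrite rule can be sketched as a small pattern matcher over two atomic constraints. This is a minimal illustration of the single rule quoted above, not the rule set of the tool: the `Event` type and the function names are hypothetical, the constraints must be given in the order of the pattern, and no check is made that \(l_1\) precedes \(l'_1\) in program order.

```python
from typing import NamedTuple, Optional, Tuple


class Event(NamedTuple):
    tid: str  # thread identifier, e.g. "T2"
    loc: str  # location identifier, e.g. "A"


def parse(text: str) -> Event:
    tid, loc = text.split(".")
    return Event(tid, loc)


def match_lock_rule(first: Tuple[Event, Event],
                    second: Tuple[Event, Event]) -> Optional[str]:
    """Match tid1.l1 < tid2.l2 and tid2.l2' < tid1.l1' (in that order).

    Each argument is a pair (x, y) read as the atomic constraint x < y.
    Returns the inferred lock as a string, or None if the pattern fails.
    """
    (a, b), (c, d) = first, second
    if a.tid == d.tid and b.tid == c.tid and a.tid != b.tid:
        return f"lock({a.tid}.[{a.loc}:{d.loc}],{b.tid}.[{b.loc}:{c.loc}])"
    return None


# Generalized constraint from the example: T2.A < T1.D and T1.D < T2.D.
fix = match_lock_rule((parse("T2.A"), parse("T1.D")),
                      (parse("T1.D"), parse("T2.D")))
print(fix)
```

Two constraints that relate events of the same thread pair in the same direction, such as T1.A < T2.A and T1.B < T2.B, do not match this rule and yield `None`.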
Proposition 3
If our synthesis procedure generates a program \(\mathcal {C}'\), then \(\mathcal {C}'\) is preemptionsafe with respect to \(\mathcal {C}\).
Note that our procedure does not guarantee that the synthesized program \(\mathcal {C}'\) is deadlockfree. However, we avoid obvious deadlocks using heuristics such as merging overlapping locks. Further, our tool supports detection of any additional deadlocks introduced by synthesis, but relies on the user to fix them.
6 Implementation and Evaluation
We implemented our synthesis procedure in Liss. Liss comprises 5000 lines of C++ code and uses Clang/LLVM and Z3 as libraries. It is available as open-source software along with benchmarks at https://github.com/thorstent/Liss. The language inclusion algorithm is available separately as a library called Limi (https://github.com/thorstent/Limi). Liss implements the synthesis method presented in this paper with several optimizations. For example, we take advantage of the fact that language inclusion violations can often be detected by exploring only a small fraction of the input automata by constructing \(\mathcal A([\![ \mathcal {C} ]\!]^{NP}_{abs})\) and \(\mathcal A([\![ \mathcal {C} ]\!]^{P}_{abs})\) on the fly.
Our prototype implementation has several limitations. First, Liss uses function inlining and therefore cannot handle recursive programs. Second, we do not implement any form of alias analysis, which can lead to unsound abstractions. For example, we abstract statements of the form “*x = 0” as writes to variable x, while in reality other variables can be affected due to pointer aliasing. We sidestep this issue by manually massaging input programs to eliminate aliasing.
Finally, Liss implements a simplistic lock insertion strategy. Inference rules (see Sect. 5) produce locks expressed as sets of instructions that should be inside a lock. Placing the actual lock and unlock instructions in the C code is challenging because the instructions in the trace may span several basic blocks or even functions. We follow a structural approach where we find the innermost common parent block for the first and last instructions of the lock and place the lock and unlock instructions there. This does not work if the code has gotos or returns that could cause control to jump over the unlock statement. At the moment, we simply report such situations to the user.
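Finding the innermost common parent block amounts to a lowest-common-ancestor query on the tree of nested blocks. The sketch below assumes a hypothetical representation in which each block name maps to its enclosing block; it is not how Liss represents Clang ASTs, and it ignores the goto and return cases mentioned above.

```python
def ancestors(block, parent):
    """Return the chain [block, parent(block), ..., root]."""
    chain = [block]
    while block in parent:
        block = parent[block]
        chain.append(block)
    return chain


def innermost_common_parent(first_block, last_block, parent):
    """Innermost block enclosing both the first and last locked instruction."""
    enclosing_first = set(ancestors(first_block, parent))
    # Walk outward from the last instruction; the first shared block wins.
    for block in ancestors(last_block, parent):
        if block in enclosing_first:
            return block
    raise ValueError("blocks do not belong to the same function")


# Hypothetical nesting: a function body containing a loop with an if/else.
PARENT = {
    "if_body": "loop_body",
    "else_body": "loop_body",
    "loop_body": "function_body",
}

across_branches = innermost_common_parent("if_body", "else_body", PARENT)
same_block = innermost_common_parent("if_body", "if_body", PARENT)
nested = innermost_common_parent("if_body", "function_body", PARENT)
print(across_branches, same_block, nested)
```

When the first and last instructions sit in different branches, the lock ends up around the enclosing loop body, which may be coarser than strictly necessary; this matches the "simplistic" characterization in the text.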
We evaluate our synthesis method against the following criteria: (1) Effectiveness of synthesis from implicit specifications; (2) Efficiency of the proposed synthesis procedure; (3) Precision of the proposed coarse abstraction scheme on realworld programs.
Table 1. Experiments
Name                 LOC   Th  It  MB  BF(s)   Syn(s)   Ver(s)   CR(s)
ConRepair benchmarks [5]
ex1.c                18    2   1   1   <1s     <1s      <1s      <1s
ex2.c                23    2   1   1   <1s     <1s      <1s      <1s
ex3.c                37    2   1   1   <1s     <1s      <1s      <1s
ex5.c                42    2   3   1   <1s     <1s      2s       <1s
lcrc.c               35    4   0   1   -       -        <1s      9s
dv1394.c             37    2   1   1   <1s     <1s      <1s      17s
em28xx.c             20    2   1   1   <1s     <1s      <1s      <1s
f_acm.c              80    3   1   1   <1s     <1s      <1s      1871.99s
i915_irq.c           17    2   1   1   <1s     <1s      <1s      2.6s
ipath.c              23    2   1   1   <1s     <1s      <1s      12s
iwl3945.c            26    3   1   1   <1s     <1s      <1s      5s
md.c                 35    2   1   1   <1s     <1s      <1s      1.5s
myri10ge.c           60    4   0   1   -       -        <1s      1.5s
usbserial.bug1.c     357   7   2   1   0.4s    3.1s     3.4s     \(\infty ^{b}\)
usbserial.bug2.c     355   7   1   3   0.7s    2.1s     12.9s    3563s
usbserial.bug3.c     352   7   1   4   3.8s    1.3s     111.1s   \(\infty ^{b}\)
usbserial.bug4.c     351   7   1   4   93.9s   2.4s     123.1s   \(\infty ^{b}\)
usbserial.c\(^{a}\)  357   7   0   4   -       -        103.2s   1200s
CPMAC driver benchmark
cpmac.bug1.c         1275  5   1   1   1.3s    113.4s   21.9s    -
cpmac.bug2.c         1275  5   1   1   3.3s    68.4s    27.8s    -
cpmac.bug3.c         1270  5   1   1   5.4s    111.3s   8.7s     -
cpmac.bug4.c         1276  5   2   1   2.4s    124.8s   31.5s    -
cpmac.bug5.c         1275  5   1   1   2.8s    112.0s   58.0s    -
cpmac.c\(^{a}\)      1276  5   0   1   -       -        17.4s    -
We use Liss to synthesize a preemptionsafe version of each benchmark. This method is based on the assumption that the benchmark is correct under nonpreemptive scheduling and bugs can only arise due to preemptive scheduling. We discovered two benchmarks (lcrc.c and myri10ge.c) that violated this assumption, i.e., they contained race conditions that manifested under nonpreemptive scheduling; Liss did not detect these race conditions. Liss was able to detect and fix all other known races without relying on assertions. Furthermore, Liss detected a new race in the usbserial family of benchmarks, which was not detected by ConRepair due to a missing assertion. We compared the output of Liss with manually placed synchronization (taken from real bug fixes) and found that the two versions were similar in most of our examples.
Performance and Precision. ConRepair uses CBMC for verification and counterexample generation. Due to the coarse abstraction we use, both steps are much cheaper with Liss. For example, verification of usbserial.c, which was the most complex in our set of benchmarks, took Liss 103 s, whereas it took ConRepair 20 min [5].
The loss of precision due to abstraction may cause the inclusion check to return a counterexample that is spurious in the concrete program, leading to unnecessary synchronization being synthesized. On our existing benchmarks, this only occurred once in the usbserial driver, where abstracting away the return value of a function led to an infeasible trace. We refined the abstraction manually by introducing a condition variable to model the return value.
While this result is encouraging, synthetic benchmarks are not necessarily representative of realworld performance. We therefore implemented another set of benchmarks based on a complete Linux driver for the TI AR7 CPMAC Ethernet controller. The benchmark was constructed as follows. We manually preprocessed driver source code to eliminate pointer aliasing. We combined the driver with a model of the OS API and the software interface of the device written in C. We modeled most OS API functions as writes to a special memory location. Groups of unrelated functions were modeled using separate locations. Slightly more complex models were required for API functions that affect thread synchronization. For example, the free_irq function, which disables the driver’s interrupt handler, blocks waiting for any outstanding interrupts to finish. Drivers can rely on this behavior to avoid races. We introduced a condition variable to model this synchronization. Similarly, most device accesses were modeled as writes to a special ioval variable. Thus, the only part of the device that required a more accurate model was its interrupt enabling logic, which affects the behavior of the driver’s interrupt handler thread.
Our original model consisted of eight threads. Liss ran out of memory on this model, so we simplified it to five threads by eliminating parts of driver functionality. Nevertheless, we believe that the resulting model represents the most complex synchronization synthesis case study, based on realworld code, reported in the literature.
The CPMAC driver used in this case study did not contain any known concurrency bugs, so we artificially simulated five typical race conditions that commonly occur in drivers of this type [4]. Liss was able to detect and automatically fix each of these defects (bottom part of Table 1). We only encountered two program locations where manual abstraction refinement was necessary.
We conclude that (1) our coarse abstraction is highly precise in practice; (2) manual effort involved in synchronization synthesis can be further reduced via automatic abstraction refinement; (3) additional work is required to improve the performance of our method to be able to handle realworld systems without simplification. In particular, our analysis indicates that significant speedup can be obtained by incorporating a partial order reduction scheme into the language inclusion algorithm.
7 Conclusion
We believe our approach and the encouraging experimental results open several directions for future research. Combining the abstraction refinement, verification (checking language inclusion modulo an independence relation), and synthesis (inserting synchronization) more tightly could bring improvements in efficiency. An additional direction we plan on exploring is automated handling of deadlocks, i.e., extending our technique to automatically synthesize deadlockfree programs. Finally, we plan to further develop our prototype tool and apply it to other domains of concurrent systems code.
Footnotes
1. The equivalence classes of \(\approx ^t\) are Mazurkiewicz traces.
References
1. Alglave, J., Kroening, D., Nimal, V., Poetzl, D.: Don’t sit on the fence. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 508–524. Springer, Heidelberg (2014)
2. Bertoni, A., Mauri, G., Sabadini, N.: Equivalence and membership problems for regular trace languages. In: Nielsen, M., Schmidt, E.M. (eds.) Automata, Languages and Programming. LNCS, pp. 61–71. Springer, Heidelberg (1982)
3. Bloem, R., Hofferek, G., Könighofer, B., Könighofer, R., Außerlechner, S., Spörk, R.: Synthesis of synchronization using uninterpreted functions. In: FMCAD, pp. 35–42 (2014)
4. Černý, P., Henzinger, T.A., Radhakrishna, A., Ryzhyk, L., Tarrach, T.: Efficient synthesis for concurrency by semantics-preserving transformations. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 951–967. Springer, Heidelberg (2013)
5. Černý, P., Henzinger, T.A., Radhakrishna, A., Ryzhyk, L., Tarrach, T.: Regression-free synthesis for concurrency. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 568–584. Springer, Heidelberg (2014)
6. Cherem, S., Chilimbi, T., Gulwani, S.: Inferring locks for atomic sections. In: PLDI, pp. 304–315 (2008)
7. Clarke, E., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In: Jensen, K., Podelski, A. (eds.) TACAS 2004. LNCS, vol. 2988, pp. 168–176. Springer, Heidelberg (2004)
8. Clarke, E.M., Emerson, E.A.: Design and Synthesis of Synchronization Skeletons Using Branching Time Temporal Logic. Springer, Heidelberg (1982)
9. De Wulf, M., Doyen, L., Henzinger, T.A., Raskin, J.-F.: Antichains: a new algorithm for checking universality of finite automata. In: Ball, T., Jones, R.B. (eds.) CAV 2006. LNCS, vol. 4144, pp. 17–30. Springer, Heidelberg (2006)
10. Deshmukh, J., Ramalingam, G., Ranganath, V.P., Vaswani, K.: Logical concurrency control from sequential proofs. In: Gordon, A.D. (ed.) ESOP 2010. LNCS, vol. 6012, pp. 226–245. Springer, Heidelberg (2010)
11. Gupta, A., Henzinger, T., Radhakrishna, A., Samanta, R., Tarrach, T.: Succinct representation of concurrent trace sets. In: POPL, pp. 433–444 (2015)
12. Jin, G., Zhang, W., Deng, D., Liblit, B., Lu, S.: Automated concurrency-bug fixing. In: OSDI, pp. 221–236 (2012)
13. Ryzhyk, L., Chubb, P., Kuz, I., Heiser, G.: Dingo: Taming device drivers. In: EuroSys (2009)
14. Sadowski, C., Yi, J.: User evaluation of correctness conditions: A case study of cooperability. In: PLATEAU, pp. 2:1–2:6 (2010)
15. Solar-Lezama, A., Jones, C., Bodík, R.: Sketching concurrent data structures. In: PLDI, pp. 136–148 (2008)
16. Vechev, M., Yahav, E., Yorsh, G.: Abstraction-guided synthesis of synchronization. In: POPL, pp. 327–338 (2010)
17. Vechev, M., Yahav, E., Raman, R., Sarkar, V.: Automatic verification of determinism for structured parallel programs. In: Cousot, R., Martel, M. (eds.) SAS 2010. LNCS, vol. 6337, pp. 455–471. Springer, Heidelberg (2010)
18. From Nonpreemptive to Preemptive Scheduling using Synchronization Synthesis (full version). http://arxiv.org/abs/1505.04533