Data Flow Analysis of Asynchronous Systems using Infinite Abstract Domains

Asynchronous message-passing systems are employed frequently to implement distributed mechanisms, protocols, and processes. This paper addresses the problem of precise data flow analysis for such systems. To obtain good precision, data flow analysis needs to somehow skip execution paths that read more messages than the number of messages sent so far in the path, as such paths are infeasible at run time. Existing data flow analysis techniques do elide a subset of such infeasible paths, but have the restriction that they admit only finite abstract analysis domains. In this paper we propose a generalization of these approaches to admit infinite abstract analysis domains, as such domains are commonly used in practice to obtain high precision. We have implemented our approach, and have analyzed its performance on a set of 14 benchmarks. On these benchmarks our tool obtains significantly higher precision compared to a baseline approach that does not elide any infeasible paths and to another baseline that elides infeasible paths but admits only finite abstract domains.


Introduction
Distributed software that communicates by asynchronous message passing is a very important software paradigm in today's world. It is employed in varied domains, such as distributed protocols and workflows, event-driven systems, and UI-based systems. Popular languages and frameworks used in this domain include Go (https://golang.org/), Akka (https://akka.io/), and P (https://github.com/p-org).
Analysis and verification of asynchronous systems is an important problem, and poses a rich set of challenges. The research community has focused historically on a variety of approaches to tackle this overall problem, such as model checking and systematic concurrency testing [24,12], formal verification to check properties such as reachability or coverability of states [40,3,2,20,17,30,18,1], and data flow analysis [28].
Data flow analysis [31,29] is a specific type of verification technique that propagates values from an abstract domain while accounting for all paths in a program. It can hence be used to check whether a property or assertion always holds. The existing verification and data flow analysis approaches mentioned earlier have a major limitation: they admit only finite abstract domains. This, in general, limits the classes of properties that can be successfully verified. On the other hand, data flow analysis of sequential programs using infinite abstract domains, e.g., constant propagation [31], interval analysis [11], and octagons [43], is a well-developed area, and is routinely employed in verification settings. In this paper we seek to bridge this fundamental gap, and develop a precise data flow analysis framework for message-passing asynchronous systems that admits infinite abstract domains.

To motivate our work we use a benchmark program in the Promela language [24] that implements a leader election protocol [16]. In the protocol there is a ring of processes, and each process has a unique number. The objective is to discover the "leader", which is the process with the maximum number. The pseudo-code of each process in the protocol is shown in the left side of Figure 1, reproduced (in part) below with the figure's own line numbering:

 3: while true do
 4:   if process is in passive mode then
 5:     receive a mesg and send this same mesg
 6:   else if message ⟨1, i⟩ arrives then
 7:     if i ≠ max then
 8:       Send message ⟨2, i⟩; left := i
 9:     else
10:       Declare max as the global maximum
11:       nr_leaders++; assert(nr_leaders = 1)
12:   else if message ⟨2, j⟩ arrives then
13:     if left > j and left > max then
14:       max := left
15:       Send message ⟨1, max⟩
16:     else
17:       Process enters passive mode

Each process has its own copy of local variables max and left, whereas nr_leaders is a global variable that is common to all the processes (its initial value is zero).
Each process sends messages to the next process in the ring via an unbounded FIFO channel. Each process becomes "ready" whenever a message is available for it to receive, and at any step of the protocol any one ready process (chosen non-deterministically) executes one iteration of its "while" loop. (We formalize these execution rules in a more general fashion in Section 2.1.) Each message is a 2-tuple ⟨x, i⟩, where x can be 1 or 2, and 1 ≤ i ≤ max. The right side of Figure 1 shows a snapshot at an intermediate point during a run of the protocol. Each dashed arrow between two nodes represents a send of a message and a (completed) receipt of the same message. The block arrow depicts the channel from Process 2 to Process 1, which happens to contain three sent (but still unreceived) messages.
It is notable that in any run of the protocol, Lines 10-11 happen to get executed only by the actual leader process, and even then exactly once. Hence, the assertion never fails. The argument for this claim is not straightforward, and we refer the reader to the paper [16] for the details.

Challenges in property checking
Data flow analysis could be used to verify the assertion in the example above, e.g., using the Constant Propagation (CP) abstract domain. This analysis determines at each program point whether each variable has a fixed value across all runs that reach the point, and if so, the value itself. In the example in Figure 1, all actual runs of the system that happen to reach Line 10 come there with value zero for the global variable nr_leaders.
A challenge for data flow analysis on message-passing systems is that there may exist infeasible paths in the system. These are paths with more receives of a certain message than the number of copies of this message that have been sent so far. For instance, consider the path that consists of two back-to-back iterations of the "while" loop by the leader process, both times through Lines 3, 6, 9-11. This path is not feasible, due to the impossibility of having two copies of the message ⟨1, max⟩ in the input channel [16]. The second iteration would bring the value 1 for nr_leaders at Line 10, thus inferring a non-constant value and hence declaring the assertion as failing (which would be a false positive).
Hence, it is imperative in the interest of precision for any data flow analysis or verification approach to track the channel contents as part of the exploration of the state space. Tracking the contents of unbounded channels precisely is known to be undecidable even when solving problems such as reachability and coverability (which are simpler than data flow analysis). Hence, existing approaches either bound the channels (which in general causes unsoundness), or use sound abstractions such as unordered channels (also known as the Petri Net or VASS abstraction) or lossy channels. Such abstractions suffice to elide a subset of infeasible paths. In our running example, the unordered channel abstraction happens to suffice to elide infeasible paths that could contribute to a false positive at the point of the assertion. However, the analysis would need to use an abstract domain such as CP to track the values of integer variables. This is an infinite domain (due to the infinite number of integers). The most closely related previous dataflow analysis approach for distributed systems [28] does use the unordered channel abstraction, but does not admit infinite abstract domains, and hence cannot verify assertions such as the one in the example above.

Our Contributions
This paper is the first one to the best of our knowledge to propose an approach for data flow analysis for asynchronous message-passing systems that (a) admits infinite abstract domains, (b) uses a reasonably precise channel abstraction among the ones known in the literature (namely, the unordered channels abstraction), and (c) computes maximally precise results possible under the selected channel abstraction. Every other approach we are aware of exhibits a strict subset of the three attributes listed above. It is notable that previous approaches do tackle the infinite state space induced by the unbounded channel contents. However, they either do not reason about variable values at all, or only allow variables that are based on finite domains.
Our primary contribution is an approach that we call Backward DFAS. This approach is maximally precise, and admits a class of infinite abstract domains. This class includes well-known examples such as Linear Constant Propagation (LCP) [50] and Affine Relationships Analysis (ARA) [45], but does not include the full Constant Propagation (CP) analysis. We also propose another approach, which we call Forward DFAS, which admits a broader class of abstract domains, but is not guaranteed to be maximally precise on all programs.
We describe a prototype implementation of both our approaches. On a set of 14 real benchmarks, which are small but involve many complex idioms and paths, our tool verifies approximately 50% more assertions than our implementation of the baseline approach [28].
The rest of the paper is structured as follows. Section 2 covers the background and notation that will be assumed throughout the paper. We present the Backward DFAS approach in Section 3, and the Forward DFAS approach in Section 4. Section 5 discusses our implementation and evaluation. Section 6 discusses related work, and Section 7 concludes the paper.

Background and Terminology
Vector addition systems with states, or VASS [26], are a popular modelling technique for distributed systems. We begin this section by defining an extension of VASS, which we call a VASS-Control Flow Graph or VCFG.

Definition 1. A VASS-Control Flow Graph or VCFG G is a graph described by a tuple ⟨Q, δ, r, q0, V, π, θ⟩, where Q is a finite set of nodes, δ ⊆ Q × Q is a finite set of edges, r ∈ N is the number of counters, q0 ∈ Q is the start node, V is a set of variables or memory locations, π : δ → A maps each edge to an action, where A ≡ ((V → Z) → (V → Z)), and θ : δ → Z^r maps each edge to a vector in Z^r.
For any edge e = (q1, q2) ∈ δ, if π(e) = a and θ(e) = w, then a is called the action of e and w is called the queuing vector of e. This edge is depicted as q1 --(a, w)--> q2. The variables and the actions are the only additional features of a VCFG over VASS.
A configuration of a VCFG is a tuple ⟨q, c, ξ⟩, where q ∈ Q, c ∈ N^r, and ξ ∈ (V → Z). The initial configuration of a VCFG is ⟨q0, 0̄, ξ0⟩, where 0̄ denotes a vector of r zeroes, and ξ0 is a given initial valuation for the variables. The VCFG can be said to have r counters; the vector c in each configuration can be thought of as a valuation of the counters. Transitions between VCFG configurations are according to the following rule: ⟨q1, c1, ξ1⟩ ⇒e ⟨q2, c2, ξ2⟩ iff e = (q1, q2) ∈ δ, π(e) = a, θ(e) = w, a(ξ1) = ξ2, c1 + w = c2, and c2 ≥ 0̄.
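To make the rule above concrete, the following is a small Python sketch of VCFG configurations and the transition relation. All names and encodings here (step, the tuple representations) are ours, purely for illustration:

```python
# Sketch of the VCFG transition rule (illustrative encoding, not from the paper).
# A configuration is (node, counters, store); an edge carries an action on
# stores and a queuing vector that is added to the counters.

def step(config, edge):
    """Fire `edge` from `config`; return None if the edge cannot be taken."""
    node, counters, store = config
    src, dst, action, queuing = edge
    if node != src:
        return None
    new_counters = tuple(c + w for c, w in zip(counters, queuing))
    if any(c < 0 for c in new_counters):
        return None          # a receive with no matching prior send: blocked
    return (dst, new_counters, action(store))

# A send edge (+1) followed by a receive edge (-1) on a single counter:
send = ("q0", "q1", lambda s: s, (1,))
recv = ("q1", "q2", lambda s: dict(s, x=s["x"] + 1), (-1,))

c0 = ("q0", (0,), {"x": 0})
c1 = step(c0, send)          # ("q1", (1,), {"x": 0})
c2 = step(c1, recv)          # ("q2", (0,), {"x": 1})
```

Note how the receive edge is blocked (returns None) from any configuration whose counter is already zero, which is exactly the c2 ≥ 0̄ side-condition of the rule.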

Modeling of Asynchronous Message Passing Systems as VCFGs
Asynchronous systems are composed of a finite number of independently executing processes that communicate with each other by passing messages along FIFO channels. The processes may have local variables, and there may exist shared (or global) variables as well. For simplicity of presentation we assume all variables are global.

Fig. 2. (a) Asynchronous system with two processes, (b) its VCFG model.

Figure 2(a) shows a simple asynchronous system with two processes. In this system there are two channels, c1 and c2, and a message alphabet consisting of two elements, m1 and m2. The semantics we assume for message-passing systems is the same as that used by the tool Spin [24]. A configuration of the system consists of the current control states of all the processes, the contents of all the channels, and the values of all the variables. A single transition of the system consists of a transition of one of the processes from its current control state to a successor control state, accompanied by the corresponding queuing operation or variable-update action. A transition labeled c ! m can be taken unconditionally, and results in 'm' being appended to the tail of the channel 'c'. A transition labeled c ? m can be taken only if an instance of 'm' is available at the head of 'c', and results in this instance being removed from 'c'. (Note that, based on the context, we overload the term "message" to mean either an element of the message alphabet, or an instance of a message-alphabet element in a channel at run time.)

Asynchronous systems can be modeled as VCFGs, and our approach performs data flow analysis on VCFGs. We now illustrate how an asynchronous system can be modeled as a VCFG, assuming a fixed number of processes in the system. We do this illustration using the example VCFG in Figure 2(b), which models the system in Figure 2(a).
Each node of the VCFG represents a tuple of control states of the processes, while each edge corresponds to a transition of the system. The action of a VCFG edge is identical to the action that labels the corresponding process transition ("id" in Figure 2(b) represents the identity action). The VCFG has as many counters as the number of unique pairs (ci, mj) such that the operation ci ! mj is performed by any process. If an edge e in the VCFG corresponds to a send transition ci ! mj of the system, then e's queuing vector has a +1 for the counter corresponding to (ci, mj) and a zero for all the other counters. Analogously, a receive operation is modeled as a -1 in the queuing vector. In Figure 2(b), the first counter is for (c1, m1) while the second counter is for (c2, m2). Note that the +1 and -1 encoding (inherited from VASSs) effectively causes FIFO channels to be treated as unordered channels.
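The counter construction described above can be sketched as follows; the function name and the per-edge operation encoding are our own illustrative choices:

```python
# Sketch: one counter per (channel, message) pair that is ever sent.
# A send edge gets +1 in that counter's slot, a receive edge gets -1;
# this is where FIFO order is forgotten (the unordered-channel abstraction).

def queuing_vectors(edge_ops):
    """edge_ops: per-edge operation, either None or (kind, channel, message)
    with kind in {"send", "recv"}.  Returns (counter order, per-edge vectors)."""
    pairs = sorted({(c, m) for op in edge_ops if op is not None
                    for (_, c, m) in [op]})
    index = {p: i for i, p in enumerate(pairs)}
    vectors = []
    for op in edge_ops:
        v = [0] * len(pairs)
        if op is not None:
            kind, c, m = op
            v[index[(c, m)]] = 1 if kind == "send" else -1
        vectors.append(tuple(v))
    return pairs, vectors

# Two channels and two messages, in the spirit of Figure 2 (edge order made up):
pairs, vecs = queuing_vectors([
    ("send", "c1", "m1"), ("recv", "c1", "m1"),
    ("send", "c2", "m2"), None,                   # None: a pure variable update
])
# pairs == [("c1", "m1"), ("c2", "m2")]
# vecs  == [(1, 0), (-1, 0), (0, 1), (0, 0)]
```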
When each process can invoke procedures as part of its execution, such systems can be modeled using inter-procedural VCFGs, or iVCFGs. These are extensions of VCFGs just as standard inter-procedural control-flow graphs are extensions of control-flow graphs. Constructing an iVCFG for a given system is straightforward, under a restriction that at most one of the processes in the system can be executing a procedure other than its main procedure at any time. This restriction is also present in other related work [28,4].

Data flow analysis over iVCFGs
Data flow analysis is based on a given complete lattice L, which serves as the abstract domain. As a pre-requisite step before we can perform our data flow analysis on iVCFGs, we first consider each edge in each procedure of the iVCFG, and replace its (concrete) action a with an abstract action f, where f : L → L is a given abstract transfer function that conservatively over-approximates [11] the behavior of the concrete action a.
Let p be a path in an iVCFG, let p0 be the first node in the path, and let ξi be a valuation to the variables at the beginning of p. The path p is said to be feasible if, starting from the configuration ⟨p0, 0̄, ξi⟩, the configuration ⟨q, d, ξ⟩ obtained at each successive point in the path is such that d ≥ 0̄, with successive configurations along the path being generated as per the rule for transitions among VCFG configurations given before Section 2.1. For any path p = e1 e2 ... ek of an iVCFG, we define its path transfer function ptf(p) as f_ek ∘ f_ek-1 ∘ ... ∘ f_e1, where f_e is the abstract action associated with edge e.

The standard data flow analysis problem for sequential programs is to compute the join-over-all-paths (JOP) solution. Our problem statement is to compute the join-over-all-feasible-paths (JOFP) solution for iVCFGs. Formally stated, if start is the entry node of the "main" procedure of the iVCFG, given any node target in any procedure of the iVCFG, and an "entry" value d0 ∈ L at start such that d0 conservatively over-approximates ξ0, we wish to compute the JOFP value at target as defined by the following expression:

  ⊔ { (ptf(p))(d0) | p is a feasible and interprocedurally valid path in the iVCFG from start to target }

Intuitively, due to the unordered channel abstraction, every run of the system corresponds to a feasible path in the iVCFG, but not vice versa. Hence, the JOFP solution above is guaranteed to conservatively over-approximate the JOP solution on the runs of the system (which is not computable in general).
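The definitions of ptf and the JOFP can be mimicked on a toy abstract domain. The sketch below uses sets of integer values with union as the join, and an explicitly given finite path set; both are our own simplifications, since a real iVCFG has infinitely many paths:

```python
# Sketch: path transfer functions and a join over an explicit set of
# (already-determined-feasible) paths.  The toy abstract domain is sets of
# values of a single variable, with union as the join.

from functools import reduce

def ptf(path):
    """path: list of per-edge abstract actions f_e1 ... f_ek, applied in order."""
    return lambda d: reduce(lambda acc, f: f(acc), path, d)

def jofp(feasible_paths, d0):
    """Join of (ptf(p))(d0) over the given feasible paths."""
    return reduce(set.union, (ptf(p)(d0) for p in feasible_paths), set())

inc = lambda d: {v + 1 for v in d}     # abstracts x := x + 1
dbl = lambda d: {v * 2 for v in d}     # abstracts x := 2 * x

assert ptf([inc, dbl])({1}) == {4}     # f_e2 after f_e1: first inc, then dbl
assert jofp([[inc, dbl], [dbl, inc]], {1}) == {3, 4}
```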

Backward DFAS Approach
In this section we present our key contribution -the Backward DFAS (Data Flow Analysis of Asynchronous Systems) algorithm -an interprocedural algorithm that computes the precise JOFP at any given node of the iVCFG.
We begin by presenting a running example, the iVCFG with two procedures depicted in Figure 3. There is only one channel and one message in the message alphabet in this example, and hence the queuing vectors associated with the edges are of size 1. Edges without vectors are implicitly associated with zero vectors. The actions associated with edges are represented in the form of assignment statements; edges without assignment statements next to them have identity actions. The upper part of Figure 3, consisting of nodes a, b, p, q, h, i, j, k, l, is the VCFG of the "main" procedure. The remaining nodes constitute the VCFG of the (tail-)recursive procedure foo. The solid edges are intra-procedural edges, while the dashed edges are inter-procedural edges.
Throughout this section we use Linear Constant Propagation (LCP) [50] as our example data flow analysis. LCP, like CP, aims to identify the variables that have constant values at any given location in the system. LCP is based on the same infinite domain as CP; i.e., each abstract domain element is a mapping from variables to (integer) values. The "⊑" relation for the LCP lattice is also defined in the same way as for CP. The encoding of the transfer functions in LCP is as follows. Each edge (resp. path) maps the outgoing value of each variable to either a constant, or to a linear expression in the incoming value of at most one variable into the edge (resp. path), or to a special symbol that indicates an unknown outgoing value. For instance, for the edge g → m in Figure 3, its transfer function can be represented symbolically as (t′=t, x′=x+1, y′=y, z′=z), where the primed versions represent outgoing values and unprimed versions represent incoming values.
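This symbolic encoding can be sketched as follows. The tuple representation, the function names, and the simplified join are our own illustrative choices; the paper's actual representation may differ in details:

```python
# Sketch of LCP-style symbolic transfer functions.  Each variable maps to:
#   ("const", c)      -- outgoing value is the constant c
#   ("lin", a, v, b)  -- outgoing value is a * (incoming value of v) + b
#   "top"             -- unknown outgoing value

def compose(f2, f1):
    """Transfer function of a path that applies f1 first, then f2."""
    out = {}
    for x, e in f2.items():
        if e == "top" or e[0] == "const":
            out[x] = e
        else:
            _, a, v, b = e
            g = f1[v]
            if g == "top":
                out[x] = "top" if a != 0 else ("const", b)
            elif g[0] == "const":
                out[x] = ("const", a * g[1] + b)
            else:
                _, a2, v2, b2 = g
                out[x] = ("lin", a * a2, v2, a * b2 + b)
    return out

def join(f1, f2):
    """Join of two transfer functions (simplified: unequal entries go to top;
    the real LCP join is somewhat more refined)."""
    return {x: f1[x] if f1[x] == f2[x] else "top" for x in f1}

def ident(xs):
    return {x: ("lin", 1, x, 0) for x in xs}

# Edge g -> m of Figure 3: x' = x + 1, all other variables unchanged.
f_gm = dict(ident("txyz"), x=("lin", 1, "x", 1))
twice = compose(f_gm, f_gm)      # x' = x + 2 after traversing the edge twice
```

Because every entry is either a constant, one linear term, or "top", the set of such functions over a fixed variable set forms the strict, finite-height subset of the CP transfer-function lattice mentioned below.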
Say we wish to compute the JOFP at node k. The only feasible paths that reach node k are the ones that attain calling-depth of three or more in the procedure foo, and hence encounter at least three send operations, which are required to clear the three receive operations encountered from node h to node k. All such paths happen to bring the constant values (t = 1, z = 1) to the node k. Hence, (t = 1, z = 1) is the precise JOFP result at node k. However, infeasible paths, if not elided, can introduce imprecision. For instance, the path that directly goes from node c to node o in the outermost call to the Procedure foo (this path is of calling-depth zero) brings values of zero for all four variables, and would hence prevent the precise fact (t = 1, z = 1) from being inferred.

Assumptions and Definitions
The set of all L → L transfer functions clearly forms a complete lattice based on the following ordering: f1 ⊑ f2 iff for all d ∈ L, f1(d) ⊑ f2(d). Backward DFAS makes a few assumptions on this lattice of transfer functions. The first is that this lattice be of finite height; i.e., all strictly ascending chains of elements in this lattice are finite (although no a priori bound on the sizes of these chains is required). The second is that a representation of transfer functions is available, as are operators to compose, join, and compare transfer functions. Note that the two assumptions above are also made by the classical "functional" inter-procedural approach of Sharir and Pnueli [54]. Thirdly, we need distributivity: for any transfer function f and any d1, d2 ∈ L, f(d1 ⊔ d2) = f(d1) ⊔ f(d2). The distributivity assumption is required only if the given system contains recursive procedure calls.
Linear Constant Propagation (LCP) [50] and Affine Relationships Analysis (ARA) [45] are well-known examples of analyses based on infinite abstract domains that satisfy all of the assumptions listed above. Note that the CP transfer-functions lattice is not of finite height. Despite the LCP abstract domain being the same as the CP abstract domain, the encoding chosen for LCP transfer functions (which was mentioned above) ensures that LCP uses a strict, finite-height subset of the full CP transfer-functions lattice that is closed under join and function composition operations. The trade-off is that LCP transfer functions for assignment statements whose RHS is not a linear expression and for conditionals are less precise than the corresponding CP transfer functions.
Our final assumption is that procedures other than "main" may send messages, but should not have any "receive" operations. Previous approaches that have addressed data flow analysis or verification problems for asynchronous systems with recursive procedures also have the same restriction [53,28,18].
We now introduce important terminology. The demand of a given path p in the VCFG is a vector of size r, defined inductively as follows: demand(ε) = 0̄ for the empty path ε, and demand(e.p′) = max(0̄, demand(p′) − θ(e)), where max is applied point-wise. Intuitively, the demand of a path p is the minimum required vector of counter values in any starting configuration at the entry of the path for there to exist a sequence of transitions among configurations that manages to traverse the entire path (following the rule given before Section 2.1). It is easy to see that a path p is feasible iff demand(p) = 0̄.
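Under this reading, demand is computed right-to-left over the queuing vectors of the path. A small sketch (names ours):

```python
# Sketch: demand of a path, folded right-to-left over its queuing vectors:
# demand(empty) = 0-vector; demand(e . p') = max(0, demand(p') - theta(e)),
# with max taken point-wise.

def demand(queuing_vectors):
    r = len(queuing_vectors[0]) if queuing_vectors else 0
    d = (0,) * r
    for w in reversed(queuing_vectors):
        d = tuple(max(0, di - wi) for di, wi in zip(d, w))
    return d

# One send followed by two receives (cf. the infeasible path of the
# "Challenges" section): one pre-existing message would be needed.
assert demand([(1,), (-1,), (-1,)]) == (1,)
# One send followed by one receive: demand is the 0-vector, i.e., feasible.
assert demand([(1,), (-1,)]) == (0,)
```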
A set of paths C is said to cover a path p iff: (a) all paths in C have the same start and end nodes (respectively) as p, (b) for each p′ ∈ C, demand(p′) ≤ demand(p), and (c) (⊔_{p′ ∈ C} ptf(p′)) ⊒ ptf(p). (Regarding (b), any binary vector operation in this paper is defined as applying the same operation on every pair of corresponding entries, i.e., point-wise.) A path template (p1, p2, ..., pn) of any procedure Fi is a sequence of paths in the VCFG of Fi such that: (a) path p1 begins at the entry node of Fi and path pn ends at the return node of Fi, (b) for all pi, 1 ≤ i < n, pi ends at a call-site node, and (c) for all pi, 1 < i ≤ n, pi begins at the return-site node corresponding to the call-site node at which pi−1 ends.
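The covering check combines the demand comparison with a join comparison on transfer functions. The sketch below models transfer-function summaries as plain sets ordered by inclusion, a toy join-semilattice standing in for the real transfer-function lattice (all names ours):

```python
# Sketch of the covering check.  Each path is modelled as a pair
# (demand vector, summary), where summaries form a toy semilattice:
# join = set union, and f1 over-approximates f2 iff f1 is a superset of f2.

def covers(C, p):
    """Does the set of paths C cover path p?  Start/end-node equality
    (condition (a) of the definition) is assumed checked by the caller."""
    d_p, s_p = p
    if not all(all(dc <= dp for dc, dp in zip(d_c, d_p)) for d_c, _ in C):
        return False                       # condition (b): demands must not exceed
    joined = set().union(*(s for _, s in C)) if C else set()
    return joined >= s_p                   # condition (c): join over-approximates

# A low-demand path and a matching-demand path jointly cover p:
assert covers([((0,), {"f1"}), ((1,), {"f2"})], ((1,), {"f1", "f2"}))
# A path with higher demand cannot participate in covering p:
assert not covers([((2,), {"f1", "f2"})], ((1,), {"f1", "f2"}))
```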

Properties of Demand and Covering
At a high level, Backward DFAS works by growing paths in the backward direction, a single edge at a time, starting from the target node (node k in our example in Figure 3). Every time this process results in a path reaching the start node (node a in our example), and the path is feasible, the approach simply transfers the entry value d0 via this path to the target node. The main challenge is that due to the presence of cycles and recursion, there are in general an infinite number of feasible paths. In this subsection we present a set of lemmas that embody our intuition on how a finite subset of the set of all paths can be enumerated such that the join of the values brought by these paths is equal to the JOFP. We then present our complete approach in Section 3.3.

Demand Coverage Lemma: Let p2 and p2′ be two paths from a node vi to a node vj such that demand(p2′) ≤ demand(p2). If p1 is any path ending at vi, then demand(p1.p2′) ≤ demand(p1.p2). This lemma can be argued using induction on the length of path p1. A similar observation has been used to solve coverability of lossy channels and well-structured transition systems in general [3,17,2]. An important corollary of this lemma is that for any two paths p2 and p2′ from vi to vj such that demand(p2′) ≤ demand(p2), if there exists a path p1 ending at vi such that p1.p2 is feasible, then p1.p2′ is also feasible.
Function Coverage Lemma: Let p2 be a path from a node vi to a node vj, and P2 be a set of paths from vi to vj such that (⊔_{p2′ ∈ P2} ptf(p2′)) ⊒ ptf(p2). Let p1 be any path ending at vi and p3 be any path beginning at vj. Under the distributivity assumption stated in Section 3.1, the following property holds: (⊔_{p2′ ∈ P2} ptf(p1.p2′.p3)) ⊒ ptf(p1.p2.p3). The following result follows from the Demand and Function Coverage Lemmas and from monotonicity of the transfer functions. Corollary 1: Let p2 be a path from a node vi to a node vj, and P2 be a set of paths from vi to vj such that P2 covers p2. Let p1 be any path ending at vi. Then, the set of paths {p1.p2′ | p2′ ∈ P2} covers the path p1.p2.
We now use the running example from Figure 3 to illustrate how we leverage Corollary 1 in our approach. When we grow paths in the backward direction from the target node k, two candidate paths that would get enumerated (among others) are pi ≡ hijk and pj ≡ hijkhijk (in that order). Now, pi covers pj. Therefore, by Corollary 1, any backward extension p1.pj of pj (where p1 is any path prefix) is guaranteed to be covered by the analogous backward extension p1.pi of pi. By the definition of covering, it follows that p1.pi brings in a data value that conservatively over-approximates the value brought in by p1.pj. Therefore, our approach discards pj as soon as it gets enumerated. To summarize, our approach discards any path as soon as it is enumerated if it is covered by some subset of the previously enumerated and retained paths.
Due to the finite height of the transfer functions lattice, and because demand vectors cannot contain negative values, at some point in the algorithm every new path that can be generated by backward extension at that point would be discarded immediately. At this point the approach would terminate, and soundness would be guaranteed by definition of covering.
In the inter-procedural setting the situation is more complex. We first present two lemmas that set the stage. Both lemmas crucially make use of the assumption that recursive procedures are not allowed to have "receive" operations. For any path pa that contains no receive operations, and for any demand vector d, we first define supply(pa, d) as min(s, d), where s is the sum of the queuing vectors of the edges of pa (and min is applied point-wise).
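supply can be sketched directly from this definition; the code below is our illustration:

```python
# Sketch: supply(pa, d) = min(s, d) point-wise, where s sums the queuing
# vectors of the receive-free path pa.  Sends beyond the demand d are
# deliberately capped: extra messages cannot further reduce the demand of
# whatever suffix follows.

def supply(queuing_vectors, d):
    s = [0] * len(d)
    for w in queuing_vectors:
        assert all(wi >= 0 for wi in w), "path must have no receive operations"
        s = [si + wi for si, wi in zip(s, w)]
    return tuple(min(si, di) for si, di in zip(s, d))

assert supply([(1,), (1,), (1,)], (2,)) == (2,)   # three sends, capped at d = 2
assert supply([(1,)], (2,)) == (1,)
```

The capping at d is what later bounds the number of phases of IVC-path generation: once a path's supply saturates d, deeper paths cannot improve on it demand-wise.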
Supply Limit Lemma: Let p1, p2 be two paths from vi to vj such that there are no receive operations in p1 and p2. Let pb be any path beginning at vj. If demand(pb) = d, and if supply(p1, d) ≥ supply(p2, d), then demand(p1.pb) ≤ demand(p2.pb).

A set of paths P is said to d-supply-cover a path pa iff: (a) all paths in P have the same start node and same end node (respectively) as pa, (b) (⊔_{p′ ∈ P} ptf(p′)) ⊒ ptf(pa), and (c) for each p′ ∈ P, supply(p′, d) ≥ supply(pa, d).
Supply Coverage Lemma: If pa.pb is a path, and demand(pb) = d, and if a set of paths P d-supply-covers pa, and pa as well as all paths in P have no receive operations, then the set of paths {p′.pb | p′ ∈ P} covers the path pa.pb.
Proof argument: Since P d-supply-covers pa, by the Supply Limit Lemma, we have (a): for all p′ ∈ P, demand(p′.pb) ≤ demand(pa.pb). Since P d-supply-covers pa, we also have (⊔_{p′ ∈ P} ptf(p′)) ⊒ ptf(pa). From this, we use the Function Coverage Lemma to infer that (b): (⊔_{p′ ∈ P} ptf(p′.pb)) ⊒ ptf(pa.pb). The result now follows from (a) and (b).
Consider the path hijk in our example, which gets enumerated and retained (as discussed earlier). This path gets extended back as qhijk; let us denote this path as p′. Let d be the demand of p′ (which is equal to 3). Our plan now is to extend this path in the backward direction all the way up to node p, by prepending interprocedurally valid and complete (i.e., IVC) paths of procedure foo in front of p′. An IVC path is one that begins at the entry node of foo, ends at the return node of foo, is of arbitrary calling depth, has balanced calls and returns, and has no pending returns when it completes [49]. First, we enumerate the IVC path(s) with calling-depth zero (i.e., path co in the example), and prepend them in front of p′. We then produce deeper IVC paths, in phases. In each phase i, i > 0, we inline the IVC paths of calling-depth i − 1 that have been enumerated and retained so far into the path templates of the procedure to generate IVC paths of calling-depth i, and prepend these IVC paths in front of p′. We terminate when each IVC path that is generated in a particular phase j is d-supply-covered by some subset P of the IVC paths generated in previous phases.
The soundness of discarding the IVC paths of phase j follows from the Supply Coverage Lemma (p′ takes the place of pb in the lemma's statement, while the path generated in phase j takes the place of pa). The termination condition is guaranteed to be reached eventually, because: (a) the supplies of all IVC paths generated are limited to d, and (b) the lattice of transfer functions is of finite height. Intuitively, we could devise a sound termination condition even though deeper and deeper IVC paths can increment counters more and more, because a deeper IVC path that increments the counters beyond the demand of p′ does not result in a lower overall demand when prepended before p′ than a shallower IVC path that also happens to meet the demand of p′ (the Supply Limit Lemma formalizes this).
We also need a result stating that when the IVC paths in the jth phase are d-supply-covered by paths generated in preceding phases, then the IVC paths that would be generated in the (j + 1)th phase would also be d-supply-covered by paths generated in phases that preceded j. This can be shown using a variant of the Supply Coverage Lemma, which we omit in the interest of space. Once this is shown, it follows inductively that none of the phases after phase j are required, which implies that it is safe to terminate. The arguments presented above were in a restricted setting, namely, that there is only one call in each procedure, and that only recursive calls are allowed. These restrictions were assumed only for simplicity, and are not actually assumed in the algorithm presented below.

Algorithm 1 Backward DFAS: routine ComputeJOFP(target). Returns the JOFP from start ∈ Nodes to target ∈ Nodes, for entry value d0 ∈ L:

 2: for all v ∈ Nodes do          ▷ Nodes is the set of all nodes in the VCFG
    ...
 4:   For each intra-proc VCFG edge v → target, add this edge to workList and to sPaths(v)
 5: repeat
 6:   Remove any path p from workList
 7:   Let v1 be the start node of p
 8:   if v1 is a return-site node, with incoming return edge from func. F1 then
 9:     Let v3 be the call-site node corresponding to v1, e1 be the call-site-to-entry edge from v3 to en_F1, and r1 be the exit-to-return-site edge from ex_F1 to v1
    ...
24: until workList is empty
25: P = {p | p ∈ sPaths(start), demand(p) = 0}
26: return ⊔_{p ∈ P} (ptf(p))(d0)

Data Flow Analysis Algorithm
Our approach is summarized in Algorithm 1. ComputeJOFP is the main routine. The algorithm works on a given iVCFG (which is an implicit parameter to the algorithm), and is given a target node at which the JOFP is to be computed.

Algorithm 2 Routines invoked for inter-procedural processing in the Backward DFAS algorithm. ComputeEndToEnd(F, d) returns a set of paths that d-supply-covers each IVC path of the procedure F.
A key data structure in the algorithm is sPaths; for any node v, sPaths(v) is the set of all paths that start from v and end at target that the algorithm has generated and retained so far. The workList at any point stores a subset of the paths in sPaths, and these are the paths of the iVCFG that need to be extended backward.
To begin with, all edges incident onto target are generated and added to the sets sPaths and workList (Line 4 in Algorithm 1). In each step the algorithm picks up a path p from workList (Line 6), and extends this path in the backward direction. The backward extension has three cases based on the start node of the path p. The simplest case is the intra-procedural case, wherein the path is extended backwards in all possible ways by a single edge (Lines 21-23). The routine Covered, whose definition is not shown in the algorithm, checks if its first argument (a path) is covered by its second argument (a set of paths). Note, covered paths are not retained.
When the start node of p is the entry node of a procedure F1 (Lines 14-19), the path is extended backwards via all possible call-site-to-entry edges for procedure F1. If the starting node of path p is a return-site node v1 (Lines 8-13) in a calling procedure, we invoke a routine ComputeEndToEnd (in Line 10 of Algorithm 1). This routine, which we explain later, returns a set of IVC paths of the called procedure such that every IVC path of the called procedure is d-supply-covered by some subset of paths in the returned set, where d denotes demand(p). These returned IVC paths are prepended before p (Line 11), with the call edge e1 and return edge r1 appropriately inserted.
The final result returned by the algorithm (see Lines 25 and 26 in Algorithm 1) is the join of the values obtained by transferring the given entry value d_0 ∈ L along the zero-demand (i.e., feasible) paths.
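The backward search just described can be sketched, for the single-procedure, single-counter case, as follows. This is our illustrative reconstruction, not the paper's implementation: sets of integers stand in for the abstract lattice (with union as join), and the covering check is approximated by comparing the values paths carry from the entry fact d0 rather than whole transfer functions; all names are ours.

```python
from collections import deque

# An edge is (src, dst, delta, fn): delta is the counter change
# (+1 send, -1 receive) and fn transforms an abstract value.

def demand(path):
    """Minimal counter value needed at the path's start (one counter)."""
    d = 0
    for _, _, delta, _ in reversed(path):
        d = max(0, d - delta)
    return d

def ptf(path, val):
    """Apply the path's transfer function to an abstract value."""
    for _, _, _, fn in path:
        val = fn(val)
    return val

def compute_jofp(edges, entry, target, d0):
    """Backward worklist sketch of ComputeJOFP (intra-procedural case)."""
    rev = {}
    for e in edges:                      # reverse adjacency: dst -> edges
        rev.setdefault(e[1], []).append(e)
    s_paths = {}                         # start node -> retained paths
    worklist = deque()
    for e in rev.get(target, []):        # seed: edges incident onto target
        s_paths.setdefault(e[0], set()).add((e,))
        worklist.append((e,))
    while worklist:
        p = worklist.popleft()
        for e in rev.get(p[0][0], []):   # extend backward by one edge
            q = (e,) + p
            kept = s_paths.setdefault(e[0], set())
            # Approximate covering: q is discarded if retained paths of
            # no larger demand already carry all the values q would carry.
            dom = set()
            for r in kept:
                if demand(r) <= demand(q):
                    dom |= ptf(r, d0)
            if not ptf(q, d0) <= dom:
                kept.add(q)
                worklist.append(q)
    # Join over the zero-demand (feasible) paths starting at entry.
    result = set()
    for p in s_paths.get(entry, ()):
        if demand(p) == 0:
            result |= ptf(p, d0)
    return result
```

On a tiny graph with a feasible send-then-receive path and an infeasible direct receive, only the feasible path contributes to the result.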
Routine ComputeEndToEnd: This routine is specified in Algorithm 2, and is essentially a generalization of the approach described in Section 3.2, now handling multiple call-sites in each procedure, mutual recursion, calls to non-recursive procedures, etc. For simplicity of presentation we assume that there are no cycles (i.e., loops) in the procedures, as this results in a fixed number of path templates in each procedure. There is no loss of generality here because we allow recursion. The routine incrementally populates a group of sets: there is a set named sIVCPaths(F_i, d) for each procedure F_i in the system. The idea is that when the routine completes, sIVCPaths(F_i, d) will contain a set of IVC paths of F_i that d-supply-cover all IVC paths of F_i. Note that we simultaneously populate covering sets for all the procedures in the system in order to handle mutual recursion.
The routine ComputeEndToEnd first enumerates and saves all zero-depth paths in all procedures (see Line 3 in Algorithm 2). The routine then iteratively takes one path template at a time, and fills in the "holes" between corresponding (call-site, return-site) pairs of the form (v_c^{i-1}, v_r^i) in the path template with IVC paths of the procedure that is called from this pair of nodes, thus generating a deeper IVC path (see the loop in Lines 6-11). A newly generated IVC path p is retained only if it is not d-supply-covered by other IVC paths already generated for the current procedure F_i (Lines 10-11). The routine terminates when no more retainable IVC paths are generated, and returns the set sIVCPaths(F, d).

Illustration
We now illustrate our approach using the example in Figure 3. Algorithm 1 would start from the target node k, and would grow paths one edge at a time. After four steps the path hijk would be added to sPaths(h) (the intermediate steps would add suffixes of this path to sPaths(i), sPaths(j), and sPaths(k)). Next, the path khijk would be generated and discarded, because it is covered by the "root" path k. Hence, further iterations of the cycle are avoided. On the other hand, the path hijk would get extended back to node q, resulting in the path qhijk being retained in sPaths(q). This path would trigger a call to routine ComputeEndToEnd. As discussed in Section 3.2, this routine would return the following set of paths: p_0 = co, and p_i = (cdefgm)^i co(no)^i for each 1 ≤ i ≤ 4. (Recall, as discussed in Section 3.2, that (cdefgm)^5 co(no)^5 and deeper IVC paths are 3-supply-covered by the paths {p_3, p_4}.) Each of the paths returned by ComputeEndToEnd would be prepended in front of qhijk, with the corresponding call and return edges inserted appropriately. These paths would then be extended back to node a. Hence, the final set of paths in sPaths(a) would be abpcoqhijk, abpcdefgmconoqhijk, abp(cdefgm)^2co(no)^2qhijk, abp(cdefgm)^3co(no)^3qhijk, and abp(cdefgm)^4co(no)^4qhijk. Of these paths, the first two are ignored, as they are not feasible. The initial data flow value (in which all variables are non-constant) is sent via the remaining three paths. In all three of these paths the final values of the variables 't' and 'z' are one. Hence, these two constants are inferred at node k.

Properties of the algorithm
We provide argument sketches here about the key properties of Backward DFAS. Detailed proofs are available in the appendix.
Termination. The argument is by contradiction. For the algorithm to not terminate, one of the following two scenarios must happen. The first is that an infinite sequence of paths gets added to some set sPaths(v). By Higman's lemma it follows that embedded within this infinite sequence there is an infinite sequence p_1, p_2, . . ., such that for all i, demand(p_i) ≤ demand(p_{i+1}). Because the algorithm never adds covered paths, it follows that for all i: ⊔_{1≤k≤i+1} ptf(p_k) ⊐ ⊔_{1≤k≤i} ptf(p_k). These joins thus form an infinitely increasing chain, which contradicts the assumption that the lattice of transfer functions is of finite height. The second scenario is that an infinite sequence of IVC paths gets added to some set sIVCPaths(F, d) for some procedure F and some demand vector d in some call to routine ComputeEndToEnd. Because the "supply" values of the IVC paths are bounded by d, it follows that embedded within the infinite sequence just mentioned there must exist an infinite sequence of paths p_1, p_2, . . ., such that for all i, supply(p_i, d) ≥ supply(p_{i+1}, d). However, since d-supply-covered paths are never added, it follows that for all i: ⊔_{1≤k≤i+1} ptf(p_k) ⊐ ⊔_{1≤k≤i} ptf(p_k). Again, this contradicts the assumption that the lattice of transfer functions is of finite height.
Soundness and Precision. We already argued informally in Section 3.2 that the algorithm explores all feasible paths in the system, omitting only paths that are covered by other already-retained paths. By the definition of covering, this suffices to guarantee over-approximation of the JOFP. The converse direction, namely under-approximation, is easy to see, as every path along which the data flow value d_0 is sent at the end of the algorithm is a feasible path. Together, these two results imply that the algorithm computes the precise JOFP.
Complexity. We show the complexity of our approach in the single-procedure setting. Our analysis follows along the lines of the analysis of the backward algorithm for coverability in VASS [5]. The overall idea is to use the technique of Rackoff [47] to derive a bound on the length of the paths that need to be considered. We derive a complexity bound of O(Δ · h^2 · L^{2r+1} · r · log(L)), where Δ is the total number of transitions in the VCFG, Q is the number of VCFG nodes, h is the height of the lattice of L → L functions, and L = (Q · (h + 1) · 2)^{(3r)!+1}.

Forward DFAS Approach
The Backward DFAS approach, though precise, requires the transfer function lattice to be of finite height. Due to this restriction, infinite-height abstract domains like Octagons [43], which need widening [11], are not accommodated by Backward DFAS. To address this, we present the Forward DFAS approach, which admits any complete lattice as an abstract domain (if the lattice is of infinite height, then a widening operator must also be provided). The trade-off is precision: Forward DFAS elides only some of the infeasible paths in the VCFG, and hence, in general, computes a conservative over-approximation of the JOFP. Forward DFAS is conceptually not as sophisticated as Backward DFAS, but is still a novel proposal from the perspective of the literature.
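For concreteness, the kind of widening operator Forward DFAS expects can be illustrated with the textbook widening for the interval domain; this is a standard sketch, not code from the paper:

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

def widen(a, b):
    """Textbook interval widening: a bound that grew between successive
    iterates jumps straight to infinity, so the ascending chain produced
    by the fix point iteration stabilizes in finitely many steps."""
    (lo1, hi1), (lo2, hi2) = a, b
    return (lo1 if lo1 <= lo2 else NEG_INF,
            hi1 if hi1 >= hi2 else POS_INF)
```

For example, widening [0, 1] with [0, 2] yields [0, +∞], cutting off an otherwise infinite increase of the upper bound.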
The Forward DFAS approach is structured as an instantiation of Kildall's data flow analysis framework [31]. This framework needs a given complete lattice, the elements of which will be propagated around the VCFG as part of the fix point computation. Let L be the given underlying finite or infinite complete lattice. L either needs to have no infinite ascending chains (e.g., Constant Propagation), or L needs to have an associated widening operator "∇_L". The complete lattice D that we use in our instantiation of Kildall's framework is defined as D ≡ D_{r,κ} → L, where κ ≥ 0 is a user-given non-negative integer, and D_{r,κ} is the set of all vectors of size r (where r is the number of counters in the VCFG) such that all entries of the vectors are integers in the range [0, κ]. The ordering on this lattice is point-wise: d_1 ⊑_D d_2 iff for all c ∈ D_{r,κ}, d_1(c) ⊑_L d_2(c). If a widening operator ∇_L has been provided for L, we define a widening operator ∇_D for D point-wise as well: (d_1 ∇_D d_2)(c) = d_1(c) ∇_L d_2(c).

We now need to define the abstract transfer functions with signature D → D for the VCFG edges, to be used within the data flow analysis. As an intermediate step to this end, we define a ternary relation boundedMove1 as follows. Any triple of integers (p, q, s) ∈ boundedMove1 iff one of the following holds: (a) q ≥ 0, p + q ≤ κ, and s = p + q; (b) q ≥ 0, p + q > κ, and s = κ; (c) q < 0, p = κ, and max(0, κ + q) ≤ s ≤ κ; (d) q < 0, p < κ, p + q ≥ 0, and s = p + q. We now define a ternary relation boundedMove on vectors. A triple of vectors (c_1, c_2, c_3) belongs to relation boundedMove iff all three vectors are of the same size, and for each index i, (c_1[i], c_2[i], c_3[i]) ∈ boundedMove1. We now define the D → D transfer function for the VCFG edge q_1 --(f,w)--> q_2 as follows: fun(d) ≡ λc′ ∈ D_{r,κ}. ⊔ { f(d(c)) | c ∈ D_{r,κ}, (c, w, c′) ∈ boundedMove }.

We can now invoke Kildall's algorithm using the fun transfer functions defined above at all VCFG edges, using l_0 as the fact at the "entry" to the "main" procedure (l_0 maps the all-zeroes vector to the given entry value and every other vector to ⊥_L). After Kildall's algorithm has finished computing the fix point solution, if l_v^D ∈ D is the fix point solution at any node v, we return the value ⊔_{c ∈ D_{r,κ}} l_v^D(c) as the final result at v. The intuition behind the approach above is as follows.
If v is a vector in the set D_{r,κ}, and (c, m) is a channel-message pair, then the value in the (c, m)th slot of v encodes the number of instances of message m currently in channel c. An important note: if this value is κ, it actually indicates that there are κ or more instances of message m in channel c, whereas a value less than κ represents itself. Hence, we refer to vectors in D_{r,κ} as bounded queue configurations. If d ∈ D is a data flow fact that holds at a node of the VCFG after data flow analysis terminates, then for any v ∈ D_{r,κ}, if d(v) = l, it indicates that l is a conservative over-approximation of the join of the data flow facts brought by all feasible paths that reach the node such that the counter values at the ends of these paths are as indicated by v (the notion of which counter values are indicated by a vector v ∈ D_{r,κ} was described earlier in this paragraph).
The relation boundedMove is responsible for blocking the propagation along some of the infeasible paths. The intuition behind it is as follows. Let us consider a VCFG edge q_1 --(f,w)--> q_2. If c_1 is a bounded queue configuration at node q_1, then c_1, upon propagation via this edge, will become a bounded queue configuration c_2 at q_2 iff (c_1, w, c_2) ∈ boundedMove. Lines (a) and (b) in the definition of boundedMove1 correspond to sending a message; line (b) basically throws away the precise count when the number of messages in the channel goes above κ. Line (c) corresponds to receiving a message when all we know is that the number of messages currently in the channel is greater than or equal to κ. Line (d) is key for precision when the channel has fewer than κ messages, as it allows a receive operation to proceed only if the requisite number of messages are present in the channel.
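The saturation semantics can be made concrete with a small Python sketch of boundedMove1, boundedMove, and the resulting transfer function for a single edge. This is our reconstruction: a collecting domain of integer sets stands in for the lattice L, κ is fixed at 3, and all names are ours rather than the paper's; case (c) conservatively allows every bounded value the receive could lead to.

```python
from itertools import product

KAPPA = 3  # saturation bound for counter values

def bounded_move1(p, q):
    """All saturated counter values s with (p, q, s) in boundedMove1.
    p is the current bounded count (p == KAPPA means 'KAPPA or more'),
    q is the change carried by the edge (send positive, receive negative)."""
    if q >= 0:                                   # sending: saturate (a), (b)
        return {min(KAPPA, p + q)}
    if p == KAPPA:                               # 'KAPPA or more'       (c)
        return set(range(max(0, KAPPA + q), KAPPA + 1))
    if p + q >= 0:                               # enough messages       (d)
        return {p + q}
    return set()                                 # infeasible receive: block

def bounded_move(c1, w):
    """All vectors c2 with (c1, w, c2) in boundedMove (componentwise)."""
    choices = [bounded_move1(p, q) for p, q in zip(c1, w)]
    return set(product(*choices))

def transfer(d, f, w):
    """D -> D transfer function for a VCFG edge labelled (f, w); d maps
    bounded configurations to L-values, here sets joined by union."""
    out = {}
    for c1, l in d.items():
        for c2 in bounded_move(c1, w):
            out[c2] = out.get(c2, set()) | f(l)
    return out
```

Note how a receive on an empty, precisely-known channel yields no successor configuration at all, which is exactly how propagation along such infeasible paths is blocked.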
The formulation above extends naturally to inter-procedural VCFGs using generic inter-procedural frameworks such as the call strings approach [54]. We omit the details of this in the interest of space.
Properties of the approach: Since Forward DFAS is an instantiation of Kildall's algorithm, it derives its properties from the same. As the set D_{r,κ} is a finite set, it is easy to see that the fix point algorithm will terminate.
To argue the soundness of the algorithm, we consider the concrete lattice D_c ≡ D_r → L, and the following "concrete" transfer function for the VCFG edge q_1 --(f,w)--> q_2: fun_conc(l) ≡ λc_2 ∈ D_r. ⊔ { f(l(c_1)) | c_1 ∈ D_r, c_1 + w = c_2 }, where D_r is the set of all vectors of size r of natural numbers. We then argue that the abstract transfer function fun defined earlier is a consistent abstraction [11] of fun_conc. This soundness argument is given in detail in the appendix.
If we restrict our discussion to single-procedure systems, the complexity of our approach is just the complexity of applying Kildall's algorithm. This works out to O(Q^2 · κ^r · h), where Q is the number of VCFG nodes, and h is either the height of the lattice L or the length of the maximum increasing sequence of values from L that is obtainable at any point when Kildall's algorithm is used with the given widening operator ∇_L.

We now illustrate the Forward DFAS approach on the running example. In this illustration we assume a context-insensitive analysis for simplicity (it so happens that context sensitivity does not matter in this specific example). We use the value κ = 3. Each small table is a data flow fact, i.e., an element of D ≡ D_{r,κ} → L.
The top-left cell in each table shows the node at which the fact arises. In each row, the first column shows the counter value, while the remaining columns depict the known constant values of the variables (with unknown values marked as such). Here are some interesting things to note. When any tuple of constant values transfers along the path from node c to node m, the constant values get updated due to the assignment statements encountered, and this tuple shifts from counter i to counter i + 1 (if i is not already equal to κ) due to the "send" operation encountered. When we transition from Step (5) to Step (6) in the figure, the counter values 2 and 3 in Step (5) both map to counter value 3 in Step (6) due to κ being 3; hence, the corresponding constant values get joined, with disagreeing constants becoming unknown. The value at node o (in Step (7)) is the join of the values from Steps (5) and (6). Finally, when the value at node o propagates to node k, the tuple of constants associated with counter value 3 ends up getting mapped to all lower counter values as well, due to the receive operations encountered. Note that the precision of our approach in general increases with the value of κ (as does the running time). For instance, if κ is set to 2 (rather than 3) in the example, some more infeasible paths would be traversed, and only z = 1 would be inferred at node k, instead of (t = 1, z = 1).

Implementation and Evaluation
We have implemented prototypes of both the Forward DFAS and Backward DFAS approaches in Java. Both implementations are parallelized, using the ThreadPool library. With Backward DFAS, the iterations of the outer "repeat" loop in Algorithm 1 run in parallel, while with Forward DFAS, propagations of values from different nodes to their respective successors happen in parallel. Our implementations currently target systems without procedure calls, as none of our benchmarks had recursive procedure calls.
Our implementations accept a given system, and a "target" control state q in one of the processes of the system at which the JOFP is desired. They then construct the VCFG from the system (see Section 2.1), and identify the target set of q, which is the set of VCFG nodes in which q is a constituent. For instance, in Figure 2, the target set for control state e is {(a, e), (b, e)}. The JOFPs at the nodes in the target set are then computed, and the join of these JOFPs is returned as the result for q.
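Identifying the target set amounts to a simple filter over the product nodes of the VCFG; a minimal sketch (representation and names are ours):

```python
def target_set(vcfg_nodes, q):
    """Return the VCFG nodes (tuples of per-process control states) in
    which the local control state q is a constituent."""
    return {n for n in vcfg_nodes if q in n}
```

On the example from Figure 2, filtering the product nodes for control state e yields exactly {(a, e), (b, e)}.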
Each variable reference in any transition leaving any control state is called a "use". For instance, in Figure 2, the reference to variable x along the outgoing transition from state d is one use. In all our experiments, the objective is to find the uses that are definitely constants by computing the JOFP at all uses. This is a common objective in many research papers, as finding constants enables optimizations such as constant folding, and also supports checking assertions in the code. We instantiate Forward DFAS with the Constant Propagation (CP) analysis, and Backward DFAS with the LCP analysis (for the reason discussed in Section 3.1). We use the bound κ = 2 in all runs of Forward DFAS, except with two benchmarks that are too large to scale to this bound; we discuss this later in this section. All the experiments were run on a machine with 128GB RAM and four AMD Opteron 6386 SE processors (64 cores in total). We use 14 benchmarks for our evaluations, described in the first two columns of Table 1. Four benchmarks (bartlett, leader, lynch, and peterson) are Promela models for the Spin model-checker. Three benchmarks (boundedAsync, receive1, and replicatingStorage) are from the P language repository (www.github.com/p-org). Two benchmarks (server and chameneos) are from the Basset repository (www.github.com/SoftwareEngineeringToolDemos/FSE-2010-Basset). Four benchmarks (event bus test, jobqueue test, nursery test, and bookCollectionStore) are real-world Go programs. Finally, there is one toy example, mutex, which we wrote ourselves; it ensures mutual exclusion via blocking receive operations. We provide precise links to the benchmarks in the appendix.

Benchmarks and modeling
Our DFAS implementations expect the asynchronous system to be specified in an XML format. We have developed a custom XML schema for this, closely based on the Promela modeling language used in Spin [25]. We followed this direction in order to be able to evaluate our approach on examples from different languages. We manually translated each benchmark into an XML file, which we call a model. As the input XML schema is close to Promela, the Spin models were easily translated. Other benchmarks had to be translated to our XML schema by understanding their semantics.
Note that both our approaches are expensive in the worst-case (exponential or worse in the number of counters r). Therefore, we have chosen benchmarks that are moderate in their complexity metrics. Still, these benchmarks are real and contain complex logic (e.g., the leader election example from Promela, which was discussed in detail in Section 1.1). We have also performed some manual simplifications to the benchmarks to aid scalability (discussed below). Our evaluation is aimed towards understanding the impact on precision due to infeasible paths in real benchmarks, and not necessarily to evaluate applicability of our approach to large systems.
We now list some of the simplifications referred to above. Language-specific idioms that were irrelevant to the core logic of the benchmark were removed. The number of instances of identical processes in some of the models were reduced in a behavior-preserving manner according to our best judgment. In many of the benchmarks, messages carry payload. Usually the payload is one byte. We would have needed 256 counters just to encode the payload of one 1-byte message. Therefore, in the interest of keeping the analysis time manageable, the payload size was reduced to 1 bit or 2 bits. The reduction was done while preserving key behavioral aspects according to our best judgment. Finally, procedure calls were inlined (there was no use of recursion in the benchmarks).
In the rest of this section, whenever we say "benchmark", we actually mean the model we created corresponding to the benchmark. Table 1 also shows various metrics of our benchmarks (based on the XML models). Columns 3-6 depict, respectively, the number of processes, the total number of variables, the number of "counters" r, and the total number of nodes in the VCFG. We provide our XML models of all our benchmarks, as well as full output files from the runs of our approach, as a downloadable folder (https://drive.google.com/drive/folders/181DloNfm6 UHFyz7qni8rZjwCp-a8oCV).

Data flow analysis results
We structure our evaluation as a set of research questions (RQs) below. Table 2 summarizes results for the first three RQs, while Table 3 summarizes results for RQ 4.
RQ 1: How many constants are identified by the Forward and Backward DFAS approaches? Column (2) in Table 2 shows the number of uses in each benchmark. Columns (4)-Forw and (4)-Back show the number of uses identified as constants by the Forward and Backward DFAS approaches, respectively. In total across all benchmarks Forward DFAS identifies 63 constants whereas Backward DFAS identifies 49 constants. Although in aggregate Backward DFAS appears weaker than Forward DFAS, Backward DFAS infers more constants than Forward DFAS in two benchmarks: jobqueue test and bookCollectionStore. Therefore, the two approaches are actually incomparable. The advantage of Forward DFAS is that it can use relatively more precise analyses like CP that do not satisfy the assumptions of Backward DFAS, while the advantage of Backward DFAS is that it always computes the precise JOFP.
RQ 2: How many assertions are verified by the approaches? Verifying assertions that occur in code is a useful activity as it gives confidence to developers. All but one of our benchmarks had assertions (in the original code itself, before modeling). We carried over these assertions into our models. For instance, for the benchmark leader, the assertion appears in Line 11 in Figure 1. In some benchmarks, like jobqueue test, the assertions were part of test cases. It makes sense to verify these assertions as well, as unlike in testing, our technique considers all possible interleavings of the processes. As "bookCollectionStore" did not come with any assertions, a graduate student who was unfamiliar with our work studied the benchmark and suggested assertions.
Column (3) in Table 2 shows the number of assertions present in each benchmark. Columns (5)-Forw and (5)-Back in Table 2 show the number of assertions declared as safe (i.e., verified) by the Forward and Backward DFAS approaches, respectively. An assertion is considered verified iff constants (as opposed to unknown values) are inferred for all the variables used in the assertion, and these constants satisfy the assertion. As can be seen from the last row in Table 2, both approaches verify a substantial percentage of all the assertions: 52% by Forward DFAS and 48% by Backward DFAS. We believe these results are surprisingly useful, given that our technique needs no loop invariants or usage of theorem provers.
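The verification criterion just stated can be sketched directly, with None standing for the "unknown" (non-constant) value; the names and representation are ours:

```python
def assertion_verified(inferred, assertion_vars, check):
    """Declare an assertion safe iff every variable it mentions was
    inferred to be a constant (None encodes 'unknown') and the inferred
    constants satisfy the assertion's predicate."""
    vals = [inferred.get(v) for v in assertion_vars]
    return all(v is not None for v in vals) and check(*vals)
```

For the leader benchmark, the assertion nr_leaders = 1 is verified when the analysis infers the constant 1 for nr_leaders at the assertion point, and not otherwise.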

RQ 3: Are the DFAS approaches more precise than baseline approaches? We compare the DFAS results with two baseline approaches. The first baseline is a Join-Over-all-Paths (JOP) analysis, which basically performs CP analysis on the VCFG without eliding any infeasible paths. Columns (6)-JOP and (7)-JOP in Table 2 show the number of constants inferred and the number of assertions verified by the JOP baseline. It can be seen that Backward DFAS identifies 2.2 times the number of constants as JOP, while Forward DFAS identifies 2.9 times the number of constants as JOP (see columns (4)-Forw, (4)-Back, and (6)-JOP in the Total row in Table 2). In terms of assertions, each of them verifies almost 5 times as many assertions as JOP (see columns (5)-Forw, (5)-Back, and (7)-JOP in the Total row in Table 2). It is clear from these results that eliding infeasible paths is extremely important for precision.
The second baseline is Copy Constant Propagation (CCP) [49]. This is another variant of constant propagation that is even less precise than LCP. However, it is based on a finite lattice, specifically, an IFDS [49] lattice. Hence this baseline represents the capability of the closest related work to ours [28], which elides infeasible paths but supports only IFDS lattices, which are a sub-class of finite lattices. (Their implementation also used a finite lattice of predicates, but we are not aware of a predicate-identification tool that would work on our benchmarks out of the box.) We implemented the CCP baseline within our Backward DFAS framework. This baseline hence computes the JOFP using CCP (i.e., it elides infeasible paths).
Columns (6)-CCP and (7)-CCP in Table 2 show the number of constants inferred and the number of assertions verified by the CCP baseline. From the Total row in Table 2 it can be seen that Forward DFAS finds 62% more constants than CCP, while Backward DFAS finds 26% more constants than CCP. With respect to the number of assertions verified, the respective gains are 57% and 43%. In other words, infinite domains such as CP or LCP can give significantly more precision than closely related finite domains such as CCP.

RQ 4: What are the running times of the approaches? The columns of Table 3 correspond to the benchmarks (only the first three letters of each benchmark's name are shown in the interest of space). The rows show the running times for Forward DFAS, Backward DFAS, the JOP baseline, and the CCP baseline, respectively.
The JOP baseline was quite fast on almost all benchmarks (except lynch). This is because it maintains just a single data flow fact per VCFG node, in contrast to our approaches. Forward DFAS was generally quite efficient, except on chameneos and lynch. On these two benchmarks, it scaled only with κ = 1 and κ = 0, respectively, encountering memory-related crashes at higher values of κ (we used κ = 2 for all other benchmarks). These two benchmarks have a large number of nodes and a high value of r, which increases the size of the data flow facts.
The running time of Backward DFAS is substantially higher than that of the JOP baseline. One reason for this is that, being a demand-driven approach, Backward DFAS is invoked separately for each use (Table 2, Col. 2), and the cumulative time across all these invocations is reported in the table. In fact, the mean time per query for Backward DFAS is less than the total time for Forward DFAS on 9 out of 14 benchmarks, in some cases by a factor of 20x. Also, unlike Forward DFAS, Backward DFAS visits only a small portion of the VCFG in each invocation. Therefore, Backward DFAS is more memory efficient and scales to all our benchmarks. Every invocation of Backward DFAS consumed less than 32GB of memory, whereas with Forward DFAS, three benchmarks (leader, replicatingStorage, and jobqueue test) required more than 32GB, and two (lynch and chameneos) needed more than the 128GB available on the machine. On the whole, the time requirement of Backward DFAS is still acceptable considering the large precision gain over the JOP baseline.

Limitations and Threats to Validity
The results of the evaluation using our prototype implementation are very encouraging, in terms of both usefulness and efficiency. The evaluation does however pose some threats to the validity of our results. The benchmark set, though extracted from a wide set of sources, may not be exhaustive in its idioms. Also, while modeling, we had to simplify some of the features of the benchmarks in order to let the approaches scale. Therefore, applicability of our approach directly on real systems with all their language-level complexities, use of libraries, etc., is not yet established, and would be a very interesting line of future work.

Related Work
The modeling and analysis of parallel systems, which include asynchronous systems, multi-threaded systems, distributed systems, event-driven systems, etc., has been the focus of a large body of work, for a very long time. We discuss some of the more closely related previous work, by dividing the work into four broad categories.
Data Flow Analysis: The work of Jhala et al. [28] is the closest work that addresses similar challenges as our work. They combine the Expand, Enlarge and Check (EEC) algorithm [20] that answers control state reachability in WSTS [17], with the unordered channel abstraction, and the IFDS [49] algorithm for data flow analysis, to compute the JOFP solution for all nodes. They admit only IFDS abstract domains, which are finite by definition. Some recent work has extended this approach for analyzing JavaScript [59] and Android [44] programs. Both our approaches are dissimilar to theirs, and we admit infinite lattices (like CP and LCP). On the other hand, their approach is able to handle parameter passing between procedures, which we do not.
Bronevetsky et al. [7] address generalized data flow analysis of a very restricted class of systems, where any receive operation must receive messages from a specific process, and channel contents are not allowed to cause nondeterminism in control flow. Other work has addressed analysis of asynchrony in web applications [27,41]. These approaches are efficient, but over-approximate the JOFP by eliding only certain specific types of infeasible paths.
The coverability problem mentioned above is considered equivalent to control state reachability, and has received wide attention [1,13,28,18,53,19,32,4,55]. Abdulla et al. [3] were the first to provide a backward algorithm to answer coverability. Our Backward DFAS approach is structurally similar to their approach, but is a strict generalization, as we incorporate data flow analysis using infinite abstract domains. (It is noteworthy that when the abstract domain is finite, then data flow analysis can be reduced to coverability.) One difference is that we use the unordered channel abstraction, while they use the lossy channel abstraction. It is possible to modify our approach to use lossy channels as well (when there are no procedure calls, which they also do not allow); we omit the formalization of this due to lack of space.
Bouajjani and Emmi [4] generalize over previous coverability results by solving the coverability problem for a class of multi-procedure systems called recursively parallel programs. Their class of systems is somewhat broader than ours, as they allow a caller to receive the messages sent by its callees. Our ComputeEndToEnd routine in Algorithm 2 is structurally similar to their approach. They admit finite abstract domains only. It would be interesting future work to extend the Backward DFAS approach to their class of systems.
Our approaches explore all interleavings between the processes, following the Spin semantics. In contrast, the closest previous approaches [28,4] only address "event-based" systems, wherein a set of processes execute sequentially without interleaving at the statement level, but over an unbounded schedule (i.e., each process executes from start to finish whenever it is scheduled).
Other forms of verification: Proof-based techniques have been explored for verifying asynchronous and distributed systems [23,57,46,21]. These techniques need inductive variants and are not as user-friendly as data flow analysis techniques. Behavioral types have been used to tackle specific analysis problems such as deadlock detection and correct usage of channels [35,36,51].
Testing and Model Checking: Languages and tools such as Spin and Promela [25], P [14], P# [12], and JPF-Actor [38] have been used widely to model-check asynchronous systems. A lot of work has been done in testing of asynchronous systems [15,12,52,22,58] as well. Such techniques are bounded in nature and cannot provide the strong verification guarantees that data flow analysis provides.

Conclusions and Future Work
In spite of the substantial body of work on analysis and verification of distributed systems, there is no existing approach that performs precise data flow analysis of such systems using infinite abstract domains, which are otherwise very commonly used with sequential programs. We propose two data flow analysis approaches that solve this problem: one always computes the precise JOFP solution, while the other admits a fully general class of infinite abstract domains. We have implemented both approaches, analyzed 14 benchmarks using the implementations, and observed substantially higher precision from our approaches over two different baseline approaches.
Our approach can be extended in many ways. One interesting extension would be to make Backward DFAS work with infinite height lattices, using widening. Another possible extension could be the handling of parameters in procedure calls. There is significant scope for improving the scalability using better engineering, especially for Forward DFAS. One could explore the integration of partial-order reduction [10] into both our approaches. Finally, we would like to build tools based on our approach that apply directly to programs written in commonly-used languages for distributed programming.

Benchmark Sources
Following are the links to the sources of the benchmarks used in the paper.

Complete Proofs for Backward DFAS
In this section, we formally prove the termination and correctness of our algorithm ComputeJOFP. The proofs in the appendix are self-contained and only refer to Algorithm 1 and Algorithm 2 in the paper. Before presenting the proofs, we review the important definitions.

Definition 2 (Demand). For a path p and a vector d ∈ N^r, demand(p, d) is defined recursively: demand(ε, d) = d for the empty path ε, and demand(e · p′, d) = µ(demand(p′, d) − w_e), where w_e is the queuing vector of the edge e.
Here, µ ≡ λz ∈ Z. max(0, z); µ is applied to vectors of integers in the natural manner, that is, by applying it to each component of the vector. This definition is equivalent to the definition presented in the paper (if d is replaced by 0).
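Under our reading of this definition, demand(p, d) can be computed by a single backward pass over the queuing vectors of the path's edges; a sketch, with the representation of paths as lists of queuing vectors being our own assumption:

```python
def mu(v):
    """µ from Definition 2, applied componentwise: clamp negatives to zero."""
    return tuple(max(0, z) for z in v)

def demand(path, d):
    """Counter values needed at the start of the path so that every receive
    along it is matched and at least d remains at the end.  'path' is the
    sequence of queuing vectors of its edges (sends positive entries,
    receives negative entries)."""
    need = tuple(d)
    for w in reversed(path):          # walk the path backwards
        need = mu(tuple(n - wi for n, wi in zip(need, w)))
    return need
```

For instance, a path consisting of a single receive has demand (1,) against d = (0,), while a send followed by the matching receive has demand (0,): the send supplies the message the receive consumes.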

Definition 3 (Covering).
A set of paths C is said to cover a path p iff all paths in C have the same start and end nodes (respectively) as p; for each path p′ ∈ C, demand(p′) ≤ demand(p); and the join of the path transfer functions of the paths in C dominates the path transfer function of p.

Definition 4 (Path Template).
A path template (p_1, p_2, ..., p_n) of a procedure F ∈ Funcs is a sequence of paths in the VCFG of F such that: n ≥ 2; path p_1 begins at node en_F (the entry node of the VCFG of procedure F); path p_n ends at node ex_F (the designated exit node of the VCFG of procedure F); for all p_i, 1 ≤ i < n, p_i ends at a call-site node; and for all p_i, 1 < i ≤ n, p_i begins at the return-site node corresponding to the call-site node at which p_{i−1} ends.
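The well-formedness conditions of Definition 4 can be checked mechanically. The sketch below uses a hypothetical encoding of our own: each path is a dict with 'start' and 'end' nodes, and ret_site_of maps each call-site node to its return-site node.

```python
def is_path_template(paths, en_F, ex_F, ret_site_of):
    """Definition 4 sketch: p_1..p_n (n ≥ 2) form a path template of F iff
    p_1 starts at en_F, p_n ends at ex_F, each non-final p_i ends at a
    call-site, and each p_{i+1} starts at that call-site's return-site."""
    n = len(paths)
    if n < 2 or paths[0]['start'] != en_F or paths[-1]['end'] != ex_F:
        return False
    for i in range(n - 1):
        call_site = paths[i]['end']
        if call_site not in ret_site_of:        # p_i must end at a call-site
            return False
        if paths[i + 1]['start'] != ret_site_of[call_site]:
            return False
    return True

ret_site_of = {'cs1': 'rs1'}
p1 = {'start': 'enF', 'end': 'cs1'}
p2 = {'start': 'rs1', 'end': 'exF'}
print(is_path_template([p1, p2], 'enF', 'exF', ret_site_of))   # True
print(is_path_template([p2, p1], 'enF', 'exF', ret_site_of))   # False
```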

Definition 5 (D-Covering).
A path p is d-covered by a set of paths S, for a given demand vector d, iff: 1. if p begins at vertex v_i and ends at vertex v_j, then all paths in S start at v_i and end at v_j; 2. for all paths p' ∈ S, demand(p', d) ≤ demand(p, d); and 3. the join of the path transfer functions of the paths in S dominates the path transfer function of p.

Termination
Theorem 1 (Termination). ComputeJOFP always terminates.
Proof: We prove the theorem in two parts. First, we prove that each invocation of the form ComputeEndToEnd(F, d'), where F ∈ Funcs and d' ∈ N^r, necessarily terminates.
First, it is clear that the loop at Lines 2-3 in ComputeEndToEnd terminates, as there are only a finite number of 0-depth paths.
Now we reason about the other loop in the routine, spanning Lines 4-12. Let each visit to Line 11 of ComputeEndToEnd (where a path is added to a set sIVCPaths(F_i, d)) during the current invocation be considered an "event". Each event is fully described by a triple of parameters: (the procedure F_i being currently visited, the IVC path p' in procedure F_i that is currently being added to sIVCPaths(F_i, d), demand(p', d)). Therefore, the entire invocation corresponds to a sequence of events of the kind mentioned above. Let this sequence be called S. Clearly, the invocation is non-terminating iff S is infinitely long.
Since the procedures contain only send operations, for any path p that is fully within the procedures, by Definition 2, demand(p, d) ≤ d. Therefore, only a finite number of values are possible in the third components of the triples mentioned above. Also, the number of procedures in Funcs is finite. Therefore, if S is infinite, there must exist a procedure F_i and a demand vector d' that together occur in infinitely many events of S. A path is added to sIVCPaths(F_i, d) only if it is not d-covered by the previously stored paths; hence, along this infinite subsequence of events, the joins of the path transfer functions of the stored paths form a strictly increasing chain in the lattice. However, this contradicts our assumption that the transfer function lattice has no infinite ascending chains (refer to the assumptions section).
Therefore, no such infinite subsequence can exist, and hence S cannot be infinite. Therefore, we have proved that every invocation ComputeEndToEnd(F, d') necessarily terminates.
Next we prove that ComputeJOFP in Algorithm 1 always terminates. Since every call to ComputeEndToEnd terminates, the only way ComputeJOFP can fail to terminate is if, for some node v, an infinite number of paths get inserted into sPaths(v) at Lines 13, 19, and 23 of ComputeJOFP.
Here, v can be any node in the VCFG of any procedure. Let S_1 be the infinite sequence of paths inserted into sPaths(v). Since the set of all demand vectors forms a well-quasi-ordering w.r.t. the ≤ comparison on demand vectors, there must exist an infinite subsequence S'_1 of S_1 such that for all i ≥ 1, demand(S'_1[i]) ≤ demand(S'_1[i+1]). From this, from Lines 12, 18, and 22 of ComputeJOFP, and from the definition of covering, it follows that for all i ≥ 1: ptf(S'_1[1]) ⊔ ptf(S'_1[2]) ⊔ ... ⊔ ptf(S'_1[i]) does not dominate ptf(S'_1[i+1]). This implies that the following infinite sequence is strictly increasing as per the ordering in the lattice of transfer functions: ptf(S'_1[1]) ⊏ ptf(S'_1[1]) ⊔ ptf(S'_1[2]) ⊏ ptf(S'_1[1]) ⊔ ptf(S'_1[2]) ⊔ ptf(S'_1[3]) ⊏ ... However, this again contradicts our assumption that the transfer function lattice has no infinite ascending chains. Therefore, we have contradicted our initial assumption that the sequence S_1 is infinite. Hence S_1 is finite, and the procedure ComputeJOFP always terminates.
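The well-quasi-ordering invoked above is Dickson's lemma: any infinite sequence of vectors in N^r contains indices i < j with the i-th vector component-wise ≤ the j-th. As an illustration, the following helper (our own, for exposition) searches a finite prefix of a sequence for such a dominating pair.

```python
def find_dominating_pair(seq):
    """Return (i, j) with i < j and seq[i] ≤ seq[j] component-wise, if any."""
    for j in range(len(seq)):
        for i in range(j):
            if all(a <= b for a, b in zip(seq[i], seq[j])):
                return (i, j)
    return None

# Pairwise-incomparable vectors have no dominating pair...
print(find_dominating_pair([(2, 0), (1, 1), (0, 2)]))           # None
# ...but any long enough extension of the sequence produces one:
print(find_dominating_pair([(2, 0), (1, 1), (0, 2), (3, 0)]))   # (0, 3)
```

Antichains in (N^r, ≤) are finite but can be arbitrarily long, which is why the proof extracts an ascending subsequence rather than bounding the number of stored paths directly.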
Soundness
Soundness of the algorithm is characterized by the following theorem.
Theorem 2 (Soundness). For any node v, let d ∈ L be the JOFP value computed for v by ComputeJOFP, treating d_0 ∈ L as the initial value at the initial node start. Then,
d ⊒ ⊔_{p a feasible, interprocedurally valid path from start to v} (ptf(p))(d_0).
The proof of this theorem requires a set of lemmas and intermediate theorems. Therefore, we first present the necessary lemmas, then the intermediate theorems, and then the final proof of correctness.

Important Lemmas
We require the following lemmas.
Lemma 1. If d ∈ N^r is a vector, r ≥ 1, p_1 and p_2 are paths from v_i to v_j such that demand(p_1, d) ≤ demand(p_2, d), and p_0 is any path ending at v_i, then demand(p_0.p_1, d) ≤ demand(p_0.p_2, d).
Proof: This is the Demand Supply Lemma presented in the paper. We prove the lemma using induction on the length of the path p 0 .
We first consider the base case, when p_0 is a single edge e_0 with queuing vector w_0. Let d_1 = demand(p_1, d) and d_2 = demand(p_2, d). We are given d_1 ≤ d_2. − (1) Subtracting w_0 from both sides we get d_1 − w_0 ≤ d_2 − w_0. We now prove that the application of µ preserves the ordering in inequation (1); in other words, µ applied to vectors is a monotone function. As µ is applied component-wise on a vector, and addition/subtraction of vectors is also component-wise, it suffices to show that for any i ∈ [1..r], if d_1[i] − w_0[i] ≤ d_2[i] − w_0[i] then µ(d_1[i] − w_0[i]) ≤ µ(d_2[i] − w_0[i]). There are three possible cases: if both quantities are at most 0, µ maps both to 0; if d_1[i] − w_0[i] ≤ 0 < d_2[i] − w_0[i], µ maps the left to 0 and leaves the right positive; and if both are positive, µ leaves both unchanged, so the ordering is preserved in every case. The fourth case (d_1[i] − w_0[i] > 0 ≥ d_2[i] − w_0[i]) is ruled out by (1). Therefore, µ(d_1 − w_0) ≤ µ(d_2 − w_0). − (2) Now, using the definition of demand for e_0.p_1 and e_0.p_2, it follows from (2) that demand(e_0.p_1, d) ≤ demand(e_0.p_2, d). This proves the base case.
For the inductive case, let p_0 be of length n + 1, i.e., p_0 = e_0.p'_0 where p'_0 has length n. By the induction hypothesis, demand(p'_0.p_1, d) ≤ demand(p'_0.p_2, d). The inductive case is then proved in the same way as the base case, by replacing p_1 with p'_0.p_1 and p_2 with p'_0.p_2 in the base-case argument.
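Lemma 1 can be sanity-checked by random testing under the toy encoding of paths as lists of queuing vectors (an assumption of ours, for illustration only): whenever demand(p_1, d) ≤ demand(p_2, d), prefixing both paths with any p_0 preserves the ordering of demands.

```python
import random

def mu(v):
    return tuple(max(0, z) for z in v)

def demand(path, d):
    res = tuple(d)
    for w in reversed(path):
        res = mu(tuple(a - b for a, b in zip(res, w)))
    return res

def leq(u, v):
    return all(a <= b for a, b in zip(u, v))

def rand_path(n, r=2):
    return [tuple(random.randint(-2, 2) for _ in range(r)) for _ in range(n)]

random.seed(0)
checked = 0
for _ in range(5000):
    d = tuple(random.randint(0, 3) for _ in range(2))
    p0, p1, p2 = rand_path(3), rand_path(4), rand_path(4)
    if leq(demand(p1, d), demand(p2, d)):
        # Lemma 1: the ordering survives prefixing with p0.
        assert leq(demand(p0 + p1, d), demand(p0 + p2, d))
        checked += 1
print(checked > 0)   # True: the property held on every sampled instance
```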
Lemma 2. Let p_1 be a path from v_i to v_j and S_2 be a set of paths from v_i to v_j such that ⊔_{p_2 ∈ S_2} ptf(p_2) ⊒ ptf(p_1). Let p_0 be any path ending at v_i. Then, ⊔_{p_2 ∈ S_2} ptf(p_0.p_2) ⊒ ptf(p_0.p_1).
Proof: Let S_2 = {p_21, p_22, ..., p_2n}. We are given ptf(p_21) ⊔ ... ⊔ ptf(p_2n) ⊒ ptf(p_1). − (1) By composing on the left side of the LHS and the RHS of (1) with ptf(p_0), and using the monotonicity of the composition operation, we obtain ptf(p_0) ∘ (ptf(p_21) ⊔ ... ⊔ ptf(p_2n)) ⊒ ptf(p_0) ∘ ptf(p_1). − (2) It is given that the path transfer functions form a complete lattice. As a consequence, function composition left-distributes over join, i.e., f ∘ (f_1 ⊔ ... ⊔ f_n) = (f ∘ f_1) ⊔ ... ⊔ (f ∘ f_n). Therefore, expanding (2) using left-distributivity we obtain (ptf(p_0) ∘ ptf(p_21)) ⊔ ... ⊔ (ptf(p_0) ∘ ptf(p_2n)) ⊒ ptf(p_0) ∘ ptf(p_1). − (3) The path transfer function of a path p = p_a.p_b is given by ptf(p_a.p_b) = ptf(p_a) ∘ ptf(p_b). Therefore, by rewriting (3) in terms of path transfer functions of concatenated paths, we obtain ⊔_{p_2 ∈ S_2} ptf(p_0.p_2) ⊒ ptf(p_0.p_1). Hence proved.

Lemma 3. Let p = p_1.p_2 be a path such that p_1 does not receive any messages, and let w_{p_1} be the sum of the queuing vectors of the edges of p_1. Then, for any d ∈ N^r, demand(p, d) = µ(demand(p_2, d) − w_{p_1}).
Proof: We prove the lemma by induction on the length of p_1, taking an arbitrary p_2. The base case is when p_1 is of length 1, i.e., it is a single edge e. Let w_e be the queuing vector of e, and therefore w_{p_1} = w_e.
By Definition 2, we have demand(p, d) = µ(demand(p_2, d) − w_e). − (1) Replacing w_e using w_{p_1} = w_e, demand(p, d) = µ(demand(p_2, d) − w_{p_1}). This proves the base case. We now proceed to the inductive case. Let p = e.p'_1.p_2, where e.p'_1 is of length n + 1. As p'_1 is of length n, the inductive hypothesis holds for p'_1 and the path p' = p'_1.p_2. Therefore, by the hypothesis we have demand(p', d) = µ(demand(p_2, d) − w_{p'_1}). − (2) By Definition 2, demand(p, d) = µ(demand(p', d) − w_e); after replacing the value of demand(p', d) from (2), we get demand(p, d) = µ(µ(demand(p_2, d) − w_{p'_1}) − w_e). − (3) Now we argue that demand(p, d) = µ(demand(p_2, d) − w_{p'_1} − w_e); i.e., the inner application of µ in (3) can be dropped. In order to prove this, we argue that for any i ∈ [1..r], µ(µ(x[i]) − w_e[i]) = µ(x[i] − w_e[i]), where x = demand(p_2, d) − w_{p'_1}. Proving this suffices, as the two operations involved, µ and vector addition/subtraction, are component-wise; therefore, proving the required result for all components proves it for the full vector.
Thus, we proceed to the proof. As p_1 does not receive any messages, every component of w_e is non-negative. Based on the value of x[i] for any i ∈ [1..r], there are two possible scenarios. If x[i] ≥ 0, then µ(x[i]) = x[i], and hence µ(µ(x[i]) − w_e[i]) = µ(x[i] − w_e[i]). If x[i] < 0, then µ(x[i]) = 0, so µ(µ(x[i]) − w_e[i]) = µ(−w_e[i]) = 0; on the other side, x[i] − w_e[i] < 0, so µ(x[i] − w_e[i]) = 0 as well. In both scenarios the two sides agree, so the inner application of µ can be dropped, giving demand(p, d) = µ(demand(p_2, d) − w_{p'_1} − w_e). As the paths p_1 and p'_1 do not receive any messages, w_{p_1} = w_{e.p'_1} = w_{p'_1} + w_e. Thus, replacing (−w_{p'_1} − w_e) in the above equation by −w_{p_1}, demand(p, d) = µ(demand(p_2, d) − w_{p_1}). Therefore, the inductive case holds as well.
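Lemma 3 can likewise be checked by random testing under our toy encoding (an illustration, not the paper's implementation): when the prefix p_1 performs no receives (all its queuing-vector components non-negative), the demand of p_1.p_2 equals µ applied to demand(p_2, d) minus the summed queuing vector of p_1.

```python
import random

def mu(v):
    return tuple(max(0, z) for z in v)

def demand(path, d):
    res = tuple(d)
    for w in reversed(path):
        res = mu(tuple(a - b for a, b in zip(res, w)))
    return res

def vec_sum(vs, r=2):
    total = (0,) * r
    for v in vs:
        total = tuple(a + b for a, b in zip(total, v))
    return total

random.seed(1)
for _ in range(2000):
    d = tuple(random.randint(0, 3) for _ in range(2))
    # p1 only sends: its queuing vectors are non-negative.
    p1 = [tuple(random.randint(0, 2) for _ in range(2)) for _ in range(3)]
    p2 = [tuple(random.randint(-2, 2) for _ in range(2)) for _ in range(4)]
    lhs = demand(p1 + p2, d)
    rhs = mu(tuple(a - b for a, b in zip(demand(p2, d), vec_sum(p1))))
    assert lhs == rhs   # Lemma 3: the nested µ's collapse to a single one
print("Lemma 3 held on all sampled instances")
```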
As a consequence of Lemma 3, we have the following corollary.
Corollary 2. Let p = e_1.e_2...e_n be a path, such that for i ∈ [1..n], w_{e_i} = θ(e_i). Let d be any element of N^r. If p does not receive any messages, then demand(p, d) = µ(d − (w_{e_1} + w_{e_2} + ... + w_{e_n})).
Proof: The proof of the corollary is the same as that of Lemma 3, and can be obtained by simply replacing demand(p_2, d) by d.

Lemma 4. Let p_1 be a path from v_i to v_j and S_2 be a set of paths from v_i to v_j such that ⊔_{p_2 ∈ S_2} ptf(p_2) ⊒ ptf(p_1). Let p_0 be any path beginning at v_j. Then, ⊔_{p_2 ∈ S_2} ptf(p_2.p_0) ⊒ ptf(p_1.p_0).
Proof: Let S_2 = {p_21, ..., p_2n}. We are given ptf(p_21) ⊔ ... ⊔ ptf(p_2n) ⊒ ptf(p_1). − (1) Composing the LHS and the RHS of inequation (1) on the right with ptf(p_0), and due to the monotonicity of the composition operation, we get (ptf(p_21) ⊔ ... ⊔ ptf(p_2n)) ∘ ptf(p_0) ⊒ ptf(p_1) ∘ ptf(p_0). − (2) As function composition is right-distributive over join, i.e., for any f, f_1, ..., f_n ∈ L → L, (f_1 ⊔ ... ⊔ f_n) ∘ f = (f_1 ∘ f) ⊔ ... ⊔ (f_n ∘ f), expanding (2) we obtain (ptf(p_21) ∘ ptf(p_0)) ⊔ ... ⊔ (ptf(p_2n) ∘ ptf(p_0)) ⊒ ptf(p_1) ∘ ptf(p_0). − (3) The path transfer function for a path p = p_a.p_b is given by ptf(p_a.p_b) = ptf(p_a) ∘ ptf(p_b). Therefore, by rewriting (3) in terms of path transfer functions of concatenated paths, we get ⊔_{p_2 ∈ S_2} ptf(p_2.p_0) ⊒ ptf(p_1.p_0). Hence proved.
Lemma 5. Let d ∈ N^r be a vector, let v_i and v_j be any two nodes, and let p_1 and p_2 be paths from v_i to v_j such that demand(p_1, d) ≤ demand(p_2, d). Let p_0 be any path ending at v_i and p_3 be any path beginning from v_j. Let the paths p_1, p_2, p_0 and p_3 be such that they do not receive any messages. Then, demand(p_0.p_1.p_3, d) ≤ demand(p_0.p_2.p_3, d).
Proof: The proof is in two parts. We first prove that demand(p_1.p_3, d) ≤ demand(p_2.p_3, d). To prove the first part, we use Corollary 2. As the paths p_1 and p_2 do not receive any messages, Corollary 2 is applicable to them. Let w_{p_1} and w_{p_2} be the sums of the queuing vectors of the edges in paths p_1 and p_2 respectively. Therefore, by mapping p_1 and p_2, in turn, to p in Corollary 2, we have demand(p_1, d) = µ(d − w_{p_1}) − (1) and demand(p_2, d) = µ(d − w_{p_2}). − (2) We are given demand(p_1, d) ≤ demand(p_2, d). − (3)

Substituting the values from Equations (1) and (2) into (3), we obtain

µ(d − w_{p_1}) ≤ µ(d − w_{p_2}). − (4) Inequation (4) holds iff for every i ∈ [1..r], at least one of the following two conditions holds: 1. d[i] − w_{p_1}[i] ≤ d[i] − w_{p_2}[i], i.e., w_{p_2}[i] ≤ w_{p_1}[i]; or 2. µ(d[i] − w_{p_1}[i]) = 0, i.e., d[i] ≤ w_{p_1}[i]. We now argue that irrespective of which of the two conditions holds for any given i, we have demand(p_1.p_3, d)[i] ≤ demand(p_2.p_3, d)[i]. By Definition 2 and Lemma 3, we have demand(p_1.p_3, d) = µ(demand(p_3, d) − w_{p_1}) − (5) and demand(p_2.p_3, d) = µ(demand(p_3, d) − w_{p_2}). − (6) The first case is when condition 1 holds for i. Then demand(p_3, d)[i] − w_{p_1}[i] ≤ demand(p_3, d)[i] − w_{p_2}[i]. We proved in Lemma 1 that µ is a monotone function, i.e., if x ≤ y then µ(x) ≤ µ(y). Thus, the application of µ to both sides preserves the ordering, and using the definitions of the demands of p_1.p_3 and p_2.p_3 from Equations (5) and (6), it follows that demand(p_1.p_3, d)[i] ≤ demand(p_2.p_3, d)[i]. − (7) The second case is when condition 2 holds for i, i.e., d[i] ≤ w_{p_1}[i]. Since p_3 does not receive any messages, demand(p_3, d) ≤ d; hence demand(p_3, d)[i] − w_{p_1}[i] ≤ d[i] − w_{p_1}[i] ≤ 0. Therefore, by Equation (5) and the definition of µ, demand(p_1.p_3, d)[i] = 0, and the required ordering holds in this case as well. − (8) As (7) and (8) together cover every i, we get demand(p_1.p_3, d) ≤ demand(p_2.p_3, d). − (9)

Now we prove the second part, i.e., that demand(p_0.p_1.p_3, d) ≤ demand(p_0.p_2.p_3, d). Applying Lemma 1 by mapping p_1.p_3 to p_1 in the lemma statement, p_2.p_3 to p_2 in the lemma statement, and p_0 to p_0 in the lemma statement, and using (9), we have demand(p_0.p_1.p_3, d) ≤ demand(p_0.p_2.p_3, d). Hence proved.

Lemma 6.
Let v_i and v_j be any two nodes. Let p_1 be a path from v_i to v_j and S_2 be a set of paths from v_i to v_j such that ⊔_{p_2 ∈ S_2} ptf(p_2) ⊒ ptf(p_1). Let function composition be both left- and right-distributive over join for all the path transfer functions. Let p_0 be any path ending at v_i and p_3 be any path starting from v_j. Then ⊔_{p_2 ∈ S_2} ptf(p_0.p_2.p_3) ⊒ ptf(p_0.p_1.p_3).
Proof: This is the Function Coverage Lemma, as presented in the paper. To prove the lemma we use the results of Lemmas 2 and 4. From Lemma 2, we can infer that ⊔_{p_2 ∈ S_2} ptf(p_0.p_2) ⊒ ptf(p_0.p_1). − (1) Let S_1 = {p_0.p_2 | p_2 ∈ S_2}. − (2) Now, applying Lemma 4 to (1), with p_0.p_1 in place of p_1, S_1 in place of S_2, and p_3 in place of p_0, we can infer ⊔_{p' ∈ S_1} ptf(p'.p_3) ⊒ ptf(p_0.p_1.p_3). − (3) Rewriting (3) in terms of S_2 using the definition of S_1, we obtain ⊔_{p_2 ∈ S_2} ptf(p_0.p_2.p_3) ⊒ ptf(p_0.p_1.p_3). Hence proved.

Lemma 7. Let p be a path from v_i to v_j that is d-covered by a set of paths S, and for each p' ∈ S, let S_{p'} be a set of paths that d-covers p'. Then the set S_1 = ∪_{p' ∈ S} S_{p'} d-covers p.
Proof: Since S d-covers p, we have the following facts: 1. all paths in S start at v_i and end at v_j; 2. for all paths p' ∈ S, demand(p', d) ≤ demand(p, d); 3. ⊔_{p' ∈ S} ptf(p') ⊒ ptf(p). Similarly, we have the following facts for any p' ∈ S and the set S_{p'} that d-covers p'.
1. as p' starts at v_i and ends at v_j, all paths in S_{p'} start at v_i and end at v_j; 2. for all paths p'' ∈ S_{p'}, demand(p'', d) ≤ demand(p', d);

3. ⊔_{p'' ∈ S_{p'}} ptf(p'') ⊒ ptf(p').
From the facts above and the definition of S_1, it can be directly inferred that every path in S_1 begins at v_i and ends at v_j. − (1)
Consider any path p_1 ∈ S_1. By the definition of S_1, p_1 ∈ S_{p'} for some p' ∈ S, and hence demand(p_1, d) ≤ demand(p', d). − (2) We are given that demand(p', d) ≤ demand(p, d). Therefore from (2), we can infer that demand(p_1, d) ≤ demand(p, d). − (3) Now we prove that the join of the transfer functions of the paths in S_1 dominates the path transfer function of p. Because S d-covers p, we have ⊔_{p' ∈ S} ptf(p') ⊒ ptf(p). Say S = {p'_1, p'_2, ..., p'_n}; then by expanding the above equation we get ptf(p'_1) ⊔ ... ⊔ ptf(p'_n) ⊒ ptf(p). − (4) We are given for all p' ∈ S that ⊔_{p_1 ∈ S_{p'}} ptf(p_1) ⊒ ptf(p'). By the property of the join operation we have (⊔_{p_1 ∈ S_{p'_1}} ptf(p_1)) ⊔ ... ⊔ (⊔_{p_1 ∈ S_{p'_n}} ptf(p_1)) ⊒ ptf(p'_1) ⊔ ... ⊔ ptf(p'_n), where S_{p'_1}, ..., S_{p'_n} are the sets d-covering the paths p'_1, ..., p'_n respectively. − (5) From (4) and (5), it can be seen that (⊔_{p_1 ∈ S_{p'_1}} ptf(p_1)) ⊔ ... ⊔ (⊔_{p_1 ∈ S_{p'_n}} ptf(p_1)) ⊒ ptf(p). − (6) As S_1 = S_{p'_1} ∪ S_{p'_2} ∪ ... ∪ S_{p'_n}, (6) can be rewritten as ⊔_{p_1 ∈ S_1} ptf(p_1) ⊒ ptf(p). − (7) From (1), (3) and (7), S_1 d-covers p. Hence proved.

Our algorithm can be seen as generating paths iteratively, and storing each generated path if it is not covered by other paths. A path is generated at every visit to Lines 11, 17, and 21 of routine ComputeJOFP. For a given path p, we say that it is generated by our algorithm if it is generated during any visit to any of the lines mentioned above. Over the course of a run of the algorithm, note that any path p that is generated necessarily ends at the node target, although it could begin at any node. Also note that when any path p is generated, if p begins at a node v_i, then p is stored in sPaths(v_i) unless sPaths(v_i) already stores a previously generated set of paths that cover p.
Lemma 8. If p is a path from a node v i in the VCFG of any procedure to the target node, and if the algorithm generates a set of paths S that cover p, then when ComputeJOFP terminates there is guaranteed to be a set of paths in sPaths(v i ) that cover p.
Proof: Note that after any path p' ∈ S is generated, the algorithm invokes the routine Covered(p'). The two following outcomes can result from this invocation.
1. The routine Covered returns false. In this case p' is added to sPaths(v_i); that is, p' is retained. 2. The routine Covered returns true. In this case, p' is not added to sPaths(v_i), as there exists a set of paths Cover(p') ⊆ sPaths(v_i) such that Cover(p') covers p'.
We now prove that irrespective of the outcome above of Covered, there exists a set of paths in sPaths(v i ) that cover p.
Let S_1 ⊆ S be the set of paths of S that were retained (outcome 1), and let S_2 = ∪_{p' ∈ S − S_1} Cover(p') be the union of the covering sets of the paths of S that were not retained (outcome 2). − (1) Since S_1 ⊆ S and S covers p, for all paths p' ∈ S_1, demand(p') ≤ demand(p). − (2) Similarly, as S covers p, and S_2 contains paths from sets that cover paths p' ∈ S, by transitivity, for all paths p'' ∈ S_2, demand(p'') ≤ demand(p). − (3) From (2) and (3), we can infer that for all paths p' ∈ S_1 ∪ S_2, demand(p') ≤ demand(p). − (4) Now we relate the path transfer functions of p and of the paths in sPaths(v_i). We are given ⊔_{p' ∈ S} ptf(p') ⊒ ptf(p). Splitting the above using the definition of S_1, and expanding the set S − S_1 = {p'_1, ..., p'_n}, we can write (⊔_{p' ∈ S_1} ptf(p')) ⊔ ptf(p'_1) ⊔ ... ⊔ ptf(p'_n) ⊒ ptf(p). − (5) From the two outcomes of Covered, we know that S_1 ⊆ sPaths(v_i). Also, for each p' ∈ S − S_1 there exists a set Cover(p') ⊆ sPaths(v_i) such that Cover(p') covers p'.
Thus, replacing the path transfer function of each p' ∈ S − S_1 in (5) by ⊔_{p_1 ∈ Cover(p')} ptf(p_1), and using the property that if a ⊒ b then c ⊔ a ⊒ c ⊔ b, we get (⊔_{p' ∈ S_1} ptf(p')) ⊔ (⊔_{p_1 ∈ Cover(p'_1)} ptf(p_1)) ⊔ ... ⊔ (⊔_{p_1 ∈ Cover(p'_n)} ptf(p_1)) ⊒ ptf(p). − (6) Let S' = S_1 ∪ Cover(p'_1) ∪ ... ∪ Cover(p'_n). Rewriting (6) in terms of S', we get ⊔_{p' ∈ S'} ptf(p') ⊒ ptf(p). − (7) From the definition of S' we know that S' = S_1 ∪ S_2 and S' ⊆ sPaths(v_i). Therefore, from (1), (4) and (7), we can infer that the set S' ⊆ sPaths(v_i) covers p. Hence proved.
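The generate-and-cover discipline that Lemma 8 reasons about can be sketched as a generic worklist skeleton. This is an illustration of ours, not the paper's Algorithm 1: stored items are never extended once a generated item is found to be covered, and termination rests on the covering/WQO argument of the termination theorem.

```python
def saturate(initial, successors, covered):
    """Store a generated item only if it is not covered by already-stored
    items; extend only stored items."""
    stored, worklist = [], list(initial)
    while worklist:
        p = worklist.pop()
        if covered(p, stored):
            continue            # p adds nothing: drop it, do not extend it
        stored.append(p)
        worklist.extend(successors(p))
    return stored

# Toy instance: "paths" are integers, extension adds 1 up to a bound, and
# coverage is plain membership.
result = saturate([0], lambda n: [n + 1] if n < 3 else [], lambda p, s: p in s)
print(result)   # [0, 1, 2, 3]
```

In the actual algorithm, `covered` is the Covered routine (demand comparison plus domination of transfer-function joins), and `successors` extends a stored path backwards by an edge or a summarized call.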
Similar to the generation of paths discussed above, the routine ComputeEndToEnd also generates paths iteratively in each invocation, and stores a generated path if it is not covered by other paths. The routine ComputeEndToEnd is invoked with a procedure F and a vector d (which is the demand of a path). In each invocation, the routine ComputeEndToEnd generates an interprocedurally valid and complete path, using path templates, at every visit to Line 9. For a given path p, we say that it is generated by ComputeEndToEnd if it is generated during any visit to Line 9. Any path p generated by ComputeEndToEnd begins at the entry node of the VCFG of some procedure F_i and ends at the designated exit node of F_i. If p is generated and it begins at the entry node of F_i, then it is stored in sIVCPaths(F_i, d) unless sIVCPaths(F_i, d) already stores a previously generated set of paths that d-cover p.
Lemma 9. Let d ∈ N r be a given vector such that ComputeEndToEnd is invoked with procedure F i ∈ Funcs and d as arguments. If p is an interprocedurally valid and complete path from the entry node en F of any procedure F ∈ Funcs to the exit node ex F , and if ComputeEndToEnd generates (at Line 11) a set of paths S that d-cover p for the given vector d, then when the above-mentioned invocation to ComputeEndToEnd terminates there is guaranteed to be a set of paths in sIVCPaths(F, d) that d-cover p.
Proof : The proof of this lemma is similar to the proof of Lemma 8. In this case also, for each p ∈ S, either p is retained in sIVCPaths (F, d), or a set of d-covering paths is already present. Therefore, this lemma holds.
Lemma 10. Let p 0 , p 1 , p 2 be paths where p 1 and p 2 end at v i and p 0 begins at v i . If demand (p 1 , demand (p 0 )) ≤ demand (p 2 , demand (p 0 )), then demand (p 1 .p 0 ) ≤ demand (p 2 .p 0 ) Proof: To prove the lemma, we first prove an intermediate result, i.e., for any paths p, q, such that end node of p is the same as the start node of q, demand (p.q, d) = demand (p, demand (q, d)). We prove this using induction on the length of p.
The base case is when p is a single edge e with queuing vector w_e. Then by Definition 2, demand(e.q, d) = µ(demand(q, d) − w_e). − (1) Again by Definition 2, demand(e, demand(q, d)) = µ(demand(q, d) − w_e). − (2) From (1) and (2), demand(p.q, d) = demand(p, demand(q, d)) in the base case.
We now prove the inductive case. Let p = e.p', where p is of length n + 1 and p' is of length n. By the induction hypothesis we have demand(p'.q, d) = demand(p', demand(q, d)).
By Definition 2, we have demand(p.q, d) = µ(demand(p'.q, d) − w_e). Replacing the value in the RHS using the induction hypothesis, demand(p.q, d) = µ(demand(p', demand(q, d)) − w_e). − (3) Also, since p = e.p', using the definition of demand, we have demand(p, demand(q, d)) = µ(demand(p', demand(q, d)) − w_e). − (4) From (3) and (4), we can infer that demand(p.q, d) = demand(p, demand(q, d)) for the inductive case as well. Therefore we have proved that for any paths p and q, demand(p.q, d) = demand(p, demand(q, d)). Finally, recalling that demand(p') abbreviates demand(p', 0), the intermediate result gives demand(p_1.p_0) = demand(p_1, demand(p_0)) and demand(p_2.p_0) = demand(p_2, demand(p_0)); the given hypothesis demand(p_1, demand(p_0)) ≤ demand(p_2, demand(p_0)) therefore yields demand(p_1.p_0) ≤ demand(p_2.p_0). Hence proved.
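The intermediate result of Lemma 10's proof, demand(p.q, d) = demand(p, demand(q, d)), can be checked under our toy encoding of paths as lists of queuing vectors (an illustration only):

```python
import random

def mu(v):
    return tuple(max(0, z) for z in v)

def demand(path, d):
    res = tuple(d)
    for w in reversed(path):
        res = mu(tuple(a - b for a, b in zip(res, w)))
    return res

random.seed(2)
for _ in range(2000):
    d = tuple(random.randint(0, 3) for _ in range(2))
    p = [tuple(random.randint(-2, 2) for _ in range(2)) for _ in range(3)]
    q = [tuple(random.randint(-2, 2) for _ in range(2)) for _ in range(3)]
    # Lemma 10's intermediate result: demand composes over concatenation.
    assert demand(p + q, d) == demand(p, demand(q, d))
print("demand(p.q, d) = demand(p, demand(q, d)) on all sampled instances")
```

Under this encoding the property holds by construction, since demand is a right-to-left fold over the edges; the lemma's proof establishes the same fact directly from Definition 2.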
Lemma 11. Let v be any node in the VCFG of any procedure, such that for a given node target the algorithm computes the set sPaths(v) for v on termination. Any path p ∈ sPaths(v) is an interprocedurally valid path.
Proof: According to the definition of an interprocedurally valid path, a path p is not interprocedurally valid if, at least once during the traversal of the path, the symbol popped from the top of the stack on encountering a return-site node does not correspond to the matching call node. This scenario can result only if the path has a return edge that does not have a 'matching' call edge. We prove by induction on the length of the path p that whenever a path is added to sPaths(v), it is interprocedurally valid (i.e., it has no unbalanced return edges).
For the base case, the length of p is 1. A path of length 1 is interprocedurally invalid only if it requires an empty stack to be popped during its traversal. Paths of length 1 are added only at Line 4 in ComputeJOFP, and all the added paths (rather, edges) are intra-procedural edges. Therefore, p does not contain an unbalanced return edge that would cause a pop from an empty stack.
Hence all paths of length 1 added to sPaths(v) are interprocedurally valid. Moving on to the inductive case, let p be a path of length n + 1, and let the inductive hypothesis hold for all paths of length up to n. There are three points in the algorithm where paths are added to sPaths(v): Lines 13, 19, and 23 in ComputeJOFP. Therefore, based on the location in the algorithm where p was added, we have the following cases.
- Case 1: p = e.p', and p was added to sPaths(v) at Line 23 in ComputeJOFP. In this case, e is an intra-procedural edge, and the path p' is an interprocedurally valid path by the induction hypothesis. Therefore, the concatenation of e and p' does not introduce any unbalanced return edges, and hence p is also an interprocedurally valid path. − (1)
- Case 2: p = c.p', and p was added to sPaths(v) at Line 19 in InterProcExt. In this case, c is a call edge. Traversal of c does not pop the stack, and from the hypothesis we know that the traversal of p' is also interprocedurally valid. Therefore, p is interprocedurally valid in this case. − (2)
- Case 3: p = p_1.p_2, where p was added to sPaths(v) at Line 13 in InterProcExt, p_1 begins at v and ends at a return-site node v_r, p_2 begins at v_r, and p_1 is an interprocedurally valid and complete path. Both p_1 and p_2 are of length ≤ n, with combined length n + 1. Therefore, the induction hypothesis holds for both p_1 and p_2. As p_1 is an IVC path, at the end of the traversal of p_1 (i.e., at node v_r), the stack will be empty. From the induction hypothesis we know that the traversal of p_2 from v_r is interprocedurally valid. Therefore, the path p = p_1.p_2 is also interprocedurally valid. − (3)
From (1), (2) and (3), it follows that p is an interprocedurally valid path. Hence proved.
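The notion of interprocedural validity used in Lemma 11 can be made concrete with a small stack simulation. The encoding of edge labels below is our own, for illustration: ('call', site) pushes, ('ret', site) must pop a matching site, and ('intra',) leaves the stack alone. Unmatched open calls are permitted (this is validity, not completeness).

```python
def is_interprocedurally_valid(path_labels):
    """Traverse the labeled path, simulating the call stack; return False on
    a pop from an empty stack or a mismatched call/return pair."""
    stack = []
    for lab in path_labels:
        kind = lab[0]
        if kind == 'call':
            stack.append(lab[1])
        elif kind == 'ret':
            if not stack or stack.pop() != lab[1]:
                return False
    return True

print(is_interprocedurally_valid(
    [('intra',), ('call', 'c1'), ('ret', 'c1')]))           # True
print(is_interprocedurally_valid([('ret', 'c1')]))          # False: unbalanced return
print(is_interprocedurally_valid(
    [('call', 'c1'), ('intra',)]))                          # True: open call is allowed
```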
Intermediate Theorems
We first prove that the set of paths returned by the routine ComputeEndToEnd for a given demand d d-covers all IVC paths of any procedure F ∈ Funcs.
Theorem 3 (ComputeEndToEnd Cover). Let d ∈ N^r be a given vector. Let F_i be a procedure in Funcs, and let ComputeEndToEnd be invoked with arguments F_i and d. Let sIVCPaths(F_i, d) be the set of interprocedurally valid and complete (IVC) paths computed and returned by ComputeEndToEnd. Let F be any procedure in Funcs, and let p be any IVC path from the entry node of F to the exit node of F. Then there exists a set of paths Cover(p) ⊆ sIVCPaths(F, d) such that Cover(p) d-covers p.
Proof: We prove the theorem using induction on the depth of the path p. The depth of a path p is the maximum number of nested (i.e., non-sequential) calls made from within p.
The base case is when p is of depth 0. For all procedures F_1 ∈ Funcs and the given demand d, all the paths of depth 0 in F_1 are added to sIVCPaths(F_1, d) at Line 3 in ComputeEndToEnd. Hence, p ∈ sIVCPaths(F, d), and p d-covers itself. We now prove the inductive case. We assume that the hypothesis holds for all paths of depth up to n, i.e., for a given d, all paths of depth up to n are d-covered by their respective sIVCPaths sets.
Let p be an IVC path from en_F to ex_F of depth n + 1. Any IVC path of F of depth n + 1 has the following structure: it starts at the entry node en_F, reaches a call-site, traverses an IVC path of depth up to n, returns to F, reaches another call-site, traverses another IVC path of depth up to n, and so on, until it reaches the exit node ex_F.
For simplicity of discussion, for now we assume that p contains just two outermost-level calls. Figure 5 shows a schematic structure of such a path. At the end of the proof, we discuss how to extend the argument to paths with an arbitrary number of calls.
In Figure 5, as p is of depth n + 1, p_1 and p_2 are paths of depth n, and we have from the hypothesis that p_2 is d-covered by Cover(p_2) ⊆ sIVCPaths(F_2, d), and p_1 is d-covered by Cover(p_1) ⊆ sIVCPaths(F_1, d). In the figure, the paths p_1, p_2, p_a, p_b and p_c are all in non-main procedures and hence do not receive any messages. The edges c_1, c_2, r_1, and r_2 are call and return edges.
We now obtain a set of paths that d-covers the suffix p_b.c_2.p_2.r_2.p_c of p. Let S_2 = {p_b.c_2.p'.r_2.p_c | p' ∈ Cover(p_2)}. By the hypothesis, demand(p', d) ≤ demand(p_2, d) for each p' ∈ Cover(p_2); therefore, according to Lemma 5, each path p_j ∈ S_2 is such that demand(p_j, d) ≤ demand(p_b.c_2.p_2.r_2.p_c, d).
Again by the hypothesis, and by Lemma 6, the join of the path transfer functions of the paths in S_2 dominates the path transfer function of p_b.c_2.p_2.r_2.p_c. Therefore, S_2 d-covers the path p_b.c_2.p_2.r_2.p_c.
− (1) Let S_12 = {p_a.c_1.p_1.r_1.p_i | p_i ∈ S_2}. By Lemma 1, we have for all p' ∈ S_12 that demand(p', d) ≤ demand(p, d). Applying Lemma 2, we can infer that the join of the path transfer functions of the paths in S_12 dominates ptf(p). Thus, S_12 d-covers p. Finally, replacing p_1 in the paths of S_12 by the paths of Cover(p_1), in the same manner as was done for p_2 above (using Lemmas 5 and 6, and the transitivity of d-covering from Lemma 7), yields a d-covering set for p all of whose constituent pieces come from the sIVCPaths sets.
For paths with an arbitrary number of calls, the above reasoning can be repeated the required number of times: for every call made in the path p, the proof finds the d-covering paths for the path suffix starting from the end of the current call to the exit node, and extends them backwards with the d-covering paths of the IVC path of the current call, as done above. Therefore, an inductive proof, with induction on the number of calls in a path, establishes the result using the same arguments. Hence proved.
We use the concept of segment in the subsequent proofs, which can be defined as follows.
Definition 6 (Segment). Any interprocedurally valid path between any two nodes, where each node can be a node in the VCFG of any procedure in the system, can be considered as a sequence of segments. Each segment is either a single intra-procedural edge, a call edge, or a path consisting of: a call edge from a call-site v_i to a procedure F, followed by an IVC path from the entry of F to the exit of F, followed by the return edge from the exit of F to the return-site node corresponding to v_i. Similar to the proof of Lemma 11, it is easy to see that this definition is valid, that is, the three kinds of segments are sufficient. For instance, a return edge cannot by itself be a segment, as that would allow a single return edge to be deemed an interprocedurally valid path, which is not correct.
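Definition 6 can be realized as a decomposition procedure. The sketch below (our own illustration, reusing the labeled-edge encoding assumed earlier: ('intra',), ('call', site), ('ret', site)) splits a valid path into the three kinds of segments.

```python
def segments(labels):
    """Split a labeled, interprocedurally valid path into segments: single
    intra-procedural edges, bare call edges (never returned from), or
    balanced call...return stretches enclosing an IVC body."""
    segs, i, n = [], 0, len(labels)
    while i < n:
        lab = labels[i]
        if lab[0] != 'call':
            segs.append([lab]); i += 1; continue
        # Scan forward for the return matching this call.
        depth, j = 1, i + 1
        while j < n and depth:
            if labels[j][0] == 'call': depth += 1
            elif labels[j][0] == 'ret': depth -= 1
            j += 1
        if depth == 0:
            segs.append(labels[i:j])     # balanced call..return segment
            i = j
        else:
            segs.append([lab]); i += 1   # call edge with no matching return
    return segs

path = [('intra',), ('call', 'c1'), ('intra',), ('ret', 'c1'),
        ('call', 'c2'), ('intra',)]
print(len(segments(path)))   # 4
```

On the example, the four segments are the leading intra-procedural edge, the balanced c1 call-return stretch, the bare c2 call edge, and the final intra-procedural edge.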
The next theorem ensures that the paths that do not go through main and end at the node target in the VCFG of any F ∈ Funcs are covered by the paths generated and stored by the algorithm.
Theorem 4 (Covering in non-main procedures). Let p be any interprocedurally valid path from a node v_1 in the VCFG of some procedure F_1 to the target node, where the target node is in some procedure F (i.e., target is not in main). When the algorithm terminates, there exists a set of paths Cover(p) ⊆ sPaths(v_1) such that Cover(p) covers p.
Proof: We prove the theorem using induction on the number of segments in the given path p, where segments are as defined in Definition 6. Without loss of generality, we assume that the target node cannot be a return-site node or an entry node of any procedure. In order to compute the JOFP for these nodes, one can always introduce dummy successor nodes from these nodes.
The base case is when the path has only one segment. Due to the assumptions on target stated above, p must consist of a single intra-procedural edge v_i → target (the other kinds of segments end at either a return-site node or an entry node). All such edges are added to sPaths(v_i) at Line 4 of routine ComputeJOFP. Therefore, p ∈ sPaths(v_i), and as each path covers itself by definition, the base case holds. Now we prove the inductive case. From the inductive hypothesis we have that all paths p_1 having n segments are covered.
Based on the types of segments, the inductive case has three cases. The first case is when p is of the form p = (v_i → v_j).p_1, where v_i → v_j is an intra-procedural edge, p_1 is a path from v_j to target having n segments, and p_1 is covered by Cover(p_1) ⊆ sPaths(v_j).
Consider the following set of paths: S_1 = {(v_i → v_j).p' | p' ∈ Cover(p_1)}. By mapping d in Lemma 1 to the zero vector, (v_i → v_j) to p_0, and each path p' ∈ Cover(p_1) to p_2, we have that every path p'' ∈ S_1 is such that demand(p'') ≤ demand(p). − (1) By the inductive hypothesis, the join of the path transfer functions of the paths in Cover(p_1) dominates the path transfer function of p_1. Therefore, by Lemma 2, the join of the path transfer functions of the paths in S_1 dominates the path transfer function of p. − (2) From (1) and (2) it follows that S_1 covers p.
Since every path in Cover(p_1) is present in sPaths(v_j) (inductive hypothesis), the algorithm would have generated every path in S_1 (at Lines 21 and 22). Therefore, by applying Lemma 8 and using the fact that S_1 covers p, we can infer that a set of paths that covers p is present in sPaths(v_i) when ComputeJOFP terminates.
Case 2 is when the first segment of p is a call-return path as described in Definition 6. Let p = p 1 .p 2 , where p 1 is the call-return path, and p 2 is the remainder of p. Let v j be the end node of p 1 (i.e., a return-site node) and the start node of p 2 as well. Let v i be the start node of p 1 (i.e., the call-site node corresponding to v j ), and let F be the procedure that is called from v i .
By the inductive hypothesis, there exists a set of paths Cover (p 2 ) in sPaths(v j ) that cover p 2 .
− (3) Consider the following set of paths: S_1 = {p_1.p' | p' ∈ Cover(p_2)}. Due to the assumptions on the VCFG, the main procedure is not called by any other procedure. Since v_i and target are both not in main, the paths considered in the theorem do not go through the main procedure; as a result, the demands of all these paths are 0. Therefore, each path p_j in S_1 is such that demand(p_j) ≤ demand(p). Again by the inductive hypothesis, and by Lemma 2, the join of the path transfer functions of the paths in S_1 dominates the path transfer function of p.
Therefore, S_1 covers p. − (4) Let c be the call edge from v_i to en_F and r be the corresponding return edge from ex_F to v_j. Consider any path p_i in Cover(p_2) and the following set: S_{p_i} = {c.p_j.r.p_i | p_j ∈ ComputeEndToEnd(F, 0)}. From Theorem 3, we know that the set of paths sIVCPaths(F, 0) d-covers the IVC body p_F of the call-return path p_1 (i.e., p_1 = c.p_F.r), where d = 0. Therefore, using Lemma 5, by mapping c to p_0, r.p_i to p_3, p_F to p_1 of the lemma, and each path p_j ∈ ComputeEndToEnd(F, 0) to p_2 of the lemma, it follows that for every path p_k ∈ S_{p_i}, demand(p_k) ≤ demand(p_1.p_i). − (5) Also, by Lemma 6, mapping ComputeEndToEnd(F, 0) to S_2, it follows that the join of the path transfer functions of the paths in S_{p_i} dominates the path transfer function of p_1.p_i. − (6) From (5) and (6), it follows that the set S_{p_i} covers the path p_1.p_i. − (7) The path p_i was generated by the algorithm (as per the definition of Cover(p_2)). Thus, from Lines 10-11 in the pseudocode of ComputeJOFP, it is clear that the algorithm generates all paths in S_{p_i}. − (8)
Consider the following set: S_2 = ∪_{p_i ∈ Cover(p_2)} S_{p_i}. From the definition of S_1, from Statements (4) and (7), from Lemma 7, and from the definition of S_2, it follows that S_2 covers p. − (9) From Statement (8), it is clear that the algorithm generates every path in S_2. From this, from Statement (9), and from Lemma 8, we infer that when the algorithm terminates, sPaths(v_i) will contain a set of paths that cover p.
The third case is when p is of the form p = e_c.p_1, where e_c is a call-site-to-entry-node edge for a procedure F_2, and p_1 is covered by Cover(p_1) ⊆ sPaths(en_{F_2}). The proof in this case is again similar to the first case.
Next we need to prove that the algorithm generates and stores paths that cover all the paths in the system. In order to do this, we first prove the following important lemma.
Lemma 12. Let p_j be any interprocedurally valid path from the entry of a procedure F_i to the target node v that is inside some procedure in Funcs, such that the algorithm has added p_j to sPaths(en_{F_i}). If p_1 is any interprocedurally valid path such that p_1 begins at some vertex v_i in main and ends at a call-site node v_k in main from which there is a call edge c to en_{F_i}, then when the algorithm terminates there exists a set of paths in sPaths(v_i) that covers the path p = p_1.c.p_j.
Proof: By Lines 14-16 in the procedure ComputeJOFP, and from our assumption that p_j is in sPaths(en_{F_i}), it follows that sPaths(v_k) will contain a set of paths, denoted Cover(c.p_j), that cover the path c.p_j.
− (1) The proof of this lemma is by induction on the number of segments, as defined in Definition 6, in p_1. As p_1 is an IVC path in main (if p_1 were not IVC, then the edge c would not go from main to en_{F_i}, and hence would not satisfy the requirements of the lemma), there can be only two kinds of segments in p_1: an intra-procedural edge in main, or an IVC call-return path.
The base case is when p_1 has a single segment. This segment has to be of the form v_i → v_k, where v_i → v_k is an edge in main (the other kind of segment ends at a return-site node, not at a call-site node). Consider the following set of paths: S_1 = {(v_i → v_k).p' | p' ∈ Cover(c.p_j)}. Since there are no receive operations inside the procedures, demand(c.p_j) = 0. Therefore, from Statement (1), since every path in Cover(c.p_j) covers c.p_j, every path p_k in Cover(c.p_j) is such that demand(p_k) = 0. Therefore, by Lemma 1, for every path p_l ∈ S_1, demand(p_l) ≤ demand(p).
− (2) From Statement (1), since Cover (c.p j ) covers c.p j , the join of the path transfer procedures of the paths in Cover (c.p j ) dominates the path transfer procedure of c.p j . Therefore, by Lemma 2, the join of the path transfer procedures of the paths in S 1 dominates the path transfer procedure of p.
− (4) Since every path in Cover (c.p j ) is present in sPaths(v k ) (Statement (1) above), the algorithm would have generated every path in S 1 , and would have checked whether to add this path to sPaths(v i ) or whether this path is already covered by paths in sPaths(v i ) (Line 21-22 in the pseudocode for Compute-JOFP). This, in conjunction with Statement (4) above and Lemma 8 lets us infer that a set of paths that covers p is present in sPaths(v i ) when the algorithm terminates.
We now move on to the inductive case. We assume that the lemma is true whenever the path ending at v_k has at most n segments. Let p_1 consist of n + 1 segments. Based on the type of the first segment of p_1, the argument proceeds under two cases.
The first case is that p 1 is of the form (v i → v j ).p 2 , where v i → v j is an edge in main , and v j is the first vertex in the suffix path p 2 .
Since p 2 has at most n segments, the inductive hypothesis is applicable on the path p 2 .c.p j . The remainder of the argument is identical to the same inductive case in the proof of Theorem 4.
The second case is that p_1 is of the form p_3.p_2, where p_3 is an IVC call-return path (p_3 is the first segment of p_1), and p_2 is the remainder of p_1. Since p_2 has at most n segments, the inductive hypothesis is applicable to the path p_2.c.p_j.
Let v j be the end node of p 3 (i.e., a return-site node) and start node of p 2 as well. Let v i be the start node of p 3 (i.e., the call-site node corresponding to v j ), and let F be the procedure that is called from v i .
Let p′ = p_2.c.p_j. By the inductive hypothesis, there exists a set of paths Cover(p′) in sPaths(v_j) that covers p′. − (5)

Consider the following set of paths: S_1 = {p_3.p_i | p_i ∈ Cover(p′)}. By the inductive hypothesis, demand(p_i) ≤ demand(p′) for each p_i in Cover(p′). Therefore, according to Lemma 1 (taking d in that lemma to be the zero vector), each path p_k in S_1 is such that demand(p_k) ≤ demand(p). Again by the inductive hypothesis, and by Lemma 2, the join of the path transfer procedures of the paths in S_1 dominates the path transfer procedure of p.
Therefore, S_1 covers p. − (6)

Consider any path p_i in Cover(p′). Let c_1 be the call edge from v_i to en_F, and let r_1 be the corresponding return edge from ex_F to v_j. Consider the following sets: let T_pi ⊆ sPaths(en_F) be the set of paths given by Theorem 3 for the fragment of p_3 from en_F to ex_F, with d = demand(p_i), and let S_pi = {c_1.t.r_1.p_i | t ∈ T_pi}. From Theorem 3, it follows that the set T_pi d-covers the path fragment from en_F to ex_F in p_3, where d = demand(p_i).

Therefore, from Theorem 3, from Lemma 10 (taking p_0 in that lemma to be p_i), and from Lemma 5, it follows that for every path p_k ∈ S_pi, demand(p_k) ≤ demand(p_3.p_i). − (7)

From Theorem 3, it also follows that the join of the path transfer procedures of the paths in T_pi dominates the path transfer procedure of the path fragment from en_F to ex_F in p_3. Therefore, by Lemma 6, the join of the path transfer procedures of the paths in S_pi dominates the path transfer procedure of p_3.p_i. − (8)

From (7) and (8), it follows that the set S_pi covers the path p_3.p_i. − (9)

The path p_i was generated by the algorithm (as per the definition of Cover(p′)). Thus, from Lines 10-11 in the pseudocode of ComputeJOFP, it is clear that the algorithm generates all paths in S_pi. − (10)

Consider the following set: S_2 = ∪_{p_i ∈ Cover(p′)} S_pi. From the definition of S_1, from Statements (6) and (9), from Lemma 7, and from the definition of S_2, it follows that S_2 covers p. − (11)

From Statement (10), it is clear that the algorithm generates every path in S_2. From this, from Statement (11), and from Lemma 8, we infer that when the algorithm terminates, sPaths(v_i) will contain a set of paths that covers p.
Theorem 5. If p is an interprocedurally valid path from a node v_i in main to a target node v such that v is in any procedure (including main), then, when the algorithm terminates, there exists a set of paths Cover(p) ⊆ sPaths(v_i) such that Cover(p) covers p.
Proof: Based on the structure of p, there can be two possible cases. First is when the target v is in main, and therefore the structure of p is that it starts from v i in main, goes via vertices in main, enters procedures whose calls it encounters in main and returns to main, again goes via vertices in main, and so on, and then ends at v in main. In this case, the proof is similar to the proof of Lemma 12, by taking the interprocedurally valid path suffix c.p j to be empty, and the induction is on the number of segments in p.
The other case is when p starts from vertex v_i in main, goes via vertices in main, enters procedures whose calls it encounters in main and returns to main, again goes via vertices in main, and so on, until it makes a final entry into a procedure F_i such that after this entry it eventually reaches the target vertex v (which may be in F_i or in a transitive callee of F_i) without returning from F_i. Therefore, p is of the form p_1.c.p_2, where p_2 is the suffix of p from en_Fi to v (without returning from F_i), p_1 is a path from v_i to a call-site node in main, and c is a call-edge from this call-site node to en_Fi.
According to Theorem 4, taking v_1 to be en_Fi, when the algorithm terminates, there exists a set of interprocedurally valid paths Cover(p_2) in sPaths(en_Fi) such that all these paths are from en_Fi to v, and the join of the path transfer procedures of these paths dominates ptf(p_2). − (1)

Consider the following set of paths: S_1 = {p_1.c.p_j | p_j ∈ Cover(p_2)}. From Statement (1), and from Lemmas 1 and 2, it follows that S_1 covers p. − (2)

For any path p_j in Cover(p_2), according to Lemma 12, when the algorithm terminates, a set of paths that covers p_1.c.p_j exists in sPaths(v_i). Therefore, it follows from the definition of S_1 that a set of paths exists in sPaths(v_i) that covers S_1.

The statement above, together with Statement (2) and Lemma 7, implies that a set of paths exists in sPaths(v_i) that covers p.
Proof of Theorem 2 Now we are ready to prove the soundness of the algorithm. As proved in Theorem 5, any interprocedurally valid path p from the start node to the target node is covered by a set Cover(p) ⊆ sPaths(start). From Lemma 11, we know that all paths in sPaths(start) are interprocedurally valid. As the set of all feasible paths is a subset of the set of all interprocedurally valid paths, Theorem 5 holds for all feasible paths as well.
All feasible paths from start to v have a demand of 0 (otherwise they would have more receives than sends in some prefix of the path). Therefore, by the definition of covering, all paths in the set Cover(p) have a demand of 0.
Let S be the set of demand-0 paths in sPaths(start) over which the join is computed in Lines 25-26 of ComputeJOFP. Clearly, S ⊇ Cover(p); thus S covers p. From Lines 25-26 in ComputeJOFP we have d = ⊔_{p_i ∈ S} ptf(p_i)(d_0), where d is the value returned by the algorithm. As S covers p, we have ⊔_{p_i ∈ S} ptf(p_i) ⊒ ptf(p). Hence, d ⊒ ptf(p)(d_0), for any such p.
As d ⊒ ptf(p)(d_0) for any feasible p, and by the property (a ⊒ b ∧ a ⊒ c) ⇒ (a ⊒ b ⊔ c), for the set of all feasible paths P = {p_1, p_2, ...} that begin at start and end at v, we have d ⊒ ⊔_{p_i ∈ P} ptf(p_i)(d_0). By condensing the above inequation, we get d ⊒ ⊔_{p is a feasible, interprocedurally valid path from start to target} (ptf(p))(d_0). Hence proved.
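The demand of a path, which the proof above relies on, admits a simple backward computation over per-edge queuing vectors. The sketch below is one plausible reading (sends positive, receives negative), not the paper's formal definition; the function name is illustrative.

```python
def demand(path, d):
    """Backward computation of how many messages each counter must already
    hold before the path starts, given a demand vector d required at its end.
    `path` is a list of per-edge queuing vectors: positive entries model
    sends, negative entries model receives. This is a plausible reading of
    demand(p, d); consult the paper for the authoritative definition."""
    need = list(d)
    for w in reversed(path):
        # Before an edge with effect w, we need whatever is needed after it,
        # minus what the edge itself contributes (but never below zero).
        need = [max(0, n - wi) for n, wi in zip(need, w)]
    return tuple(need)
```

For instance, a lone receive has demand 1, a send followed by a receive has demand 0, while a receive followed by a send still has demand 1: demand only counts messages needed before the path begins.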
Complexity Analysis of the Backward DFAS Algorithm We present here the complexity derivation for the single-procedure case of the Backward DFAS algorithm. The derivation additionally assumes that the transfer functions are right-distributive; that is, throughout this section the composition operator is assumed to be distributive on the lattice of transfer functions. Let Q be the number of locations, K the number of counters, and h the height of the transfer-function lattice. Without loss of generality, we assume that each transition changes the value of any counter by at most 1.
Let π be a run from (p, u) to (q, v), written (p, u) π −→ (q, v). We write f_π to denote the transfer function obtained by composing the transfer functions associated with the transitions along the run. We say that a set P of runs covers π if for each ρ ∈ P, (p, u) ρ −→ (q, w) with v ≤ w, and f_π ≤ ⊔_{ρ∈P} f_ρ. Note that this notion of covering is defined over runs; as we shall see, it is related to the notion defined for paths earlier. We say that P strictly covers π if, in addition, for each ρ ∈ P the final configuration is identical to (q, v). We write f_P to denote the function ⊔_{ρ∈P} f_ρ. The following facts are easy to see.
Fact 1: If the set of runs P covers (resp. strictly covers) π, and for each ρ ∈ P the set P_ρ covers (resp. strictly covers) ρ, then ∪_{ρ∈P} P_ρ covers (resp. strictly covers) π.

Fact 3: If P_1 is a set of runs that strictly covers π_1 and P_2 is a set of runs that covers π_2, then P_1.P_2 covers π_1.π_2.
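To make the covering definition above concrete, here is a small executable model (all names illustrative, not from the paper): transfer functions are represented as relations over a tiny value set, so that relational composition models composition along a run, union models join, and inclusion implies the pointwise ordering ≤ used above.

```python
# Toy executable model of "a set of runs P covers a run pi".
# A run is modelled as (location, final_counter_vector, relation), where the
# relation stands in for the run's transfer function f_pi. This is an
# illustrative stand-in for the paper's abstract transfer-function lattice.

VALUES = {0, 1, 2}  # tiny value domain for the relation model

def compose(r1, r2):
    """Relational composition (apply r1 first, then r2)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def run_function(transitions):
    """Compose the per-transition relations along a run."""
    f = {(a, a) for a in VALUES}  # identity function
    for t in transitions:
        f = compose(f, t)
    return f

def covers(P, pi):
    """P covers pi: all runs are assumed to share pi's start configuration
    (not modelled here), each run in P ends at a componentwise-larger counter
    vector, and f_pi lies below the join (union) of the relations in P."""
    _, v_pi, f_pi = pi
    joined = set()
    for _, v_rho, f_rho in P:
        if any(x > y for x, y in zip(v_pi, v_rho)):
            return False
        joined |= f_rho
    return f_pi <= joined
```

For instance, a run with relation {(0, 1)} ending at counters (1,) is covered by a single run ending at (2,) whose relation contains (0, 1), but not by one ending at (0,).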
We write ℓ(P) for the maximum of the lengths of the runs in a finite set P. Let π be a run from (q, u) to (t, v) for some v. Then we define the effective length ℓ_e(π) to be Min{ℓ(P) | P covers π}. First we consider the case when all configurations along the run are bounded by a value B, i.e., the value of each counter in each configuration along the run (including the initial and final configurations) is at most B. We say that such a run is B-bounded.
Lemma 13. For a B-bounded run (p, u) π −→ (q, v), there is a finite set of runs P, with ℓ(P) ≤ Q·(h+1)·B^K, that strictly covers π, where K is the number of counters.
Proof. We prove this by induction on the length of π. If the length of π is at most Q·(h+1)·B^K then we may take P = {π}. Otherwise, since the number of B-bounded configurations is at most Q·B^K, some configuration occurs at least h+2 times along π, and we may break up the run as π = π_0.π_1. ... .π_h.π′, where each π_i is non-empty and ends at this repeated configuration. Now consider the runs ξ_0 = π_0, ξ_1 = π_0π_1, ..., ξ_h = π_0π_1...π_h. The increasing sequence f_{ξ_0} ≤ f_{ξ_0} ⊔ f_{ξ_1} ≤ ... ≤ f_{ξ_0} ⊔ ... ⊔ f_{ξ_h} cannot be strictly increasing at every step in a lattice of height h; hence there is an i such that f_{ξ_{i+1}} ≤ f_{ξ_0} ⊔ f_{ξ_1} ⊔ ... ⊔ f_{ξ_i}. Hence, by distributivity of composition, f_π ≤ ⊔_{0≤j≤i} f_{ξ_j.π_{i+2}π_{i+3}...π_h.π′}. Consider the set of runs P = {π_0π_{i+2}π_{i+3}...π_hπ′, π_0π_1π_{i+2}π_{i+3}...π_hπ′, ..., π_0π_1...π_iπ_{i+2}π_{i+3}...π_hπ′}. From the above calculation, P strictly covers π, and further every run in P is strictly shorter than π. By the induction hypothesis, each ρ in P is strictly covered by a set of runs P_ρ containing only runs of length at most Q·(h+1)·B^K. Thus, by Fact 1, ∪_{ρ∈P} P_ρ is the desired strict covering set for π.

Following [5], for any subset I ⊆ {1, ..., K} of the counters, we define u_I to be the vector which agrees with u(i) if i ∈ I and is 0 otherwise. For such an I and a system G, we define G_I to be the system obtained from G where each transition is modified to leave all counters outside I untouched and to operate on the counters in I as before. For any run π from (p, u) to (q, v) in G there is a corresponding run π_I from (p, u_I) to (q, v_I) constituting a valid run in G_I.
Let us fix a system G and a target location t. For any configuration s (i.e., s of the form (p, u)), lattice function f, and I ⊆ {1, ..., K}, we define dist(I, f, s) as follows: dist(I, f, s) = Min({0} ∪ {ℓ_e(π) | ∃v. s π −→ (t, v) in G_I and f ≤ f_π}). For any 0 ≤ k ≤ K, we set g(k) to be sup{dist(I, f, s) | |I| = k}. Thus, the function g(k) provides an upper bound on the length of runs that suffice to cover any run from any configuration s to a configuration with location t in any system G_I with |I| = k. We now derive bounds on g(k).

Lemma 14. g(0) ≤ Q·(h+1), and g(k) ≤ Q·(h+1)·(1 + g(k−1))^k + 1 + g(k−1) if k > 0. In particular, g(k) is finite for all 0 ≤ k ≤ K.
Proof. The proof follows an argument in the style of Rackoff ([47], [5]) and proceeds by induction on k.
For k = 0 the result follows directly from Lemma 13. We examine the inductive case next. Let |I| = k. Suppose, for s and f, there is a run π from s to (t, v) in G_I with f ≤ f_π. We consider two cases.
Let L = (Q·(h+1)·2)^((3K)!+1). From Lemmas 14 and 15, we know that any run π from any configuration s = (p, u) to one with control location t can be covered by runs of length at most L. This allows us to restrict our analysis entirely to configurations bounded by L.
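For concreteness, the bound L = (Q·(h+1)·2)^((3K)!+1) stated above can be evaluated mechanically (`run_length_bound` is a hypothetical helper name); even tiny parameters show how fast it grows.

```python
from math import factorial

def run_length_bound(Q, h, K):
    """L = (Q*(h+1)*2) ** ((3K)! + 1), the covering-length bound stated
    above, for Q locations, lattice height h, and K counters."""
    return (Q * (h + 1) * 2) ** (factorial(3 * K) + 1)
```

Already for Q = 2, h = 1, K = 1 the bound is 8^7, which is why the bound is of theoretical rather than practical interest.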
Lemma 16. Suppose s = (p, u) π −→ (t, v) is a run of length at most m. Let u′ be such that u′(i) = Min(u(i), m) for each i. Then there is a run (p, u′) π′ −→ (t, v′) where π and π′ follow the same sequence of transitions; in particular, f_π = f_{π′}.
Proof. Follows simply from the fact that the length of the run is bounded by m and each transition may reduce the value of a counter by at most 1.

Now, if s ≤ s′ and s π −→ (t, v), then clearly there is a run s′ π′ −→ (t, w) following the same sequence of transitions (so that f_π = f_{π′}). This ensures that the first value is below (under ≤) the second. If s π −→ (t, v), then by Lemma 15 there is a covering set P for π with ℓ(P) ≤ L. We then apply Lemma 16 to each element of P to conclude that the second value is below the third under ≤, completing the proof of the Lemma.
Consider Algorithm 1 and assume that the working set is maintained as a queue. Then paths are extended in increasing order of length. We think of the algorithm as proceeding in rounds: round i pertains to the segment when paths of length i are extended to paths of length i + 1, added to the working list if required, and placed in the appropriate bins, i.e., sPaths(q) for the appropriate q.
Suppose (q, u) π −→ (target, v) is any run of length i, and let p be the path induced by the run π. Then clearly p has length i, and further demand(p, 0) ≤ u. The proof of correctness given earlier showed that there is a set of paths C_p ⊆ sPaths(q) at the end of round i that covers the path p (here "cover" refers to paths, and is used in the sense defined in the main paper). Thus:
- each path p′ in C_p moves from control location q to target;
- for each path p′ in C_p, demand(p′, 0) ≤ demand(p, 0); thus there is a run (q, u) π_{p′} −→ (target, w_{p′}) using the sequence of transitions of p′.
Let P be the collection of these runs.
Further, f_p ≤ ⊔_{p′∈C_p} f_{p′}. But f_p = f_π and, for each p′ ∈ C_p, f_{p′} = f_{π_{p′}}. Thus, f_π = f_p ≤ ⊔_{p′∈C_p} f_{p′} = ⊔_{π_{p′}∈P} f_{π_{p′}}.
For a configuration s = (q, u), let Π^i_s be the set of paths in sPaths(q) after the i-th round whose demand is below u under ≤; in other words, Π^i_s = {x ∈ sPaths(q) | (q, demand(x, 0)) ≤ (q, u)} at the end of the i-th round. Then, for any run of the form s π −→ (target, v) of length at most i, inducing a path p, we have C_p ⊆ Π^i_s and thus f_π ≤ ⊔_{p′∈Π^i_s} f_{p′}.
Lemma 18. For any configuration s = (p, u), the join of the transfer functions of all runs from s to the target is dominated by ⊔_{x∈Π^L_s} f_x.

Proof. This follows from Lemma 17 and the fact that each run of length at most L from s is subsumed by Π^L_s, as shown above. If s = (q, 0), then Π^L_s consists only of paths with demand 0. Thus, JOFP(q, target) is the join of the transfer functions defined by all the runs from (q, 0) to the target, as required.
In addition, note that the demands at the end of round i of Algorithm 1 are no more than i on each counter, and thus no more than L at the end of L rounds.

Now we can complete the computation of the complexity of Algorithm 1. We first note that instead of maintaining a path p, it suffices to maintain its demand vector along with the transfer function defined by the run. Since all our demand vectors are L-bounded, the total number of demand vectors is no more than L^K. In addition, at any stage of the algorithm, in any bin, at most h copies of any demand vector v exist (with different associated transfer functions). To see this, suppose (v, f_1), ..., (v, f_{h+1}) appear in some bin, in the order in which they were added. Then, as argued before, there is an i such that f_{i+1} ≤ f_1 ⊔ f_2 ⊔ ... ⊔ f_i, contradicting the definition of Algorithm 1. Thus, at any point in the algorithm there are at most Q bins, each of which contains at most h·L^K demand-vector/transfer-function pairs. In each round, for each possible transition δ, we consider at most h·L^K candidates for extension (all drawn from the same bin). Thus each round considers ∆·h·L^K candidates, where ∆ is the number of transitions. For each such candidate the operations required are:
1. manipulating the demand vector by combining it with δ to determine the new demand; these vectors consist of K values, each of size at most L (hence each can be represented and manipulated using log(L) bits);
2. composing the transfer functions;
3. checking whether the composed transfer function is subsumed by the join of a subset of the functions selected from the same bin.
The first part takes time proportional to K·log(L). Let us suppose that composing transfer functions takes time A. Selecting the desired subset in step 3 requires us to examine each element of the bin and compare against its demand vector: h·L^K comparisons, each taking K·log(L) steps. Assuming each join takes time B, we can carry out the resulting join in time h·L^K·B. Finally, we need to compare the resulting function with the candidate, taking time C. Thus, the time spent at each candidate is proportional to K·log(L) + A + h·L^K·(K·log(L) + B) + C.
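The bin discipline described above — store a demand-vector/transfer-function pair only when the new function is not subsumed by the join of the functions already stored for the same demand vector — can be sketched as follows. Transfer functions are modelled as relations (join = union, ordering = inclusion), an illustrative stand-in for the paper's abstract lattice; the class name is hypothetical.

```python
class Bin:
    """One bin of the algorithm (i.e., one sPaths(q)): for each demand
    vector it keeps the transfer functions stored so far. Functions are
    modelled as frozensets of (input, output) pairs, so that the lattice
    join is set union and f1 <= f2 means pointwise dominance."""

    def __init__(self):
        self.entries = {}  # demand vector -> list of stored relations

    def add(self, demand, func):
        """Store (demand, func) unless func is already below the join of
        the functions stored under the same demand vector.
        Returns True iff the pair was stored."""
        stored = self.entries.setdefault(demand, [])
        joined = set().union(*stored) if stored else set()
        if func <= joined:  # subsumed: the existing join already dominates
            return False
        stored.append(func)
        return True
```

As argued above, each demand vector can be stored at most h times, since every successful `add` strictly increases the join of the stored functions, and the lattice has height h.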
Correctness Proof for Forward DFAS
We first present the important definitions required for the proof.
The complete lattice D that we use for this purpose is defined as D ≡ D_{r,κ} → L. The ordering on this lattice is as follows: (d_1 ∈ D) ⊑ (d_2 ∈ D) iff ∀c ∈ D_{r,κ}. d_1(c) ⊑_L d_2(c).
The abstract transfer function for a VCFG edge q_1 −(f,w)→ q_2 is: fun(l ∈ D) ≡ λc_2 ∈ D_{r,κ}. ⊔_{c_1 such that (c_1,w,c_2) ∈ boundedMove} f(l(c_1)).

As we intend to prove correctness by adapting the standard technique of abstract interpretation to argue the soundness of our approach, we first present the "concrete" lattice and transfer function. Let D_r be the set of all vectors of r natural numbers. Consider the "concrete" lattice D_c ≡ D_r → L, and the following "concrete" transfer function for the VCFG edge q_1 −(f,w)→ q_2: fun_conc(l ∈ D_c) ≡ λc_2 ∈ D_r. ⊔_{c_1 such that c_2 = c_1 + w} f(l(c_1)).
We now prove that the function fun is a consistent abstraction of the function fun_conc. For that we define the following concretization function γ: γ(l ∈ D) ≡ λc ∈ D_r. l(min(c, κ)), where κ is a vector of size r all of whose elements are equal to κ. We need to prove that for any d ∈ D, γ(fun(d)) ⊒ fun_conc(γ(d)). As discussed in the paper, this is sufficient to prove correctness.
Let d be any bounded queue configuration in D_{r,κ}, and let d_1 be any configuration in N^r such that min(d_1, κ) equals d. Let d_2 be the concrete successor of d_1 (if one exists) along the VCFG edge t. It is easy to see from the definitions of the transfer functions that, to prove consistent abstraction, it is enough to prove that (d, w, min(d_2, κ)) is in boundedMove, where w is the queuing vector of t.
To prove this, let us assume for simplicity that there is a single counter. Then d, d_1, d_2, and w are all integers, and in this case the vector κ is just the scalar κ. The generalization to multiple counters is straightforward, since the argument applies to each counter individually.
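The single-counter consistency argument can be checked exhaustively under one simple candidate definition of boundedMove. The definition below (and every name in it) is an assumption for illustration only, since the paper's formal definition of boundedMove is not reproduced here: a bounded counter value d < κ is exact, while d = κ stands for "κ or more".

```python
KAPPA = 3  # the counter bound kappa; illustrative value

def abstract(c):
    """The counter abstraction underlying gamma: values above kappa collapse."""
    return min(c, KAPPA)

def bounded_moves(d, w):
    """Candidate boundedMove for a single counter and queuing effect w.
    If d < KAPPA the abstract value is exact, so the only successor is
    min(d + w, KAPPA); if d == KAPPA, any concrete value c1 >= KAPPA may be
    behind it, giving several possible bounded successors.
    ASSUMED definition, for illustration only."""
    if d < KAPPA:
        c2 = d + w
        return {abstract(c2)} if c2 >= 0 else set()
    # d == KAPPA represents every concrete c1 >= KAPPA; values of c1 beyond
    # KAPPA + |w| yield no new abstract successor.
    return {abstract(c1 + w) for c1 in range(KAPPA, KAPPA + abs(w) + 1)
            if c1 + w >= 0}

def consistent(w):
    """The proof obligation above: whenever min(d1, KAPPA) == d and
    d2 = d1 + w is a legal concrete successor, min(d2, KAPPA) must be a
    bounded successor of d. Checked for all d1 up to 2*KAPPA + 1."""
    for d1 in range(0, 2 * KAPPA + 2):
        d2 = d1 + w
        if d2 < 0:
            continue  # no concrete successor along this edge
        if abstract(d2) not in bounded_moves(abstract(d1), w):
            return False
    return True
```

Under this candidate definition the obligation holds for every effect w tried, matching the per-counter argument sketched in the proof.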