1 Introduction

Interprocedural Analysis. One of the classical algorithmic problems in programming languages is interprocedural analysis. The problem is at the heart of several key applications, ranging from alias analysis, to data dependencies (modification and reference side effect), to constant propagation, to live and use analysis [10, 14, 15, 16, 18, 19, 24, 32, 35]. In seminal works [32, 35] it was shown that a large class of interprocedural dataflow analysis problems can be solved in polynomial time.

Models for Interprocedural Analysis. Two standard models for interprocedural analysis are pushdown systems (or finite automata with stacks) and recursive state machines (RSMs) [4, 5]. An RSM is a formal model for control flow graphs of programs with recursion. We consider RSMs that consist of modules, one for each method or function, where each module has a number of entry nodes and a number of exit nodes, and contains boxes that represent calls to other modules. A special case of RSMs with a single entry and a single exit node for every module (SESE RSMs, aka supergraph in [32]) has also been considered. While pushdown systems and RSMs are linearly equivalent (i.e., there is a linear translation from one model to the other and vice versa), there are two distinct advantages of RSMs. First, the model of RSMs closely resembles programs with explicit function calls and returns, and hence even special cases such as SESE RSMs have been used to model many applications. Second, the model of RSMs provides many parameters, such as the number of entry and exit nodes and the number of modules, and better algorithms can be developed by exploiting the fact that some parameters are small. Typically, SESE RSMs can model data-independent interprocedural analysis, whereas general RSMs can model data dependency as well. For most applications, the number of entries and exits of a module usually represents the input parameters of the module.

Semiring Framework. We consider a general framework to express computation properties of RSMs where the transitions of an RSM are labeled from a semiring. The labels are referred to as weights. A computation of an RSM executes transitions between configurations consisting of a node (representing the current control state) and a stack of boxes (representing the current calling context). To express properties of interest we need to define how to assign weights to computations, i.e., to accumulate weights along a computation, and how to assign weights to sets of computations, i.e., to combine weights across a set of computations. The weight of a given computation is the semiring product of the weights on the individual transitions of the computation, and the weight of a given set of computations is the semiring plus of the weights of the individual computations in the set. For example, (i) with the Boolean semiring (with semiring product as AND, and semiring plus as OR) we express the reachability property; (ii) with a Dataflow semiring we can express problems from dataflow analysis. One class of such problems is given by the IFDS/IDE framework [32, 35] that considers the propagation of dataflow facts along distributive dataflow functions (note that the IFDS/IDE framework only considers SESE RSMs). Hence the large and important class of dataflow analysis problems that can be expressed in the IFDS/IDE framework can also be expressed in our framework. Pushdown systems with semiring weights have also been extensively considered in the literature [20, 22, 33, 34].
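As a concrete illustration (ours, not from the paper), the two accumulation rules — semiring product along one computation, semiring plus across a set of computations — can be sketched for the Boolean and tropical (min-plus) semirings; the weight lists below are arbitrary example data:

```python
from functools import reduce

def path_weight(weights, times, one):
    """Semiring product of the transition weights along one computation."""
    return reduce(times, weights, one)

def set_weight(paths, plus, times, zero, one):
    """Semiring plus over the weights of a set of computations."""
    return reduce(plus, (path_weight(p, times, one) for p in paths), zero)

# (i) Boolean semiring: plus = OR, times = AND -> expresses reachability.
boolean = (lambda a, b: a or b, lambda a, b: a and b, False, True)

# (ii) Tropical semiring: plus = min, times = + -> shortest-distance style.
tropical = (min, lambda a, b: a + b, float("inf"), 0)

paths = [[1, 2, 3], [4, 1]]  # two computations, given by transition weights
print(set_weight(paths, *tropical))                         # min(6, 5) = 5
print(set_weight([[True, True], [True, False]], *boolean))  # True
```

Note that the empty combine yields the semiring zero and the empty extend yields the semiring one, matching the conventions used later in the paper.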

Problems Considered. We consider the following basic distance problems.

  • Configuration distance. Given a set of source configurations and a target configuration, the configuration distance is the weight of the set of computations that start at some source configuration and end in the target configuration. In the configuration distance problem the input is a set of source configurations and the output is the configuration distance to all reachable configurations.

  • Superconfiguration distance. We also consider a related problem of superconfiguration distance. A superconfiguration represents a sequence of modules, rather than a sequence of invocations. Intuitively, it does not consider the sequence of function calls, but only which functions were invoked. This is a coarser notion than configurations and allows for fast overapproximation. The superconfiguration distance problem is then similar to the configuration distance problem, with configurations replaced by superconfigurations.

  • Node distance. Given a set of source configurations and a target node, the node distance is the weight of the set of computations that start at some source configuration and end in a configuration with the target node (with arbitrary stack). In the node distance problem the input is a set of source configurations and the output is the node distance to all reachable nodes.

Symbolic Representation. A core ingredient for solving distance problems is the symbolic representation of sets of RSM configurations and their manipulation. Given a symbolic representation of the set of initial configurations, we provide a two step approach to solve the distance problems. In step one we compute a symbolic representation of the set of all configurations reachable from the initial configurations. Furthermore, the transitions in the representation are annotated with appropriate semiring weights to capture the various distances described above. In step two we query the computed representation for the required distances. Thus we make the important distinction between the complexity of a one-time preprocessing and the complexity of every individual query.

Concurrent RSMs. While reachability is the most basic property, the study of pushdown systems and RSMs within the semiring framework is a fundamental quantitative extension of the basic problem. An orthogonal fundamental extension is to study the reachability property in a concurrent setting, rather than the sequential one. However, the reachability problem in concurrent RSMs (equivalently, concurrent pushdown systems) is undecidable [31]. A very relevant problem to study in the concurrent setting is context-bounded reachability, where at most k context switches are allowed. The context-bounded reachability problem is both decidable [29] and practically relevant [26, 27].

Previous Results. Many previous results have been established for pushdown systems, and the translation of RSMs to pushdown systems implies that similar results carry over to RSMs as well. We describe the previous results most relevant to ours. For an RSM \(\mathcal {R}\), let \(|\mathcal {R}|\) denote its size, \(\theta _e\) and \(\theta _x\) the maximum number of entries and exits, respectively, and f the number of modules. The existing results for weighted pushdown systems over semirings of height \(H\) [34, 36], along with the linear translation of RSMs to pushdown systems [4], give an \(O(H\cdot |\mathcal {R}| \cdot \theta _e \cdot \theta _x\cdot f )\)-time algorithm for the configuration and node distance problems for RSMs. The previous results for context-bounded reachability of concurrent pushdown systems [29], applied to concurrent RSMs, give the following complexity bound: \(O(|\mathcal {R^{\parallel }}|^5 \cdot (\theta ^{\parallel }_x)^5\cdot n^k\cdot |G|^k)\), where \(|\mathcal {R^{\parallel }}|\) is the size of the concurrent RSM, \(\theta ^{\parallel }_x\) is the number of exit nodes, n is the number of component RSMs, G is the global part of the concurrent RSM, and k is the bound on the number of context switches.

Table 1. Asymptotic time complexity of computing configuration automata.
Table 2. Asymptotic time complexity of answering a configuration/superconfiguration distance query of size n. Preprocess time refers to additional preprocessing after the configuration automaton is constructed.

Our Contributions. Our main contributions are as follows:

  1. Finite-height semirings. We present an algorithm for computing configuration and node distance problems for RSMs over semirings with finite height \(H\) with running time \(O(H\cdot (|\mathcal {R}|\cdot \theta _e + | Call |\cdot \theta _e\cdot \theta _x))\), where \(| Call |\) is the number of call nodes. The algorithm we present constructs the symbolic representations from which the distances can be extracted. Thus our algorithm improves the current best-known algorithms by a factor of \(\varOmega ((|\mathcal {R}|\cdot f)/(\theta _x + | Call |))\) (Table 1; also see Remark 3 for details).

  2. Distance queries. Once a symbolic representation is constructed, it can be used for extracting distances. We present algorithms which, given a configuration query of size n, return the distance in \(O(n \cdot \theta _e^2)\) time. Furthermore, we present several improvements for the case when the semiring has a small domain. Finally, we show that when the RSM has a sparse call graph, we can obtain a range of tradeoffs between preprocessing and querying times. Our results on distance queries are summarized in Table 2.

  3. Concurrent RSMs. For the context-bounded reachability of concurrent RSMs we present an algorithm with time bound \(O(|\mathcal {R^{\parallel }}|\cdot \theta _e^{||}\cdot \theta _x^{||}\cdot n^k\cdot |G|^{k+2})\). Thus our algorithm significantly improves the current best-known algorithm (Table 1).

  4. Experimental results. We experiment with a basic prototype implementation for our algorithms. Our implementation is an explicit (rather than symbolic) one. We compare our implementation with jMoped [1], which is a leading and mature tool for weighted pushdown systems, on several real-world benchmarks coming from the SLAM/SDV project [6, 7]. We consider the basic reachability property (representative for finite-height semirings) for the sequential setting. Our experimental results show that our algorithm provides significant improvements on the benchmarks compared to jMoped.

Technical Contribution. The main technical contributions are as follows:

  • We show how to combine (i) the notion of configuration automata as a symbolic representation structure for sets of configurations, and (ii) entry-to-exit summaries to avoid redundant computations, and obtain an efficient dynamic programming algorithm for various distance problems in RSMs over finite-height semirings.

  • Configuration and superconfiguration distances are extracted using graph traversal of configuration automata. When the semiring has small domain, we obtain several speedups by exploiting advances in matrix-vector multiplication. Finally, the speedup of superconfiguration distance extraction on sparse RSMs is achieved by devising a Four-Russians type of algorithm, which spends some polynomial preprocessing time in order to allow compressing the query input in blocks of logarithmic length.

All proofs are provided in our technical report [12].

2 Preliminaries

In this section we present the necessary definitions of recursive state machines (RSMs) where every transition is labeled with a value (or weight) from an appropriate domain (semiring). Then we formally state the problems we study on weighted RSMs.

Semirings. An idempotent semiring is a quintuple \(\langle D, \oplus , \otimes , \overline{0}, \overline{1}\rangle \), where \(D\) is a set called the domain, \(\overline{0}\) and \(\overline{1}\) are elements of \(D\), and \(\oplus \) (the combine operation) and \(\otimes \) (the extend operation) are binary operators on \(D\) such that

  1. \(\langle D,\oplus ,\overline{0}\rangle \) is an idempotent commutative monoid with neutral element \(\overline{0}\),

  2. \(\langle D,\otimes ,\overline{1}\rangle \) is a monoid with neutral element \(\overline{1}\),

  3. \(\otimes \) distributes over \(\oplus \),

  4. \(\overline{0}\) is an annihilator for \(\otimes \), i.e., \(a \otimes \overline{0}= \overline{0}\otimes a = \overline{0}\) for all \(a \in D\).

An idempotent semiring has a canonical partial order \(\sqsubseteq \), defined by

$$\begin{aligned} a \sqsubseteq b \iff a \oplus b = a. \end{aligned}$$

Furthermore, this partial order is monotonic, i.e., for all \(a,b,c \in D\)

$$\begin{aligned} a \sqsubseteq b&\implies a \oplus c \sqsubseteq b \oplus c, \\ a \sqsubseteq b&\implies a \otimes c \sqsubseteq b \otimes c, \\ a \sqsubseteq b&\implies c \otimes a \sqsubseteq c \otimes b. \end{aligned}$$

The height H of an idempotent semiring is the length of the longest descending chain in \(\sqsubseteq \). In the rest of the paper we will only write semiring to mean an idempotent finite-height semiring.
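For a concrete instance (an illustrative sketch of ours, not from the paper), the subsets of a finite set of dataflow facts form an idempotent semiring with combine as union and extend as intersection; the canonical order and the finiteness of the height can be checked directly:

```python
# Subsets of a finite fact set: combine = union, extend = intersection,
# zero = empty set, one = full set. All four semiring axioms hold.
FACTS = frozenset({"a", "b", "c"})

def plus(x, y):   # combine
    return x | y

def times(x, y):  # extend (distributes over combine)
    return x & y

ZERO, ONE = frozenset(), FACTS

def leq(x, y):
    """Canonical partial order: x is below y iff x (+) y = x (here: y <= x)."""
    return plus(x, y) == x

assert plus(ONE, ONE) == ONE     # idempotence of combine
assert times(ZERO, ONE) == ZERO  # zero annihilates extend
# A strictly descending chain must strictly grow toward FACTS, so it has at
# most |FACTS| + 1 elements and the height H is finite:
chain = [ZERO, frozenset({"a"}), frozenset({"a", "b"}), FACTS]
assert all(leq(chain[i + 1], chain[i]) for i in range(3))
```

Here \(\overline{0}\) (the empty set) is the maximum element of \(\sqsubseteq \), which is why unreachable configurations naturally receive distance \(\overline{0}\) later on.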

Remark 1

Instead of finite height, the more general descending chain condition would be sufficient for our purposes. This only requires that there are no infinite descending chains in \(\sqsubseteq \), but there is not necessarily a finite height \(H\).

Recursive State Machines (informally). Intuitively, an RSM is a collection of finite automata, called modules, such that computations consist of ordinary local transitions within a module as well as calls to other modules, and returns from other modules. For this, every module has a well-defined interface of entry and exit nodes. Calls to other modules are represented by boxes, which have call and return nodes corresponding to the respective entry and exit nodes of the called module.

Unlike pushdown automata (PDAs), there is no explicit stack manipulation in RSMs. Instead a call stack is maintained implicitly along computations as follows. When a call node of a box is reached, the control is passed to the respective entry node of the called module and the box is pushed onto the top of the stack. When an exit node of a module is reached, a box is popped off from the top of the stack and the control is passed to the corresponding return node of the box. Hence, the stack is a sequence of boxes representing the current calling context and a configuration in a computation of an RSM is a node together with a sequence of boxes.

Recursive State Machines (formally). A recursive state machine (RSM) over a semiring \(\langle D, \oplus , \otimes , \overline{0}, \overline{1}\rangle \) is a tuple \(\mathcal {R}=\langle \mathcal {M}_1,\dots ,\mathcal {M}_k\rangle \), where every module \(\mathcal {M}_i = \langle B_i,Y_i,N_i,\delta _i,w_i\rangle \) is given by

  • a finite set \(B_i\) of boxes,

  • a mapping \(Y_i : B_i \mapsto \{1,\dots ,k\}\),

  • a finite set \(N_i = In _i \cup En _i \cup Ex _i \cup Call _i \cup Ret _i\) of nodes, partitioned into

    • internal nodes \( In _i\),

    • entry nodes \( En _i\),

    • exit nodes \( Ex _i\),

    • call nodes \( Call _i = \{\langle b,e\rangle \mid b \in B_i \text { and } e \in En _{Y_i(b)}\}\),

    • return nodes \( Ret _i = \{\langle b,x\rangle \mid b \in B_i \text { and } x \in Ex _{Y_i(b)}\}\),

  • a transition relation \(\delta _i \subseteq ( In _i \cup En _i \cup Ret _i) \times ( In _i \cup Ex _i \cup Call _i)\),

  • a weight function \(w_i : \delta _i \mapsto D\), with \(w_i(u,x)=\overline{1}\) for every exit node \(x\in Ex _i\).

We write B for \(\bigcup _{i=1}^k B_i\), and similarly for \(N\), \( In \), \( En \), \( Ex \), \( Call \), \( Ret \), \(\delta \), \(w\). To measure the size of an RSM we let \(|\mathcal {R}| = \max (|N|, \sum _i |\delta _i|)\). A major source of complexity in analysis problems for RSMs is the number of entry and exit nodes of the modules. Throughout the paper we express complexity with respect to the entry bound \(\theta _e = \max _{1 \le i \le k} | En _i|\) and the exit bound \(\theta _x = \max _{1 \le i \le k} | Ex _i|\), i.e., the maximum number of entries and exits, respectively, over all modules. Note that the restriction on the weight function to assign weight \(\overline{1}\) to every transition to an exit node is wlog, as any weighted RSM that does not respect this can be turned into an equivalent one that does, with only a constant factor increase in its size.
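The definition above can be sketched as a data structure (a hypothetical encoding of ours; field names mirror the paper's notation, and the call and return nodes are derived from the boxes rather than stored):

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    boxes: set                  # B_i
    call_map: dict              # Y_i : box -> index of the called module
    internal: set               # In_i
    entries: set                # En_i
    exits: set                  # Ex_i
    transitions: dict = field(default_factory=dict)  # delta_i with weights w_i

@dataclass
class RSM:
    modules: list               # M_1, ..., M_k

    @property
    def theta_e(self):          # entry bound: max entries over all modules
        return max(len(m.entries) for m in self.modules)

    @property
    def theta_x(self):          # exit bound: max exits over all modules
        return max(len(m.exits) for m in self.modules)

# Two mutually recursive modules, as in the informal description above:
m1 = Module({"b1"}, {"b1": 1}, {"u"}, {"e1"}, {"x1"})
m2 = Module({"b2"}, {"b2": 0}, set(), {"e2"}, {"x2"})
r = RSM([m1, m2])
print(r.theta_e, r.theta_x)  # 1 1
```

This is only a structural sketch; the paper's size measure \(|\mathcal {R}|\) and the weight function would be computed over these fields.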

Stacks. A stack is a sequence of boxes \(S = b_1 \dots b_r\), where the first box denotes the top of the stack; and \(\varepsilon \) is the empty stack. The height of S is \(|S| = r\), i.e., the number of boxes it contains. For a box b and a stack S, we denote with bS the concatenation of b and S, i.e., a push of b onto the top of S.

Configurations and Transitions. A configuration of an RSM \(\mathcal {R}\) is a tuple \(\langle u,S\rangle \), where \(u \in In \cup En \cup Ret \) is an internal, entry, or return node, and S is a stack. For \(S = b_1 \dots b_r\), where \(b_i \in B_{j_i}\) for \(1 \le i \le r\) and some \(j_i\), we require that \(Y_{j_i}(b_i) = j_{i-1}\) for \(1 < i \le r\), as well as \(u \in N_{Y_{j_1}(b_1)}\). This corresponds to the case where the control is inside the module of node u, which was entered via box \(b_1\) from module \(\mathcal {M}_{j_1}\), which was entered via box \(b_2\) from module \(\mathcal {M}_{j_2}\), and so on.
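The well-formedness condition on stacks can be sketched as follows (our encoding; `module_of_box` and `calls` are hypothetical lookup tables standing in for the box partition and the mappings \(Y_i\)):

```python
def is_valid_stack(stack, module_of_box, calls):
    """stack[0] is the top box; checks Y(b_i) = module of b_{i-1} pairwise."""
    for inner, outer in zip(stack, stack[1:]):
        # Box `inner` lives in the module that the next-outer box must call.
        if calls[outer] != module_of_box[inner]:
            return False
    return True

module_of_box = {"b1": 0, "b2": 1}  # b1 is a box of M_0, b2 is a box of M_1
calls = {"b1": 1, "b2": 0}          # b1 calls M_1, b2 calls M_0
print(is_valid_stack(["b1", "b2"], module_of_box, calls))  # True
print(is_valid_stack(["b1", "b1"], module_of_box, calls))  # False
```

The remaining condition, that the node u belongs to the module called by the top box, would be a separate check on `u`.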

We define a transition relation \({\Rightarrow }\) over configurations and a corresponding weight function \(w\), such that \(\langle u,S\rangle \Rightarrow \langle u',S'\rangle \) with \(w(\langle u,S\rangle ,\langle u',S'\rangle ) = v\) if and only if there exists a transition \(t \in \delta _i\) in \(\mathcal {R}\) with \(w_i(t)=v\) and one of the following holds:

  1. Internal transition: \(u' \in In _i\), \(t = \langle u,u'\rangle \), and \(S' = S\).

  2. Call transition: \(u' = e \in En _{Y_i(b)}\) for some box \(b \in B_i\), \(t = \langle u,\langle b,e\rangle \rangle \), and \(S' = bS\).

  3. Return transition: \(u' = \langle b,x\rangle \in Ret _i\) for some box \(b \in B_i\) and exit node \(x \in Ex _{Y_i(b)}\), \(t = \langle u,x\rangle \), and \(S = bS'\).

Note that we follow the convention that a call immediately enters the called module and a return immediately returns to the calling module. Hence, the node of a configuration can be an internal node, an entry node, or a return node, but not a call node or an exit node.
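The three transition kinds can be sketched as a successor function on configurations (our assumed encoding, not the paper's code; transitions are stored per node as `(kind, target, weight)` tuples):

```python
def successors(u, stack, trans):
    """Enumerate (u', S', weight) successors of configuration <u, S>."""
    out = []
    for kind, tgt, wt in trans.get(u, []):
        if kind == "internal":          # internal transition: S' = S
            out.append((tgt, stack, wt))
        elif kind == "call":            # call: tgt = (b, e); push box b
            b, e = tgt
            out.append((e, (b,) + stack, wt))
        elif kind == "exit" and stack:  # return: pop top box b, go to <b, x>
            b, rest = stack[0], stack[1:]
            out.append(((b, tgt), rest, wt))
    return out

trans = {
    "u":  [("call", ("b1", "e2"), 2)],  # u calls entry e2 through box b1
    "e2": [("exit", "x2", 3)],          # e2 reaches exit x2 of its module
}
print(successors("u", (), trans))        # [('e2', ('b1',), 2)]
print(successors("e2", ("b1",), trans))  # [(('b1', 'x2'), (), 3)]
```

Note that an exit with an empty stack yields no successor, matching the convention that such a computation halts.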

Computations. A computation of an RSM \(\mathcal {R}\) is a sequence of configurations \(\pi = c_1,\dots ,c_n\), such that \(c_i \Rightarrow c_{i+1}\) (i.e., there is a transition from \(c_i\) to \(c_{i+1}\)) for every \(1 \le i < n\). We say that \(\pi \) is a computation from \(c_1\) to \(c_n\), of length \(|\pi | = n-1\), and of weight \(\otimes (\pi ) = \bigotimes _{i=1}^{n-1} w(c_i,c_{i+1})\) (the empty extend is \(\overline{1}\)). We write \(\pi :c \rightsquigarrow c'\) to denote that \(\pi \) is a computation from c to \(c'\) of any length. A computation \(\pi \) from c to \(c'\) is called non-decreasing if the stack height of every configuration of \(\pi \) is at least as large as that of c (in other words, the top stack symbol of c is never popped in \(\pi \)). The computation \(\pi \) is called same-context if it is non-decreasing, and c and \(c'\) have the same stack height. A computation that cannot be extended by any transition is called a halting computation. For a set of computations \(\varPi \) we define its weight as \(\bigoplus (\varPi ) = \bigoplus _{\pi \in \varPi } \otimes (\pi )\) (the empty combine is \(\overline{0}\)). For a configuration c and a set of configurations R we denote by \(\varPi (R,c)\) the set of all computations from any configuration in R to c. Here, and for similar purposes below, we will use the convention to write \(\varPi (c,c')\) instead of \(\varPi (\{c\},c')\).

Example 1

Figure 1 shows an RSM \(\mathcal {R}=\langle \mathcal {M}_1,\mathcal {M}_2\rangle \) that consists of two modules \(\mathcal {M}_1\) and \(\mathcal {M}_2\). The modules are mutually recursive, since box \(b_1\) of module \(\mathcal {M}_1\) calls module \(\mathcal {M}_2\), and box \(b_2\) of module \(\mathcal {M}_2\) calls module \(\mathcal {M}_1\). A possible computation of \(\mathcal {R}\) is

(1)
Fig. 1. Example of a weighted RSM that consists of two modules with mutual recursion.

Distance Problems. Given a set of configurations R, the set of configurations that are reachable from R is \( post ^*(R) = \{ c \mid \varPi (R,c) \ne \emptyset \}\).

Instead of mere reachability, we are interested in the following distance metrics that aggregate over computations from R using the semiring combine and hence are expressed as semiring values.

  • Configuration distance. The configuration distance from R to c is defined as

    $$ d(R,c) = \bigoplus (\varPi (R,c)). $$

    That is, we take the combine over the weights of all computations from a configuration in R to c. Naturally, for configurations c not reachable from R we have \(d(R,c) = \overline{0}\).

  • Superconfiguration distance. A superstack is a sequence of modules \(\overline{S} = \mathcal {M}_1 \dots \mathcal {M}_r\). A stack \(S = b_1 \dots b_r\) refines \(\overline{S}\) if \(b_i \in B_i\) for all \(1 \le i \le r\), i.e., the i-th box of S belongs to the i-th module of \(\overline{S}\). A superconfiguration of \(\mathcal {R}\) is a tuple \(\langle u,\overline{S}\rangle \). Let \(\llbracket \langle u,\overline{S}\rangle \rrbracket = \{ \langle u,S\rangle \mid S \text { refines } \overline{S} \}\). The superconfiguration distance from R to a superconfiguration \(\overline{c}\) is defined as

    $$ d(R,\overline{c}) = \bigoplus _{c \in \llbracket \overline{c} \rrbracket } d(R, c) $$

    The superconfiguration distance is only concerned with the sequence of modules that have been used to reach the node u, rather than the concrete sequence of boxes as in the configuration distance. This is a coarser notion than configurations and allows for fast overapproximation.

  • Node and same-context distance. The node distance of a node u from R is defined as

    $$ d(R, u) = \bigoplus _{c=\langle u,S\rangle } d(R, c) $$

    where S ranges over stacks of \(\mathcal {R}\). Finally, the same-context node distance of a node u in module \(\mathcal {M}_i\) is defined as

    $$ d(\mathcal {M}_i, u)=\bigoplus _{e\in En _i} d(\langle e,\varepsilon \rangle , \langle u,\varepsilon \rangle ). $$

    Intuitively, the node distance minimizes over all possible ways (i.e., stack sequences) to reach a node, and the same-context problem considers nodes in the same module that can be reached with empty stack.
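The refinement relation underlying the superconfiguration distance can be sketched as follows (our assumed encoding; `module_of_box` is a hypothetical lookup table mapping each box to the index of the module containing it):

```python
def refines(stack, superstack, module_of_box):
    """A stack of boxes refines a superstack of module indices iff they have
    equal length and the i-th box belongs to the i-th module."""
    return (len(stack) == len(superstack)
            and all(module_of_box[b] == m
                    for b, m in zip(stack, superstack)))

module_of_box = {"b1": 0, "b2": 1}
print(refines(["b1", "b2"], [0, 1], module_of_box))  # True
print(refines(["b1", "b2"], [1, 1], module_of_box))  # False
```

A superconfiguration then denotes the set of all configurations whose stack refines its superstack, which is exactly the set the combine in the distance definition ranges over.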

Relevance. We discuss the relevance of the model and the problems we consider in program analysis. A prime application area of our framework is the analysis of procedural programs. Computations in an RSM correspond to the interprocedurally valid paths of a program. The distance values defined above allow one to obtain information at different levels of granularity, depending on the requirement for a particular analysis. MEME (multi-entry, multi-exit) RSMs naturally arise in the model checking of procedural programs, where every node represents a combination of control location and data. Checking for reachability, usually of an error state, requires only the simple Boolean semiring. On the other hand, interprocedural data flow analysis problems, as in IFDS/IDE, are usually cast on SESE (single-entry, single-exit) RSMs (the control flow graph of the program) using richer semirings. Our framework captures both of these important applications, and furthermore allows a hybrid approach of modeling program information both in the state space of the RSM as well as in the semiring.

3 Configuration Distance Algorithm

In this section we present an algorithm which takes as input an RSM \(\mathcal {R}\) and a representation of a regular set of configurations R, and computes a representation of the set of reachable configurations \( post ^*(R)\) that allows the extraction of the distance metrics defined above. In Sect. 3.1 we introduce configuration automata as representation structures for regular sets of configurations. In Sect. 3.2 we present an algorithm for RSMs over finite-height semirings. The algorithm saturates the input configuration automaton with additional transitions and assigns the correct weights via a dynamic programming approach that gradually relaxes transition weights from an initial overapproximation. We exploit the monotonicity property in idempotent semirings, which allows us to factor the computation into subproblems, and hence corresponds to the optimal substructure property of dynamic programming. Although a transition might have to be processed multiple times, the finite height of the semiring prevents a transition from being relaxed indefinitely. We show that the final configuration automata constructed by our algorithms correctly capture configuration distances. The extraction of distance values is considered in Sect. 4.

3.1 Configuration Automata

In general, like R, the set \( post ^*(R)\) is infinite. Hence we make use of a representation of regular sets of configurations as the language accepted by configuration automata, defined below. A key property of regular sets of configurations is closure under \( post ^*\): if R is regular, then \( post ^*(R)\) is also a regular set of configurations and can be represented by a configuration automaton.

Intuition. Every state in a configuration automaton corresponds to a node in the RSM. In order to represent arbitrary regular sets of configurations we must allow the replication of states with the same node. Therefore we annotate every state with a mark (see Remark 2 for details). Transitions are of two types: (i) \(\varepsilon \)-transitions pointing from a node u to an entry node e and labeled with \(\varepsilon \), denoting that a computation reaching u entered the module of u via entry e, and (ii) b-transitions pointing from an entry node e to another entry node \(e'\) and labeled with a box b, corresponding to a call transition \(\langle u,\langle b,e\rangle \rangle \) in the module of \(e'\) in the RSM. Reading the labels along a path in the automaton yields a stack.

In addition to the labeling with boxes we label every transition of a configuration automaton with a semiring value. In the final configuration automata constructed by our algorithms, every run generates a configuration c and thereby captures a certain subset \(\varPi \subseteq \varPi (R,c)\) of computations from the initial set of configurations R to c. The weight of the run equals the combine over the weight of the computations in \(\varPi \). The combine over the weights of all runs in the automaton that generate c equals the combine over the weights of all computations from R to c, i.e., the configuration distance \(d(R, c)\). Since the transitions in a configuration automaton are essentially reversed transitions of the RSM (and the extend operation is not commutative), the weight of a run is given by the extend of the transitions in reversed order.

Configuration Automata. Let \(\mathbb {M}\) be a countably infinite set of marks. A configuration automaton for an RSM \(\mathcal {R}\), also called an \(\mathcal {R}\)-automaton, is a weighted finite automaton \(\mathcal {C}= \langle Q,B,{\xrightarrow {}},I,F,\ell \rangle \), where

  • \(Q \subseteq ( In \cup En \cup Ret ) \times \mathbb {M}\) is a finite set of states,

  • B (the boxes of \(\mathcal {R}\)) is the transition alphabet,

  • \({\xrightarrow {}} \subseteq Q \times ({B \cup \{\varepsilon \}}) \times Q\) is a transition relation, such that every transition has one of the following forms:

    • b-transition: \(\langle e, m_e\rangle \xrightarrow {b} \langle e', m_{e'}\rangle \), where \(b \in B_i\) for some i, \(e \in En _{Y_i(b)}\), and \(e' \in En _i \),

    • \(\varepsilon \)-transition: \(\langle u, m_u\rangle \xrightarrow {\varepsilon } \langle e, m_e\rangle \), where \(e \in En _i\) for some i, and either \(u \in In _i \cup Ret _i\), or \(u = e\),

  • \(I \subseteq Q\) is a set of initial states,

  • \(F \subseteq Q\) and \(F \subseteq En \times \mathbb {M}\) is a set of final states,

  • \(\ell : {\xrightarrow {}} \mapsto D\) is a weight function that assigns a semiring weight to every transition.

Remark 2

(Marks). The marks in the states of a configuration automaton are introduced to support the general setting of representing an arbitrary set of configurations, e.g., with stacks that are not even reachable in the RSM. Since every state is tied to an RSM node, the marks make it possible to have multiple “copies” of the same node in unrelated parts of the automaton. Furthermore, our algorithm (Sect. 3.2) introduces a fresh mark to recognize when it can safely store entry-to-exit summaries. For the common setting of starting the analysis from the entry nodes of a main module with empty stack, marks are not necessary and can be elided.

Runs and Regular sets of Configurations. A run of a configuration automaton \(\mathcal {C}\) is a sequence \(\lambda = t_1,\dots ,t_{n}\), such that there are states \(q_1,\dots ,q_{n+1}\) and each \(t_i = q_i \xrightarrow {\sigma _i} q_{i+1}\) is a transition of \(\mathcal {C}\) labeled with \(\sigma _i\). We say that \(\lambda \) is a run from \(q_1\) to \(q_{n+1}\), of length \(|\lambda | = n\), labeled by \(S = \sigma _1 \dots \sigma _n\), and of weight \(\otimes (\lambda ) = \bigotimes _{i=n}^{1} \ell (t_i)\) (note that the weights of the transitions are extended in reverse order). We write \(\lambda :q \xrightarrow {S \mid v} q'\) to denote that \(\lambda \) is a run from q to \(q'\) of any length labeled by S and of weight v. We will also use the notation \(\lambda :q \xrightarrow {S} q'\) without v if we are not interested in the weight. The run \(\lambda \) is accepting if \(q \in I\) and \(q' \in F\). A configuration \(\langle u,S\rangle \) is accepted by \(\mathcal {C}\) if there is an accepting run \(\lambda :\langle u, m\rangle \xrightarrow {S} q'\) for some mark \(m \in \mathbb {M}\), and additionally \(\otimes (\lambda ) \ne \overline{0}\). We say that two runs are equivalent if they accept the same configuration with the same weight. For technical convenience we consider that for every state \(\langle e, m_e\rangle \) with entry node \(e \in En \) there is an \(\varepsilon \)-self-loop \(\langle e, m_e\rangle \xrightarrow {\varepsilon } \langle e, m_e\rangle \) with weight \(\overline{1}\).
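Since the extend operation is not commutative, the reverse-order accumulation of run weights matters; a minimal sketch (ours, using string concatenation as a visibly non-commutative extend):

```python
from functools import reduce

def run_weight(transition_weights, times, one):
    """Weight of a run: extend of the transition weights in REVERSE order."""
    return reduce(times, reversed(transition_weights), one)

concat = lambda a, b: a + b  # a non-commutative extend, for illustration
print(run_weight(["t1", "t2", "t3"], concat, ""))  # 't3t2t1'
```

With a commutative semiring (e.g., min-plus) the order would be irrelevant, but for general semirings the reversal is essential for the automaton weights to match the computation weights.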

The set of all configurations accepted by \(\mathcal {C}\) is denoted by \(\mathcal {L}(\mathcal {C})\). A set of configurations R is called regular if there exists an \(\mathcal {R}\)-automaton \(\mathcal {C}\) such that \(\mathcal {L}(\mathcal {C})= R\). For a configuration c let \(\varLambda (c)\) be the set of all accepting runs of c and define \(\mathcal {C}(c) = \bigoplus _{\lambda \in \varLambda (c)} \otimes (\lambda )\) the weight that \(\mathcal {C}\) assigns to c.

We note that, despite the imposed syntactic restrictions, our definition of configuration automata is most general in the following sense.

Proposition 1

Let R be a set of configurations such that the set of their string representations is a regular language. Then there exists a configuration automaton \(\mathcal {C}\) such that \(\mathcal {L}(\mathcal {C}) = R\).

3.2 Algorithm for Finite-Height Semirings

In the following we present algorithm \(\mathtt {ConfDist}\) for computing the set \( post ^*(R)\) of a regular set of configurations R. The algorithm operates on an \(\mathcal {R}\)-automaton \(\mathcal {C}\) with \(\mathcal {L}(\mathcal {C})=R\). Upon termination, it has constructed an \(\mathcal {R}\)-automaton \(\mathcal {C}_{ post ^*}\) such that \(\mathcal {L}(\mathcal {C}_{ post ^*})= post ^*(R)\). Moreover, the configuration distance \(d(R,c)\) from R to any configuration c can be obtained from the labels of \(\mathcal {C}_{ post ^*}\) as \(\mathcal {C}_{ post ^*}(c)\). A computation is called initialized if its first configuration is accepted by the initial configuration automaton \(\mathcal {C}\).

Key Technical Contribution. In this work we consider the configuration distance computation. Using the notion of configuration automata as a symbolic representation structure for regular sets of configurations, the solution of the configuration distance problem has been previously studied in the setting of (weighted) pushdown systems [9, 34, 36]. One of the main algorithmic ideas for the efficient RSM reachability algorithm of [4] is to expand RSM transitions and use entry-to-exit summaries to avoid traversing a module more than once. However, the algorithm in [4] is limited to the node reachability problem. We combine the symbolic representation of configuration automata, along with the summarization principle, to obtain an efficient algorithm for the general configuration distance problem on RSMs.

Intuitive Description of \(\mathtt {ConfDist}\). The intuition behind our algorithm is very simple: it performs a forward search in the RSM. In every iteration it picks a frontier node u and extends the already discovered computations to u with the outgoing transitions of u. Depending on the type of the outgoing transition, the newly discovered node added to the frontier can be (a) an internal node, by following an internal transition, (b) the entry node of another module, by following a call transition, or (c) a return node corresponding to a previously discovered call, by following an exit transition.

In summary, the algorithm simply follows interprocedural paths. The crux to achieving our complexity bound, however, is to keep summaries of paths through a module. Whenever we discover a full (interprocedural) path from an entry e to an exit x, we store its weight as a summary (an upper bound that may later be improved). Any subsequently discovered call reaching e then does not need to continue the search from e, but short-circuits to x by using the stored summary.

Preprocessing. In order to ease the formal presentation of the algorithm, we consider the following preprocessing on the initial configuration automaton \(\mathcal {C}\). Let \(M \subseteq \mathbb {M}\) be the set of marks in the initial automaton and \(\widehat{m}\in \mathbb {M}\setminus M\) a fresh mark.

  1.

    For every node \(u \in In \cup En \cup Ret \), we add a new state \(\langle u, \widehat{m}\rangle \) marked with the fresh mark. Additionally, all these new states are declared initial.

  2.

    For every initial state \(\langle u, m_u\rangle \in I\) such that there is a call transition \(t = \langle u,\langle b,e\rangle \rangle \in \delta _i\) in \(\mathcal {R}\), for every state \(\langle e', m_{e'}\rangle \) where \(e'\) is an entry node of the same module as u, we add a b-transition \(\langle e, \widehat{m}\rangle \xrightarrow {b} \langle e', m_{e'}\rangle \) with weight \(\overline{0}\).

  3.

    For every state \(\langle e, m_e\rangle \) with entry node \(e \in En _i\) and every internal or return node \(u \in In _i \cup Ret _i\) in the same module as e, we add an \(\varepsilon \)-transition \(\langle u, \widehat{m}\rangle \xrightarrow {\varepsilon } \langle e, m_e\rangle \) with weight \(\overline{0}\).

Essentially, the preprocessing adds a priori to \(\mathcal {C}\) all possible states and transitions, so that the algorithm only has to relax those transitions (i.e., without adding them first). Note that the preprocessing only serves an easier presentation of our algorithm. Performing the full preprocessing up front would be wasteful in practice, and thus our implementation adds states and transitions to the automaton on the fly.
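As an illustration, the following sketch (data layout and names hypothetical) performs steps 1 and 3 of the preprocessing over the tropical semiring, where \(\overline{0} = \infty \); step 2, the b-transitions for call transitions of initial states, is handled analogously.

```python
# Sketch of the preprocessing (hypothetical layout): a state is a (node, mark)
# pair; trans maps (source, label, target) -> weight. Steps 1 and 3 shown.
ZERO = float("inf")  # 0̄ of the tropical semiring
FRESH = "m_hat"      # the fresh mark m̂

def preprocess(nodes, module_of, entries_of, states, trans):
    """nodes: internal/entry/return nodes; module_of[u]: module of node u;
    entries_of[M]: entry nodes of module M; states: existing (node, mark) pairs;
    trans: dict (source, label, target) -> weight, extended in place."""
    new_states = [(u, FRESH) for u in nodes]          # step 1: fresh-mark initial states
    for u in nodes:                                   # step 3: <u, m̂> --eps--> <e, m_e>
        if u in entries_of[module_of[u]]:             # only internal/return nodes u
            continue
        for (e, m_e) in states:
            if e in entries_of[module_of[u]]:
                trans[((u, FRESH), "eps", (e, m_e))] = ZERO
    return new_states

# Tiny module "A" with entry "e" and internal node "v":
states = [("e", "m0")]
trans = {}
new = preprocess(["v", "e"], {"v": "A", "e": "A"}, {"A": {"e"}}, states, trans)
print(sorted(new))  # [('e', 'm_hat'), ('v', 'm_hat')]
```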

Technical Description of \(\mathtt {ConfDist}\). We present a detailed explanation of the algorithm supporting the formal description given in Algorithm 1. We require that every transition in the input configuration automaton \(\mathcal {C}\) has weight \(\overline{1}\), since the configurations in \(\mathcal {L}(\mathcal {C})\) should not contribute any initial weight to the configuration distance. The algorithm maintains a worklist \(\mathsf {WL}\) of weighted transitions, either of the form \(\langle u, m_u\rangle \xrightarrow {\varepsilon } \langle e, m_e\rangle \) or \(\langle e, m_e\rangle \xrightarrow {b} \langle e', m_{e'}\rangle \), and a summary function \(\mathsf {sum}: ( En \times \mathbb {M}) \times Ex \mapsto D\). Initially, the worklist contains all such transitions whose source state is an initial state in I, and \(\mathsf {sum}\) is \(\overline{0}\) everywhere. In every iteration a transition \(t_\mathcal {C}\) is extracted from the worklist and processed as follows. Since every accepting run starting with \(t_\mathcal {C}\) corresponds to a reachable configuration \(\langle u,S\rangle \) (where S varies over different runs), every transition \(t_{\mathcal {R}}=\langle u,u'\rangle \) in \(\mathcal {R}\) gives rise to another reachable configuration. More precisely, the run corresponds to a set of computations reaching \(\langle u,S\rangle \) from the initial set of configurations, and \(t_\mathcal {R}\) extends these computations by one step. The algorithm incorporates the newly discovered computations by relaxing a transition as follows, illustrated in Fig. 2.

Fig. 2. Relaxation steps of \(\mathtt {ConfDist}\).

  1.

    If \(t_{\mathcal {C}}\) is of the form \(\langle u, m_u\rangle \xrightarrow {\varepsilon } \langle e, m_e\rangle \), then:

    (a)

      If \(u'\) is an internal node then the algorithm captures the internal transition by relaxing the transition \(\langle u', \widehat{m}\rangle \xrightarrow {\varepsilon } \langle e, m_e\rangle \) using the weights \(\ell (t_\mathcal {C})\) and \(w(t_{\mathcal {R}})\).

    (b)

      If \(u'\) is a call node \(\langle b,e'\rangle \) then the transition \(\langle e', \widehat{m}\rangle \xrightarrow {b} \langle e, m_e\rangle \) is relaxed with the new weight \(\ell (t_\mathcal {C}) \otimes w(t_\mathcal {R})\). Furthermore, an \(\varepsilon \)-self-loop is stored in the worklist to continue exploration from the called entry node \(e'\).

    (c)

      If \(u'\) is an exit node x then the algorithm relaxes \(\mathsf {sum}(\langle e, m_e\rangle ,x)\) if a smaller computation to x has been discovered. Note that for \(m_e = \widehat{m}\) this corresponds to valid entry-to-exit computations from e to x. If another call to e is discovered later, the summary is used to avoid traversing the module again. For \(m_e \ne \widehat{m}\) the summary does not necessarily correspond to valid entry-to-exit computations (e.g., because node u was provided as an initial configuration) and is only stored to avoid redundant work. For a return transition from \(\langle u,S\rangle \) the stack S has to be non-empty. The algorithm looks for all possible boxes b at the top of S by going along a b-transition from \(\langle e, m_e\rangle \) to a state \(\langle e', m_{e'}\rangle \). Then for any \(S = bS'\), relaxing the transition \(\langle \langle b,x\rangle , \widehat{m}\rangle \xrightarrow {\varepsilon } \langle e', m_{e'}\rangle \) captures the return transition. Note that here we make use of the fact that the return transition itself has weight \(\overline{1}\).

  2.

    If \(t_{\mathcal {C}}\) is of the form \(\langle e, m_e\rangle \xrightarrow {b} \langle e', m_{e'}\rangle \), then:

    (d)

      for every exit node x in the module of e the summary function is used to relax the weight of the transition \(\langle \langle b,x\rangle , \widehat{m}\rangle \xrightarrow {\varepsilon } \langle e', m_{e'}\rangle \) to the value \(\ell (t_\mathcal {C}) \otimes \mathsf {sum}(\langle e, \widehat{m}\rangle ,x)\).

The initial states of \(\mathcal {C}_{ post ^*}\) are the initial states of \(\mathcal {C}\) together with all states with the fresh mark added in the preprocessing. The final states of \(\mathcal {C}_{ post ^*}\) are the unmodified final states of \(\mathcal {C}\).

Algorithm 1. \(\mathtt {ConfDist}\).
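The worklist-with-summaries principle behind \(\mathtt {ConfDist}\) can be illustrated by the following drastically simplified sketch (hypothetical data layout, plain reachability instead of semiring weights, and none of the configuration-automaton bookkeeping): a module is explored once per entry, and later calls to the same entry short-circuit through the recorded summary.

```python
# Simplified illustration only: same-context reachability with entry-to-exit
# summaries. This is NOT the full ConfDist; it only shows the summarization.
from collections import defaultdict

def reach_with_summaries(internal, calls, exits, start_entry):
    """internal: node -> iterable of intra-module successors;
    calls: call node -> (callee entry, return node); exits: set of exit nodes."""
    reach = set()                 # (entry e, node u): u reachable from e in e's module
    summary = defaultdict(set)    # entry -> exits reachable from it
    callers = defaultdict(set)    # callee entry -> {(caller entry, return node)}
    wl = [(start_entry, start_entry)]
    while wl:
        e, u = wl.pop()
        if (e, u) in reach:
            continue
        reach.add((e, u))
        for v in internal.get(u, ()):          # internal transition
            wl.append((e, v))
        if u in calls:                         # call transition
            e2, ret = calls[u]
            callers[e2].add((e, ret))
            wl.append((e2, e2))
            if summary[e2]:                    # summary known: jump to the return
                wl.append((e, ret))
        if u in exits:                         # exit: record summary, resume callers
            summary[e].add(u)
            for (e0, ret) in callers[e]:
                wl.append((e0, ret))
    return reach, summary
```

For instance, on a two-module RSM where module A calls module B, the search enters B a single time; every later call to B's entry reuses `summary`.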

Example 2

In Fig. 3 we illustrate an execution of \(\mathtt {ConfDist}\) for the reachability problem in the RSM from Fig. 1. The reader can verify that every configuration in the example computation (1) is accepted by a run of the constructed automaton.

Fig. 3. The configuration automaton \(\mathcal {C}_{ post ^*}\) constructed by \(\mathtt {ConfDist}\) for the RSM in Fig. 1 over the Boolean semiring \(\langle \{0,1\}, \vee , \wedge , 0, 1 \rangle \), expressing the reachability problem. The initial input automaton \(\mathcal {C}\) is given by the black states, whereas the gray states represent the newly added states with the fresh mark \(\widehat{m}\). The black/gray color gives a similar distinction for the transitions (i.e., the gray transitions have been added by the algorithm). The set of initial states of \(\mathcal {C}\) is \(I=\{e_1^1, e_2\}\), and the set of final states is the singleton set \(F=\{e_1^1\}\). Transitions added in the preprocessing phase with value \(\overline{0}\) are not shown.

Correctness. In the following we outline the correctness of the algorithm. We start with a simple observation about the shape of runs in the constructed configuration automaton.

Proposition 2

For every accepting run \(\lambda \) there exists an equivalent accepting run \(\lambda '\) that starts with an \(\varepsilon \)-transition followed by only b-transitions. Furthermore, all but the first state contain an entry node.

The following three lemmas capture the correctness of \(\mathtt {ConfDist}\). We start with completeness, namely that the distance computed for any configuration c is at most the actual distance from the initial set of configurations \(\mathcal {L}(\mathcal {C})\) to c. The proof relies on showing that for any initialized computation \(\pi \) reaching a configuration \(\langle u',S'\rangle \) there is a run \(\lambda \) accepting \(\langle u',S'\rangle \) such that \(\otimes (\lambda ) \sqsubseteq \otimes (\pi )\), and follows by induction on the length \(|\pi |\).

Lemma 1

(Completeness). For every configuration c we have \(\mathcal {C}_{ post ^*}(c) \sqsubseteq d(\mathcal {L}(\mathcal {C}), c)\).

We now turn our attention to soundness, namely that the distance computed for any configuration c is at least the actual distance from the initial set of configurations \(\mathcal {L}(\mathcal {C})\) to c. The proof is established via a set of interdependent invariants that state that the algorithm maintains sound entry-to-exit summaries and any run in the automaton has a weight that is witnessed by a set of computations.

Lemma 2

(Soundness). For every configuration c we have \(d(\mathcal {L}(\mathcal {C}), c) \sqsubseteq \mathcal {C}_{ post ^*}(c)\).

Complexity. Finally, we turn our attention to the complexity analysis of the algorithm, which is done by bounding the number of times the algorithm can perform a relaxation step. The complexity bound is based on the height \(H\) of the semiring, which implies that every transition can be relaxed at most \(H\) times. The contribution of the size of the initial automaton \(\mathcal {C}\) to the complexity is captured by the number of initial marks \(\kappa \).

Lemma 3

(Complexity). Let \(\kappa \) be the number of distinct marks \(m\in \mathbb {M}\) of the initial automaton \(\mathcal {C}\). Algorithm \(\mathtt {ConfDist}\) constructs \(\mathcal {C}_{ post ^*}\) in time \(O(H\cdot (|\mathcal {R}|\cdot \theta _e\cdot \kappa ^2 + | Call |\cdot \theta _e\cdot \theta _x\cdot \kappa ^3))\), and \(\mathcal {C}_{ post ^*}\) has \(O(|\mathcal {R}|\cdot \theta _e\cdot \kappa ^2)\) transitions.

We summarize the results of this section in the following theorem.

Theorem 1

Let \(\mathcal {R}\) be an RSM over a semiring of height \(H\), and \(\mathcal {C}\) an \(\mathcal {R}\)-automaton with \(\kappa \) marks. Algorithm \(\mathtt {ConfDist}\) constructs in \(O(H\cdot (|\mathcal {R}|\cdot \theta _e\cdot \kappa ^2 + | Call |\cdot \theta _e\cdot \theta _x\cdot \kappa ^3))\) time an \(\mathcal {R}\)-automaton \(\mathcal {C}_{ post ^*}\) with \(\kappa +1\) marks, such that \(d(\mathcal {L}(\mathcal {C}),c) = \mathcal {C}_{ post ^*}(c)\) for every configuration c.

Remark 3

(Comparison with existing work). We now relate Theorem 1 with the existing work for computing configuration distance (often called generalized reachability in the literature) in weighted pushdown systems (WPDS) [34, 36]. For simplicity we assume that the initial automaton is of constant size. A formal description of WPDS is omitted; the reader can refer to [4, 34]. Let \(\mathcal {P}\) be a WPDS where:

  1.

    \(n_{\mathcal {P}}\) is the number of states

  2.

    \(n_{\varDelta }\) is the size of the transition relation

  3.

    \(n_\mathsf {sp}\) is the number of different pairs \(\langle p', \gamma '\rangle \) such that there is a transition of the form \(\langle p,\gamma \rangle \xrightarrow {}{} \langle p', \gamma ' \gamma ''\rangle \) (i.e., from some state p with \(\gamma \) on the top of the stack, the WPDS \(\mathcal {P}\) (i) transitions to state \(p'\), (ii) replaces \(\gamma \) with \(\gamma ''\), and (iii) pushes \(\gamma '\) on the top of the stack).

As shown in [34], given a WPDS \(\mathcal {P}\) with weights from a semiring with height \(H\), together with a corresponding automaton \(\mathcal {C}^{\mathcal {P}}\) that encodes configurations of \(\mathcal {P}\), an automaton \(\mathcal {C}^{\mathcal {P}}_{ post ^*}\) can be constructed as a solution to the configuration distance problem for \(\mathcal {P}\). For ease of presentation we focus on the common case where \(\mathcal {C}^{\mathcal {P}}\) has constant size (e.g., for encoding an initial configuration of \(\mathcal {P}\) with empty stack). Then the time required to construct \(\mathcal {C}^{\mathcal {P}}_{ post ^*}\) is \(O(H\cdot n_{\mathcal {P}}\cdot n_{\varDelta } \cdot n_\mathsf {sp})\) [34, 36].

A direct consequence of [4, Theorem 1] is that an RSM \(\mathcal {R}\) and a configuration automaton \(\mathcal {C}^{\mathcal {R}}\) can be converted to an equivalent PDS \(\mathcal {P}\) and configuration automaton \(\mathcal {C}^{\mathcal {P}}\), and vice versa, such that the following equalities hold:

$$ |\mathcal {R}|=\varTheta (n_{\varDelta });\quad \theta _x = \varTheta (n_{\mathcal {P}}); \quad f\cdot \theta _e= \varTheta (n_\mathsf {sp}), $$

where f represents the number of modules. Hence, the bound we obtain by translating the input RSM to a WPDS and using the algorithm of [34, 36] is \(O(H\cdot |\mathcal {R}|\cdot \theta _e\cdot \theta _x\cdot f)\). Our complexity bound of Theorem 1 is better by a factor of \(\varOmega ((|\mathcal {R}|\cdot f)/(\theta _x + | Call |))\). Moreover, to verify these improvements empirically, we constructed a family of dense RSMs, applied our algorithm, compared against the jMoped implementation of the existing algorithms, and observed a linear speedup (see Sect. 6.1 for details).

The above analysis considers an explicit model, where \(\mathcal {R}\) comprises two parts, a program control-flow graph \(\mathcal {R}_\mathrm {CFG}\) and the set of all data valuations V, where \(|V|=\theta _e=\theta _x\). Hence, \(|\mathcal {R}|=|\mathcal {R}_\mathrm {CFG}|\cdot |V|^2\). In a symbolic model, where all the data valuations are tracked on the semiring, the input RSM is a factor \(|V|^2\) smaller (i.e., the contribution of the data valuation to \(|\mathcal {R}|\)), and \(\theta _e=\theta _x=1\). However, now each semiring operation incurs a factor \(|V|^2\) increase in time cost, and the height of the semiring increases by a factor \(|V|^2\) as well, in the worst case. Hence, existing symbolic approaches for PDSs have the same worst-case time complexity as the explicit one, and our comparison applies to these as well. For further discussion on symbolic extensions of our algorithm we refer to our technical report [12].

4 Distance Extraction

The algorithm presented in Sect. 3 takes as input a weighted RSM \(\mathcal {R}\) over a semiring and a configuration automaton \(\mathcal {C}\) that represents a regular set R of configurations of \(\mathcal {R}\), and outputs an automaton \(\mathcal {C}_{ post ^*}\) that encodes the distance \(d(R, c)\) to every configuration c. We now discuss the algorithmic problem of extracting such distances from \(\mathcal {C}_{ post ^*}\), and present fast algorithms for this problem. First we consider the general case of RSMs over an arbitrary semiring. Then we present several improvements for special cases, such as RSMs over a semiring with a small domain, or sparse RSMs. As the correctness of the constructions is straightforward, our attention will be on the complexity.

4.1 Distances over General Semirings

Configuration Distances. Given a configuration \(c=\langle u,S\rangle \), \(S = b_1 \dots b_{|S|}\), the task is to extract \( d(R,c) = \bigoplus (\varPi (R,c)). \) This is done by a dynamic-programming style algorithm, which computes iteratively for every prefix \(b_1 \dots b_i\) of S and state \(\langle e, m_e\rangle \) with \(e \in En _j\) and \(b_i \in B_j\), the weight

$$ w_i(\langle e, m_e\rangle ) = \bigoplus _{t:\, \langle e', m_{e'}\rangle \xrightarrow {b_i} \langle e, m_e\rangle } w_{i-1}(\langle e', m_{e'}\rangle ) \otimes \ell (t), $$

i.e., the combined weight of all runs of \(\mathcal {C}_{ post ^*}\) that reach \(\langle e, m_e\rangle \) after reading the prefix \(b_1 \dots b_i\).

Since there are \(O(\kappa ^2\cdot \theta _e^2)\) transitions labeled with \(b_i\), every iteration requires \(O(\kappa ^2\cdot \theta _e^2)\) time, and the total time for computing \(d(R, c)\) is \(O(|S|\cdot \kappa ^2\cdot \theta _e^2)\).
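Over the Boolean semiring, the iteration can be sketched as follows (data layout and states hypothetical); each step propagates the set of automaton states reachable after reading one more stack symbol.

```python
# Boolean-semiring sketch of the extraction: after the initial eps-step from u,
# consume the stack one box at a time, tracking the reachable automaton states.
def config_distance(eps_succ, b_trans, final, u, stack):
    """eps_succ[u]: states reached by eps-transitions from u's initial states;
    b_trans[(state, box)]: successor states; returns True iff d(R, c) = 1̄."""
    cur = set(eps_succ.get(u, ()))
    for b in stack:                       # one iteration per prefix b1..bi
        cur = {q2 for q in cur for q2 in b_trans.get((q, b), ())}
    return bool(cur & final)              # accepted iff some run ends final

# Hypothetical automaton fragment:
eps_succ = {"u": [("e1", "m")]}
b_trans = {(("e1", "m"), "b1"): [("e2", "m")], (("e2", "m"), "b2"): [("e2", "m")]}
final = {("e2", "m")}
print(config_distance(eps_succ, b_trans, final, "u", ["b1", "b2"]))  # True
```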

Superconfiguration Distances. Given a superconfiguration \(\overline{c}=\langle u,\overline{S}\rangle \), \(\overline{S} = \mathcal {M}_1 \dots \mathcal {M}_{|\overline{S}|}\), the task is to extract \( d(R,\overline{c}) = \bigoplus _{c \in \llbracket \langle u,\overline{S}\rangle \rrbracket } d(R, c). \) To handle such queries, we perform a one-time preprocessing of \(\mathcal {C}_{ post ^*}\), so that the transitions are labeled with modules instead of boxes. That is, we create an automaton \(\overline{\mathcal {C}}_{ post ^*}\), initially identical to \(\mathcal {C}_{ post ^*}\). Then we add a transition \(t=\langle e, m_{e}\rangle \xrightarrow {\mathcal {M}} \langle e', m_{e'}\rangle \), with \(\mathcal {M}\) being the module of \(e'\), if there exists a b-transition \(\langle e, m_{e}\rangle \xrightarrow {b} \langle e', m_{e'}\rangle \) in \(\mathcal {C}_{ post ^*}\). The weight function \(\overline{\ell }\) of \(\overline{\mathcal {C}}_{ post ^*}\) is such that the weight of the transition t is

$$ \overline{\ell }(t)=\bigoplus _{t':\langle e, m_{e}\rangle \xrightarrow {b} \langle e', m_{e'}\rangle } \ell (t') $$

where \(t'\) ranges over all b-transitions of \(\mathcal {C}_{ post ^*}\) between the two states, for boxes b of the module \(\mathcal {M}\). This construction requires time linear in the number of b-transitions of \(\mathcal {C}_{ post ^*}\), i.e., \(O(|\mathcal {R}|\cdot \theta _e)\). It is straightforward to see that

$$ \bigoplus _{\overline{\lambda }} \otimes (\overline{\lambda }) = \bigoplus _{S} \bigoplus _{\lambda } \otimes (\lambda ), $$

where \(\overline{\lambda }\) and \(\lambda \) range over accepting runs of \(\overline{\mathcal {C}}_{ post ^*}\) on \(\langle u,\overline{S}\rangle \) and of \(\mathcal {C}_{ post ^*}\) on \(\langle u,S\rangle \), respectively, and S ranges over the stacks that refine \(\overline{S}\). Then, given a superconfiguration \(\overline{c}=\langle u,\overline{S}\rangle \), the extraction of \(d(R,\overline{c})\) is done similarly to the configuration distance extraction, in \(O(|\overline{S}|\cdot \kappa ^2\cdot \theta _e^2)\) time.

Node Distances. For node distances, the task is to compute \( d(R, u) = \bigoplus _{c=\langle u,S\rangle } d(R, c) \) for every node u of \(\mathcal {R}\). This reduces to treating the automaton \(\mathcal {C}_{ post ^*}\) as a graph G, and solving a traditional single-source distance problem, where the source set contains all states with old marks (i.e., old states that appear in the initial automaton \(\mathcal {C}\)). This requires \(O(H\cdot |\mathcal {C}_{ post ^*}|)\) time for semirings of height \(H\). An informal argument for these bounds is to observe that G can be itself encoded by a SESE RSM \(\mathcal {R}_G\) with a single module, where the entry represents the source set of nodes with old marks. Then, running \(\mathtt {ConfDist}\) for the corresponding semiring, we obtain a solution to the single-source distance problem in the aforementioned times, as established in Theorem 1. Finally, computing same-context node distances requires \(O(|\mathcal {R}|\cdot \theta )\) time in total (i.e., for all nodes). Hence, regardless of the semiring, all node distances can be computed with no overhead, i.e., within the time bounds required for constructing the respective configuration automaton \(\mathcal {C}_{ post ^*}\). The following theorem summarizes the complexity bounds that we obtain for the various distance extraction problems.

Theorem 2

(Distance extraction). Let \(\mathcal {R}\) be an RSM over a semiring of height \(H\) and \(\mathcal {C}\) an \(\mathcal {R}\)-automaton with \(\kappa \) marks. After \(O(H\cdot |\mathcal {R}|\cdot \theta _e\cdot \theta _x\cdot \kappa ^3)\) preprocessing time

  1.

    configuration and superconfiguration distance queries \(\langle u,S\rangle \) are answered in \(O(|S|\cdot \theta _e^2\cdot \kappa ^2)\) time;

  2.

    node distance queries are answered in O(1) time.

4.2 Distances over Semirings with Small Domain

We now turn our attention to configuration and superconfiguration distance extraction for the case of semirings with small domains D. Such semirings express a range of important problems, with reachability being the most well-known (expressed on the Boolean semiring with \(|D|=2\)). We harness algorithmic advancements on the matrix-vector multiplication problem and Four-Russians-style algorithms to obtain better bounds on the distance extraction problem.

Recall that given a box b, the configuration automaton \(\mathcal {C}_{ post ^*}\) has at most \((\theta _e\cdot \kappa )^2\) transitions labeled with b. Such transitions can be represented by a matrix \(A_{b}\in D^{(\theta _e\cdot \kappa ) \times (\theta _e\cdot \kappa )}\). Additionally, for every internal node u we have one matrix \(A_u\in D^{\kappa \times (\theta _e\cdot \kappa )}\) that captures the weights of all transitions of the form \(\langle u, m_u\rangle \xrightarrow {\varepsilon }\langle e, m_e\rangle \). Then, answering a configuration distance query \(\langle u,S\rangle \) with \(S=b_1\dots b_{|S|}\) amounts to evaluating the expression

$$\begin{aligned} \mathbf {\overline{1}}_{\kappa } \cdot A_u\cdot A_{b_{1}} \cdots A_{b_{|S|}} \cdot \mathbf {\overline{1}}_{\kappa \cdot \theta _e}^{\top } \end{aligned}$$
(2)

where \(\mathbf {\overline{1}}_{z}\) is a row vector of \(\overline{1}\)s and size z, \(\cdot ^\top \) denotes the transpose, and matrix multiplication is taken over the semiring. The situation is similar in the case of superconfiguration distances, where we have one matrix \(A_{\mathcal {M}, \mathcal {M}'}\) for each pair of modules \(\mathcal {M}\), \(\mathcal {M}'\) such that \(\mathcal {M}\) invokes \(\mathcal {M}'\).

Evaluating Eq. (2) from left to right (or right to left) yields a sequence of matrix-vector multiplications. The following two theorems use the results of [25, 37] on matrix-vector multiplications to provide a speedup on the distance extraction problem when the semiring has constant size \(|D|=O(1)\).
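For instance, over the Boolean semiring the left-to-right evaluation of Eq. (2) can be sketched as follows (matrices hypothetical): only a row vector is carried along, so each stack symbol costs one vector-matrix product.

```python
# Boolean-semiring sketch: evaluating Eq. (2) left to right, carrying only a
# row vector, so each stack symbol costs one vector-matrix product.
def vec_mat(v, A):
    """Semiring vector-matrix product with ⊕ = or, ⊗ = and."""
    return [any(v[i] and A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def evaluate(A_u, A_boxes):
    ones = [True] * len(A_u)            # plays the role of the row vector 1̄
    v = vec_mat(ones, A_u)
    for A in A_boxes:                   # one product per stack symbol
        v = vec_mat(v, A)
    return any(v)                       # final ⊗ with the all-1̄ column vector

A_u = [[True, False]]                   # hypothetical 1 x 2 matrix for node u
A_b = [[False, True], [False, False]]   # hypothetical 2 x 2 matrix for box b
print(evaluate(A_u, [A_b]))  # True
```

The Mailman and Williams speedups below replace the naive `vec_mat` with preprocessed matrix-vector multiplication.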

Theorem 3

(Mailman’s speedup [25]). Let \(\mathcal {R}\) be an RSM over a semiring of constant size, and \(\mathcal {C}\) an \(\mathcal {R}\)-automaton with \(\kappa \) marks. After \(O(|\mathcal {R}|\cdot \theta _e\cdot \theta _x\cdot \kappa ^3)\) preprocessing time, configuration and superconfiguration distance queries \(\langle u,S\rangle \) are answered in \(O\left( |S|\cdot \frac{ \theta _e^2\cdot \kappa ^2}{\log (\theta _e \cdot \kappa )}\right) \) time.

Theorem 4

(Williams’s speedup [37]). Let \(\mathcal {R}\) be an RSM over a semiring of size |D|, and \(\mathcal {C}\) an \(\mathcal {R}\)-automaton with \(\kappa \) marks. For any fixed \(\varepsilon >0\), let \(X=|\mathcal {R}|\cdot \theta _e\cdot \theta _x\cdot \kappa ^3\) and \(Z=|\mathcal {R}|\cdot \kappa \cdot (\theta _e\cdot \kappa )^{1+\varepsilon \log _2|D|}\). After \(O(\max (X,Z))\) preprocessing time, configuration and superconfiguration distance queries \(\langle u,S\rangle \) are answered in \(O\left( |S|\cdot \frac{ \theta _e^2\cdot \kappa ^2}{\varepsilon ^2\cdot \log ^2 (\theta _e \cdot \kappa )}\right) \) time.

Finally, using the Four-Russians technique for parsing on non-deterministic automata [28], we obtain the following speedup for the case of reachability. We note that although the alphabet is not of constant size (i.e., the number of boxes is generally non-constant) this poses no overhead, as long as comparing two boxes for equality requires constant time (which is the case in the standard RAM model).

Theorem 5

(Four-Russians speedup [28]). Let \(\mathcal {R}\) be an RSM over a binary semiring, and \(\mathcal {C}\) an \(\mathcal {R}\)-automaton with \(\kappa \) marks. After \(O(|\mathcal {R}|\cdot \theta _e\cdot \theta _x\cdot \kappa ^3)\) preprocessing time, configuration and superconfiguration distance queries \(\langle u,S\rangle \) are answered in \(O\left( |\mathcal {R}|\cdot \theta _e\cdot \kappa ^2\cdot \frac{ |S|}{\log (|S|)}\right) \) time.

4.3 A Speedup for Sparse RSMs

We call an RSM \(\mathcal {R}\) sparse if there is a constant bound r such that for all modules \(\mathcal {M}_i\) we have \(|\{ Y_i(b) \mid b \in B_i\}| \le r\), i.e., every module invokes at most r other modules (although \(\mathcal {M}_i\) can have arbitrarily many boxes). Typical call graphs of most programs are very sparse; e.g., typical call graphs with thousands of nodes have average degree at most eight [8, 30]. Hence, an RSM modeling a typical program is expected to comprise thousands of modules, while the average module invokes a small number of other modules. Although this does not imply a constant bound on the number of invoked modules, such an assumption provides a good theoretical basis for the analysis of typical programs.

Our goal is to provide a speedup for extracting superconfiguration distances w.r.t. a sparse RSM. This is achieved by an additional polynomial-time preprocessing, which then allows processing a distance query in blocks of logarithmic size, and thus offers a speedup of the same order.

Given an RSM \(\mathcal {R}\) of k modules and an integer z, there exist at most \(k\cdot r^z\) valid module sequences \(\mathcal {M}_1\dots \mathcal {M}_{z+1}\) that can appear as a substring of a module sequence \(\overline{S}\) refined by some stack S. Recall the definition of the matrices \(A_{\mathcal {M}, \mathcal {M}'}\in D^{(\theta _e\cdot \kappa ) \times (\theta _e\cdot \kappa )}\) from Sect. 4.2. For every valid sequence of \(z+1\) modules \(s=\mathcal {M}_1\dots \mathcal {M}_{z+1}\), we construct a matrix \(A_{s}=A_{\mathcal {M}_1, \mathcal {M}_2}\cdot A_{\mathcal {M}_2,\mathcal {M}_3}\cdots A_{\mathcal {M}_z,\mathcal {M}_{z+1}}\) in total time

$$\begin{aligned} k\cdot (\theta _e\cdot \kappa )^{\omega } \sum _{i=1}^z r^i = O\left( |\mathcal {R}|\cdot \theta _e^{\omega -1} \kappa ^{\omega }\cdot r^z \right) \end{aligned}$$
(3)

where \((\theta _e\cdot \kappa )^\omega =\varOmega (\theta _e^2\cdot \kappa ^2)\) is the time required to multiply two matrices in \(D^{(\theta _e\cdot \kappa ) \times (\theta _e\cdot \kappa )}\) (currently \(\omega \simeq 2.372\), due to [38]).

Observe that as long as \(z=O(\log |\mathcal {R}|)\), there are polynomially many such sequences s, and thus each one can be indexed in O(1) time in the standard RAM model. Then a superconfiguration distance query \(\langle u,S\rangle \) can be answered by grouping S into \(\lceil \frac{|S|}{z}\rceil \) blocks of size z each, and multiplying with the precomputed matrix \(A_{s}\) for each such block s.
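The precomputation can be sketched as follows (Boolean matrices, hypothetical module names); only sequences whose consecutive pairs correspond to actual invocations are materialized, which is what sparsity bounds by \(k\cdot r^z\).

```python
# Sketch of the sparsity speedup: products A_s for all valid length-(z+1)
# module sequences are precomputed, over the Boolean semiring here.
from itertools import product

def mat_mul(A, B):  # Boolean semiring matrix product (⊕ = or, ⊗ = and)
    return [[any(A[i][k] and B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def precompute_blocks(pair_mats, modules, z):
    """pair_mats[(M, M')]: matrix A_{M,M'}; returns A_s for every valid
    sequence s of z+1 modules (sequences with an invalid pair are skipped)."""
    blocks = {}
    for s in product(modules, repeat=z + 1):
        if all((s[i], s[i + 1]) in pair_mats for i in range(z)):
            A = pair_mats[(s[0], s[1])]
            for i in range(1, z):
                A = mat_mul(A, pair_mats[(s[i], s[i + 1])])
            blocks[s] = A
    return blocks

# Two modules that only invoke each other, with hypothetical 1x1 matrices:
blocks = precompute_blocks({("A", "B"): [[True]], ("B", "A"): [[True]]}, ["A", "B"], 2)
print(sorted(blocks))  # [('A', 'B', 'A'), ('B', 'A', 'B')]
```

A query then consumes \(\overline{S}\) block by block, multiplying the current vector with one precomputed \(A_s\) per block instead of z individual matrices.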

Theorem 6

(Sparsity speedup). Let \(\mathcal {R}\) be a sparse RSM over a semiring of height \(H\), and \(\mathcal {C}\) an \(\mathcal {R}\)-automaton with \(\kappa \) marks. Let \(X=H\cdot |\mathcal {R}|\cdot \theta _e\cdot \theta _x\cdot \kappa ^3\), and given an integer parameter \(x = O({{\mathrm{poly}}}|\mathcal {R}|)\), let \(Z=|\mathcal {R}|\cdot \theta _e^{\omega -1} \kappa ^{\omega }\cdot x\). After \(O(\max (X,Z))\) preprocessing time, superconfiguration distance queries \(\langle u,S\rangle \) are answered in \(O\left( |S|\cdot \left\lceil \frac{\theta _e^2\cdot \kappa ^2}{\log x}\right\rceil \right) \) time.

By varying the parameter x (equivalently, the block size z), Theorem 6 provides a tradeoff between preprocessing and query times. Finally, the presented method can be combined with the preprocessing on constant-size semirings of Sect. 4.2, which leads to a \(\varTheta (\log z)\) factor improvement on the query times of Theorems 3, 4 and 5.

5 Context-Bounded Reachability in Concurrent Recursive State Machines

Context bounding, i.e., limiting the number of context switches considered during state space exploration, is an effective technique for the systematic analysis of concurrent programs. The context-bounded reachability problem in concurrent pushdown systems has been studied in [29]. In this section we phrase the context-bounded reachability problem over concurrent RSMs and show that the procedure of [29], using our algorithm \(\mathtt {ConfDist}\) together with the results of the previous sections, gives a better time complexity for the problem. As this section closely follows the well-known framework of concurrent pushdown systems [29], we keep the description brief.

Concurrent RSMs. A concurrent RSM (CRSM) \(\mathcal {R^{\parallel }}\) is a collection of RSMs \(\mathcal {R}_i\) equipped with a finite set of global states G used for communication between the RSMs. To this end, the semantics of RSMs is lifted to \(\mathcal {R}_i\)-configurations of the form \(\langle g,u_i, S_i\rangle \), carrying an additional global state \(g \in G\). Then, a global configuration of \(\mathcal {R^{\parallel }}\) is a tuple \(\langle g,\langle u_1,S_1\rangle ,\dots ,\langle u_n,S_n\rangle \rangle \), where \(\langle g,u_i, S_i\rangle \) are configurations of \(\mathcal {R}_i\), respectively. The semantics of \(\mathcal {R^{\parallel }}\) over global configurations is the standard interleaving semantics, i.e., in each step some RSM \(\mathcal {R}_i\) modifies the global state and its local configuration, while the local configuration of every other RSM remains unchanged.

Context-Bounded Reachability. For a positive natural number k and a fixed initial global configuration c, the k-bounded reachability problem asks for all global configurations \(c'\) such that there is a computation from c to \(c'\) that switches control between RSMs at most \(k-1\) times.

An Algorithm for Context-Bounded Reachability. The procedure of [29] for solving the k-bounded reachability problem for concurrent pushdown systems (CPDSs) systematically performs \( post ^*\) operations on the reachable configuration set of every constituent PDS, while capturing all possible interleavings within k context switches. The k-bounded reachability problem for CRSMs can be solved with an almost identical procedure, replacing the black-box invocations of the PDS reachability algorithm of [36] with our algorithm \(\mathtt {ConfDist}\). However, using our algorithm for each \( post ^*\) operation, we obtain a complexity improvement over the method of [29].
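Schematically (and glossing over the bookkeeping of [29]), the round structure can be sketched as follows, with \( post ^*\) treated as a black box, in our setting implemented by \(\mathtt {ConfDist}\); all names and the toy components are hypothetical.

```python
# Schematic only: k-bounded exploration as k rounds, each round letting every
# component extend the current frontier with its own post* closure.
def k_bounded_reach(post_star, n_components, init, k):
    """post_star(i, S): configurations reachable by component i from S
    without a context switch (a black-box post*, e.g., ConfDist)."""
    reached = {init}
    frontier = {init}
    for _ in range(k):                      # at most k contexts
        new = set()
        for i in range(n_components):       # branch over the running component
            new |= post_star(i, frontier)
        frontier = new - reached
        reached |= new
    return reached

# Toy components over integer "configurations" (post* kept reflexive):
post = lambda i, S: S | {x + 1 for x in S} if i == 0 else S | {x * 2 for x in S}
print(sorted(k_bounded_reach(post, 2, 1, 2)))  # [1, 2, 3, 4]
```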

Key Complexity Improvement. The key advantage of our algorithm as compared to [29] is as follows: in the algorithm of [29], in each iteration the configuration automaton used to represent the reachable configurations of each component grows by a cubic term; in contrast, with our algorithm the configuration automaton grows only by a linear term in each iteration. This comes from the fact that in our configuration automata every state corresponds to a node of the RSM, whereas such a strong correspondence does not hold for the configuration automata of [29].

Theorem 7

For a concurrent RSM \(\mathcal {R^{\parallel }}\), and a bound k, the procedure of [29, Figure 2] using \(\mathtt {ConfDist}\) for performing \( post ^*\) operations correctly solves the k-bounded reachability problem and requires \(O(|\mathcal {R^{\parallel }}|\cdot \theta _e^{||}\cdot \theta _x^{||}\cdot n^k\cdot |G|^{k+2})\) time.

Compared to Theorem 7, solving the CRSM problem by translation to a CPDS and using the algorithm of [29] gives the bound \(O(|\mathcal {R^{\parallel }}|^5 \cdot (\theta _x^{\parallel })^{5}\cdot n^k\cdot |G|^k)\). Conversely, solving the CPDS problem by translation to a CRSM and using our algorithm gives an improvement by a factor of \(\varOmega (|\mathcal {P}^{\parallel }|^3/|G|^2)\). We refer to our technical report [12] for a detailed discussion.

6 Experimental Results

In this section we empirically demonstrate the algorithmic improvements achieved by our RSM-based algorithm over existing PDS-based algorithms on interprocedural program analysis problems. The main goal is to demonstrate the improvements in algorithmic ideas rather than implementation details and engineering aspects. In particular, we implemented our algorithm \(\mathtt {ConfDist}\) in a prototype tool and compared its efficiency against jMoped [1], which implements the algorithms of [34, 36] and is a leading tool for the analysis of weighted pushdown systems. In all cases we used an explicit representation of data valuations on the nodes of RSMs, as opposed to a symbolic semiring representation. All experiments were run on a machine with an Intel Xeon CPU and a memory limit of 80 GB. We first present our results on a synthetic example to verify the algorithmic improvements on a constructed family, and then present results on real-world benchmarks.

Fig. 4. Speedup of our algorithm over the algorithms of [34, 36] implemented by jMoped on the RSM family \(\mathcal {R}_n\).

6.1 A Family of Dense RSMs

For our first experiments we constructed a family of dense RSMs that can be scaled in size. The purpose of these experiments is to verify that (i) our algorithm indeed achieves a speedup over the algorithms of [34, 36], and (ii) the speedup scales with the size of the input, to ensure that improvements on real-world benchmarks are not due to implementation details, such as the data types used. Let \(\mathcal {R}_n\) be a single-module RSM that consists of n entries, n exits, and a single box which makes a recursive call. The transition relation is \(\delta = ( En \times ( Call \cup Ex )) \cup ( Ret \times Ex )\), i.e., every entry node connects to every call and exit node, and every return node connects to every exit node. Hence \(|\mathcal {R}_n|=n^2\). The transition weights are irrelevant, as we focus on reachability. The initial configuration automaton \(\mathcal {C}\) contains a single entry state. We considered \(\mathcal {R}_n\) with n ranging from 10 to 200. For each RSM, we used the standard translation to a PDS [4], and then applied our tool and jMoped to compute a configuration automaton that represents \( post ^{*}(\mathcal {L}(\mathcal {C}))\). Figure 4 depicts the obtained speedup, which scales linearly with n. We have also experimented with other similar synthetic RSMs with different means of scaling, and in all cases the obtained speedups exhibit the same qualitative behavior. This confirms the theoretical algorithmic improvements of our algorithm on synthetic benchmarks.
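As an illustration, the transition relation of \(\mathcal {R}_n\) can be generated as follows. This is a schematic sketch: the node naming (en, call, ret, ex) is ours and not part of any benchmark format; the single box calls the module itself, so it has one call port per entry and one return port per exit.

```python
def dense_rsm(n):
    """Build the transition relation of the single-module RSM R_n:
    every entry connects to every call port and every exit node,
    and every return port connects to every exit node."""
    entries = [("en", i) for i in range(n)]
    exits = [("ex", i) for i in range(n)]
    # The single box calls the module itself: one call port per
    # entry and one return port per exit of the module.
    calls = [("call", i) for i in range(n)]
    returns = [("ret", i) for i in range(n)]
    delta = [(u, v) for u in entries for v in calls + exits]
    delta += [(u, v) for u in returns for v in exits]
    return delta
```

The relation has \(n \cdot 2n + n \cdot n = 3n^2\) transitions, i.e., \(\varTheta (n^2)\), matching the stated density of the family.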

6.2 Boolean Programs from SLAM/SDV

Benchmarks. For our second set of experiments we used the collection of Boolean programs distributed as part of the SLAM/SDV project [6, 7]. These programs are the final abstractions in the verification of Windows device drivers, and thus they represent RSMs obtained from real-world programs. From the Boolean programs we obtained RSMs where every node represents a control location together with a valuation of the Boolean variables, and call/entry and exit/return nodes model the parameter passing between functions. Thus, the RSMs are naturally multi-entry-multi-exit. Overall we obtained 73 RSMs, which correspond to the largest Boolean programs that can be handled explicitly.
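The explicit representation above pairs each control location with every valuation of the Boolean variables in scope. The following schematic sketch (our own illustration, not the SLAM/SDV input format) shows the resulting blow-up:

```python
from itertools import product

def explode(locations, num_vars):
    """Enumerate explicit RSM nodes: one node per control location,
    paired with each valuation of the num_vars Boolean variables."""
    valuations = product((False, True), repeat=num_vars)
    return [(loc, vals) for vals in valuations for loc in locations]
```

With \(\ell \) control locations and v Boolean variables this yields \(\ell \cdot 2^v\) explicit nodes, which explains why only the explicitly tractable Boolean programs are used.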

Evaluation. To ensure a fair performance comparison, we applied two preprocessing steps to the benchmark RSMs.

  • First, to ensure that both tools compute the same result without any potential unnecessary work, we restricted the state space of the RSMs to the interprocedurally reachable states.

  • Second, to focus on the performance of interprocedural analysis, we eliminated all internal nodes by computing the intraprocedural transitive closure within every RSM module.

The above two transformations ensure that preprocessing steps, namely the removal of unreachable states and the intraprocedural analysis, are already done, so that we compare the interprocedural algorithmic aspects of the two algorithms. For each RSM, we used the standard translation to a PDS [4], and then applied our tool and jMoped to compute a configuration automaton that represents \( post ^{*}(\mathcal {L}(\mathcal {C}))\), where \(\mathcal {C}\) is an initial configuration automaton that contains the entry states of the main module. Table 3 shows for every benchmark the number of RSM transitions (Trans.), their ratio to nodes (D), the runtime for computing the intraprocedural transitive closure (TC), the runtime of jMoped (jMop), the runtime of our tool (Ours), and the speedup our tool achieved over jMoped (SpUp).
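The second preprocessing step can be sketched as a plain reachability closure that bypasses internal nodes, so that only entry, call, return, and exit nodes remain. This is a simplified sketch for unweighted reachability; in the weighted setting the closure would propagate semiring weights along the bypassed paths instead.

```python
def eliminate_internal(nodes, edges, is_internal):
    """Replace intraprocedural paths through internal nodes by direct
    edges between the remaining (entry/call/return/exit) nodes."""
    succ = {u: set() for u in nodes}
    for u, v in edges:
        succ[u].add(v)
    # Saturate: whenever u reaches an internal node v, u also
    # reaches all successors of v; iterate until a fixpoint.
    changed = True
    while changed:
        changed = False
        for u in nodes:
            for v in list(succ[u]):
                if is_internal(v):
                    new = succ[v] - succ[u]
                    if new:
                        succ[u] |= new
                        changed = True
    # Keep only edges between non-internal endpoints.
    return {(u, v) for u in nodes if not is_internal(u)
            for v in succ[u] if not is_internal(v)}
```

For example, a module path en -> a -> b -> ex with internal nodes a, b collapses to the single summary edge (en, ex).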

Our tool clearly outperforms jMoped on every benchmark, with speedups ranging from 3.94 up to 28.48. The runtimes of our tool range from 0.13 to 33.96 s, while the runtimes of jMoped range from 1.03 to 950.82 s. Thus, our experiments show that our algorithm successfully exploits the structure of procedural programs preserved in RSMs also on real-world examples. This shows the potential of our algorithm for building program analysis tools.

Note that the benchmark RSMs are quite large, with millions of nodes and transitions, which even a basic implementation of our algorithm handled quite efficiently. Moreover, in our experiments we observed that our tool uses considerably less memory than jMoped. While we set 80 GB as the memory limit, the peak memory consumption of jMoped was 72 GB, whereas our tool solved all benchmarks with less than 32 GB memory.

Table 3. Comparison of our tool against jMoped. Runtimes are given in seconds. The names of all benchmarks are given in our technical report [12].

6.3 Discussion

In our experiments we compared the implementation of our algorithm with jMoped on sequential RSM analysis in an explicit setting. While our algorithm can be made symbolic in a straightforward way, a symbolic implementation that is efficient for large symbolic domains involves significant engineering effort. Moreover, the main goal of our work is to compare the algorithmic improvements over the existing approaches, which is best demonstrated in an explicit setting, since there the improvements are algorithmic rather than due to implementation details of symbolic data structures. Our experimental results show the potential of the new algorithmic ideas, and investigating their applicability in a symbolic implementation is a subject of future work.

7 Related Work

Sequential Setting. Pushdown systems are very well studied for interprocedural analysis [10, 32, 35]. While the most basic problem is reachability, weighted pushdown systems (i.e., pushdown systems enriched with a semiring) can express several basic dataflow properties, as well as other relevant problems in interprocedural program analysis [20, 22, 33, 34]. Hence weighted pushdown systems have been studied in many different contexts, such as [13, 17, 32, 35], and tools have been developed, such as Moped [2], jMoped [1], and WALi [3]. The more convenient model of RSMs was introduced and studied in [4]; it explicitly models function calls and returns, and at the same time specifies many natural parameters for algorithmic analysis. In this work, we improve the fundamental algorithms for RSMs over finite-height semirings, as compared to the bounds obtained by translating RSMs to pushdown systems and applying the best-known bounds for the pushdown case. Along with general RSMs, special cases of SESE RSMs have also been considered, such as RSMs with constant treewidth restricted to same-context queries [11] (i.e., the computation of distances between nodes of the same module). Our results apply to the general case of all RSMs and are not restricted to any special type of queries.

Concurrent Setting. The problem of reachability in concurrent pushdown systems (or concurrent RSMs) is again a fundamental problem in program analysis, as it allows for interprocedural analysis in a concurrent setting. However, the problem is undecidable [31]. Motivated by practical problems, where bugs are discovered within few context switches, the context-bounded reachability problem, in which at most k context switches are allowed, has been considered for concurrent pushdown systems [21, 23, 26, 27, 29] as well as for the related model of asynchronous pushdown networks [9]. We present a new algorithm for concurrent pushdown systems and concurrent RSMs which improves the existing complexity when the size of the global component is small.

8 Conclusion

In this work we consider RSMs, a fundamental model for interprocedural analysis, with path properties expressed over finite-height semirings, which can express a large class of properties for program analysis. We present algorithms that improve upon the previous algorithms, both in the sequential and in the concurrent setting. Moreover, along with our algorithm, we present new methods to extract distances from the data structure (configuration automata) that the algorithm constructs. We present a prototype implementation for sequential RSMs in an explicit setting that provides significant improvements on real-world programs obtained from the SLAM/SDV benchmarks. Our results show the potential of the new algorithmic ideas. There are several interesting directions for future work: one is a symbolic implementation; another is to explore the new algorithmic ideas in the concurrent setting in practice.