1 Introduction

Markov decision processes (MDPs) [14, 17] are a formal model for games on directed graphs, where certain decisions are taken by a strategic player (a.k.a. Player 1, or controller) while others are taken randomly (a.k.a. by nature, or the environment) according to pre-defined probability distributions. MDPs are thus a subclass of general 2-player stochastic games, and they are equivalent to 1.5-player games in the terminology of [10]. They are also called “games against nature”.

A run of the MDP consists of a sequence of visited states and transitions on the graph. Properties of the system are expressed via properties of the induced runs. The most basic objectives are reachability (is a certain (set of) control-state(s) eventually visited?) and Büchi objectives (is a certain (set of) control-state(s) visited infinitely often?).

Since a strategy of Player 1 induces a probability distribution of runs of the MDP, the objective of an MDP is defined in terms of this distribution, e.g., if the probability of satisfying a reachability/Büchi objective is at least a given constant. The special case where this constant is 1 is a key example of a qualitative objective. Here one asks whether Player 1 has a strategy that achieves an objective surely (all runs satisfy the property) or almost-surely (the probability of the runs satisfying the property is 1).

Most classical work on algorithms for MDPs and stochastic games has focused on finite-state systems (e.g., [11, 14, 19]), but more recently several classes of infinite-state systems have been considered as well. For instance, MDPs and stochastic games on infinite-state probabilistic recursive systems (i.e., probabilistic pushdown automata with unbounded stacks) [13] and on one-counter systems [6, 7] have been studied. Another infinite-state probabilistic model, which is incomparable to recursive systems, is a suitable probabilistic extension of Vector Addition Systems with States (VASS; a.k.a. Petri nets), which have a finite number of unbounded counters holding natural numbers.

Our Contribution. We study the decidability of probability-1 qualitative reachability and Büchi objectives for infinite-state MDPs that are induced by suitable probabilistic extensions of VASS that we call VASS-MDPs. (Most quantitative objectives in probabilistic VASS are either undecidable, or the solution is at least not effectively expressible in \((\mathbb {R},+,*,\le )\) [3]). It is easy to show that, for general VASS-MDPs, even the simplest of these problems, (almost) sure reachability, is undecidable. Thus we consider two monotone subclasses: 1-VASS-MDPs and P-VASS-MDPs. In 1-VASS-MDPs, only Player 1 can modify counter values while the probabilistic player can only change control-states, whereas for P-VASS-MDPs it is vice-versa. Still these two models induce infinite-state MDPs. Unlike for finite-state MDPs, it is possible that the value of the MDP, in the game theoretic sense, is 1, even though there is no single strategy that achieves value 1. For example, there can exist a family of strategies \(\sigma _\epsilon \) for every \(\epsilon >0\), where playing \(\sigma _\epsilon \) ensures a probability \(\ge 1-\epsilon \) of reaching a given target state, but no strategy ensures probability 1. In this case, one says that the reachability property holds limit-surely, but not almost-surely (i.e., unlike in finite-state MDPs, almost-surely and limit-surely do not coincide in infinite-state MDPs).

We show that even for P-VASS-MDPs, all sure/almost-sure/limit-sure reachability/Büchi problems are still undecidable. However, in the deadlock-free subclass of P-VASS-MDPs, the sure reachability/Büchi problems become decidable (while the other problems remain undecidable). In contrast, for 1-VASS-MDPs, the sure/almost-sure/limit-sure reachability problem and the sure/almost-sure Büchi problem are decidable.

Our decidability results rely on two different techniques. For the sure and almost sure problems, we prove that we can reduce them to the model-checking problem over VASS of a restricted fragment of the modal \(\mu \)-calculus that has been proved to be decidable in [4]. For the limit-sure reachability problem in 1-VASS-MDP, we use an algorithm which at each iteration reduces the dimension of the considered VASS while preserving the limit-sure reachability properties.

Although we do not consider the class of qualitative objectives referring to the probability of (repeated) reachability being strictly greater than 0, we observe that reachability on VASS-MDPs in such a setting is equivalent to reachability on standard VASS (though this correspondence does not hold for repeated reachability).

Outline. In Sect. 2 we define basic notations and how VASS induce MDPs. In Sects. 3 and 4 we consider verification problems for P-VASS-MDP and 1-VASS-MDP, respectively. In Sect. 5 we summarize the decidability results (Table 1) and outline future work. Omitted proofs can be found in [2].

2 Models and Verification Problems

Let \(\mathbb {N}\) (resp. \(\mathbb {Z}\)) denote the set of nonnegative integers (resp. integers). For two integers i, j such that \(i \le j\) we use \([i..j]\) to represent the set \(\{ k \in \mathbb {Z}\mid i \le k \le j \}\). Given a set X and \(n \in \mathbb {N}\setminus \{ 0 \}\), \(X^n\) is the set of n-dimensional vectors with values in X. We use \(\mathbf{0 }\) to denote the vector such that \(\mathbf{0 }(i)=0\) for all \(i \in [1..n]\). The classical order on \(\mathbb {Z}^n\) is denoted \(\le \) and is defined by \(\mathbf{v } \le \mathbf{w }\) if and only if \(\mathbf{v }(i) \le \mathbf{w }(i)\) for all \(i \in [1..n]\). We also define the operation \(+\) over n-dimensional vectors of integers in the classical way (i.e., for \(\mathbf{v }\), \(\mathbf{v }' \in \mathbb {Z}^n\), \(\mathbf{v } + \mathbf{v }'\) is defined by \((\mathbf{v }+\mathbf{v }')(i)=\mathbf{v }(i)+\mathbf{v }'(i)\) for all \(i \in [1..n]\)). Given a set S, we use \(S^*\) (respectively \(S^\omega \)) to denote the set of finite (respectively infinite) sequences of elements of S. We now recall the notion of well-quasi-ordering (which we abbreviate as wqo). A quasi-order \((A,\preceq )\) is a wqo if for every infinite sequence of elements \(a_1,a_2,\ldots \) in A, there exist two indices \(i<j\) such that \(a_i \preceq a_j\). For \(n>0\), \((\mathbb {N}^n,\le )\) is a wqo. Given a set A with an ordering \(\preceq \) and a subset \(B \subseteq A\), the set B is said to be upward closed in A if \(a_1 \in B\), \(a_2 \in A\) and \(a_1 \preceq a_2\) implies \(a_2 \in B\).

2.1 Markov Decision Processes

A probability distribution on a countable set X is a function \(f: X \mapsto [0,1]\) such that \(\sum _{x \in X}f(x)=1\). We use \(\mathcal{D}(X)\) to denote the set of all probability distributions on X. We first recall the definition of Markov decision processes.

Definition 1

(MDPs). A Markov decision process (MDP) M is a tuple \(\langle C,C_1, C_P,A,\rightarrow ,p \rangle \) where: C is a countable set of configurations partitioned into \(C_1\) and \(C_P\) (that is \(C=C_1 \cup C_P\) and \(C_1 \cap C_P=\emptyset \)); A is a set of actions; \(\rightarrow \subseteq C \times A \times C\) is a transition relation; \(p: C_P \mapsto \mathcal{D}(C)\) is a partial function which assigns to some configurations in \(C_P\) probability distributions on C such that \(p(c)(c')>0\) if and only if \(c \xrightarrow {a} c'\) for some \(a \in A\).

Note that our definition is equivalent to viewing MDPs as games played between a nondeterministic player (Player 1) and a probabilistic player (Player P). The set \(C_1\) contains the nondeterministic configurations (or configurations of Player 1) and the set \(C_P\) contains the probabilistic configurations (or configurations of Player P). Given two configurations \(c,c'\) in C, we write \(c\rightarrow c'\) whenever there exists \(a \in A\) such that \(c \xrightarrow {a} c'\). We say that a configuration \(c \in C\) is a deadlock if there does not exist \(c' \in C\) such that \(c \rightarrow c'\). We use \(C^{df}_1\) (resp. \(C^{df}_P\)) to denote the configurations of Player 1 (resp. of Player P) which are not deadlocks (df stands here for deadlock free).

A play of the MDP \(M=\langle C,C_1,C_P,A,\rightarrow ,p \rangle \) is either an infinite sequence of the form \(c_0 \xrightarrow {a_0} c_1 \xrightarrow {a_1} c_2 \cdots \) or a finite sequence \(c_0 \xrightarrow {a_0} c_1 \xrightarrow {a_1} c_2 \cdots \xrightarrow {a_{k-1}} c_k\). We call the first kind of play an infinite play, and the second one a finite play. A play is said to be maximal whenever it is infinite or it ends in a deadlock configuration. These latter plays are called deadlocked plays. We use \(\varOmega \) to denote the set of maximal plays. For a finite play \(\rho =c_0 \xrightarrow {a_0} c_1 \xrightarrow {a_1} c_2 \cdots \xrightarrow {a_{k-1}} c_k\), let \(c_k= last (\rho )\). We use \(\varOmega ^{df}_1\) to denote the set of finite plays \(\rho \) such that \( last (\rho ) \in C^{df}_1\).

A strategy for Player 1 is a function \(\sigma : \varOmega ^{df}_1 \mapsto C\) such that, for all \(\rho \in \varOmega ^{df}_1\) and \(c \in C\), if \(\sigma (\rho )=c\) then \( last (\rho ) \rightarrow c\). Intuitively, given a finite play \(\rho \), which represents the history of the game so far, the strategy represents the choice of Player 1 among the different possible successor configurations from \( last (\rho )\). We use \(\varSigma \) to denote the set of all strategies for Player 1. Given a strategy \(\sigma \in \varSigma \), an infinite play \(c_0 \xrightarrow {a_0} c_1 \xrightarrow {a_1} c_2 \cdots \) respects \(\sigma \) if for every \(k \in \mathbb {N}\), we have that if \(c_k \in C_1\) then \(c_{k+1}=\sigma (c_0 \xrightarrow {a_0} c_1 \xrightarrow {a_1} c_2 \cdots c_{k})\) and if \(c_k \in C_P\) then \(p(c_k)(c_{k+1}) >0\). We define finite plays that respect \(\sigma \) similarly. Let \(\mathtt {Plays}(M,c,\sigma ) \subseteq \varOmega \) be the set of all maximal plays of M that start from c and that respect \(\sigma \).

Note that once a starting configuration \(c_0 \in C\) and a strategy \(\sigma \) have been chosen, the MDP is reduced to an ordinary stochastic process. We define an event \(\mathcal{A}\subseteq \varOmega \) as a measurable set of plays and we use \(\mathbb {P}(M,c,\sigma ,\mathcal{A})\) to denote the probability of event \(\mathcal{A}\) starting from \(c \in C\) under strategy \(\sigma \). The notation \(\mathbb {P}^+(M,c,\mathcal{A})\) will be used to represent the maximal probability of event \(\mathcal{A}\) starting from c which is defined as \(\mathbb {P}^+(M,c,\mathcal{A})=\text{ sup }_{\sigma \in \varSigma } \mathbb {P}(M,c,\sigma ,\mathcal{A})\).

2.2 VASS-MDPs

Probabilistic Vector Addition Systems with States have been studied, e.g., in [3]. Here we extend this model with non-deterministic choices made by a controller. We call this new model VASS-MDPs. We first recall the definition of Vector Addition Systems with States.

Definition 2

(VASS). For \(n>0\), an n-dimensional Vector Addition System with States (VASS) is a tuple \(S=\langle Q,T \rangle \) where Q is a finite set of control states and \(T \subseteq Q \times \mathbb {Z}^n \times Q\) is the transition relation labelled with vectors of integers.

In the sequel, we will not always make precise the dimension of the considered VASS. Configurations of a VASS are pairs \(\langle q,\mathbf{v } \rangle \in Q \times \mathbb {N}^n\). Given a configuration \(\langle q'',\mathbf{v } \rangle \) and a transition \(t=\langle q,\mathbf{z },q' \rangle \) in T, we say that t is enabled at \(\langle q'',\mathbf{v } \rangle \) if \(q=q''\) and \(\mathbf{v } + \mathbf{z } \ge \mathbf{0 }\). Let then \(\mathtt {En}(q,\mathbf{v })\) be the set \(\{ t \in T \mid t \text{ is } \text{ enabled } \text{ at } \langle q,\mathbf{v } \rangle \}\). In case the transition \(t=\langle q,\mathbf{z },q' \rangle \) is enabled at \(\langle q,\mathbf{v } \rangle \), we define \(t(q,\mathbf{v })=\langle q',\mathbf{v }' \rangle \) where \(\mathbf{v }'=\mathbf{v } + \mathbf{z }\). An n-dimensional VASS S induces a labelled transition system \(\langle C,T,\rightarrow \rangle \) where \(C=Q \times \mathbb {N}^n\) is the set of configurations and the transition relation \(\rightarrow \subseteq C \times T \times C\) is defined as follows: \(\langle q,\mathbf{v } \rangle \xrightarrow {t} \langle q',\mathbf{v }' \rangle \text{ iff } \langle q',\mathbf{v }' \rangle =t(q,\mathbf{v })\). VASS are sometimes seen as programs manipulating integer variables, a.k.a. counters. When a transition of a VASS changes the i-th value of a vector \(\mathbf{v }\), we will sometimes say that it modifies the value of the i-th counter. We now show how to add probability distributions to VASS.
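The enabledness condition and the successor function \(t(q,\mathbf{v })\) can be sketched in a few lines of Python. This is only an illustration: the tuple encoding, the function names `is_enabled`, `enabled` and `fire`, and the example transition set `T` are assumptions made here and do not come from the paper.

```python
from typing import List, Tuple

State = str
Vector = Tuple[int, ...]
Transition = Tuple[State, Vector, State]   # (q, z, q')
Config = Tuple[State, Vector]              # (q, v) with v in N^n


def is_enabled(t: Transition, config: Config) -> bool:
    """t = (q, z, q') is enabled at (q'', v) iff q == q'' and v + z >= 0."""
    (src, z, _dst), (q, v) = t, config
    return src == q and all(vi + zi >= 0 for vi, zi in zip(v, z))


def enabled(transitions: List[Transition], config: Config) -> List[Transition]:
    """En(q, v): the transitions of T that are enabled at the configuration."""
    return [t for t in transitions if is_enabled(t, config)]


def fire(t: Transition, config: Config) -> Config:
    """t(q, v) = (q', v + z); only defined when t is enabled."""
    if not is_enabled(t, config):
        raise ValueError("transition is not enabled at this configuration")
    (_src, z, dst), (_q, v) = t, config
    return dst, tuple(vi + zi for vi, zi in zip(v, z))


# Hypothetical 2-dimensional VASS, used only for illustration.
T = [("q0", (1, 0), "q0"), ("q0", (0, -1), "q1"), ("q1", (-1, 1), "q0")]
```

With this encoding, `enabled(T, ("q0", (0, 0)))` contains only the incrementing self-loop, because the transition to `q1` would make the second counter negative.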

Definition 3

(VASS-MDP). A VASS-MDP is a tuple \(S=\langle Q,Q_1,Q_P,T,\tau \rangle \) where \(\langle Q,T \rangle \) is a VASS for which the set of control states Q is partitioned into \(Q_1\) and \(Q_P\), and \(\tau : T \mapsto \mathbb {N}\setminus \{ 0 \}\) is a partial function assigning to each transition a weight which is a positive natural number.

Nondeterministic (resp. probabilistic) choices are made from control states in \(Q_1\) (resp. \(Q_P\)). The subset of transitions from control states of \(Q_1\) (resp. control states of \(Q_P\)) is denoted by \(T_1\) (resp. \(T_P\)). Hence \(T =T_1 \cup T_P\) with \(T_1 \subseteq Q_1 \times \mathbb {Z}^n \times Q\) and \(T_P \subseteq Q_P \times \mathbb {Z}^n \times Q\). A VASS-MDP \(S=\langle Q,Q_1,Q_P,T,\tau \rangle \) induces an MDP \(M_S=\langle C,C_1,C_P,T,\rightarrow ,p \rangle \) where: \(\langle C,T,\rightarrow \rangle \) is the labelled transition system associated with the VASS \(\langle Q,T \rangle \); \(C_1=Q_1 \times \mathbb {N}^n\) and \(C_P=Q_P \times \mathbb {N}^n\); and for all \(c \in C^{df}_P\) and \(c' \in C\), if \(c \rightarrow c'\), the probability of going from c to \(c'\) is defined by \(p(c)(c')=(\sum _{\{ t \mid t(c)=c' \}} \tau (t)) / (\sum _{t \in \mathtt {En}(c)} \tau (t))\), whereas if \(c \not \rightarrow c'\), we have \(p(c)(c')=0\). Note that the MDP \(M_S\) is well-defined: when defining \(p(c)(c')\) in the case \(c \rightarrow c'\), there exists at least one transition in \(\mathtt {En}(c)\) and consequently the sum \(\sum _{t \in \mathtt {En}(c)} \tau (t)\) is never equal to 0. Also, we could have restricted the weights to be assigned only to transitions leaving from a control state in \(Q_P\) since we do not take into account the weights assigned to the other transitions. A VASS-MDP is deadlock free if its underlying VASS is deadlock free.
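The way weights induce the distribution \(p(c)\) can be made concrete with a short Python sketch using exact fractions. The dictionary encoding of \(\tau \), the function name `successor_distribution` and the example weights are hypothetical and serve only to illustrate the definition above.

```python
from fractions import Fraction
from typing import Dict, Tuple

State = str
Vector = Tuple[int, ...]
Transition = Tuple[State, Vector, State]   # (q, z, q')
Config = Tuple[State, Vector]              # (q, v)


def successor_distribution(weights: Dict[Transition, int],
                           config: Config) -> Dict[Config, Fraction]:
    """Return p(c) for a probabilistic configuration c.

    p(c)(c') is the total weight of the enabled transitions leading from c
    to c', divided by the total weight of all transitions enabled at c.
    For a deadlock the result is the empty dictionary.
    """
    q, v = config
    mass: Dict[Config, int] = {}
    total = 0
    for (src, z, dst), w in weights.items():
        target = tuple(vi + zi for vi, zi in zip(v, z))
        if src == q and all(x >= 0 for x in target):   # t is in En(c)
            total += w
            mass[(dst, target)] = mass.get((dst, target), 0) + w
    return {c2: Fraction(w, total) for c2, w in mass.items()}


# Hypothetical weights tau for a 1-dimensional example with one probabilistic state "p".
tau = {("p", (-1,), "a"): 1, ("p", (0,), "b"): 3}
```

At `("p", (2,))` both transitions are enabled, so the successors get probabilities 1/4 and 3/4. At `("p", (0,))` the decrement is disabled and the remaining transition gets probability 1, which shows that the distribution depends on the counter values and not only on the control state.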

Finally, as in [18] or [4], we will see that to gain decidability it is useful to restrict the power of the nondeterministic player or of the probabilistic player by restricting their ability to modify the counters’ values and hence letting them only choose a control location. This leads to the two following definitions: a P-VASS-MDP is a VASS-MDP \(\langle Q,Q_1,Q_P,T,\tau \rangle \) such that for all \(\langle q,\mathbf{z },q' \rangle \in T_1\), we have \(\mathbf{z }=\mathbf{0 }\) and a 1-VASS-MDP is a VASS-MDP \(\langle Q,Q_1,Q_P,T,\tau \rangle \) such that for all \(\langle q,\mathbf{z },q' \rangle \in T_P\), we have \(\mathbf{z }=\mathbf{0 }\). In other words, in a P-VASS-MDP, Player 1 cannot change the counter values when taking a transition and, in a 1-VASS-MDP, it is Player P who cannot do so.
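Both restrictions are purely syntactic and can be checked by a single pass over the transitions. The following Python sketch assumes transitions encoded as triples `(q, z, q')` and a set of Player 1 control states; the function names and the example are hypothetical.

```python
from typing import Iterable, Set, Tuple

Transition = Tuple[str, Tuple[int, ...], str]   # (q, z, q')


def is_p_vass_mdp(transitions: Iterable[Transition], player1: Set[str]) -> bool:
    """True iff every transition leaving a Player 1 state has z = 0."""
    return all(all(x == 0 for x in z)
               for (q, z, _q2) in transitions if q in player1)


def is_1_vass_mdp(transitions: Iterable[Transition], player1: Set[str]) -> bool:
    """True iff every transition leaving a Player P state has z = 0."""
    return all(all(x == 0 for x in z)
               for (q, z, _q2) in transitions if q not in player1)


# Hypothetical example: "q" belongs to Player 1, "p" to Player P.
example = [("q", (0, 0), "p"), ("p", (1, -1), "q")]
```

Here `is_p_vass_mdp(example, {"q"})` holds while `is_1_vass_mdp(example, {"q"})` does not, since only the probabilistic state `p` changes the counters.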

2.3 Verification Problems for VASS-MDPs

We consider qualitative verification problems for VASS-MDPs, taking as objectives control-state reachability and repeated reachability. To simplify the presentation, we consider a single target control-state \(q_F \in Q\). However, our positive decidability results easily carry over to sets of target control-states (while the negative ones trivially do). Note however, that asking to reach a fixed target configuration like \(\langle q_F,\mathbf{0 } \rangle \) is a very different problem (cf. [3]).

Let \(S=\langle Q,Q_1,Q_P,T,\tau \rangle \) be a VASS-MDP and \(M_S\) its associated MDP. Given a control state \(q_F \in Q\), we denote by \(\llbracket \diamondsuit q_F \rrbracket \) the set of infinite plays \(c_0 \cdot c_1 \cdot \cdots \) and deadlocked plays \(c_0 \cdot \cdots \cdot c_l\) of \(M_S\) for which there exists an index \(k \in \mathbb {N}\) such that \(c_k=\langle q_F,\mathbf{v } \rangle \) for some \(\mathbf{v } \in \mathbb {N}^n\). Similarly, \(\llbracket \Box \diamondsuit q_F \rrbracket \) characterizes the set of infinite plays \(c_0 \cdot c_1 \cdot \cdots \) of \(M_S\) for which the set \(\{ i \in \mathbb {N}\mid c_i=\langle q_F,\mathbf{v } \rangle \text{ for } \text{ some } \mathbf{v } \in \mathbb {N}^n \}\) is infinite. Since \(M_S\) is an MDP with a countable number of configurations, we know that the sets of plays \(\llbracket \diamondsuit q_F \rrbracket \) and \(\llbracket \Box \diamondsuit q_F \rrbracket \) are measurable (for more details see for instance [5]), and are hence events for \(M_S\). Given an initial configuration \(c_0 \in Q \times \mathbb {N}^n\) and a control-state \(q_F \in Q\), we consider the following questions:

  1. The sure reachability problem: Does there exist a strategy \(\sigma \in \varSigma \) such that \(\mathtt {Plays}(M_S,c_0,\sigma ) \subseteq \llbracket \diamondsuit q_F \rrbracket \)?

  2. The almost-sure reachability problem: Does there exist a strategy \(\sigma \in \varSigma \) such that \(\mathbb {P}(M_S,c_0,\sigma ,\llbracket \diamondsuit q_F \rrbracket )=1\)?

  3. The limit-sure reachability problem: Does \(\mathbb {P}^+(M_S,c_0,\llbracket \diamondsuit q_F \rrbracket )=1\)?

  4. The sure repeated reachability problem: Does there exist a strategy \(\sigma \in \varSigma \) such that \(\mathtt {Plays}(M_S,c_0,\sigma ) \subseteq \llbracket \Box \diamondsuit q_F \rrbracket \)?

  5. The almost-sure repeated reachability problem: Does there exist a strategy \(\sigma \in \varSigma \) such that \(\mathbb {P}(M_S,c_0,\sigma ,\llbracket \Box \diamondsuit q_F \rrbracket )=1\)?

  6. The limit-sure repeated reachability problem: Does \(\mathbb {P}^+(M_S,c_0,\llbracket \Box \diamondsuit q_F \rrbracket )=1\)?

Note that sure reachability implies almost-sure reachability, which itself implies limit-sure reachability, but not vice-versa, as shown by the counterexamples in Fig. 1 (see also [7]). The same holds for repeated reachability. For the sure problems, probabilities are not taken into account, and thus these problems can be interpreted as the answer to a two player reachability game played on the transition system of S. Such games have been studied for instance in [1, 4, 18]. Finally, VASS-MDPs subsume deadlock-free VASS-MDPs and thus decidability (resp. undecidability) results carry over to the smaller (resp. larger) class.

Fig. 1. Two 1-dimensional VASS-MDPs. The circles (resp. squares) are the control states of Player 1 (resp. Player P). All transitions have the same weight 1. From \(\langle q_0,0 \rangle \), the state \(q_F\) is reached almost-surely, but not surely, due to the possible run with an infinite loop at \(q_0\) (which has probability zero). From \(\langle q_1,0 \rangle \), the state \(q_F\) can be reached limit-surely (by a family of strategies that repeats the loop at \(q_1\) more and more often), but not almost-surely (or surely), since every strategy has a chance of getting stuck at state \(q_2\) with counter value zero.

2.4 Undecidability in the General Case

It was shown in [1] that the sure reachability problem is undecidable for (2-dimensional) two player VASS. From this we can deduce that the sure reachability problem is undecidable for VASS-MDPs. We now present a similar proof to show the undecidability of the almost-sure reachability problem for VASS-MDPs.

For all of our undecidability results we use reductions from the undecidable control-state reachability problem for Minsky machines. A Minsky machine is a tuple \(\langle Q,T \rangle \) where Q is a finite set of states and T is a finite set of transitions manipulating two counters, say \(x_1\) and \(x_2\). Each transition is a triple of the form \(\langle q,x_i=0?,q' \rangle \) (counter \(x_i\) is tested for 0) or \(\langle q,x_i:=x_i+1,q' \rangle \) (counter \(x_i\) is incremented) or \(\langle q,x_i:=x_i-1,q' \rangle \) (counter \(x_i\) is decremented) where \(q,q' \in Q\). Configurations of a Minsky machine are triples in \(Q \times \mathbb {N}\times \mathbb {N}\). The transition relation \(\Rightarrow \) between configurations of the Minsky machine is then defined in the obvious way. Given an initial state \(q_I\) and a final state \(q_F\), the control-state reachability problem asks whether there exists a sequence of configurations \(\langle q_I,0,0 \rangle \Rightarrow \langle q_1,v_1,v'_1 \rangle \Rightarrow \ldots \Rightarrow \langle q_k,v_k,v'_k \rangle \) with \(q_k=q_F\). This problem is known to be undecidable [16]. W.l.o.g. we assume that Minsky machines are deadlock-free and deterministic (i.e., each configuration always has a unique successor) and that the only transition leaving \(q_F\) is of the form \(\langle q_F,x_1:=x_1+1,q_F \rangle \).
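To make the transition relation \(\Rightarrow \) concrete, here is a minimal Python sketch of a deterministic Minsky machine together with a bounded search for the final state. Since control-state reachability is undecidable, the search can only confirm reachability within a given number of steps; a negative answer is inconclusive. The encoding of operations as `("inc", i)`, `("dec", i)` and `("zero", i)` (with index 0 for \(x_1\) and 1 for \(x_2\)), the function names and the example machine are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, int]                    # ("inc" | "dec" | "zero", counter index)
Rule = Tuple[str, Op, str]              # (q, op, q')
Config = Tuple[str, Tuple[int, int]]    # (q, (x1, x2))


def step(rules: List[Rule], config: Config) -> Optional[Config]:
    """Return the successor of config, or None if no rule applies."""
    q, counters = config
    for src, (kind, i), dst in rules:
        if src != q:
            continue
        values = list(counters)
        if kind == "inc":
            values[i] += 1
            return dst, (values[0], values[1])
        if kind == "dec" and values[i] > 0:
            values[i] -= 1
            return dst, (values[0], values[1])
        if kind == "zero" and values[i] == 0:
            return dst, counters
    return None


def reaches_within(rules: List[Rule], q_init: str, q_final: str,
                   max_steps: int) -> bool:
    """True iff q_final is visited within max_steps steps from (q_init, 0, 0)."""
    config: Optional[Config] = (q_init, (0, 0))
    for _ in range(max_steps + 1):
        if config is None:
            return False
        if config[0] == q_final:
            return True
        config = step(rules, config)
    return False


# Hypothetical machine: increment x1 twice, count it back down to 0, then halt in qF.
machine = [
    ("q0", ("inc", 0), "q1"),
    ("q1", ("inc", 0), "q2"),
    ("q2", ("dec", 0), "q2"),
    ("q2", ("zero", 0), "qF"),
    ("qF", ("inc", 0), "qF"),
]
```

On this example the final state is reached after exactly five steps, so `reaches_within(machine, "q0", "qF", 5)` holds while a bound of 4 is too small.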

Fig. 2. Encoding \(\langle q_1,x_1:=x_1+1,q_2 \rangle \) and \(\langle q_3,x_2:=x_2-1,q_4 \rangle \) and \(\langle q_5,x_1=0?,q_6 \rangle \)

We now show how to reduce the control-state reachability problem to the almost-sure and limit-sure reachability problems in deadlock-free VASS-MDPs. From a Minsky machine, we construct a deadlock-free 2-dimensional VASS-MDP for which the control states of Player 1 are exactly the control states of the Minsky machine. The encoding is presented in Fig. 2 where the circles (resp. squares) are the control states of Player 1 (resp. Player P), and for each edge the corresponding weight is 1. The state \(\bot \) is an absorbing state from which the unique outgoing transition is a self loop that does not affect the values of the counters. This encoding allows us to deduce our first result.

Theorem 1

The sure, almost-sure and limit-sure (repeated) reachability problems are undecidable problems for 2-dimensional deadlock-free VASS-MDPs.

In the special case of 1-dimensional VASS-MDPs, the sure and almost-sure reachability problems are decidable [7].

2.5 Model-Checking \(\mu \)-calculus on Single-Sided VASS

It is well-known that there is a strong connection between model-checking branching time logics and games, and in our case we have in fact undecidability results for simple reachability games played on a VASS and for the model-checking of VASS with expressive branching-time logics [12]. However for this latter point, decidability can be regained by imposing some restrictions on the VASS structure [4] as we will now recall. We say that a VASS \(\langle Q,T \rangle \) is \((Q_1,Q_2)\)-single-sided iff \(Q_1\) and \(Q_2\) form a partition of the set of states Q such that for all transitions \(\langle q,\mathbf{z },q' \rangle \) in T with \(q \in Q_2\), we have \(\mathbf{z }=\mathbf{0 }\); in other words only the transitions leaving a state from \(Q_1\) are allowed to change the values of the counters. In [4], it has been shown that, thanks to a reduction to games played on a single-sided VASS with parity objectives, a large fragment of the \(\mu \)-calculus called \(L^{\textit{sv}}_\mu \) has a decidable model-checking problem over single-sided VASS. The idea of this fragment is that the “always” operator \(\Box \) is guarded with a predicate requiring the current control state to belong to \(Q_2\). Formally, the syntax of \(L^{\textit{sv}}_\mu \) for \((Q_1,Q_2)\)-single-sided VASS is given by the following grammar: \(\phi \,{:}{:}{=}\, q ~\mid ~ X ~\mid ~ \phi \wedge \phi ~\mid ~ \phi \vee \phi ~\mid ~ \diamondsuit \phi ~\mid ~ Q_2 \wedge \Box \phi ~\mid ~ \mu X.\phi ~\mid ~ \nu X.\phi \), where \(Q_2\) stands for the formula \(\bigvee _{q \in Q_2} q\) and X belongs to a set of variables \(\mathcal {X}\). The semantics of \(L^{\textit{sv}}_\mu \) is defined as usual: it associates to a formula \(\phi \) and to an environment \(\varepsilon : \mathcal {X}\rightarrow 2^C\) a subset of configurations \(\llbracket \phi \rrbracket _\varepsilon \). We use \(\varepsilon _0\) to denote the environment which assigns the empty set to any variable. Given an environment \(\varepsilon \), a variable \(X \in \mathcal {X}\) and a subset of configurations C, we use \(\varepsilon [X:=C]\) to represent the environment \(\varepsilon '\) which is equal to \(\varepsilon \) except on the variable X, where we have \(\varepsilon '(X)=C\). Finally the notation \(\llbracket \phi \rrbracket \) corresponds to the interpretation \(\llbracket \phi \rrbracket _{\varepsilon _0}\).

The problem of model-checking single-sided VASS with \(L^{\textit{sv}}_\mu \) can then be defined as follows: given a single-sided VASS \(\langle Q,T \rangle \), an initial configuration \(c_0\) and a formula \(\phi \) of \(L^{\textit{sv}}_\mu \), do we have \(c_0 \in \llbracket \phi \rrbracket \)?

Theorem 2

[4]. Model-checking single-sided VASS wrt. \(L^{\textit{sv}}_\mu \) is decidable.

3 Verification of P-VASS-MDPs

In [4] it is proved that parity games played on a single-sided deadlock-free VASS are decidable (this entails the decidability of model checking \(L^{\textit{sv}}_\mu \) over single-sided VASS). We will see here that in the case of P-VASS-MDPs, in which only the probabilistic player can modify the counters, the decidability status depends on the presence of deadlocks in the system.

3.1 Undecidability in Presence of Deadlocks

We point out that the reduction presented in Fig. 2 to prove Theorem 1 does not carry over to P-VASS-MDPs, because in that construction both players have the ability to change the counter values. However, it is possible to perform a similar reduction leading to the undecidability of verification problems for P-VASS-MDPs, the main difference being that we crucially exploit the fact that the P-VASS-MDP can contain deadlocks.

We now explain the idea behind our encoding of Minsky machines into P-VASS-MDPs. Intuitively, Player 1 chooses a transition of the Minsky machine to simulate, anticipating the modification of the counter values, and Player P is then in charge of performing the change. If Player 1 chooses a transition with a decrement and the accessed counter value is actually 0, then Player P will be in a deadlock state and consequently the desired control state will not be reached. Furthermore, if Player 1 decides to perform a zero-test when the counter value is strictly positive, then Player P is able to punish this choice by entering a deadlock state. Similarly to the proof of Theorem 1, Player P can test if the value of the counter is strictly greater than 0 by decrementing it. The encoding of the Minsky machine is presented in Fig. 3. Note that no outgoing edge of Player 1’s states changes the counter values. Furthermore, we see that Player P reaches the control state \(\bot \) if and only if Player 1 chooses to take a transition with a zero-test when the value of the tested counter is not equal to 0. Note that, with the encoding of the transition \(\langle q_3,x_2:=x_2-1,q_4 \rangle \), when Player P is in the control state between \(q_3\) and \(q_4\), it is in a deadlock if the value of the second counter is 0. In the sequel we will see that in P-VASS-MDPs without deadlocks the sure reachability problem becomes decidable.

Fig. 3. Encoding \(\langle q_1,x_1:=x_1+1,q_2 \rangle \) and \(\langle q_3,x_2:=x_2-1,q_4 \rangle \) and \(\langle q_5,x_1=0?,q_6 \rangle \)

From this encoding we deduce the following result.

Theorem 3

The sure, almost-sure and limit-sure (repeated) reachability problems are undecidable for 2-dimensional P-VASS-MDPs.

3.2 Sure (repeated) Reachability in Deadlock-Free P-VASS-MDPs

Unlike in the case of general P-VASS-MDPs, we will see that the sure (repeated) reachability problem is decidable for deadlock-free P-VASS-MDPs. Let \(S=\langle Q,Q_1,Q_P,T,\tau \rangle \) be a deadlock-free P-VASS-MDP, \(M_S=\langle C,C_1,C_P,T,\rightarrow ,p \rangle \) its associated MDP and \(q_F \in Q\) a control state. Note that because the P-VASS-MDP S is deadlock-free, Player P cannot steer the play into a deadlock to avoid the control state \(q_F\); all maximal plays are infinite. Since S is a P-VASS-MDP, the VASS \(\langle Q,T \rangle \) is \((Q_P,Q_1)\)-single-sided. In [1, 18], it has been shown that control-state reachability games on deadlock-free single-sided VASS are decidable, and this result has been extended to parity games in [4]. This implies the decidability of sure (repeated) reachability in deadlock-free P-VASS-MDPs. However, to obtain a generic way of verifying these systems, we construct formulae of \(L^{\textit{sv}}_\mu \) that characterize the sets of winning configurations and then apply Theorem 2. Let \(V^P_S\) be the set of configurations from which the answer to the sure reachability problem (with \(q_F\) as state to be reached) is negative, i.e., \(V^P_S=\{ c \in C \mid \not \exists \sigma \in \varSigma \text{ s.t. } \mathtt {Plays}(M_S,c,\sigma ) \subseteq \llbracket \diamondsuit q_F \rrbracket \}\) and similarly let \(W^P_S=\{ c \in C \mid \not \exists \sigma \in \varSigma \text{ s.t. } \mathtt {Plays}(M_S,c,\sigma ) \subseteq \llbracket \Box \diamondsuit q_F \rrbracket \}\). The next lemma relates these two sets with formulae of \(L^{\textit{sv}}_\mu \) (where \(Q_P\) corresponds to the formula \(\bigvee _{q \in Q_P} q\) and \(Q_1\) corresponds to the formula \(\bigvee _{q \in Q_1} q\)).

Lemma 1

  • \(V^P_S=\llbracket \nu X. (\bigvee _{q \in Q \setminus \{ q_F \}} q) \wedge (Q_1 \vee \diamondsuit X) \wedge (Q_P \vee (Q_1 \wedge \Box X)) \rrbracket \).

  • \(W^P_S=\llbracket \mu Y. \nu X. \big ( (\bigvee _{q \in Q \setminus \{ q_F \}} q) \wedge (Q_1 \vee \diamondsuit X) \wedge (Q_P \vee (Q_1 \wedge \Box X)) \vee (q_F \wedge Q_P \wedge \diamondsuit Y) \vee (q_F \wedge Q_1 \wedge \Box Y) \big ) \rrbracket \)

We use \((Q_P \vee (Q_1 \wedge \Box X))\) instead of \((Q_P \vee \Box X)\) so that the formulae are in the guarded fragment of the \(\mu \)-calculus. Since the two formulae belong to \(L^{\textit{sv}}_\mu \) for the \((Q_P,Q_1)\)-single-sided VASS S, decidability follows directly from Theorem 2.
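The greatest fixed point in the first formula of Lemma 1 can be illustrated on a finite, deadlock-free game graph, where it is computed by simple iteration. The Python sketch below is only such a finite illustration: the configuration space of a VASS is infinite, and the decision procedure of [4] behind Theorem 2 is not reproduced here. The function name and the example graph are hypothetical.

```python
from typing import Dict, Set


def cannot_surely_reach(nodes: Set[str], player1: Set[str],
                        edges: Dict[str, Set[str]], targets: Set[str]) -> Set[str]:
    """Nodes from which Player 1 has no strategy that surely reaches a target.

    Iterates X := { c not in targets :
                    (c belongs to Player P and some successor is in X) or
                    (c belongs to Player 1 and all successors are in X) }
    starting from all non-target nodes, until X is stable.
    The graph is assumed to be finite and deadlock-free.
    """
    x = set(nodes) - set(targets)
    while True:
        nxt = set()
        for c in x:
            if c in player1:
                keep = all(s in x for s in edges[c])   # Box X
            else:
                keep = any(s in x for s in edges[c])   # Diamond X
            if keep:
                nxt.add(c)
        if nxt == x:
            return x
        x = nxt


# Hypothetical graph: "a", "c", "s" belong to Player 1, "b" and "f" to Player P.
nodes = {"a", "b", "c", "s", "f"}
edges = {"a": {"b"}, "b": {"a", "f"}, "c": {"s", "f"}, "s": {"s"}, "f": {"f"}}
losing = cannot_surely_reach(nodes, {"a", "c", "s"}, edges, {"f"})
```

In this example `losing` is `{"a", "b", "s"}`: from `a` the probabilistic state `b` may return to `a` forever, so `f` is not reached surely even though that single run has probability zero, whereas from `c` Player 1 can move to `f` directly.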

Theorem 4

The sure reachability and sure repeated reachability problems are decidable for deadlock-free P-VASS-MDPs.

3.3 Almost-Sure and Limit-Sure Reachability in Deadlock-Free P-VASS-MDPs

We have seen that, unlike in the general case, the sure reachability and sure repeated reachability problems are decidable for deadlock-free P-VASS-MDPs, with deadlock-freeness being necessary to obtain decidability. For the corresponding almost-sure and limit-sure problems we now show undecidability, again using a reduction from the reachability problem for two-counter Minsky machines, as shown in Fig. 4. The main difference with the construction used for the proof of Theorem 3 lies in the addition of a self-loop in the encoding of the transitions for decrementing a counter, in order to avoid deadlocks. If Player 1, from a configuration \(\langle q_3,\mathbf{v } \rangle \), chooses the transition \(\langle q_3,x_2:=x_2-1,q_4 \rangle \) which decrements the second counter, then the probabilistic state with the self-loop is entered, and there are two possible cases: if \(\mathbf{v }(2) > 0\) then the probability of staying forever in this loop is 0 and the probability of eventually going to state \(q_4\) is 1; on the other hand, if \(\mathbf{v }(2)=0\) then the probability of staying forever in the self-loop is 1, since the other transition leaving the state of Player P, which actually performs the decrement on the second counter, is not enabled. Note that such a construction does not work in the case of sure reachability, because the path that stays forever in the loop is a valid path.
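The two cases can be checked numerically. The Python sketch below computes the probability that the play is still in the self-loop after n rounds, assuming (as an illustration only, since Fig. 4 is not reproduced here) that the self-loop and the decrementing transition both have weight 1. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def prob_still_looping(n: int, exit_enabled: bool,
                       loop_weight: int = 1, exit_weight: int = 1) -> Fraction:
    """Probability of having taken the self-loop in each of the first n rounds.

    If the decrement is enabled (counter value > 0), each round stays in the
    loop with probability loop_weight / (loop_weight + exit_weight).
    If it is not enabled (counter value 0), the self-loop is the only choice.
    """
    if exit_enabled:
        stay = Fraction(loop_weight, loop_weight + exit_weight)
    else:
        stay = Fraction(1)
    return stay ** n
```

With the decrement enabled, `prob_still_looping(n, True)` equals \((1/2)^n\), which tends to 0, so \(q_4\) is reached with probability 1. With the decrement disabled the value is 1 for every n, so the play stays in the loop forever.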

Fig. 4. Encoding \(\langle q_1,x_1:=x_1+1,q_2 \rangle \) and \(\langle q_3,x_2:=x_2-1,q_4 \rangle \) and \(\langle q_5,x_1=0?,q_6 \rangle \)

This allows us to deduce the following result for deadlock free P-VASS-MDPs.

Theorem 5

The almost-sure and limit-sure (repeated) reachability problems are undecidable for 2-dimensional deadlock-free P-VASS-MDPs.

4 Verification of 1-VASS-MDPs

In this section, we will provide decidability results for the subclass of 1-VASS-MDPs. As for deadlock-free P-VASS-MDPs, the proofs for sure and almost-sure problems use the decidability of \(L^{\textit{sv}}_\mu \) over single-sided VASS, whereas the technique used to show decidability of limit-sure reachability is different.

4.1 Sure Problems in 1-VASS-MDPs

First we show that, unlike for P-VASS-MDPs, deadlocks do not matter for 1-VASS-MDPs. The idea is that in this case, if the deadlock is in a probabilistic configuration, it means that there is no outgoing edge (because of the property of 1-VASS-MDPs), and hence one can add an edge to a new absorbing state, and the same can be done for the states of Player 1. Such a construction does not work for P-VASS-MDPs, because in that case deadlocks in probabilistic configurations may depend on the counter values, and not just on the current control-state.

Lemma 2

The sure (resp. almost-sure, resp. limit-sure) (repeated) reachability problem for 1-VASS-MDPs reduces to the sure (resp. almost-sure, resp. limit-sure) (repeated) reachability problem for deadlock-free 1-VASS-MDPs.

Hence in the sequel we will consider only deadlock-free 1-VASS-MDPs. Let \(S=\langle Q,Q_1,Q_P,T,\tau \rangle \) be a deadlock-free 1-VASS-MDP. For the sure (repeated) reachability problems, we can directly reuse the results of Lemma 1 and then show that the complements of the formulae given in that lemma belong to \(L^{\textit{sv}}_\mu \) for the \((Q_1,Q_P)\)-single-sided VASS \(\langle Q,T \rangle \) (in fact, the correctness of Lemma 1 did not depend on the fact that we were considering P-VASS-MDPs). Theorem 2 allows us to retrieve the decidability results already expressed in [18] (for sure reachability) and [4] (for sure repeated reachability).

Theorem 6

The sure (repeated) reachability problem is decidable for 1-VASS-MDPs.

4.2 Almost-Sure Problems in 1-VASS-MDPs

We now move to the case of almost-sure problems in 1-VASS-MDPs. We consider a deadlock-free 1-VASS-MDP \(S=\langle Q,Q_1,Q_P,T,\tau \rangle \) and its associated MDP \(M_S=\langle C,C_1,C_P,\rightarrow ,p \rangle \). We will see that, unlike for P-VASS-MDPs, here it is also possible to characterize by formulae of \(L^{\textit{sv}}_\mu \) the following two sets: \(V^1_{AS}=\{ c \in C \mid \exists \sigma \in \varSigma \text{ such } \text{ that } \mathbb {P}(M_S,c,\sigma ,\llbracket \diamondsuit q_F \rrbracket )=1 \}\) and \(W^1_{AS}=\{ c \in C \mid \exists \sigma \in \varSigma \text{ such } \text{ that } \mathbb {P}(M_S,c,\sigma ,\llbracket \Box \diamondsuit q_F \rrbracket )=1 \}\), i.e. the sets of configurations from which Player 1 has a strategy to reach the control state \(q_F\), respectively to visit \(q_F\) infinitely often, with probability 1.

We begin by introducing the following formula of \(L^{\textit{sv}}_\mu \) based on the variables X and Y: \(\mathtt {InvPre}(X,Y)= (Q_1 \wedge \diamondsuit (X \wedge Y)) \vee (\diamondsuit Y \wedge Q_P \wedge \Box X)\). Note that \(\mathtt {InvPre}(X,Y)\) is a formula of \(L^{\textit{sv}}_\mu \) for the \((Q_1,Q_P)\)-single-sided VASS \(\langle Q,T \rangle \). Intuitively, this formula represents the set of configurations from which (i) Player 1 can make a transition to the set represented by the intersection of the sets characterized by the variables X and Y and (ii) Player P can make a transition to the set Y and cannot avoid making a transition to the set X.
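To make the two disjuncts concrete, here is a minimal sketch of \(\mathtt {InvPre}\) read as an operator on sets of states of a finite graph. The encoding is an assumption made for illustration: `succ` maps each state to its set of successors, `player1` is the set of states owned by Player 1, and every other state is treated as probabilistic. In the paper the operator is interpreted over the infinite configuration space of a VASS, which this sketch does not handle.

```python
def inv_pre(X, Y, succ, player1):
    """Set-based reading of InvPre(X, Y) on a finite graph.

    succ: dict mapping each state to the set of its successors.
    player1: set of states owned by Player 1; all others are probabilistic.
    """
    result = set()
    for s, targets in succ.items():
        if s in player1:
            # Q_1 and <>(X and Y): some successor lies in both X and Y.
            if any(t in X and t in Y for t in targets):
                result.add(s)
        else:
            # Q_P and <>Y and []X: some successor in Y, all successors in X.
            if any(t in Y for t in targets) and all(t in X for t in targets):
                result.add(s)
    return result


# Hypothetical example graph: "b" is the only probabilistic state.
succ = {
    "a": {"b", "c"},
    "b": {"goal", "sink"},
    "c": {"c"},
    "goal": {"goal"},
    "sink": {"sink"},
}
player1 = {"a", "c", "goal", "sink"}
print(inv_pre({"b", "goal"}, {"b"}, succ, player1))
```

In this example only `a` qualifies: it is a Player 1 state with a successor (`b`) in both sets, while the probabilistic state `b` has no successor in Y.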

Almost Sure Reachability. We will now prove that \(V^1_{AS}\) can be characterized by the following formula of \(L^{\textit{sv}}_\mu \): \(\nu X. \mu Y. ( q_F \vee \mathtt {InvPre}(X,Y))\). Note that a similar result exists for finite-state MDPs, see e.g. [9]; this result in general does not extend to infinite-state MDPs, but in the case of VASS-MDPs it can be applied. Before proving this we need some intermediate results.

We denote by E the set \(\llbracket \nu X. \mu Y. \big ( q_F \vee \mathtt {InvPre}(X,Y)\big ) \rrbracket _{\varepsilon _0}\). Since \(\nu X. \mu Y. \big ( q_F \vee \mathtt {InvPre}(X,Y)\big )\) is a formula of \(L^{\textit{sv}}_\mu \) interpreted over the single-sided VASS \(\langle Q,T \rangle \), we can show that E is an upward-closed set. We now need another lemma which states that there exist \(N \in \mathbb {N}\) and a strategy for Player 1 such that, from any configuration of E, Player 1 can reach the control state \(q_F\) in at most N steps and Player P cannot take the play outside of E. The fact that we can bound the number of steps is crucial to show that \(\llbracket \nu X. \mu Y. \big ( q_F \vee \mathtt {InvPre}(X,Y)\big ) \rrbracket _{\varepsilon _0}\) is equal to \(V^1_{AS}\). For infinite-state MDPs where this property does not hold, our techniques do not apply.

Lemma 3

There exist \(N \in \mathbb {N}\) and a strategy \(\sigma \) of Player 1 such that for all \(c \in E\), there exists a play \(c \cdot c_1 \cdot c_2 \cdot \ldots \) in \(\mathtt {Plays}(M_S,c,\sigma )\) satisfying the following three properties: (1) there exists \(0 \le i \le N\) such that \(c_i \in \llbracket q_F \rrbracket \); (2) for all \(0 \le j \le i\), \(c_j \in E\); (3) for all \(0 \le j \le i\), if \(c_j \in C_P\) then for all \(c'' \in C\) such that \(c_j \rightarrow c''\), we have \(c'' \in E\).

The previous lemma allows us to characterize \(V^1_{AS}\) with a formula of \(L^{\textit{sv}}_\mu \). The proof of the following result uses the fact that the number of steps is bounded, and also the fact that the sets described by closed \(L^{\textit{sv}}_\mu \) formulae are upward-closed. This makes the fixpoint iteration terminate in a finite number of steps.

Lemma 4

\(V^1_{AS}=\llbracket \nu X. \mu Y. ( q_F \vee \mathtt {InvPre}(X,Y)) \rrbracket \).
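On a finite-state MDP the nested fixpoint \(\nu X. \mu Y. ( q_F \vee \mathtt {InvPre}(X,Y))\) can be evaluated by naive iteration, which gives some intuition for the statement of Lemma 4. The following is a minimal sketch under that finite-state assumption only. It is not the procedure behind Theorem 2, which works with upward-closed sets of an infinite configuration space. The graph encoding (`succ`, `player1`) and all function names are illustrative assumptions.

```python
def inv_pre(X, Y, succ, player1):
    """InvPre(X, Y) on a finite graph (probabilistic = not in player1)."""
    out = set()
    for s, ts in succ.items():
        if s in player1:
            ok = any(t in X and t in Y for t in ts)
        else:
            ok = any(t in Y for t in ts) and all(t in X for t in ts)
        if ok:
            out.add(s)
    return out


def almost_sure_reach(targets, succ, player1):
    """Evaluate nu X. mu Y. (targets or InvPre(X, Y)) by iteration."""
    states = set(succ)
    X = set(states)  # greatest fixpoint: start from all states
    while True:
        Y = set()    # least fixpoint: start from the empty set
        while True:
            new_Y = (targets & states) | inv_pre(X, Y, succ, player1)
            if new_Y == Y:
                break
            Y = new_Y
        if Y == X:
            return X
        X = Y


# Hypothetical example: "p" and "q" are probabilistic states.
succ = {
    "s": {"p", "q"},
    "p": {"goal", "s"},
    "q": {"goal", "trap"},
    "goal": {"goal"},
    "trap": {"trap"},
}
player1 = {"s", "goal", "trap"}
print(sorted(almost_sure_reach({"goal"}, succ, player1)))
```

In this example `q` is excluded because it can fall into `trap` with positive probability, whereas `s` wins by always choosing `p`, which either reaches `goal` or returns to `s`.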

Since \(\langle Q,T \rangle \) is \((Q_1,Q_P)\)-single-sided and since the formula associated to \(V^1_{AS}\) belongs to \(L^{\textit{sv}}_\mu \), from Theorem 2 we deduce the following theorem.

Theorem 7

The almost-sure reachability problem is decidable for 1-VASS-MDPs.

Almost Sure Repeated Reachability. For the case of almost-sure repeated reachability we reuse the previously introduced formula \(\mathtt {InvPre}(X,Y)\). By reasoning as in the previous case, we obtain a characterization of the set \(W^1_{AS}\).

Lemma 5

\(W^1_{AS}=\llbracket \nu X. \mathtt {InvPre}(X,\mu Y.( q_F \vee \mathtt {InvPre}(X,Y))) \rrbracket \).
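As a finite-state illustration of the formula in Lemma 5, the sketch below evaluates \(\nu X. \mathtt {InvPre}(X,\mu Y.( q_F \vee \mathtt {InvPre}(X,Y)))\) by naive iteration. It assumes a finite graph given by `succ` and `player1` (hypothetical names, with every state outside `player1` treated as probabilistic) and is not the decision procedure used for 1-VASS-MDPs.

```python
def inv_pre(X, Y, succ, player1):
    """InvPre(X, Y) on a finite graph (probabilistic = not in player1)."""
    out = set()
    for s, ts in succ.items():
        if s in player1:
            ok = any(t in X and t in Y for t in ts)
        else:
            ok = any(t in Y for t in ts) and all(t in X for t in ts)
        if ok:
            out.add(s)
    return out


def almost_sure_repeated_reach(targets, succ, player1):
    """Evaluate nu X. InvPre(X, mu Y. (targets or InvPre(X, Y)))."""
    states = set(succ)
    X = set(states)  # outer greatest fixpoint
    while True:
        Y = set()    # inner least fixpoint, recomputed for the current X
        while True:
            new_Y = (targets & states) | inv_pre(X, Y, succ, player1)
            if new_Y == Y:
                break
            Y = new_Y
        new_X = inv_pre(X, Y, succ, player1)
        if new_X == X:
            return X
        X = new_X


# Hypothetical example: after visiting "goal" the play returns to "s".
succ = {
    "s": {"p", "q"},
    "p": {"goal", "s"},
    "q": {"goal", "trap"},
    "goal": {"s"},
    "trap": {"trap"},
}
player1 = {"s", "goal", "trap"}
print(sorted(almost_sure_repeated_reach({"goal"}, succ, player1)))

# Variant in which "goal" leads to the trap: it can be visited only once.
succ_once = dict(succ, goal={"trap"})
print(sorted(almost_sure_repeated_reach({"goal"}, succ_once, player1)))
```

The second call returns the empty set: `goal` is still reached with probability 1 from `s`, but it cannot be revisited, so no state satisfies the repeated reachability objective almost surely.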

As previously, this allows us to deduce the decidability of the almost-sure repeated reachability problem for 1-VASS-MDPs.

Theorem 8

The almost sure repeated reachability problem is decidable for 1-VASS-MDPs.

4.3 Limit-Sure Reachability in 1-VASS-MDP

We consider a slightly more general version of the limit-sure reachability problem with a set \(X \subseteq Q\) of target states instead of a single state \(q_F\), i.e., the standard case corresponds to \(X = \{q_F\}\).

We extend the set of natural numbers \(\mathbb {N}\) to \(\mathbb {N}_{*}= \mathbb {N} \cup \{*\}\) by adding an element \(*\notin \mathbb {N}\) with \(*+j=*-j=*\) and \(j < *\) for all \(j \in \mathbb {N}\). We then consider the set of vectors \(\mathbb {N}_{*}^{d}\). The projection of a vector \(\mathbf{v }\) in \(\mathbb {N}^{d}\) that eliminates the component indexed by a natural number k is defined by \(proj_k (\mathbf{v })(i) = \mathbf{v }(i) \) if \(i\ne k\) and \(proj_k (\mathbf{v })(i)=*\) otherwise.
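The arithmetic on \(\mathbb {N}_{*}\) and the projection can be illustrated with a few lines of code. This is a minimal sketch: the sentinel `STAR` and the helper names `add`, `less` and `proj` are hypothetical, vectors are tuples, and components are indexed from 1 as in the text. The text does not define whether \(*<*\) holds; the sketch assumes it does not.

```python
STAR = "*"  # hypothetical sentinel standing for the extra element *


def add(a, j):
    """a + j in N_*, for an integer offset j: * absorbs any offset."""
    return STAR if a == STAR else a + j


def less(a, b):
    """Strict order on N_*: every natural number lies below *.

    Assumption: * is not strictly below itself.
    """
    if a == STAR:
        return False
    if b == STAR:
        return True
    return a < b


def proj(k, v):
    """proj_k(v): replace component k (1-based) of the tuple v by *."""
    return tuple(STAR if i == k else x for i, x in enumerate(v, start=1))


v = (3, 0, 5)
print(proj(2, v))
print(add(STAR, -1), add(3, -1))
print(less(7, STAR), less(STAR, 7))
```

Because \(*-j=*\), a projected counter can always be decremented, which matches the intuition that the removed counter no longer constrains the system.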

Let \(Q_c\) represent control-states which are indexed by a color. The coloring functions \(col_i:Q \rightarrow Q_c\) create colored copies of control-states by \(col_i(q)= q_i\).

Given a 1-VASS-MDP \(S=\langle Q,Q_1,Q_P,T,\tau \rangle \) of dimension d, an index \(k \le d\) and a color i, the colored projection is defined as:

$$\begin{aligned} Proj_k(S,d,i)=\langle col_i(Q), col_i(Q_1), col_i(Q_P), proj_{k,i} (T),\tau _{k,i} \rangle \end{aligned}$$

where \(proj_{k,i} (T)= \{proj_{k,i} (t) | t \in T\}\) is the projection of the set of transitions T and \(proj_{k,i} (t)= \langle col_i(x), proj_k(\mathbf{z }),col_i(y) \rangle \) is the projection of transition \(t=\langle x,\mathbf{z },y \rangle \) obtained by removing component k and coloring the states x and y with color i. The transition weights carry over, i.e., \(\tau _{k,i}(t') = \sum \{\tau (t)\,|\, proj_{k,i} (t)=t'\}\).
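Since several transitions of T may collapse onto the same projected transition, their weights are added. The sketch below illustrates this under assumed encodings that are not part of the paper: a transition is a triple `(x, z, y)` with `z` a tuple of integers, a colored state `col_i(q)` is represented as the pair `(q, i)`, and `STAR` stands for \(*\).

```python
from collections import defaultdict

STAR = "*"  # hypothetical sentinel standing for the extra element *


def project_transitions(weights, k, color):
    """Compute tau_{k,i} on the projected, colored transitions.

    weights: dict mapping a transition (x, z, y) to its weight tau(t).
    k: 1-based index of the component to remove.
    color: the color i used to tag source and target states.
    """
    projected = defaultdict(int)
    for (x, z, y), w in weights.items():
        z_proj = tuple(STAR if i == k else c for i, c in enumerate(z, start=1))
        # Transitions with the same projection accumulate their weights.
        projected[((x, color), z_proj, (y, color))] += w
    return dict(projected)


weights = {
    ("q", (1, -1), "r"): 2,
    ("q", (1, 1), "r"): 3,
    ("r", (0, 0), "q"): 1,
}
for t, w in project_transitions(weights, k=2, color=1).items():
    print(t, w)
```

In this example the two transitions from `q` to `r` differ only in the second component, so after projecting on k = 2 they merge into a single transition of weight 5.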

We define the functions \(state: Q \times \mathbb {N}_{*}^{d} \rightarrow Q\) and \({ count}: Q \times \mathbb {N}_{*}^{d} \rightarrow \mathbb {N}_{*}^{d}\) such that for a configuration \(c = \langle q, \mathbf{v } \rangle \), where \(q \in Q\) and \(\mathbf v \in \mathbb {N}_{*}^{d}\), we have \(state(q, \mathbf v )=q\) and \(count(q, \mathbf v )= \mathbf v \). For any two configurations \(c_1\) and \(c_2\), we write \(c_1 \prec c_2\) to denote that \(state(c_1)=state(c_2)\), and there exists a nonempty set of indices I where for every \(i \in I\), \(count(c_1)(i) < count(c_2)(i)\), whereas for every index \(j \notin I\), \(0<j \le d\), \(count(c_1)(j) = count(c_2)(j)\).
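The relation \(\prec \) and its witness set I can be computed componentwise. The following sketch assumes configurations are pairs `(q, v)` with `v` a tuple over \(\mathbb {N}_{*}\), uses a hypothetical sentinel `STAR` for \(*\), and indexes components from 1. The function name is illustrative.

```python
STAR = "*"  # hypothetical sentinel standing for the extra element *


def increased_indices(c1, c2):
    """Return the set I witnessing c1 prec c2, or None if the relation fails.

    c1, c2: configurations (state, vector). The relation requires equal
    control states, at least one strictly increased component, and
    equality on all remaining components.
    """
    (q1, v1), (q2, v2) = c1, c2
    if q1 != q2 or len(v1) != len(v2):
        return None
    indices = set()
    for i, (a, b) in enumerate(zip(v1, v2), start=1):
        if a == b:
            continue
        # Strict increase; * lies above every natural number.
        if a != STAR and (b == STAR or a < b):
            indices.add(i)
        else:
            return None
    return indices or None


print(increased_indices(("q", (1, 2)), ("q", (1, 3))))
print(increased_indices(("q", (2, 2)), ("q", (1, 3))))
```

The first call yields `{2}`, while the second yields `None` because the first component decreased. A check of this kind identifies which counters have strictly increased between two configurations on the same branch of a computation tree.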

Algorithm 1 reduces the dimension of the limit-sure reachability problem for 1-VASS-MDP by a construction resembling the Karp-Miller tree [15]. It takes as input a 1-VASS-MDP S of some dimension \(d>0\) with a set of target states X. It outputs a new 1-VASS-MDP \(S'\) of dimension \(d-1\) and a new set of target states \(X'\) such that \(M_S\) can limit-surely reach X iff \(M_{S'}\) can limit-surely reach \(X'\). In particular, in the base case where \(d-1=0\), the new system \(S'\) has dimension zero and thus induces a finite-state MDP \(M_{S'}\), for which limit-sure reachability of \(X'\) coincides with almost-sure reachability of \(X'\), which is known to be decidable in polynomial time. Algorithm 1 starts by exploring all branches of the computation tree of S (and adding them to \(S'\) as the so-called initial uncolored part) until it encounters a configuration that is either (1) equal to, or (2) strictly larger than a configuration encountered previously on the same branch. In case (1) it just adds a back loop to the point where the configuration was encountered previously. In case (2), it adds a modified copy of S (identified by a unique color) to \(S'\). This so-called colored subsystem is similar to S except that those counters that have strictly increased along the branch are removed. The intuition is that these counters could be pumped to arbitrarily high values and thus present no obstacle to reaching the target. Since the initial uncolored part is necessarily finite (by Dickson’s Lemma) and each of the finitely many colored subsystems only has dimension \(d-1\) (since a counter is removed; possibly a different one in different colored subsystems), the resulting 1-VASS-MDP \(S'\) has dimension \(d-1\). The set of target states \(X'\) is defined as the union of all appearances of states in X in the uncolored part, plus all colored copies of states from X in the colored subsystems.

Algorithm 1. Reducing the dimension of the limit-sure reachability problem for 1-VASS-MDPs.

By Dickson’s Lemma, the conditions on line 7 or line 19 of the algorithm must eventually hold on every branch of the explored computation tree. Thus, it will terminate.

Lemma 6

Algorithm 1 terminates.

The next lemma states the correctness of Algorithm 1. Let \(S=\langle Q,Q_1,Q_P,T,\tau \rangle \) be a 1-VASS-MDP of dimension \(d > 0\) with initial configuration \(c_0 = \langle q_0,\mathbf{v } \rangle \) and \(X\subseteq Q\) a set of target states. Let \(S'=\langle Q',Q_1',Q_P',T',\tau ' \rangle \) with initial configuration \(c_0' = \langle q_0',\mathbf {0} \rangle \) and set of target states \(X' \subseteq Q'\) be the \((d-1)\)-dimensional 1-VASS-MDP produced by Algorithm 1. As described above we have the following relation between these two systems.

Lemma 7

\(\mathbb {P}^+(M_S, c_0, \llbracket \diamondsuit X \rrbracket ) = 1\) iff \(\mathbb {P}^+(M_{S'}, c_0', \llbracket \diamondsuit X' \rrbracket ) = 1\).

By applying the result of the previous lemma iteratively until we obtain a finite-state MDP, we can deduce the following theorem.

Table 1. Decidability of verification problems for P-VASS-MDP, deadlock-free P-VASS-MDP and 1-VASS-MDP. A \(\checkmark \) stands for decidable and a \(\times \) for undecidable.

Theorem 9

The limit-sure reachability problem for 1-VASS-MDP is decidable.

5 Conclusion and Future Work

Table 1 summarizes our results on the decidability of verification problems for subclasses of VASS-MDP. The exact complexity of most problems is still open. Algorithm 1 relies on Dickson’s Lemma for termination, and the algorithm deciding the model-checking problem of Theorem 2 additionally uses the Valk-Jantzen construction repeatedly. However, all these problems are at least as hard as control-state reachability in VASS, and thus EXPSPACE-hard [12].

The decidability of the limit-sure repeated reachability problem for 1-VASS-MDP is open. A hint of its difficulty is given by the fact that there are instances where the property holds even though a small chance of reaching a deadlock cannot be avoided from any reachable configuration. In particular, a solution would require an analysis of the long-run behavior of multi-dimensional random walks induced by probabilistic VASS. However, these may exhibit strange nonregular behaviors for dimensions \(\ge 3\), as described in [8] (Sect. 5).