Pseudopolynomial iterative algorithm to solve total-payoff games and min-cost reachability games
Abstract
Quantitative games are two-player zero-sum games played on directed weighted graphs. Total-payoff games, which can be seen as a refinement of the well-studied mean-payoff games, are the variant where the payoff of a play is computed as the sum of the weights. Our aim is to describe the first pseudo-polynomial time algorithm for total-payoff games in the presence of arbitrary weights. It consists of a non-trivial application of the value iteration paradigm. Indeed, it requires studying, as a milestone, a refinement of these games, called min-cost reachability games, where we add a reachability objective to one of the players. For these games, we give an efficient value iteration algorithm that computes the values and optimal strategies (when they exist) in pseudo-polynomial time. We also propose heuristics to speed up the computations.
1 Introduction
Games played on graphs are nowadays a well-studied and well-established model for the computer-aided design of computer systems, as they enable automatic synthesis of systems that are correct-by-construction. Of particular interest are quantitative games, that allow one to model precisely quantitative parameters of the system, such as energy consumption. In this setting, the game is played by two players on a directed weighted graph, where the edge weights model, for instance, a cost or a reward associated with the moves of the players. Each vertex of the graph belongs to one of the two players who compete by moving a token along the graph edges, thereby forming an infinite path called a play. With each play is associated a real-valued payoff computed from the sequence of edge weights along the play. The traditional payoffs that have been considered in the literature include total-payoff [12], mean-payoff [9] and discounted-payoff [21]. In this quantitative setting, one player aims at maximising the payoff while the other tries to minimise it. So one wants to compute, for each player, the best payoff that he can guarantee from each vertex, and the associated optimal strategies (i.e. that guarantee the optimal payoff no matter how the adversary is playing).
Such quantitative games have been extensively studied in the literature. Their associated decision problems (is the value of a given vertex above a given threshold?) are known to be in \(\mathrm {NP}\cap \mathrm {co}\hbox {-}\mathrm {NP}\). Mean-payoff games have arguably been best studied from the algorithmic point of view. A landmark is Zwick and Paterson’s [21] pseudo-polynomial time (i.e. polynomial in the weighted graph when weights are encoded in unary) algorithm, using the value iteration paradigm that consists in computing a sequence of vectors of values that converges towards the optimal values of the vertices. After a fixed, pseudo-polynomial, number of steps, the computed values are precise enough to deduce the actual values of all vertices. Better pseudo-polynomial time algorithms have later been proposed, e.g., by Björklund and Vorobyov [1], Brim et al. [6], Comin and Rizzi [8], also achieving sub-exponential expected running time by means of randomisation.
In this paper, we focus on total-payoff games.^{1} Given an infinite play \(\pi \), we denote by \(\pi [k]\) the prefix of \(\pi \) of length k, and by \({\mathbf{TP}}(\pi [k])\) the (finite) sum of all edge weights along this prefix. The total-payoff of \(\pi \), \({\mathbf{TP}}(\pi )\), is the inferior limit of all those sums, i.e. \({\mathbf{TP}}(\pi )=\liminf _{k\rightarrow \infty } {\mathbf{TP}}(\pi [k])\). Compared to mean-payoff (and discounted-payoff) games, the literature on total-payoff games is less extensive. Gimbert and Zielonka [12] have shown that optimal memoryless strategies always exist for both players and the best algorithm to compute the values runs in exponential time [11], and consists in iteratively improving strategies. Other related works include energy games where one player tries to optimise its energy consumption (computed again as a sum), keeping the energy level always above 0. Note that it differs in essence from total-payoff games where no condition on the energy level is required: in particular, the optimal total-payoff could be negative, and even \(-\infty \), and it is a priori not possible to simply lift all the weights by a constant to solve total-payoff games by solving a related energy game. Moreover, this difference makes it difficult to apply techniques solving energy games in the case of total-payoff games. Probabilistic variants of total-payoff games have also been studied, but the weights are restricted to be non-negative [7].
We argue that the total-payoff objective is interesting as a refinement of the mean-payoff. Indeed, recall first that the total-payoff is finite if and only if the mean-payoff is null. Then, the computation of the total-payoff enables a finer, two-stage analysis of a game \(\mathcal {G}\): (i) compute the mean payoff \(\mathbf{{MP}}(\mathcal {G})\); (ii) subtract \(\mathbf{{MP}}(\mathcal {G})\) from all edge weights, and scale the resulting weights if necessary to obtain integers. At that point, one has obtained a new game \(\mathcal {G}^{\prime }\) with null mean-payoff; (iii) compute \({\mathbf{TP}}(\mathcal {G}^{\prime })\) to quantify the amount of fluctuation around the mean-payoff of the original game. Unfortunately, so far, no efficient (i.e. pseudo-polynomial time) algorithms for total-payoff games have been proposed, and straightforward adaptations of Zwick and Paterson’s value iteration algorithm for mean-payoff do not work, as we demonstrate at the end of Sect. 2. In the present article, we fill in this gap by introducing the first pseudo-polynomial time algorithm for computing the values in total-payoff games.
Our solution is a non-trivial value iteration algorithm that proceeds through nested fixed points (see Algorithm 2). A play of a total-payoff game is infinite by essence. We transform the game so that one of the players (the minimiser) must ensure a reachability objective: we assume that the game ends once this reachability objective has been met. The intuition behind this transformation, that stems from the use of an inferior limit in the definition of the total-payoff, is as follows: in each play \(\pi \) whose total-payoff is finite, there is a position \(\ell \) in the play after which all the partial sums \({\mathbf{TP}}(\pi [i])\) (with \(i\geqslant \ell \)) will be larger than or equal to the total-payoff \({\mathbf{TP}}(\pi )\) of \(\pi \), and infinitely often both will be equal. For example, consider the game depicted in Fig. 1a, where the maximiser player (henceforth called \({\mathsf {Max}}\)) plays with the round vertices and the minimiser (\({\mathsf {Min}}\)) with the square vertices. For both players, the optimal value when playing from \(v_1\) is 2, and the play \(\pi =v_1 v_2 v_3\ v_4 v_5\ v_4 v_3\ (v_4 v_5)^\omega \) reaches this value [i.e. \({\mathbf{TP}}(\pi )=2\)]. Moreover, for all \(k\geqslant 7\): \({\mathbf{TP}}(\pi [k])\geqslant {\mathbf{TP}}(\pi )\), and infinitely many prefixes (\(\pi [8]\), \(\pi [10]\), \(\pi [12]\), \(\ldots \)) have a total-payoff of 2, as shown in Fig. 1b.
In the following, such refined total-payoff games, where \({\mathsf {Min}}\) must reach a designated target vertex, will be called min-cost reachability games (MCR games). Failing to reach the target vertices is the worst situation for \({\mathsf {Min}}\), so the payoff of all plays that do not reach the target is +\(\infty \), irrespective of the weights along the play. Otherwise, the payoff of a play is the sum of the weights up to the first occurrence of the target. As such, this problem nicely generalises the classical shortest path problem in a weighted graph. In the one-player setting (considering the point of view of \({\mathsf {Min}}\) for instance), this problem can be solved in polynomial time by Dijkstra’s and Floyd–Warshall’s algorithms when the weights are non-negative and arbitrary, respectively. Khachiyan et al. [13] propose an extension of Dijkstra’s algorithm to handle the two-player, non-negative weights case. However, in our more general setting (two players, arbitrary weights), this problem has, as far as we know, not been studied as such, except that the associated decision problem is known to be in \(\mathrm {NP}\cap \mathrm {co}\hbox {-}\mathrm {NP}\) [10]. A pseudo-polynomial time algorithm to solve a very close problem, called the longest shortest path problem (LSP), has been introduced by Björklund and Vorobyov [1] to eventually solve mean-payoff games. However, because of this peculiar context of mean-payoff games, their definition of the length of a path differs from our definition of the payoff and their algorithm cannot be easily adapted to solve our MCR problem. Thus, as a second contribution, we show that a value iteration algorithm enables us to compute in pseudo-polynomial time the values of an MCR game. We believe that MCR games bear their own potential theoretical and practical applications.^{2} Those games are discussed in Sect. 3.
In addition to the pseudo-polynomial time algorithm to compute the values, we show how to compute optimal strategies for both players and characterise them: there is always a memoryless strategy for the maximiser player, but we exhibit an example (see Fig. 2) where the minimiser player needs (finite) memory. Those results on MCR games are exploited in Sect. 4 where we introduce and prove correct our efficient algorithm for total-payoff games.
Finally, we briefly present our implementation in Sect. 5, using as a core the numerical model-checker PRISM. This allows us to describe some heuristics able to improve the practical performance of our algorithms for total-payoff games and MCR games on certain subclasses of graphs.
2 Quantitative games with arbitrary weights
In this section, we formally introduce the game model we consider throughout the article.
We denote by \(\mathbb {Z}\) the set of integers, and \(\mathbb {Z}_{\infty }=\mathbb {Z}\cup \{-\infty ,+\infty \}\). The set of vectors indexed by \(V\) with values in S is denoted by \(S^V\). We let \(\preccurlyeq \) be the pointwise order over \(\mathbb {Z}_{\infty }^V\), where \(x\preccurlyeq y\) if and only if \(x(v)\leqslant y(v)\) for all \(v\in V\).
2.1 Games played on graphs
We consider two-player turn-based games played on weighted graphs and denote the two players by \({\mathsf {Max}}\) and \({\mathsf {Min}}\). A weighted graph is a tuple \(\langle V,E,\omega \rangle \) where \(V=V_{{\mathsf {Max}}}\uplus V_{{\mathsf {Min}}}\) is a finite set of vertices partitioned into the sets \(V_{{\mathsf {Max}}}\) and \(V_{{\mathsf {Min}}}\) of \({\mathsf {Max}}\) and \({\mathsf {Min}}\) respectively, \(E\subseteq V\times V\) is a set of directed edges, \(\omega :E\rightarrow \mathbb {Z}\) is the weight function, associating an integer weight with each edge. In our drawings, \({\mathsf {Max}}\) vertices are depicted by circles; \({\mathsf {Min}}\) vertices by rectangles. For every vertex \(v\in V\), the set of successors of v with respect to \(E\) is denoted by \(E(v) = \{v^{\prime }\in V\mid (v,v^{\prime })\in E\}\). Without loss of generality, we assume that every graph is deadlock-free, i.e. for all vertices v, \(E(v)\ne \emptyset \). Finally, throughout this article, we let \(W=\max _{(v,v^{\prime })\in E}|\omega (v,v^{\prime })|\) be the greatest edge weight (in absolute value) in the game graph. A finite play is a finite sequence of vertices \(\pi =v_0v_1\ldots v_k\in V^*\) such that for all \(0\leqslant i<k\), \((v_i,v_{i+1})\in E\). A play is an infinite sequence of vertices \(\pi = v_0v_1\ldots \) such that every finite prefix \(v_0\ldots v_k\), denoted by \(\pi [k]\), is a finite play.
The total-payoff of a finite play \(\pi =v_0 v_1 \ldots v_k\) is obtained by summing up the weights along \(\pi \), i.e. \({\mathbf{TP}}(\pi ) = \sum \nolimits _{i=0}^{k-1} \omega (v_i,v_{i+1})\). In the following, we sometimes rely on the mean-payoff to obtain information about total-payoff objectives. The mean-payoff computes the average weight of \(\pi \), i.e. if \(k\geqslant 1\), \(\mathbf{{MP}}(\pi ) = \frac{1}{k}\sum \nolimits _{i=0}^{k-1} \omega (v_i,v_{i+1})\), and \(\mathbf{{MP}}(\pi )=0\) when \(k=0\). These definitions are lifted to infinite plays as follows. The total-payoff of a play \(\pi \) is given by \({\mathbf{TP}}(\pi ) = \liminf _{k\rightarrow \infty } {\mathbf{TP}}(\pi [k])\).^{3} Similarly, the mean-payoff of a play \(\pi \) is given by \(\mathbf{{MP}}(\pi ) = \liminf _{k\rightarrow \infty } \mathbf{{MP}}(\pi [k])\). Tuples \(\langle V,E,\omega , {\mathbf{TP}} \rangle \) and \(\langle V,E,\omega , \mathbf{{MP}} \rangle \), where \(\langle V,E,\omega \rangle \) is a weighted graph, are called total-payoff and mean-payoff games respectively.
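As an illustration of these definitions, the following Python sketch computes the total-payoff and mean-payoff of a finite play, together with the partial sums whose inferior limit defines the total-payoff of an infinite play. The function names and the dictionary encoding of \(\omega \) are our own assumptions; the edge weights are those that can be read off the operator \(\mathcal {F}\) given in Sect. 2.3 for the sub-game on \(\{v_3,v_4,v_5\}\).

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def total_payoff(play: List[str], weight: Dict[Edge, int]) -> int:
    """TP of a finite play v_0 ... v_k: sum of the k edge weights."""
    return sum(weight[(u, v)] for u, v in zip(play, play[1:]))


def mean_payoff(play: List[str], weight: Dict[Edge, int]) -> float:
    """MP of a finite play: average edge weight, and 0 when k = 0."""
    k = len(play) - 1
    return 0.0 if k == 0 else total_payoff(play, weight) / k


# Weights of the sub-game on {v3, v4, v5} (illustrative encoding).
weight = {("v3", "v4"): 2, ("v4", "v3"): -2, ("v4", "v5"): -1, ("v5", "v4"): 1}

# A finite prefix of the infinite play (v4 v5)^omega.
play = ["v4", "v5", "v4", "v5", "v4"]
prefix_sums = [total_payoff(play[: i + 1], weight) for i in range(len(play))]
print(prefix_sums)  # the partial sums alternate between 0 and -1
```

On the play \((v_4 v_5)^\omega \) the partial sums alternate between 0 and −1 forever, so their inferior limit is −1, while the averages tend to 0.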
2.2 Strategies and values
A strategy for \({\mathsf {Max}}\) (respectively, \({\mathsf {Min}}\)) in a game \(\mathcal {G}=\langle V,E,\omega ,\mathbf {P}\rangle \) (with \(\mathbf {P}\) one of the previous payoffs), is a mapping \(\sigma :V^* V_{{\mathsf {Max}}}\rightarrow V\) (\(\sigma :V^* V_{{\mathsf {Min}}}\rightarrow V\)) such that for all sequences \(\pi = v_0\ldots v_k\) with \(v_k\in V_{{\mathsf {Max}}}\) (\(v_k\in V_{{\mathsf {Min}}}\)), it holds that \((v_k,\sigma (\pi ))\in E\). A play or finite play \(\pi = v_0v_1\ldots \) conforms to a strategy \(\sigma \) of \({\mathsf {Max}}\) (respectively, \({\mathsf {Min}}\)) if for all k such that \(v_k\in V_{{\mathsf {Max}}}\) (\(v_k\in V_{{\mathsf {Min}}}\)), we have that \(v_{k+1} = \sigma (\pi [k])\). A strategy \(\sigma \) is memoryless if for all finite plays \(\pi , \pi ^{\prime }\), we have that \(\sigma (\pi v)=\sigma (\pi ^{\prime } v)\) for all \(v\in V\). A strategy \(\sigma \) is said to be finite-memory if it can be encoded in a deterministic Moore machine, \(\langle M,m_0,\mathsf {up},\mathsf {dec} \rangle \), where M is a finite set representing the memory of the strategy, with an initial memory content \(m_0\in M\), \(\mathsf {up}:M\times V\rightarrow M\) is a memory-update function, and \(\mathsf {dec}:M\times V\rightarrow V\) a decision function such that for every finite play \(\pi \) and vertex v, \(\sigma (\pi v)=\mathsf {dec}(\mathsf {mem}(\pi v),v)\) where \(\mathsf {mem}(\pi )\) is defined by induction on the length of the finite play \(\pi \) as follows: \(\mathsf {mem}(v_0)=m_0\), and \(\mathsf {mem}(\pi v)=\mathsf {up}(\mathsf {mem}(\pi ),v)\). In this case, we say that |M| is the size of the strategy.
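The Moore-machine encoding of a finite-memory strategy can be sketched as follows. This is a minimal illustration under our own naming (`MooreStrategy`, and the vertices `s`, `u`, `x`, `y` of a hypothetical graph); it only mirrors the inductive definition of \(\mathsf {mem}\) given above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MooreStrategy:
    """Finite-memory strategy <M, m0, up, dec> with integer memory states."""
    m0: int
    up: Callable[[int, str], int]    # memory-update function
    dec: Callable[[int, str], str]   # decision function

    def mem(self, play: List[str]) -> int:
        # mem(v_0) = m0 and mem(pi v) = up(mem(pi), v).
        m = self.m0
        for v in play[1:]:
            m = self.up(m, v)
        return m

    def __call__(self, play: List[str]) -> str:
        # sigma(pi v) = dec(mem(pi v), v)
        return self.dec(self.mem(play), play[-1])


# Hypothetical example: count arrivals at vertex "u" (capped at 2, so |M| = 3);
# from "u", move to "x" on the first arrival and to "y" afterwards.
sigma = MooreStrategy(
    m0=0,
    up=lambda m, v: min(m + 1, 2) if v == "u" else m,
    dec=lambda m, v: "x" if m < 2 else "y",
)

print(sigma(["s", "u"]))            # first arrival at u
print(sigma(["s", "u", "x", "u"]))  # second arrival at u
```

A memoryless strategy is the special case where the decision function ignores its memory argument.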
For all strategies \(\sigma _{{\mathsf {Max}}}\) and \(\sigma _{{\mathsf {Min}}}\), for all vertices v, we let \(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}})\) be the outcome of \(\sigma _{{\mathsf {Max}}}\) and \(\sigma _{{\mathsf {Min}}}\), defined as the unique play conforming to \(\sigma _{{\mathsf {Max}}}\) and \(\sigma _{{\mathsf {Min}}}\) and starting in v. Naturally, the objective of \({\mathsf {Max}}\) is to maximise its payoff. In this model of zero-sum game, \({\mathsf {Min}}\) then wants to minimise the payoff of \({\mathsf {Max}}\). Formally, we let \(\mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Max}}})\) and \(\mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Min}}})\) be the respective values of the strategies, defined as (recall that \(\mathbf {P}\) is either \({\mathbf{TP}}\) or \(\mathbf{{MP}}\)): \(\mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Max}}}) = \inf _{\sigma _{{\mathsf {Min}}}} \mathbf {P}(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}}))\) and \(\mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Min}}}) = \sup _{\sigma _{{\mathsf {Max}}}} \mathbf {P}(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}}))\). Finally, for all vertices v, we let \(\underline{\mathsf {Val}}_\mathcal {G}(v) = \sup _{\sigma _{{\mathsf {Max}}}} \mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Max}}})\) and \(\overline{\mathsf {Val}}_\mathcal {G}(v) = \inf _{\sigma _{{\mathsf {Min}}}} \mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Min}}})\) be respectively the lower and upper values of v. We may easily show that \(\underline{\mathsf {Val}}_\mathcal {G}\preccurlyeq \overline{\mathsf {Val}}_\mathcal {G}\). 
We say that strategies \(\sigma _{{\mathsf {Max}}}^\star \) of \({\mathsf {Max}}\) and \(\sigma _{{\mathsf {Min}}}^\star \) of \({\mathsf {Min}}\) are optimal if, for all vertices v: \(\mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Max}}}^\star )=\underline{\mathsf {Val}}_\mathcal {G}(v)\) and \(\mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Min}}}^\star )=\overline{\mathsf {Val}}_\mathcal {G}(v)\) respectively. We say that a game \(\mathcal {G}\) is determined if for all vertices v, its lower and upper values are equal. In that case, we write \(\mathsf{{Val}}_\mathcal {G}(v)=\underline{\mathsf {Val}}_\mathcal {G}(v)=\overline{\mathsf {Val}}_\mathcal {G}(v)\), and refer to it as the value of v in \(\mathcal {G}\). If the game is clear from the context, we may drop the index \(\mathcal {G}\) from all previous notations. Mean-payoff and total-payoff games are known to be determined, with the existence of optimal memoryless strategies [12, 21].
2.3 Previous works and contribution
Total-payoff games have been mainly considered as a refinement of mean-payoff games [12]. Indeed, if the mean-payoff value of a game is positive (respectively, negative), its total-payoff value is necessarily \(+\infty \) (\(-\infty \)). When the mean-payoff value is 0 however, the total-payoff is necessarily different from +\(\infty \) and \(-\infty \), hence total-payoff games are particularly useful in this case, to refine the analysis of the game. Deciding whether the total-payoff value of a vertex is positive can be achieved in \(\mathrm {NP}\cap \mathrm {co}\hbox {-}\mathrm {NP}\). Gawlitza and Seidl [11] refined the complexity to UP \(\cap \) co-UP, and values are shown to be effectively computable solving nested fixed point equations with a strategy iteration algorithm working in exponential time in the worst case. Because of this strong relationship between mean- and total-payoff games, we can show that total-payoff games are, in some sense, as hard as mean-payoff games, for which the existence of a (strongly) polynomial time algorithm is a long-standing open question.
In this article, we improve on this state-of-the-art and introduce the first (to the best of our knowledge) pseudo-polynomial time algorithm for total-payoff games. In many cases, (e.g., mean-payoff games), a successful way to obtain such an efficient algorithm is the value iteration paradigm. Intuitively, value iteration algorithms compute successive approximations \(x_0, x_1, \ldots , x_i, \ldots \) of the game value by restricting the number of turns that the players are allowed to play: \(x_i\) is the vector of optimal values achievable when the players play at most i turns. The sequence of values is computed by means of an operator \(\mathcal {F}\), letting \(x_{i+1}=\mathcal {F}(x_i)\) for all i. Good properties (Scott-continuity and monotonicity) of \(\mathcal {F}\) ensure convergence towards its smallest or greatest fixed point (depending on the value of \(x_0\)), which, in some cases, is the value of the game.
Let us briefly explain why, unfortunately, a straightforward application of this approach fails with total-payoff games. In our case, the most natural operator \(\mathcal {F}\) is such that \(\mathcal {F}(x)(v)=\max _{v^{\prime }\in E(v)} (\omega (v,v^{\prime }) + x(v^{\prime }))\) for all \(v\in V_{{\mathsf {Max}}}\) and \(\mathcal {F}(x)(v)=\min _{v^{\prime }\in E(v)}(\omega (v,v^{\prime }) + x(v^{\prime }))\) for all \(v\in V_{{\mathsf {Min}}}\). Indeed, this definition matches the intuition that \(x_N\) is the optimal value after N turns. Then, consider the example of Fig. 1a, limited to vertices \(\{v_3,v_4,v_5\}\) for simplicity. Observe that there are two simple cycles with weight 0, hence the total-payoff value of this game is finite. \({\mathsf {Max}}\) has the choice between cycling into one of these two cycles. It is easy to check that \({\mathsf {Max}}\)’s optimal choice is to enforce the cycle between \(v_4\) and \(v_5\), securing a payoff of −1 from \(v_4\) (because of the \(\liminf \) definition of \({\mathbf{TP}}\)). Hence, the values of \(v_3\), \(v_4\) and \(v_5\) are respectively 1, −1 and 0. In this game, we have \(\mathcal {F}(x) = (2+x(v_4),\max (-2+x(v_3),-1+x(v_5)),1+x(v_4))\), and the vector \((1,-1,0)\) is indeed a fixed point of \(\mathcal {F}\). However, it is neither the greatest nor the smallest fixed point of \(\mathcal {F}\). Indeed, it is easy to check that, if x is a fixed point of \(\mathcal {F}\), then \(x+(a,a,a)\) is also a fixed point, for every constant \(a\in \mathbb {Z}\cup \{-\infty ,+\infty \}\). If we try to initialise the value iteration algorithm with value (0, 0, 0), which could seem a reasonable choice, the sequence of computed vectors is: (0, 0, 0), \((2,-1,1)\), (1, 0, 0), \((2,-1,1)\), (1, 0, 0), \(\ldots \) that is not stationary, and does not even contain \((1,-1,0)\).
Notice that \((-\infty ,-\infty ,-\infty )\) and \((+\infty ,+\infty ,+\infty )\) are fixed points, so that they do not allow us to find the correct answer either. Thus, it seems difficult to compute the actual game values with an iterative algorithm relying on the operator \(\mathcal {F}\), as in the case of mean-payoff games.^{4} Notice that, in the previous example, Zwick and Paterson’s algorithm [21] to solve mean-payoff games would easily conclude from the sequence above, since the vectors of interest are then (0, 0, 0), \((1,-0.5,0.5)\), (0.33, 0, 0), \((0.5,-0.25,0.25)\), (0.2, 0, 0), \(\ldots \) indeed converging towards (0, 0, 0), the mean-payoff values of this game.
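The failure of the naive iteration can be reproduced with a few lines of Python. The sketch below is only an illustration: the dictionary encoding of the sub-game on \(\{v_3,v_4,v_5\}\) and the names `F` and `iterate` are our own assumptions, and the edge weights are those appearing in the expression of \(\mathcal {F}\) above. Since \(v_3\) and \(v_5\) have a single successor, only the owner of \(v_4\) matters.

```python
from typing import Dict, List, Tuple

succ = {"v3": ["v4"], "v4": ["v3", "v5"], "v5": ["v4"]}
weight = {("v3", "v4"): 2, ("v4", "v3"): -2, ("v4", "v5"): -1, ("v5", "v4"): 1}
max_vertices = {"v4"}  # v4 is the only vertex with a real choice

Vector = Dict[str, int]


def F(x: Vector) -> Vector:
    """One application of the naive operator: max for Max, min for Min."""
    out = {}
    for v, targets in succ.items():
        candidates = [weight[(v, t)] + x[t] for t in targets]
        out[v] = max(candidates) if v in max_vertices else min(candidates)
    return out


def iterate(x0: Vector, steps: int) -> List[Tuple[int, int, int]]:
    """Return x_0, F(x_0), ..., F^steps(x_0) as (v3, v4, v5) triples."""
    seq, x = [], x0
    for _ in range(steps + 1):
        seq.append((x["v3"], x["v4"], x["v5"]))
        x = F(x)
    return seq


seq = iterate({"v3": 0, "v4": 0, "v5": 0}, 4)
print(seq)  # oscillates between (2, -1, 1) and (1, 0, 0) after the first step

# (1, -1, 0) and any uniform shift of it are fixed points of F.
fixed = {"v3": 1, "v4": -1, "v5": 0}
shifted = {v: val + 5 for v, val in fixed.items()}
print(F(fixed) == fixed, F(shifted) == shifted)
```

The iteration started from (0, 0, 0) never visits the vector of actual values, even though that vector is a fixed point.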
Instead, as explained in the introduction, we propose a different approach that consists in reducing total-payoff games to MCR games where \({\mathsf {Min}}\) must enforce a reachability objective on top of his optimisation objective. The aim of the next section is to study these games, and we reduce total-payoff games to them in Sect. 4.
3 Min-cost reachability games
In this section, we consider MCR games, a variant of total-payoff games where one player has a reachability objective that he must fulfil first, before minimising his quantitative objective (hence the name min-cost reachability). Without loss of generality, we assign the reachability objective to player \({\mathsf {Min}}\), as this will make our reduction from total-payoff games easier to explain. Hence, when the target is not reached along a path, the payoff of this path shall be the worst possible for \({\mathsf {Min}}\), i.e. +\(\infty \). Formally, an MCR game is played on a weighted graph \(\langle V,E,\omega \rangle \) equipped with a target set of vertices \(T\subseteq V\). The payoff \(T\hbox {-}\mathbf {MCR}(\pi )\) of a play \(\pi =v_0v_1\ldots \) is given by \(T\hbox {-}\mathbf {MCR}(\pi )=+\infty \) if the play avoids \(T\), i.e. if for all \(k\geqslant 0\), \(v_k\notin T\), and \(T\hbox {-}\mathbf {MCR}(\pi )={\mathbf{TP}}(\pi [k])\) if k is the least position in \(\pi \) such that \(v_k\in T\). Lower and upper values are then defined as in Sect. 2.
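To illustrate the payoff \(T\hbox {-}\mathbf {MCR}\), the following Python sketch evaluates it on ultimately periodic plays, given as a finite prefix followed by a cycle repeated forever. This lasso representation, the function name `mcr_payoff`, and the small graph over vertices `a`, `b`, `t` are assumptions made for the example only.

```python
import math
from typing import Dict, List, Set, Tuple


def mcr_payoff(prefix: List[str], cycle: List[str],
               weight: Dict[Tuple[str, str], int], targets: Set[str]) -> float:
    """T-MCR payoff of the play prefix . cycle^omega.

    Every vertex of the play occurs in prefix + cycle, so unrolling the
    cycle once is enough to detect the first visit to a target, if any.
    """
    unrolled = prefix + cycle
    total = 0
    for i, v in enumerate(unrolled):
        if v in targets:
            return total  # sum of the weights up to the first target
        if i + 1 < len(unrolled):
            total += weight[(v, unrolled[i + 1])]
    return math.inf  # the play avoids the target set


# Hypothetical graph: a <-> b, b -> t, and a self-loop on the target t.
weight = {("a", "b"): -1, ("b", "a"): 0, ("b", "t"): 5, ("t", "t"): 0}

reaching = mcr_payoff(["a", "b", "a", "b"], ["t"], weight, {"t"})
avoiding = mcr_payoff([], ["a", "b"], weight, {"t"})
print(reaching, avoiding)
```

The second play accumulates arbitrarily negative partial sums, yet its payoff is \(+\infty \) because the target is never reached.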
Using an indirect consequence of Martin’s theorem [15], we can show that MCR games are determined, i.e. that the upper and lower values always coincide:
Theorem 1
MCR games are determined.
Proof
Consider a quantitative game \(\mathcal {G}=\langle V,E,\omega ,\,\mathbf {P}\rangle \) and a vertex \(v\in V\). We will prove the determinacy result by using the Borel determinacy result of [15]. First, notice that the payoff mapping \(T\hbox {-}\mathbf {MCR}\) is Borel measurable since the set of plays with finite \(T\hbox {-}\mathbf {MCR}\) payoff is a countable union of cylinders. Then, for an integer M, consider \(\mathsf {Win}_M\) to be the set of plays with a payoff less than or equal to M. It is a Borel set, so that the qualitative game defined over the graph \(\langle V,E,\omega \rangle \) with winning condition \(\mathsf {Win}_M\) is determined. We now use this preliminary result to show our determinacy result.
We fix an MCR game and one of its vertices v, and first consider cases where either the lower or the upper value is infinite. Suppose first that \(\underline{\mathsf {Val}}(v)=-\infty \). We have to show that \(\overline{\mathsf {Val}}(v)=-\infty \) too. Let M be an integer. Since \(\underline{\mathsf {Val}}(v)<M\), we know that for all strategies \(\sigma _{{\mathsf {Max}}}\) of \({\mathsf {Max}}\), there exists a strategy \(\sigma _{{\mathsf {Min}}}\) for \({\mathsf {Min}}\), such that \(\mathbf {P}(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}}))\leqslant M\). In particular, \({\mathsf {Max}}\) has no winning strategy in the qualitative game equipped with \(\mathsf {Win}_M\) as a winning condition, hence, by determinacy, \({\mathsf {Min}}\) has a winning strategy, i.e. a strategy \(\sigma _{{\mathsf {Min}}}\) such that every strategy \(\sigma _{{\mathsf {Max}}}\) of \({\mathsf {Max}}\) verifies \(\mathbf {P}(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}}))\leqslant M\). This exactly means that \(\overline{\mathsf {Val}}(v)\leqslant M\). Since this holds for every value M, we get that \(\overline{\mathsf {Val}}(v)=-\infty \). The proof goes exactly in a symmetrical way to show that \(\overline{\mathsf {Val}}(v)=+\infty \) implies \(\underline{\mathsf {Val}}(v)=+\infty \).
Consider then the case where both \(\overline{\mathsf {Val}}(v)\) and \(\underline{\mathsf {Val}}(v)\) are finite values. For the sake of contradiction, assume that \(\underline{\mathsf {Val}}(v)<\overline{\mathsf {Val}}(v)\) and consider a real number r strictly in-between those two values. From \(r<\overline{\mathsf {Val}}(v)\), we deduce that \({\mathsf {Min}}\) has no winning strategy from v in the qualitative game with winning condition \(\mathsf {Win}_r\). Identically, from \(\underline{\mathsf {Val}}(v)<r\), we deduce that \({\mathsf {Max}}\) has no winning strategy from v in the same game. This contradicts the determinacy of this qualitative game. Hence, \(\underline{\mathsf {Val}}(v)=\overline{\mathsf {Val}}(v)\). \(\square \)
Example 2
As an example, consider the MCR game played on the weighted graph of Fig. 2, where W is a positive integer and \(v_3\) is the target. We claim that the values of vertices \(v_1\) and \(v_2\) are both −W. Indeed, consider the following strategy for \({\mathsf {Min}}\): during each of the first W visits to \(v_2\) (if any), go to \(v_1\); else, go to \(v_3\). Clearly, this strategy ensures that the target will eventually be reached, and that either (i) edge \((v_1,v_3)\) (with weight −W) will eventually be traversed; or (ii) edge \((v_1,v_2)\) (with weight −1) will be traversed at least W times. Hence, in all plays following this strategy, the payoff will be at most −W. This strategy allows \({\mathsf {Min}}\) to secure −W, but he cannot ensure a lower payoff, since \({\mathsf {Max}}\) always has the opportunity to take the edge \((v_1,v_3)\) (with weight −W) instead of cycling between \(v_1\) and \(v_2\). Hence, \({\mathsf {Max}}\)’s optimal choice is to follow the edge \((v_1,v_3)\) as soon as \(v_1\) is reached, securing a payoff of −W. The \({\mathsf {Min}}\) strategy we have just given is optimal, and there is no optimal memoryless strategy for \({\mathsf {Min}}\). Indeed, always playing \((v_2,v_3)\) does not ensure a payoff less than or equal to \(-W\); and always playing \((v_2,v_1)\) does not guarantee reaching the target, so this strategy has value \(+\infty \).
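The claims of this example can be checked by simulation. The Python sketch below is a hedged illustration rather than part of the formal development: the weights of the two edges leaving \(v_2\) are not stated in the text and are assumed to be 0, the value W = 4 is arbitrary, and the helper names are our own. The only choice of \({\mathsf {Max}}\) is at \(v_1\), so each of his behaviours is summarised by the number of times he takes \((v_1,v_2)\) before exiting through \((v_1,v_3)\) (`None` meaning that he never exits).

```python
import math
from typing import Callable, Optional

W = 4  # arbitrary positive integer, chosen to keep the plays short

weight = {
    ("v1", "v3"): -W,
    ("v1", "v2"): -1,
    ("v2", "v1"): 0,  # assumption: not stated in the text
    ("v2", "v3"): 0,  # assumption: not stated in the text
}


def counting_min(visit: int) -> str:
    """Min's strategy: go to v1 during the first W visits to v2, then to v3."""
    return "v1" if visit <= W else "v3"


def payoff(start: str, max_cycles: Optional[int],
           min_strategy: Callable[[int], str], limit: int = 10_000) -> float:
    """Payoff of the play from `start`; Max takes (v1, v2) max_cycles times."""
    v, cost, seen_v1, seen_v2 = start, 0, 0, 0
    for _ in range(limit):
        if v == "v3":
            return cost
        if v == "v1":
            seen_v1 += 1
            nxt = "v2" if max_cycles is None or seen_v1 <= max_cycles else "v3"
        else:
            seen_v2 += 1
            nxt = min_strategy(seen_v2)
        cost += weight[(v, nxt)]
        v = nxt
    return math.inf  # target not reached within the step limit


def worst_case(start: str, min_strategy: Callable[[int], str]) -> float:
    """Largest payoff Max obtains over the sampled behaviours."""
    behaviours = list(range(2 * W + 3)) + [None]
    return max(payoff(start, n, min_strategy) for n in behaviours)


print(worst_case("v1", counting_min), worst_case("v2", counting_min))
print(worst_case("v1", lambda visit: "v3"))  # memoryless: always (v2, v3)
print(worst_case("v1", lambda visit: "v1"))  # memoryless: always (v2, v1)
```

Under these assumptions the counting strategy secures −W from both vertices, whereas the two memoryless strategies of \({\mathsf {Min}}\) only secure −1 and \(+\infty \) from \(v_1\), respectively.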
A remark on related work Let us note that [1] introduce the LSP and propose a pseudo-polynomial time algorithm to solve it. However, their definition has several subtle but important differences to ours, such as in the definition of the payoff of a play (equivalently, the length of a path). As an example, in the game of Fig. 2, the play \(\pi =(v_1 v_2)^\omega \) (that never reaches the target) has length \(-\infty \) in their setting, while, in our setting, \(\{v_3\}\hbox {-}\mathbf {MCR}(\pi )=+\infty \). A more detailed comparison of the two definitions is given in “Appendix”. Moreover, even if a preprocessing would hypothetically allow one to use the LSP algorithm to solve MCR games, our solution (that has the same worst-case complexity as theirs) is simpler to implement, and we also introduce (see Sect. 5) heuristics that are only applicable to our value iteration solution.
As explained in the introduction of this section, we show how to solve those games, i.e. how to compute \(\mathsf {Val}(v)\) for all vertices v in pseudo-polynomial time. This procedure will be instrumental to solving total-payoff games. Our contributions are summarised in the following theorem:
Theorem 3
1. For all \(v\in V\), deciding whether \(\mathsf {Val}(v)=+\infty \) can be done in polynomial time.
2. For \(v\in V\), deciding whether \(\mathsf {Val}(v)=-\infty \) is as hard as solving mean-payoff games, in \(\mathrm {NP}\cap \mathrm {co}\hbox {-}\mathrm {NP}\) and can be achieved in pseudo-polynomial time.
3. If \(\mathsf {Val}(v)\ne -\infty \) for all vertices \(v\in V\), then both players have optimal strategies. Moreover, \({\mathsf {Max}}\) always has a memoryless optimal strategy, while \({\mathsf {Min}}\) may require finite (pseudo-polynomial) memory in his optimal strategy.
4. Computing all values \(\mathsf {Val}(v)\) (for \(v\in V\)), as well as optimal strategies (if they exist) for both players, can be done in (pseudo-polynomial) time \(O(|V|^2 |E| W)\).
3.1 Finding vertices with value \(+\infty \)
To prove the first item of Theorem 3, it suffices to notice that vertices with value \(+\infty \) are exactly those from which \({\mathsf {Min}}\) cannot reach the target. Therefore the problem reduces to deciding the winner in a classical reachability game, which can be solved in polynomial time [19], using the classical attractor construction.
In those games, one can construct in polynomial time a memoryless strategy, called an attractor strategy, that guarantees reaching the target in fewer than \(|V|\) steps from every vertex from which \({\mathsf {Min}}\) can reach it.
In the following, we assume that all vertices have a value different from \(+\infty \). Indeed, as described above, one can detect in polynomial time the vertices with value \(+\infty \) and remove them without changing the values of the other vertices.
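The attractor construction mentioned above can be sketched in Python as follows. This is the standard backward fixed-point computation, written under our own encoding assumptions (successor lists without duplicates, a set of \({\mathsf {Min}}\) vertices, hypothetical vertex names); it is not taken from the implementation of Sect. 5. The vertices outside the returned set are exactly those with value \(+\infty \).

```python
from typing import Dict, List, Set, Tuple


def min_attractor(succ: Dict[str, List[str]], min_vertices: Set[str],
                  targets: Set[str]) -> Tuple[Set[str], Dict[str, str]]:
    """Vertices from which Min can force a visit to `targets`,
    with a memoryless attractor strategy for Min on those vertices."""
    pred: Dict[str, List[str]] = {v: [] for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    # For a Max vertex: number of successors not yet known to be attracted.
    remaining = {v: len(vs) for v, vs in succ.items()}
    attr = set(targets)
    strategy: Dict[str, str] = {}
    stack = list(attr)
    while stack:
        v = stack.pop()
        for u in pred[v]:
            if u in attr:
                continue
            if u in min_vertices:
                # Min needs a single successor inside the attractor.
                attr.add(u)
                strategy[u] = v
                stack.append(u)
            else:
                # Max is attracted only once all his successors are.
                remaining[u] -= 1
                if remaining[u] == 0:
                    attr.add(u)
                    stack.append(u)
    return attr, strategy


# Hypothetical game: Min owns a and c, Max owns b and d, t is the target.
succ = {"a": ["b"], "b": ["t", "d"], "c": ["t", "d"], "d": ["d"], "t": ["t"]}
attr, strategy = min_attractor(succ, {"a", "c"}, {"t"})
print(sorted(attr), strategy)
```

In this example \({\mathsf {Max}}\) can escape to the sink `d` from `b`, so `a`, `b` and `d` have value \(+\infty \) and would be removed before the rest of the analysis. Each edge is inspected once, which gives a running time linear in the size of the graph.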
3.2 Finding vertices with value \(-\infty \)
To prove the second item, we notice that vertices with value \(-\infty \) are exactly those with a value <0 in the mean-payoff game played on the same graph. On the other hand, we can show that every mean-payoff game can be transformed (in polynomial time) into an MCR game such that a vertex has value <0 in the mean-payoff game if and only if the value of its corresponding vertex in the MCR game is \(-\infty \). More precisely:
Proposition 4
1. For all MCR games \(\mathcal {G}=\langle V,E,\omega ,T\hbox {-}\mathbf {MCR}\rangle \) where \(\mathsf {Val}_\mathcal {G}(v)\ne +\infty \) for all v, for all vertices v of \(\mathcal {G}\), \(\mathsf {Val}_\mathcal {G}(v)=-\infty \) if and only if \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v)<0\), where \(\mathcal {G}^{\prime }\) is the mean-payoff game \(\langle V,E,\omega ,\mathbf{{MP}}\rangle \).
2. Conversely, given a mean-payoff game \(\mathcal {G}=\langle V,E,\omega ,\mathbf{{MP}}\rangle \), we can build, in polynomial time, an MCR game \(\mathcal {G}^{\prime }\) such that for all vertices v of \(\mathcal {G}\): \(\mathsf {Val}_{\mathcal {G}}(v)<0\) if and only if \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v)=-\infty \).
Proof
To prove the first item, consider an MCR game \(\mathcal {G}=\langle V,E,\omega ,T\hbox {-}\mathbf {MCR}\rangle \) such that \(\mathsf {Val}_\mathcal {G}(v)\ne +\infty \) for all \(v\in V\), and \(\mathcal {G}^{\prime }=\langle V,E,\omega ,\mathbf{{MP}}\rangle \) the same weighted graph equipped with a mean-payoff objective.
If \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v)<0\), we know that there is a profile of optimal memoryless strategies \((\sigma _{{\mathsf {Max}}}^\star ,\sigma _{{\mathsf {Min}}}^\star )\) such that the outcome starting in v and following this profile necessarily starts with a finite prefix and then loops in a cycle with a total weight \({<}\)0. For every \(M>0\), we construct a strategy \(\sigma _{{\mathsf {Min}}}^M\) that ensures in \(\mathcal {G}\) a cost less than or equal to \(-M\): this will prove that \(\mathsf {Val}_\mathcal {G}(v)=-\infty \). Since we have assumed that \(\mathsf {Val}_\mathcal {G}(v)\ne +\infty \) for all v, we know that \({\mathsf {Min}}\) has a strategy to reach the target from all v (for instance, take the attractor strategy described above), by a path of length at most |V|. Thus, there exists a bound w and a strategy allowing \({\mathsf {Min}}\) to reach the target from every vertex of \(\mathcal {G}\) with a cost at most w. The strategy \(\sigma _{{\mathsf {Min}}}^M\) of \({\mathsf {Min}}\) is then to follow \(\sigma _{{\mathsf {Min}}}^\star \) until the accumulated cost is less than \(-M-w\), at which point he switches to his strategy to reach the target. Clearly, for all M, \(\sigma _{{\mathsf {Min}}}^M\) guarantees that \({\mathsf {Min}}\) reaches the target with a cost at most \(-M\).
Reciprocally, if \(\mathsf {Val}_\mathcal {G}(v)=-\infty \), consider \(M=|V| W\) and a strategy \(\sigma _{{\mathsf {Min}}}^M\) of \({\mathsf {Min}}\) ensuring a cost less than \(-M\), i.e. such that \(\mathsf {Val}_\mathcal {G}(v,\sigma _{{\mathsf {Min}}}^M)<-M\). Consider the finitely-branching tree built from \(\mathcal {G}\) by unfolding the game from vertex v and resolving the choices of \({\mathsf {Min}}\) with strategy \(\sigma _{{\mathsf {Min}}}^M\). Each branch of this tree corresponds to a possible strategy of \({\mathsf {Max}}\). Since \(\sigma _{{\mathsf {Min}}}^M\) guarantees a finite cost, we are certain that every such branch leads to a vertex of \(T\). If we trim the tree at those vertices, we finally obtain a finite tree. Now, for a contradiction, consider an optimal memoryless strategy \(\sigma _{{\mathsf {Max}}}^\star \) of \({\mathsf {Max}}\) securing a non-negative mean-payoff, that is, \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v,\sigma _{{\mathsf {Max}}}^\star )\geqslant 0\). Consider the branch of the previous tree where \({\mathsf {Max}}\) follows strategy \(\sigma _{{\mathsf {Max}}}^\star \). Since this finite branch has cost less than \(-M=-|V| W<0\) (W is positive, otherwise the mean-payoff value would be 0), we know for sure that there are two occurrences of the same vertex \(v'\) with an in-between weight <0: otherwise, by removing all non-negative cycles, we obtain a play without repetition of vertices, hence of length bounded by \(|V|\), and therefore of cost at least \(-M\). Suppose that \(v^{\prime }\in V_{{\mathsf {Max}}}\).
Then, \({\mathsf {Min}}\) has a strategy \(\sigma _{{\mathsf {Min}}}\) to ensure a negative mean-payoff \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v,\sigma _{{\mathsf {Min}}})<0\): indeed, he simply modifies^{5} his strategy so that he always stays in the negative cycle starting in \(v^{\prime }\) (he can do that since \({\mathsf {Max}}\) plays a memoryless strategy, and thus cannot change his decisions in the cycle), ensuring that, against the optimal strategy \(\sigma _{{\mathsf {Max}}}^\star \) of \({\mathsf {Max}}\), he gets a mean-payoff equal to the mean weight of the cycle, which is negative. This is a contradiction since \({\mathsf {Max}}\) is supposed to have a strategy ensuring a non-negative mean-payoff from v. Hence, \(v^{\prime }\in V_{{\mathsf {Min}}}\). But the same contradiction appears in that case, since \({\mathsf {Min}}\) can force the play to stay forever in the negative cycle by modifying his strategy. Finally, we have proved that \({\mathsf {Max}}\) cannot have a memoryless strategy securing a non-negative mean-payoff from v. By memoryless determinacy of the mean-payoff games, this ensures that \({\mathsf {Min}}\) has a memoryless strategy securing a negative mean-payoff from v.
Hence, we have shown that \(\mathsf {Val}_\mathcal {G}(v)=-\infty \) if and only if \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v)<0\), which concludes the first claim of Proposition 4.
To prove the second item, we reduce mean-payoff games to MCR games as follows. Let \(\mathcal {G}= \langle V,E,\omega ,\mathbf{{MP}}\rangle \) be a mean-payoff game. Without loss of generality, we may suppose that the graph of the game is bipartite, in the sense that \(E\subseteq V_{{\mathsf {Max}}}\times V_{{\mathsf {Min}}}\cup V_{{\mathsf {Min}}}\times V_{{\mathsf {Max}}}\).^{6} The problem we are interested in is to decide whether \(\mathsf {Val}_\mathcal {G}(v)<0\) for a given vertex v. We now construct an MCR game \(\mathcal {G}^{\prime } = \langle V^{\prime },E^{\prime }, \omega ^{\prime }, T^{\prime }\hbox {-}\mathbf {MCR} \rangle \) from \(\mathcal {G}\). The only difference is the presence of a fresh target vertex \({\texttt {t}}\) on top of vertices of \(V\): \(V^{\prime }=V\uplus \{{\texttt {t}}\}\) with \(T^{\prime }=\{{\texttt {t}}\}\). Edges of \(\mathcal {G}^{\prime }\) are given by \(E^{\prime } = E\cup \{(v,{\texttt {t}})\mid v\in V_{{\mathsf {Min}}}\}\cup \{({\texttt {t}},{\texttt {t}})\}\). Weights of edges are given by: \(\omega ^{\prime }(v,v^{\prime })=\omega (v,v^{\prime })\) if \((v,v^{\prime })\in E\), and \(\omega ^{\prime }(v,{\texttt {t}})=\omega ^{\prime }({\texttt {t}},{\texttt {t}})=0\). We show that \(\mathsf {Val}_{\mathcal {G}}(v)<0\) if and only if \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v)=-\infty \).
In \(\mathcal {G}^{\prime }\), all values are different from +\(\infty \), since \({\mathsf {Min}}\) plays at least every two steps, and has the capability to go to the target vertex with weight 0. Hence, letting \(\mathcal {G}^{\prime \prime }=\langle V^{\prime },E^{\prime }, \omega ^{\prime },\mathbf{{MP}} \rangle \) the mean-payoff game on the weighted graph of \(\mathcal {G}^{\prime }\), by the previous direction, we have that for every vertex \(v\in V^{\prime }\), \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v)=-\infty \) if and only if \(\mathsf {Val}_{\mathcal {G}^{\prime \prime }}(v)<0\).
To conclude, we prove that for all vertices \(v\in V\), \(\mathsf {Val}_{\mathcal {G}^{\prime \prime }}(v)<0\) if and only if \(\mathsf {Val}_{\mathcal {G}}(v)<0\). If \(\mathsf {Val}_{\mathcal {G}}(v)<0\), by mapping the memoryless optimal strategies of \(\mathcal {G}\) into \(\mathcal {G}^{\prime \prime }\), we directly obtain that \(\mathsf {Val}_{\mathcal {G}^{\prime \prime }}(v)\leqslant \mathsf {Val}_{\mathcal {G}}(v)<0\), since \({\mathsf {Max}}\) has no possibility to go by himself to the target. Reciprocally, if \(\mathsf {Val}_{\mathcal {G}^{\prime \prime }}(v)<0\), we can project a profile of memoryless optimal strategies over vertices of \(\mathcal {G}\), since the target can not be visited in this case (otherwise the optimal play would have mean-payoff 0): the play obtained from v in \(\mathcal {G}\) is then the projection of the play obtained from v in \(\mathcal {G}^{\prime \prime }\), with the same cost. Hence, \(\mathsf {Val}_{\mathcal {G}}(v)\leqslant \mathsf {Val}_{\mathcal {G}^{\prime \prime }}(v)<0\). \(\square \)
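The reduction used in the second item only adds a fresh target and zero-weight edges. As a minimal Python sketch (the function name `mean_payoff_to_mcr`, the argument layout and the default target label `"t"` are assumptions for illustration, and the two-vertex game is made up):

```python
def mean_payoff_to_mcr(v_max, v_min, weights, target="t"):
    """Build the MCR game G' from a mean-payoff game G.

    v_max, v_min: sets of vertices of Max and Min (bipartite graph assumed)
    weights:      dict mapping each edge (u, v) of G to its integer weight
    Returns (vertices, weights, targets) of G'.
    """
    if target in v_max or target in v_min:
        raise ValueError("the target vertex must be fresh")
    new_weights = dict(weights)           # edges of E keep their weight
    for v in v_min:
        new_weights[(v, target)] = 0      # Min may always leave to the target
    new_weights[(target, target)] = 0     # self-loop on the target
    vertices = set(v_max) | set(v_min) | {target}
    return vertices, new_weights, {target}


# Made-up bipartite game with a single cycle of total weight -1.
vertices, new_weights, targets = mean_payoff_to_mcr(
    {"a"}, {"b"}, {("a", "b"): -2, ("b", "a"): 1}
)
```

Only vertices of \({\mathsf {Min}}\) receive an edge to the target, which is what prevents \({\mathsf {Max}}\) from ending the play by himself.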
3.3 Computing all values
Now that we have discussed the case of vertices with value in \(\{-\infty , +\infty \}\), let us present our core contribution on MCR games, which is a pseudo-polynomial time, value iteration algorithm to compute the values of those games. Note that this algorithm is correct even when some vertices have value in \(\{-\infty , +\infty \}\), as we will argue later.
In all that follows, we assume that there is exactly one target vertex denoted by \({\texttt {t}}\), and the only outgoing edge from \({\texttt {t}}\) is a self loop with weight 0: this is reflected by denoting \(\mathbf {MCR}\) the payoff mapping \(\{{\texttt {t}}\}\hbox {-}\mathbf {MCR}\). This is without loss of generality since everything that happens after the first occurrence of a target vertex in a play does not matter for the payoff.
Proposition 5
If an MCR game \(\mathcal {G}=\langle V,E,\omega ,\mathbf {MCR}\rangle \) is given as input (possibly with values +\(\infty \) or \(-\infty \)), Algorithm 1 outputs \(\mathsf {Val}_\mathcal {G}\), after at most \((2|V|-1) W |V|+2|V|\) iterations.
Lemma 6
Proof
We have just shown that for all \(i\geqslant 1\), \(\overline{\mathsf {Val}}^{\leqslant i} =\mathcal {F}(\overline{\mathsf {Val}}^{\leqslant i-1})\), and since \(x_0=\overline{\mathsf {Val}}^{\leqslant 0}\), we obtain (as expected) \(x_i=\overline{\mathsf {Val}}^{\leqslant i}\) for all \(i\geqslant 0\). The main question is now to characterise the limit of the sequence \((x_i)_{i\geqslant 0}\), and more precisely, to prove that it is the value \(\mathsf {Val}\) of the game. Indeed, at this point, it would not be too difficult to show that \(\mathsf {Val}\) is a fixed point of operator \(\mathcal {F}\), but it would be more difficult to show that it is the greatest fixed point of \(\mathcal {F}\), that is indeed the limit of sequence \((x_i)_{i\geqslant 0}\) (by Kleene’s theorem, applicable since \(\mathcal {F}\) is Scott-continuous). Instead, we study refined properties of the sequence \((\overline{\mathsf {Val}}^{\leqslant i})_{i\geqslant 0}\), namely its stationarity and the speed of its convergence, and deduce that \(\mathsf {Val}\) is the greatest fixed point as a corollary (see Corollary 11).
Lemma 7
Let \(v\in V\) be a vertex and let \(0\leqslant k\leqslant |V|\) be such that \(v\in \mathsf {Attr}_k(\{{\texttt {t}}\}){\setminus } \mathsf {Attr}_{k-1}(\{{\texttt {t}}\})\) (assuming \(\mathsf {Attr}_{-1}(\{{\texttt {t}}\})=\emptyset \)). Then, for all \(0\leqslant j\leqslant |V|\): (i) \(j<k\) implies \(\overline{\mathsf {Val}}^{\leqslant j}(v)=+\infty \) and (ii) \(j\geqslant k\) implies \(\overline{\mathsf {Val}}^{\leqslant j}(v)\leqslant j W\).
Proof
We prove the property for all vertices v, by induction on j.
Base case: \(j=0\). We consider two cases. Either \(v={\texttt {t}}\). In this case, \(k=0\), and we must show that \(\overline{\mathsf {Val}}^{\leqslant 0}(v)\leqslant 0\times W=0\), which is true by definition of \(\overline{\mathsf {Val}}^{\leqslant 0}\). Or \(v\ne {\texttt {t}}\). In this case, \(k>0\), and we must show that \(\overline{\mathsf {Val}}^{\leqslant 0}(v)=+\infty \), which is true again by definition of \(\overline{\mathsf {Val}}^{\leqslant 0}\).
Inductive case: \(j=\ell \geqslant 1\), assuming that the property holds for \(j=\ell -1\).
- 1. First, assume \(k>\ell \). In this case, we must show that \(\overline{\mathsf {Val}}^{\leqslant \ell }(v)=+\infty \). We consider again two cases:
- (a) If \(v\in V_{{\mathsf {Min}}}\), then none of its successors belong to \(\mathsf {Attr}_{\ell -1}(\{{\texttt {t}}\})\), otherwise, v would be in \(\mathsf {Attr}_{\ell }(\{{\texttt {t}}\})\), by definition of the attractor, and we would have \(k\leqslant \ell \). Hence, by induction hypothesis, \(\overline{\mathsf {Val}}^{\leqslant \ell -1}(v^{\prime })=+\infty \) for all \(v^{\prime }\) such that \((v,v^{\prime })\in E\). Thus: $$\begin{aligned} \overline{\mathsf {Val}}^{\leqslant \ell }(v)&= \min _{(v,v^{\prime })\in E} \left( \omega \left( v,v^{\prime }\right) +\overline{\mathsf {Val}}^{\leqslant \ell -1}\left( v^{\prime }\right) \right)&\quad \text {(Lemma 6)}\\&=+\infty . \end{aligned}$$
- (b) If \(v\in V_{{\mathsf {Max}}}\), then at least one successor of v does not belong to \(\mathsf {Attr}_{\ell -1}(\{{\texttt {t}}\})\), otherwise, v would be in \(\mathsf {Attr}_{\ell }(\{{\texttt {t}}\})\), by definition of the attractor, and we would have \(k\leqslant \ell \). Hence, by induction hypothesis, there exists \(v^{\prime }\) such that \((v,v^{\prime })\in E\) and \(\overline{\mathsf {Val}}^{\leqslant \ell -1}(v^{\prime })=+\infty \). Thus: $$\begin{aligned} \overline{\mathsf {Val}}^{\leqslant \ell }(v)&= \max _{(v,v^{\prime })\in E} \left( \omega \left( v,v^{\prime }\right) + \overline{\mathsf {Val}}^{\leqslant \ell -1}\left( v^{\prime }\right) \right)&\quad \text {(Lemma 6)}\\&=+\infty . \end{aligned}$$
- 2. Second, assume \(k\leqslant \ell \). In this case, we must show that \(\overline{\mathsf {Val}}^{\leqslant \ell }(v)\leqslant \ell W\). As in the previous item, we consider two cases:
- (a) In the case where \(v\in V_{{\mathsf {Min}}}\), we let \(\overline{v}\) be a vertex such that \(\overline{v}\in \mathsf {Attr}_{k-1}(\{{\texttt {t}}\})\) and \((v,\overline{v})\in E\). Such a vertex exists by definition of the attractor. By induction hypothesis, \(\overline{\mathsf {Val}}^{\leqslant \ell -1}(\overline{v})\leqslant (\ell -1) W\). Then: $$\begin{aligned} \overline{\mathsf {Val}}^{\leqslant \ell }(v)&= \min _{(v,v^{\prime })\in E} \left( \omega \left( v,v^{\prime }\right) +\overline{\mathsf {Val}}^{\leqslant \ell -1}\left( v^{\prime }\right) \right)&\quad \text {(Lemma 6)}\\&\leqslant \omega \left( v,\overline{v}\right) +\overline{\mathsf {Val}}^{\leqslant \ell -1}\left( \overline{v}\right)&\quad \left( \left( v,\overline{v}\right) \in E\right) \\&\leqslant \omega \left( v,\overline{v}\right) +(\ell -1) W&\quad \text {(Ind. Hyp.)}\\&\leqslant W+(\ell -1) W =\ell W. \end{aligned}$$
- (b) In the case where \(v\in V_{{\mathsf {Max}}}\), we know that all successors \(v^{\prime }\) of v belong to \(\mathsf {Attr}_{k-1}(\{{\texttt {t}}\})\) by definition of the attractor. By induction hypothesis, for all successors \(v^{\prime }\) of v: \(\overline{\mathsf {Val}}^{\leqslant \ell -1}(v^{\prime })\leqslant (\ell -1) W\). Hence: $$\begin{aligned} \overline{\mathsf {Val}}^{\leqslant \ell }(v)&= \max _{\left( v,v^{\prime }\right) \in E} \left( \omega \left( v,v^{\prime }\right) + \overline{\mathsf {Val}}^{\leqslant \ell -1}\left( v^{\prime }\right) \right)&\quad \text {(Lemma 6)}\\&\leqslant \max _{\left( v,v^{\prime }\right) \in E} \left( W+(\ell -1) W\right)&\quad \text {(Ind. Hyp.)}\\&=\ell W. \end{aligned}$$
In particular, this allows us to conclude that, after \(|V|\) steps, all values are bounded by \(|V| W\):
Corollary 8
For all \(v\in V\), \(\overline{\mathsf {Val}}^{\leqslant |V|}(v)\leqslant |V| W\).
The next step is to show that the sequence \((x_i)_{i\geqslant 0}\) stabilises after a bounded number of steps, when all values are finite:
Lemma 9
In an MCR game where all values are finite, the sequence \((\overline{\mathsf {Val}}^{\leqslant i})_{i\geqslant 0}\) stabilises after at most \((2|V|-1) W |V|+|V|\) steps.
Proof
We first show that if \({\mathsf {Min}}\) can secure, from some vertex v, a payoff less than \(-(|V|-1) W\), i.e. \(\mathsf {Val}(v)<-(|V|-1) W\), then he can secure an arbitrarily small payoff from that vertex, i.e. \(\mathsf {Val}(v)=-\infty \), which contradicts our hypothesis that the value is finite. Hence, let us suppose that there exists a strategy \(\sigma _{{\mathsf {Min}}}\) for \({\mathsf {Min}}\) such that \(\mathsf {Val}(v,\sigma _{{\mathsf {Min}}})<-(|V|-1) W\). Let \(\mathcal {G}^{\prime }\) be the mean-payoff game studied in Proposition 4. We will show that \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v)<0\), which allows us to conclude that \(\mathsf {Val}_\mathcal {G}(v)=-\infty \). Let \(\sigma _{{\mathsf {Max}}}\) be a memoryless strategy of \({\mathsf {Max}}\). By hypothesis, we know that \(\mathbf {MCR}(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}})) < -(|V|-1) W\). This ensures the existence of a cycle with negative cost in the play \(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}})\): otherwise, we could iteratively remove every non-negative cycle of the finite play before reaching \({\texttt {t}}\) (which does not increase the cost of the play) and obtain a play without cycles before reaching \({\texttt {t}}\) with a cost less than \(-(|V|-1) W\), which is impossible (since such a play has at most \(|V|-1\) edges, each of weight at least \(-W\)). Consider the first negative cycle in the play. After the first occurrence of the cycle, we let \({\mathsf {Min}}\) choose his actions as in the cycle.
In this way, we construct another strategy \(\sigma _{{\mathsf {Min}}}^{\prime }\) for \({\mathsf {Min}}\) such that, for every memoryless strategy \(\sigma _{{\mathsf {Max}}}\) of \({\mathsf {Max}}\) (this would not be true for general strategies of \({\mathsf {Max}}\)), \(\mathbf{{MP}}(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}}^{\prime }))\) is the mean weight of the negative cycle in which the play ends, hence negative. Since for mean-payoff games, memoryless strategies are sufficient for \({\mathsf {Max}}\), we deduce that \(\mathsf {Val}_{\mathcal {G}^{\prime }}(v)<0\).
Let us thus denote by \(\overline{\mathsf {Val}}^{\leqslant }\) the value obtained when the sequence \((\overline{\mathsf {Val}}^{\leqslant i})_{i\geqslant 0}\) stabilises. We are now ready to prove that this value is the actual value of the game:
Lemma 10
For all MCR games where all values are finite: \(\overline{\mathsf {Val}}^{\leqslant }=\mathsf {Val}.\)
Proof
We already know that \(\overline{\mathsf {Val}}^{\leqslant }\succcurlyeq \mathsf {Val}\). Let us show that \(\overline{\mathsf {Val}}^{\leqslant }\preccurlyeq \mathsf {Val}\). Let \(v\in V\) be a vertex. Since \(\mathsf {Val}(v)\) is a finite integer, there exists a strategy \(\sigma _{{\mathsf {Min}}}\) for \({\mathsf {Min}}\) that realises this value, i.e. \(\mathsf {Val}(v)=\sup _{\sigma _{{\mathsf {Max}}}} \mathbf {MCR}(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}}))\). Notice that this holds because the values are integers, inducing that the infimum in the definition of \(\overline{\mathsf {Val}}(v)=\mathsf {Val}(v)\) is indeed reached.
Let us build a tree \(A_{\sigma _{{\mathsf {Min}}}}\) unfolding all possible plays from v against \(\sigma _{{\mathsf {Min}}}\). \(A_{\sigma _{{\mathsf {Min}}}}\) has a root labeled by v. If a tree node is labeled by a vertex v of \({\mathsf {Min}}\), this tree node has a unique child labeled by \(\sigma _{{\mathsf {Min}}}(v)\). If a tree node is labeled by a vertex v of \({\mathsf {Max}}\), this tree node has one child per successor \(v^{\prime }\) of v in the graph, labeled by \(v^{\prime }\). We proceed this way until we encounter a node labeled by \({\texttt {t}}\), in which case this node is a leaf. \(A_{\sigma _{{\mathsf {Min}}}}\) is necessarily finite. Otherwise, by König’s lemma, it has one infinite branch that never reaches \({\texttt {t}}\). From that infinite branch, one can extract a strategy \(\sigma _{{\mathsf {Max}}}\) for \({\mathsf {Max}}\) such that \(\mathbf {MCR}(\mathsf {Play}(v,\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}}))=+\infty \), hence \(\mathsf {Val}(v)=+\infty \), which contradicts the hypothesis. Assume the tree has depth m. Then, \(A_{\sigma _{{\mathsf {Min}}}}\) is a subtree of the tree A obtained by unfolding all possible plays up to length m (as in the proof of Lemma 6). In this case, it is easy to check that the value labeling the root of \(A_{\sigma _{{\mathsf {Min}}}}\) after applying backward induction is larger than or equal to the value labeling the root of A after applying backward induction. The former is \(\mathsf {Val}(v)\) while the latter is \(\overline{\mathsf {Val}}^{\leqslant m}(v)\), by Lemma 6, so that \(\mathsf {Val}(v)\geqslant \overline{\mathsf {Val}}^{\leqslant m}(v)\). Since the sequence is non-increasing, we finally obtain \(\mathsf {Val}(v)\geqslant \overline{\mathsf {Val}}^{\leqslant }(v)\). \(\square \)
As a corollary of this lemma, we obtain:
Corollary 11
In all MCR games where all values are finite, \(\mathsf {Val}\) is the greatest fixed point of \(\mathcal {F}\).
We are finally able to establish the correctness of Algorithm 1.
Proof of Proposition 5
Let us first suppose that the values of all vertices are finite. Then, \(x_j=\overline{\mathsf {Val}}^{\leqslant j}\) is the value of \(\mathsf{X}\) at the beginning of the jth step of the loop, and the condition of line 13 can never be fulfilled. Hence, by Lemma 9, after at most \((2|V|-1) W |V|+|V|\) iterations, all values are computed correctly (by Lemma 10) in that case.
Suppose now that there are vertices with value \(+\infty \). Those vertices will remain at their initial value \(+\infty \) during the whole computation, and hence do not interfere with the rest of the computation.
Finally, consider that the game contains vertices with value \(-\infty \). By the proof of Lemma 9, we know that optimal values of vertices of values different from \(-\infty \) are at least \(-(|V|-1) W +1\) so that, if the value of a vertex reaches an integer below \(-(|V|-1) W\), we are sure that its value is indeed \(-\infty \), which proves the correctness of line 13 of the algorithm. This update may cost at most one step per vertex, which in total adds at most \(|V|\) iterations. Moreover, dropping the value to \(-\infty \) does not harm the correctness for the other vertices (it may only speed up the convergence of their values). This is due to the fact that, if the Kleene sequence \((\mathcal {F}^i(x_0))_{i\geqslant 0}\) is initiated with a vector of values \(x_0\) that is greater than or equal to the optimal value vector \(\mathsf {Val}\), then the sequence converges at least as fast as before towards the optimal value vector. \(\square \)
Example 12
We close this discussion on the computation of the values by an example of execution of Algorithm 1. Consider the MCR game in Fig. 2. The successive values for vertices \((v_1,v_2)\) (value of the target \(v_3\) is always 0) computed by the value iteration algorithm are the following: \((+\infty ,+\infty )\), \((+\infty ,0)\), \((-1,0)\), \((-1,-1)\), \((-2,-1)\), \((-2,-2), \ldots , (-W,-W+1)\), \((-W, -W)\). This requires 2W steps to converge (hence a pseudo-polynomial time).
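The run of Example 12 can be reproduced with a small script. The following is a minimal Python sketch of a value iteration in the spirit of Algorithm 1, not a transcription of it: the function name `value_iteration` and the data layout are assumptions, and since Fig. 2 is not reproduced here, the game below is a reconstruction chosen to be consistent with the values listed in the example (with \(v_1\) owned by \({\mathsf {Max}}\), \(v_2\) by \({\mathsf {Min}}\), and \(W=3\)).

```python
import math

def value_iteration(owner, weights, target):
    """Iterate the min/max update until the value vector stabilises.

    owner:   dict vertex -> "Min" or "Max" (the target must be a key too)
    weights: dict (u, v) -> integer weight; every vertex needs a successor
    Returns (values, history), where history lists the successive vectors.
    """
    succ = {v: [] for v in owner}
    for (u, v), w in weights.items():
        succ[u].append((v, w))
    max_weight = max(abs(w) for w in weights.values())
    floor = -(len(owner) - 1) * max_weight  # below this, the value is -infinity

    x = {v: math.inf for v in owner}
    x[target] = 0
    history = [dict(x)]
    while True:
        prev, x = x, {target: 0}
        for v in owner:
            if v == target:
                continue
            candidates = [w + prev[u] for (u, w) in succ[v]]
            val = max(candidates) if owner[v] == "Max" else min(candidates)
            x[v] = -math.inf if val < floor else val
        history.append(dict(x))
        if x == prev:
            return x, history


# Reconstructed game (assumption): v3 is the target, W = 3.
W = 3
owner = {"v1": "Max", "v2": "Min", "v3": "Min"}
weights = {("v1", "v2"): -1, ("v1", "v3"): -W,
           ("v2", "v1"): 0, ("v2", "v3"): 0, ("v3", "v3"): 0}
values, history = value_iteration(owner, weights, "v3")
for step in history:
    print(step["v1"], step["v2"])
```

The number of iterations grows linearly with \(W\) on this family of games, which illustrates why the bound is pseudo-polynomial rather than polynomial.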
3.4 Computing optimal strategies for both players
As we have seen earlier, \({\mathsf {Min}}\) does not always have optimal memoryless strategies. However, we will see that one can always construct so-called negative cycle strategies (NC-strategies), which are memoryless strategies that have a meaningful structure for \({\mathsf {Min}}\), in the sense that they allow him either: (i) to reach the target by means of a play whose value is no greater than the value of the game; or (ii) to decrease arbitrarily the partial sums along the play, when it does not reach the target (in other words, the partial sums tend to \(-\infty \) as the play goes on). So, NC-strategies are, in general, not optimal, as they do not guarantee to reach the target (but in this case, they guarantee that \({\mathsf {Min}}\) will play consistently with his objective, by decreasing the value of the play prefixes).
Example 13
On the game of Fig. 2, the memoryless strategy \(\sigma _{{\mathsf {Min}}}\) mapping \(v_2\) to \(v_1\) is an NC-strategy as the only two possible cycles \(v_1 v_2 v_1\) and \(v_2 v_1 v_2\) both have weight \(-1\). The value of \(\sigma _{{\mathsf {Min}}}\) from \(v_2\) is \(+\infty \) as the play \(v_2 v_1 v_2 v_1 v_2 \ldots \) agrees with it and does not reach the target, but the fake value of \(\sigma _{{\mathsf {Min}}}\) is \(-W\), since the play \(v_2 v_1 v_3\) is the finite play that agrees with \(\sigma _{{\mathsf {Min}}}\) with the biggest cost possible. Since we know that the actual optimal value is \(-W\), the strategy \(\sigma _{{\mathsf {Min}}}\) is fake-optimal.
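The fake value used in this example can be computed mechanically. The sketch below reads the fake value as the largest cost of a finite play that conforms with \(\sigma _{{\mathsf {Min}}}\) and reaches the target, which is how the example uses it; that reading, the function name `fake_value`, and the reconstructed game (Fig. 2 is not reproduced here, \(W=3\)) are assumptions. It is a longest-path relaxation in the style of Bellman-Ford, which is sound only when every cycle conforming with \(\sigma _{{\mathsf {Min}}}\) is negative.

```python
import math

def fake_value(owner, weights, target, sigma_min):
    """Largest cost of a finite play following sigma_min and reaching target.

    Returns -inf for vertices from which no such play exists.
    Assumes sigma_min is an NC-strategy (all conforming cycles are negative).
    """
    best = {v: -math.inf for v in owner}
    best[target] = 0
    # Keep every edge of Max, and only the edge chosen by sigma_min for Min.
    allowed = [
        (u, v, w) for (u, v), w in weights.items()
        if u != target and (owner[u] == "Max" or sigma_min[u] == v)
    ]
    for _ in range(len(owner) - 1):   # a longest simple path has <= |V|-1 edges
        changed = False
        for u, v, w in allowed:
            if w + best[v] > best[u]:
                best[u] = w + best[v]
                changed = True
        if not changed:
            return best
    # A further improvement can only come from a positive conforming cycle.
    for u, v, w in allowed:
        if w + best[v] > best[u]:
            raise ValueError("sigma_min is not an NC-strategy")
    return best


owner = {"v1": "Max", "v2": "Min", "v3": "Min"}
weights = {("v1", "v2"): -1, ("v1", "v3"): -3,
           ("v2", "v1"): 0, ("v2", "v3"): 0, ("v3", "v3"): 0}
fake = fake_value(owner, weights, "v3", {"v2": "v1"})
```

On this game the result from \(v_2\) is \(-3=-W\), attained by the play \(v_2 v_1 v_3\), in line with the example. Note that the final check only detects positive cycles; a zero-weight cycle would go unnoticed by this sketch.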
The following proposition reveals the interest of NC-strategies, by explaining how one can, in some cases, construct an optimal finite-memory strategy from a fake-optimal NC-strategy.
Proposition 14
If \({\mathsf {Min}}\) has a strategy \(\sigma _{{\mathsf {Min}}}^\dagger \) to reach a target vertex (from every possible initial vertex), and has a fake-optimal NC-strategy \(\sigma _{{\mathsf {Min}}}^\star \), then for all \(n\in \mathbb {Z}\) one can construct a finite-memory strategy \(\sigma _{{\mathsf {Min}}}^n\) such that for all vertices v, it holds that \(\mathsf {Val}(v,\sigma _{{\mathsf {Min}}}^n)\leqslant \max (n,\mathsf {Val}(v))\).
Remark 15
In particular, if the value of all vertices is finite, then one can construct an optimal finite-memory strategy. If the value of a vertex is \(-\infty \), this proposition also says that there is an infinite family of strategies that allow one to secure a value which is arbitrarily low (remember that, by definition, \(-\infty \) cannot be the value that corresponds to a single strategy).
Proof
First let us show that for all partial plays \(\pi =v_1\ldots v_\ell \) of size at least \(k |V|+1\) (for some k) that conform with \(\sigma _{{\mathsf {Min}}}^\star \), \({\mathbf{TP}}(\pi ) \leqslant W(|V|-1)-k\). We establish this proof by induction on k. For the base case, we consider the case where \(\ell \leqslant |V|\). Then, the play traverses at most \(|V|-1\) edges, and thus its total cost is at most \(W(|V|-1)\). Now, for the induction, we assume that \(\ell \geqslant k |V|+1\) for some \(k\geqslant 1\). Then, let i and j be two indices such that \(i<j\), \(v_i=v_j\) and \(j\leqslant i+|V|\) (those indices necessarily exist since \(\ell \geqslant |V|+1\)). Since \(\sigma _{{\mathsf {Min}}}^\star \) is an NC-strategy, the total cost of \(v_i\ldots v_j\) is at most \(-1\). As \(\pi ^{\prime }=v_1\ldots v_i v_{j+1} \ldots v_\ell \) is also a play that conforms with \(\sigma _{{\mathsf {Min}}}^\star \) with size at least \((k-1) |V|+1\), we have (by induction hypothesis) \({\mathbf{TP}}(\pi ^{\prime })\leqslant W(|V|-1)-k+1\), thus \({\mathbf{TP}}(\pi ) = {\mathbf{TP}}(v_i\ldots v_j) + {\mathbf{TP}}(\pi ^{\prime }) \leqslant W(|V|-1)-k\).
In the following, let \(\sigma _{{\mathsf {Min}}}^\dagger \) be a memoryless strategy that guarantees reaching the target (obtained by the attractor technique for instance), and let \(k =\max (2 W (|V|-1) - n,0)\). The strategy \(\sigma _{{\mathsf {Min}}}^n\) consists in playing \(\sigma _{{\mathsf {Min}}}^\star \), until switching to \(\sigma _{{\mathsf {Min}}}^\dagger \) when the length of the play reaches \(k|V|+1\): formally \(\sigma _{{\mathsf {Min}}}^n(\pi ) = \sigma _{{\mathsf {Min}}}^\star (\pi )\) if \(|\pi | < k|V|+1\) and \(\sigma _{{\mathsf {Min}}}^n(\pi ) = \sigma _{{\mathsf {Min}}}^\dagger (\pi )\) otherwise. It is clear that this strategy can be implemented by a finite deterministic Moore machine, storing the size of the current play until it reaches \(k|V|+1\).
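The choice of \(k\) can be checked with a short computation. The following is a sketch of the concluding step, under the assumption that \(\sigma _{{\mathsf {Min}}}^\dagger \) is the attractor strategy and therefore reaches the target within at most \(|V|-1\) further edges, each of weight at most \(W\). The names \(\pi \) (the prefix played before the switch, of size at least \(k|V|+1\)) and \(\rho \) (the remainder of the play up to the target) are introduced here for illustration. Using the bound on \({\mathbf{TP}}(\pi )\) established above and \(k\geqslant 2W(|V|-1)-n\):

$$\begin{aligned} {\mathbf{TP}}(\pi ) + {\mathbf{TP}}(\rho )&\leqslant \left( W(|V|-1)-k\right) + W(|V|-1)\\&= 2W(|V|-1)-k\\&\leqslant n. \end{aligned}$$

A play that reaches the target before the switch conforms with \(\sigma _{{\mathsf {Min}}}^\star \) throughout, so its cost is at most the fake value of \(\sigma _{{\mathsf {Min}}}^\star \), hence at most \(\mathsf {Val}(v)\) by fake-optimality. Taking both cases together gives \(\mathsf {Val}(v,\sigma _{{\mathsf {Min}}}^n)\leqslant \max (n,\mathsf {Val}(v))\).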
In practice, rather than using a Moore machine, we can simulate the strategy \(\sigma _{{\mathsf {Min}}}^n\) (of the proof above) quite easily: one just has to handle two memoryless strategies and a counter keeping track of the length of the current play. Since \(\sigma _{{\mathsf {Min}}}^\dagger \) can easily be obtained by the classical attractor algorithm, we turn our attention to the construction of a fake-optimal NC-strategy \(\sigma _{{\mathsf {Min}}}^\star \). Without loss of generality, we suppose that no vertex has optimal value \(+\infty \), since for these vertices, all strategies are equivalent.
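The simulation with two memoryless strategies and a counter can be sketched as follows. This is a minimal Python sketch under stated assumptions: the names `switch_threshold` and `make_switching_strategy` are hypothetical, memoryless strategies are encoded as dictionaries from vertices to successors, and the length of the history list plays the role of the counter.

```python
def switch_threshold(n, max_weight, num_vertices):
    """Play length k|V|+1 at which Min switches, with k = max(2W(|V|-1) - n, 0)."""
    k = max(2 * max_weight * (num_vertices - 1) - n, 0)
    return k * num_vertices + 1


def make_switching_strategy(sigma_star, sigma_dagger, threshold):
    """Combine two memoryless strategies (dicts vertex -> successor).

    The returned function takes the history of the play (a list of vertices
    ending in a vertex of Min) and returns the successor chosen by Min.
    """
    def strategy(history):
        current = history[-1]
        if len(history) < threshold:
            return sigma_star[current]    # keep following the NC-strategy
        return sigma_dagger[current]      # head for the target
    return strategy


# Illustrative values: W = 3, |V| = 3, requested bound n = -3.
threshold = switch_threshold(-3, 3, 3)
sigma_n = make_switching_strategy({"v2": "v1"}, {"v2": "v3"}, threshold)
first_move = sigma_n(["v2"])
```

Only the length of the history is consulted, so an implementation can replace the list by an integer counter that saturates at the threshold.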
For vertices of value \(-\infty \), we can obtain \(\sigma _{{\mathsf {Min}}}^\star \) as an optimal strategy for \({\mathsf {Min}}\) in the mean-payoff game of the first item in Proposition 4. Since the mean-payoff value is negative, this strategy guarantees that the play does not reach the target, thus generating a fake value \(-\infty \), equal to the optimal value of the vertex. Moreover, since it is a memoryless strategy, we know that, as soon as \({\mathsf {Max}}\) plays a memoryless strategy, the play necessarily reaches a cycle, and this cycle must have a negative weight (its mean weight being at most the optimal value of the initial vertex): this strategy is thus a fake-optimal NC-strategy.
From now on, we thus concentrate our study on the vertices of finite value, thus considering that no vertices have value \(+\infty \) or \(-\infty \) in the MCR game. Let \(X^i\) denote the value of variable \(\mathsf {X}\) after i iterations of the loop of Algorithm 1, and let \(X^0(v)=+\infty \) for all \(v\in V\). We have seen that the sequence \(X^0\succcurlyeq X^1 \succcurlyeq X^2\succcurlyeq \ldots \) is stationary at some point, equal to \(\mathsf {Val}\). Let us now define \(\sigma _{{\mathsf {Min}}}^\star (v)\) for all vertices \(v\in V_{{\mathsf {Min}}}{\setminus }\{{\texttt {t}}\}\). We let \(i_v>0\) be the smallest index such that \(X^{i_v}(v)=\mathsf {Val}(v)\). Fix a vertex \(v^{\prime }\) such that \(X^{i_v}(v) = \omega (v,v^{\prime }) + X^{i_v-1}(v^{\prime })\) (such a \(v^{\prime }\) exists by definition) and define \(\sigma _{{\mathsf {Min}}}^\star (v)=v^{\prime }\). Notice that it is exactly what is achieved in line 11 of Algorithm 1. Note also that strategy \(\sigma _{{\mathsf {Min}}}^\dagger \) is correctly computed in line 12 of Algorithm 1.
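The definition of \(\sigma _{{\mathsf {Min}}}^\star \) can be illustrated by recording, during the iteration, the successor chosen at the last step where the value of a vertex strictly changed. The code below is a hedged Python sketch and not a transcription of Algorithm 1: the function name `extract_sigma_star` and the data layout are assumptions, all values are assumed finite, and the iteration cap reuses the bound of Lemma 9. The game is the same reconstruction of Fig. 2 as assumed for Example 12, with \(W=3\).

```python
import math

def extract_sigma_star(owner, weights, target):
    """Value iteration that also records sigma_star for Min.

    sigma_star[v] is overwritten only when the value of v strictly changes,
    so it ends up being a successor realising X^{i_v}(v).
    Assumes all values are finite and every vertex has a successor.
    """
    succ = {v: [] for v in owner}
    for (u, v), w in weights.items():
        succ[u].append((v, w))
    n = len(owner)
    max_weight = max(abs(w) for w in weights.values())
    bound = (2 * n - 1) * max_weight * n + n   # stabilisation bound of Lemma 9

    x = {v: math.inf for v in owner}
    x[target] = 0
    sigma_star = {}
    for step in range(bound + 1):              # +1 to observe the fixed point
        prev, x = x, {target: 0}
        for v in owner:
            if v == target:
                continue
            options = [(w + prev[u], u) for (u, w) in succ[v]]
            if owner[v] == "Max":
                x[v] = max(val for val, _ in options)
            else:
                x[v], choice = min(options, key=lambda o: o[0])
                if x[v] != prev[v]:            # update only on a strict change
                    sigma_star[v] = choice
        if x == prev:
            return x, sigma_star
    raise ValueError("no stabilisation: some value is presumably -infinity")


owner = {"v1": "Max", "v2": "Min", "v3": "Min"}
weights = {("v1", "v2"): -1, ("v1", "v3"): -3,
           ("v2", "v1"): 0, ("v2", "v3"): 0, ("v3", "v3"): 0}
values, sigma_star = extract_sigma_star(owner, weights, "v3")
```

On this game the sketch returns \(\sigma _{{\mathsf {Min}}}^\star (v_2)=v_1\), which is the NC-strategy discussed in Example 13.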
Let us prove that this construction indeed yields a fake-optimal NC strategy \(\sigma _{{\mathsf {Min}}}^\star \). We first prove the following lemma, that states that the vertex \(\sigma _{{\mathsf {Min}}}^\star (v)\) has already reached its final value at step \(i_v-1\) of the algorithm, for all vertices v.
Lemma 16
For all vertices \(v\in V_{{\mathsf {Min}}}{\setminus }\{{\texttt {t}}\}\), \(X^{i_v-1}(\sigma _{{\mathsf {Min}}}^\star (v))= \mathsf {Val}(\sigma _{{\mathsf {Min}}}^\star (v))\).
Proof
We can now prove that our definition of \(\sigma _{{\mathsf {Min}}}^\star \) has the announced properties:
Proposition 17
\(\sigma _{{\mathsf {Min}}}^\star \) is an NC-strategy, and \(\mathrm {fake}(v,\sigma _{{\mathsf {Min}}}^\star ) \leqslant \mathsf {Val}(v)\) for all vertices v.
Proof
- (1) if there exists \(i<j\) such that \(v_i=v_j\), then \({\mathbf{TP}}(v_i \ldots v_j)<0\),
- (2) if \(\pi \) reaches \({\texttt {t}}\) then \(\mathbf {MCR}(\pi )\leqslant \mathsf {Val}(v)\).
As a corollary, we obtain the existence of finite memory strategies (obtained from \(\sigma _{{\mathsf {Min}}}^\star \) and \(\sigma _{{\mathsf {Min}}}^\dagger \) as described above) when all values are finite:
Corollary 18
When the values of all vertices of an MCR game are finite, one can construct an optimal finite-memory strategy for player \({\mathsf {Min}}\).
Strategies of \({\mathsf {Max}}\). Let us now show that \({\mathsf {Max}}\) always has a memoryless optimal strategy. This asymmetry stems directly from the asymmetric definition of the game: while \({\mathsf {Min}}\) has the double objective of reaching \({\texttt {t}}\) and minimising its cost, \({\mathsf {Max}}\) aims at avoiding \({\texttt {t}}\), and if not possible, maximising the cost.
Proposition 19
In all MCR games, \({\mathsf {Max}}\) has a memoryless optimal strategy.
Proof
For vertices with value \(+\infty \), we already know a memoryless optimal strategy for \({\mathsf {Max}}\), namely every strategy that remains outside the attractor of the target vertices. For vertices with value \(-\infty \), all strategies are equally bad for \({\mathsf {Max}}\).
- If \(v_{\ell -i-1}\in V_{{\mathsf {Max}}}{\setminus }\{{\texttt {t}}\}\), then \(v_{\ell -i}=\sigma _{{\mathsf {Max}}}^\star (v_0v_1\ldots v_{\ell -i-1})\), so that by definition of \(\sigma _{{\mathsf {Max}}}^\star \): $$\begin{aligned} \omega (v_{\ell -i-1},v_{\ell -i}) + \mathsf {Val}(v_{\ell -i}) = \max _{v^{\prime }\in V\mid \left( v_{\ell -i-1},v^{\prime }\right) \in E} \left( \omega \left( v_{\ell -i-1},v^{\prime }\right) + \mathsf {Val}(v^{\prime })\right) . \end{aligned}$$ Using Corollary 11 and (4), we obtain $$\begin{aligned}\mathbf {MCR}(v_{\ell -i-1}\ldots v_\ell ) \geqslant \mathsf {Val}(v_{\ell -i-1}). \end{aligned}$$
- If \(v_{\ell -i-1}\in V_{{\mathsf {Min}}}{\setminus }\{{\texttt {t}}\}\), then $$\begin{aligned} \omega (v_{\ell -i-1},v_{\ell -i}) + \mathsf {Val}(v_{\ell -i}) \geqslant \min _{v^{\prime }\in V\mid \left( v_{\ell -i-1},v^{\prime }\right) \in E} \left( \omega \left( v_{\ell -i-1},v^{\prime }\right) + \mathsf {Val}\left( v^{\prime }\right) \right) . \end{aligned}$$ Once again using Corollary 11 and (4), we obtain $$\begin{aligned} \mathbf {MCR}(v_{\ell -i-1}\ldots v_\ell ) \geqslant \mathsf {Val}(v_{\ell -i-1}). \end{aligned}$$
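The defining property of \(\sigma _{{\mathsf {Max}}}^\star \) used in the first case (each vertex of \({\mathsf {Max}}\) picks a successor maximising \(\omega (v,v^{\prime })+\mathsf {Val}(v^{\prime })\)) translates directly into code once the values are known. A minimal Python sketch, assuming finite values; the function name `max_strategy_from_values` is hypothetical, and the game and its values are the reconstruction of Fig. 2 assumed for Example 12, with \(W=3\):

```python
def max_strategy_from_values(owner, weights, target, val):
    """Memoryless strategy of Max: argmax of weight + value of the successor.

    val maps each vertex to its (already computed) value in the MCR game.
    Ties are broken in favour of the first edge encountered.
    """
    sigma_max = {}
    best = {}
    for (u, v), w in weights.items():
        if owner[u] != "Max" or u == target:
            continue
        score = w + val[v]
        if u not in best or score > best[u]:
            best[u] = score
            sigma_max[u] = v
    return sigma_max


owner = {"v1": "Max", "v2": "Min", "v3": "Min"}
weights = {("v1", "v2"): -1, ("v1", "v3"): -3,
           ("v2", "v1"): 0, ("v2", "v3"): 0, ("v3", "v3"): 0}
val = {"v1": -3, "v2": -3, "v3": 0}
sigma_max = max_strategy_from_values(owner, weights, "v3", val)
```

Here \({\mathsf {Max}}\) moves from \(v_1\) to the target, since \(-3+0\) exceeds \(-1+(-3)\). The strategy is read off the stabilised value vector, which matches the remark that the corresponding line of Algorithm 1 is recomputed at every iteration.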
Notice further that the presence of the if condition in line 11, and the absence of a similar condition at line 7, are crucial. Indeed, removing the if from line 11 would amount to computing \(\sigma _{{\mathsf {Min}}}^\star \) from the vector of values obtained at the end of the value iteration, when the vector \(\mathsf X\) has stabilised. Let us show that, in this case, the algorithm might fail to compute a fake-optimal NC-strategy, by considering the MCR game in the left part of Fig. 3. Clearly, the values of both \(v_1\) and \(v_2\) are 0. However, if we extract \(\sigma _{{\mathsf {Min}}}^\star \) from \(\mathsf {X}_{pre}\) at that point of the execution of the algorithm [i.e. we let \(\sigma _{{\mathsf {Min}}}^\star (v)={{\mathrm{\arg \!\min }}}_{v^{\prime }\in E(v)}(\omega (v,v^{\prime })+\mathsf {X}_{pre}(v^{\prime }))\)], then we could end up with \(\sigma _{{\mathsf {Min}}}^\star (v_1)=v_2\) and \(\sigma _{{\mathsf {Min}}}^\star (v_2)=v_1\). In this case, \(\sigma _{{\mathsf {Min}}}^\star \) is no longer an NC-strategy because the cycle \(v_1v_2v_1\) does not have negative weight. Similarly, let us consider the MCR game in the right part of Fig. 3 to explain why line 7 is not under the range of an \(\mathbf {if}(\mathsf {X}(v)\ne \mathsf {X}_{pre}(v))\) condition. After two iterations, \(\mathsf X\) reaches the optimal values \((-1,0,-1,0)\) but a \({\mathsf {Max}}\) strategy \(\sigma _{{\mathsf {Max}}}^\star \) such that \(\sigma _{{\mathsf {Max}}}^\star (v_1)=v_3\) can still be chosen since \(\mathsf {X}_{pre}(v_3)=0\) at that point. However, on the next iteration, \(\mathsf {X}(v_3)=\mathsf {X}_{pre}(v_3)=-1\) (indeed, \(\mathsf {X}\) has now stabilised on all nodes), and it is crucial that \(\sigma _{{\mathsf {Max}}}^\star (v_1)=v_2\) gets computed, otherwise the strategy would not be optimal for \({\mathsf {Max}}\).
4 An efficient algorithm to solve total-payoff games
We now turn our attention back to total-payoff games (without reachability objective), and discuss our main contribution. Building on the results of the previous section, we introduce the first (as far as we know) pseudo-polynomial time algorithm for solving those games in the presence of arbitrary weights, thanks to a reduction from total-payoff games to MCR games. The MCR game produced by the reduction has size pseudo-polynomial in the size of the original total-payoff game. Then, we show how to compute the values of the total-payoff game without building the entire MCR game, and explain how to deduce memoryless optimal strategies from the computation of our algorithm.
4.1 Reduction to MCR games
Example 20
For example, considering the weighted graph of Fig. 2, the corresponding reachability total-payoff game \(\mathcal {G}^3\) is depicted in Fig. 4 (where weights 0 have been removed).
The next proposition formalises the relationship between the two games, and is proved in the rest of this subsection.
Proposition 21
\(\mathsf{{Val}}_\mathcal {G}(v)\ne +\infty \) if and only if \(\mathsf{{Val}}_\mathcal {G}(v)=\mathsf{{Val}}_{\mathcal {G}^k}((v,k))\);
\(\mathsf{{Val}}_\mathcal {G}(v)=+\infty \) if and only if \(\mathsf{{Val}}_{\mathcal {G}^k}((v,k))\geqslant (|V|-1) W+1\).
The bound K will be found by using the fact (informally described in the previous section) that, if not infinite, the value of an MCR game belongs to \([-(|V|-1)\times W+1, |V|\times W]\), and that after enough visits of the same vertex, an adequate loop ensures that \(\mathcal {G}^k\) verifies the above properties.
The following lemma relates plays of \(\mathcal {G}^n\) with their projection in \(\mathcal {G}\), comparing their total-payoff.
Lemma 22
- 1. If \(\pi \) is a finite play in \(\mathcal {G}^n\) then \(\mathsf {proj}(\pi )\) is a finite play in \(\mathcal {G}\).
- 2. If \(\pi \) is a play in \(\mathcal {G}^n\) that does not reach the target, then \(\mathsf {proj}(\pi )\) is a play in \(\mathcal {G}\).
- 3. For every finite play \(\pi \), \({\mathbf{TP}}(\pi )={\mathbf{TP}}(\mathsf {proj}(\pi ))\).
Proof
If \(\pi =({\texttt {in}},v,j)\pi ^{\prime }\), then \(\mathsf {proj}(\pi )=\mathsf {proj}(\pi ^{\prime })\). Hence, 1 holds by induction hypothesis. If \(\mathsf {proj}(\pi )\) is non-empty, so is \(\mathsf {proj}(\pi ^{\prime })\). Moreover, the first vertex of \(\pi ^{\prime }\) is either (v, j) or \(({\texttt {ex}},v,j)\), so that we have 4 by induction hypothesis. Finally, the previous remark shows that the first edge of \(\pi \) has necessarily weight 0, so that, \({\mathbf{TP}}(\pi )={\mathbf{TP}}(\pi ^{\prime })\), and 3 also holds by induction hypothesis.
If \(\pi =(v,j)\pi ^{\prime }\), then \(\mathsf {proj}(\pi )=v\,\mathsf {proj}(\pi ^{\prime })\) so that 4 holds directly. Moreover, \(\pi ^{\prime }\) is a non-empty finite play so that \(\pi ^{\prime }=({\texttt {in}},v^{\prime },j)\pi ^{\prime \prime }\) with \((v,v^{\prime })\in E\), and \(\mathsf {proj}(\pi ^{\prime })=\mathsf {proj}(\pi ^{\prime \prime })\). By induction, \(\mathsf {proj}(\pi ^{\prime })\) is a finite play in \(\mathcal {G}\), and it starts with \(v^{\prime }\) (by 4). Since \((v,v^{\prime })\in E\), this shows that \(\mathsf {proj}(\pi )\) is a finite play. Moreover, \({\mathbf{TP}}(\pi )=\omega ^n((v,j),({\texttt {in}},v^{\prime },j))+{\mathbf{TP}}(\pi ^{\prime }) = \omega (v,v^{\prime })+{\mathbf{TP}}(\pi ^{\prime })\). By induction hypothesis, we have \({\mathbf{TP}}(\pi ^{\prime })={\mathbf{TP}}(\mathsf {proj}(\pi ^{\prime }))\). Moreover, \({\mathbf{TP}}(\mathsf {proj}(\pi )) = \omega (v,v^{\prime })+{\mathbf{TP}}(\mathsf {proj}(\pi ^{\prime }))\) which concludes the proof of 3.
If \(\pi =({\texttt {ex}},v,j)(v,j-1)\pi ^{\prime }\) then \(\mathsf {proj}(\pi )=v\,\mathsf {proj}(\pi ^{\prime })=\mathsf {proj}((v,j-1)\pi ^{\prime })\): this allows us to conclude directly by using the previous case.
Otherwise, \(\pi =({\texttt {ex}},v,j){\texttt {t}}\pi ^{\prime }\), and then \(\mathsf {proj}(\pi )=v\) is a finite play with total-payoff 0, like \(\pi \), and 4 holds trivially. \(\square \)
The next lemma states that when playing memoryless strategies, one can bound the total-payoff of all finite plays.
Lemma 23
Let \(v\in V\), and \(\sigma _{{\mathsf {Min}}}\) (respectively, \(\sigma _{{\mathsf {Max}}}\)) be a memoryless strategy for \({\mathsf {Min}}\) (respectively, \({\mathsf {Max}}\)) in the total-payoff game \(\mathcal {G}\), such that \(\mathsf{{Val}}(v,\sigma _{{\mathsf {Min}}})\ne +\infty \) (respectively, \(\mathsf{{Val}}(v,\sigma _{{\mathsf {Max}}})\ne -\infty \)). Then for every finite play \(\pi \) conforming to \(\sigma _{{\mathsf {Min}}}\) (respectively, to \(\sigma _{{\mathsf {Max}}}\)), \({\mathbf{TP}}(\pi ) \leqslant (|V|-1) W\) (respectively, \({\mathbf{TP}}(\pi ) \geqslant -(|V|-1) W\)).
Proof
We prove the part for \({\mathsf {Min}}\), the other case is similar. The proof proceeds by induction on the size of a partial play \(\pi =v_1\ldots v_k\) with \(v_1=v\). If \(k\leqslant |V|\) then \({\mathbf{TP}}(\pi ) = \sum \nolimits _{i=1}^{k-1} \omega (v_i,v_{i+1}) \leqslant (k-1) W \leqslant (|V|-1) W\). If \(k\geqslant |V|+1\) then there exists \(i<j\) such that \(v_i=v_j\). Assume by contradiction that \({\mathbf{TP}}(v_i \ldots v_j) > 0\). Then the play \(\pi ^{\prime } = v_1 \ldots v_i \ldots v_j (v_{i+1} \ldots v_j)^\omega \) conforms to \(\sigma _{{\mathsf {Min}}}\) and \({\mathbf{TP}}(\pi ^{\prime }) =+\infty \) which contradicts \(\mathsf{{Val}}(v,\sigma _{{\mathsf {Min}}})\ne +\infty \). Therefore \({\mathbf{TP}}(v_i \ldots v_j) \leqslant 0\). We have \({\mathbf{TP}}(\pi ) = {\mathbf{TP}}(v_1 \ldots v_i)+ {\mathbf{TP}}(v_i \ldots v_j) + {\mathbf{TP}}(v_{j+1} \ldots v_{k})\), and since \(v_i=v_j\), \(v_1 \ldots v_i v_{j+1} \ldots v_k\) is a finite play starting from v that conforms to \(\sigma _{{\mathsf {Min}}}\), and by induction hypothesis \({\mathbf{TP}}(v_1 \ldots v_iv_{j+1} \ldots v_{k} ) \leqslant (|V|-1) W\). Then \({\mathbf{TP}}(\pi ) = {\mathbf{TP}}(v_1 \ldots v_iv_{j+1} \ldots v_{k} )+ {\mathbf{TP}}(v_i \ldots v_j)\leqslant {\mathbf{TP}}(v_1 \ldots v_iv_{j+1} \ldots v_{k} )\leqslant (|V|-1) W\). \(\square \)
This allows us to bound the finite values \(\mathsf{{Val}}(v)\) of the vertices v of the game:
Corollary 24
For all \(v\in V\), \(\mathsf{{Val}}(v)\in [-(|V|-1) W,(|V|-1) W] \uplus \{-\infty ,+\infty \}\).
Proof
From the result of [12], we know that total-payoff games are memorylessly determined, i.e. there exist two memoryless strategies \(\sigma _{{\mathsf {Max}}},\sigma _{{\mathsf {Min}}}\) such that for all v, \(\mathsf{{Val}}(v)= \mathsf{{Val}}(v,\sigma _{{\mathsf {Max}}}) = \mathsf{{Val}}(v,\sigma _{{\mathsf {Min}}})\). Assume that \(\mathsf{{Val}}(v)\notin \{-\infty ,+\infty \}\). Then since \(\mathsf{{Val}}(v,\sigma _{{\mathsf {Max}}})=\mathsf{{Val}}(v)\ne -\infty \), Lemma 23 shows that every finite play \(\pi \) that conforms to \(\sigma _{{\mathsf {Max}}}\) verifies \({\mathbf{TP}}(\pi ) \geqslant -(|V|-1) W\), therefore \(\mathsf{{Val}}(v) \geqslant -(|V|-1) W\). One can similarly prove that \(\mathsf{{Val}}(v) \leqslant (|V|-1) W\). \(\square \)
We now compare values in both games. A first lemma shows, in particular, that \(\mathsf{{Val}}_{\mathcal {G}^n}(v,n)\leqslant \mathsf{{Val}}_\mathcal {G}(v)\), in case \(\mathsf{{Val}}_\mathcal {G}(v)\ne +\infty \).
Lemma 25
For all \(m\in \mathbb {Z}\), \(v\in V\), and \(n\geqslant 1\), if \(\mathsf{{Val}}_\mathcal {G}(v) \leqslant m\) then \(\mathsf{{Val}}_{\mathcal {G}^n}(v,n)\leqslant m\).
Proof
We now turn to the other comparison between \(\mathsf{{Val}}_{\mathcal {G}^n}(v,n)\) and \(\mathsf{{Val}}_\mathcal {G}(v)\). Since \(\mathsf{{Val}}_\mathcal {G}(v)\) can be infinite in case the target is not reachable, we have to be more careful. In particular, we show that \(\mathsf{{Val}}_{\mathcal {G}^n}(v,n)\geqslant \min (\mathsf{{Val}}_\mathcal {G}(v),(|V|-1) W +1)\) holds for large values of n. In the following, we let \(K=|V| (2 (|V|-1) W +1)\).
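To give a feeling for the size of these quantities, the following Python fragment evaluates the bound \((|V|-1)W\) on finite values and the constant \(K\) for a given number of vertices and a given \(W\). It is a small sketch: the function names are ours, and \(W\) is assumed to be the largest absolute value of an edge weight.

```python
def value_bound(num_vertices, max_abs_weight):
    """Bound (|V| - 1) * W on the absolute value of any finite value."""
    return (num_vertices - 1) * max_abs_weight

def unfolding_bound(num_vertices, max_abs_weight):
    """Constant K = |V| * (2 * (|V| - 1) * W + 1) introduced before Lemma 26."""
    return num_vertices * (2 * value_bound(num_vertices, max_abs_weight) + 1)

# Hypothetical instance with 3 vertices and weights in [-3, 3].
bound = value_bound(3, 3)
K = unfolding_bound(3, 3)
```

For this instance the finite values lie in \([-6,6]\) and \(K=3\times 13=39\). Since \(K\) grows linearly with \(W\), it is pseudo-polynomial in the size of the game when weights are encoded in binary.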
Lemma 26
For all \(m\leqslant (|V|-1)W+1\), \(k\geqslant K\), and vertex v, if \(\mathsf{{Val}}_\mathcal {G}(v) \geqslant m\) then \(\mathsf{{Val}}_{\mathcal {G}^k}(v,k)\geqslant m\).
Proof
By construction of \(\sigma _{{\mathsf {Max}}}^m\), if \(\pi \) conforms to \(\sigma _{{\mathsf {Max}}}^m\), then \(\mathsf {proj}(\pi )\) conforms to \(\sigma _{{\mathsf {Max}}}\). From the structure of the weighted graph, we know that for every play \(\pi \) of \(\mathcal {G}^k\), there exists \(1\leqslant j\leqslant k\) such that \(\pi \) is of the form \(\pi _k ({\texttt {ex}},v_k,k)\pi _{k-1} ({\texttt {ex}},v_{k-1},k-1) \ldots \pi _j ({\texttt {ex}},v_j,j) \pi ^{\prime }\) verifying that: there are no occurrences of exterior vertices in \(\pi _k,\ldots ,\pi _j,\pi ^{\prime }\); for all \(\ell \leqslant j\), all vertices in \(\pi _\ell \) belong to the \(\ell \)th copy of \(\mathcal {G}\); either \(\pi ^{\prime }={\texttt {t}}^\omega \) or all vertices of \(\pi ^{\prime }\) belong to the \((j+1)\)th copy of \(\mathcal {G}\) (in which case, \(j<k\)).
- 1. If \(\pi \) does not reach the target, then \(\mathbf {MCR}(\pi )=+\infty \geqslant m\).
- 2. If \(\pi = \pi _k ({\texttt {ex}},v_k,k) \ldots \pi _j ({\texttt {ex}},v_j,j) {\texttt {t}}^\omega \) and \(j>1\) then
$$\begin{aligned}\sigma _{{\mathsf {Max}}}^m \left( \pi _k ({\texttt {ex}},v_k,k) \ldots \pi _j ({\texttt {ex}},v_j,j)\right) ={\texttt {t}}.\end{aligned}$$
Thus, using Lemma 22,
$$\begin{aligned} \mathbf {MCR}(\pi )&= {\mathbf{TP}}(\pi _k ({\texttt {ex}},v_k,k) \ldots \pi _j ({\texttt {ex}},v_j,j){\texttt {t}})\\&={\mathbf{TP}}(\mathsf {proj}(\pi _k ({\texttt {ex}},v_k,k) \ldots \pi _j ({\texttt {ex}},v_j,j){\texttt {t}})) \\&\geqslant \mathsf {Val}_\mathcal {G}(v,\sigma _{{\mathsf {Max}}}) \geqslant m. \end{aligned}$$
- 3. If \(\pi = \pi _k ({\texttt {ex}},v_k,k) \ldots \pi _1 ({\texttt {ex}},v_1,1) {\texttt {t}}^\omega \), assume by contradiction that
$$\begin{aligned} {\mathbf{TP}}(\pi _k ({\texttt {ex}},v_k,k) \ldots \pi _1)\leqslant m-1. \end{aligned}$$
Otherwise, we directly obtain \(\mathbf {MCR}(\pi )\geqslant m\). Let \(v^\star \) be a vertex that occurs at least \(N=\left\lceil K/|V|\right\rceil = 2 (|V|-1) W +1\) times in the sequence \(v_1,\ldots ,v_k\): such a vertex exists, since otherwise \(K\leqslant k \leqslant (N-1) |V|\) which contradicts the fact that \((N-1) |V|<K\). Let \(j_1>\ldots >j_N\) be a sequence of indices such that \(v_{j_i} = v^\star \) for all i. We give a new decomposition of \(\pi \):
$$\begin{aligned} \pi = \pi ^{\prime }_1 ({\texttt {ex}},v_{j_1},j_1) \ldots \pi ^{\prime }_N ({\texttt {ex}},v_{j_N},j_N) \pi ^{\prime }_{N+1}. \end{aligned}$$
Since \(\pi \) conforms to \(\sigma _{{\mathsf {Max}}}^m\) and according to the assumption, we have that for all i,
$$\begin{aligned} {\mathbf{TP}}\left( \pi ^{\prime }_1 ({\texttt {ex}},v_{j_1},j_1) \ldots \pi ^{\prime }_i\right) \leqslant m-1. \end{aligned}$$
We consider two cases.
- (a) If there exists \(\pi ^{\prime }_i\) such that \({\mathbf{TP}}(\pi ^{\prime }_i) \leqslant 0\) then, let \(\mathsf {proj}(\pi ^{\prime }_i) = u_1\ldots u_\ell \) with \(u_1 = u_\ell = v^\star \). Since \(\pi ^{\prime }_i\) conforms to \(\sigma _{{\mathsf {Max}}}^m\), \(\mathsf {proj}(\pi ^{\prime }_i)\) conforms to \(\sigma _{{\mathsf {Max}}}\). Therefore the play
$$\begin{aligned} \widetilde{\pi }=\mathsf {proj}\left( \pi ^{\prime }_1 ({\texttt {ex}},v_{j_1},j_1) \ldots \pi ^{\prime }_i ({\texttt {ex}},v_{j_i},j_i)\right) (u_1 \ldots u_{\ell -1})^\omega \end{aligned}$$
conforms to \(\sigma _{{\mathsf {Max}}}\). Furthermore, using again Lemma 22,
$$\begin{aligned} {\mathbf{TP}}\left( \widetilde{\pi }\right) = \liminf _{n\rightarrow +\infty } \left( {\mathbf{TP}}\left( \pi ^{\prime }_1 ({\texttt {ex}},v_{j_1},j_1) \ldots \pi ^{\prime }_i ({\texttt {ex}},v_{j_i},j_i)\right) + n {\mathbf{TP}}(u_1\ldots u_\ell )\right) \end{aligned}$$
and since \({\mathbf{TP}}(u_1\ldots u_\ell )= {\mathbf{TP}}(\pi ^{\prime }_i)\leqslant 0\), we have
$$\begin{aligned} {\mathbf{TP}}\left( \widetilde{\pi }\right) \leqslant {\mathbf{TP}}\left( \pi ^{\prime }_1 ({\texttt {ex}},v_{j_1},j_1) \ldots \pi ^{\prime }_i ({\texttt {ex}},v_{j_i},j_i)\right) \leqslant m-1. \end{aligned}$$
Thus \(\widetilde{\pi }\) is a play starting from v that conforms to \(\sigma _{{\mathsf {Max}}}\) but whose total-payoff is strictly less than m, which raises a contradiction.
- (b) If for all \(\pi ^{\prime }_i\), \({\mathbf{TP}}(\pi ^{\prime }_i)\geqslant 1\) (notice that it is implied by \({\mathbf{TP}}(\pi ^{\prime }_i)>0\)). From Lemma 23, since \(\mathsf {Val}_\mathcal {G}(v,\sigma _{{\mathsf {Max}}})\geqslant m\ne -\infty \), we know that \({\mathbf{TP}}(\mathsf {proj}(\pi ^{\prime }_0))\geqslant -(|V|-1) W\). From Lemma 22, \({\mathbf{TP}}(\pi ^{\prime }_0)\geqslant -(|V|-1) W\). Therefore
$$\begin{aligned} {\mathbf{TP}}\left( \pi ^{\prime }_1 ({\texttt {ex}},v_{j_1},j_1) \ldots \pi ^{\prime }_N\right)&\geqslant -(|V|-1) W + N \\&=(|V|-1) W +1 \geqslant m \end{aligned}$$
which contradicts the assumption that
$$\begin{aligned} {\mathbf{TP}}\left( \pi ^{\prime }_1({\texttt {ex}},v_{j_1},j_1) \ldots \pi ^{\prime }_N\right) < m. \end{aligned}$$
Using the two last lemmas, we can now prove Proposition 21 by relating precisely values in \(\mathcal {G}\) and \(\mathcal {G}^k\).
Proof of Proposition 21
If \(\mathsf{{Val}}_\mathcal {G}(v)= -\infty \), then for all m, \(\mathsf{{Val}}_\mathcal {G}(v)\leqslant m\). By Lemma 25, \(\mathsf{{Val}}_{\mathcal {G}^K}(v,K) \leqslant m\). Therefore \(\mathsf{{Val}}_{\mathcal {G}^K}(v,K) = -\infty \).
If \(\mathsf{{Val}}_\mathcal {G}(v)= m \in [-(|V|-1) W,(|V|-1) W]\). Then, \(m\leqslant \mathsf{{Val}}_\mathcal {G}(v)\leqslant m\). Thus, by Lemmas 25 and 26, \(m\leqslant \mathsf{{Val}}_{\mathcal {G}^K}(v,K) \leqslant m\). Therefore \(\mathsf{{Val}}_{\mathcal {G}^K}(v,K) = m\).
If \(\mathsf{{Val}}_\mathcal {G}(v) = +\infty \), then \(\mathsf{{Val}}_\mathcal {G}(v)\geqslant (|V|-1) W+1\). By Lemma 26, \(\mathsf{{Val}}_{\mathcal {G}^K}(v,K) \geqslant (|V|-1)W+1\). \(\square \)
4.2 Value iteration algorithm for total-payoff games
By Proposition 21, an immediate way to obtain a value iteration algorithm for total-payoff games is to build game \(\mathcal {G}^K\), run Algorithm 1 on it, and map the computed values back to \(\mathcal {G}\). We take advantage of the structure of \(\mathcal {G}^K\) to provide a better algorithm that avoids building \(\mathcal {G}^K\). Intuitively, we first compute the values of the vertices in the last copy of the game (vertices of the form (v, 1), \(({\texttt {in}},v,1)\) and \(({\texttt {ex}},v,1)\)), then of those in the penultimate (vertices of the form (v, 2), \(({\texttt {in}},v,2)\) and \(({\texttt {ex}},v,2)\)), and so on.
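As an illustration only, this copy-by-copy computation can be sketched in Python as two nested fixed-point loops. The fragment is not a transcription of Algorithm 2. The function name `total_payoff_values`, the dictionary encoding of the game, and the exact form of the update are assumptions: we assume that leaving the current copy through a vertex \(v^{\prime }\) costs \(\max (0,Y(v^{\prime }))\), that values falling below \(-(|V|-1)W\) are set to \(-\infty \), that values exceeding \((|V|-1)W\) are set to \(+\infty \), and that every vertex has at least one successor.

```python
import math

INF = math.inf

def total_payoff_values(vertices, min_vertices, successors, weight):
    """Nested value iteration sketch; vertices not in min_vertices belong to Max."""
    W = max(abs(w) for w in weight.values())
    bound = (len(vertices) - 1) * W
    Y = {v: -INF for v in vertices}
    while True:  # external loop: one more copy of the game per iteration
        Y_pre = dict(Y)
        # Cost of jumping to the target through v in the current copy.
        exit_cost = {v: max(0, Y[v]) for v in vertices}
        X = {v: INF for v in vertices}
        while True:  # internal loop: value iteration inside the current copy
            X_pre = dict(X)
            for v in vertices:
                options = [weight[(v, u)] + min(X_pre[u], exit_cost[u])
                           for u in successors[v]]
                X[v] = min(options) if v in min_vertices else max(options)
                if X[v] < -bound:
                    X[v] = -INF
            if X == X_pre:
                break
        Y = {v: (INF if X[v] > bound else X[v]) for v in vertices}
        if Y == Y_pre:
            return Y

# Hypothetical game: Min chooses in a between two zero-weight sinks b and c.
vertices = ["a", "b", "c"]
successors = {"a": ["b", "c"], "b": ["b"], "c": ["c"]}
weight = {("a", "b"): 3, ("a", "c"): 2, ("b", "b"): 0, ("c", "c"): 0}
values = total_payoff_values(vertices, {"a"}, successors, weight)
```

On this small game the sketch returns value 2 for `a` and 0 for the two sinks. On a single \({\mathsf {Min}}\) vertex with a self loop of weight \(-1\) it returns \(-\infty \), and on a single \({\mathsf {Max}}\) vertex with a self loop of weight 1 it returns \(+\infty \).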
Theorem 27
If a total-payoff game \(\mathcal {G}=\langle V,E,\omega ,{\mathbf{TP}}\rangle \) is given as input, Algorithm 2 outputs the vector \(\mathsf {Val}_\mathcal {G}\) of optimal values, after at most \(K=|V| (2 (|V|-1) W+1)\) iterations of the external loop. The complexity of the algorithm is \(O(|V|^4 |E| W^2)\).
Notice the absence of exterior vertices \(({\texttt {ex}},v^{\prime },j)\) in game \(\mathcal {G}_Y\), replaced by the computation of the maximum between 0 and \(X(v^{\prime })\) on the edge towards the target. Before proving the correctness of Algorithm 2, we prove several interesting properties of operator \(\mathcal {H}\).
Proposition 28
\(\mathcal {H}\) is a monotonic operator.
Proof
Consider then the vector \(X_0\) defined by \(X_0(v_1)=+\infty \) for all \(v_1\in V^{\prime }\). Given \(Y\preccurlyeq Y^{\prime }\), from (6), we have that \(\mathcal {F}_{Y}(X_0)\preccurlyeq \mathcal {F}_{Y^{\prime }}(X_0)\), then a simple induction shows that for all i, \(\mathcal {F}^i_{Y}(X_0)\preccurlyeq \mathcal {F}^i_{Y^{\prime }}(X_0)\). Thus, since \(\mathsf {Val}_{\mathcal {G}_Y}\) (respectively, \(\mathsf {Val}_{\mathcal {G}_{Y^{\prime }}}\)) is the greatest fixed point of \(\mathcal {F}_{Y}\) [respectively, \(\mathcal {F}_{Y^{\prime }}\)], we have \(\mathsf {Val}_{\mathcal {G}_Y}\preccurlyeq \mathsf {Val}_{\mathcal {G}_{Y^{\prime }}}\). As a consequence \(\mathcal {H}(Y)\preccurlyeq \mathcal {H}(Y^{\prime })\). \(\square \)
Example 29
Recall that, in our setting, a Scott-continuous operator is a mapping \(F: \mathbb {Z}_{\infty }^V\rightarrow \mathbb {Z}_{\infty }^V\) such that for every sequence of vectors \((x_i)_{i\geqslant 0}\) having a limit \(x_{\omega }\), the sequence \((F(x_i))_{i\geqslant 0}\) has a limit equal to \(F(x_{\omega })\).
We present a total-payoff game whose associated operator \(\mathcal {H}\) is not continuous. Let \(\mathcal {G}\) be the total-payoff game containing one vertex v of \({\mathsf {Min}}\) and a self loop of weight \(-1\) (as depicted in Fig. 6). For all \(Y\in \mathbb {Z}\), in the MCR game \(\mathcal {G}_Y\), v has value \(-\infty \): indeed, one can take the loop an arbitrary number of times before reaching the target, ensuring an arbitrarily low value. Therefore, if we take an increasing sequence \((Y_i)_{i\geqslant 0}\) of integers, \(\mathcal {H}(Y_i)(v)=-\infty \) for all i, thus the limit of the sequence \((\mathcal {H}(Y_i))_{i\geqslant 0}\) is \(-\infty \). However, the limit of the sequence \((Y_i)_{i\geqslant 0}\) is \(+\infty \) and \(\mathcal {H}(+\infty )(v)=+\infty \), since the target is not reachable anymore (in case the weight of an edge would be \(+\infty \), it is removed in the definition of \(E_Y\)). Thus, \(\mathcal {H}\) is not Scott-continuous. \(\square \)
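The discontinuity can also be observed numerically. The Python fragment below is a sketch specialised to this one-vertex game. The function name `H_single_min_loop` is ours, and we assume that \(\mathcal {H}(Y)(v)\) is obtained by iterating \(X \mapsto -1+\min (X,\max (0,Y))\) from \(X=+\infty \), replacing any value below \(-(|V|-1)W=0\) by \(-\infty \). An exit cost of \(+\infty \) plays the role of the removed edge to the target.

```python
import math

INF = math.inf

def H_single_min_loop(Y, loop_weight=-1):
    """H(Y)(v) for one Min vertex with a self loop (|V| = 1, so the threshold is 0)."""
    exit_cost = max(0, Y)  # cost of leaving to the target; +inf means no such edge
    X = INF
    while True:
        X_pre = X
        X = loop_weight + min(X_pre, exit_cost)
        if X < 0:
            X = -INF  # Min can cycle on the negative loop as long as desired
        if X == X_pre:
            return X

finite_images = [H_single_min_loop(Y) for Y in range(0, 50, 10)]
limit_image = H_single_min_loop(INF)
```

Every finite input is mapped to \(-\infty \), whereas the input \(+\infty \) is mapped to \(+\infty \), which is the gap exhibited in the example.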
In particular, we may not use the Kleene sequence, as we have done for MCR games, to establish the correctness of our algorithm. Nevertheless, we will show that the sequence \((Y^j)_{j\geqslant 0}\) indeed converges towards the vector of values of the total-payoff game. We first show that this vector is a pre-fixed point of \(\mathcal {H}\), starting with a technical lemma that is useful in the subsequent proof.
Lemma 30
Proof
Lemma 31
\(\mathsf{{Val}}_\mathcal {G}\) is a pre-fixed point of \(\mathcal {H}\), i.e. \(\mathcal {H}(\mathsf{{Val}}_\mathcal {G}) \preccurlyeq \mathsf{{Val}}_\mathcal {G}\).
Proof
- If there exists \(i\geqslant 2\) such that \(Y^\star (v_i)=\mathsf{{Val}}_\mathcal {G}(v_i)\geqslant 0\), then
$$\begin{aligned} {\mathbf{TP}}(v_1 ({\texttt {in}},v_2) \ldots ({\texttt {in}},v_i) {\texttt {t}})&= {\mathbf{TP}}( v_1 ({\texttt {in}},v_2) \ldots ({\texttt {in}},v_i)) + \omega _X(({\texttt {in}},v_i), {\texttt {t}})\\&= {\mathbf{TP}}(v_1\ldots v_i) + \max \left( 0,Y^\star (v_i)\right) \\&= {\mathbf{TP}}(v_1\ldots v_i) +Y^\star (v_i) \\&\leqslant \mathsf{{Val}}(v_1,\sigma _{{\mathsf {Min}}}) \qquad \qquad \text {(from Lemma~30)}\\&\leqslant m \end{aligned}$$
which raises a contradiction.
- If for all \(i\geqslant 2\), \(Y^\star (v_i)=\mathsf{{Val}}_\mathcal {G}(v_i)<0\) then for all \(i\geqslant 2\),
$$\begin{aligned} {\mathbf{TP}}(v_1 ({\texttt {in}},v_2) \ldots ({\texttt {in}},v_i) {\texttt {t}}) = {\mathbf{TP}}(v_1 ({\texttt {in}},v_2) \ldots ({\texttt {in}},v_i))={\mathbf{TP}}(v_1\ldots v_i)>m. \end{aligned}$$
Thus \({\mathbf{TP}}(v_1 v_2 \ldots )>m\), which contradicts the fact that \(v_1v_2\ldots \) conforms to \(\sigma _{{\mathsf {Min}}}\) and \(\mathsf{{Val}}_\mathcal {G}(v_1,\sigma _{{\mathsf {Min}}})\leqslant m\).
Remark 32
Before continuing the proof of Theorem 27, we show the result used in the previous remark.
Lemma 33
- (i) For every finite play \(v_1 \ldots v_k\) conforming to \(\sigma _{{\mathsf {Max}}}\) starting in \(v_1=v\), if there exists \(i<j\) such that \(v_i=v_j\) then \({\mathbf{TP}}(v_i\ldots v_j) \geqslant 1\).
- (ii) For every \(m\in \mathbb {N}\), \(k\geqslant m|V|+1\) and \(v_1 \ldots v_k\) a finite play conforming to \(\sigma _{{\mathsf {Max}}}\) and starting in \(v_1=v\), \({\mathbf{TP}}( v_1 \ldots v_k) \geqslant m-(|V|-1)W\).
- (iii) For all \(m\in \mathbb {N}\) and \(k\geqslant (m+(|V|-1)W) |V|+1\), \(\mathsf{{Val}}_{\mathcal {G}^{k}}(v,k)\geqslant m\).
- (iv) \(\lim _{j\rightarrow \infty } \mathsf{{Val}}_{\mathcal {G}^{j}}(v,j)= +\infty \).
Proof
We prove (i) by contradiction. Therefore, assume that \({\mathbf{TP}}(v_i\ldots v_j) \leqslant 0\), then \(\pi =v_1 \ldots v_{i-1} (v_i \ldots v_{j-1})^\omega \) conforms to \(\sigma _{{\mathsf {Max}}}\) and \({\mathbf{TP}}(\pi )\leqslant {\mathbf{TP}}(v_1 \ldots v_{i-1})<+\infty \), which contradicts the fact that \(\mathsf{{Val}}_\mathcal {G}(v,\sigma _{{\mathsf {Max}}})=+\infty \).
To prove (iii), let \(\sigma _{{\mathsf {Max}}}^{\prime }\) be a strategy of \({\mathsf {Max}}\) in \(\mathcal {G}^k\) defined by \(\sigma _{{\mathsf {Max}}}^{\prime }(v,j)=({\texttt {in}},\sigma _{{\mathsf {Max}}}(v),j) \) and \(\sigma _{{\mathsf {Max}}}^{\prime }({\texttt {ex}},v,j)= (v,j-1)\) for all \(v\in V\) and \(j\geqslant k\). Let \(\pi \) be a play starting in (v, k) and conforming to \(\sigma _{{\mathsf {Max}}}^{\prime }\). If \(\pi \) does not reach \({\texttt {t}}\), then \(\mathbf {MCR}(\pi )=+\infty \geqslant m\). If \(\pi \) reaches the target then \(\mathsf {proj}(\pi )\) is of the form \(v_1\ldots v_\ell {\texttt {t}}^\omega \), with \(\mathbf {MCR}(\pi )={\mathbf{TP}}(v_1\ldots v_\ell )\). It is clear by construction of \(\sigma _{{\mathsf {Max}}}^{\prime }\) that \(v_1\ldots v_\ell \) is a finite play of \(\mathcal {G}\) that conforms to \(\sigma _{{\mathsf {Max}}}\). Furthermore, \(\ell \geqslant k\geqslant (m+(|V|-1)W) |V|+1\) thus, from (ii), we have that \({\mathbf{TP}}(v_1\ldots v_\ell )\geqslant m\). This implies \(\mathbf {MCR}(\pi )\geqslant m\). Hence, every play in \(\mathcal {G}^k\) conforming to \(\sigma _{{\mathsf {Max}}}^{\prime }\) and starting in (v, k) has a value at least m, which means that \(\mathsf{{Val}}_{\mathcal {G}^{k}}(v,k)\geqslant m\).
Item (iv) is then a direct consequence of (iii). \(\square \)
We are now ready to state and prove the inductive invariant allowing us to show the correctness of Algorithm 2.
Lemma 34
Before the jth iteration of the external loop of Algorithm 2, we have \(\mathsf {Val}_{\mathcal {G}^j}(v,j) \leqslant Y^j(v) \leqslant \mathsf {Val}_\mathcal {G}(v)\) for all vertices \(v\in V\).
Proof
We are now able to prove the correctness and termination of the algorithm.
Proof of Theorem 27
Hence, \(K=|V| (2 (|V|-1) W +1)\) is an upper bound on the number of iterations before convergence of Algorithm 2, and moreover, at the convergence, the algorithm outputs the vector of optimal values of the total-payoff game. \(\square \)
4.3 Optimal strategies
In Sect. 3, we have shown, for all MCR games, the existence of a fake-optimal NC-strategy permitting to reconstruct an optimal finite-memory strategy for \({\mathsf {Min}}\) (if every vertex has value different from \(-\infty \), or a strategy ensuring every possible threshold for vertices with value \(-\infty \)). Given a total-payoff game \(\mathcal {G}\), if we apply this construction to the game \(\mathcal {G}_{\mathsf {Val}_\mathcal {G}}\), we obtain an NC-strategy \(\sigma _{{\mathsf {Min}}}^\star \). Consider the strategy \(\overline{\sigma _{{\mathsf {Min}}}}\), obtained by projecting \(\sigma _{{\mathsf {Min}}}^\star \) on \(V\) as follows: for all finite plays \(\pi \) and vertices \(v\in V_{{\mathsf {Min}}}\), we let \(\overline{\sigma _{{\mathsf {Min}}}}(\pi v) = v^{\prime }\) if \(\sigma _{{\mathsf {Min}}}^\star (v)=({\texttt {in}},v^{\prime })\). We show thereafter that \(\overline{\sigma _{{\mathsf {Min}}}}\) is optimal for \({\mathsf {Min}}\) in \(\mathcal {G}\). Notice that \(\sigma _{{\mathsf {Min}}}^\star \), and hence \(\overline{\sigma _{{\mathsf {Min}}}}\), can be computed during the last iteration of the value iteration algorithm, as explained in the case of MCR games in Sect. 3.4. A similar construction can be done to compute an optimal strategy for \({\mathsf {Max}}\).
Theorem 35
The memoryless strategy \(\overline{\sigma _{{\mathsf {Min}}}}\) is optimal in \(\mathcal {G}\).
Proof
(2) for all plays \(\pi = v_0v_1 \ldots \) in \(\mathcal {G}\) that conform with \(\overline{\sigma _{{\mathsf {Min}}}}\) and such that \(\mathsf {Val}_\mathcal {G}(v_0)<+\infty \), either \({\mathbf{TP}}(\pi ) = -\infty \) or there exists \(i_{\pi }>0\) such that \(\sigma _{{\mathsf {Min}}}^\star ({\texttt {in}}, v_{i_\pi })={\texttt {t}}\) and \({\mathbf{TP}}(v_0\ldots v_{i_{\pi }})\leqslant \mathsf{{Val}}_\mathcal {G}(v_0) - \mathsf{{Val}}_\mathcal {G}(v_{i_{\pi }})\).
Assume first that during \(\pi \), \({\mathsf {Min}}\) never asks to go to the target, i.e. for all \(i>0\), \(\sigma _{{\mathsf {Min}}}^\star ({\texttt {in}},v_i) = v_i\). Then the play \(v_0 ({\texttt {in}},v_1) v_1 ({\texttt {in}}, v_2) \ldots \) is a play of \(\mathcal {G}_{\mathsf {Val}_\mathcal {G}}\) that conforms with \(\sigma _{{\mathsf {Min}}}^\star \). As there are only finitely many vertices in \(\mathcal {G}_{\mathsf {Val}_\mathcal {G}}\), there must exist a vertex v that appears infinitely often in this play. As \(\sigma _{{\mathsf {Min}}}^\star \) is an NC-strategy, the accumulated cost of the chunk of the play between two occurrences of v has weight at most \(-1\), thus the total payoff of \(v_0 ({\texttt {in}},v_1) v_1 ({\texttt {in}}, v_2) \ldots \) is \(-\infty \). As the total-payoff of this play is equal to the total payoff of \(\pi \), we have that \({\mathbf{TP}}(\pi ) = -\infty \).
Otherwise, \({\mathsf {Min}}\) asks at some point to go to the target: let \(i_{\pi }\) be the first index such that \(\sigma _{{\mathsf {Min}}}^\star ({\texttt {in}},v_{i_{\pi }})={\texttt {t}}\). As the strategy \(\sigma _{{\mathsf {Min}}}^\star \) is a fake-optimal NC-strategy, we know that the accumulated cost of \(\pi \) until the target verifies \({\mathbf{TP}}(\pi ) = \mathbf {MCR}(\pi ) \leqslant \mathsf {Val}_{\mathcal {G}_{\mathsf {Val}_\mathcal {G}}}(v_0)\). As \(\mathsf {Val}_\mathcal {G}\) is a fixed point of \(\mathcal {H}\) (see Remark 32), we have that for all v, \(\mathsf {Val}_{\mathcal {G}_{\mathsf {Val}_\mathcal {G}}}(v) = \mathcal {H}(\mathsf {Val}_\mathcal {G})(v)=\mathsf {Val}_\mathcal {G}(v)\), thus \({\mathbf{TP}}(\pi )\leqslant \mathsf {Val}_\mathcal {G}(v_0)\). We have \({\mathbf{TP}}(v_0 \ldots v_{i_\pi }) = {\mathbf{TP}}(v_0 ({\texttt {in}},v_1) v_1 \ldots v_{i_{\pi }-1} ({\texttt {in}},v_{i_\pi })) = {\mathbf{TP}}(\pi ) - \omega (({\texttt {in}},v_{i_\pi }), {\texttt {t}})\). From (1), we have \(\omega (({\texttt {in}},v_{i_\pi }), {\texttt {t}}) = \mathsf {Val}_\mathcal {G}(v_{i_\pi })\), thus \({\mathbf{TP}}(v_0 \ldots v_{i_\pi }) \leqslant \mathsf {Val}_\mathcal {G}(v_0) - \mathsf {Val}_\mathcal {G}(v_{i_\pi })\), which proves (2).
Now let us prove the theorem. Let v be a vertex in \(\mathcal {G}\). If \(\mathsf {Val}_\mathcal {G}(v) = +\infty \) then trivially \(\mathsf {Val}_\mathcal {G}(v,\overline{\sigma _{{\mathsf {Min}}}})\leqslant +\infty =\mathsf {Val}_\mathcal {G}(v)\). Otherwise let \(\pi = v_0 v_1\ldots \) be a play in \(\mathcal {G}\) that conforms with \(\overline{\sigma _{{\mathsf {Min}}}}\) such that \(v=v_0\). If \({\mathbf{TP}}(\pi ) = -\infty \) then \({\mathbf{TP}}(\pi ) \leqslant \mathsf {Val}_\mathcal {G}(v)\). Otherwise we construct inductively an increasing sequence of indices \(i_0, i_1, \ldots \) such that \({\mathbf{TP}}(v_{i_j}\ldots v_{i_{j+1}})\leqslant \mathsf {Val}_\mathcal {G}(v_{i_j})- \mathsf {Val}_\mathcal {G}(v_{i_{j+1}})\) (for all j) as follows. First, we let \(i_0=0\). Then, for all j, let \(\pi ^{\prime } = v_{i_j} v_{i_j +1} \ldots \) be the current suffix of \(\pi \): since \({\mathbf{TP}}(\pi )\ne -\infty \), we have \({\mathbf{TP}}(\pi ^{\prime })\ne -\infty \), thus (2) shows that by letting \(i_{j+1} = i_{j}+i_{\pi ^{\prime }}\), we obtain \({\mathbf{TP}}(v_{i_j}\ldots v_{i_{j+1}})\leqslant \mathsf {Val}_\mathcal {G}(v_{i_j})- \mathsf {Val}_\mathcal {G}(v_{i_{j+1}})\), and we know that \(\sigma _{{\mathsf {Min}}}^\star ({\texttt {in}}, v_{i_{j+1}}) = {\texttt {t}}\).
5 Implementation and heuristics
Table 1 Results of value iteration on a parametric example (t: execution time; \(k_e\), \(k_i\): number of iterations of the external and internal loops)
W | n | Without heuristics: t (s) | Without heuristics: \(k_e\) | Without heuristics: \(k_i\) | With heuristics: t (s) | With heuristics: \(k_e\) | With heuristics: \(k_i\)
---|---|---|---|---|---|---|---
50 | 100 | 0.52 | 151 | 12,603 | 0.01 | 402 | 1404
50 | 500 | 9.83 | 551 | 53,003 | 0.42 | 2002 | 7004
200 | 100 | 2.96 | 301 | 80,103 | 0.02 | 402 | 1404
200 | 500 | 45.64 | 701 | 240,503 | 0.47 | 2002 | 7004
500 | 1000 | 536 | 1501 | 1,251,003 | 2.37 | 4002 | 14,004
Notice that, since the algorithm consumes very little memory, there is no risk of running out of memory. However, the execution time can become very large. For instance, in case \(W=500\) and \(n=1000\), the execution time reaches 536 s and the total number of iterations in the internal loop exceeds one million.
5.1 Acceleration techniques
We close this section by sketching two techniques that can be used to speed up the computation of the fixed point in Algorithms 1 and 2. We fix a weighted graph \(\langle V,E,\omega \rangle \). Both accelerations rely on a topological order of the strongly connected components (SCC for short) of the graph, given as a function \(\mathsf {c}:V\rightarrow \mathbb {N}\), mapping each vertex to its component, verifying that (i) \(\mathsf {c}(V)=\{0,\ldots ,p\}\) for some \(p\geqslant 0\), (ii) \(\mathsf {c}^{-1}(q)\) is a maximal SCC for all q, and (iii) \(\mathsf {c}(v)\geqslant \mathsf {c}(v^{\prime })\) for all \((v,v^{\prime })\in E\).^{8}
In case of an MCR game with \({\texttt {t}}\) the unique target, \(\mathsf {c}^{-1}(0) = \{{\texttt {t}}\}\). Intuitively, \(\mathsf {c}\) induces a directed acyclic graph whose vertices are the sets \(\mathsf {c}^{-1}(q)\) for all \(q\in \mathsf {c}(V)\), and with an edge \((S_1,S_2)\) if and only if there are \(v_1\in S_1, v_2\in S_2\) such that \((v_1,v_2)\in E\).
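One possible way to obtain such a function \(\mathsf {c}\) is Tarjan's algorithm, which emits the strongly connected components so that a component is completed only after every component reachable from it. Numbering the components in emission order therefore gives \(\mathsf {c}(v)\geqslant \mathsf {c}(v^{\prime })\) for every edge \((v,v^{\prime })\). The Python fragment below is a sketch under that choice. The function name `scc_order` and the example graph are ours, the text does not prescribe how \(\mathsf {c}\) is computed, and the recursive formulation is only suitable for small graphs.

```python
def scc_order(vertices, successors):
    """Map each vertex to the index of its SCC, sink components first (Tarjan)."""
    index, low, on_stack, stack = {}, {}, set(), []
    comp = {}
    next_comp = 0

    def visit(v):
        nonlocal next_comp
        index[v] = low[v] = len(index)  # discovery number of v
        stack.append(v)
        on_stack.add(v)
        for u in successors[v]:
            if u not in index:
                visit(u)
                low[v] = min(low[v], low[u])
            elif u in on_stack:
                low[v] = min(low[v], index[u])
        if low[v] == index[v]:  # v is the root of a component: pop it
            while True:
                u = stack.pop()
                on_stack.discard(u)
                comp[u] = next_comp
                if u == v:
                    break
            next_comp += 1

    for v in vertices:
        if v not in index:
            visit(v)
    return comp

# Hypothetical MCR graph with target t carrying a self loop.
successors = {"a": ["b"], "b": ["c"], "c": ["b", "t"], "t": ["t"]}
comp = scc_order(["a", "b", "c", "t"], successors)
```

On this graph the target forms component 0, the cycle between `b` and `c` forms component 1, and `a` forms component 2.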
The first acceleration heuristic is a divide-and-conquer technique that consists in applying Algorithm 1 (or the inner loop of Algorithm 2) iteratively on each \(\mathsf {c}^{-1}(q)\) for \(q=0,1,2,\ldots ,p\), using at each step the information computed during steps \(j<q\) (since the value of a vertex v depends only on the values of the vertices \(v^{\prime }\) such that \(\mathsf {c}(v^{\prime })\leqslant \mathsf {c}(v)\)).
The second acceleration heuristic consists in studying more precisely each component \(\mathsf {c}^{-1}(q)\). Having already computed the optimal values \(\mathsf {Val}(v)\) of vertices \(v\in \mathsf {c}^{-1}(\{0,\ldots ,q-1\})\), we ask an oracle to precompute a finite set \(S_v\subseteq \mathbb {Z}_{\infty }\) of possible optimal values for each vertex \(v\in \mathsf {c}^{-1}(q)\). For MCR games and the inner iteration of the algorithm for total-payoff games, one way to construct such a set \(S_v\) is to consider that the possible optimal values are those of non-looping paths inside the component that exit it, since, in MCR games, there exist optimal strategies for both players whose outcome is a non-looping path (see Sect. 3).
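A naive oracle of this kind can be sketched by enumerating the non-looping paths that start in \(v\), stay in the component, and leave it through an edge towards a vertex whose value is already known. This brute-force enumeration is exponential in the worst case and only illustrates the construction. The function name `candidate_values` and the data layout are assumptions made for the example, and the sketch does not add \(\pm \infty \) to the set.

```python
from typing import Dict, Hashable, List, Set, Tuple

Vertex = Hashable


def candidate_values(v: Vertex,
                     comp: Set[Vertex],
                     succ: Dict[Vertex, List[Tuple[int, Vertex]]],
                     val: Dict[Vertex, float]) -> Set[float]:
    """Weights of simple paths from v that exit comp, plus the exit value.

    Hypothetical brute-force oracle: val holds the values of the vertices
    of the components that have already been solved.
    """
    found: Set[float] = set()

    def explore(u: Vertex, total: float, visited: frozenset) -> None:
        for weight, nxt in succ[u]:
            if nxt in val:
                # The edge leaves the component: the path is complete.
                found.add(total + weight + val[nxt])
            elif nxt in comp and nxt not in visited:
                explore(nxt, total + weight, visited | {nxt})

    explore(v, 0, frozenset({v}))
    return found


# Component {a, b} of a toy game whose target t has value 0.
toy_succ = {"a": [(5, "t"), (1, "b")], "b": [(0, "a"), (2, "t")]}
s_a = candidate_values("a", {"a", "b"}, toy_succ, {"t": 0})
s_b = candidate_values("b", {"a", "b"}, toy_succ, {"t": 0})
```

Here the two sets are \(\{3,5\}\) for `a` and \(\{2,5\}\) for `b`, so the fixed-point computation in this component only needs to consider these few candidates.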
Finally, we note that we can identify classes of weighted graphs for which there exists an oracle that runs in polynomial time and returns, for all vertices v, a set \(S_v\) of polynomial size. On such classes, Algorithms 1 and 2, enhanced with our two acceleration techniques, run in polynomial time. For instance, for every fixed positive integer L, the class of weighted graphs where every component \(\mathsf {c}^{-1}(q)\) uses at most L distinct weights (that can be arbitrarily large in absolute value) satisfies this criterion. Table 1 contains the results obtained with the heuristics on the parametric example presented before. Observe that the acceleration techniques drastically decrease the execution time here: the number of iterations in both loops no longer even depends on W. Even though the number of iterations in the external loop increases with the heuristics, due to the decomposition, less computation is required in each internal loop since we only perform the computation for the active component.
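The polynomial bound for components with at most \(L\) distinct weights can be made plausible by a short count. This is only a sketch under two assumptions: \(S_v\) is built from non-looping paths as described for MCR games, and the weights of the edges leaving the component are counted among the \(L\) weights \(w_1,\ldots ,w_L\). Write \(k=|\mathsf {c}^{-1}(q)|\). A non-looping path that starts in the component and exits it uses at most \(k\) edges, so its weight has the form \(\sum _{i=1}^{L} a_i w_i\) with non-negative integers \(a_i\) such that \(\sum _{i=1}^{L} a_i\leqslant k\). There are \(\binom{k+L}{L}\) such tuples \((a_1,\ldots ,a_L)\). Adding the already computed value \(\mathsf {Val}(v^{\prime })\) of the vertex \(v^{\prime }\) where the path exits, the number of finite candidates satisfies

\[ |S_v\cap \mathbb {Z}| \;\leqslant \; |V|\cdot \binom{k+L}{L} \;\leqslant \; |V|\cdot (k+1)^{L}, \]

which is polynomial in the size of the graph for fixed \(L\), however large the weights are in absolute value.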
6 Conclusion
In this work, we have provided the first (to the best of our knowledge) pseudo-polynomial time algorithm to solve total-payoff games with arbitrary (positive and negative) weights. This algorithm is a variation on the classical value iteration technique. To obtain it, we have reduced the problem of solving total-payoff games to that of solving MCR games, a variant of the former where the game stops as soon as a target vertex is reached (in which case the payoff of the play is the total weight accumulated up to the target). We believe that MCR games are interesting in their own right, as they can be used to model problems where, for instance, a target configuration must be reached while ensuring minimal energy consumption. Notice also that they have been used as a building block for solving priced timed games in [3, 4]. We have characterised the optimal strategies that one can extract in total-payoff and MCR games. Finally, we have implemented our algorithms and proposed some heuristics that exploit the structure of the games to speed up the computation.
As future work, we would like to study MCR games further in the context of non-zero-sum games, where each player wants to optimise its own accumulated cost until reaching its own target. As a possible direction, the search for Nash equilibria in this context will most likely benefit from our better understanding of optimal strategies for both players in the underlying zero-sum games. This bridge from zero-sum to non-zero-sum games has already been investigated by Klimoš et al. [14] for concurrent priced games, and by Brihaye et al. [2] to find simple Nash equilibria for large classes of multiplayer cost games.
Footnotes
- 1.
Note that those games are different from total-reward games as studied in [20].
- 2.
An example of practical application would be to perform controller synthesis taking into account energy consumption. On the other hand, the problem of computing the values in certain classes of priced timed games has recently been reduced to computing the values in MCR games [3].
- 3.
Our results can easily be extended by substituting a \(\limsup \) for the \(\liminf \). The \(\liminf \) is more natural since we adopt the point of view of the maximiser \({\mathsf {Max}}\), where the \(\liminf \) is the worst partial sum seen infinitely often.
- 4.
- 5.
This is not needed in the proof, but notice that \({\mathsf {Min}}\) necessarily modifies its strategy here, i.e. owns at least one vertex of the cycle. Otherwise, the value in the MCR game \(\mathcal {G}\) would not be \(-\infty \).
- 6.
It suffices to add a vertex of the opponent in-between two vertices of the same player related by a transition in \(\mathcal {G}\): in this vertex, the opponent has no choice but to follow the transition chosen by the first player.
- 7.
Source and binary files, as well as some examples, can be downloaded from http://www.ulb.ac.be/di/verif/monmege/tool/TP-MCR/.
- 8.
Such a mapping is computable in linear time, e.g., by Tarjan’s algorithm [18].
- 9.
We believe that this difference would certainly be eliminated by our preprocessing of vertices of value \(+\infty \) presented in the first item of Theorem 3.
References
- 1. Björklund, H., Vorobyov, S.: A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discret. Appl. Math. 155, 210–229 (2007)
- 2. Brihaye, T., De Pril, J., Schewe, S.: Multiplayer cost games with simple Nash equilibria. In: Proceedings of the International Symposium on Logical Foundations of Computer Science (LFCS’13). Lecture Notes in Computer Science, vol. 7734, pp. 59–73. Springer, Berlin (2013)
- 3. Brihaye, T., Geeraerts, G., Krishna, S.N., Manasa, L., Monmege, B., Trivedi, A.: Adding negative prices to priced timed games. In: Proceedings of the 25th International Conference on Concurrency Theory (CONCUR’14). Lecture Notes in Computer Science, vol. 8704, pp. 560–575. Springer, Berlin (2014). doi:10.1007/978-3-662-44584-63_8
- 4. Brihaye, T., Geeraerts, G., Haddad, A., Lefaucheux, E., Monmege, B.: Simple priced timed games are not that simple. In: Proceedings of the 35th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS’15), Schloss Dagstuhl–Leibniz-Zentrum für Informatik, LIPIcs (2015)
- 5. Brihaye, T., Geeraerts, G., Haddad, A., Monmege, B.: To reach or not to reach? Efficient algorithms for total-payoff games. In: Proceedings of the 26th International Conference on Concurrency Theory (CONCUR ’15), Schloss Dagstuhl–Leibniz-Zentrum für Informatik, LIPIcs, vol. 42, pp. 297–310 (2015)
- 6. Brim, L., Chaloupka, J., Doyen, L., Gentilini, R., Raskin, J.F.: Faster algorithms for mean-payoff games. Form. Methods Syst. Des. 38(2), 97–118 (2011)
- 7. Chen, T., Forejt, V., Kwiatkowska, M., Parker, D., Simaitis, A.: Automatic verification of competitive stochastic systems. Form. Methods Syst. Des. 43(1), 61–92 (2013)
- 8. Comin, C., Rizzi, R.: Improved Pseudo-polynomial bound for the value problem and optimal strategy synthesis in mean payoff games. Algorithmica (2016). doi:10.1007/s00453-016-0123-1
- 9. Ehrenfeucht, A., Mycielski, J.: Positional strategies for mean payoff games. Int. J. Game Theory 8(2), 109–113 (1979)
- 10. Filiot, E., Gentilini, R., Raskin, J.F.: Quantitative languages defined by functional automata. In: Proceedings of the 23rd International Conference on Concurrency theory (CONCUR ’12). Lecture Notes in Computer Science, vol. 7454, pp. 132–146. Springer, Berlin (2012)
- 11. Gawlitza, T.M., Seidl, H.: Games through nested fixpoints. In: Proceedings of the 21st International Conference on Computer Aided Verification (CAV ’09). Lecture Notes in Computer Science, vol. 5643, pp. 291–305. Springer, Berlin (2009)
- 12. Gimbert, H., Zielonka, W.: When can you play positionally? In: Proceedings of the 29th International Conference on Mathematical Foundations of Computer Science (MFCS ’04). Lecture Notes in Computer Science, vol. 3153, pp. 686–698. Springer, Berlin (2004)
- 13. Khachiyan, L., Boros, E., Borys, K., Elbassioni, K., Gurvich, V., Rudolf, G., Zhao, J.: On short paths interdiction problems: total and node-wise limited interdiction. Theory Comput. Syst. 43, 204–233 (2008)
- 14. Klimoš, M., Larsen, K.G., Štefaňák, F., Thaarup, J.: Nash equilibria in concurrent priced games. In: Proceedings of the 6th International Conference on Language and Automata Theory and Applications (LATA’12). Lecture Notes in Computer Science, vol. 7183, pp. 363–376. Springer, Berlin (2012)
- 15. Martin, D.A.: Borel determinacy. Ann. Math. 102(2), 363–371 (1975)
- 16. Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)
- 17. Strauch, R.E.: Negative dynamic programming. Ann. Math. Stat. 37, 871–890 (1966)
- 18. Tarjan, R.E.: Depth first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972)
- 19. Thomas, W.: On the synthesis of strategies in infinite games. In: Symposium on Theoretical Aspects of Computer Science (STACS ’95). Lecture Notes in Computer Science, vol. 900, pp. 1–13. Springer, Berlin (1995)
- 20. Thuijsman, F., Vrieze, O.J.: The bad match: a total reward stochastic game. Oper. Res. Spektrum 9(2), 93–99 (1987)
- 21. Zwick, U., Paterson, M.S.: The complexity of mean payoff games. Theor. Comput. Sci. 158, 343–359 (1996)