Abstract
We consider two-player zero-sum stochastic mean payoff games with perfect information. We show that any such game, with a constant number of random positions and polynomially bounded positive transition probabilities, admits a polynomial time approximation scheme, both in the relative and absolute sense.
1 Introduction
The rise of the Internet has led to an explosion in research in game theory, the mathematical modeling of competing agents in strategic situations. The central concept in such models is that of a Nash equilibrium, which defines a state where no agent gains an advantage by changing to another strategy. Nash equilibria serve as predictions for the outcome of strategic situations in which selfish agents compete.
A fundamental result in game theory states that if the agents can choose a mixed strategy (i.e., a probability distribution over deterministic strategies), a Nash equilibrium is guaranteed to exist in finite games [24, 25]. In many cases, however, a Nash equilibrium already exists in pure (i.e., deterministic) strategies. Still, the existence of Nash equilibria might be irrelevant in practice if computing them takes too long (finding mixed Nash equilibria in two-player games is PPAD-complete in general [11]). Thus, algorithmic aspects of game theory have gained a lot of interest. Following the dogma that only polynomial time algorithms are feasible algorithms, it is desirable to show polynomial time complexity for the computation of Nash equilibria.
We consider two-player zero-sum stochastic mean payoff games with perfect information. In this case, the concept of a Nash equilibrium coincides with that of a saddle point, that is, a pair of mini–max/maxi–min strategies. The decision problem associated with computing such strategies and the values of these games is in the intersection of NP and co-NP, but it is unknown whether it can be solved in polynomial time. For cases where no efficient algorithm is known, an approximate notion of a saddle point has been suggested: in an approximate saddle point, no agent can gain a substantial advantage by changing to another strategy. In this paper, we design approximation schemes for saddle points (see Sect. 1.2 for a definition) of such games when the number of random positions is fixed.
In the remainder of this section, we introduce the concepts used in this paper. Our results are summarized in Sect. 1.4. After that, we present our approximation schemes (Sect. 2). We conclude with a list of open problems (Sect. 3), where we address in particular the question of polynomial smoothed complexity of mean payoff games. In the conference version of this paper [2], we wrongly claimed that stochastic mean payoff games can be solved in smoothed polynomial time.
1.1 Stochastic Mean Payoff Games
1.1.1 Definition and Notation
The model that we consider is a stochastic mean payoff game with perfect information, or equivalently a BWR-game \(\mathcal {G}= (G, P,r)\):
- \(G=(V,E)\) is a directed graph that may have loops and multiple edges, but no terminal positions, i.e., no positions of out-degree 0. The vertex set V of G is partitioned into three disjoint subsets \(V = V_B \cup V_W \cup V_R\) that correspond to black, white, and random positions, respectively. The edges stand for moves. The black and white positions are owned by two players: Black, the minimizer, owns the black positions in \(V_B\), and White, the maximizer, owns the white positions in \(V_W\). The positions in \(V_R\) are owned by nature.
- P is the vector of probability distributions for all positions \(v \in V_R\) owned by nature. We assume that \(\sum _{u: (v,u) \in E} p_{vu} = 1\) for all \(v \in V_R\) and \(p_{vu} > 0\) for all \(v \in V_R\) and \((v,u) \in E\).
- r is the vector of rewards; each edge e has a local reward \(r_e\).
Starting from some vertex \(v_0 \in V\), a token is moved along one edge e in every round of the game. If the token is on a black vertex, Black selects an outgoing edge e and moves the token along e. If the token is on a white vertex, then White selects an outgoing edge e. In a random position \(v \in V_R\), a move \(e= (v, u)\) is chosen according to the probabilities \(p_{v u}\) of the outgoing edges of v. In all cases, Black pays White the reward \(r_e\) on the selected edge e.
Starting from a given initial position \(v_0 \in V\), the game yields an infinite walk \((v_0, v_1, v_2, \ldots )\), called a play. Let \(b_i\) denote the reward \(r_{(v_{i-1},v_{i})}\) received by White in step i. The undiscounted limit average effective payoff is defined as the Cesàro average \(c=\liminf _{n\rightarrow \infty }\frac{\sum _{i=1}^n{{\mathrm{\mathbb {E}}}}[b_i]}{n}\). White’s objective is to maximize c, while the objective of Black is to minimize it.
In this paper, we will restrict our attention to the sets of pure (that is, non-randomized) and stationary (that is, history-independent) strategies of players White and Black, denoted by \(S_W\) and \(S_B\), respectively; such strategies are called positional strategies. Formally, a positional strategy \(s_W \in S_W\) for White is a mapping that assigns a move \((v,u) \in E\) to each position in \(V_W\). We sometimes abbreviate \(s_W(v)=(v,u)\) by \(s_W(v)=u\). Strategies \(s_B \in S_B\) for Black are defined analogously. A pair of strategies \(s = (s_W, s_B)\) is called a situation. By abuse of notation, we write \(s(v) = u\) if either \(v \in V_W\) and \(s_W(v) = u\), or \(v \in V_B\) and \(s_B(v) = u\).
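Given a BWR-game \(\mathcal {G}= (G, P, r)\) and a situation \(s = (s_W, s_B)\), we obtain a weighted Markov chain \(\mathcal {G}(s) = (G(s)=(V,E(s)),P(s), r)\) with transition matrix P(s) defined in the obvious way:
\[
p_{vu}(s)=\begin{cases} 1 & \text{if } v\in V_W\cup V_B \text{ and } u=s(v),\\ 0 & \text{if } v\in V_W\cup V_B \text{ and } u\ne s(v),\\ p_{vu} & \text{if } v\in V_R. \end{cases}
\]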
Here, \(E(s)=\{e=(v,u) \in E \mid p_{vu}(s)>0\}\) is the set of arcs with positive probability. Given an initial position \(v_0\in V\) from which the play starts, we define the limiting (mean) effective payoff \(c_{v_0}(s)\) in \(\mathcal {G}(s)\) as
\[
c_{v_0}(s)=\rho (s)^T r=\sum _{e\in E}\rho _e(s)\, r_e,
\]
where \(\rho (s)=\rho (s,v_0)\in [0,1]^E\) is the arc-limiting distribution for \(\mathcal {G}(s)\) starting from \(v_0\). This means that for \((v,u)\in E\), we have \(\rho _{vu}(s)=\pi _v(s)p_{vu}(s)\), where \(\pi \in [0,1]^V\) is the limiting distribution in the Markov chain \(\mathcal {G}(s)\) starting from \(v_0\). In what follows, we will use \((\mathcal {G},v_0)\) to denote the game starting from \(v_0\). We will simply write \(\rho (s)\) for \(\rho (s,v_0)\) if \(v_0\) is clear from the context. For rewards \(r: E \rightarrow \mathbb {R}\), let \(r^- = \min _{e} r_e\) and \(r^+ = \max _e r_e\). Let \([r] = [r^-,r^+]\) be the range of r. Let \(R=R(\mathcal {G}) =r^+-r^-\) be the size of the range.
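To make the definitions of \(\mathcal {G}(s)\) and \(c_{v_0}(s)\) concrete, the following Python sketch fixes a situation, builds the rows of the Markov chain \(\mathcal {G}(s)\), and approximates the effective payoff by a finite Cesàro average of the expected rewards \(\mathbb {E}[b_i]\). The dictionary encoding of the game and the function names `markov_chain` and `cesaro_payoff` are hypothetical, chosen only for illustration. The token distribution is propagated exactly; the only approximation is that the limit is replaced by a finite horizon `steps`. For a finite Markov chain the Cesàro averages converge, so this error vanishes as `steps` grows.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a BWR-game, chosen only for this sketch:
#   owner[v] in {"B", "W", "R"}     (black, white, random position)
#   moves[v] = [(u, r_vu), ...]     outgoing arcs with their rewards
#   probs[v] = [p_vu, ...]          aligned with moves[v], random positions only
Owner = Dict[str, str]
Moves = Dict[str, List[Tuple[str, float]]]
Probs = Dict[str, List[float]]
Chain = Dict[str, List[Tuple[str, float, float]]]


def markov_chain(owner: Owner, moves: Moves, probs: Probs,
                 situation: Dict[str, int]) -> Chain:
    """Rows of G(s): for each v, the arcs (u, p_vu(s), r_vu) with p_vu(s) > 0.

    situation[v] is the index into moves[v] chosen by the owner of a black
    or white position v; random positions keep their probabilities.
    """
    chain: Chain = {}
    for v, out in moves.items():
        if owner[v] == "R":
            chain[v] = [(u, p, r) for (u, r), p in zip(out, probs[v]) if p > 0]
        else:
            u, r = out[situation[v]]
            chain[v] = [(u, 1.0, r)]
    return chain


def cesaro_payoff(chain: Chain, v0: str, steps: int = 10_000) -> float:
    """Finite-horizon version of c_{v0}(s) = liminf (1/n) sum_{i<=n} E[b_i]."""
    dist = {v: 0.0 for v in chain}
    dist[v0] = 1.0
    total = 0.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in chain}
        for v, mass in dist.items():
            if mass == 0.0:
                continue
            for u, p, r in chain[v]:
                total += mass * p * r   # this arc's share of E[b_i]
                nxt[u] += mass * p      # probability mass moved to u
        dist = nxt
    return total / steps


# Example: from random position "x", move w.p. 1/2 each to a loop with reward 1 or 0.
owner = {"x": "R", "a": "W", "b": "B"}
moves = {"x": [("a", 0.0), ("b", 0.0)], "a": [("a", 1.0)], "b": [("b", 0.0)]}
chain = markov_chain(owner, moves, {"x": [0.5, 0.5]}, {"a": 0, "b": 0})
value = cesaro_payoff(chain, "x")   # about 0.49995 after 10_000 rounds; the limit is 0.5
```

The first round from `x` earns nothing, which is why the finite average sits slightly below the limit 1/2; on a deterministic two-cycle with rewards 1 and 3 the same function returns exactly 2.0.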
1.1.2 Strategies and Saddle Points
If we consider \(c_{v_0}(s)\) for all possible situations s, we obtain a matrix game \(C_{v_0} : S_W \times S_B \rightarrow \mathbb {R}\), with entries \(C_{v_0}(s_W,s_B) = c_{v_0}(s_W,s_B)\). It is known that every such game has a saddle point in pure strategies [19, 29]. Such a saddle point defines an equilibrium state in which no player has an incentive to switch to another strategy. The value at that state coincides with the limiting payoff in the corresponding BWR-game [19, 29].
We call a pair of strategies optimal if they correspond to a saddle point. It is well-known that there exist optimal strategies \((s^*_W,s^*_B)\) that do not depend on the starting position \(v_0\). Such strategies are called uniformly optimal. Of course there might be several optimal strategies, but they all lead to the same value. We define this to be the value of the game and write \(\mu _{v_0}(\mathcal {G}) = C_{v_0}(s^*_W,s_B^*)\), where \((s^*_W,s^*_B)\) is any pair of optimal strategies. Note that \(\mu _{v_0}(\mathcal {G})\) may depend on the starting node \(v_0\). Note also that for an arbitrary situation s, \(\mu _{v_0}(\mathcal {G}(s))\) denotes the effective payoff \(c_{v_0}(s)\) in the Markov chain \(\mathcal {G}(s)\).
An algorithm is said to solve the game if it computes an optimal pair of strategies.
1.2 Approximation and Approximate Equilibria
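Given a BWR-game \(\mathcal {G}=(G=(V,E),P,r)\), a constant \(\varepsilon >0\), and a starting position \(v\in V\), an \(\varepsilon \)-relative approximation of the value of the game is determined by a situation \((s_W^*,s_B^*)\) such that
\[
\begin{aligned}
\mu _v(\mathcal {G}(s_W^*,s_B)) &\ge (1-\varepsilon )\,\mu _v(\mathcal {G}) &&\text{for all } s_B\in S_B,\\
\mu _v(\mathcal {G}(s_W,s_B^*)) &\le (1+\varepsilon )\,\mu _v(\mathcal {G}) &&\text{for all } s_W\in S_W.
\end{aligned}
\tag{1}
\]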
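An alternative concept of approximate equilibrium is that of an \(\varepsilon \)-relative equilibrium. It is determined by a situation \((s_W^*,s_B^*)\) such that
\[
\begin{aligned}
\mu _v(\mathcal {G}(s_W^*,s_B)) &\ge (1-\varepsilon )\,\mu _v(\mathcal {G}(s_W^*,s_B^*)) &&\text{for all } s_B\in S_B,\\
\mu _v(\mathcal {G}(s_W,s_B^*)) &\le (1+\varepsilon )\,\mu _v(\mathcal {G}(s_W^*,s_B^*)) &&\text{for all } s_W\in S_W.
\end{aligned}
\tag{2}
\]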
Note that, for sufficiently small \(\varepsilon \), an \(\varepsilon \)-relative approximation implies a \(\Theta (\varepsilon )\)-relative equilibrium, and vice versa. Thus, in what follows, we will use these notions interchangeably. When considering relative approximations and relative equilibria, we assume that the rewards are non-negative integers.
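An alternative to relative approximations is to look for an approximation with an absolute error of \(\varepsilon \); this is achieved by a situation \((s_W^*, s_B^*)\) such that
\[
\begin{aligned}
\mu _v(\mathcal {G}(s_W^*,s_B)) &\ge \mu _v(\mathcal {G})-\varepsilon &&\text{for all } s_B\in S_B,\\
\mu _v(\mathcal {G}(s_W,s_B^*)) &\le \mu _v(\mathcal {G})+\varepsilon &&\text{for all } s_W\in S_W.
\end{aligned}
\tag{3}
\]
Similarly, for an \(\varepsilon \)-absolute equilibrium, we have the following condition:
\[
\begin{aligned}
\mu _v(\mathcal {G}(s_W^*,s_B)) &\ge \mu _v(\mathcal {G}(s_W^*,s_B^*))-\varepsilon &&\text{for all } s_B\in S_B,\\
\mu _v(\mathcal {G}(s_W,s_B^*)) &\le \mu _v(\mathcal {G}(s_W^*,s_B^*))+\varepsilon &&\text{for all } s_W\in S_W.
\end{aligned}
\tag{4}
\]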
Again, an \(\varepsilon \)-absolute approximation implies a \(2\varepsilon \)-absolute equilibrium, and vice versa. When considering absolute equilibria and absolute approximations, we assume that the rewards come from the interval \([-1,1]\).
A situation \((s_W^*,s_B^*)\) is called relatively \(\varepsilon \)-optimal if it satisfies (1), and it is called absolutely \(\varepsilon \)-optimal if it satisfies (3). In the following, we will drop the specification of absolute and relative if it is clear from the context. If the pair \((s_W^*,s_B^*)\) is (absolutely or relatively) \(\varepsilon \)-optimal for all starting positions, it is called uniformly (absolutely or relatively) \(\varepsilon \)-optimal (also called subgame perfect).
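For very small games, the matrix game \(C_v\) can be tabulated explicitly and the \(\varepsilon \)-optimality conditions checked directly. In words, a situation is absolutely \(\varepsilon \)-optimal if no Black deviation pushes the payoff below \(\mu _v(\mathcal {G})-\varepsilon \) and no White deviation pushes it above \(\mu _v(\mathcal {G})+\varepsilon \); the relative version uses the bounds \((1-\varepsilon )\mu _v(\mathcal {G})\) and \((1+\varepsilon )\mu _v(\mathcal {G})\) instead. The sketch below assumes the table is given (rows indexed by White's positional strategies, columns by Black's) and has a pure saddle point, as guaranteed for BWR-games; the function names are hypothetical. Since \(|S_W|\) and \(|S_B|\) grow exponentially with the number of positions, this is a checking aid, not an algorithm.

```python
from typing import List

# C[i][j] = c_v(s_W^i, s_B^j): rows are White's positional strategies,
# columns are Black's. Tabulating C is only feasible for tiny games.
Matrix = List[List[float]]


def saddle_value(C: Matrix, tol: float = 1e-12) -> float:
    """Value of a matrix game with a pure saddle point (max-min = min-max)."""
    lower = max(min(row) for row in C)        # best guarantee for White (maximizer)
    upper = min(max(col) for col in zip(*C))  # best guarantee for Black (minimizer)
    if abs(lower - upper) > tol:
        raise ValueError("matrix has no pure saddle point")
    return lower


def is_abs_eps_optimal(C: Matrix, i: int, j: int, eps: float) -> bool:
    """Situation (i, j): no unilateral deviation moves the payoff beyond mu -/+ eps."""
    mu = saddle_value(C)
    white_worst = min(C[i])                 # Black's best reply to White's row i
    black_worst = max(row[j] for row in C)  # White's best reply to Black's column j
    return white_worst >= mu - eps and black_worst <= mu + eps


def is_rel_eps_optimal(C: Matrix, i: int, j: int, eps: float) -> bool:
    """Relative version, intended for non-negative payoffs."""
    mu = saddle_value(C)
    white_worst = min(C[i])
    black_worst = max(row[j] for row in C)
    return white_worst >= (1 - eps) * mu and black_worst <= (1 + eps) * mu
```

For example, in the table `[[0.5, 0.6], [0.4, 0.45]]` the saddle point is the top-left entry with value 0.5; the situation (row 0, column 1) is absolutely 0.2-optimal but not 0.05-optimal, because White could move the payoff to 0.6.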
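We note that, under the above assumptions, the notion of relative approximation is stronger. Indeed, consider a BWR-game \(\mathcal {G}\) with rewards in \([-1,1]\). A relatively \(\varepsilon \)-optimal situation \((s_W^*,s_B^*)\) of the game \(\hat{\mathcal {G}}\) with local rewards given by \(\hat{r} = r + \mathbf {1}\ge 0\) (where \(\mathbf {1}\) is the vector of all ones, and the addition and comparison are meant component-wise) satisfies
\[
\mu _v(\mathcal {G}(s_W^*,s_B)) \ge (1-\varepsilon )\mu _v(\mathcal {G})-\varepsilon \ge \mu _v(\mathcal {G})-2\varepsilon \quad \text{for all } s_B\in S_B
\]
and
\[
\mu _v(\mathcal {G}(s_W,s_B^*)) \le (1+\varepsilon )\mu _v(\mathcal {G})+\varepsilon \le \mu _v(\mathcal {G})+2\varepsilon \quad \text{for all } s_W\in S_W.
\]
This is because \(\mu _v(\hat{\mathcal {G}}(s))=\mu _v(\mathcal {G}(s))+1\) for any situation s (in particular, \(\mu _v(\hat{\mathcal {G}})=\mu _v(\mathcal {G})+1\)) and \(\mu _v(\mathcal {G})\le 1\). Thus, we obtain a \(2\varepsilon \)-absolute approximation for the value of the original game.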
An algorithm for approximating (absolutely or relatively) the values of the game is said to be a fully polynomial-time (absolute or relative) approximation scheme (FPTAS) if the running-time depends polynomially on the input size and \(1/\varepsilon \). In what follows, we assume without loss of generality that \(1/\varepsilon \) is an integer.
1.3 Previous Results
BWR-games are an equivalent formulation [21] of the stochastic games with perfect information and mean payoff that were introduced in 1957 by Gillette [19]. As already noted in [21], the BWR model generalizes a variety of games and problems: BWR-games without random positions (\(V_R = \emptyset \)) are called cyclic or mean payoff games [16, 17, 21, 33, 34]; we call these BW-games. If one of the sets \(V_B\) or \(V_W\) is empty, we obtain a Markov decision process for which polynomial-time algorithms are known [32]. If both are empty (\(V_B = V_W = \emptyset \)), we get a weighted Markov chain. If \(V=V_W\) or \(V=V_B\), we obtain the minimum mean-weight cycle problem, which can be solved in polynomial time [27].
If all rewards are 0 except for m terminal loops, we obtain the so-called Backgammon-like or stochastic terminal payoff games [7]. The special case \(m=1\), in which every random node has only two outgoing arcs with probability 1 / 2 each, defines the so-called simple stochastic games (SSGs), introduced by Condon [13, 14]. In these games, the objective of White is to maximize the probability of reaching the terminal, while Black wants to minimize this probability. Recently, it has been shown that Gillette games (and hence BWR-games [3]) are equivalent to SSGs under polynomial-time reductions [1]. Thus, by recent results of Halman [22], all these games can be solved in randomized strongly subexponential time \(2^{O(\sqrt{n_d\log n_d})}{{\mathrm{{\text {poly}}}}}(|V|)\), where \(n_d=|V_B|+|V_W|\) is the number of deterministic positions.
Besides their many applications [26, 30], all these games are of interest to complexity theory: The decision problem “whether the value of a BW-game is positive” is in the intersection of NP and co-NP [28, 40]; yet, no polynomial algorithm is known even in this special case. We refer to Vorobyov [39] for a survey. A similar complexity claim holds for SSGs and BWR-games [1, 3]. On the other hand, there exist algorithms that solve BW-games very fast in practice [21]. The situation for these games is thus comparable to linear programming before the discovery of the ellipsoid method: linear programming was known to lie in the intersection of NP and co-NP, and the simplex method proved to be fast in practice. In fact, a polynomial algorithm for linear programming in the unit cost model would already imply a polynomial algorithm for BW-games [37]; see also [4] for an extension to BWR-games.
While there are numerous pseudo-polynomial algorithms known for BW-games [21, 35, 40], pseudo-polynomiality for BWR-games (with no restriction on the number of random positions) is in fact equivalent to polynomiality [1]. Gimbert and Horn [20] have shown that a generalization of simple stochastic games on k random positions having arbitrary transition probabilities [not necessarily (1 / 2, 1 / 2)] can be solved in time \(O(k!(|V||E|+L))\), where L is the maximum bit length of a transition probability. There are various improvements with smaller dependence on k [9, 15, 20, 23] (note that even though BWR-games are polynomially reducible to simple stochastic games, under this reduction the number of random positions does not stay constant, but is only polynomially bounded in n, even if the original BWR-game had a constant number of random positions). Recently, a pseudo-polynomial algorithm was given for BWR-games with a constant number of random positions and polynomial common denominator of transition probabilities, but under the assumption that the game is ergodic (that is, the value does not depend on the initial position) [5]. This result was then extended to the non-ergodic case [6]; see also [4].
As for approximation schemes, the only result we are aware of [36] is the observation that the values of BW-games can be approximated within an absolute error of \(\varepsilon \) in polynomial-time, if all rewards are in the range \([-1,1]\). This follows immediately from truncating the rewards and using any of the known pseudo-polynomial algorithms [21, 35, 40].
On the negative side, it was observed recently [18] that obtaining an \(\varepsilon \)-absolute FPTAS without the assumption that all rewards are in \([-1,1]\), or an \(\varepsilon \)-relative FPTAS without the assumption that all rewards are non-negative, for BW-games, would imply their polynomial time solvability. In that sense, our results below are the best possible unless there is a polynomial algorithm for solving BW-games.
1.4 Our Results
In this paper, we extend the absolute FPTAS for BW-games [36] in two directions. First, we allow a constant number of random positions, and, second, we derive an FPTAS with a relative approximation error. Throughout the paper, we assume the availability of a pseudo-polynomial algorithm \(\mathbb {A}\) that solves any BWR-game \(\mathcal {G}\) with integral rewards and rational transition probabilities in time polynomial in n, D, and R, where \(n=n(\mathcal {G})\) is the total number of positions, \(R=R(\mathcal {G}):=r^+(\mathcal {G})-r^-(\mathcal {G})\) is the size of the range of the rewards, \(r^+(\mathcal {G})=\max _e r_e\) and \(r^-(\mathcal {G})=\min _e r_e\), and \(D=D(\mathcal {G})\) is the common denominator of the transition probabilities. Note that the dependence on D is inherent in all known pseudo-polynomial algorithms for BWR-games. Note also that a positive affine scaling of the rewards, \(r\mapsto \theta r+\gamma \mathbf {1}\) with \(\theta >0\), does not change the optimal strategies of the game.
Let \(p_{\min }=p_{\min }(\mathcal {G})\) be the minimum positive transition probability in the game \(\mathcal {G}\). Throughout this paper, we will assume that the number k of random positions is bounded by a constant.
The following theorem says that a pseudo-polynomial algorithm can be turned into an absolute approximation scheme.
Theorem 1
Given a pseudo-polynomial algorithm for solving any BWR-game with \(k=O(1)\) (in uniformly optimal strategies), there is an algorithm that returns, for any given BWR-game with rewards in \([-1,1]\), \(k=O(1)\), and for any \(\varepsilon > 0\), a pair of strategies that (uniformly) approximates the value within an absolute error of \(\varepsilon \). The running-time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n, 1/p_{\min }, 1/\varepsilon )\) [assuming \(k=O(1)\)].
We also obtain an approximation scheme with a relative error.
Theorem 2
Given a pseudo-polynomial algorithm for solving any BWR-game with \(k=O(1)\), there is an algorithm that returns, for any given BWR-game with non-negative integral rewards, \(k=O(1)\), and for any \(\varepsilon > 0\), a pair of strategies that approximates the value within a relative error of \(\varepsilon \). The running-time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n,1/p_{\min },\log R, 1/\varepsilon )\) [assuming \(k=O(1)\)].
We remark that Theorem 1 (apart from the dependence of the running time on \(\log R\)) can be obtained from Theorem 2 (see Sect. 2). However, our reduction in Theorem 1, unlike Theorem 2, has the property that if the pseudo-polynomial algorithm returns uniformly optimal strategies, then the approximation scheme also returns uniformly \(\varepsilon \)-optimal strategies. For BW-games, i.e., the special case without random positions, we can also strengthen the result of Theorem 2 to return a pair of strategies that is uniformly \(\varepsilon \)-optimal.
Theorem 3
Assume that there is a pseudo-polynomial algorithm for solving any BW-game in uniformly optimal strategies. Then for any \(\varepsilon > 0\), there is an algorithm that returns, for any given BW-game with non-negative integral rewards, a pair of uniformly relatively \(\varepsilon \)-optimal strategies. The running-time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n,\log R, 1/\varepsilon )\).
In deriving these approximation schemes from a pseudo-polynomial algorithm, we face two main technical challenges that distinguish the computation of \(\varepsilon \)-equilibria of BWR-games from similar standard techniques used in combinatorial optimization. First, the running-time of the pseudo-polynomial algorithm depends polynomially both on the maximum reward and the common denominator D of the transition probabilities. Thus, in order to obtain a fully polynomial-time approximation scheme (FPTAS) with an absolute guarantee whose running-time is independent of D, we have to truncate the probabilities and bound the change in the game value, which is a non-linear function of D. Second, in order to obtain an FPTAS with a relative guarantee, one needs (as often in optimization) a (trivial) lower/upper bound on the optimum value. In the case of BWR-games, it is not clear what bound we can use, since the game value can be arbitrarily small. The situation becomes even more complicated if we look for uniformly \(\varepsilon \)-optimal strategies. This is because we have to output just a single pair of strategies that guarantees \(\varepsilon \)-optimality from any starting position.
In order to resolve the first issue, we analyze the change in the game values and optimal strategies if the rewards or transition probabilities are changed. Roughly speaking, we use results from Markov chain perturbation theory to show that if the probabilities are perturbed by a small error \(\delta \), then the change in the game value is \(O(\delta n^2/p_{\min }^{2k})\) (see Sect. 2.1). It is worth mentioning that a somewhat related result was obtained recently for the class of so-called almost-sure ergodic games (not necessarily with perfect information) [10]. More precisely, it was shown that for this class of games there is an \(\varepsilon \)-optimal strategy with rational representation with denominator \(D=O(\frac{n^3}{\varepsilon p_{\min }^{k}})\) [10]. The second issue is resolved through repeated applications of the pseudo-polynomial algorithm on a truncated game. After each such application we have one of the following situations: either the value of the game has already been approximated within the required accuracy or it is guaranteed that the range of the rewards can be shrunk by a constant factor without changing the value of the game (see Sects. 2.3, 2.4).
Since BWR-games with a constant number of random positions admit a pseudo-polynomial algorithm, as was recently shown [5, 6], we obtain the following results.
Corollary 1
(i) There is an FPTAS that solves, within an absolute error guarantee, in uniformly \(\varepsilon \)-optimal strategies, any BWR-game with a constant number of random positions, \(1/p_{\min }={{\mathrm{{\text {poly}}}}}(n)\), and rewards in \([-1,1]\).
(ii) There is an FPTAS that solves, within a relative error guarantee, in \(\varepsilon \)-optimal strategies, any BWR-game with a constant number of random positions, \(1/p_{\min }={{\mathrm{{\text {poly}}}}}(n)\), and non-negative rational rewards.
(iii) There is an FPTAS that solves, within a relative error guarantee, in uniformly \(\varepsilon \)-optimal strategies, any BW-game with non-negative (rational) rewards.
The proofs of Theorems 1, 2, and 3 will be given in Sects. 2.2, 2.3, and 2.4, respectively.
2 Approximation Schemes
2.1 The Effect of Perturbation
Our approximation schemes are based on the following three lemmas. The first one (which is known) says that a linear change in the rewards corresponds to a linear change in the game value. In our approximation schemes, we truncate and scale the rewards to be able to run the pseudo-polynomial algorithm in polynomial time. We need the lemma to bound the error in the game value resulting from the truncation.
Lemma 1
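Let \(\mathcal {G}=(G=(V,E),P,r)\) be a BWR-game. Let \(\theta _1,\gamma _1,\theta _2,\gamma _2\) be constants such that \(\theta _1,\theta _2>0\). Let \(\hat{\mathcal {G}}\) be a game \((G=(V,E),P,\hat{r})\) with \(\theta _1 r_e+\gamma _1\le \hat{r}_e\le \theta _2 r_e +\gamma _2\) for all \(e\in E\). Then for any \(v\in V\), we have \(\theta _1 \mu _v(\mathcal {G})+\gamma _1 \le \mu _v(\hat{\mathcal {G}})\le \theta _2 \mu _v(\mathcal {G}) +\gamma _2\). Moreover, if \((\hat{s}_W,\hat{s}_B)\) is an absolutely \(\varepsilon \)-optimal situation in \((\hat{\mathcal {G}},v)\), then
\[
\begin{aligned}
\mu _v(\mathcal {G}(s_W,\hat{s}_B)) &\le \frac{\theta _2\mu _v(\mathcal {G})+\gamma _2-\gamma _1+\varepsilon }{\theta _1} &&\text{for all } s_W\in S_W,\\
\mu _v(\mathcal {G}(\hat{s}_W,s_B)) &\ge \frac{\theta _1\mu _v(\mathcal {G})+\gamma _1-\gamma _2-\varepsilon }{\theta _2} &&\text{for all } s_B\in S_B.
\end{aligned}
\tag{5}
\]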
Proof
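This uses only standard techniques, and we give the proof only for completeness. Let \((s_W^*,s_B^*)\) and \((\hat{s}_W,\hat{s}_B)\) be pairs of optimal strategies for \((\mathcal {G},v)\) and \((\hat{\mathcal {G}},v)\), respectively. Denote by \(\rho ^*,\hat{\rho },\rho ^{\prime },\) and \(\rho ^{\prime \prime }\) the (arc) limiting distributions for the Markov chains starting from v and corresponding to pairs \((s_W^*,s_B^*)\), \((\hat{s}_W,\hat{s}_B)\), \((s_W^*,\hat{s}_B)\), and \((\hat{s}_W,s_B^*)\), respectively. By the definition of optimal strategies and the facts that \(\rho ^{\prime },\rho ^{\prime \prime }\ge 0\) and \(\Vert \rho ^{\prime }\Vert _1= \Vert \rho ^{\prime \prime }\Vert _1 =1\) (because they are probability distributions), we have the following series of inequalities:
\[
\begin{aligned}
\theta _1\mu _v(\mathcal {G})+\gamma _1 &= \theta _1(\rho ^*)^T r+\gamma _1 \le \theta _1(\rho ^{\prime })^T r+\gamma _1\Vert \rho ^{\prime }\Vert _1 \le (\rho ^{\prime })^T\hat{r} \le \hat{\rho }^T\hat{r} = \mu _v(\hat{\mathcal {G}}),\\
\mu _v(\hat{\mathcal {G}}) &= \hat{\rho }^T\hat{r} \le (\rho ^{\prime \prime })^T\hat{r} \le \theta _2(\rho ^{\prime \prime })^T r+\gamma _2\Vert \rho ^{\prime \prime }\Vert _1 \le \theta _2(\rho ^*)^T r+\gamma _2 = \theta _2\mu _v(\mathcal {G})+\gamma _2.
\end{aligned}
\]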
To see the first bound in (5), note that for any \(s_W\), we have \(\mu _v(\mathcal {G}(s_W,\hat{s}_B))\le \frac{1}{\theta _1}(\mu _v(\hat{\mathcal {G}}(s_W,\hat{s}_B))-\gamma _1)\). Also, by the \(\varepsilon \)-optimality of \(\hat{s}_B\) in \((\hat{\mathcal {G}},v)\), we have \(\mu _v(\hat{\mathcal {G}}(s_W,\hat{s}_B))\le {\mu _v(\hat{\mathcal {G}})}+\varepsilon \le \theta _2\mu _v(\mathcal {G})+\gamma _2+\varepsilon \). The first bound in (5) follows. The second bound can be shown similarly. \(\square \)
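The inequalities behind Lemma 1 reduce to a property of the matrix game \(C_v\): each entry is \(\rho (s)^T r\) for a probability vector \(\rho (s)\) that does not depend on the rewards, so the sandwich on the rewards carries over entrywise to the table, and the pure max-min value is monotone in the entries and commutes with positive affine maps. The randomized check below draws small integer matrices, builds an entrywise sandwiched copy, and tests the resulting value bounds. It is a sanity check of this monotonicity argument under the stated assumptions (non-negative entries, \(0<\theta _1\le \theta _2\), \(\gamma _1\le \gamma _2\)), not a proof; the function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[int]]


def maxmin(C: Matrix) -> int:
    """Pure max-min value: White picks a row, Black answers with a column."""
    return max(min(row) for row in C)


def sandwich_holds(C: Matrix, theta1: int, theta2: int, gamma1: int, gamma2: int,
                   rng: random.Random) -> bool:
    """Draw C_hat with theta1*C + gamma1 <= C_hat <= theta2*C + gamma2 entrywise and
    test theta1*v + gamma1 <= maxmin(C_hat) <= theta2*v + gamma2 for v = maxmin(C).

    Assumes C >= 0, 0 < theta1 <= theta2 and gamma1 <= gamma2, so every
    interval passed to randint is non-empty.
    """
    C_hat = [[rng.randint(theta1 * c + gamma1, theta2 * c + gamma2) for c in row]
             for row in C]
    v = maxmin(C)
    return theta1 * v + gamma1 <= maxmin(C_hat) <= theta2 * v + gamma2


rng = random.Random(0)
all_ok = all(
    sandwich_holds([[rng.randint(0, 20) for _ in range(3)] for _ in range(4)],
                   2, 3, -1, 2, rng)
    for _ in range(500)
)
```

Integer entries keep the comparison exact, so a failure could only come from the argument itself and not from floating-point rounding.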
The second lemma, which is new as far as we are aware, states that if we truncate the transition probabilities within a small error \(\varepsilon \), then the change in the game value is bounded by \(O(\varepsilon ^2 n^3/p_{\min }^{2k})\). More precisely, for a BWR-game \(\mathcal {G}\) and a constant \(\varepsilon >0\), define
where \(n =n(\mathcal {G})\), \(p_{\min }= p_{\min }(\mathcal {G})\), \(k=k(\mathcal {G})\), and \(r^*=r^*(\mathcal {G}):=\max \{|r^+(\mathcal {G})|,|r^-(\mathcal {G})|\}\).
Lemma 2
Let \(\mathcal {G}=(G=(V,E),P,r)\) be a BWR-game with \(r\in [-1,1]^{E}\), and let \(\varepsilon \le p_{\min }/2 =p_{\min }(\mathcal {G})/2\) be a positive constant. Let \(\hat{\mathcal {G}}\) be a game \((G=(V,E),\hat{P},r)\) with \(\Vert P-\hat{P} \Vert _{\infty }\le \varepsilon \) (and \(\hat{p}_{uv}=0\) if \(p_{uv}=0\) for all arcs (u, v)). Then we have \(|\mu _v(\mathcal {G})-\mu _v(\hat{\mathcal {G}})|\le \delta (\mathcal {G},\varepsilon )\) for any \(v\in V\). Moreover, if the pair \((\tilde{s}_W,\tilde{s}_B)\) is absolutely \(\varepsilon ^{\prime }\)-optimal in \((\hat{\mathcal {G}},v)\), then it is absolutely \((\varepsilon ^{\prime }+2\delta (\mathcal {G},\varepsilon ))\)-optimal in \((\mathcal {G},v)\).
Proof
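We apply Lemma 10. Let \((s_W^*,s_B^*)\) and \((\hat{s}_W,\hat{s}_B)\) be pairs of optimal strategies for \((\mathcal {G},v)\) and \((\hat{\mathcal {G}},v)\), respectively. Write \(\delta =\delta (\mathcal {G},\varepsilon )\). Then optimality and Lemma 10 imply the following two series of inequalities:
\[
\begin{aligned}
\mu _v(\mathcal {G}) &= \mu _v(\mathcal {G}(s_W^*,s_B^*)) \le \mu _v(\mathcal {G}(s_W^*,\hat{s}_B)) \le \mu _v(\hat{\mathcal {G}}(s_W^*,\hat{s}_B))+\delta \le \mu _v(\hat{\mathcal {G}}(\hat{s}_W,\hat{s}_B))+\delta = \mu _v(\hat{\mathcal {G}})+\delta ,\\
\mu _v(\hat{\mathcal {G}}) &= \mu _v(\hat{\mathcal {G}}(\hat{s}_W,\hat{s}_B)) \le \mu _v(\hat{\mathcal {G}}(\hat{s}_W,s_B^*)) \le \mu _v(\mathcal {G}(\hat{s}_W,s_B^*))+\delta \le \mu _v(\mathcal {G}(s_W^*,s_B^*))+\delta = \mu _v(\mathcal {G})+\delta .
\end{aligned}
\]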
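To see the second claim, note that for any \(s_W\in S_W\), we have
\[
\mu _v(\mathcal {G}(s_W,\tilde{s}_B)) \le \mu _v(\hat{\mathcal {G}}(s_W,\tilde{s}_B))+\delta \le \mu _v(\hat{\mathcal {G}})+\varepsilon ^{\prime }+\delta \le \mu _v(\mathcal {G})+\varepsilon ^{\prime }+2\delta .
\]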
Similarly, we can show that \(\mu _v(\mathcal {G}(\tilde{s}_W,s_B))\ge \mu _v(\mathcal {G})-\varepsilon ^{\prime }-2\delta \) for all \(s_B\in S_B\). \(\square \)
Since we assume that the running-time of the pseudo-polynomial algorithm for the original game \(\mathcal {G}\) depends on the common denominator D of the transition probabilities, we have to truncate the probabilities to remove this dependence on D. By Lemma 2, the value of the game does not change too much after such a truncation.
The third result that we need concerns relative approximation. The main idea is to use the pseudo-polynomial algorithm to test whether the value of the game is larger than a certain threshold. If it is, we get already a good relative approximation. Otherwise, the next lemma says that we can reduce all large rewards without changing the value of the game.
Lemma 3
Let \(\mathcal {G}=(G=(V,E),P,r)\) be a BWR-game with \(r\ge 0\), and let v be any vertex with \(\mu _v(\mathcal {G})< t\). Suppose that \(r_e\ge t^{\prime }=ntp_{\min }^{-(2k+1)}\) for some \(e\in E\). Let \(\hat{\mathcal {G}}=(G=(V,E),P,\hat{r})\), where \(\hat{r}_e=\min \{r_e,t^{\prime \prime }\}\), \(t^{\prime \prime }\ge (1+\varepsilon )t^{\prime }\) for some \(\varepsilon \ge 0\), and \(\hat{r}_{e^{\prime }}=r_{e^{\prime }}\) for all \(e^{\prime }\ne e\). Then \(\mu _v(\hat{\mathcal {G}})=\mu _v(\mathcal {G})\), and any relatively \(\varepsilon \)-optimal situation in \((\hat{\mathcal {G}},v)\) is also relatively \(\varepsilon \)-optimal in \((\mathcal {G},v)\).
Proof
We assume that \(\hat{r}_e=t^{\prime \prime }\ge (1+\varepsilon )t^{\prime }\), since otherwise there is nothing to prove. Let \(s^*=(s^*_W,s^*_B)\) be an optimal situation for \((\mathcal {G},v)\). This means that \(\mu _v(\mathcal {G})=\mu _v(\mathcal {G}(s^*))=\rho (s^*)^Tr< t\). Lemma 8 says that \(\rho _e(s^*)>0\) implies \(\rho _{e}(s^*)\ge p_{\min }^{2k+1}/n\). Hence, \(r_{e}\rho _e(s^*)\le \rho (s^*)^Tr=\mu _v(\mathcal {G})<t\) implies that \(r_{e}<t^{\prime }\), if \(\rho _e(s^*)>0\). We conclude that \(\rho _e(s^*)=0\), and hence \(\mu _v(\hat{\mathcal {G}}(s^*))=\mu _v(\mathcal {G})\).
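Since \(\hat{r}\le r\), we have \(\mu _v(\hat{\mathcal {G}}(s))\le \mu _v(\mathcal {G}(s))\) for all situations s. In particular, for any \(s_W\in S_W\),
\[
\mu _v(\hat{\mathcal {G}}(s_W,s_B^*)) \le \mu _v(\mathcal {G}(s_W,s_B^*)) \le \mu _v(\mathcal {G}(s_W^*,s_B^*)) = \mu _v(\hat{\mathcal {G}}(s_W^*,s_B^*)).
\]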
We claim that also \(\mu _v(\hat{\mathcal {G}}(s_W^*,s_B))\ge \mu _v(\hat{\mathcal {G}}(s^*_W,s^*_B))\) for all \(s_B\in S_B\). Indeed, if there is a strategy \(s_B\) for Black such that \(\mu _v(\hat{\mathcal {G}}(s_W^*,s_B))<\mu _v(\hat{\mathcal {G}}(s^*_W,s^*_B))=\mu _v(\mathcal {G})<t\), then, by the same argument as above, we must have \(\rho _e(s_W^*,s_B)=0\) (since \(\rho _e(s^*_W,s_B)(1+\varepsilon )t^{\prime }\le \rho _e(s^*_W,s_B)t^{\prime \prime }=\rho _e(s^*_W,s_B)\hat{r}_e\le \rho (s^*_W,s_B)^T\hat{r}=\mu _v(\hat{\mathcal {G}}(s_W^*,s_B))<t\)). This, however, implies that
\[
\mu _v(\mathcal {G}(s_W^*,s_B)) = \rho (s_W^*,s_B)^T r = \rho (s_W^*,s_B)^T\hat{r} = \mu _v(\hat{\mathcal {G}}(s_W^*,s_B)) < \mu _v(\mathcal {G}(s_W^*,s_B^*)),
\]
which is in contradiction to the optimality of \(s^*\) in \(\mathcal {G}\). We conclude that \((s_W^*,s_B^*)\) is also optimal in \(\hat{\mathcal {G}}\) and hence \(\mu _v(\hat{\mathcal {G}})=\mu _v(\mathcal {G})\).
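Suppose that \((\hat{s}_W,\hat{s}_B)\) is a relatively \(\varepsilon \)-optimal situation in \((\hat{\mathcal {G}},v)\). Then \(\rho _e(s_W,\hat{s}_B)=0\) for any \(s_W\in S_W\). Indeed,
\[
\rho _e(s_W,\hat{s}_B)(1+\varepsilon )t^{\prime } \le \rho _e(s_W,\hat{s}_B)\hat{r}_e \le \rho (s_W,\hat{s}_B)^T\hat{r} = \mu _v(\hat{\mathcal {G}}(s_W,\hat{s}_B)) \le (1+\varepsilon )\mu _v(\hat{\mathcal {G}}) = (1+\varepsilon )\mu _v(\mathcal {G}) < (1+\varepsilon )t
\]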
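gives a contradiction with Lemma 8 if \(\rho _e(s_W,\hat{s}_B)>0\), since it yields \(\rho _e(s_W,\hat{s}_B)<t/t^{\prime }=p_{\min }^{2k+1}/n\). It follows that, for any \(s_W\in S_W\), \(\mu _v(\mathcal {G}(s_W,\hat{s}_B))=\mu _v(\hat{\mathcal {G}}(s_W,\hat{s}_B))\le (1+\varepsilon )\mu _v(\mathcal {G})\). Furthermore, for any \(s_B\in S_B\),
\[
\mu _v(\mathcal {G}(\hat{s}_W,s_B)) \ge \mu _v(\hat{\mathcal {G}}(\hat{s}_W,s_B)) \ge (1-\varepsilon )\mu _v(\hat{\mathcal {G}}) = (1-\varepsilon )\mu _v(\mathcal {G}).
\]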
\(\square \)
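Once a bound \(\mu _v(\mathcal {G})<t\) is known, the capping step of Lemma 3 is plain arithmetic. The sketch below computes \(t^{\prime }=ntp_{\min }^{-(2k+1)}\) and caps every reward at \(t^{\prime \prime }=(1+\varepsilon )t^{\prime }\), the smallest cap the lemma permits. We read a simultaneous cap of several edges as repeated application of the lemma, which is valid because each application leaves n, k, \(p_{\min }\), and \(\mu _v\) unchanged; rewards below \(t^{\prime \prime }\) are untouched. Exact rationals (Python's `fractions.Fraction`) avoid rounding the threshold; the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def cap_threshold(n: int, t: Fraction, p_min: Fraction, k: int) -> Fraction:
    """t' = n * t * p_min^(-(2k+1)), the threshold of Lemma 3."""
    return n * t / p_min ** (2 * k + 1)


def cap_rewards(r: Dict[str, Fraction], n: int, t: Fraction, p_min: Fraction,
                k: int, eps: Fraction) -> Dict[str, Fraction]:
    """Cap every reward at t'' = (1 + eps) * t'.

    Assumes r >= 0 and that mu_v(G) < t is already known for the start vertex v.
    Rewards below t'' are returned unchanged.
    """
    t_cap = (1 + eps) * cap_threshold(n, t, p_min, k)
    return {e: min(r_e, t_cap) for e, r_e in r.items()}
```

With n = 4, t = 1/2, \(p_{\min }\) = 1/2, k = 1, and \(\varepsilon \) = 1/10, the threshold is \(t^{\prime }\) = 16 and the cap is 88/5; a reward of 3 stays, while rewards of 20 and 100 both become 88/5.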
2.2 Absolute Approximation
In this section, we assume that \(r^-=-1\) and \(r^+=1\), i.e., all rewards are from the interval \([-1,1]\). We may also assume that \(\varepsilon \in (0,1)\) and \(\frac{1}{\varepsilon }\in \mathbb {Z}_+\). We apply the pseudo-polynomial algorithm \(\mathbb {A}\) on a truncated game \(\tilde{\mathcal {G}}=(G=(V,E),\tilde{P},\tilde{r})\), defined by rounding the rewards to the nearest integer multiple of \(\varepsilon /4\) and scaling them by \(4/\varepsilon \) (that is, \(\tilde{r}:=\frac{4}{\varepsilon }\lfloor r\rceil _{\frac{\varepsilon }{4}}\), where \(\lfloor x\rceil _{a}\) denotes x rounded to the nearest integer multiple of a), and by truncating the vector of probabilities \((p_{(v,u)})_{u \in V}\) for each random node \(v\in V_R\), as described in the following lemma.
Lemma 4
Let \(\alpha \in [0,1]^{n}\) with \(\Vert \alpha \Vert _{1} = 1\). Let \(B \in \mathbb {N}\) be such that \(\min _{i:\alpha _i>0}\{\alpha _i\}>2^{-B}\). Then there exists \(\alpha ^{\prime } \in [0,1]^{n}\) such that
(i) \(\Vert \alpha ^{\prime }\Vert _{1}=1\);
(ii) for all \(i=1,\ldots ,n\), \(\alpha ^{\prime }_i = c_i/2^{B}\), where \(c_i \in \mathbb {N}\) is an integer;
(iii) for all \(i=1,\ldots ,n\), \(\alpha ^{\prime }_i>0\) if and only if \(\alpha _i>0\); and
(iv) \(\Vert \alpha -\alpha ^{\prime }\Vert _{\infty } \le 2^{-B}\).
Proof
This is straightforward, and we include the proof only for completeness. Without loss of generality, we assume \(\alpha _i>0\) for all i (set \(\alpha _i^{\prime }=0\) for all i such that \(\alpha _i=0\)). Initialize \(\varepsilon _0=0\) and iterate for \(i=1,\ldots ,n\): set \(\alpha _i^{\prime } =\lfloor \alpha _i+\varepsilon _{i-1}\rceil _{2^{-B}}\) and \(\varepsilon _{i} =\alpha _i+\varepsilon _{i-1}-\alpha _i^{\prime }\). The construction implies (ii). Note that \(|\varepsilon _i|\le 2^{-(B+1)}\) for all i, and \(\varepsilon _n=\sum _i \alpha _i-\sum _i\alpha _i^{\prime }\); since \(\sum _i\alpha _i^{\prime }\) is an integer multiple of \(2^{-B}\) and \(\sum _i\alpha _i=1\), this forces \(\sum _i\alpha _i^{\prime }=1\), which is (i). Furthermore, \(|\alpha _i-\alpha _i^{\prime }|=|\varepsilon _i-\varepsilon _{i-1}|\le 2^{-B}\), which implies (iv). Note finally that (iii) follows from (iv) since \(\min _{i:\alpha _i>0}\{\alpha _i\}>2^{-B}\). \(\square \)
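The carry-propagating rounding in the proof of Lemma 4 fits in a few lines. The sketch below uses exact rationals, skips zero entries (so their carry passes to the next positive entry, matching the proof's reduction to positive entries), and rounds ties upward, an arbitrary choice that the proof does not depend on. The function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import List


def round_to_grid(x: Fraction, B: int) -> Fraction:
    """Nearest integer multiple of 2^-B (ties go up)."""
    scale = 2 ** B
    return Fraction(math.floor(x * scale + Fraction(1, 2)), scale)


def dyadic_rounding(alpha: List[Fraction], B: int) -> List[Fraction]:
    """Carry-propagating rounding from the proof of Lemma 4."""
    assert sum(alpha) == 1, "alpha must be a probability vector"
    assert all(a > Fraction(1, 2 ** B) for a in alpha if a > 0), \
        "need min positive entry > 2^-B"
    result = []
    carry = Fraction(0)               # plays the role of epsilon_{i-1}
    for a in alpha:
        if a == 0:                    # zero entries stay zero, property (iii)
            result.append(Fraction(0))
            continue
        rounded = round_to_grid(a + carry, B)
        carry = a + carry - rounded   # |carry| <= 2^-(B+1) after every step
        result.append(rounded)
    return result
```

On \(\alpha =(1/3,1/3,1/3)\) with B = 3 it returns (3/8, 1/4, 3/8): every entry is a multiple of 1/8, the entries sum to 1, and the largest error is 1/12, below the bound 1/8 of (iv).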
Lemma 5
Let \(\mathbb {A}\) be a pseudo-polynomial algorithm that solves, in (uniformly) optimal strategies, any BWR-game \(\mathcal {G}=(G,P,r)\) in time \(\tau (n,D,R)\). Then for any \(\varepsilon >0\), there is an algorithm that solves, in (uniformly) absolutely \(\varepsilon \)-optimal strategies, any given BWR-game \(\mathcal {G}=(G,P,r)\) in time bounded by \(\tau \bigl (n,\frac{2^{k+4}n^2(3k+1)}{\varepsilon p_{\min }^{k}},\frac{8}{\varepsilon }\bigr )\).
Proof
We apply \(\mathbb {A}\) to the game \(\tilde{\mathcal {G}}=(G,\tilde{P},\tilde{r})\), where \(\tilde{r}:=\frac{4}{\varepsilon }\lfloor r\rceil _{\frac{\varepsilon }{4}}\). The probabilities \(\tilde{P}\) are obtained from P by applying Lemma 4 with \(B=\lceil \log _2(1/\varepsilon ^{\prime })\rceil \), where we select \(\varepsilon ^{\prime }\) such that \(\delta (\mathcal {G},\varepsilon ^{\prime })\le \frac{\varepsilon }{4}\) [as defined by (6)]. It is easy to check that \(\delta (\mathcal {G},\varepsilon ^{\prime })\le \varepsilon /4\) for \(\varepsilon ^{\prime }=\frac{\varepsilon p_{\min }^{k}}{2^{k+3}n^2(3k+1)}\), as \(r^*=1\). Note that all rewards in \(\tilde{\mathcal {G}}\) are integers in the range \([-\frac{4}{\varepsilon },\frac{4}{\varepsilon }]\). Since \(D(\tilde{\mathcal {G}})=2^B\) and \(R(\tilde{\mathcal {G}})= 8/\varepsilon \), the statement about the running-time follows.
Let \(\tilde{s}\) be the pair of (uniformly) optimal strategies returned by \(\mathbb {A}\) on input \(\tilde{\mathcal {G}}\). Let \(\hat{\mathcal {G}}\) be the game \((G,\tilde{P},r)\). Since \(\Vert \tilde{r}-\frac{4}{\varepsilon }r\Vert _{\infty }\le 1\), we can apply Lemma 1 (with \(\hat{r}=\tilde{r}\), \(\theta _1=\theta _2=\frac{4}{\varepsilon }\) and \(\gamma _1=-\gamma _2=-1\)) to conclude that \(\tilde{s}\) is a (uniformly) absolutely \(\frac{\varepsilon }{2}\)-optimal pair for \(\hat{\mathcal {G}}\). Now we apply Lemma 2 and conclude that \(\tilde{s}\) is (uniformly) absolutely \((\frac{\varepsilon }{2}+2\delta (\mathcal {G},\varepsilon ^{\prime }))\)-optimal for \(\mathcal {G}\), and hence absolutely \(\varepsilon \)-optimal, since \(\delta (\mathcal {G},\varepsilon ^{\prime })\le \frac{\varepsilon }{4}\). \(\square \)
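The parameters fed to \(\mathbb {A}\) in the proof of Lemma 5 can be computed exactly as in the following sketch. The helper names are hypothetical, the values \(n=4\), \(k=1\), \(p_{\min }=\frac{1}{2}\), \(\varepsilon =\frac{1}{2}\) are arbitrary illustrative choices, and \(\delta \) itself (defined in (6)) is not computed here. The check `D <= bound` reproduces the second argument of \(\tau \) in the statement of Lemma 5, since \(2^B\le 2/\varepsilon ^{\prime }\).

```python
from fractions import Fraction


def lemma5_parameters(n: int, k: int, p_min: Fraction, eps: Fraction):
    """Return (eps', B, D, R) as chosen in the proof of Lemma 5."""
    eps_prime = eps * p_min ** k / (2 ** (k + 3) * n ** 2 * (3 * k + 1))
    B = 0
    while Fraction(2) ** B * eps_prime < 1:  # B = ceil(log2(1 / eps'))
        B += 1
    D = 2 ** B   # common denominator of the truncated probabilities
    R = 8 / eps  # range of the scaled integer rewards in [-4/eps, 4/eps]
    return eps_prime, B, D, R


def scale_rewards(rewards, eps: Fraction):
    """tilde r_e = (4/eps) * round_{eps/4}(r_e), i.e. an integer nearest to 4 r_e / eps."""
    return [round(Fraction(r) * 4 / eps) for r in rewards]


eps, p_min = Fraction(1, 2), Fraction(1, 2)
eps_prime, B, D, R = lemma5_parameters(n=4, k=1, p_min=p_min, eps=eps)
bound = Fraction(2 ** (1 + 4) * 4 ** 2 * (3 * 1 + 1)) / (eps * p_min ** 1)
print(eps_prime, B, D, R, bound)  # 1/4096 12 4096 16 8192
print(scale_rewards([1, -1, Fraction(3, 10)], eps))  # [8, -8, 2]
```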
Note that the above technique yields an approximation algorithm with polynomial running-time only for \(k=O(1)\), even if the pseudo-polynomial algorithm \(\mathbb {A}\) works for arbitrary k.
2.3 Relative Approximation
Let \(\mathcal {G}=(G,P,r)\) be a BWR-game on G with non-negative integral rewards, that is, \(r^-=0\) and \(\min _{e: r_{e}>0}r_e \ge 1\). The algorithm is given as Algorithm 1. The main idea is to scale the rewards by a factor of \(1/K\), round them down to integers, and run the pseudo-polynomial algorithm on the resulting truncated game \(\hat{\mathcal {G}}\). If the value \(\mu _w(\hat{\mathcal {G}})\) in the truncated game from the starting node w is large enough (step 4), then we get a good relative approximation of the original value and we are done. Otherwise, the information that \(\mu _w(\hat{\mathcal {G}})\) is small allows us to reduce the maximum reward by a factor of 2 in the original game (step 9); we invoke Lemma 3 for this. Thus, the algorithm terminates in polynomial time (in the bit length of \(R(\mathcal {G})\)). To remove the dependence on D in the running-time, we also need to truncate the transition probabilities. In the algorithm, we denote by \(\tilde{P}\) the transition probabilities obtained from P by applying Lemma 4 with \(B=\lceil \log _2(1/\varepsilon ^{\prime })\rceil \), where we select \(\varepsilon ^{\prime }=\frac{p_{\min }^{2k}}{2^{k+1}n^2(k+2)^2\theta }\) with \(\theta =\theta (\mathcal {G}):=\frac{2(1+\varepsilon )(3+2\varepsilon )n}{\varepsilon p_{\min }^{2k+1}}\). Thus, we have \(2\delta (\mathcal {G},\varepsilon ^{\prime })\le 2^{k+1}\varepsilon ^{\prime } n^2(k+2)^2p_{\min }^{-2k}=1/\theta (\mathcal {G})\le r^+(\mathcal {G})/\theta (\mathcal {G})=:K(\mathcal {G})\), where the last inequality uses \(r^+(\mathcal {G})\ge 1\).
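As an arithmetic sanity check of the parameter choices in this subsection, the following sketch evaluates \(\theta \), \(\varepsilon ^{\prime }\), B and K exactly for one arbitrary illustrative instance (\(n=3\), \(k=1\), \(p_{\min }=\varepsilon =\frac{1}{2}\), \(r^+=8\)). It only verifies the identities \(2^{k+1}\varepsilon ^{\prime }n^2(k+2)^2p_{\min }^{-2k}=1/\theta \) and \(K(3+2\varepsilon )/\varepsilon =p_{\min }^{2k+1}r^+/(2(1+\varepsilon )n)\) (the latter is used in the proof of Lemma 6); it does not compute \(\delta \). The function name is hypothetical, and the logarithm in B is taken to base 2, consistent with \(\tilde{D}=2/\varepsilon ^{\prime }\).

```python
from fractions import Fraction


def sect23_parameters(n: int, k: int, p_min: Fraction, eps: Fraction, r_plus: int):
    """theta, eps', B, K and the quoted upper bound on 2*delta(G, eps')."""
    theta = 2 * (1 + eps) * (3 + 2 * eps) * n / (eps * p_min ** (2 * k + 1))
    eps_prime = p_min ** (2 * k) / (2 ** (k + 1) * n ** 2 * (k + 2) ** 2 * theta)
    B = 0
    while Fraction(2) ** B * eps_prime < 1:  # B = ceil(log2(1 / eps'))
        B += 1
    K = Fraction(r_plus) / theta
    delta_bound = 2 ** (k + 1) * eps_prime * n ** 2 * (k + 2) ** 2 / p_min ** (2 * k)
    return theta, eps_prime, B, K, delta_bound


n, k, p_min, eps, r_plus = 3, 1, Fraction(1, 2), Fraction(1, 2), 8
theta, eps_prime, B, K, delta_bound = sect23_parameters(n, k, p_min, eps, r_plus)
print(theta, eps_prime, B, K)  # 576 1/746496 20 1/72
```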
Lemma 6
Let \(\mathbb {A}\) be a pseudo-polynomial algorithm that solves any BWR-game \((\mathcal {G}=(G,P,r),w)\), from any given starting position w, in time \(\tau (n,D,R)\). Then, for any \(\varepsilon \in (0,1)\), there is an algorithm that solves, in relatively \(\varepsilon \)-optimal strategies, any BWR-game \((\mathcal {G}=(G,P,r),w)\) from any given starting position w in time
Proof
The algorithm \({{\mathrm{{\text {FPTAS-BWR}}}}}(\mathcal {G},w,\varepsilon )\) is given as Algorithm 1. The bound on the running-time follows since, by step (9), each time we recurse on a game \(\tilde{\mathcal {G}}\), its maximum reward \(r^+(\tilde{\mathcal {G}})\) is at most half of \(r^+(\mathcal {G})\). Moreover, the rewards in the truncated game \(\hat{\mathcal {G}}\) are non-negative integers with a maximum value of \(r^+(\hat{\mathcal {G}})\le \theta \), and the smallest common denominator of the transition probabilities is at most \(\tilde{D}:=\frac{2}{\varepsilon ^{\prime }}\). Thus the time taken by algorithm \(\mathbb {A}\) in each recursive call is at most \(\tau \bigl (n,\tilde{D},\theta \bigr )\).
What remains to be done is to argue by induction (on \(r^+(\mathcal {G})\)) that the algorithm returns a pair \(\tilde{s}=(\tilde{s}_W,\tilde{s}_B)\) of \(\varepsilon \)-optimal strategies. For the base case, we have either \(r^+(\mathcal {G})\le 2\) or the value returned by the pseudo-polynomial algorithm \(\mathbb {A}\) satisfies \(\mu _w(\hat{\mathcal {G}})\ge 3/\varepsilon \). In the former case, note that since \(\Vert P-\tilde{P}\Vert _{\infty }\le \varepsilon ^{\prime }\) and \(r^+(\mathcal {G})\le 2\), Lemma 2 implies that the pair \(\tilde{s}=(\tilde{s}_W,\tilde{s}_B)\) returned in step 2 is absolutely \(\varepsilon ^{\prime \prime }\)-optimal, where \(\varepsilon ^{\prime \prime }=2\delta (\mathcal {G},\varepsilon ^{\prime })<\frac{\varepsilon p_{\min }^{2k+1}}{n}\). Lemma 8 and the integrality of the non-negative rewards imply that, for any situation s, \(\mu _w(\mathcal {G}(s))\ge p_{\min }^{2k+1}/n\) if \(\mu _w(\mathcal {G}(s))>0\). Thus, if \(\mu _w(\mathcal {G})>0\), then \(\varepsilon ^{\prime \prime }\le \varepsilon \mu _w(\mathcal {G})\), and it follows that \((\tilde{s}_W,\tilde{s}_B)\) is relatively \(\varepsilon \)-optimal. On the other hand, if \(\mu _w(\mathcal {G})=0\), then \(\mu _w(\mathcal {G}(\tilde{s}))\le \mu _w(\mathcal {G})+\varepsilon ^{\prime \prime }< p_{\min }^{2k+1}/n\), implying that \(\mu _w(\mathcal {G}(\tilde{s}))=0\). Thus, we get a relative \(\varepsilon \)-approximation in both cases.
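For completeness, the bound on \(\varepsilon ^{\prime \prime }\) used in the base case can be traced back to the choice of \(\theta \) at the beginning of this subsection, using \(2\delta (\mathcal {G},\varepsilon ^{\prime })\le K(\mathcal {G})\) and \(r^+(\mathcal {G})\le 2\). One way to spell it out:

```latex
\varepsilon'' = 2\delta(\mathcal{G},\varepsilon') \le K(\mathcal{G})
  = \frac{r^+(\mathcal{G})}{\theta(\mathcal{G})}
  \le \frac{2}{\theta(\mathcal{G})}
  = \frac{\varepsilon\, p_{\min}^{2k+1}}{(1+\varepsilon)(3+2\varepsilon)\,n}
  < \frac{\varepsilon\, p_{\min}^{2k+1}}{n}.
```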
Suppose now that \(\mathbb {A}\) determines that \(\mu _w(\hat{\mathcal {G}})\ge 3/\varepsilon \) in step 4, and hence the algorithm returns \((\tilde{s}_W,\tilde{s}_B)\). Note that \(\frac{1}{K} \cdot r_e- 1\le \hat{r}_e\le \frac{1}{K} \cdot r_e\) for all \(e\in E\), and \(\Vert P-\tilde{P}\Vert _{\infty }\le \varepsilon ^{\prime }\). Hence, by Lemmas 1 and 2, we have
and the pair \((\tilde{s}_W,\tilde{s}_B)\) returned in step 5 is absolutely \((K+2\delta (\mathcal {G},\varepsilon ^{\prime }))\)-optimal, and hence absolutely \(2K\)-optimal, for \(\mathcal {G}\). [To see (7), let \(\tilde{\mathcal {G}}:=(G,\tilde{P},r)\). Then by Lemma 2, we have
Furthermore, as \(\hat{\mathcal {G}}\) is obtained from \(\tilde{\mathcal {G}}\) by scaling and truncating the local rewards, we have by Lemma 1 (applied with \(\theta _1=\theta _2=\frac{1}{K}\), \(\gamma _1=-1\) and \(\gamma _2=0\)),
Combining (8) and (9), we get (7).]
Then (7) implies that
and we are done.
On the other hand, if \(\mu _w(\hat{\mathcal {G}})< 3/\varepsilon \) then, by (7), \(\mu _w(\mathcal {G})<\frac{K(3+2\varepsilon )}{\varepsilon }=\frac{p_{\min }^{2k+1}r^+}{2(1+\varepsilon )n}\). By Lemma 3, applied with \(t= K(3+2\varepsilon )/\varepsilon \), the game \(\tilde{\mathcal {G}}\) defined in step 11 satisfies \(\mu _w(\mathcal {G})=\mu _w(\tilde{\mathcal {G}})\), and any (relatively) \(\varepsilon \)-optimal strategy in \((\tilde{\mathcal {G}},w)\) (in particular the one returned by induction in step 11) is also \(\varepsilon \)-optimal for \((\mathcal {G},w)\). \(\square \)
Note that the running-time in the above lemma simplifies to \({{\mathrm{{\text {poly}}}}}(n, 1/\varepsilon , 1/p_{\min }) \cdot \log R\) for \(k = O(1)\).
2.4 Uniformly Relative Approximation for BW-Games
The FPTAS in Theorem 6 does not necessarily return a uniformly \(\varepsilon \)-optimal situation, even if the given pseudo-polynomial algorithm \(\mathbb {A}\) provides a uniformly optimal solution. For BW-games, we can modify this FPTAS to return a uniformly \(\varepsilon \)-optimal situation. The algorithm is given as Algorithm 2. The main difference is that when we recurse on a game with reduced rewards (step 11), we also have to delete all positions that have large values \(\mu _v(\hat{\mathcal {G}})\) in the truncated game. This is similar to the approach used to decompose a BW-game into ergodic classes [21]. However, the main technical difficulty is that, with approximate equilibria, White or Black might still have some incentive to move to a lower- or higher-value class, respectively, since the values obtained are just approximations of the optimal values. We show that such a move is not very profitable for either White or Black. Recall that we assume that the rewards are non-negative integers.
Lemma 7
Let \(\mathbb {A}\) be a pseudo-polynomial algorithm that solves, in uniformly optimal strategies, any BW-game \(\mathcal {G}\) in time \(\tau (n,R)\). Then for any \(\varepsilon >0\), there is an algorithm that solves, in uniformly relatively \(\varepsilon \)-optimal strategies, any BW-game \(\mathcal {G}\), in time \(O\bigl (\bigl (\tau \bigl (n,\frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\bigr )+{{\mathrm{{\text {poly}}}}}(n)\bigr ) \cdot h\bigr )\), where \(h=\lceil \log R\rceil +1\), and \(\varepsilon ^{\prime }=\frac{\ln (1+\varepsilon )}{3h}\approx \frac{\varepsilon }{3h}\).
Proof
The algorithm \({{\mathrm{{\text {FPTAS-BW}}}}}(\mathcal {G},\varepsilon )\) is given as Algorithm 2. The bound on the running-time is straightforward: by step (9), each time we recurse on a game \(\tilde{\mathcal {G}}\), its maximum reward \(r^+(\tilde{\mathcal {G}})\) is at most half of \(r^+(\mathcal {G})\). Moreover, the rewards in the truncated game \(\hat{\mathcal {G}}\) are integral with a maximum value of \(r^+(\hat{\mathcal {G}})\le \frac{r^+(\mathcal {G})}{K}\le \frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\). Thus, the time that algorithm \(\mathbb {A}\) needs in each recursive call is bounded from above by \(\tau \bigl (n,\frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\bigr )\).
So it remains to argue (by induction) that the algorithm returns a pair \((\tilde{s}_W,\tilde{s}_B)\) of (relatively) uniformly \(\varepsilon \)-optimal strategies. Let us index the different recursive calls of the algorithm by \(i=1,2,\ldots ,h^{\prime }\le h\) and denote by \(\mathcal {G}^{(i)}=(G^{(i)}=(V^{(i)},E^{(i)}),r^{(i)})\) the game input to the ith recursive call of the algorithm (so \(\mathcal {G}^{(1)}=\mathcal {G}\)) and by \(\hat{s}^{(i)}=(\hat{s}^{(i)}_W,\hat{s}^{(i)}_B)\), \(\tilde{s}^{(i)}=(\tilde{s}^{(i)}_W,\tilde{s}^{(i)}_B)\) the pairs of strategies returned in steps 2, 4, 5, or 11. Similarly, we denote by \(V^{(i)}=V_W^{(i)}\cup V_B^{(i)}\), \(U^{(i)}\), \(r^{(i)}\), \(K^{(i)}\), \(\hat{r}^{(i)}\), \(\hat{\mathcal {G}}^{(i)}\), \(\tilde{\mathcal {G}}^{(i)}\) the instantiations of \(V=V_W\cup V_B\), U, r, K, \(\hat{r}\), \(\hat{\mathcal {G}}\), \(\tilde{\mathcal {G}}\), respectively, in the ith call of the algorithm. We denote by \(S_W^{(i)}\) and \(S_B^{(i)}\) the sets of strategies in \(\mathcal {G}^{(i)}\) for White and Black, respectively. For a set U of positions, a game \(\mathcal {G}\), and a situation s, we denote by \(\mathcal {G}[U]=(G[U],r)\) and s[U], respectively, the game and situation induced on U. \(\square \)
Claim 1
(i) There does not exist an edge \((v,u)\in E\) such that \(v\in V_B^{(i)}\cap U^{(i)}\) and \(u\in V^{(i)}{\setminus } U^{(i)}\).
(ii) For all \(v \in V_W^{(i)}\cap U^{(i)}\), there exists a \(u\in U^{(i)}\) with \((v,u)\in E\).
(i’) There does not exist an edge \((v,u)\in E\) such that \(v\in V_W^{(i)}{\setminus } U^{(i)}\) and \(u\in U^{(i)}\).
(ii’) For all black positions \(v \in V_B^{(i)}{\setminus } U^{(i)}\), there exists a \(u\in V^{(i)}{\setminus } U^{(i)}\) such that \((v,u)\in E\).
(iii) Let \(\hat{s}^{(i)}=(\hat{s}_W^{(i)},\hat{s}_B^{(i)})\) be the situation returned in step 4. Then, for all \(v\in U^{(i)}\), we have \(\hat{s}^{(i)}(v)\in U^{(i)}\), and, for all \(v\in V^{(i)}{\setminus } U^{(i)}\), we have \(\hat{s}^{(i)}(v)\in V^{(i)}{\setminus } U^{(i)}\).
Proof
By the optimality conditions in \(\hat{\mathcal {G}}^{(i)}\) (see, e.g., [21]), we have
(I) \(\mu _v(\hat{\mathcal {G}}^{(i)})=\min \{\mu _u(\hat{\mathcal {G}}^{(i)}) \mid u\in V^{(i)}\) such that \((v,u)\in E\}\), for \(v\in V_B^{(i)}\), and
(II) \(\mu _v(\hat{\mathcal {G}}^{(i)})=\max \{\mu _u(\hat{\mathcal {G}}^{(i)}) \mid u\in V^{(i)}\) such that \((v,u)\in E\}\), for any \(v\in V_W^{(i)}\).
Conditions (I) and (II), together with the definition of \(U^{(i)}\), imply (i) and (ii), respectively. Similarly, (i’) and (ii’) can be shown. The optimality conditions also imply that for all \(v\in V^{(i)}\), \(\mu _v(\hat{\mathcal {G}}^{(i)})=\mu _{\hat{s}^{(i)}(v)}(\hat{\mathcal {G}}^{(i)})\), which in turn implies (iii). \(\square \)
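The structural statements of Claim 1 can be checked mechanically on a toy instance. This sketch assumes, as in the proofs of Claims 2 and 3, that \(U^{(i)}=\{v : \mu _v(\hat{\mathcal {G}}^{(i)})\ge 1/\varepsilon ^{\prime }\}\); it takes the values of the truncated game as given (it does not solve the game), and all function names, the graph, and the threshold 3 are hypothetical illustrations.

```python
def high_value_set(values, threshold):
    """U: positions whose value in the truncated game is at least the threshold (1/eps')."""
    return {v for v, mu in values.items() if mu >= threshold}


def locally_optimal(succ, owner, values):
    """Conditions (I) and (II): Black takes the min, White the max over successors."""
    return all(
        values[v] == (min if owner[v] == "B" else max)(values[u] for u in succ[v])
        for v in succ
    )


def claim1_properties(succ, owner, U):
    """Check Claim 1 (i), (ii), (i'), (ii') for the partition (U, V minus U)."""
    rest = set(succ) - U
    return {
        "i": all(u in U for v in U if owner[v] == "B" for u in succ[v]),
        "ii": all(any(u in U for u in succ[v]) for v in U if owner[v] == "W"),
        "i'": all(u in rest for v in rest if owner[v] == "W" for u in succ[v]),
        "ii'": all(any(u in rest for u in succ[v]) for v in rest if owner[v] == "B"),
    }


# Toy BW-game graph with values satisfying (I) and (II); threshold 1/eps' = 3.
succ = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": ["c", "a"]}
owner = {"a": "W", "b": "B", "c": "W", "d": "B"}
values = {"a": 5, "b": 5, "c": 1, "d": 1}
U = high_value_set(values, threshold=3)
print(sorted(U), locally_optimal(succ, owner, values), claim1_properties(succ, owner, U))
```

A set U that is not a value threshold set, such as \(\{a\}\) here, can violate (ii), which is why the proof relies on the definition of \(U^{(i)}\).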
Note that, by Claim 1(i’) and (ii’), the graph \(G[V^{(i)}{\setminus } U^{(i)}]\) has no sinks, so the game \(\mathcal {G}^{(i)}[V^{(i)}{\setminus } U^{(i)}]\) is well-defined. For a strategy \(s_W\) (and similarly for a strategy \(s_B\)) and a subset \(V^{\prime }\subseteq V\), we write \(s_W(V^{\prime })=\{s_W(u) \mid u\in V^{\prime }\}\). The following two claims state, respectively, that the values of the positions in \(U^{(i)}\) are well approximated by the pseudo-polynomial algorithm and that these values are sufficiently larger than those in the residual set \(V^{(i)}{\setminus } U^{(i)}\).
Claim 2
For \(i=1,\ldots ,h^{\prime }\), let \(\hat{s}^{(i)}\) be the situation returned by the pseudo-polynomial algorithm on the game \(\hat{\mathcal {G}}^{(i)}\) in step 4. Then, for any \(w\in U^{(i)}\), we have
Proof
This follows from Lemma 1 by the uniform optimality of \(\hat{s}^{(i)}\) in \(\hat{\mathcal {G}}^{(i)}\) and the fact that \(\mu _w(\hat{\mathcal {G}}^{(i)})\ge 1/\varepsilon ^{\prime }\) for every \(w\in U^{(i)}.\) \(\square \)
Claim 3
For all \(u \in U^{(i)}\) and \(v\in V^{(i)}{\setminus } U^{(i)}\), we have \((1+\varepsilon ^{\prime })\mu _u(\mathcal {G}^{(i)})>\mu _v(\mathcal {G}^{(i)})\).
Proof
For \(u\in U^{(i)}, v\in V^{(i)}{\setminus } U^{(i)}\), we have \(\mu _u(\hat{\mathcal {G}}^{(i)})\ge 1/\varepsilon ^{\prime }\) and \(\mu _v(\hat{\mathcal {G}}^{(i)})< 1/\varepsilon ^{\prime }\). Thus, by Lemma 1,
\(\square \)
We observe that the strategy \(\tilde{s}^{(i)}\), returned by the ith call to the algorithm, is determined as follows (cf. Algorithm 2): for \(w\in U^{(i)}\), \(\tilde{s}^{(i)}(w)=\hat{s}^{(i)}(w)\) is chosen by the solution of the game \(\hat{\mathcal {G}}^{(i)}\), and for \(w\in V^{(i)}{\setminus } U^{(i)}\), \(\tilde{s}^{(i)}(w)\) is determined by the (recursive) solution on the residual game \(\tilde{\mathcal {G}}^{(i)}=\mathcal {G}^{(i+1)}\). The following claim states that the value of any vertex \(u\in V^{(i)}{\setminus } U^{(i)}\) in the residual game is a good (relative) approximation of the value in the original game \(\mathcal {G}^{(i)}.\)
Claim 4
For all \(i=1,\ldots ,h^{\prime }\) and any \(u\in V^{(i)}{\setminus } U^{(i)}\), we have
Proof
Fix \(u\in V^{(i)}{\setminus } U^{(i)}\). Let \(s^*=(s_W^*,s_B^*)\) and \((\bar{s}_W,\bar{s}_B)\) be optimal situations in \((\mathcal {G}^{(i)},u)\) and \((\bar{\mathcal {G}}^{(i)},u):=(\mathcal {G}^{(i)}[V^{(i)}{\setminus } U^{(i)}],u)\), respectively. Let us extend \(\bar{s}\) to a situation in \(\mathcal {G}^{(i)}\) by setting \(\bar{s}(v)=\hat{s}^{(i)}(v)\) for all \(v\in U^{(i)}\), where \(\hat{s}^{(i)}\) is the situation returned by the pseudo-polynomial algorithm in step 4. Then, by Claim 1(i’), White has no way to escape to \(U^{(i)}\); in other words, \(s_W^*(u^{\prime })\in V^{(i)}{\setminus } U^{(i)}\) for all \(u^{\prime }\in V_W^{(i)}{\setminus } U^{(i)}\). Hence,
For similar reasons, \(\mu _u(\mathcal {G}^{(i)})\ge \mu _u(\bar{\mathcal {G}}^{(i)})\), if \(s_B^*(v)\in V^{(i)}{\setminus } U^{(i)}\) for all \(v\in V_B^{(i)}{\setminus } U^{(i)}\) such that v is reachable from u in the graph \(G(s_W^*,s_B^*)\). Suppose, on the other hand, that there is a \(v\in V_B^{(i)}{\setminus } U^{(i)}\) such that \(u^{\prime }=s_B^*(v)\in U^{(i)}\), and v is reachable from u in the graph \(G(s_W^*,s_B^*)\). Then (by Lemma 1) \(\mu _u(\mathcal {G}^{(i)})=\mu _{u^{\prime }}(\mathcal {G}^{(i)})\ge K^{(i)}\mu _{u^{\prime }}(\hat{\mathcal {G}}^{(i)})\ge \frac{K^{(i)}}{\varepsilon ^{\prime }}\). Moreover, the optimality of \((\hat{s}_W,\hat{s}_B)\) in \(\hat{\mathcal {G}}^{(i)}\) and the fact that \(\frac{1}{K^{(i)}}r^{(i)}- 1\le \hat{r}^{(i)}\le \frac{1}{K^{(i)}}r^{(i)}\) imply by Lemma 1 that
and
In particular,
where \(\mu _u(\mathcal {G}^{(i)}(\bar{s}_W,\hat{s}_B))=\mu _u(\bar{\mathcal {G}}^{(i)}(\bar{s}_W,\hat{s}_B))\) follows from Claim 1 (since \((\bar{s}_W,\hat{s}_B)(v)\in V^{(i)}{\setminus } U^{(i)}\)). It follows that \(\mu _u(\mathcal {G}^{(i)})\ge \frac{1}{1+2\varepsilon ^{\prime }}\mu _u(\bar{\mathcal {G}}^{(i)})\). \(\square \)
Let us fix \(\varepsilon _{h^{\prime }}=\varepsilon ^{\prime }\), and for \(i=h^{\prime }-1,h^{\prime }-2,\ldots ,1\), let us choose \(\varepsilon _i\) such that \(1+\varepsilon _i\ge (1+\varepsilon ^{\prime })(1+2\varepsilon ^{\prime })(1+\varepsilon _{i+1})\). Next, we claim that the strategies \((\tilde{s}_W^{(i)},\tilde{s}_B^{(i)})\) returned by the ith call of \({{\mathrm{{\text {FPTAS-BW}}}}}(\mathcal {G},\varepsilon )\) are relatively \(\varepsilon _i\)-optimal in \(\mathcal {G}^{(i)}\).
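The backward recursion for the \(\varepsilon _i\) can be made concrete. The sketch below assumes the \(\varepsilon _i\) are chosen with equality in the recursion (the text only requires the inequality) and uses floating-point arithmetic; the function name and the sample values \(\varepsilon =0.5\), \(h=h^{\prime }=10\) are hypothetical. It also checks the two facts used later: \(\varepsilon _1\le \varepsilon \) for \(\varepsilon ^{\prime }=\ln (1+\varepsilon )/(3h)\), and \(1-\varepsilon _i\le (1-\varepsilon ^{\prime })^2\) for \(i<h^{\prime }\).

```python
import math


def epsilon_schedule(eps: float, h: int, h_prime: int):
    """Return eps' = ln(1+eps)/(3h) and [eps_1, ..., eps_{h'}], where eps_{h'} = eps' and
    1 + eps_i = (1 + eps')(1 + 2 eps')(1 + eps_{i+1}) for i = h'-1, ..., 1."""
    assert 1 <= h_prime <= h
    eps_p = math.log1p(eps) / (3 * h)
    schedule = [eps_p]  # eps_{h'}
    for _ in range(h_prime - 1):  # i = h'-1, ..., 1
        schedule.append((1 + eps_p) * (1 + 2 * eps_p) * (1 + schedule[-1]) - 1)
    schedule.reverse()  # schedule[0] is eps_1
    return eps_p, schedule


eps_p, schedule = epsilon_schedule(eps=0.5, h=10, h_prime=10)
print(eps_p, schedule[0])
```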
Claim 5
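For all \(i=1,\ldots ,h^{\prime }\) and any \(w\in V^{(i)}\), we have
\[\mu _w(\mathcal {G}^{(i)}(s_W,\tilde{s}_B^{(i)}))\le (1+\varepsilon _i)\,\mu _w(\mathcal {G}^{(i)})\quad \text {for all } s_W\in S_W^{(i)},\tag{11}\]
\[\mu _w(\mathcal {G}^{(i)}(\tilde{s}_W^{(i)},s_B))\ge (1-\varepsilon _i)\,\mu _w(\mathcal {G}^{(i)})\quad \text {for all } s_B\in S_B^{(i)}.\tag{12}\]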
Proof
The proof is by induction on \(i=h^{\prime },h^{\prime }-1,\ldots ,1\). For \(i=h^{\prime }\), the statement follows directly from Claim 2, since \(U^{(h^{\prime })}=V^{(h^{\prime })}\). So suppose that \(i<h^{\prime }\).
By induction, \(\bar{s}^{(i)}=(\bar{s}_W^{(i)},\bar{s}_B^{(i)}):=(\tilde{s}_W^{(i)},\tilde{s}_B^{(i)})[V^{(i)}{\setminus } U^{(i)}]\) is \(\varepsilon _{i+1}\)-optimal in \(\mathcal {G}^{(i+1)}=\tilde{\mathcal {G}}^{(i)}\). Recall that the game \(\tilde{\mathcal {G}}^{(i)}\) is obtained from \(\bar{\mathcal {G}}^{(i)}:=\mathcal {G}^{(i)}[V^{(i)}{\setminus } U^{(i)}]\) by reducing the rewards according to step 9. Thus, Lemma 3 yields that \(\mu _v(\bar{\mathcal {G}}^{(i)})=\mu _v(\tilde{\mathcal {G}}^{(i)})\), and hence,
\(\square \)
Proof of (11): Consider an arbitrary strategy \(s_W\in S_W^{(i)}\) for White. Suppose first that \(w\in U^{(i)}.\) Note that, by Claim 1(iii), \(\tilde{s}_B^{(i)}(u)\in U^{(i)}\) for all \(u\in V_B\cap U^{(i)}.\) If also \(s_W(u)\in U^{(i)}\) for all \(u\in V_W\cap U^{(i)}\) such that u is reachable from w in the graph \(G(s_W,\tilde{s}_B^{(i)})\), then Claim 2 implies \(\mu _w(\mathcal {G}^{(i)}(s_W,\tilde{s}_B^{(i)}))\le (1+\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\le (1+\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\).
Suppose therefore that \(v=s_W(u)\not \in U^{(i)}\) for some \(u\in V_W\cap U^{(i)}\) such that u is reachable from w in the graph \(G(s_W,\tilde{s}_B^{(i)})\).
Note that \(\tilde{s}^{(i)}_B(v^{\prime })\in V^{(i)}{\setminus } U^{(i)}\) for all \(v^{\prime }\in V_B^{(i)}{\setminus } U^{(i)}\), and by Claim 1(i’), \(S_W^{(i+1)}\) is the restriction of \(S_W^{(i)}\) to \(V^{(i)}{\setminus } U^{(i)}\). Thus, we get the following series of inequalities:
The equality holds since v is reachable from w in the graph \(G(s_W,\tilde{s}_B^{(i)})\); the first inequality holds by (13); the second inequality holds because of (10); the third one follows from Claim 3; the fourth inequality holds since \((1+\varepsilon _{i+1})(1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })\le (1+\varepsilon _i)\).
If \(w\in V^{(i)}{\setminus } U^{(i)},\) then an argument similar to that in (15) and (16) shows that \(\mu _w(\mathcal {G}^{(i)}(s_W,\tilde{s}_B^{(i)}))\le (1+\varepsilon _{i+1})(1+2\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\le (1+\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\). Thus, (11) follows.
Proof of (12): Consider an arbitrary strategy \(s_B\in S_B^{(i)}\) for Black. If \(w\in U^{(i)},\) then we have \(\mu _w(\mathcal {G}^{(i)}(\tilde{s}_W^{(i)},s_B))\ge (1-\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\ge (1-\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\) from Claims 1(i–iii), and \(\varepsilon _i\ge \varepsilon ^{\prime }\).
Suppose now that \(w\in V^{(i)}{\setminus } U^{(i)}\). If \(s_B(v)\in V^{(i)}{\setminus } U^{(i)}\) for all \(v\in V_B^{(i)}{\setminus } U^{(i)}\), then we get by (14) and (10) that \(\mu _w(\mathcal {G}^{(i)}(\tilde{s}^{(i)}_W,s_B))\ge (1-\varepsilon _{i+1})\mu _w(\mathcal {G}^{(i)})\ge (1-\varepsilon _{i})\mu _w(\mathcal {G}^{(i)})\). The same bound holds if \(s_B(v)\in V^{(i)}{\setminus } U^{(i)}\) only for those \(v\in V_B^{(i)}{\setminus } U^{(i)}\) that are reachable from w in the graph \(G(\tilde{s}_W^{(i)},s_B)\). So it remains to consider the case when there is a \(v\in V_B^{(i)}{\setminus } U^{(i)}\) such that \(u=s_B(v)\in U^{(i)}\), and v is reachable from w in the graph \(G(\tilde{s}_W^{(i)},s_B)\). Since Black has no escape from \(U^{(i)}\) in this case [by Claim 1(i)], Claims 2 and 3 yield
where the last inequality follows from the fact that, for all \(i=1,\ldots ,h^{\prime }-1\), \( 1+\varepsilon _i\ge (1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })^2\ge (1+\varepsilon ^{\prime })^3, \) and hence, \(1-\varepsilon _i\le 2-(1+\varepsilon ^{\prime })^3\le (1-\varepsilon ^{\prime })^2.\)
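The final step in this chain rests on an elementary polynomial identity; expanding both powers gives:

```latex
(1+\varepsilon')^3 + (1-\varepsilon')^2
  = 2 + \varepsilon' + 4\varepsilon'^2 + \varepsilon'^3 \ge 2,
\qquad\text{hence}\qquad
2-(1+\varepsilon')^3 \le (1-\varepsilon')^2 .
```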
Finally, to finish the proof of Lemma 7, we set the \(\varepsilon _i\)’s and \(\varepsilon ^{\prime }\) such that \(\varepsilon _1=\bigl ((1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })\bigr )^{h^{\prime }-1}(1+\varepsilon ^{\prime })-1\le \varepsilon \). Note that our choice of \(\varepsilon ^{\prime }=\frac{\ln (1+\varepsilon )}{3h}\) satisfies this as
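One possible derivation of this inequality, using only \(h^{\prime }\le h\), \(1+2\varepsilon ^{\prime }\le (1+\varepsilon ^{\prime })^2\) and \(1+x\le e^x\) (a reconstruction, not necessarily the authors' displayed chain), is:

```latex
\begin{aligned}
1+\varepsilon_1
  &= \bigl((1+2\varepsilon')(1+\varepsilon')\bigr)^{h'-1}(1+\varepsilon')
   \le (1+\varepsilon')^{3(h'-1)+1}
   && \text{since } 1+2\varepsilon' \le (1+\varepsilon')^2,\\
  &\le (1+\varepsilon')^{3h}
   && \text{since } h' \le h,\\
  &\le e^{3h\varepsilon'} = e^{\ln(1+\varepsilon)} = 1+\varepsilon
   && \text{since } 1+x \le e^{x}.
\end{aligned}
```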
3 Concluding Remarks
In this paper, we have shown that computing the game values of classes of stochastic mean payoff games with perfect information and a constant number of random positions admits approximation schemes, provided that the class of games at hand can be solved in pseudo-polynomial time.
To conclude this paper, let us raise a number of open questions:
1. First, in the conference version of this paper [2], we claimed that, up to some technical requirements, a pseudo-polynomial algorithm for a class of stochastic mean payoff games implies that this class has polynomial smoothed complexity. (Smoothed analysis is a paradigm for analyzing algorithms with poor worst-case but good practical performance. Since its invention, it has been applied to a variety of algorithms and problems to explain their performance or complexity, respectively [31, 38].)
However, the proof of this result is flawed. Specifically, the error lies in the proof of a lemma that is not contained in the proceedings version, but only in the accompanying technical report (Oberwolfach Preprints, OWP 2010-22, Lemma 4.3). The reason is relatively simple: if we are just looking for an optimal solution of a single optimization problem, then we can show that the second-best solution is significantly worse than the best solution. For two-player games, where one player maximizes and the other player minimizes, we have an optimization problem for either player, given an optimal strategy of the other player. However, the optimal strategy of the other player depends on the random rewards of the edges. Thus, the two strategies are dependent. As a consequence, we cannot exploit the full randomness of the rewards in an isolation lemma that compares the best and second-best responses to the optimal strategy of the other player.
Therefore, the question of whether stochastic mean payoff games have polynomial smoothed complexity remains open.
2. In Sect. 2.3 we gave an approximation scheme that relatively approximates the value of a BWR-game from any given starting position. If we apply this algorithm from different starting positions, we may well get different relatively \(\varepsilon \)-optimal strategies. In Sect. 2.4 we have shown that a modification of the algorithm in Sect. 2.3 yields uniformly relatively \(\varepsilon \)-optimal strategies when there are no random positions. It remains an interesting question whether this can be extended to BWR-games with a constant number of random positions.
3. Is it true that pseudo-polynomial solvability of a class of stochastic mean payoff games implies polynomial smoothed complexity? In particular, do mean payoff games have polynomial smoothed complexity?
4. Related to Question 3: is it possible to prove an isolation lemma for (classes of) stochastic mean payoff games? We believe that this is not possible and that different techniques are required to prove smoothed polynomial complexity of these games.
5. While stochastic mean payoff games include parity games as a special case, the probabilistic model that we used here does not make sense for parity games. However, parity games can be solved in quasi-polynomial time [8]. One wonders if they also have polynomial smoothed complexity under a reasonable probabilistic model.
6. Finally, let us remark that removing the assumption that k is constant in the above results remains a challenging open problem that seems to require totally new ideas. Another interesting question is whether stochastic mean payoff games with perfect information can be solved in parameterized pseudo-polynomial time with the number k of stochastic positions as the parameter.
References
Andersson, D., Miltersen, P.B.: The complexity of solving stochastic games on graphs. In: 20th International Symposium on Algorithms and Computation (ISAAC), Lecture Notes in Computer Science, vol. 5878, pp. 112–121. Springer (2009)
Boros, E., Elbassioni, K., Fouz, M., Gurvich, V., Makino, K., Manthey, B.: Stochastic mean payoff games: smoothed analysis and approximation schemes. In: Proceedings of the 38th International Colloquium on Automata, Languages and Programming (ICALP), Part I, Lecture Notes in Computer Science, vol. 6755, pp. 147–158. Springer (2011)
Boros, E., Elbassioni, K., Gurvich, V., Makino, K.: Every stochastic game with perfect information admits a canonical form. RRR-09-2009, RUTCOR. Rutgers University, New Brunswick (2009)
Boros, E., Elbassioni, K., Gurvich, V., Makino, K.: A convex programming-based algorithm for mean payoff stochastic games with perfect information. Optim. Lett. (2017)
Boros, E., Elbassioni, K.M., Gurvich, V., Makino, K.: A pumping algorithm for ergodic stochastic mean payoff games with perfect information. In: Proceedings of the 14th International Conference on Integer Programming and Combinatorial Optimization (IPCO), Lecture Notes in Computer Science, vol. 6080, pp. 341–354. Springer (2010)
Boros, E., Elbassioni, K.M., Gurvich, V., Makino, K.: A pseudo-polynomial algorithm for mean payoff stochastic games with perfect information and a few random positions. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M.Z., Peleg, D. (eds.) Proceedings of 40th International Colloquium on Automata, Languages and Programming, Part I, Lecture Notes in Computer Science, vol. 7965, pp. 220–231. Springer (2013)
Boros, E., Gurvich, V.: Why chess and backgammon can be solved in pure positional uniformly optimal strategies? RRR-21-2009, RUTCOR. Rutgers University, New Brunswick (2009)
Calude, C.S., Jain, S., Khoussainov, B., Li, W., Stephan, F.: Deciding parity games in quasipolynomial time. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19–23, 2017, pp. 252–263 (2017)
Chatterjee, K., de Alfaro, L., Henzinger, T.A.: Termination criteria for solving concurrent safety and reachability games. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 197–206 (2009)
Chatterjee, K., Ibsen-Jensen, R.: The complexity of ergodic mean-payoff games. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) Proceedings of the 41st International Colloquium on Automata, Languages and Programming, Part II, Lecture Notes in Computer Science, vol. 8572, pp. 122–133. Springer (2014)
Chen, X., Deng, X., Teng, S.-H.: Settling the complexity of computing two-player Nash equilibria. J. ACM 56(3), 14 (2009)
Cho, G.E., Meyer, C.D.: Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl. 316(1–3), 21–28 (2000)
Condon, A.: The complexity of stochastic games. Inf. Comput. 96(2), 203–224 (1992)
Condon, A.: On algorithms for simple stochastic games. In: Cai, J.-Y. (ed.) Advances in Computational Complexity Theory, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 13, pp. 51–73. AMS, Providence, RI (1993)
Dai, D., Ge, R.: Another sub-exponential algorithm for the simple stochastic game. Algorithmica 61, 1092–1104 (2011)
Ehrenfeucht, A., Mycielski, J.: Positional games over a graph. Not. Am. Math. Soc. 20, A-334 (1973)
Ehrenfeucht, A., Mycielski, J.: Positional strategies for mean payoff games. Int. J. Game Theory 8, 109–113 (1979)
Gentilini, R.: A note on the approximation of mean-payoff games. Inf. Process. Lett. 114(7), 382–386 (2014)
Gillette, D.: Stochastic games with zero stop probabilities. In: Dresher, M., Tucker, A.W., Wolfe, P. (eds.) Contributions to the Theory of Games, Vol. 3, Annals of Mathematics Studies, vol. 39, pp. 179–187. Princeton University Press, Princeton (1957)
Gimbert, H., Horn, F.: Simple stochastic games with few random vertices are easy to solve. In: Proceedings of the 11th International Conference on Foundations of Software Science and Computational Structures (FoSSaCS), Lecture Notes in Computer Science, vol. 4962, pp. 5–19. Springer (2008)
Gurvich, V., Karzanov, A.V., Khachiyan, L.: Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. Math. Phys. 28, 85–91 (1988)
Halman, N.: Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica 49(1), 37–50 (2007)
Ibsen-Jensen, R., Miltersen, P.B.: Solving simple stochastic games with few coin toss positions. In: Epstein, L., Ferragina, P. (eds.) Proceedings of the 20th Annual European Symposium on Algorithms (ESA), Lecture Notes in Computer Science, vol. 7501, pp. 636–647. Springer (2012)
Nash Jr., J.F.: Equilibrium points in \(n\)-person games. In: Proceedings of the National Academy of Sciences, Vol. 36, pp. 48–49 (1950)
Nash Jr., J.F.: Non-cooperative games. Ann. Math. 54(1), 286–295 (1951)
Jurdziński, M.: Games for verification: algorithmic issues. Ph.D. thesis, University of Aarhus, BRICS (2000)
Karp, R.M.: A characterization of the minimum cycle mean in a digraph. Discrete Math. 23, 309–311 (1978)
Karzanov, A.V., Lebedev, V.N.: Cyclical games with prohibition. Math. Program. 60, 277–293 (1993)
Liggett, T.M., Lippman, S.A.: Stochastic games with perfect information and time-average payoff. SIAM Rev. 4, 604–607 (1969)
Littman, M.L.: Algorithms for sequential decision making. Ph.D. thesis, Department of Computer Science, Brown University (1996)
Manthey, B., Röglin, H.: Smoothed analysis: analysis of algorithms beyond worst case. IT Inf. Technol. 53(6), 280–286 (2011)
Mine, H., Osaki, S.: Markovian decision process. Elsevier, Amsterdam (1970)
Moulin, H.: Extension of two person zero sum games. J. Math. Anal. Appl. 5(2), 490–507 (1976)
Moulin, H.: Prolongement des jeux à deux joueurs de somme nulle. Bull. Soc. Math. Fr. Mem. 45, 5–111 (1976)
Pisaruk, N.N.: Mean cost cyclical games. Math. Oper. Res. 24(4), 817–828 (1999)
Roth, A., Balcan, M.-F., Kalai, A., Mansour, Y.: On the equilibria of alternating move games. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 805–816. SIAM (2010)
Schewe, S.: From parity and payoff games to linear programming. In: Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science (MFCS), Lecture Notes in Computer Science, vol. 5734, pp. 675–686. Springer (2009)
Spielman, D.A., Teng, S.-H.: Smoothed analysis: an attempt to explain the behavior of algorithms in practice. Commun. ACM 52(10), 76–84 (2009)
Vorobyov, S.: Cyclic games and linear programming. Discrete Appl. Math. 156(11), 2195–2231 (2008)
Zwick, U., Paterson, M.: The complexity of mean payoff games on graphs. Theoret. Comput. Sci. 158(1–2), 343–359 (1996)
Additional information
A preliminary version appeared in the proceedings of ICALP 2011 [2]. The first author is grateful for the partial support of the National Science Foundation (CMMI-0856663, “Discrete Moment Problems and Applications”), and the first, second, fourth and fifth authors are grateful to the Mathematisches Forschungsinstitut Oberwolfach for providing a stimulating research environment with an RIP award in March 2010. The fourth author gratefully acknowledges the partial support of the Russian Academic Excellence Project ‘5-100’.
Appendix: Lemmas About Markov Chains
For a situation s, let \(d_{G(s)}(u,v)\) be the stochastic distance from u to v in G(s). This is the shortest (directed) distance between vertices u and v in the graph obtained from G(s) by setting the length of every deterministic arc [i.e., one with \(p_{e}(s)=1\)] to 0 and the length of every stochastic arc [i.e., one with \(p_{e}(s)\in (0,1)\)] to 1. Let
\[\lambda (\mathcal {G}) = \max _{s}\, \max \{d_{G(s)}(u,v) \mid u,v\in V,\ d_{G(s)}(u,v)<\infty \}\]
be the stochastic diameter of \(\mathcal {G}\). Clearly, \(\lambda (\mathcal {G})\le k(\mathcal {G})\). Some of our bounds will be given in terms of \(\lambda \) instead of k, which yields stronger bounds on the running times of some of the approximation schemes.
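For a fixed situation s, the stochastic distance is a shortest-path distance with arc lengths in {0, 1}, so it can be computed by 0-1 BFS. The sketch below is illustrative only. It assumes that the chain G(s) is given as a dense row-stochastic matrix P (a list of lists), which is our own choice of representation, and it computes the diameter of that single chain. Taking the maximum over all situations s, as in \(\lambda (\mathcal {G})\), is not shown. The function names are hypothetical.

```python
from collections import deque


def stochastic_distances(P, src):
    """0-1 BFS from src over the arcs of P with positive probability.

    Arc (u, v) has length 0 if P[u][v] == 1 (deterministic) and length 1 if
    0 < P[u][v] < 1 (stochastic). Unreachable vertices get distance None.
    """
    n = len(P)
    dist = [None] * n
    dist[src] = 0
    dq = deque([src])
    while dq:
        u = dq.popleft()
        for v in range(n):
            if P[u][v] <= 0:
                continue
            w = 0 if P[u][v] == 1 else 1
            if dist[v] is None or dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                # length-0 arcs go to the front, length-1 arcs to the back
                if w == 0:
                    dq.appendleft(v)
                else:
                    dq.append(v)
    return dist


def stochastic_diameter(P):
    """Largest finite stochastic distance over all ordered pairs (u, v)."""
    return max(d for u in range(len(P))
               for d in stochastic_distances(P, u) if d is not None)


# Vertex 0 is deterministic; vertices 1 and 2 are stochastic.
P = [[0, 1, 0],
     [0, 0.5, 0.5],
     [0.5, 0, 0.5]]
print(stochastic_distances(P, 1))  # [2, 0, 1]
print(stochastic_diameter(P))      # 2
```

Pushing length-0 arcs to the front of the deque keeps the deque ordered by distance. This is what lets a single pass replace Dijkstra's algorithm for 0/1 arc lengths.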
A set of vertices \(U \subseteq V\) is called an absorbing class of the Markov chain \(\mathcal {M}\) if there is no arc with positive probability from U to \(V {\setminus } U\), i.e., U can never be left once it is entered, and U is strongly connected, i.e., any vertex of U is reachable from any other vertex of U.
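The definition can be checked mechanically. A vertex v lies in an absorbing class exactly when every vertex reachable from v can reach v again, and that class is then the set of vertices reachable from v. The quadratic-time sketch below uses this characterization. It again assumes a dense list-of-lists transition matrix P and uses hypothetical function names. For large chains, a linear-time strongly-connected-components algorithm such as Tarjan's would be the usual choice; this sketch does not use one.

```python
def reachable(P, src):
    """Set of vertices reachable from src along arcs with positive probability."""
    seen = {src}
    stack = [src]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def absorbing_classes(P):
    """Return the absorbing classes of P as a list of frozensets.

    v lies in an absorbing class iff every vertex reachable from v can reach
    v back; the class is then exactly the set of vertices reachable from v.
    """
    reach = [reachable(P, v) for v in range(len(P))]
    classes = []
    for v in range(len(P)):
        if all(v in reach[w] for w in reach[v]):
            cls = frozenset(reach[v])
            if cls not in classes:
                classes.append(cls)
    return classes


# Vertex 0 is transient; {1, 2} and {3} are absorbing classes.
P = [[0.5, 0.5, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]
print(absorbing_classes(P))
```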
Lemma 8
Let \(\mathcal {M}=(G=(V,E),P)\) be a Markov chain on n vertices with starting vertex u. Then the limiting probability of any vertex \(v\in V\) is either 0 or at least \(p_{\min }^{2\lambda }/n\) and the limiting probability of any arc \((u,v)\in E\) is either 0 or at least \(p_{\min }^{2\lambda +1}/n\).
Proof
Let \(\pi \) and \(\rho \) denote the limiting vertex- and arc-distributions, respectively. We deal with \(\pi \) first. Clearly, \(\pi _v = 0\) for any v that does not lie in an absorbing class of \(\mathcal {M}\) reachable from u. Let C be any absorbing class of \(\mathcal {M}\) reachable from u. It remains to show that \(\pi _{v^{\prime }} \ge p_{\min }^{2\lambda }/n\) for every \(v^{\prime }\in C\). Denote by \(\pi _{C} = \sum _{v\in C} \pi _v\) the total limiting probability of C. Note that \(\pi _{C}\) is equal to the probability that we reach some vertex \(v \in C\) starting from u. Since there is a simple path in G from u to C with at most \(\lambda \) stochastic arcs, this probability is at least \(p_{\min }^{\lambda }\). Furthermore, there exists a vertex \(v\in C\) with \(\pi _v \ge \pi _{C}/|C| \ge p_{\min }^{\lambda }/n\). Now for any \(v^{\prime }\in C\), there again exists a simple path in G from v to \(v^{\prime }\) with at most \(\lambda \) stochastic arcs, say of length t. Hence the probability \((P^t)_{vv^{\prime }}\) of moving from v to \(v^{\prime }\) in exactly t steps is at least \(p_{\min }^{\lambda }\). Since \(\pi \) is invariant under P, i.e., \(\pi = \pi P^t\), we have \(\pi _{v^{\prime }} \ge \pi _v (P^t)_{vv^{\prime }}\). It follows that \(\pi _{v^{\prime }} \ge p_{\min }^{2\lambda }/n\).
Now for \(\rho \), note that \(\rho _{(u,v)} \ge \pi _up_{\min }\), if \((u,v)\in E\). Since \(\pi _u\) is either 0 or at least \(p_{\min }^{2\lambda }/n\), the claim follows. \(\square \)
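Lemma 8 can be sanity-checked numerically on a small chain. The sketch below is an illustration under stated assumptions, not part of the proof, and its function names are hypothetical. It approximates the limiting distribution by iterating \(x \leftarrow xP\), which converges only if every absorbing class is aperiodic; the example has a self-loop to ensure this, and in general one would use Cesàro averages instead. It computes \(\lambda \) by 0-1 BFS with arc length 0 for \(p=1\) and 1 for \(0<p<1\). It then checks both lower bounds of the lemma up to a small floating-point tolerance.

```python
from collections import deque


def stochastic_diameter(P):
    """Max finite stochastic distance (arc length 0 if p = 1, 1 if 0 < p < 1)."""
    n = len(P)
    best = 0
    for s in range(n):
        dist = [None] * n
        dist[s] = 0
        dq = deque([s])
        while dq:  # 0-1 BFS from s
            u = dq.popleft()
            for v in range(n):
                if P[u][v] <= 0:
                    continue
                w = 0 if P[u][v] == 1 else 1
                if dist[v] is None or dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    if w == 0:
                        dq.appendleft(v)
                    else:
                        dq.append(v)
        best = max(best, max(d for d in dist if d is not None))
    return best


def limiting_distribution(P, start, steps=2000):
    """Iterate x <- xP from the start vertex; assumes aperiodic absorbing classes."""
    n = len(P)
    x = [0.0] * n
    x[start] = 1.0
    for _ in range(steps):
        x = [sum(x[u] * P[u][v] for u in range(n)) for v in range(n)]
    return x


def check_lemma8(P, start, tol=1e-9):
    """True if every vertex (arc) limiting probability is ~0 or at least
    pmin**(2*lam) / n (resp. pmin**(2*lam + 1) / n)."""
    n = len(P)
    pmin = min(p for row in P for p in row if p > 0)
    lam = stochastic_diameter(P)
    pi = limiting_distribution(P, start)
    vertices_ok = all(x < tol or x >= pmin ** (2 * lam) / n - tol for x in pi)
    arcs_ok = all(pi[u] * P[u][v] < tol
                  or pi[u] * P[u][v] >= pmin ** (2 * lam + 1) / n - tol
                  for u in range(n) for v in range(n) if P[u][v] > 0)
    return vertices_ok and arcs_ok


# Absorbing classes {1, 2} (aperiodic thanks to the loop at 2) and {3}; lambda = 1.
P = [[0, 0.5, 0, 0.5],
     [0, 0, 1, 0],
     [0, 0.5, 0.5, 0],
     [0, 0, 0, 1]]
print(check_lemma8(P, 0))  # True
```

From vertex 0 the limiting distribution is (0, 1/6, 1/3, 1/2). Every positive entry exceeds the bound \(p_{\min }^{2\lambda }/n = 1/16\).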
A Markov chain is said to be irreducible if its state space is a single absorbing class. For an irreducible Markov chain, let \(m_{uv}\) denote the mean first passage time from vertex u to vertex v, and \(m_{vv}\) denote the mean return time to vertex v: \(m_{uv}\) is the expected number of steps to reach vertex v for the first time, starting from vertex u, and \(m_{vv}\) is the expected number of steps to return to vertex v for the first time, starting from vertex v. The following lemma relates these values to the sensitivity of the limiting probabilities of a Markov chain.
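Fix a target v. The mean first passage times satisfy the first-step equations \(m_{uv} = 1 + \sum _{w\ne v} p_{uw} m_{wv}\), a linear system with a unique solution when the chain is irreducible. The same equation at \(u=v\) gives the mean return time \(m_{vv}\). The sketch below solves the system exactly with Python's fractions module and a small Gauss-Jordan elimination. The helper names are our own, and the chain is assumed to be a dense matrix with int or Fraction entries. For an irreducible chain, \(m_{vv}=1/\pi _v\), which provides a convenient check.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def mean_passage_times(P, v):
    """Return a dict m with m[u] = m_uv; m[v] is the mean return time m_vv.

    Assumes P (entries int or Fraction) is irreducible."""
    others = [u for u in range(len(P)) if u != v]
    # (I - Q) m = 1, where Q is P restricted to the vertices other than v
    A = [[int(u == w) - Fraction(P[u][w]) for w in others] for u in others]
    m = dict(zip(others, solve(A, [1] * len(others))))
    m[v] = 1 + sum(Fraction(P[v][w]) * m[w] for w in others)
    return m


P = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)]]
m0 = mean_passage_times(P, 0)
print(m0[1], m0[0])  # 2 3/2
```

Here \(\pi = (2/3, 1/3)\), so \(m_{00} = 3/2 = 1/\pi _0\) as expected.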
Lemma 9
(Cho and Meyer [12]) Let \(\varepsilon > 0\). Let \(\mathcal {M}=(G=(V,E),P)\) be an irreducible Markov chain. For any transition probabilities \(\tilde{P}\) with \(\Vert \tilde{P}-P\Vert _{\infty } \le \varepsilon \) such that the corresponding Markov chain \(\tilde{\mathcal {M}}\) is also irreducible, we have \(\Vert \tilde{\pi }-\pi \Vert _{\infty } \le \frac{1}{2} \varepsilon \cdot \max _v\frac{\max _{u\ne v}m_{uv}}{m_{vv}}\), where the \(m_{uv}\) are the mean values defined with respect to \(\mathcal {M}\).
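On a two-state chain every quantity in Lemma 9 has a closed form, so the bound can be compared directly with the actual change in \(\pi \). The sketch below assumes, as in Cho and Meyer's setting, that \(\Vert \cdot \Vert _{\infty }\) on matrices is the maximum absolute row sum. It perturbs only \(a=p_{01}\) by \(\delta \), so \(\Vert \tilde{P}-P\Vert _{\infty }=2\delta \). It uses \(\pi = (b,a)/(a+b)\), \(m_{01}=1/a\), \(m_{10}=1/b\) and \(m_{vv}=1/\pi _v\). For this chain the bound reduces to \(\varepsilon /(2(a+b))\). The function names are hypothetical.

```python
from fractions import Fraction


def two_state(a, b):
    """Stationary distribution and mean passage times of P = [[1-a, a], [b, 1-b]].

    Requires 0 < a, b <= 1 so that the chain is irreducible."""
    pi = (b / (a + b), a / (a + b))
    m = {(0, 1): 1 / a, (1, 0): 1 / b, (0, 0): 1 / pi[0], (1, 1): 1 / pi[1]}
    return pi, m


def cho_meyer_bound(m, eps):
    """(1/2) * eps * max_v max_{u != v} m_uv / m_vv for the two-state chain."""
    return eps / 2 * max(m[(1, 0)] / m[(0, 0)], m[(0, 1)] / m[(1, 1)])


def perturbation_check(a, b, delta):
    """Return (actual ||pi~ - pi||_inf, Lemma 9 bound) when a is replaced by a + delta."""
    pi, m = two_state(a, b)
    pi_t, _ = two_state(a + delta, b)
    eps = 2 * abs(delta)  # row 0 changes by delta in two entries
    actual = max(abs(x - y) for x, y in zip(pi, pi_t))
    return actual, cho_meyer_bound(m, eps)


actual, bound = perturbation_check(Fraction(1, 4), Fraction(1, 2), Fraction(1, 20))
print(actual, bound)  # 1/24 1/15
```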
Let \(\mathcal {M}=(G=(V,E),P,r)\) be a weighted Markov chain. We denote by \(\mu _u(\mathcal {M}):=\sum _{(v,w)\in E}\pi _vp_{vw}r_{vw}\) the limiting average weight, where \(\pi =(\pi _v:v\in V)\) is the limiting distribution when u is the starting position. We will write \(\mu _u\) when \(\mathcal {M}\) is clear from the context.
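The limiting average weight requires only the limiting distribution \(\pi \) from the start vertex. The sketch below assumes dense matrices P and r and uses a hypothetical function name. It approximates \(\pi \) by the Cesàro average of \(x_0,\ldots ,x_{T-1}\), which converges even for periodic chains, and then evaluates the sum over arcs of \(\pi _v p_{vw} r_{vw}\).

```python
def limiting_average_weight(P, r, start, steps=2000):
    """Approximate mu_start = sum over arcs (v, w) of pi_v * p_vw * r_vw.

    pi is approximated by the Cesaro average of the distributions
    x_0, ..., x_{steps-1} of the chain started at `start`."""
    n = len(P)
    x = [0.0] * n
    x[start] = 1.0
    avg = [0.0] * n
    for _ in range(steps):
        avg = [a + xi / steps for a, xi in zip(avg, x)]
        x = [sum(x[u] * P[u][v] for u in range(n)) for v in range(n)]
    return sum(avg[v] * P[v][w] * r[v][w]
               for v in range(n) for w in range(n) if P[v][w] > 0)


# Deterministic 2-cycle: the distribution alternates, its Cesaro average does not.
P = [[0, 1], [1, 0]]
r = [[0, 3], [1, 0]]
print(round(limiting_average_weight(P, r, 0), 6))  # 2.0
```

In the example, \(x_t\) oscillates between the two vertices, but the average weight per step is still \((3+1)/2 = 2\). Cesàro averaging recovers this value where plain iteration would not converge.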
Lemma 10
Let \(\mathcal {M}=(G=(V,E),P,r)\) be a weighted Markov chain with arc weights in \([r^-,r^+]\), and let \(\varepsilon \le \frac{1}{2}p_{\min }=\frac{1}{2}p_{\min }(\mathcal {M})\) be a positive constant. Let \(\tilde{\mathcal {M}}=(G=(V,E),\tilde{P},r)\) be a weighted Markov chain with transition probabilities \(\tilde{P}\) such that \(\Vert \tilde{P}-P\Vert _{\infty } \le \varepsilon \) and \(\tilde{p}_{uv}=0\) if \(p_{uv}=0\). Then, for any \(u\in V\), we have \(|\mu _u(\tilde{\mathcal {M}})-\mu _u(\mathcal {M})| \le \delta (\mathcal {M},\varepsilon )\), where \(\delta \) is defined as in (6):
where \(n =|V|\), \(p_{\min }= p_{\min }(\mathcal {M})\), \(k=k(\mathcal {M})\), and \(r^*=r^*(\mathcal {M}):=\max \{|r^+(\mathcal {M})|,|r^-(\mathcal {M})|\}\).
Proof
Fix the starting vertex \(u_0\in V\). Let \(\pi \) and \(\tilde{\pi }\) denote the limiting distributions corresponding to \(\mathcal {M}\) and \(\tilde{\mathcal {M}}\), respectively. We first bound \(\Vert \pi -\tilde{\pi }\Vert _{\infty }\). Since \(\varepsilon \le p_{\min }/2<p_{\min }\), we have \(\tilde{p}_{uv} = 0\) if and only if \(p_{uv} = 0\). It follows that \(\mathcal {M}\) and \(\tilde{\mathcal {M}}\) have the same absorbing classes. Let \(C_1, \dots , C_{\ell }\) denote these classes. Denote by \(\pi _{C_i} = \sum _{v\in C_i} \pi _v\) and \(\tilde{\pi }_{C_i} = \sum _{v\in C_i}\tilde{\pi }_v\) the total limiting probability of \(C_i\) with respect to \(\pi \) and \(\tilde{\pi }\), respectively. Furthermore, let \(\pi ^{|i}\) and \(\tilde{\pi }^{|i}\) be the limiting distributions corresponding to \(\mathcal {M}\) and \(\tilde{\mathcal {M}}\), respectively, conditioned on the event that the Markov process is started in \(C_i\) (i.e., \(u_0\in C_i\)). Note that these conditional limiting distributions are the limiting distributions of the irreducible Markov chains restricted to \(C_i\). By Lemma 9, we have \(\Vert \pi ^{|i} - \tilde{\pi }^{|i}\Vert _{\infty } \le \frac{1}{2} \varepsilon \cdot \max _{v\in C_i}\max _{u\in C_i{\setminus }\{v\}}\frac{m_{uv}}{m_{vv}}\).
Claim 6
For any \(u,v \in C_i\), we have \(m_{uv}\le \frac{(\lambda +1)|C_i|}{p_{\min }^{\lambda }}\).
Proof
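Fix \(v\in C_i\). Note that, for any \(u\in C_i\), we have
\[m_{uv} = 1 + \sum _{w\in C_i{\setminus }\{v\}} p_{uw}\, m_{wv}. \tag{17}\]
Let \(h=\max \{d_G(u,v) \mid u\in C_i\}\). For \(j=0,1,\ldots , h\), let \(X_j=\max \{m_{uv} \mid u\in C_i, d_G(u,v)=j\}\). Let \(\ell \) be in \({\text {argmax}}\{X_j \mid j \in \{1,\ldots ,h\}\}\). Then \(X_0\le |C_i|\) and, for \(j=1,\ldots ,h\), (17) implies that
\[X_j \le |C_i| + p_{\min }X_{j-1} + (1-p_{\min })X_{\ell }. \tag{18}\]
Indeed, for a vertex \(u\in C_i\) such that \(d_G(u,v)=j\), there is a path Q from u to v with j stochastic arcs. Let \(u^{\prime }\) be the vertex closest to u on Q such that \(d_G(u^{\prime },v)=j-1\), and let \(u^{\prime \prime }\) be the vertex on Q preceding \(u^{\prime }\). Then \(u^{\prime \prime }\) is stochastic, and hence by (17)
\[m_{u^{\prime \prime }v} = 1 + \sum _{w\in C_i{\setminus }\{v\}} p_{u^{\prime \prime }w}\, m_{wv} \le 1 + p_{u^{\prime \prime }u^{\prime }}X_{j-1} + (1-p_{u^{\prime \prime }u^{\prime }})X_{\ell } \le 1 + p_{\min }X_{j-1} + (1-p_{\min })X_{\ell },\]
using the fact that \(X_j\le X_{\ell }\) for all j and \(p_{u^{\prime \prime }u^{\prime }}\ge p_{\min }\). Finally, since the subpath of Q from u to \(u^{\prime \prime }\) consists of at most \(|C_i|-1\) deterministic arcs, we have \(m_{uv}\le |C_i|-1+m_{u^{\prime \prime }v}\), which implies (18).
Applying (18) for \(j=1,\ldots ,\ell \) and using \(X_0\le |C_i|\) yields
\[X_{\ell } \le |C_i|\sum _{t=0}^{\ell -1} p_{\min }^{t} + p_{\min }^{\ell }X_0 + (1-p_{\min }^{\ell })X_{\ell } \le |C_i|\sum _{t=0}^{\ell } p_{\min }^{t} + (1-p_{\min }^{\ell })X_{\ell }.\]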
This implies that \(X_{\ell }\le |C_i|\frac{1-p_{\min }^{\ell +1}}{1-p_{\min }}p_{\min }^{-\ell }\le |C_i|(\lambda +1)p_{\min }^{-\lambda }\). \(\square \)
It follows that \(\Vert \pi ^{|i} - \tilde{\pi }^{|i}\Vert _{\infty } \le \frac{\varepsilon (\lambda +1)|C_i|}{2p_{\min }^{\lambda }}\).
Claim 7
\(|\pi _{C_i} - \tilde{\pi }_{C_i}|\le \varepsilon n \lambda p_{\min }^{-\lambda }. \)
Proof
Without loss of generality we assume that \(u_0\not \in C_i\). For a transient vertex v (i.e., one for which \(\pi _v=0\)), let \(y_v\) and \(\tilde{y}_v\) be the absorption probability into class \(C_i\) in \(\mathcal {M}\) and \(\tilde{\mathcal {M}}\), respectively. In particular, \(y_{u_0}=\pi _{C_i}\). Let \(p_{v C_i}=\sum _{u\in C_i} p_{vu}\) and \(\tilde{p}_{v C_i}=\sum _{u\in C_i} \tilde{p}_{vu}\). Then we have
\[y_v = p_{vC_i} + \sum _{w\notin C_i} p_{vw}\, y_w, \tag{19}\]
where we set \(y_w:=\tilde{y}_w:=0\) for every vertex w in an absorbing class other than \(C_i\).
Similarly,
\[\tilde{y}_v = \tilde{p}_{vC_i} + \sum _{w\notin C_i} \tilde{p}_{vw}\, \tilde{y}_w. \tag{20}\]
Write \(\Delta _v:=|\tilde{y}_v-y_v|\). Subtracting (19) from (20) and using \(0\le \tilde{y}_w\le 1\) yields
\[\Delta _v \le \sum _{w\in V} |\tilde{p}_{vw}-p_{vw}| + \sum _{w\notin C_i} p_{vw}\, \Delta _w \le \varepsilon n + \sum _{w\notin C_i} p_{vw}\, \Delta _w. \tag{21}\]
Let \(h=\max \{d_G(u,C_i) \mid u\notin C_i \wedge d_G(u,C_i)<\infty \}\), where \(d_G(u,C_i)=\min \{d_G(u,v) \mid v\in C_i\}\) is the stochastic distance in G from u to \(C_i\). For \(j=0,1,\ldots , h\), let \(X_j=\max \{\Delta _{u} \mid u\notin C_i \wedge d_G(u,C_i)=j\},\) and let \(\ell \) be in \({\text {argmax}}\{X_j \mid j \in \{1,\ldots ,h\}\}\). Then \(X_0=0\) (since deterministic vertices in \(\mathcal {M}\) remain deterministic in \(\tilde{\mathcal {M}}\)) and, for \(j=1,\ldots ,h\), (21) (for a stochastic vertex) and (19) (for a deterministic vertex) imply
\[X_j \le \varepsilon n + p_{\min }X_{j-1} + (1-p_{\min })X_{\ell }.\]
Applying this iteratively gives us \(X_{\ell }\le \varepsilon n\frac{1-p_{\min }^{\ell }}{1-p_{\min }}p_{\min }^{-\ell }\le \varepsilon n \lambda p_{\min }^{-\lambda }\). \(\square \)
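The absorption probabilities used in the proof of Claim 7 solve the first-step equations \(y_v = p_{vC_i} + \sum _{w} p_{vw} y_w\), where w ranges over the transient vertices; arcs into other absorbing classes contribute 0. The sketch below solves these equations exactly for a small gambler's-ruin chain, before and after a perturbation of its probabilities. It assumes that the caller supplies the transient vertices, which could be obtained from the absorbing classes. The function names are hypothetical.

```python
from fractions import Fraction


def absorption_probabilities(P, target, transient):
    """Solve y_v = p_{v,target} + sum_{w in transient} p_vw * y_w for all transient v.

    `target` is the absorbing class (a set of vertices), `transient` the list of
    transient vertices; arcs into other absorbing classes contribute 0."""
    k = len(transient)
    # Augmented matrix of (I - Q) y = b, Q = P restricted to transient vertices
    M = [[Fraction(int(i == j)) - Fraction(P[v][w]) for j, w in enumerate(transient)]
         + [sum(Fraction(P[v][u]) for u in target)]
         for i, v in enumerate(transient)]
    for col in range(k):  # Gauss-Jordan elimination with exact arithmetic
        piv = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return {v: M[j][k] / M[j][j] for j, v in enumerate(transient)}


def ruin_chain(p):
    """Gambler's ruin on {0, 1, 2, 3}: 0 and 3 absorbing, step up with prob p."""
    q = 1 - p
    return [[1, 0, 0, 0], [q, 0, p, 0], [0, q, 0, p], [0, 0, 0, 1]]


y = absorption_probabilities(ruin_chain(Fraction(1, 2)), {3}, [1, 2])
yt = absorption_probabilities(ruin_chain(Fraction(11, 20)), {3}, [1, 2])
print(y[1], yt[1], yt[1] - y[1])  # 1/3 121/301 62/903
```

With start vertex \(u_0=1\) and \(C_i=\{3\}\), the value \(y_1\) is exactly \(\pi _{C_i}\). The perturbation changes it by 62/903, which lies well within the (here very loose) bound of Claim 7.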
Let \(v\in V\) be an arbitrary vertex. If v does not lie in any absorbing class, then \(\pi _v = \tilde{\pi }_v = 0\). Otherwise, let \(v \in C_i\). By the above claims, we have
Similarly, we can conclude that \(\tilde{\pi }_v\le \pi _v+\delta ^{\prime }(\tilde{\mathcal {M}},\varepsilon )\). Note that \(p_{\min }(\tilde{\mathcal {M}})\ge p_{\min }(\mathcal {M})/2,\) since \(\varepsilon \le p_{\min }(\mathcal {M})/2.\) It follows that
which completes the proof. \(\square \)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Boros, E., Elbassioni, K., Fouz, M. et al. Approximation Schemes for Stochastic Mean Payoff Games with Perfect Information and Few Random Positions. Algorithmica 80, 3132–3157 (2018). https://doi.org/10.1007/s00453-017-0372-7
DOI: https://doi.org/10.1007/s00453-017-0372-7