1 Introduction

The rise of the Internet has led to an explosion in research in game theory, the mathematical modeling of competing agents in strategic situations. The central concept in such models is that of a Nash equilibrium, which defines a state where no agent gains an advantage by changing to another strategy. Nash equilibria serve as predictions for the outcome of strategic situations in which selfish agents compete.

A fundamental result in game theory states that if the agents can choose a mixed strategy (i.e., a probability distribution over deterministic strategies), a Nash equilibrium is guaranteed to exist in finite games [24, 25]. Often, however, pure (i.e., deterministic) strategies already suffice for a Nash equilibrium. Still, the existence of Nash equilibria might be irrelevant in practice since their computation would take too long (finding mixed Nash equilibria in two-player games is PPAD-complete in general [11]). Thus, algorithmic aspects of game theory have gained a lot of interest. Following the dogma that only polynomial-time algorithms are feasible algorithms, it is desirable to show polynomial time complexity for the computation of Nash equilibria.

We consider two-player zero-sum stochastic mean payoff games with perfect information. In this case the concept of Nash equilibria coincides with saddle points or mini–max/maxi–min strategies. The decision problem associated with computing such strategies and the values of these games is in the intersection of NP and co-NP, but it is unknown whether it can be solved in polynomial time. In cases where efficient algorithms are not known to exist, an approximate notion of a saddle point has been suggested. In an approximate saddle point, no agent can gain a substantial advantage by changing to another strategy. In this paper, we design approximation schemes for saddle points for such games when the number of random positions is fixed (see Sect. 1.2 for a definition).

In the remainder of this section, we introduce the concepts used in this paper. Our results are summarized in Sect. 1.4. After that, we present our approximation schemes (Sect. 2). We conclude with a list of open problems (Sect. 3), where we address in particular the question of polynomial smoothed complexity of mean payoff games. In the conference version of this paper [2], we wrongly claimed that stochastic mean payoff games can be solved in smoothed polynomial time.

1.1 Stochastic Mean Payoff Games

1.1.1 Definition and Notation

The model that we consider is a stochastic mean payoff game with perfect information, or, equivalently, a BWR-game \(\mathcal {G}= (G, P,r)\):

  • \(G=(V,E)\) is a directed graph that may have loops and multiple edges, but no terminal positions, i.e., no positions of out-degree 0. The vertex set V of G is partitioned into three disjoint subsets \(V = V_B \cup V_W \cup V_R\) that correspond to black, white, and random positions, respectively. The edges stand for moves. The black and white positions are owned by two players: Black, the minimizer, owns the black positions in \(V_B\), and White, the maximizer, owns the white positions in \(V_W\). The positions in \(V_R\) are owned by nature.

  • P is the vector of probability distributions for all positions \(v \in V_R\) owned by nature. We assume that \(\sum _{u: (v,u) \in E} p_{vu} = 1\) for all \(v \in V_R\) and \(p_{vu} > 0\) for all \(v \in V_R\) and \((v,u) \in E\).

  • r is the vector of rewards; each edge e has a local reward \(r_e\).

Starting from some vertex \(v_0 \in V\), a token is moved along one edge e in every round of the game. If the token is on a black vertex, Black selects an outgoing edge e and moves the token along e. If the token is on a white vertex, then White selects an outgoing edge e. In a random position \(v \in V_R\), a move \(e= (v, u)\) is chosen according to the probabilities \(p_{v u}\) of the outgoing edges of v. In all cases, Black pays White the reward \(r_e\) on the selected edge e.

Starting from a given initial position \(v_0 \in V\), the game yields an infinite walk \((v_0, v_1, v_2, \ldots )\), called a play. Let \(b_i\) denote the reward \(r_{(v_{i-1},v_{i})}\) received by White in step i. The undiscounted limit average effective payoff is defined as the Cesàro average \(c=\liminf _{n\rightarrow \infty }\frac{\sum _{i=1}^n{{\mathrm{\mathbb {E}}}}[b_i]}{n}\). White’s objective is to maximize c, while the objective of Black is to minimize it.

In this paper, we will restrict our attention to the sets of pure (that is, non-randomized) and stationary (that is, history-independent) strategies of players White and Black, denoted by \(S_W\) and \(S_B\), respectively; such strategies are called positional strategies. Formally, a positional strategy \(s_W \in S_W\) for White is a mapping that assigns a move \((v,u) \in E\) to each position in \(V_W\). We sometimes abbreviate \(s_W(v)=(v,u)\) by \(s_W(v)=u\). Strategies \(s_B \in S_B\) for Black are analogously defined. A pair of strategies \(s = (s_W, s_B)\) is called a situation. By abusing notation, let \(s(v) = u\) if \(v \in V_W\) and \(s_W(v) = u\) or \(v \in V_B\) and \(s_B(v) = u\).

Given a BWR-game \(\mathcal {G}= (G, P, r)\) and a situation \(s = (s_B, s_W)\), we obtain a weighted Markov chain \(\mathcal {G}(s) = (G(s)=(V,E(s)),P(s), r)\) with transition matrix P(s) defined in the obvious way:

$$\begin{aligned} p_{vu}(s) = \begin{cases} 1 &\quad \text {if}\; v \in V_W \cup V_B \;\text {and}\; u=s(v), \\ 0 &\quad \text {if}\; v \in V_W \cup V_B \;\text {and}\; u \ne s(v),\; \text {and}\\ p_{vu} &\quad \text {if}\; v\in V_R. \end{cases} \end{aligned}$$

Here, \(E(s)=\{e=(v,u) \in E \mid p_{vu}(s)>0\}\) is the set of arcs with positive probability. Given an initial position \(v_0\in V\) from which the play starts, we define the limiting (mean) effective payoff \(c_{v_0}(s)\) in \(\mathcal {G}(s)\) as

$$\begin{aligned} c_{v_0}(s)=\rho (s)^Tr=\sum _{e \in E}\rho _{e}(s)r_{e}, \end{aligned}$$

where \(\rho (s)=\rho (s,v_0)\in [0,1]^E\) is the arc-limiting distribution for \(\mathcal {G}(s)\) starting from \(v_0\). This means that for \((v,u)\in E\), we have \(\rho _{vu}(s)=\pi _v(s)p_{vu}(s)\), where \(\pi \in [0,1]^V\) is the limiting distribution in the Markov chain \(\mathcal {G}(s)\) starting from \(v_0\). In what follows, we will use \((\mathcal {G},v_0)\) to denote the game starting from \(v_0\). We will simply write \(\rho (s)\) for \(\rho (s,v_0)\) if \(v_0\) is clear from the context. For rewards \(r: E \rightarrow \mathbb {R}\), let \(r^- = \min _{e} r_e\) and \(r^+ = \max _e r_e\). Let \([r] = [r^-,r^+]\) be the range of r. Let \(R=R(\mathcal {G}) =r^+-r^-\) be the size of the range.
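To make these definitions concrete, the following small numerical sketch (ours, purely for illustration; the matrix representation of the chain is an assumption, not the paper's) fixes a situation s, i.e., a weighted Markov chain \(\mathcal {G}(s)\), and approximates the limiting arc distribution \(\rho (s)\) and the value \(c_{v_0}(s)\) by the Cesàro average of the step distributions, exactly as in the definition of the effective payoff.

```python
import numpy as np

def mean_payoff(P_s, r, v0, horizon=100_000):
    """Approximate c_{v0}(s) = sum_e rho_e(s) * r_e for the chain G(s).

    P_s : n x n transition matrix of G(s) (rows sum to 1).
    r   : n x n reward matrix, r[v, u] = local reward of the arc (v, u).
    v0  : starting position.
    """
    n = P_s.shape[0]
    pi = np.zeros(n)
    pi[v0] = 1.0                       # distribution of the token at step 0
    cesaro = np.zeros(n)               # running Cesaro average of the step distributions
    for _ in range(horizon):
        cesaro += pi / horizon
        pi = pi @ P_s                  # one move of the token
    rho = cesaro[:, None] * P_s        # rho_{vu}(s) = pi_v(s) * p_{vu}(s)
    return float((rho * r).sum())

# Deterministic 2-cycle with rewards 0 and 2: the mean payoff is 1.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
r = np.array([[0.0, 0.0], [2.0, 0.0]])
print(mean_payoff(P, r, v0=0))         # approximately 1.0
```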

1.1.2 Strategies and Saddle Points

If we consider \(c_{v_0}(s)\) for all possible situations s, we obtain a matrix game \(C_{v_0} : S_W \times S_B \rightarrow \mathbb {R}\), with entries \(C_{v_0}(s_W,s_B) = c_{v_0}(s_W,s_B)\). It is known that every such game has a saddle point in pure strategies [19, 29]. Such a saddle point defines an equilibrium state in which no player has an incentive to switch to another strategy. The value at that state coincides with the limiting payoff in the corresponding BWR-game [19, 29].

We call a pair of strategies optimal if they correspond to a saddle point. It is well-known that there exist optimal strategies \((s^*_W,s^*_B)\) that do not depend on the starting position \(v_0\). Such strategies are called uniformly optimal. Of course there might be several optimal strategies, but they all lead to the same value. We define this to be the value of the game and write \(\mu _{v_0}(\mathcal {G}) = C_{v_0}(s^*_W,s_B^*)\), where \((s^*_W,s^*_B)\) is any pair of optimal strategies. Note that \(\mu _{v_0}(\mathcal {G})\) may depend on the starting node \(v_0\). Note also that for an arbitrary situation s, \(\mu _{v_0}(\mathcal {G}(s))\) denotes the effective payoff \(c_{v_0}(s)\) in the Markov chain \(\mathcal {G}(s)\).

An algorithm is said to solve the game if it computes an optimal pair of strategies.

1.2 Approximation and Approximate Equilibria

Given a BWR-game \(\mathcal {G}=(G=(V,E),P,r)\), a constant \(\varepsilon >0\), and a starting position \(v\in V\), an \(\varepsilon \)-relative approximation of the value of the game is determined by a situation \((s_W^*,s_B^*)\) such that

$$\begin{aligned} \max _{s_W}\mu _v\left( \mathcal {G}\left( s_W,s_B^*\right) \right) \le (1+\varepsilon )\mu _v(\mathcal {G}) \quad \text {and} \quad \min _{s_B}\mu _v\left( \mathcal {G}\left( s_W^*,s_B\right) \right) \ge (1-\varepsilon )\mu _v(\mathcal {G}). \end{aligned}$$
(1)

An alternative concept of an approximate equilibrium are \(\varepsilon \)-relative equilibria. They are determined by a situation \((s_W^*,s_B^*)\) such that

$$\begin{aligned} \max _{s_W}\mu _v\left( \mathcal {G}\left( s_W,s_B^*\right) \right)&\le (1+\varepsilon )\mu _v\left( \mathcal {G}\left( s_W^*,s_B^*\right) \right) \quad \text {and} \\ \min _{s_B}\mu _v\left( \mathcal {G}\left( s_W^*,s_B\right) \right)&\ge (1-\varepsilon )\mu _v\left( \mathcal {G}\left( s_W^*,s_B^*\right) \right) . \end{aligned}$$
(2)

Note that, for sufficiently small \(\varepsilon \), an \(\varepsilon \)-relative approximation implies a \(\Theta (\varepsilon )\)-relative equilibrium, and vice versa. Thus, in what follows, we will use these notions interchangeably. When considering relative approximations and relative equilibria, we assume that the rewards are non-negative integers.

An alternative to relative approximations is to look for an approximation with an absolute error of \(\varepsilon \); this is achieved by a situation \((s_W^*, s_B^*)\) such that

$$\begin{aligned} \max _{s_W}\mu _v\left( \mathcal {G}\left( s_W,s_B^*\right) \right) \le \mu _v(\mathcal {G})+\varepsilon \quad \text {and} \quad \min _{s_B}\mu _v\left( \mathcal {G}\left( s_W^*,s_B\right) \right) \ge \mu _v(\mathcal {G})-\varepsilon . \end{aligned}$$
(3)

Similarly, for an \(\varepsilon \)-absolute equilibrium, we have the following condition:

$$\begin{aligned} \max _{s_W}\mu _v\left( \mathcal {G}\left( s_W,s_B^*\right) \right)&\le \mu _v\left( \mathcal {G}\left( s_W^*,s_B^*\right) \right) +\varepsilon \quad \text {and} \\ \min _{s_B}\mu _v\left( \mathcal {G}\left( s_W^*,s_B\right) \right)&\ge \mu _v\left( \mathcal {G}\left( s_W^*,s_B^*\right) \right) -\varepsilon . \end{aligned}$$
(4)

Again, an \(\varepsilon \)-absolute approximation implies a \(2\varepsilon \)-absolute equilibrium, and vice versa. When considering absolute equilibria and absolute approximations, we assume that the rewards come from the interval \([-1,1]\).

A situation \((s_W^*,s_B^*)\) is called relatively \(\varepsilon \)-optimal if it satisfies (1), and absolutely \(\varepsilon \)-optimal if it satisfies (3). In the following, we will drop the specification of absolute and relative if it is clear from the context. If the pair \((s_W^*,s_B^*)\) is (absolutely or relatively) \(\varepsilon \)-optimal for all starting positions, it is called uniformly (absolutely or relatively) \(\varepsilon \)-optimal (also called subgame perfect).

We note that, under the above assumptions, the notion of relative approximation is stronger. Indeed, consider a BWR-game \(\mathcal {G}\) with rewards in \([-1,1]\). A relatively \(\varepsilon \)-optimal situation \((s_W^*,s_B^*)\) of the game \(\hat{\mathcal {G}}\) with local rewards given by \(\hat{r} = r + \mathbf {1}\ge 0\) (where \(\mathbf {1}\) is the vector of all ones, and the addition and comparison is meant component-wise) satisfies

$$\begin{aligned} \max _{s_W}\mu _v\left( \mathcal {G}\left( s_W,s_B^*\right) \right)&=\max _{s_W}\mu _v(\hat{\mathcal {G}}(s_W,s_B^*))-1 \le (1+\varepsilon )\mu _v(\hat{\mathcal {G}})-1 \\&=\mu _v(\mathcal {G})+\varepsilon \mu _v(\mathcal {G})+\varepsilon \le \mu _v(\mathcal {G})+2\varepsilon \end{aligned}$$

and

$$\begin{aligned} \min _{s_B}\mu _v({\mathcal {G}}(s_W^*,s_B))&=\min _{s_B}\mu _v(\hat{\mathcal {G}}(s_W^*,s_B))-1 \ge (1-\varepsilon )\mu _v(\hat{\mathcal {G}})-1 \\&=\mu _v(\mathcal {G})-\varepsilon \mu _v(\mathcal {G})-\varepsilon \ge \mu _v(\mathcal {G})-2\varepsilon . \end{aligned}$$

This is because \(\mu _v(\hat{\mathcal {G}}(s))=\mu _v(\mathcal {G}(s))+1\) for any situation s and \(\mu _v(\mathcal {G})\le 1\). Thus, we obtain a \(2\varepsilon \)-absolute approximation for the value of the original game.

An algorithm for approximating (absolutely or relatively) the values of the game is said to be a fully polynomial-time (absolute or relative) approximation scheme (FPTAS) if the running-time depends polynomially on the input size and \(1/\varepsilon \). In what follows, we assume without loss of generality that \(1/\varepsilon \) is an integer.

1.3 Previous Results

BWR-games are an equivalent formulation [21] of the stochastic games with perfect information and mean payoff that were introduced in 1957 by Gillette [19]. As was already noticed in [21], the BWR model generalizes a variety of games and problems: BWR-games without random positions (\(V_R = \emptyset \)) are called cyclic or mean payoff games [16, 17, 21, 33, 34]; we call these BW-games. If one of the sets \(V_B\) or \(V_W\) is empty, we obtain a Markov decision process for which polynomial-time algorithms are known [32]. If both are empty (\(V_B = V_W = \emptyset \)), we get a weighted Markov chain. If \(V=V_W\) or \(V=V_B\), we obtain the minimum mean-weight cycle problem, which can be solved in polynomial time [27].

If all rewards are 0 except for m terminal loops, we obtain the so-called Backgammon-like or stochastic terminal payoff games [7]. The special case \(m=1\), in which every random node has only two outgoing arcs with probability 1/2 each, defines the so-called simple stochastic games (SSGs), introduced by Condon [13, 14]. In these games, the objective of White is to maximize the probability of reaching the terminal, while Black wants to minimize this probability. Recently, it has been shown that Gillette games (and hence BWR-games [3]) are equivalent to SSGs under polynomial-time reductions [1]. Thus, by recent results of Halman [22], all these games can be solved in randomized strongly subexponential time \(2^{O(\sqrt{n_d\log n_d})}{{\mathrm{{\text {poly}}}}}(|V|)\), where \(n_d=|V_B|+|V_W|\) is the number of deterministic positions.

Besides their many applications [26, 30], all these games are of interest to complexity theory: The decision problem “whether the value of a BW-game is positive” is in the intersection of NP and co-NP [28, 40]; yet, no polynomial algorithm is known even in this special case. We refer to Vorobyov [39] for a survey. A similar complexity claim holds for SSGs and BWR-games [1, 3]. On the other hand, there exist algorithms that solve BW-games in practice very fast [21]. The situation for these games is thus comparable to linear programming before the discovery of the ellipsoid method: linear programming was known to lie in the intersection of NP and co-NP, and the simplex method proved to be fast in practice. In fact, a polynomial algorithm for linear programming in the unit cost model would already imply a polynomial algorithm for BW-games [37]; see also [4] for an extension to BWR-games.

While there are numerous pseudo-polynomial algorithms known for BW-games [21, 35, 40], pseudo-polynomiality for BWR-games (with no restriction on the number of random positions) is in fact equivalent to polynomiality [1]. Gimbert and Horn [20] have shown that a generalization of simple stochastic games on k random positions having arbitrary transition probabilities [not necessarily (1/2, 1/2)] can be solved in time \(O(k!(|V||E|+L))\), where L is the maximum bit length of a transition probability. There are various improvements with smaller dependence on k [9, 15, 20, 23] (note that even though BWR-games are polynomially reducible to simple stochastic games, under this reduction the number of random positions does not stay constant, but is only polynomially bounded in n, even if the original BWR-game had a constant number of random positions). Recently, a pseudo-polynomial algorithm was given for BWR-games with a constant number of random positions and polynomial common denominator of transition probabilities, but under the assumption that the game is ergodic (that is, the value does not depend on the initial position) [5]. Then, this result was extended to the non-ergodic case [6]; see also [4].

As for approximation schemes, the only result we are aware of [36] is the observation that the values of BW-games can be approximated within an absolute error of \(\varepsilon \) in polynomial time, if all rewards are in the range \([-1,1]\). This follows immediately from truncating the rewards and using any of the known pseudo-polynomial algorithms [21, 35, 40].

On the negative side, it was observed recently [18] that obtaining an \(\varepsilon \)-absolute FPTAS without the assumption that all rewards are in \([-1,1]\), or an \(\varepsilon \)-relative FPTAS without the assumption that all rewards are non-negative, for BW-games, would imply their polynomial time solvability. In that sense, our results below are the best possible unless there is a polynomial algorithm for solving BW-games.

1.4 Our Results

In this paper, we extend the absolute FPTAS for BW-games [36] in two directions. First, we allow a constant number of random positions, and, second, we derive an FPTAS with a relative approximation error. Throughout the paper, we assume the availability of a pseudo-polynomial algorithm \(\mathbb {A}\) that solves any BWR-game \(\mathcal {G}\) with integral rewards and rational transition probabilities in time polynomial in n, D, and R, where \(n=n(\mathcal {G})\) is the total number of positions, \(R=R(\mathcal {G}):=r^+(\mathcal {G})-r^-(\mathcal {G})\) is the size of the range of the rewards, \(r^+(\mathcal {G})=\max _e r_e\) and \(r^-(\mathcal {G})=\min _e r_e\), and \(D=D(\mathcal {G})\) is the common denominator of the transition probabilities. Note that the dependence on D is inherent in all known pseudo-polynomial algorithms for BWR-games. Note also that the affine scaling of the rewards does not change the game.

Let \(p_{\min }=p_{\min }(\mathcal {G})\) be the minimum positive transition probability in the game \(\mathcal {G}\). Throughout this paper, we will assume that the number k of random positions is bounded by a constant.

The following theorem says that a pseudo-polynomial algorithm can be turned into an absolute approximation scheme.

Theorem 1

Given a pseudo-polynomial algorithm for solving any BWR-game with \(k=O(1)\) (in uniformly optimal strategies), there is an algorithm that returns, for any given BWR-game with rewards in \([-1,1]\), \(k=O(1)\), and for any \(\varepsilon > 0\), a pair of strategies that (uniformly) approximates the value within an absolute error of \(\varepsilon \). The running-time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n, 1/p_{\min }, 1/\varepsilon )\) [assuming \(k=O(1)\)].

We also obtain an approximation scheme with a relative error.

Theorem 2

Given a pseudo-polynomial algorithm for solving any BWR-game with \(k=O(1)\), there is an algorithm that returns, for any given BWR-game with non-negative integral rewards, \(k=O(1)\), and for any \(\varepsilon > 0\), a pair of strategies that approximates the value within a relative error of \(\varepsilon \). The running-time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n,1/p_{\min },\log R, 1/\varepsilon )\) [assuming \(k=O(1)\)].

We remark that Theorem 1 (apart from the dependence of the running time on \(\log R\)) can be obtained from Theorem 2 (see Sect. 2). However, our reduction in Theorem 1, unlike Theorem 2, has the property that if the pseudo-polynomial algorithm returns uniformly optimal strategies, then the approximation scheme also returns uniformly \(\varepsilon \)-optimal strategies. For BW-games, i.e., the special case without random positions, we can also strengthen the result of Theorem 2 to return a pair of strategies that is uniformly \(\varepsilon \)-optimal.

Theorem 3

Assume that there is a pseudo-polynomial algorithm for solving any BW-game in uniformly optimal strategies. Then for any \(\varepsilon > 0\), there is an algorithm that returns, for any given BW-game with non-negative integral rewards, a pair of uniformly relatively \(\varepsilon \)-optimal strategies. The running-time of the algorithm is bounded by \({{\mathrm{{\text {poly}}}}}(n,\log R, 1/\varepsilon )\).

In deriving these approximation schemes from a pseudo-polynomial algorithm, we face two main technical challenges that distinguish the computation of \(\varepsilon \)-equilibria of BWR-games from similar standard techniques used in combinatorial optimization. First, the running-time of the pseudo-polynomial algorithm depends polynomially both on the maximum reward and the common denominator D of the transition probabilities. Thus, in order to obtain a fully polynomial-time approximation scheme (FPTAS) with an absolute guarantee whose running-time is independent of D, we have to truncate the probabilities and bound the change in the game value, which is a non-linear function of D. Second, in order to obtain an FPTAS with a relative guarantee, one needs (as often in optimization) a (trivial) lower/upper bound on the optimum value. In the case of BWR-games, it is not clear what bound we can use, since the game value can be arbitrarily small. The situation becomes even more complicated if we look for uniformly \(\varepsilon \)-optimal strategies. This is because we have to output just a single pair of strategies that guarantees \(\varepsilon \)-optimality from any starting position.

In order to resolve the first issue, we analyze the change in the game values and optimal strategies if the rewards or transition probabilities are changed. Roughly speaking, we use results from Markov chain perturbation theory to show that if the probabilities are perturbed by a small error \(\delta \), then the change in the game value is \(O(\delta n^2/p_{\min }^{2k})\) (see Sect. 2.1). It is worth mentioning that a somewhat related result was obtained recently for the class of so-called almost-sure ergodic games (not necessarily with perfect information) [10]. More precisely, it was shown that for this class of games there is an \(\varepsilon \)-optimal strategy with rational representation with denominator \(D=O(\frac{n^3}{\varepsilon p_{\min }^{k}})\) [10]. The second issue is resolved through repeated applications of the pseudo-polynomial algorithm on a truncated game. After each such application we have one of the following situations: either the value of the game has already been approximated within the required accuracy or it is guaranteed that the range of the rewards can be shrunk by a constant factor without changing the value of the game (see Sects. 2.3 and 2.4).

Since BWR-games with a constant number of random positions admit a pseudo-polynomial algorithm, as was recently shown [5, 6], we obtain the following results.

Corollary 1

  1. (i)

    There is an FPTAS that solves, within an absolute error guarantee, in uniformly \(\varepsilon \)-optimal strategies, any BWR-game with a constant number of random positions, \(1/p_{\min }={{\mathrm{{\text {poly}}}}}(n)\), and rewards in \([-1,1]\).

  2. (ii)

    There is an FPTAS that solves, within a relative error guarantee, in \(\varepsilon \)-optimal strategies, any BWR-game with a constant number of random positions, \(1/p_{\min }={{\mathrm{{\text {poly}}}}}(n)\), and non-negative rational rewards.

  3. (iii)

    There is an FPTAS that solves, within a relative error guarantee, in uniformly \(\varepsilon \)-optimal strategies, any BW-game with non-negative (rational) rewards.

The proofs of Theorems 1, 2, and 3 will be given in Sects. 2.2, 2.3, and 2.4, respectively.

2 Approximation Schemes

2.1 The Effect of Perturbation

Our approximation schemes are based on the following three lemmas. The first one (which is known) says that a linear change in the rewards corresponds to a linear change in the game value. In our approximation schemes, we truncate and scale the rewards to be able to run the pseudo-polynomial algorithm in polynomial time. We need the lemma to bound the error in the game value resulting from the truncation.

Lemma 1

Let \(\mathcal {G}=(G=(V,E),P,r)\) be a BWR-game. Let \(\theta _1,\gamma _1,\theta _2,\gamma _2\) be constants such that \(\theta _1,\theta _2>0\). Let \(\hat{\mathcal {G}}\) be a game \((G=(V,E),P,\hat{r})\) with \(\theta _1 r_e+\gamma _1\le \hat{r}_e\le \theta _2 r_e +\gamma _2\) for all \(e\in E\). Then for any \(v\in V\), we have \(\theta _1 \mu _v(\mathcal {G})+\gamma _1 \le \mu _v(\hat{\mathcal {G}})\le \theta _2 \mu _v(\mathcal {G}) +\gamma _2\). Moreover, if \((\hat{s}_W,\hat{s}_B)\) is an absolutely \(\varepsilon \)-optimal situation in \((\hat{\mathcal {G}},v)\), then

$$\begin{aligned} \max _{s_W}\mu _v\left( \mathcal {G}\left( s_W,\hat{s}_B\right) \right)&\le \frac{\theta _2\mu _v(\mathcal {G})+\gamma _2-\gamma _1+\varepsilon }{\theta _1} \quad \text {and} \nonumber \\ \min _{s_B}\mu _v\left( \mathcal {G}\left( \hat{s}_W,s_B\right) \right)&\ge \frac{\theta _1\mu _v(\mathcal {G})+\gamma _1-\gamma _2-\varepsilon }{\theta _2}. \end{aligned}$$
(5)

Proof

This uses only standard techniques, and we give the proof only for completeness. Let \((s_W^*,s_B^*)\) and \((\hat{s}_W,\hat{s}_B)\) be pairs of optimal strategies for \((\mathcal {G},v)\) and \((\hat{\mathcal {G}},v)\), respectively. Denote by \(\rho ^*,\hat{\rho },\rho ^{\prime },\) and \(\rho ^{\prime \prime }\) the (arc) limiting distributions for the Markov chains starting from \(v_0\) and corresponding to pairs \((s_W^*,s_B^*)\), \((\hat{s}_W,\hat{s}_B)\), \((s_W^*,\hat{s}_B)\), and \((\hat{s}_W,s_B^*)\), respectively. By the definition of optimal strategies and the facts that \(\Vert \rho ^{\prime }\Vert _1= \Vert \rho ^{\prime \prime }\Vert _1 =1\) (because they are probability distributions), we have the following series of inequalities:

$$\begin{aligned} \mu _v(\hat{\mathcal {G}})&=(\hat{\rho })^T\hat{r}\ge (\rho ^{\prime })^T\hat{r}\ge \theta _1(\rho ^{\prime })^T r+\gamma _1\ge \theta _1 (\rho ^*)^Tr+\gamma _1=\theta _1\mu _v(\mathcal {G})+\gamma _1 \quad \text {and}\\ \mu _v(\hat{\mathcal {G}})&=(\hat{\rho })^T\hat{r}\le (\rho ^{\prime \prime })^T\hat{r}\le \theta _2 (\rho ^{\prime \prime })^Tr+\gamma _2\le \theta _2 (\rho ^*)^Tr+\gamma _2=\theta _2\mu _v(\mathcal {G})+\gamma _2. \end{aligned}$$

To see the first bound in (5), note that for any \(s_W\), we have \(\mu _v(\mathcal {G}(s_W,\hat{s}_B))\le \frac{1}{\theta _1}(\mu _v(\hat{\mathcal {G}}(s_W,\hat{s}_B))-\gamma _1)\). Also, by the \(\varepsilon \)-optimality of \(\hat{s}_B\) in \((\hat{\mathcal {G}},v)\), we have \(\mu _v(\hat{\mathcal {G}}(s_W,\hat{s}_B))\le \mu _v(\hat{\mathcal {G}})+\varepsilon \le \theta _2\mu _v(\mathcal {G})+\gamma _2+\varepsilon \). The first bound in (5) follows. The second bound can be shown similarly. \(\square \)

The second lemma, which is new as far as we are aware, states that if we truncate the transition probabilities within a small error \(\varepsilon \), then the change in the game value is bounded by \(O(\varepsilon ^2 n^3/p_{\min }^{2k})\). More precisely, for a BWR-game \(\mathcal {G}\) and a constant \(\varepsilon >0\), define

$$\begin{aligned} \delta \left( \mathcal {G},\varepsilon \right) := \left( \frac{\varepsilon n^2}{2} \left( \frac{p_{\min }}{2}\right) ^{-k} \left( \varepsilon nk(k+1)\left( \frac{p_{\min }}{2}\right) ^{-k}+3k+1\right) +\varepsilon n\right) r^*, \end{aligned}$$
(6)

where \(n =n(\mathcal {G})\), \(p_{\min }= p_{\min }(\mathcal {G})\), \(k=k(\mathcal {G})\), and \(r^*=r^*(\mathcal {G}):=\max \{|r^+(\mathcal {G})|,|r^-(\mathcal {G})|\}\).
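For later reference, (6) is just an explicit polynomial expression in the parameters of the game; a direct transcription (an illustrative helper of ours, useful when choosing the truncation accuracy in Sects. 2.2 and 2.3) reads:

```python
def delta(n, k, p_min, r_star, eps):
    """The perturbation bound delta(G, eps) of (6)."""
    q = (p_min / 2.0) ** (-k)                      # (p_min / 2)^{-k}
    return (0.5 * eps * n ** 2 * q * (eps * n * k * (k + 1) * q + 3 * k + 1)
            + eps * n) * r_star
```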

Lemma 2

Let \(\mathcal {G}=(G=(V,E),P,r)\) be a BWR-game with \(r\in [-1,1]^{E}\), and let \(\varepsilon \le p_{\min }/2 =p_{\min }(\mathcal {G})/2\) be a positive constant. Let \(\hat{\mathcal {G}}\) be a game \((G=(V,E),\hat{P},r)\) with \(\Vert P-\hat{P} \Vert _{\infty }\le \varepsilon \) (and \(\hat{p}_{uv}=0\) if \(p_{uv}=0\) for all arcs (uv)). Then we have \(|\mu _v(\mathcal {G})-\mu _v(\hat{\mathcal {G}})|\le \delta (\mathcal {G},\varepsilon )\) for any \(v\in V\). Moreover, if the pair \((\tilde{s}_W,\tilde{s}_B)\) is absolutely \(\varepsilon ^{\prime }\)-optimal in \((\hat{\mathcal {G}},v)\), then it is absolutely \((\varepsilon ^{\prime }+2\delta (\mathcal {G},\varepsilon ))\)-optimal in \((\mathcal {G},v)\).

Proof

We apply Lemma 10. Let \((s_W^*,s_B^*)\) and \((\hat{s}_W,\hat{s}_B)\) be pairs of optimal strategies for \((\mathcal {G},v)\) and \((\hat{\mathcal {G}},v)\), respectively. Write \(\delta =\delta (\mathcal {G},\varepsilon )\). Then optimality and Lemma 10 imply the following two series of inequalities:

$$\begin{aligned} \mu _v(\hat{\mathcal {G}})&=\mu _v(\hat{\mathcal {G}}(\hat{s}_W,\hat{s}_B))\ge \mu _v(\hat{\mathcal {G}}(s_W^*,\hat{s}_B)) \\&\ge \mu _v\left( \mathcal {G}\left( s_W^*,\hat{s}_B\right) \right) -\delta \ge \mu _v\left( \mathcal {G}\left( s_W^*,s_B^*\right) \right) - \delta =\mu _v(\mathcal {G})-\delta \quad \text {and}\\ \mu _v(\hat{\mathcal {G}})&=\mu _v(\hat{\mathcal {G}}(\hat{s}_W,\hat{s}_B))\le \mu _v(\hat{\mathcal {G}}(\hat{s}_W,s_B^*)) \\&\le \mu _v\left( \mathcal {G}\left( \hat{s}_W,s_B^*\right) \right) +\delta \le \mu _v\left( \mathcal {G}\left( s_W^*,s_B^*\right) \right) + \delta =\mu _v(\mathcal {G})+\delta . \end{aligned}$$

To see the second claim, note that for any \(s_W\in S_W\), we have

$$\begin{aligned} \mu _v\left( \mathcal {G}\left( s_W,\tilde{s}_B\right) \right) \le \mu _v(\hat{\mathcal {G}}(s_W,\tilde{s}_B))+\delta \le \mu _v(\hat{\mathcal {G}}(\hat{s}_W,\hat{s}_B))+\varepsilon ^{\prime }+\delta \le \mu _v(\mathcal {G})+\varepsilon ^{\prime }+2\delta . \end{aligned}$$

Similarly, we can show that \(\mu _v(\mathcal {G}(\tilde{s}_W,s_B))\ge \mu _v(\mathcal {G})-\varepsilon ^{\prime }-2\delta \) for all \(s_B\in S_B\). \(\square \)

Since we assume that the running-time of the pseudo-polynomial algorithm for the original game \(\mathcal {G}\) depends on the common denominator D of the transition probabilities, we have to truncate the probabilities to remove this dependence on D. By Lemma 2, the value of the game does not change too much after such a truncation.
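The following self-contained toy computation (ours, not from the paper) illustrates the continuity that Lemma 2 quantifies: once a situation is fixed, the game reduces to a weighted Markov chain, and a small perturbation of the transition probabilities moves the mean payoff only slightly.

```python
import numpy as np

def chain_value(P, r):
    """Mean payoff of an ergodic weighted Markov chain:
    sum_v pi_v * sum_u P[v,u] * r[v,u], with pi the stationary distribution."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # pi P = pi together with sum(pi) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(pi @ (P * r).sum(axis=1))

P     = np.array([[0.5, 0.5], [0.9, 0.1]])
P_hat = np.array([[0.501, 0.499], [0.9, 0.1]])     # probabilities perturbed by 10^-3
r     = np.array([[1.0, 0.0], [0.0, -1.0]])
print(abs(chain_value(P, r) - chain_value(P_hat, r)))   # of the order of 10^-3
```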

The third result that we need concerns relative approximation. The main idea is to use the pseudo-polynomial algorithm to test whether the value of the game is larger than a certain threshold. If it is, we get already a good relative approximation. Otherwise, the next lemma says that we can reduce all large rewards without changing the value of the game.

Lemma 3

Let \(\mathcal {G}=(G=(V,E),P,r)\) be a BWR-game with \(r\ge 0\), and let v be any vertex with \(\mu _v(\mathcal {G})< t\). Suppose that \(r_e\ge t^{\prime }=ntp_{\min }^{-(2k+1)}\) for some \(e\in E\). Let \(\hat{\mathcal {G}}=(G=(V,E),P,\hat{r})\), where \(\hat{r}_e=\min \{r_e,t^{\prime \prime }\}\), \(t^{\prime \prime }\ge (1+\varepsilon )t^{\prime }\) for some \(\varepsilon \ge 0\), and \(\hat{r}_{e^{\prime }}=r_{e^{\prime }}\) for all \(e^{\prime }\ne e\). Then \(\mu _v(\hat{\mathcal {G}})=\mu _v(\mathcal {G})\), and any relatively \(\varepsilon \)-optimal situation in \((\hat{\mathcal {G}},v)\) is also relatively \(\varepsilon \)-optimal in \((\mathcal {G},v)\).

Proof

We assume that \(\hat{r}_e=t^{\prime \prime }\ge (1+\varepsilon )t^{\prime }\), since otherwise there is nothing to prove. Let \(s^*=(s^*_W,s^*_B)\) be an optimal situation for \((\mathcal {G},v)\). This means that \(\mu _v(\mathcal {G})=\mu _v(\mathcal {G}(s^*))=\rho (s^*)^Tr< t\). Lemma 8 says that \(\rho _e(s^*)>0\) implies \(\rho _{e}(s^*)\ge p_{\min }^{2k+1}/n\). Hence, \(r_{e}\rho _e(s^*)\le \rho (s^*)^Tr=\mu _v(\mathcal {G})<t\) implies that \(r_{e}<t^{\prime }\), if \(\rho _e(s^*)>0\). We conclude that \(\rho _e(s^*)=0\), and hence \(\mu _v(\hat{\mathcal {G}}(s^*))=\mu _v(\mathcal {G})\).

Since \(\hat{r}\le r\), we have \(\mu _v(\hat{\mathcal {G}}(s))\le \mu _v(\mathcal {G}(s))\) for all situations s. In particular, for any \(s_W\in S_W\),

$$\begin{aligned} \mu _v(\hat{\mathcal {G}}(s_W,s_B^*))\le \mu _v\left( \mathcal {G}\left( s_W,s_B^*\right) \right) \le \mu _v\left( \mathcal {G}\left( s_W^*,s_B^*\right) \right) =\mu _v(\hat{\mathcal {G}}(s^*_W,s^*_B)). \end{aligned}$$

We claim that also \(\mu _v(\hat{\mathcal {G}}(s_W^*,s_B))\ge \mu _v(\hat{\mathcal {G}}(s^*_W,s^*_B))\) for all \(s_B\in S_B\). Indeed, if there is a strategy \(s_B\) for Black such that \(\mu _v(\hat{\mathcal {G}}(s_W^*,s_B))<\mu _v(\hat{\mathcal {G}}(s^*_W,s^*_B))=\mu _v(\mathcal {G})<t\), then, by the same argument as above, we must have \(\rho _e(s_W^*,s_B)=0\) (since \(\rho _e(s^*_W,s_B)(1+\varepsilon )t^{\prime }\le \rho _e(s^*_W,s_B)t^{\prime \prime }=\rho _e(s^*_W,s_B)\hat{r}_e\le \rho (s^*_W,s_B)^T\hat{r}=\mu _v(\hat{\mathcal {G}}(s_W^*,s_B))<t\)). This, however, implies that

$$\begin{aligned} \mu _v\left( \mathcal {G}\left( s_W^*,s_B\right) \right) =\mu _v(\hat{\mathcal {G}}(s_W^*,s_B))<\mu _v(\hat{\mathcal {G}}(s^*_W,s^*_B))=\mu _v\left( \mathcal {G}\left( s_W^*,s_B^*\right) \right) , \end{aligned}$$

which is in contradiction to the optimality of \(s^*\) in \(\mathcal {G}\). We conclude that \((s_W^*,s_B^*)\) is also optimal in \(\hat{\mathcal {G}}\) and hence \(\mu _v(\hat{\mathcal {G}})=\mu _v(\mathcal {G})\).

Suppose that \((\hat{s}_W,\hat{s}_B)\) is a relatively \(\varepsilon \)-optimal situation in \((\hat{\mathcal {G}},v)\). Then \(\rho _e(s_W,\hat{s}_B)=0\) for any \(s_W\in S_W\). Indeed,

$$\begin{aligned} \rho _e(s_W,\hat{s}_B)(1+\varepsilon )t^{\prime }&=\rho _e(s_W,\hat{s}_B)\hat{r}_e\le \rho (s_W,\hat{s}_B)^T\hat{r}=\mu _v(\hat{\mathcal {G}}(s_W,\hat{s}_B)) \\&\le (1+\varepsilon )\mu _v(\hat{\mathcal {G}})=(1+\varepsilon )\mu _v(\mathcal {G})<(1+\varepsilon )t, \end{aligned}$$

gives a contradiction with Lemma 8 if \(\rho _e(s_W,\hat{s}_B)>0\). It follows that, for any \(s_W\in S_W\), \(\mu _v(\mathcal {G}(s_W,\hat{s}_B))=\mu _v(\hat{\mathcal {G}}(s_W,\hat{s}_B))\le (1+\varepsilon )\mu _v(\mathcal {G})\). Furthermore, for any \(s_B\in S_B\),

$$\begin{aligned} \mu _v\left( \mathcal {G}\left( \hat{s}_W,s_B\right) \right) \ge \mu _v(\hat{\mathcal {G}}(\hat{s}_W,s_B))\ge (1-\varepsilon )\mu _v(\hat{\mathcal {G}})=(1-\varepsilon )\mu _v(\mathcal {G}). \end{aligned}$$

\(\square \)

2.2 Absolute Approximation

In this section, we assume that \(r^-=-1\) and \(r^+=1\), i.e., all rewards are from the interval \([-1,1]\). We may assume also that \(\varepsilon \in (0,1)\) and \(\frac{1}{\varepsilon }\in \mathbb {Z}_+\). We apply the pseudo-polynomial algorithm \(\mathbb {A}\) on a truncated game \(\tilde{\mathcal {G}}=(G=(V,E),\tilde{P},\tilde{r})\) defined by rounding the rewards to the nearest integer multiple of \(\varepsilon /4\) and scaling by \(4/\varepsilon \) (that is, \(\tilde{r}:=\frac{4}{\varepsilon }\lfloor r\rceil _{\frac{\varepsilon }{4}}\), so that \(\tilde{r}\) is integral) and truncating the vector of probabilities \((p_{(v,u)})_{u \in V}\) for each random node \(v\in V_R\), as described in the following lemma.

Lemma 4

Let \(\alpha \in [0,1]^{n}\) with \(\Vert \alpha \Vert _{1} = 1\). Let \(B \in \mathbb {N}\) such that \(\min _{i:\alpha _i>0}\{\alpha _i\}>2^{-B}\). Then there exists \(\alpha ^{\prime } \in [0,1]^{n}\) such that

  1. (i)

    \(\Vert \alpha ^{\prime }\Vert _{1}=1\);

  2. (ii)

    for all \(i=1,\ldots ,n\), \(\alpha ^{\prime }_i = c_i/2^{B}\) where \(c_i \in \mathbb {N}\) is an integer;

  3. (iii)

    for all \(i=1,\ldots ,n\), \(\alpha ^{\prime }_i>0\) if and only \(\alpha _i>0\); and

  4. (iv)

    \(\Vert \alpha -\alpha ^{\prime }\Vert _{\infty } \le 2^{-B}\).

Proof

This is straightforward, and we include the proof only for completeness. Without loss of generality, we assume \(\alpha _i>0\) for all i (set \(\alpha _i^{\prime }=0\) for all i such that \(\alpha _i=0\)). Initialize \(\varepsilon _0=0\) and iterate for \(i=1,\ldots ,n\): set \(\alpha _i^{\prime } =\lfloor \alpha _i+\varepsilon _{i-1}\rceil _{2^{-B}}\) and \(\varepsilon _{i} =\alpha _i+\varepsilon _{i-1}-\alpha _i^{\prime }\). The construction implies (ii). Note that \(|\varepsilon _i|\le 2^{-(B+1)}\) for all i, and \(\varepsilon _n=\sum _i \alpha _i-\sum _i\alpha _i^{\prime }\); since \(\sum _i\alpha _i^{\prime }\) and \(\sum _i\alpha _i=1\) are both integer multiples of \(2^{-B}\) while \(|\varepsilon _n|\le 2^{-(B+1)}<2^{-B}\), this forces \(\varepsilon _n=0\), which implies (i). Furthermore, \(|\alpha _i-\alpha _i^{\prime }|=|\varepsilon _i-\varepsilon _{i-1}|\le 2^{-B}\), which implies (iv). Note finally that (iii) follows from (iv) since \(\min _{i:\alpha _i>0}\{\alpha _i\}>2^{-B}\). \(\square \)
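The rounding in this proof is constructive and easy to implement; the sketch below (ours, purely illustrative) carries the accumulated error \(\varepsilon _i\) forward exactly as described, with \(\lfloor \cdot \rceil _{2^{-B}}\) realized as rounding to the nearest multiple of \(2^{-B}\).

```python
def round_distribution(alpha, B):
    """Round a probability vector to integer multiples of 2^-B as in Lemma 4,
    carrying the rounding error forward so that the result still sums to 1."""
    step = 2.0 ** (-B)
    alpha_prime, err = [], 0.0              # err plays the role of eps_{i-1}
    for a in alpha:
        if a == 0.0:
            alpha_prime.append(0.0)         # zero entries stay zero (item (iii))
            continue
        rounded = round((a + err) / step) * step
        err = a + err - rounded             # |err| <= 2^{-(B+1)}
        alpha_prime.append(rounded)
    return alpha_prime

print(round_distribution([0.35, 0.40, 0.25], B=3))   # multiples of 1/8, summing to 1
```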

Lemma 5

Let \(\mathbb {A}\) be a pseudo-polynomial algorithm that solves, in (uniformly) optimal strategies, any BWR-game \(\mathcal {G}=(G,P,r)\) in time \(\tau (n,D,R)\). Then for any \(\varepsilon >0\), there is an algorithm that solves, in (uniformly) absolutely \(\varepsilon \)-optimal strategies, any given BWR-game \(\mathcal {G}=(G,P,r)\) in time bounded by \(\tau \bigl (n,\frac{2^{k+4}n^2(3k+1)}{\varepsilon p_{\min }^{k}},\frac{8}{\varepsilon }\bigr )\).

Proof

We apply \(\mathbb {A}\) to the game \(\tilde{\mathcal {G}}=(G,\tilde{P},\tilde{r})\), where \(\tilde{r}:=\frac{4}{\varepsilon }\lfloor r\rceil _{\frac{\varepsilon }{4}}\). The probabilities \(\tilde{P}\) are obtained from P by applying Lemma 4 with \(B=\lceil \log _2(1/\varepsilon ^{\prime })\rceil \), where we select \(\varepsilon ^{\prime }\) such that \(\delta (\mathcal {G},\varepsilon ^{\prime })\le \frac{\varepsilon }{4}\) [as defined by (6)]. It is easy to check that \(\delta (\mathcal {G},\varepsilon ^{\prime })\le \varepsilon /4\) for \(\varepsilon ^{\prime }=\frac{\varepsilon p_{\min }^{k}}{2^{k+3}n^2(3k+1)}\), as \(r^*=1\). Note that all rewards in \(\tilde{\mathcal {G}}\) are integers in the range \([-\frac{4}{\varepsilon },\frac{4}{\varepsilon }]\). Since \(D(\tilde{\mathcal {G}})=2^B\) and \(R(\tilde{\mathcal {G}})= 8/\varepsilon \), the statement about the running-time follows.

Let \(\tilde{s}\) be the pair of (uniformly) optimal strategies returned by \(\mathbb {A}\) on input \(\tilde{\mathcal {G}}\). Let \(\hat{\mathcal {G}}\) be the game \((G,\tilde{P},r)\). Since \(\Vert \tilde{r}-\frac{4}{\varepsilon }r\Vert _{\infty }\le 1\), we can apply Lemma 1 (with \(\hat{r}=\tilde{r}\), \(\theta _1=\theta _2=\frac{4}{\varepsilon }\) and \(\gamma _1=-\gamma _2=-1\)) to conclude that \(\tilde{s}\) is a (uniformly) absolutely \(\frac{\varepsilon }{2}\)-optimal pair for \(\hat{\mathcal {G}}\). Now we apply Lemma 2 and conclude that \(\tilde{s}\) is (uniformly) \((\frac{\varepsilon }{2}+2\delta (\mathcal {G},\varepsilon ^{\prime }))\)-optimal for \(\mathcal {G}\). \(\square \)

Note that the above technique yields an approximation algorithm with polynomial running-time only for \(k=O(1)\), even if the pseudo-polynomial algorithm \(\mathbb {A}\) works for arbitrary k.
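Putting the pieces of the proof together, the reduction behind Lemma 5 only prepares a truncated game and hands it to the black box \(\mathbb {A}\). The sketch below is ours; the game object, its fields, and the solver interface pseudo_poly_solve are assumptions for illustration, and round_distribution is the helper from the sketch after Lemma 4.

```python
import math

def absolute_fptas(G, eps, pseudo_poly_solve):
    """Sketch of Lemma 5: (uniformly) absolutely eps-optimal strategies for a
    BWR-game G with rewards in [-1, 1] and k = O(1) random positions."""
    # 1. Round the rewards to multiples of eps/4 and scale by 4/eps, so that
    #    the truncated rewards are integers in [-4/eps, 4/eps] (range 8/eps).
    r_tilde = {e: round(G.r[e] / (eps / 4.0)) for e in G.edges}

    # 2. Truncate the probabilities (Lemma 4) with accuracy eps' chosen so
    #    that delta(G, eps') <= eps/4, as in the proof of Lemma 5.
    eps_prime = eps * G.p_min ** G.k / (2 ** (G.k + 3) * G.n ** 2 * (3 * G.k + 1))
    B = math.ceil(math.log2(1.0 / eps_prime))
    P_tilde = {v: round_distribution(G.P[v], B) for v in G.random_positions}

    # 3. Solve the truncated game with the pseudo-polynomial algorithm A; by
    #    Lemmas 1 and 2 the returned situation is absolutely eps-optimal in G.
    return pseudo_poly_solve(G.with_data(P_tilde, r_tilde))
```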

2.3 Relative Approximation

Let \(\mathcal {G}=(G,P,r)\) be a BWR-game on G with non-negative integral rewards, that is, \(r^-=0\) and \(\min _{e: r_{e}>0}r_e \ge 1\). The algorithm is given as Algorithm 1. The main idea is to truncate the rewards, scaled by a factor of \(1/K\), and use the pseudo-polynomial algorithm on the truncated game \(\hat{\mathcal {G}}\). If the value \(\mu _w(\hat{\mathcal {G}})\) in the truncated game from the starting node w is large enough (step 4), then we get a good relative approximation of the original value and we are done. Otherwise, the information that \(\mu _w(\hat{\mathcal {G}})\) is small allows us to reduce the maximum reward by a factor of 2 in the original game (step 9); we invoke Lemma 3 for this. Thus, the algorithm terminates in polynomial time (in the bit length of \(R(\mathcal {G})\)). To remove the dependence on D in the running-time, we also need to truncate the transition probabilities. In the algorithm, we denote by \(\tilde{P}\) the transition probabilities obtained from P by applying Lemma 4 with \(B=\lceil \log (1/\varepsilon ^{\prime })\rceil \), where we select \(\varepsilon ^{\prime }=\frac{p_{\min }^{2k}}{2^{k+1}n^2(k+2)^2\theta }\) with \(\theta =\theta (\mathcal {G}):=\frac{2(1+\varepsilon )(3+2\varepsilon )n}{\varepsilon p_{\min }^{2k+1}}\). Thus, we have \(2\delta (\mathcal {G},\varepsilon ^{\prime })\le 2^{k+1}\varepsilon ^{\prime } n^2(k+2)^2p_{\min }^{-2k}\le r^+(\mathcal {G})/\theta (\mathcal {G})=K(\mathcal {G})\).

[Algorithm 1 (\({{\mathrm{{\text {FPTAS-BWR}}}}}(\mathcal {G},w,\varepsilon )\)); pseudocode figure not reproduced here.]
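Since the pseudocode figure is not reproduced above, the following sketch (ours) reconstructs the recursive structure of Algorithm 1 from the description above and from the proof of Lemma 6 below; the helper names, the game object, and the solver interface are assumptions, and the step numbering of the actual algorithm is not preserved.

```python
def fptas_bwr(G, w, eps, solve):
    """Sketch of FPTAS-BWR(G, w, eps): a relatively eps-optimal situation for
    the BWR-game G with non-negative integral rewards, started at w.
    `solve` is the assumed pseudo-polynomial black box returning
    (value at w, situation) for a game with integral rewards."""
    P_tilde = truncate_probabilities(G)        # Lemma 4 with the accuracy eps' of Sect. 2.3
    if G.r_max <= 2:                           # base case: solve the truncated game directly
        return solve(G.with_probs(P_tilde), w)[1]

    theta = 2 * (1 + eps) * (3 + 2 * eps) * G.n / (eps * G.p_min ** (2 * G.k + 1))
    K = G.r_max / theta                        # rewards are scaled by 1/K and rounded down
    r_hat = {e: int(G.r[e] // K) for e in G.edges}
    value_hat, s = solve(G.with_data(P_tilde, r_hat), w)

    if value_hat >= 3.0 / eps:                 # large value: s is already eps-optimal
        return s

    # Small value: by Lemma 3, every reward above t' = n*t*p_min^{-(2k+1)} can be
    # capped at (1+eps)*t', which works out to r_max/2; then recurse.
    cap = -(-G.r_max // 2)                     # ceil(r_max / 2)
    r_reduced = {e: min(G.r[e], cap) for e in G.edges}
    return fptas_bwr(G.with_rewards(r_reduced), w, eps, solve)
```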

Lemma 6

Let \(\mathbb {A}\) be a pseudo-polynomial algorithm that solves any BWR-game \((\mathcal {G}=(G,P,r),w)\), from any given starting position w, in time \(\tau (n,D,R)\). Then, for any \(\varepsilon \in (0,1)\), there is an algorithm that solves, in relatively \(\varepsilon \)-optimal strategies, any BWR-game \((\mathcal {G}=(G,P,r),w)\) from any given starting position w in time

$$\begin{aligned} O\left( \left( \tau \left( n,\frac{2^{k+3}n^3(k+2)^2(1+\varepsilon )(3+2\varepsilon )}{\varepsilon p_{\min }^{4k+1}},\frac{2(1+\varepsilon )(3+2\varepsilon )n}{\varepsilon p_{\min }^{2k+1}}\right) +{{\mathrm{{\text {poly}}}}}(n)\right) \cdot \log (R)\right) . \end{aligned}$$

Proof

The algorithm \({{\mathrm{{\text {FPTAS-BWR}}}}}(\mathcal {G},w,\varepsilon )\) is given as Algorithm 1. The bound on the running-time follows since, by step (9), each recursive call is made on a game \(\tilde{\mathcal {G}}\) in which \(r^+(\tilde{\mathcal {G}})\) is at most half of its previous value. Moreover, the rewards in the truncated game \(\hat{\mathcal {G}}\) are non-negative integers with a maximum value of \(r^+(\hat{\mathcal {G}})\le \theta \), and the smallest common denominator of the transition probabilities is at most \(\tilde{D}:=\frac{2}{\varepsilon ^{\prime }}\). Thus the time taken by algorithm \(\mathbb {A}\) for each recursive call is at most \(\tau (n,\tilde{D},\theta )\).

What remains to be done is to argue by induction (on \(r^+(\mathcal {G})\)) that the algorithm returns a pair \(\tilde{s}=(\tilde{s}_W,\tilde{s}_B)\) of \(\varepsilon \)-optimal strategies. For the base case, we have either \(r^+(\mathcal {G})\le 2\) or the value returned by the pseudo-polynomial \(\mathbb {A}\) satisfies \(\mu _w(\hat{\mathcal {G}})\ge 3/\varepsilon \). In the former case, note that since \(\Vert P-\tilde{P}\Vert _{\infty }\le \varepsilon ^{\prime }\) and \(r^+(\mathcal {G})\le 2\), Lemma 2 implies that the pair \(\tilde{s}=(\tilde{s}_W,\tilde{s}_B)\) returned in step 2 is absolutely \(\varepsilon ^{\prime \prime }\)-optimal, where \(\varepsilon ^{\prime \prime }=2\delta (\mathcal {G},\varepsilon ^{\prime })<\frac{\varepsilon p_{\min }^{2k+1}}{n}\). Lemma 8 and the integrality of the non-negative rewards imply that, for any situation s, \(\mu _w(\mathcal {G}(s))\ge p_{\min }^{2k+1}/n\) if \(\mu _w(\mathcal {G}(s))>0\). Thus, if \(\mu _w(\mathcal {G})>0\), then \(\varepsilon ^{\prime \prime }\le \varepsilon \mu _w(\mathcal {G})\), and it follows that \((\tilde{s}_W,\tilde{s}_B)\) is relatively \(\varepsilon \)-optimal. On the other hand, if \(\mu _w(\mathcal {G})=0\), then \(\mu _w(\mathcal {G}(\tilde{s}))\le \mu _w(\mathcal {G})+\varepsilon ^{\prime \prime }< p_{\min }^{2k+1}/n\), implying that \(\mu _w(\mathcal {G}(\tilde{s}))=0\). Thus, we get a relative \(\varepsilon \)-approximation in both cases.

Suppose now that \(\mathbb {A}\) determines that \(\mu _w(\hat{\mathcal {G}})\ge 3/\varepsilon \) in step 4, and hence the algorithm returns \((\tilde{s}_W,\tilde{s}_B)\). Note that \(\frac{1}{K} \cdot r_e- 1\le \hat{r}_e\le \frac{1}{K} \cdot r_e\) for all \(e\in E\), and \(\Vert P-\tilde{P}\Vert _{\infty }\le \varepsilon ^{\prime }\). Hence, by Lemmas 1 and 2, we have

$$\begin{aligned} K\mu _w(\hat{\mathcal {G}})-\delta \left( \mathcal {G},\varepsilon ^{\prime }\right) \le \mu _w(\mathcal {G})\le K\mu _w(\hat{\mathcal {G}})+K+\delta \left( \mathcal {G},\varepsilon ^{\prime }\right) , \end{aligned}$$
(7)

and the pair \((\tilde{s}_W,\tilde{s}_B)\) returned in step 5 is absolutely \((K+2\delta (\mathcal {G},\varepsilon ^{\prime }))\)-optimal, and hence (since \(2\delta (\mathcal {G},\varepsilon ^{\prime })\le K\)) absolutely \(2K\)-optimal for \(\mathcal {G}\). [To see (7), let \(\tilde{\mathcal {G}}:=(G,\tilde{P},r)\). Then by Lemma 2, we have

$$\begin{aligned} \mu _w(\tilde{\mathcal {G}})-\delta \left( \mathcal {G},\varepsilon ^{\prime }\right) \le \mu _w(\mathcal {G})\le \mu _w(\tilde{\mathcal {G}})+\delta \left( \mathcal {G},\varepsilon ^{\prime }\right) . \end{aligned}$$
(8)

Furthermore, as \(\hat{\mathcal {G}}\) is obtained from \(\tilde{\mathcal {G}}\) by scaling and truncating the local rewards, we have by Lemma 1 (applied with \(\theta _1=\theta _2=\frac{1}{K}\), \(\gamma _1=-1\) and \(\gamma _2=0\)),

$$\begin{aligned} \frac{1}{K}\mu _w(\tilde{\mathcal {G}})-1\le \mu _w(\hat{\mathcal {G}})\le \frac{1}{K}\mu _w(\tilde{\mathcal {G}}). \end{aligned}$$
(9)

Combining (8) and (9), we get (7).]

Then (7) implies that

$$\begin{aligned} K\le \frac{\mu _w(\mathcal {G})}{\mu _w(\hat{\mathcal {G}})-\frac{1}{2}}\le \frac{\mu _w(\mathcal {G})}{3/\varepsilon -\frac{1}{2}}\le \frac{\varepsilon }{2}\mu _w(\mathcal {G}), \end{aligned}$$

and hence the absolute error \(2K\) is at most \(\varepsilon \mu _w(\mathcal {G})\); that is, the returned pair is relatively \(\varepsilon \)-optimal.

On the other hand, if \(\mu _w(\hat{\mathcal {G}})< 3/\varepsilon \) then, by (7), \(\mu _w(\mathcal {G})<\frac{K(3+2\varepsilon )}{\varepsilon }=\frac{p_{\min }^{2k+1}r^+}{2(1+\varepsilon )n}\). By Lemma 3, applied with \(t= K(3+2\varepsilon )/\varepsilon \), the game \(\tilde{\mathcal {G}}\) defined in step 11 satisfies \(\mu _w(\mathcal {G})=\mu _w(\tilde{\mathcal {G}})\), and any (relatively) \(\varepsilon \)-optimal strategy in \((\tilde{\mathcal {G}},w)\) (in particular the one returned by induction in step 11) is also \(\varepsilon \)-optimal for \((\mathcal {G},w)\). \(\square \)

Note that the running-time in the above lemma simplifies to \({{\mathrm{{\text {poly}}}}}(n, 1/\varepsilon , 1/p_{\min }) \cdot \log R\) for \(k = O(1)\).

2.4 Uniformly Relative Approximation for BW-Games

The FPTAS in Lemma 6 does not necessarily return a uniformly \(\varepsilon \)-optimal situation, even if the given pseudo-polynomial algorithm \(\mathbb {A}\) provides a uniformly optimal solution. For BW-games, we can modify this FPTAS to return a uniformly \(\varepsilon \)-optimal situation. The algorithm is given as Algorithm 2. The main difference is that when we recurse on a game with reduced rewards (step 11), we also have to delete all positions that have large values \(\mu _v(\tilde{\mathcal {G}})\) in the truncated game. This is similar to the approach used to decompose a BW-game into ergodic classes [21]. However, the main technical difficulty is that, with approximate equilibria, White or Black might still have some incentive to move to a lower- or higher-value class, respectively, since the values obtained are just approximations of the optimal values. We show that such a move will not be very profitable for either White or Black. Recall that we assume that the rewards are non-negative integers.

[Algorithm 2 (\({{\mathrm{{\text {FPTAS-BW}}}}}(\mathcal {G},\varepsilon )\)); pseudocode figure not reproduced here.]
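As before, the pseudocode is not reproduced; the following structural sketch (ours, with assumed helpers; the base cases, steps 2, 4, and 5 of the actual algorithm, are omitted) only records the recursion pattern used in the proof of Lemma 7: keep the moves returned for the high-value positions U, and recurse on the induced subgame on the remaining positions with the large rewards capped.

```python
def fptas_bw(G, eps_prime, solve):
    """Sketch of FPTAS-BW for BW-games: a uniformly, relatively eps-optimal
    situation, where eps' is roughly ln(1+eps)/(3h).  `solve` is the assumed
    pseudo-polynomial black box returning (values, uniformly optimal situation)."""
    K = G.r_max * eps_prime / (2 * (1 + eps_prime) ** 2 * G.n)   # so that r_hat_max <= 2(1+eps')^2 n / eps'
    r_hat = {e: int(G.r[e] // K) for e in G.edges}
    values_hat, s_hat = solve(G.with_rewards(r_hat))

    U = {v for v in G.positions if values_hat[v] >= 1.0 / eps_prime}
    if U == set(G.positions):
        return s_hat                            # all truncated values are large: keep s_hat

    # Recurse on the positions outside U (Claim 1 shows neither player can
    # profitably leave its side of the partition), with the rewards of the
    # induced subgame reduced as in step 9.
    G_rest = reduce_rewards(G.induced(set(G.positions) - U))
    s_rest = fptas_bw(G_rest, eps_prime, solve)
    return {v: s_hat[v] if v in U else s_rest[v] for v in G.positions}
```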

Lemma 7

Let \(\mathbb {A}\) be a pseudo-polynomial algorithm that solves, in uniformly optimal strategies, any BW-game \(\mathcal {G}\) in time \(\tau (n,R)\). Then for any \(\varepsilon >0\), there is an algorithm that solves, in uniformly relatively \(\varepsilon \)-optimal strategies, any BW-game \(\mathcal {G}\), in time \(O\bigl (\bigl (\tau \bigl (n,\frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\bigr )+{{\mathrm{{\text {poly}}}}}(n)\bigr ) \cdot h\bigr )\), where \(h=\lceil \log R\rceil +1\), and \(\varepsilon ^{\prime }=\frac{\ln (1+\varepsilon )}{3h}\approx \frac{\varepsilon }{3h}\).

Proof

The algorithm \({{\mathrm{{\text {FPTAS-BW}}}}}(\mathcal {G},\varepsilon )\) is given as Algorithm 2. The bound on the running-time is obvious: by step (9), each recursive call is made on a game \(\tilde{\mathcal {G}}\) in which \(r^+(\tilde{\mathcal {G}})\) is at most half of its previous value. Moreover, the rewards in the truncated game \(\hat{\mathcal {G}}\) are integral with a maximum value of \(r^+(\hat{\mathcal {G}})\le \frac{r^+(\mathcal {G})}{K}\le \frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\). Thus, the time that algorithm \(\mathbb {A}\) needs in each recursive call is bounded from above by \(\tau \bigl (n,\frac{2(1+\varepsilon ^{\prime })^2n}{\varepsilon ^{\prime }}\bigr )\).

So it remains to argue (by induction) that the algorithm returns a pair \((\tilde{s}_W,\tilde{s}_B)\) of (relatively) uniformly \(\varepsilon \)-optimal strategies. Let us index the different recursive calls of the algorithm by \(i=1,2,\ldots ,h^{\prime }\le h\) and denote by \(\mathcal {G}^{(i)}=(G^{(i)}=(V^{(i)},E^{(i)}),r^{(i)})\) the game input to the ith recursive call of the algorithm (so \(\mathcal {G}^{(1)}=\mathcal {G}\)) and by \(\hat{s}^{(i)}=(\hat{s}^{(i)}_W,\hat{s}^{(i)}_B)\) and \(\tilde{s}^{(i)}=(\tilde{s}^{(i)}_W,\tilde{s}^{(i)}_B)\) the pairs of strategies returned in steps 2, 4, 5, or 11. Similarly, we denote by \(V^{(i)}=V_W^{(i)}\cup V_B^{(i)}\), \(U^{(i)}\), \(r^{(i)}\), \(\hat{r}^{(i)}\), \(\hat{\mathcal {G}}^{(i)}\), \(K^{(i)}\), and \(\tilde{\mathcal {G}}^{(i)}\) the instantiations of \(V=V_W\cup V_B\), U, r, \(\hat{r}\), \(\hat{\mathcal {G}}\), K, and \(\tilde{\mathcal {G}}\), respectively, in the ith call of the algorithm. We denote by \(S_W^{(i)}\) and \(S_B^{(i)}\) the set of strategies in \(\mathcal {G}^{(i)}\) for White and Black, respectively. For a set U of positions, a game \(\mathcal {G}\), and a situation s, we denote by \(\mathcal {G}[U]=(G[U],r)\) and s[U], respectively, the game and situation induced on U. \(\square \)

Claim 1

  1. (i)

    There does not exist an edge \((v,u)\in E\) such that \(v\in V_B^{(i)}\cap U^{(i)}\) and \(u\in V^{(i)}{\setminus } U^{(i)}\).

  2. (ii)

    For all \(v \in V_W^{(i)}\cap U^{(i)}\), there exists a \(u\in U^{(i)}\) with \((v,u)\in E\).

  3. (i’)

    There does not exist an edge \((v,u)\in E\) such that \(v\in V_W^{(i)}{\setminus } U^{(i)}\) and \(u\in U^{(i)}\).

  4. (ii’)

    For all black positions \(v \in V_B^{(i)}{\setminus } U^{(i)}\), there exists a \(u\in V^{(i)}{\setminus } U^{(i)}\) such that \((v,u)\in E\).

  5. (iii)

    Let \(\hat{s}^{(i)}=(\hat{s}_W^{(i)},\hat{s}_B^{(i)})\) be the situation returned in step 4. Then, for all \(v\in U^{(i)}\), we have \(\hat{s}^{(i)}(v)\in U^{(i)}\), and, for all \(v\in V^{(i)}{\setminus } U^{(i)}\), we have \(\hat{s}^{(i)}(v)\in V^{(i)}{\setminus } U^{(i)}\).

Proof

By the optimality conditions in \(\hat{\mathcal {G}}^{(i)}\) (see, e.g., [21]), we have

  1. (I)

    \(\mu _v(\hat{\mathcal {G}}^{(i)})=\min \{\mu _u(\hat{\mathcal {G}}^{(i)}) \mid u\in V^{(i)}\) such that \((v,u)\in E\}\), for \(v\in V_B^{(i)}\), and

  2. (II)

    \(\mu _v(\hat{\mathcal {G}}^{(i)})=\max \{\mu _u(\hat{\mathcal {G}}^{(i)}) \mid u\in V^{(i)}\) such that \((v,u)\in E\}\), for any \(v\in V_W^{(i)}\).

(I) and (II), together with the definition of \(U^{(i)}\), imply (i) and (ii), respectively. Similarly, (i’) and (ii’) can be shown. The optimality conditions also imply that for all \(v\in V^{(i)}\), \(\mu _v(\hat{\mathcal {G}}^{(i)})=\mu _{\hat{s}^{(i)}(v)}(\hat{\mathcal {G}}^{(i)})\), which in turn implies (iii). \(\square \)

Note that Claim 1 implies that the game \(\mathcal {G}^{(i)}[V^{(i)}{\setminus } U^{(i)}]\) is well-defined since the graph \(G[V^{(i)}{\setminus } U^{(i)}]\) has no sinks. For a strategy \(s_W\) (and similarly for a strategy \(s_B\)) and a subset \(V^{\prime }\subseteq V\), we write \(s_W(V^{\prime })=\{s_W(u) \mid u\in V^{\prime }\}\). The following two claims state respectively that the values of the positions in \(U^{(i)}\) are well-approximated by the pseudo-polynomial algorithm and that these values are sufficiently larger than those in the residual set \(V^{(i)}{\setminus } U^{(i)}\).

Claim 2

For \(i=1,\ldots ,h^{\prime }\), let \(\hat{s}^{(i)}\) be the situation returned by the pseudo-polynomial algorithm on the game \(\hat{\mathcal {G}}^{(i)}\) in step 4. Then, for any \(w\in U^{(i)}\), we have

$$\begin{aligned} \max _{s_W\in S_W^{(i)}: s_W(U^{(i)}\cap V_W)\subseteq U^{(i)}}\mu _w\left( \mathcal {G}^{(i)}\left( s_W,\hat{s}_B^{(i)}\right) \right)&\le (1+\varepsilon ^{\prime })\mu _w\left( \mathcal {G}^{(i)}\right) \quad \text {and}\\ \min _{s_B\in S_B^{(i)}}\mu _w\left( \mathcal {G}^{(i)}\left( \hat{s}^{(i)}_W,s_B\right) \right)&\ge (1-\varepsilon ^{\prime })\mu _w\left( \mathcal {G}^{(i)}\right) . \end{aligned}$$

Proof

This follows from Lemma 1 by the uniform optimality of \(\hat{s}^{(i)}\) in \(\hat{\mathcal {G}}^{(i)}\) and the fact that \(\mu _w(\hat{\mathcal {G}}^{(i)})\ge 1/\varepsilon ^{\prime }\) for every \(w\in U^{(i)}.\) \(\square \)

Claim 3

For all \(u \in U^{(i)}\) and \(v\in V^{(i)}{\setminus } U^{(i)}\), we have \((1+\varepsilon ^{\prime })\mu _u(\mathcal {G}^{(i)})>\mu _v(\mathcal {G}^{(i)})\).

Proof

For \(u\in U^{(i)}, v\in V^{(i)}{\setminus } U^{(i)}\), we have \(\mu _u(\hat{\mathcal {G}}^{(i)})\ge 1/\varepsilon ^{\prime }\) and \(\mu _v(\hat{\mathcal {G}}^{(i)})< 1/\varepsilon ^{\prime }\). Thus, by Lemma 1,

$$\begin{aligned} \mu _v\left( \mathcal {G}^{(i)}\right)&\le K^{(i)}\mu _v\left( \hat{\mathcal {G}}^{(i)}\right) +K^{(i)}<\frac{K^{(i)}}{\varepsilon ^{\prime }}(1+\varepsilon ^{\prime })\le K^{(i)}\mu _u\left( \hat{\mathcal {G}}^{(i)}\right) \left( 1+\varepsilon ^{\prime }\right) \\&\le \mu _u\left( \mathcal {G}^{(i)}\right) \left( 1+\varepsilon ^{\prime }\right) . \end{aligned}$$

\(\square \)

We observe that the strategy \(\tilde{s}^{(i)}\), returned by the ith call to the algorithm, is determined as follows (cf. Algorithm 2): for \(w\in U^{(i)}\), \(\tilde{s}^{(i)}(w)=\hat{s}^{(i)}(w)\) is chosen by the solution of the game \(\hat{\mathcal {G}}^{(i)}\), and for \(w\in V^{(i)}{\setminus } U^{(i)}\), \(\tilde{s}^{(i)}(w)\) is determined by the (recursive) solution on the residual game \(\tilde{\mathcal {G}}^{(i)}=\mathcal {G}^{(i+1)}\). The following claim states that the value of any vertex \(u\in V^{(i)}{\setminus } U^{(i)}\) in the residual game is a good (relative) approximation of the value in the original game \(\mathcal {G}^{(i)}.\)

Claim 4

For all \(i=1,\ldots ,h^{\prime }\) and any \(u\in V^{(i)}{\setminus } U^{(i)}\), we have

$$\begin{aligned} \mu _u\left( \mathcal {G}^{(i)}\right) \le \mu _u\left( \mathcal {G}^{(i)}\left[ V^{(i)}{\setminus } U^{(i)}\right] \right) \le (1+2\varepsilon ^{\prime })\mu _u\left( \mathcal {G}^{(i)}\right) . \end{aligned}$$
(10)

Proof

Fix \(u\in V^{(i)}{\setminus } U^{(i)}\). Let \(s^*=(s_W^*,s_B^*)\) and \((\bar{s}_W,\bar{s}_B)\) be optimal situations in \((\mathcal {G}^{(i)},u)\) and \((\bar{\mathcal {G}}^{(i)},u):=(\mathcal {G}^{(i)}[V^{(i)}{\setminus } U^{(i)}],u)\), respectively. Let us extend \(\bar{s}\) to a situation in \(\mathcal {G}^{(i)}\) by setting \(\bar{s}(v)=\hat{s}^{(i)}(v)\) for all \(v\in U^{(i)}\), where \(\hat{s}\) is the situation returned by the pseudo-polynomial algorithm in step 4. Then, by Claim 1(i’), White has no way to escape to \(U^{(i)}\), or in other words, \(s_W^*(u^{\prime })\in V^{(i)}{\setminus } U^{(i)}\) for all \(u^{\prime }\in V_W^{(i)}{\setminus } U^{(i)}\). Hence,

$$\begin{aligned} \mu _u\left( \mathcal {G}^{(i)}\right)&=\mu _u\left( \mathcal {G}^{(i)}\left( s_W^*,s_B^*\right) \right) \le \mu _u\left( \mathcal {G}^{(i)}(s_W^*,\bar{s}_B)\right) \\&=\mu _u\left( \bar{\mathcal {G}}^{(i)}\left( s_W^*,\bar{s}_B\right) \right) \le \mu _u(\bar{\mathcal {G}}^{(i)}(\bar{s}_W,\bar{s}_B))=\mu _u(\bar{\mathcal {G}}^{(i)}). \end{aligned}$$

For similar reasons, \(\mu _u(\mathcal {G}^{(i)})\ge \mu _u(\bar{\mathcal {G}}^{(i)})\), if \(s_B^*(v)\in V^{(i)}{\setminus } U^{(i)}\) for all \(v\in V_B^{(i)}{\setminus } U^{(i)}\) such that v is reachable from u in the graph \(G(s_W^*,s_B^*)\). Suppose, on the other hand, that there is a \(v\in V_B^{(i)}{\setminus } U^{(i)}\) such that \(u^{\prime }=s_B^*(v)\in U^{(i)}\), and v is reachable from u in the graph \(G(s_W^*,s_B^*)\). Then (by Lemma 1) \(\mu _u(\mathcal {G}^{(i)})=\mu _{u^{\prime }}(\mathcal {G}^{(i)})\ge K^{(i)}\mu _{u^{\prime }}(\hat{\mathcal {G}}^{(i)})\ge \frac{K^{(i)}}{\varepsilon ^{\prime }}\). Moreover, the optimality of \((\hat{s}_W,\hat{s}_B)\) in \(\hat{\mathcal {G}}^{(i)}\) and the fact that \(\frac{1}{K^{(i)}}r^{(i)}- 1\le \hat{r}^{(i)}\le \frac{1}{K^{(i)}}r^{(i)}\) imply by Lemma 1 that

$$\begin{aligned} \forall s_W\in S_W^{(i)}: \mu _u\left( \mathcal {G}^{(i)}\left( \hat{s}_W,\hat{s}_B\right) \right)&\ge K^{(i)}\mu _u(\hat{\mathcal {G}}^{(i)}(\hat{s}_W,\hat{s}_B)) \ge K^{(i)}\mu _u(\hat{\mathcal {G}}^{(i)}(s_W,\hat{s}_B))\\&\ge \mu _u\left( \mathcal {G}^{(i)}\left( s_W,\hat{s}_B\right) \right) -K^{(i)} \ge \mu _u\left( \mathcal {G}^{(i)}\left( s_W,\hat{s}_B\right) \right) -\varepsilon ^{\prime }\mu _u\left( \mathcal {G}^{(i)}\right) \end{aligned}$$

and

$$\begin{aligned} \forall s_B\in S_B^{(i)}: \mu _u\left( \mathcal {G}^{(i)}\left( \hat{s}_W,\hat{s}_B\right) \right)&\le K^{(i)}\mu _u(\hat{\mathcal {G}}^{(i)}(\hat{s}_W,\hat{s}_B))+K^{(i)} \\&\le K^{(i)}\mu _u(\hat{\mathcal {G}}^{(i)}(\hat{s}_W,s_B))+K^{(i)}\\&\le \mu _u\left( \mathcal {G}^{(i)}\left( \hat{s}_W,s_B\right) \right) +K^{(i)}\\&\le \mu _u\left( \mathcal {G}^{(i)}\left( \hat{s}_W,s_B\right) \right) +\varepsilon ^{\prime }\mu _u\left( \mathcal {G}^{(i)}\right) . \end{aligned}$$

In particular,

$$\begin{aligned} \mu _u\left( \mathcal {G}^{(i)}\right) &= \mu _u\left( \mathcal {G}^{(i)}\left( s_W^*,s_B^*\right) \right) \ge \mu _u\left( \mathcal {G}^{(i)}\left( \hat{s}_W,s_B^*\right) \right) \\&\ge \mu _u\left( \mathcal {G}^{(i)}\left( \hat{s}_W,\hat{s}_B\right) \right) -\varepsilon ^{\prime }\mu _u\left( \mathcal {G}^{(i)}\right) \\&\ge \mu _u\left( \mathcal {G}^{(i)}\left( \bar{s}_W,\hat{s}_B\right) \right) -2\varepsilon ^{\prime }\mu _u\left( \mathcal {G}^{(i)}\right) = \mu _u(\bar{\mathcal {G}}^{(i)}(\bar{s}_W,\hat{s}_B))-2\varepsilon ^{\prime }\mu _u\left( \mathcal {G}^{(i)}\right) \\&\ge \mu _u(\bar{\mathcal {G}}^{(i)}(\bar{s}_W,\bar{s}_B))-2\varepsilon ^{\prime }\mu _u\left( \mathcal {G}^{(i)}\right) =\mu _u(\bar{\mathcal {G}}^{(i)})- 2\varepsilon ^{\prime } \mu _u\left( \mathcal {G}^{(i)}\right) , \end{aligned}$$

where \(\mu _u(\mathcal {G}^{(i)}(\bar{s}_W,\hat{s}_B))=\mu _u(\bar{\mathcal {G}}^{(i)}(\bar{s}_W,\hat{s}_B))\) follows from Claim 1 (since \((\bar{s}_W,\hat{s}_B)(v)\in V^{(i)}{\setminus } U^{(i)}\)). It follows that \(\mu _u(\mathcal {G}^{(i)})\ge \frac{1}{1+2\varepsilon ^{\prime }}\mu _u(\bar{\mathcal {G}}^{(i)})\). \(\square \)

Let us fix \(\varepsilon _{h^{\prime }}=\varepsilon ^{\prime }\), and for \(i=h^{\prime }-1,h^{\prime }-2,\ldots ,1\), let us choose \(\varepsilon _i\) such that \(1+\varepsilon _i\ge (1+\varepsilon ^{\prime })(1+2\varepsilon ^{\prime })(1+\varepsilon _{i+1})\). Next, we claim that the strategies \((\tilde{s}_W^{(i)},\tilde{s}_B^{(i)})\) returned by the ith call to \(\text {FPTAS-BW}(\mathcal {G},\varepsilon )\) are relatively \(\varepsilon _i\)-optimal in \(\mathcal {G}^{(i)}\).
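For orientation, unrolling this recurrence with equality (a routine step that is only implicit in the text) gives, for \(i=1,\ldots ,h^{\prime }\),

$$\begin{aligned} 1+\varepsilon _i=\bigl ((1+\varepsilon ^{\prime })(1+2\varepsilon ^{\prime })\bigr )^{h^{\prime }-i}(1+\varepsilon _{h^{\prime }})=\bigl ((1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })\bigr )^{h^{\prime }-i}(1+\varepsilon ^{\prime }), \end{aligned}$$

so for \(i=1\) this is exactly the quantity \(\varepsilon _1\) fixed at the end of the proof below.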

Claim 5

For all \(i=1,\ldots ,h^{\prime }\) and any \(w\in V^{(i)}\), we have

$$\begin{aligned} \max _{s_W\in S_W^{(i)}}\mu _w\left( \mathcal {G}^{(i)}\left( s_W,\tilde{s}_B^{(i)}\right) \right)&\le (1+\varepsilon _i)\mu _w\left( \mathcal {G}^{(i)}\right) \text { and}\end{aligned}$$
(11)
$$\begin{aligned} \min _{s_B\in S_B^{(i)}}\mu _w\left( \mathcal {G}^{(i)}\left( \tilde{s}^{(i)}_W,s_B\right) \right)&\ge (1-\varepsilon _i)\mu _w\left( \mathcal {G}^{(i)}\right) . \end{aligned}$$
(12)

Proof

The proof is by induction on \(i=h^{\prime },h^{\prime }-1,\ldots ,1\). For \(i=h^{\prime }\), the statement follows directly from Claim 1 since \(U^{(h^{\prime })}=V^{(h^{\prime })}\). So suppose that \(i<h^{\prime }\).

By induction, \(\bar{s}^{(i)}=(\bar{s}_W^{(i)},\bar{s}_B^{(i)}):=(\tilde{s}_W^{(i)},\tilde{s}_B^{(i)})[V^{(i)}{\setminus } U^{(i)}]\) is relatively \(\varepsilon _{i+1}\)-optimal in \(\mathcal {G}^{(i+1)}=\tilde{\mathcal {G}}^{(i)}\). Recall that the game \(\tilde{\mathcal {G}}^{(i)}\) is obtained from \(\bar{\mathcal {G}}^{(i)}:=\mathcal {G}^{(i)}[V^{(i)}{\setminus } U^{(i)}]\) by reducing the rewards according to step 9. Thus, Lemma 3 yields that \(\mu _v(\bar{\mathcal {G}}^{(i)})=\mu _v(\tilde{\mathcal {G}}^{(i)})\), and hence,

$$\begin{aligned} \max _{s_W^{\prime }\in S_W^{(i+1)}}\mu _v(\bar{\mathcal {G}}^{(i)}(s_W^{\prime },\bar{s}_B^{(i)}))\le (1+\varepsilon _{i+1})\mu _v(\bar{\mathcal {G}}^{(i)})\end{aligned}$$
(13)
$$\begin{aligned} \min _{s_B^{\prime }\in S_B^{(i+1)}}\mu _v(\bar{\mathcal {G}}^{(i)}(\bar{s}^{(i)}_W,s_B^{\prime }))\ge (1-\varepsilon _{i+1})\mu _v(\bar{\mathcal {G}}^{(i)}). \end{aligned}$$
(14)


Proof of (11): Consider an arbitrary strategy \(s_W\in S_W^{(i)}\) for White. Suppose first that \(w\in U^{(i)}.\) Note that, by Claim 1(iii), \(\tilde{s}_B^{(i)}(u)\in U^{(i)}\) for all \(u\in V_B\cap U^{(i)}.\) If also \(s_W(u)\in U^{(i)}\) for all \(u\in V_W\cap U^{(i)}\) such that u is reachable from w in the graph \(G(s_W,\tilde{s}_B^{(i)})\), then Claim 2 implies \(\mu _w(\mathcal {G}^{(i)}(s_W,\tilde{s}_B^{(i)}))\le (1+\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\le (1+\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\).

Suppose therefore that \(v=s_W(u)\not \in U^{(i)}\) for some \(u\in V_W\cap U^{(i)}\) such that u is reachable from w in the graph \(G(s_W,\tilde{s}_B^{(i)})\).

Note that \(\tilde{s}^{(i)}_B(v^{\prime })\in V^{(i)}{\setminus } U^{(i)}\) for all \(v^{\prime }\in V_B^{(i)}{\setminus } U^{(i)}\), and by Claim 1(i’), \(S_W^{(i+1)}\) is the restriction of \(S_W^{(i)}\) to \(V^{(i)}{\setminus } U^{(i)}\). Thus, we get the following series of inequalities:

$$\begin{aligned} \mu _w\left( \mathcal {G}^{(i)}\left( s_W,\tilde{s}_B^{(i)}\right) \right)&= \mu _v\left( \mathcal {G}^{(i)}\left( s_W,\tilde{s}_B^{(i)}\right) \right) \le (1+\varepsilon _{i+1})\mu _v\left( \mathcal {G}^{(i)}\left[ V^{(i)}{\setminus } U^{(i)}\right] \right) \end{aligned}$$
(15)
$$\begin{aligned}&\le (1+\varepsilon _{i+1})(1+2\varepsilon ^{\prime })\mu _v\left( \mathcal {G}^{(i)}\right) \\&< (1+\varepsilon _{i+1})(1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })\mu _w\left( \mathcal {G}^{(i)}\right) \le (1+\varepsilon _i)\mu _w\left( \mathcal {G}^{(i)}\right) . \end{aligned}$$
(16)

The equality holds since v is reachable from w in the graph \(G(s_W,\tilde{s}_B^{(i)})\); the first inequality holds by (13); the second inequality holds because of (10); the third one follows from Claim 3; the fourth inequality holds since \((1+\varepsilon _{i+1})(1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })\le (1+\varepsilon _i)\).

If \(w\in V^{(i)}{\setminus } U^{(i)},\) then an argument similar to (15) and (16) shows that \(\mu _w(\mathcal {G}^{(i)}(s_W,\tilde{s}_B^{(i)}))\le (1+\varepsilon _{i+1})(1+2\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\le (1+\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\). Thus, (11) follows.

Proof of (12): Consider an arbitrary strategy \(s_B\in S_B^{(i)}\) for Black. If \(w\in U^{(i)},\) then we have \(\mu _w(\mathcal {G}^{(i)}(\tilde{s}_W^{(i)},s_B))\ge (1-\varepsilon ^{\prime })\mu _w(\mathcal {G}^{(i)})\ge (1-\varepsilon _i)\mu _w(\mathcal {G}^{(i)})\) by Claim 1(i)–(iii) and the fact that \(\varepsilon _i\ge \varepsilon ^{\prime }\).

Suppose now that \(w\in V^{(i)}{\setminus } U^{(i)}\). If \(s_B(v)\in V^{(i)}{\setminus } U^{(i)}\) for all \(v\in V_B^{(i)}{\setminus } U^{(i)}\), then we get by (14) and (10) that \(\mu _w(\mathcal {G}^{(i)}(\tilde{s}^{(i)}_W,s_B))\ge (1-\varepsilon _{i+1})\mu _w(\mathcal {G}^{(i)})\ge (1-\varepsilon _{i})\mu _w(\mathcal {G}^{(i)})\). A similar situation holds if \(s_B(v)\in V^{(i)}{\setminus } U^{(i)}\) for all \(v\in V_B^{(i)}{\setminus } U^{(i)}\) such that v is reachable from w in the graph \(G(\tilde{s}_W^{(i)},s_B)\). So it remains to consider the case when there is a \(v\in V_B^{(i)}{\setminus } U^{(i)}\) such that \(u=s_B(v)\in U^{(i)}\), and v is reachable from w in the graph \(G(\tilde{s}_W^{(i)},s_B)\). Since Black has no escape from \(U^{(i)}\) in this case [by Claim 1(i)], Claims 2 and 3 yield

$$\begin{aligned} \mu _w\left( \mathcal {G}^{(i)}\left( \tilde{s}^{(i)}_W,s_B\right) \right)&=\mu _u\left( \mathcal {G}^{(i)}\left( \tilde{s}^{(i)}_W,s_B\right) \right) \ge (1-\varepsilon ^{\prime })\mu _u\left( \mathcal {G}^{(i)}\right) \\&>(1-\varepsilon ^{\prime })^2 \mu _w\left( \mathcal {G}^{(i)}\right) \ge (1-\varepsilon _i) \mu _w\left( \mathcal {G}^{(i)}\right) , \end{aligned}$$

where the last inequality follows from the fact that, for all \(i=1,\ldots ,h^{\prime }-1\), \( 1+\varepsilon _i\ge (1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })^2\ge (1+\varepsilon ^{\prime })^3, \) and hence, \(1-\varepsilon _i\le 2-(1+\varepsilon ^{\prime })^3\le (1-\varepsilon ^{\prime })^2\); indeed, \((1-\varepsilon ^{\prime })^2-\bigl (2-(1+\varepsilon ^{\prime })^3\bigr )=\varepsilon ^{\prime }+4(\varepsilon ^{\prime })^2+(\varepsilon ^{\prime })^3\ge 0.\)

\(\square \)

Finally, to finish the proof of Lemma 7, we set the \(\varepsilon _i\)’s and \(\varepsilon ^{\prime }\) such that \(\varepsilon _1=\bigl ((1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })\bigr )^{h^{\prime }-1}(1+\varepsilon ^{\prime })-1\le \varepsilon \). Note that our choice of \(\varepsilon ^{\prime }=\frac{\ln (1+\varepsilon )}{3h}\) satisfies this as

$$\begin{aligned} \bigl ((1+2\varepsilon ^{\prime })(1+\varepsilon ^{\prime })\bigr )^{h^{\prime }-1}(1+\varepsilon ^{\prime }) &= \frac{(1+2\varepsilon ^{\prime })^{h^{\prime }}(1+\varepsilon ^{\prime })^{h^{\prime }}}{(1+2\varepsilon ^{\prime })}\\ &\le \frac{e^{3h^{\prime }\varepsilon ^{\prime }}}{(1+2\varepsilon ^{\prime })} \le \frac{(1+\varepsilon )}{(1+2\varepsilon ^{\prime })}\le (1+\varepsilon ). \end{aligned}$$
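As an illustrative numerical sanity check (not part of the proof), the following short Python snippet evaluates \(\varepsilon _1\) for the choice \(\varepsilon ^{\prime }=\ln (1+\varepsilon )/(3h)\), taking the worst case \(h^{\prime }=h\), and confirms \(\varepsilon _1\le \varepsilon \) for a few hypothetical parameter values; the function name and the chosen values are our own.

    import math

    def epsilon_one(eps, h, h_prime=None):
        # eps_1 = ((1 + 2e')(1 + e'))**(h' - 1) * (1 + e') - 1 with e' = ln(1 + eps) / (3h)
        h_prime = h if h_prime is None else h_prime  # h' <= h; worst case by default
        e = math.log(1.0 + eps) / (3.0 * h)
        return ((1 + 2 * e) * (1 + e)) ** (h_prime - 1) * (1 + e) - 1

    for eps in (0.5, 0.1, 0.01):
        for h in (2, 10, 100):
            e1 = epsilon_one(eps, h)
            assert e1 <= eps
            print(f"eps={eps}, h={h}: eps_1={e1:.6f}")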

3 Concluding Remarks

In this paper, we have shown that the game values of stochastic mean payoff games with perfect information and a constant number of random positions admit approximation schemes, provided that the class of games at hand can be solved in pseudo-polynomial time.

To conclude this paper, let us raise a number of open questions:

  1.

    First, in the conference version of this paper [2], we claimed that, up to some technical requirements, a pseudo-polynomial algorithm for a class of stochastic mean payoff games implies that this class has polynomial smoothed complexity. (Smoothed analysis is a paradigm for analyzing algorithms with poor worst-case but good practical performance; since its invention, it has been applied to a variety of algorithms and problems to explain their performance or complexity, respectively [31, 38].)

    However, the proof of this result is flawed; specifically, the flaw lies in the proof of a lemma that is not contained in the proceedings version but only in the accompanying technical report (Oberwolfach Preprints, OWP 2010-22, Lemma 4.3). The reason for this is relatively simple: if we are just looking for an optimal solution, then we can show that the second-best solution is significantly worse than the best solution. For two-player games, where one player maximizes and the other player minimizes, we have an optimization problem for either player, given an optimal strategy of the other player. However, the optimal strategy of the other player depends on the random rewards of the edges, so the two strategies are dependent. As a consequence, we cannot use the full randomness of the rewards in an isolation lemma to compare the best and second-best responses to the optimal strategy of the other player.

    Therefore, the question of whether stochastic mean payoff games have polynomial smoothed complexity remains open.

  2.

    In Sect. 2.3 we gave an approximation scheme that relatively approximates the value of a BWR-game from any starting position. If we apply this algorithm from different positions, we are likely to get two different relatively \(\varepsilon \)-optimal strategies. In Sect. 2.4 we have shown that a modification of the algorithm in Sect. 2.3 yields uniformly relatively \(\varepsilon \)-optimal strategies when there are no random positions. It remains an interesting question whether this can be extended to BWR-games with a constant number of random positions.

  3.

    Is it true that pseudo-polynomial solvability of a class of stochastic mean payoff games implies polynomial smoothed complexity? In particular, do mean payoff games have polynomial smoothed complexity?

  4.

    Related to Question 3: Is it possible to prove an isolation lemma for (classes of) stochastic mean payoff games? We believe that this is not possible and that different techniques are required to prove polynomial smoothed complexity of these games.

  5.

    While stochastic mean payoff games include parity games as a special case, the probabilistic model that we used here does not make sense for parity games. However, parity games can be solved in quasi-polynomial time [8]. One wonders if they also have polynomial smoothed complexity under a reasonable probabilistic model.

  6.

    Finally, let us remark that removing the assumption that k is constant in the above results remains a challenging open problem that seems to require totally new ideas. Another interesting question is whether stochastic mean payoff games with perfect information can be solved in pseudo-polynomial time when parameterized by the number k of random positions.