On Completely Mixed Stochastic Games

In this paper, we consider two-person finite state stochastic games with a finite number of pure actions for both players in all states. In particular, for a large number of results we also assume one-player controlled transition probabilities and show that if all the optimal strategies of the undiscounted stochastic game are completely mixed, then for β sufficiently close to 1, all the optimal strategies of the β-discounted stochastic games are also completely mixed. A counterexample is provided to show that the converse is not true. Further, for single-player controlled completely mixed stochastic games, if the individual payoff matrices are symmetric in each state, then we show that the individual matrix games are also completely mixed. For the non-zerosum single-player controlled stochastic game, under some non-singularity conditions, we show that if the undiscounted game is completely mixed then the Nash equilibrium is unique. For non-zerosum β-discounted stochastic games, when a Nash equilibrium exists, we provide equalizer rules for the corresponding value of the game.


Introduction
Zero-sum stochastic games were first introduced by Shapley ([14]), who showed the existence of the value and of stationary optimal strategies for zero-sum β-discounted stochastic games. The zero-sum undiscounted stochastic game was first introduced by Gillette ([8]), who showed that, unlike β-discounted stochastic games, undiscounted games may not possess stationary optimal strategies. Blackwell and Ferguson ([2]) studied Gillette's example and showed that although Gillette's game does not possess stationary optimal strategies for both players, it has an ε-optimal behavioral strategy for player-1 and a stationary optimal strategy for player-2. In the same year Blackwell and Ferguson ([2]) provided an example of an undiscounted stochastic game which does not possess a stationary optimal strategy for one of the players. Later Filar ([5]) introduced the completely mixed stochastic game (the completely mixed matrix game was defined earlier by Kaplansky ([10])). Kaplansky's paper characterizes completely mixed matrix games, and Shapley's construction of the 'Shapley matrix' ([14]) builds a connection between completely mixed matrix games and completely mixed β-discounted stochastic games. See, for example, the recent paper [1]. Filar [5] extended some of the results on completely mixed matrix games to completely mixed undiscounted stochastic games under the assumption of single player controlled transition probabilities.
Nonzero-sum versions of Shapley's stochastic games [14] with the discounted payoff criterion were first studied by Fink ([7]) and Takahashi ([16]). The theory of nonzero-sum stochastic games with average payoffs per unit time was first introduced by Rogers ([13]) and later formally considered by Sobel ([15]). They considered finite state spaces only and assumed that the transition probability matrices induced by any stationary strategies of the players are unichain. Parthasarathy and Raghavan ([12]) first considered the single player controlled stochastic game and showed the existence of a stationary equilibrium for the nonzerosum game under the assumption of one player controlled transitions. A constructive proof of their results is given in Nowak and Raghavan ([11]).
In this paper, we show that under the assumption of single player controlled transitions, if the zero-sum undiscounted stochastic game is completely mixed then the β-discounted stochastic games are completely mixed for all β sufficiently close to 1. A counterexample to the converse of this result is also provided. We also provide a condition under which the individual matrix games are completely mixed. For the non-zerosum stochastic game, we show that under a mild assumption on the individual payoff matrices the undiscounted completely mixed stochastic game possesses a unique Nash equilibrium. We also show that, for both undiscounted and β-discounted games, if some Nash equilibrium (NE) of the stochastic game is completely mixed then the NE follows an equalizer rule. Some examples are provided to support our results. The paper ends with some open problems which are left unanswered.

Definition and preliminaries
In a finite state, finite action, two-player stochastic game, the game is played between two players (player-1 and player-2) in stages, one stage per day. Each day the game is in a specific state s, both players play a matrix game R(s) (associated with the state s), and each receives a reward, which can be positive, negative, or zero (and which adds up to zero in the zero-sum case). The game then moves to a new state on the next day and continues indefinitely.
Throughout the rest of the paper, unless mentioned otherwise, a game means a two-player stochastic game. Definition 2.1 (Two-person finite stochastic game). A two-person finite stochastic game can be thought of as a 6-tuple G = (S, A_1, A_2, r_1, r_2, q). The game is played between player-1 and player-2. S is the set of all states of the stochastic game; as the state space is finite, without loss of generality S = {s_1, s_2, · · · , s_K}. A_1 and A_2 are the sets of pure actions available to player-1 and player-2 respectively; that is, in state s ∈ S player i, for i ∈ {1, 2}, has the finite set of pure actions A_i(s) = {1, 2, · · · , m_i}. r_1 and r_2 are the reward functions of the respective players and q is the transition probability function. If in state s ∈ S player-1 and player-2 choose pure actions i and j respectively, then the payoff to player-1 on that day is r_1(s, i, j) and the payoff to player-2 is r_2(s, i, j). With probability q(s′|s, i, j) the game moves to state s′ on the next day.
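As a concrete illustration, the 6-tuple G = (S, A_1, A_2, r_1, r_2, q) can be encoded directly as data. The following minimal sketch uses a hypothetical two-state, player-2 controlled game; all numerical values are illustrative and are not taken from this paper.

```python
# Hypothetical encoding of G = (S, A1, A2, r1, r2, q) for a toy two-state game.
import numpy as np

S = [0, 1]                                   # state space (two states)
A1 = {0: 2, 1: 2}                            # pure actions of player-1 per state
A2 = {0: 2, 1: 2}                            # pure actions of player-2 per state

# r1[s][i][j]: immediate reward to player-1 in state s under actions (i, j)
r1 = {0: np.array([[1.0, -1.0], [-1.0, 1.0]]),
      1: np.array([[2.0, 0.0], [0.0, 1.0]])}
r2 = {s: -r1[s] for s in S}                  # zero-sum: r2 = -r1

# q[s][i][j][s']: probability of moving to s'.  Here player-2 controlled:
# the transition depends only on the column j, not on player-1's row i.
q = {0: np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[1.0, 0.0], [0.0, 1.0]]]),
     1: np.array([[[0.0, 1.0], [0.0, 1.0]],
                  [[0.0, 1.0], [0.0, 1.0]]])}

# sanity check: every transition distribution sums to 1
for s in S:
    assert np.allclose(q[s].sum(axis=-1), 1.0)
```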
For the zero-sum stochastic game, r_2 = −r_1 (for convenience assume r_1 = r). So the total gain of player-1 is precisely the total loss of player-2 on any given day and vice versa; hence the name zero-sum game.
The payoff matrices for player-1 and player-2 in state s ∈ S are denoted R^1(s) = (r_1(s, i, j))_{m_1 × m_2} and R^2(s) = (r_2(s, i, j))_{m_1 × m_2} respectively; for the zero-sum game we write R(s) = R^1(s).
In general a player's strategy can depend on the whole history up to the current day, but we will only consider strategies that do not depend on the past history (Markov strategies): if the game is in state s_0 ∈ S today, then a player's strategy is exactly the same for every past history leading to s_0. Such a history-independent strategy is known as a stationary strategy.
Definition 2.2 (Stationary strategy). Denote by P_{A_k}, for k ∈ {1, 2}, the set of all probability distributions on player k's action space A_k. A stationary strategy of player k is then a function from the state space S to P_{A_k}. Definition 2.3 (Undiscounted payoffs for stochastic game). Denote by r_i^{(n)}(f, g, s_0) the expected immediate reward to player i on the n-th day if the game starts in state s_0 and player-1 and player-2 use stationary strategies f and g respectively.
If player-1 and player-2 play the strategies f and g respectively, then the undiscounted payoffs to player-1 and player-2 in state s_0 ∈ S are Φ_1(f, g)(s_0) and Φ_2(f, g)(s_0) respectively, where Φ_i(f, g)(s_0) = lim inf_{N→∞} (1/N) Σ_{n=1}^{N} r_i^{(n)}(f, g, s_0) for i ∈ {1, 2}. The 'undiscounted payoff' is also known as the limiting average payoff.
Definition 2.4 (β-discounted payoffs for general-sum stochastic game). If player-1 and player-2 play strategies f and g respectively, then for a discount factor β ∈ [0, 1) the β-discounted payoffs in state s_0 ∈ S to player-1 and player-2 are I^1_β(f, g)(s_0) and I^2_β(f, g)(s_0) respectively, where I^i_β(f, g)(s_0) = Σ_{n=1}^{∞} β^{n−1} r_i^{(n)}(f, g, s_0) and r_i^{(n)}(f, g, s_0), for i ∈ {1, 2}, is the expected immediate reward of player i on the n-th day in state s_0 when player-1 and player-2 play the strategies f and g respectively.
For the zero-sum stochastic game we have Definition 2.5 (Optimal strategy and value of the zero-sum stochastic game). A pair of stationary strategies (f_0, g_0) is said to be optimal in the zero-sum undiscounted stochastic game if Φ(f, g_0) ≤ Φ(f_0, g_0) ≤ Φ(f_0, g) coordinate-wise for all f ∈ P_{A_1} and g ∈ P_{A_2}, where Φ(f, g) = (Φ(f, g)(s_1), Φ(f, g)(s_2), · · · , Φ(f, g)(s_K))^t. The value of the undiscounted stochastic game in state s ∈ S is denoted v(s) = Φ(f_0, g_0)(s).
A pair of stationary strategies (f_0, g_0) is said to be optimal for the zero-sum β-discounted stochastic game if I_β(f, g_0) ≤ I_β(f_0, g_0) ≤ I_β(f_0, g) coordinate-wise for all f ∈ P_{A_1} and all g ∈ P_{A_2}, where I_β(f, g) = (I_β(f, g)(s_1), I_β(f, g)(s_2), · · · , I_β(f, g)(s_K))^t. The value of the β-discounted stochastic game in state s ∈ S is denoted v_β(s) = I_β(f_0, g_0)(s).
We call f_0 (respectively g_0) optimal for player-1 (respectively player-2) if the corresponding inequality holds for all g (respectively for all f) and all s ∈ S. An analogous definition holds for the undiscounted stochastic game. The value of a zero-sum stochastic game (both discounted and undiscounted) is always unique, whereas an optimal strategy may or may not be unique.
Definition 2.6 (Nash Equilibrium). A pair of stationary strategies (f_0, g_0) for player-1 and player-2 respectively is said to be a Nash Equilibrium (NE) of the non-zerosum undiscounted stochastic game if Φ^1(f_0, g_0) ≥ Φ^1(f, g_0) and Φ^2(f_0, g_0) ≥ Φ^2(f_0, g) coordinate-wise for all f ∈ P_{A_1} and g ∈ P_{A_2} (assuming both player-1 and player-2 want to maximize their expected payoffs). The value of the game associated with the NE (f_0, g_0) in state s ∈ S is given by v^k_{f_0,g_0}(s) = Φ^k(f_0, g_0)(s) for k ∈ {1, 2}. For non-zerosum undiscounted stochastic games the value of the game is not unique; it changes with the NE.
A pair of strategies (f_0, g_0) for player-1 and player-2 respectively is said to be a Nash Equilibrium (NE) of the β-discounted stochastic game if I^1_β(f_0, g_0) ≥ I^1_β(f, g_0) coordinate-wise for all f ∈ P_{A_1} and I^2_β(f_0, g_0) ≥ I^2_β(f_0, g) coordinate-wise for all g ∈ P_{A_2}, where I^k_β(f, g) = (I^k_β(f, g)(s_1), I^k_β(f, g)(s_2), · · · , I^k_β(f, g)(s_K))^t for k ∈ {1, 2} (assuming both players want to maximize their expected payoffs). The value of the β-discounted game associated with the NE (f_0, g_0) in state s ∈ S is given by v^k_{β, f_0, g_0}(s) = I^k_β(f_0, g_0)(s) for k ∈ {1, 2}. For non-zerosum β-discounted stochastic games the value of the game is not unique; it changes with the NE. Definition 2.7 (Single player controlled stochastic game ([12])). A stochastic game is said to be single player controlled if the transition probability is controlled by one player alone. For a player-2 controlled stochastic game we have q(s′|s, i, j) = q(s′|s, j) for all s, s′ ∈ S, every pure action i of player-1 and every pure action j of player-2.
Throughout the rest of the paper, unless mentioned otherwise, we consider player-2 controlled stochastic games. A strategy is said to be completely mixed if each coordinate of the strategy is strictly positive, that is, if each pure action is played with strictly positive probability.
Definition 2.8 (Completely mixed stochastic game ([5])). A stochastic game is said to be completely mixed if every optimal strategy (Nash equilibrium) of each player is completely mixed; that is, in each state s ∈ S every pure action i of player-1 and every pure action j of player-2 is played with strictly positive probability.
The transition probability matrix Q(f, g) induced by stationary strategies f of player-1 and g of player-2 is defined as Q(f, g) = (q(s′|s, f, g))_{K×K}.
Under the assumption of player-2 control, Q(f, g) = Q(g). For a stationary strategy pair (or Nash equilibrium) (f, g) the reward vector is defined as r(f, g) = (r(s_1, f, g), r(s_2, f, g), · · · , r(s_K, f, g))^t, where r(s, f, g) = f(s)^t R(s) g(s). The discounted and undiscounted payoffs of the pair (f, g) for player-1 and player-2 in state s ∈ S can now be written as I_β(f, g) = [I − βQ(g)]^{−1} r(f, g) and Φ(f, g) = Q*(g) r(f, g), where Q*(g) = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} Q(g)^n is the Cesàro-limit (stationary) matrix of Q(g). The notation {·}_s represents the s-th coordinate of the respective vector. Definition 2.9 (Uniform Discount Optimal). A strategy g_0 of player-2 is said to be uniform discount optimal if g_0 is optimal for player-2 in the undiscounted stochastic game Γ as well as in the β-discounted game Γ_β for all β close to 1. The definition for player-1 is analogous. Definition 2.10 (Uniform Discount Equilibrium). A Nash equilibrium pair (f_0, g_0) for player-1 and player-2 respectively is said to be a uniform discount equilibrium if (f_0, g_0) is a Nash equilibrium of the undiscounted stochastic game Γ and (f_β, g_0) is a Nash equilibrium of the β-discounted game Γ_β for all β close to 1. Similarly we can define it for player-1. Definition 2.11 (Auxiliary Game). For a β-discounted zero-sum stochastic game the auxiliary game in state s ∈ S is defined by the matrix R_β(s) whose (i, j)-th entry is r(s, i, j) + β Σ_{s′∈S} q(s′|s, i, j) v_β(s′), where v_β(s′) is the value of the β-discounted stochastic game in state s′ ∈ S. R_β(s) is called the auxiliary game at state s (or starting at state s); the matrix R_β(s) is also known as the Shapley matrix ([14]).
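The auxiliary game also suggests a computational sketch: iterating v ↦ (val R_β(s))_{s∈S} is Shapley's original contraction argument for the β-discounted value. Below is a minimal sketch under the assumption of a hypothetical two-state, player-2 controlled game; the data, the LP-based matrix game solver, and the function names are illustrative, not from the paper.

```python
# Sketch of Shapley's iteration on the auxiliary (Shapley) matrices.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via LP."""
    shift = 1.0 - M.min()                    # make all entries positive
    Ms = M + shift
    m, n = Ms.shape
    # min sum(u) s.t. Ms^T u >= 1, u >= 0; then val(Ms) = 1 / sum(u)
    res = linprog(c=np.ones(m), A_ub=-Ms.T, b_ub=-np.ones(n))
    return 1.0 / res.x.sum() - shift

def shapley_value(R, q, beta, tol=1e-8, max_iter=500):
    """Iterate v -> (val R_beta(s))_s until v_beta is (numerically) a fixed point."""
    K = len(R)
    v = np.zeros(K)
    for _ in range(max_iter):
        # auxiliary game at s: r(s,i,j) + beta * sum_{s'} q(s'|s,i,j) v(s')
        v_new = np.array([matrix_game_value(R[s] + beta * q[s] @ v)
                          for s in range(K)])
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new

# toy data: q[s][i][j][s'] depends only on player-2's column j
R = [np.array([[1.0, -1.0], [-1.0, 1.0]]),
     np.array([[2.0, 0.0], [0.0, 1.0]])]
q = [np.array([[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]),
     np.array([[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]])]

v_beta = shapley_value(R, q, beta=0.5)
print(v_beta)                                # → approximately [0.4444 1.3333]
```

For this toy game the fixed point can be verified by hand: state s_2's column 2 is absorbing into s_2, giving v_β(s_2) = (2/3)/(1 − β) = 4/3 and then v_β(s_1) = 4/9 at β = 1/2.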
The following results will be used to prove our results.
Result 2.12 (Theorem 1, page 475, [10]). Consider a two-person zero-sum matrix game with payoff matrix M ∈ R^{m×n}. Suppose player-2 has a completely mixed optimal strategy y′. Then for any optimal strategy x′ of player-1 we have Σ_{i=1}^{m} x′_i m_{ij} = v for every column j, where v is the value of the matrix game and m_{ij} is the (i, j)-th entry of the matrix M.
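Result 2.12 can be checked numerically: when the opponent has a completely mixed optimal strategy, every optimal strategy of a player equalizes the opponent's pure payoffs. The sketch below assumes the hypothetical matrix M = [[1, −1], [−1, 1]] (completely mixed, value 0, unique optimal strategies (1/2, 1/2)) and an LP-based solver; none of this data comes from the paper.

```python
# Numerical check of the equalizer property (Result 2.12) on a toy game.
import numpy as np
from scipy.optimize import linprog

def solve_game(M):
    """Value and a maximin strategy for the row player of M, via LP."""
    shift = 1.0 - M.min()
    Ms = M + shift
    res = linprog(c=np.ones(Ms.shape[0]), A_ub=-Ms.T, b_ub=-np.ones(Ms.shape[1]))
    v = 1.0 / res.x.sum()
    return v - shift, res.x * v

M = np.array([[1.0, -1.0], [-1.0, 1.0]])
v, x = solve_game(M)             # player-1's optimal strategy
_, y = solve_game(-M.T)          # player-2's optimal = row player of -M^T

print(v, x, y)                   # → approximately 0.0, [0.5 0.5], [0.5 0.5]
print(x @ M)                     # equalizer: every column of M pays exactly v
```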
Result 2.13 ([12]). For a player-2 controlled stochastic game (zero-sum or non-zerosum), if (f_β, g_β) is a Nash equilibrium pair of the β-discounted stochastic game, then lim_{β↑1} (1 − β) I_β(f_β, g_β)(s) = v(s) for every s ∈ S. Here g_β = (g_β(s_1), · · · , g_β(s_K)) and K is the number of states.

Zero-sum Games
Unless mentioned otherwise, throughout Section 3 player-1 is the maximizing player and player-2 is the minimizing player. The following lemma is required to prove the main theorem.
Lemma 3.1. Assume player-2 controlled transition. Suppose there exist β_0 ∈ [0, 1) and a completely mixed stationary strategy g_0 such that g_0 is optimal for player-2 (the minimizer) in every β-discounted stochastic game with β > β_0. Let β_n ∈ [β_0, 1) be such that β_n ↑ 1, and let f_n be optimal for player-1 (the maximizer) in the β_n-discounted stochastic game. Suppose f_n → f_0 coordinate-wise, that is, f_n(s) → f_0(s) for each state s ∈ S. Then f_0 is optimal for player-1 in the undiscounted stochastic game.
Proof. Under one player controlled transition probabilities, an undiscounted stochastic game has a value and both players have stationary optimal strategies ([12]). Suppose f_n is optimal for player-1 in the β_n-discounted stochastic game. Since by assumption g_0 is a completely mixed optimal strategy for player-2 in the β_n-discounted game, from Result 2.12 and Result 2.14 we get I_{β_n}(f_n, g) = v_{β_n} for every stationary strategy g of player-2, where v_{β_n} = (v_{β_n}(s_1), · · · , v_{β_n}(s_K))^t.
Therefore r(f_n, g) = [I − β_n Q(g)] v_{β_n}. Also, f_n → f_0 coordinate-wise and the reward r(·, g) is a continuous function of player-1's strategy. Hence, for any given ε > 0 there exists N_0 ∈ ℕ such that for all n ≥ N_0, r(f_0, g) ≥ r(f_n, g) − εe coordinate-wise, where e is the column vector of all 1's of suitable length. Therefore [I − β_n Q(g)]^{−1} r(f_0, g) ≥ v_{β_n} − ε[I − β_n Q(g)]^{−1} e. As (1 − β_n) is positive for all β_n ∈ [0, 1), we may multiply both sides by (1 − β_n).
Since Q(g)^k is a stochastic matrix for each k, we have [I − β_n Q(g)]^{−1} e = Σ_{k=0}^{∞} β_n^k Q(g)^k e = e/(1 − β_n). Therefore the above inequality reduces to (1 − β_n)[I − β_n Q(g)]^{−1} r(f_0, g) ≥ (1 − β_n) v_{β_n} − εe.
Now let β_n ↑ 1. Using Result 2.13, the inequality becomes Φ(f_0, g) ≥ v − εe, where v = (v(s_1), · · · , v(s_K))^t is the value of the undiscounted stochastic game. The argument is valid for every ε > 0 and every stationary strategy g of player-2. Hence f_0 is an optimal strategy for player-1 in the undiscounted stochastic game Γ.
Theorem 3.2. Assume player-2 controlled transition. If the undiscounted stochastic game is completely mixed, then the β-discounted stochastic games are completely mixed for all β sufficiently close to 1.
Proof. Assume, for contradiction, that the undiscounted stochastic game is completely mixed but there does not exist any β_0 ∈ [0, 1) such that the β-discounted games are completely mixed for all β > β_0.
Then for any given β_0 ∈ [0, 1) we can find a β_1 ∈ (β_0, 1) such that there exists a non-completely mixed strategy f_1 which is optimal for the β_1-discounted stochastic game.
Similarly, by the above assumption we can find a β_2 ∈ (β_1, 1) such that there exists a non-completely mixed strategy f_2 which is optimal for the β_2-discounted stochastic game. Proceeding in this way we obtain a sequence β_n ↑ 1 such that each β_n-discounted game has a non-completely mixed optimal strategy f_n. But the stochastic game has finitely many states and finitely many pure actions. Hence there exist a state s̄ ∈ S and a pure action i of player-1 such that the i-th coordinate of f_n(s̄) is zero for infinitely many f_n. That is, we can find a subsequence f_{n_k} of f_n, a fixed state s̄ ∈ S and a fixed pure action i, such that the pure action i in state s̄ is played with probability zero in each of the optimal strategies f_{n_k}. By passing to a further subsequence we may assume f_{n_k} converges; by Lemma 3.1 its limit (denote it f_0) is an optimal strategy for player-1 in the undiscounted game. However f_0 is not completely mixed, yet it is optimal in the undiscounted stochastic game. This is a contradiction. So there exists β_0 ∈ [0, 1) such that for all β > β_0 the β-discounted stochastic game Γ_β obtained from the same payoff matrices is completely mixed.
The converse of the above theorem is not true. Consider the following example of a finite player-2 controlled zero-sum stochastic game in which the β-discounted stochastic game is completely mixed for all β ∈ [0, 1) but the undiscounted stochastic game is not completely mixed. Note: s_2 is an absorbing state.
Now consider the undiscounted stochastic game Γ, and consider the strategy f with f(s_1) = . This f is a stationary optimal strategy for player-1 in the undiscounted stochastic game Γ but is not completely mixed. Hence the undiscounted game Γ is not completely mixed. Lemma 3.3. Let A = (a_{ij}) be a symmetric matrix of order n with a_{ij} > 0 for every i and j. Let b^t = (b_1, b_2, · · · , b_n) be a non-negative vector, and let C = (c_{ij}) where c_{ij} = a_{ij} + b_j. If C is a completely mixed matrix game, then A is also a completely mixed matrix game.
Proof. Let y be a completely mixed optimal strategy for player-2 in C. Then Cy = ve, where e is the vector with all coordinates equal to 1 and v denotes the value of the matrix game C. Since (Cy)_i = (Ay)_i + b^t y for every i, it follows that Ay = (v − b^t y)e =: δe. Since A is symmetric, it follows that δ is the value of the matrix game A and (y, y) is a pair of optimal strategies for A.
To complete the proof we show that y is the only optimal strategy for both players in the matrix game A. Let z be any other optimal strategy for player-2 in A. Since y is a completely mixed optimal strategy for both players, Result 2.12 gives Az = δe, and we also have Ay = δe. Thus A(y − z) is the zero vector. Since A is non-singular, y = z. In other words, every optimal strategy for player-2 in A coincides with y. A similar argument holds for player-1 in A. This completes the proof.
A similar proof appears in [1]. We now give a simple counterexample showing that if A is a completely mixed matrix game, then C need not be a completely mixed matrix game.
Here A is a completely mixed matrix game but C is not, so the converse of Lemma 3.3 is not true. Remark: Lemma 3.3 also holds under each of the following relaxations. 1. A pair of completely mixed optimal strategies exists for the two players in C, instead of C being a completely mixed matrix game. 2. A is symmetric, non-negative and irreducible, instead of A being a strictly positive matrix.
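Lemma 3.3 and the failure of its converse can be illustrated numerically on hypothetical 2×2 matrices (the paper's own example matrices are different); recall that a 2×2 game is completely mixed exactly when it has no pure saddle point.

```python
# Numerical illustration of Lemma 3.3 and its failed converse (toy matrices).
import numpy as np

def has_pure_saddle(M):
    """maximin == minimax over pure strategies iff a pure saddle point exists;
    a 2x2 game is completely mixed iff it has no pure saddle point."""
    return M.min(axis=1).max() == M.max(axis=0).min()

A = np.array([[1.0, 2.0], [2.0, 1.0]])       # symmetric, strictly positive
b = np.array([1.0, 1.0])
C1 = A + b                                   # c_ij = a_ij + b_j (broadcast over columns)
# C1 = [[2,3],[3,2]] is completely mixed, and so is A (as Lemma 3.3 predicts):
print(has_pure_saddle(C1), has_pure_saddle(A))   # → False False

b2 = np.array([1.0, 0.0])
C2 = A + b2                                  # C2 = [[2,2],[3,1]]
# A is completely mixed but C2 has a pure saddle point: the converse fails.
print(has_pure_saddle(C2))                   # → True
```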
Undiscounted completely mixed stochastic games not only yield completely mixed β-discounted stochastic games for β large enough (see Theorem 3.2) but also yield completely mixed individual matrix games under a symmetry assumption. Examples of completely mixed undiscounted stochastic games are far from obvious. The following is a finite single player controlled completely mixed stochastic game. Player-2's unique optimal strategy is g = {( }. So (f, g) is the unique optimal strategy pair for the undiscounted game mentioned above.
In Example 3 we can see that for all β ∈ [0, 1) the β-discounted stochastic games Γ_β are completely mixed. As the payoff matrices in both s_1 and s_2 are symmetric, Theorem 3.4 says that the individual matrix games are completely mixed, which is easy to see in the above example. Similar to Theorem 3.2, if we know that the value of an undiscounted stochastic game is non-zero, then we can conclude that the values of the β-discounted games are also non-zero for β close to 1.
Theorem 3.5. Assume a player-2 controlled transition. Let v(s) and v_β(s) be the values of the undiscounted and β-discounted stochastic games in state s ∈ S respectively. If v(s) ≠ 0 for some s ∈ S, then there exists β_0 ∈ [0, 1) such that for all β > β_0 the discounted value v_β(s) ≠ 0.
Note: The above theorem also holds for non-zerosum stochastic games; the proof is analogous, using the results of [12].
Note: The converse of Theorem 3.5 is not true, as the following example shows.
where y_0 is an optimal strategy for player-2 in the matrix game C. Moreover, player-1 has a unique optimal strategy in the matrix game A.
Proof. We omit the proof, as it is straightforward.
Under the symmetry assumption, the undiscounted value satisfies a system of linear equations.
Theorem 3.7. Let Γ be a player-2 controlled completely mixed undiscounted stochastic game, and suppose that for every s ∈ S the individual payoff matrices satisfy val(R(s)) = val(R(s)^t), where val(·) denotes the value of the corresponding matrix game. Then for g_0, the unique optimal strategy of player-2 in the undiscounted game Γ, the following equality holds.
Proof. Denote by R_β(s) the Shapley matrix in state s of the β-discounted stochastic game Γ_β with the same payoff matrices R(s) as Γ.
From Theorem 3.2 we know that the β-discounted stochastic game Γ_β with the same payoff matrices is completely mixed for all β > β_0. This implies that for every s ∈ S and every β sufficiently close to 1 (say β > β_0) the Shapley matrix R_β(s) is completely mixed. Let (f_0, g_0) be the unique optimal strategy pair ([5]) of the undiscounted game. We also have that g_0 is a uniform discount optimal strategy for player-2 in the game Γ. Without loss of generality assume R(s) > 0 for all s ∈ S. Taking A = R(s) and C = R_β(s) in Lemma 3.6 gives the following, where g_0(s)_j is the j-th coordinate of g_0(s).
From this it follows that: Multiplying both sides by (1 − β) we obtain: The above equality holds for all β ∈ [0, 1). Taking the limit as β ↑ 1 we have the following.
Remark 3.8. If Γ is an undiscounted non-zerosum single player (player-2) controlled completely mixed stochastic game, then every strictly positive strategy of player-2 partitions the state space S of the undiscounted game Γ into k ergodic classes C_1, C_2, · · · , C_k and a set H of transient states. Moreover, the stochastic game Γ can be divided into k subgames Γ_1, Γ_2, · · · , Γ_k, where the subgame Γ_c, corresponding to the states C_c, has a value that is independent of the state. For the zero-sum game it is shown in [6] that if s ∈ H then player-1 and player-2 each have exactly one action in the state s.
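The matrix Q*(g) underlying this ergodic decomposition is the Cesàro-limit matrix; for a chain with a single ergodic class each of its rows equals the stationary distribution. A minimal numerical sketch, with an illustrative transition matrix not taken from the paper:

```python
# Approximate the Cesaro-limit matrix Q* = lim (1/N) sum_{k<N} P^k.
import numpy as np

def stationary_matrix(P, n_iter=20000):
    """Average the powers of P; converges to Q* for any stochastic P."""
    K = P.shape[0]
    avg, power = np.zeros((K, K)), np.eye(K)
    for _ in range(n_iter):
        avg += power
        power = power @ P
    return avg / n_iter

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                   # single ergodic class
Qstar = stationary_matrix(P)
print(Qstar)                                 # each row ≈ stationary dist (2/3, 1/3)
```

For a chain with several ergodic classes the rows of Q* differ across classes, which is exactly the partition into C_1, · · · , C_k described in Remark 3.8.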
Using Remark 3.8 and the fact that for a player-2 controlled stochastic game the value is a linear function of player-1's strategy, we can show that for an undiscounted stochastic game every completely mixed optimal strategy satisfies an equalizer rule in terms of the undiscounted value.
Theorem 3.9. Let Γ be a zero-sum undiscounted stochastic game in which player-2 controls the transition probability. If (f_0, g_0) is a completely mixed optimal strategy pair of the game Γ, then the game follows the equalizer rule.
Proof. Filar ([5]) proved Theorem 3.9 for the case in which the strategies f and g are restricted to the pure strategies of the respective players, under the completely mixed assumption on the game.
Since the stochastic game Γ is completely mixed, from Remark 3.8 we conclude that it is sufficient to consider the subgames Γ_c separately. Each subgame Γ_c, with states restricted to C_c, is completely mixed. Also, if (f_0, g_0) is optimal for the stochastic game Γ, then (f_0, g_0) restricted to the states C_c (denoted (f_0, g_0)_c) is an optimal strategy pair of the subgame Γ_c.
Fixing player-1's (unique) optimal strategy f_0^c, there exists a vector γ_0 which, together with the value v_c of the game Γ_c, satisfies the following equality ([9]): v_c + γ_0(s) = r_c(f_0, j, s) + Σ_{s′∈S} q(s′|s, j) γ_0(s′) for every pure strategy j of player-2 and every s ∈ C_c. Multiplying by g_j(s), this implies v_c g_j(s) + γ_0(s) g_j(s) = r_c(f_0, j, s) g_j(s) + Σ_{s′∈S} q(s′|s, j) γ_0(s′) g_j(s) for every pure strategy j of player-2, every s ∈ C_c and every g ∈ P_{A_2}. Summing over j, we obtain for all s ∈ C_c: v_c + γ_0(s) = r_c(f_0, g, s) + Σ_{s′∈S} q(s′|s, g) γ_0(s′).
Writing the above equality in vector notation we obtain v_c 1 + γ_0 = r_c(f_0, g) + Q(g) γ_0 for all g ∈ P_{A_2}. Multiplying both sides by the Markov matrix Q*(g) and using Q*(g) Q(g) = Q*(g), we get v_c 1 + Q*(g) γ_0 = Φ_c(f_0, g) + Q*(g) γ_0, whence Φ_c(f_0, g) = v_c 1 for every g ∈ P_{A_2}.
The above argument applies to each of the completely mixed subgames Γ_1, Γ_2, · · · , Γ_k. Now if s ∈ H (the set of transient states), let p_g(s, s′) be the probability that s′ is the first state reached outside H. Then for all s ∈ H, Φ(f_0, g)(s) = Σ_{s′∉H} p_g(s, s′) Φ(f_0, g)(s′). We already have Φ_c(f_0, g)(s) = Φ_c(f_0, g_0)(s) for s ∉ H. Hence the claim is proved. The other side of the equality follows directly from Proposition 3.3 of [5] and the fact that Φ is continuous in player-1's strategy.
Let Γ be a player-2 controlled zero-sum undiscounted stochastic game with individual payoff matrices R(s), s ∈ S, and let Γ_β be the corresponding β-discounted stochastic game with the same payoff matrices. Recall that a stationary strategy g_0 is uniform discount optimal for player-2 in the game Γ if there exists β_0 ∈ [0, 1) such that g_0 is optimal for player-2 in the corresponding β-discounted game Γ_β for all β > β_0.
Remark 3.10 ([12]). For the β-discounted stochastic game, for each s ∈ S there exists a non-singular submatrix Ṙ_β(s) of the Shapley matrix R_β(s) such that, if for all β > β_0 we define ḟ_β(s) = e^t Ṙ_β(s)^{−1} / (e^t Ṙ_β(s)^{−1} e) and ġ_β(s) = Ṙ_β(s)^{−1} e / (e^t Ṙ_β(s)^{−1} e), where e is the column vector of all 1's, then the pair (f_β, g_β) obtained from (ḟ_β(s), ġ_β(s)) by inserting zeros in the places corresponding to the rows/columns of R_β(s) not in Ṙ_β(s) forms an optimal stationary strategy pair of Γ_β for all β > β_0. Furthermore, for all s ∈ S it is also shown that Ṙ(s) is non-singular and ġ_{β_1}(s) = ġ_{β_2}(s) for all β_1, β_2 > β_0. Denote ġ_β(s) = ġ_0(s) for all β > β_0. Then g_0 = (g_0(s_1), g_0(s_2), · · · , g_0(s_K))^t, where g_0(s) is obtained by completing ġ_0(s) with 0's, is uniformly discount optimal for player-2 in Γ. It also turns out that (f_0, g_0), where f_0(s) = lim_{β↑1} f_β(s), is an optimal strategy pair of the undiscounted game Γ. A stochastic game is said to be CM-I (respectively CM-II) if every optimal strategy of player-1 (respectively player-2) is completely mixed; for undiscounted stochastic games this definition was first introduced by Filar ([4]). In matrix games, bimatrix games and discounted stochastic games, if both players have the same number of pure strategies then CM-I implies that the game is completely mixed and hence CM-II; similarly CM-II implies that the game is completely mixed and hence CM-I. But for undiscounted stochastic games this is far from true.
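The formulas quoted in Remark 3.10 are of Shapley–Snow type and can be sanity-checked on a single non-singular completely mixed matrix. The matrix below is a hypothetical stand-in for Ṙ_β(s), not data from the paper: val = 1/(e^t A^{−1} e), x^t = val · e^t A^{−1}, y = val · A^{−1} e.

```python
# Sanity check of the Shapley-Snow-type formulas on one toy matrix.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])       # completely mixed, value 2
Ainv = np.linalg.inv(A)
e = np.ones(2)

val = 1.0 / (e @ Ainv @ e)                   # value of the matrix game
x = (e @ Ainv) * val                         # optimal strategy for player-1
y = (Ainv @ e) * val                         # optimal strategy for player-2

print(val, x, y)                             # → 2.0 [0.5 0.5] [0.5 0.5]
assert np.allclose(x @ A, val * e)           # equalizer on both sides
assert np.allclose(A @ y, val * e)
```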
Lemma 3.12. If an undiscounted player-2 controlled stochastic game is CM-I, then player-2 has a completely mixed optimal strategy in Γ, and there exists β_0 ∈ [0, 1) such that for all β > β_0 the β-discounted stochastic game Γ_β is completely mixed.
Proof. The proof goes exactly as in Theorem 3.2. Since Γ_β is completely mixed for β close to 1, the discount optimal strategy of player-2 is also completely mixed, and hence that strategy is a completely mixed optimal strategy for player-2 in the undiscounted stochastic game. Lemma 3.12 only says that the β-discounted stochastic games are completely mixed; whether the undiscounted stochastic game itself is completely mixed remains an open question.

Non-zerosum game
Unless mentioned otherwise, throughout Section 4 both player-1 and player-2 are maximizers. For the non-zerosum discounted stochastic game we have the following result, which is needed to prove the main result.
(ii) For each s ∈ S, the pair (f_0(s), g_0(s)) constitutes an equilibrium point in the static bimatrix game induced at s. For bimatrix games we have an equalizer rule for the payoff under a completely mixed assumption. Lemma 4.2. Let (A, B) be a bimatrix game. If (x_0, y_0) is an equilibrium point (EP) of the bimatrix game with y_0 completely mixed, then r_2(x_0, y) = r_2(x_0, y_0) for every mixed strategy y of player-2.
Proof. Under one player (player-2) controlled transition probabilities, an undiscounted non-zerosum stochastic game possesses a stationary Nash equilibrium (CITATION REQUIRED). Suppose (f_n, g_0) is a NE of the β_n-discounted stochastic game. Since g_0 is a completely mixed strategy of player-2 in the β_n-discounted stochastic game, from Theorem 4.1 and Lemma 4.2 we have I^2_{β_n}(f_n, g) ≡ v^2_{β_n} coordinate-wise for every stationary strategy g of player-2, where I^2_{β_n}(f_n, g) = (I^2_{β_n}(f_n, g)(s_1), · · · , I^2_{β_n}(f_n, g)(s_K))^t and v^2_{β_n} = (v^2_{β_n}(s_1), · · · , v^2_{β_n}(s_K))^t. Therefore [I − β_n Q(g)]^{−1} r_2(f_n, g) ≡ v^2_{β_n}. Now [I − β_n Q(g)]^{−1} = Σ_{k=0}^{∞} β_n^k Q(g)^k is a non-negative matrix, and hence r_2(f_n, g) = [I − β_n Q(g)] v^2_{β_n}.
Now f_n → f_0 point-wise and the reward function r_2(·, g) is continuous in player-1's strategy. Hence, for any given ε > 0 there exists N_0 such that r_2(f_0, g) ≥ r_2(f_n, g) − εe coordinate-wise for all n ≥ N_0, where e is the column vector of all 1's of suitable length. Therefore [I − β_n Q(g)]^{−1} r_2(f_0, g) ≥ v^2_{β_n} − ε[I − β_n Q(g)]^{−1} e. As (1 − β_n) is positive for all β_n ∈ [0, 1), we may multiply both sides by (1 − β_n); since Q(g)^k is a stochastic matrix for each k, [I − β_n Q(g)]^{−1} e = Σ_{k=0}^{∞} β_n^k Q(g)^k e = e/(1 − β_n), and the inequality reduces to (1 − β_n)[I − β_n Q(g)]^{−1} r_2(f_0, g) ≥ (1 − β_n) v^2_{β_n} − εe.
Now letting β_n ↑ 1 and using [12], the inequality becomes the following.
The following is an example of a non-zerosum completely mixed undiscounted stochastic game; it is the non-zerosum version of Example 3. It is a player-2 controlled stochastic game with states s_1 and s_2. In both states s_1 and s_2, if player-2 chooses action 1 (column 1) then the game moves to state s_1 on the next day, and if player-2 chooses action 2 (column 2) then the game moves to state s_2 on the next day.
The unique equilibrium strategy of player-1 in the above-mentioned game is {( )}.
Using 4.4 we can show that for an undiscounted stochastic game with a completely mixed NE, the game satisfies an equalizer rule for the undiscounted payoff. Unlike the zero-sum case (see Theorem 3.9), here the equalizer property holds for the payoffs of both players.
where s ∈ H, and we already have the corresponding equality for Φ^2. Proposition 4.9. Let Γ be a non-zerosum player-2 controlled undiscounted completely mixed stochastic game. If R^1(s) is non-singular for all s ∈ S, then T = {g | (f, g) is a NE in Γ for some f ∈ P_{A_1}} is a singleton.
Proof. The proof follows Filar's argument for zero-sum games ([5], Proposition 3.4). [12] (page 390) provides the existence of a uniform discount equilibrium in Γ. Let (f_0, g_0) be a NE in Γ with g_0 a uniform discount equilibrium strategy, and suppose there is some other NE (f*, g*) (note that f* may equal f_0). From Remark 3.8, the sets C_1, · · · , C_k and H are the same for g_0 and g*. Also, for every s ∈ H the number of actions available in s is exactly 1, so g_0(s) = g*(s) for all s ∈ H. It is therefore enough to consider the subgames Γ_1, · · · , Γ_k separately. Without loss of generality we consider only Γ_1 and assume that C_1 has S_1 states. Associated with a NE (f, g), the value v^k = v^k(s) is constant over s ∈ C_1 for k ∈ {1, 2}. The stationary matrices Q*(g_0) and Q*(g*) have identical rows u_0 = (u_0(1), · · · , u_0(S_1)) and u* = (u*(1), · · · , u*(S_1)) respectively, with u_0(s) > 0 and u*(s) > 0 for all s ∈ C_1. By the equalizer rule, for every pure stationary strategy σ of player-1 in Γ_1 and every s ∈ C_1, v^1_{(f_0,g_0)} = Φ^1(σ, g_0)(s) = [Q*(g_0) r_1(σ, g_0)]_s = Σ_{s′=1}^{S_1} Σ_{j=1}^{n_{s′}} r_1(σ, j, s′) u^0_j(s′), where u^0_j(s′) = u_0(s′) g^0_j(s′) for all j = 1, · · · , n_{s′} and s′ ∈ C_1. The same equality holds with g* in place of g_0 and u*_j(s′) = u*(s′) g*_j(s′) in place of u^0_j(s′). Now let z_j(s) = u^0_j(s) − u*_j(s) for all j = 1, · · · , n_s and s ∈ C_1. Subtracting, we obtain Σ_{s′=1}^{S_1} Σ_{j=1}^{n_{s′}} r_1(σ, j, s′) z_j(s′) = v^1_{(f_0,g_0)} − v^1_{(f*,g*)} = c for every pure stationary strategy σ of player-1 in Γ_1. Let z = (z_{s_1} : z_{s_2} : · · · : z_{S_1})^t be the column vector with z_s = (z_1(s), z_2(s), · · · , z_{n_s}(s)) for each s ∈ C_1, and let t_1 be the number of pure stationary strategies of player-1 in Γ_1. Fix σ(s) for each s > 1 and consider the n_1 equations extracted from the display above by letting σ(1) range over the n_1-dimensional unit basis vectors (since R^1(s) is square).
These are equivalent to a linear system in z. Now, if c ≠ 0 the argument follows exactly as in Filar's argument for zero-sum games, with the help of the non-singularity assumption; if c = 0 the argument also goes exactly as in Filar's argument.
Proof. From Lemma 5.2 of [12] we already have that S(g*) is convex. Since Γ is completely mixed, all elements of S(g*) have to be completely mixed. Assume f_0, f* ∈ S(g*) with f_0 ≠ f*.
Since the payoff is linear in player-1's strategy, the whole line through f_0 and f*, intersected with the strategy simplex, lies in S(g*); extending the segment through f_0 and f* until it meets the boundary of the simplex, we can choose λ such that λ f_0(s) + (1 − λ) f*(s) belongs to S(g*) but is not completely mixed for some s ∈ S. This is a contradiction. Hence the proposition is true.
So finally we have v^2_β = I^2_β(f_0, g) for all g ∈ P_{A_2}. Similarly we can show the corresponding equality for I^1_β.

Open Problem:
In a switching control stochastic game ([4]) there is a partition S_1, S_2 of the state space S such that in the states of S_1 player-1 alone controls the transition probability and in the states of S_2 player-2 alone controls it. The transition probabilities are as follows: q(s′|s, i, j) = q(s′|s, i) for s ∈ S_1 and q(s′|s, i, j) = q(s′|s, j) for s ∈ S_2, for all pure actions i of player-1 and j of player-2.
It is not known whether results similar to Theorem 3.2 and Theorem 3.4 hold for switching control undiscounted stochastic games.