Construction of Nash Equilibrium in a Game Version of Elfving’s Multiple Stopping Problem

Multi-person stopping games with players’ priorities are considered. Players sequentially observe offers Y1, Y2, … at jump times T1, T2, … of a Poisson process. The offers Y1, Y2, … are independent identically distributed random variables. Each accepted offer Yn results in a reward Gn = Yn r(Tn), where r is a non-increasing discount function. If more than one player wants to accept an offer, then the player with the highest priority (the lowest ordering) gets the reward. We construct a Nash equilibrium in the multi-person stopping game using the solution of a multiple optimal stopping time problem with the reward structure {Gn}. We compare the rewards and stopping times of the players in the Nash equilibrium of the game with the optimal rewards and optimal stopping times in the multiple stopping time problem. It is also proved that the presented Nash equilibrium is a Pareto optimum of the game. The game is a generalization of the Elfving stopping time problem to multi-person stopping games with priorities.

for department 2, and so on until the first acceptance. Candidates rejected for department i cannot be considered in the future. The aim is to select candidates with maximal expected "skills". So, one may say that each department acts as an independent player in a stopping game with priorities.
We will formulate the problem as an m-person stopping game with priorities in which random offers are presented at jump times of a homogeneous Poisson process. Such a game has been considered in Ferenstein and Krasnosielska [8]. In this paper, we propose a new solution and prove that the proposed strategy is a Nash equilibrium, which allows us to remove an assumption made in Ferenstein and Krasnosielska [8]. The difference between the solution proposed in this paper and that in [8] is discussed more thoroughly at the end of the paper.
The game considered is a generalization, to the case of several players, of the optimal stopping time problem formulated and solved first by Elfving [6], later considered also by Siegmund [18]. Various modifications of the structure of the reward in the Elfving problem were considered in Albright [1], David and Yechiali [3], Krasnosielska [12,13], Gershkov and Moldovanu [10], Parlar et al. [16].
Stadje [19] considered an optimal multi-stopping time problem in the Elfving setting, in which the final reward is the sum of the selected discounted random variables. Various stopping games with rewards observed at jump times of a Poisson process were considered in Dixon [4], Enns and Ferenstein [7], Saario and Sakaguchi [17], Ferenstein and Krasnosielska [9]. Stopping games were introduced in the seminal paper by Dynkin [5] as an application of optimal stopping time problems, and have since often been referred to as Dynkin games. An extensive bibliography on stochastic games can be found in Nowak and Szajowski [15].

Multiple Stopping Time Problem
Let us recall the multi-stopping time problem presented in Stadje [19]. Let $Y_1, Y_2, \dots$ be nonnegative independent identically distributed random variables with continuous distribution function $F$ and $E(Y_1) \in (0, \infty)$, $Y_0 = 0$. The random variables $Y_1, Y_2, \dots$ are sequentially observed at jump times $0 < T_1 < T_2 < \cdots$ of a homogeneous Poisson process $N(s)$, $s \ge 0$, with intensity function $p(u)$ and $T_0 = 0$. Moreover, assume that the sequences $\{Y_n\}_{n=1}^{\infty}$ and $\{T_n\}_{n=1}^{\infty}$ are independent. Let $r : [0, \infty) \to [0, 1]$ be a right continuous, non-increasing function satisfying the conditions Assume that the set of points of discontinuity of $r$ is finite.
Note that without loss of generality we can assume that p(u) ≡ 1 because a nonhomogeneous Poisson process can be reduced to a homogeneous Poisson process with intensity 1 (see [2, pp. 113-114]).
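The reward structure described above can be illustrated with a short simulation. The following is only a sketch under stated assumptions: the intensity is taken to be 1, the offers are drawn from an exponential distribution, and the discount function is a single-step function with a hypothetical deadline; none of these specific choices come from the paper, and the names `simulate_rewards` and `step_discount` are illustrative.

```python
import random


def simulate_rewards(horizon, discount, draw_offer, rng):
    """Return a list of (T_n, Y_n, G_n) for all jump times T_n <= horizon.

    Jump times come from a Poisson process with intensity 1, so the gaps
    between consecutive jumps are independent Exp(1) random variables.
    """
    t = 0.0
    path = []
    while True:
        t += rng.expovariate(1.0)  # next inter-arrival gap
        if t > horizon:
            break
        y = draw_offer(rng)                   # offer Y_n
        path.append((t, y, y * discount(t)))  # reward G_n = Y_n * r(T_n)
    return path


def step_discount(t, deadline=5.0):
    """Hypothetical discount: right continuous, non-increasing, values in [0, 1]."""
    return 1.0 if t < deadline else 0.0


rng = random.Random(0)
# Assumption: offers are Exp(1); any continuous law with finite positive mean would do.
path = simulate_rewards(10.0, step_discount, lambda g: g.expovariate(1.0), rng)
```

With this step discount, every offer arriving at or after the deadline yields a zero reward, which mimics a finite selling horizon.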
Note that the values of the stopping times $\tau_i$ are natural numbers and $\tau_i = n$ means selecting the offer arriving at time $T_n$. We are interested in finding an optimal m-stopping time for $\{G_n\}_{n=1}^{\infty}$, that is, the m- and the optimal mean reward $E(G_{\tau_k^{1,m}} + \cdots + G_{\tau_k^{m,m}})$.
Interpretation The problem can be interpreted as that of selling m commodities of the same type, where the offers are received sequentially and must be refused or accepted immediately on arrival. Let the set $\{s_0, \dots, s_l\}$, where $0 = s_0 < s_1 < \cdots < s_{l-1} < s_l = U$, $l < \infty$, contain all points of discontinuity of the discount function $r$.
In the theorem below, the functions $\gamma_i$ determining the optimal expected and conditional expected total reward are obtained.
. . , m}, the function $\gamma_i$ is continuous and has a continuous derivative on The additional expected reward which can be obtained from selling $i$ instead of $i - 1$ commodities is For $k \in \mathbb{N}$ and $i \in \{1, \dots, m\}$, define and The Markov time $\tau_k^{i,m}$ can be interpreted as the time of selling the ith commodity among m commodities for sale, if we start the process of selling at the time of the kth observation. Note that $\tau_k^{i,m}$ is the first time after the stopping time $\tau_k^{i-1,m}$ at which the reward is not smaller than the optimal conditional expected reward from selling the ith commodity in the future if we have $i$ instead of $i - 1$ commodities for sale. Therefore, $\gamma_i(T_n)$ determines the minimum acceptable offer for selling the $(m - i + 1)$-th commodity at time $T_n$. Hence, $\gamma_i$ is a threshold below which it is not profitable to sell the $(m - i + 1)$-th commodity.
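The threshold interpretation can be made concrete with a small sketch. It assumes that, when j commodities remain unsold, an offer at time T_n is accepted as soon as the reward G_n is at least γ_j(T_n), which is how the text above describes the role of γ_i for the (m − i + 1)-th sale. The constant thresholds used below are placeholders chosen for illustration; they are not the functions γ_i obtained in the theorem, and `multiple_stopping` is a hypothetical helper name.

```python
def multiple_stopping(times, rewards, thresholds):
    """Return the indices n (1-based) of the accepted offers.

    thresholds[j - 1] is a function t -> minimal acceptable reward at time t
    when j commodities are still for sale (a stand-in for gamma_j).
    """
    remaining = len(thresholds)  # m commodities at the start
    accepted = []
    for n, (t, g) in enumerate(zip(times, rewards), start=1):
        if remaining == 0:
            break
        # Sell one commodity if the reward reaches the current threshold.
        if g >= thresholds[remaining - 1](t):
            accepted.append(n)
            remaining -= 1
    return accepted


# Placeholder thresholds, non-increasing in j (assumption for illustration only).
gammas = [lambda t: 0.6, lambda t: 0.4, lambda t: 0.2]
times = [1.0, 2.0, 3.0, 4.0]
rewards = [0.1, 0.5, 0.3, 0.7]
accepted = multiple_stopping(times, rewards, gammas)  # -> [2, 4]
```

In this toy run, the first sale happens at the second offer (threshold 0.2 with three commodities left) and the second sale at the fourth offer (threshold 0.4 with two left); the third commodity remains unsold because the observations run out.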
Note that for each m ∈ N and in particular sup (τ 1 ,...,τ m )∈M 1 (m) Note that from monotonicity of the sequence {γ i (·)} m i=1 we get and from Theorem 2 we obtain Moreover, for k, m ∈ N and i ∈ {2, . . . , m}, we have Proof Immediately from (6) for i ≥ 2 and from (3) for i = 1.

The Game
Suppose that there are $m > 1$ ordered players who sequentially observe rewards $G_n$ at times $T_n$, $n = 1, 2, \dots$. Players' indices $1, 2, \dots, m$ correspond to their ordering (ranking), called priority, so that 1 refers to the player with the highest priority and $m$ to the player with the lowest one. Each player is allowed to obtain one reward, at the time of its appearance, on the basis of the past and current observations and taking into account the players' priorities. More precisely, Player $i$, who has just decided to make a selection at $T_n$, gets the reward $G_n$ if and only if he has not obtained any reward before and no player with higher priority (lower order) has also decided to take the current reward. As soon as a player gets a reward, he quits the game. The remaining players select rewards in the same manner, and their priorities remain unchanged.
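The priority rule can be sketched in a few lines of code. This is a minimal illustration, assuming that each player's willingness to accept each offer is given in advance as a table of booleans (in the game itself these decisions are made sequentially from past and current observations); the function name `allocate` and the data are hypothetical.

```python
def allocate(rewards, wants):
    """Resolve simultaneous acceptances by priority.

    rewards[n] is the reward of offer n + 1.
    wants[i][n] is True if Player i + 1 would accept offer n + 1
    (index 0 is the player with the highest priority).
    Returns, for each player, (offer number, reward) or None.
    """
    m = len(wants)
    got = [None] * m
    for n, g in enumerate(rewards):
        for i in range(m):  # scan players from highest to lowest priority
            # A player who already obtained a reward has quit the game.
            if got[i] is None and wants[i][n]:
                got[i] = (n + 1, g)
                break  # each offer goes to at most one player
    return got


rewards = [0.2, 0.9, 0.5]
wants = [
    [False, True, True],   # Player 1 (highest priority)
    [True, True, False],   # Player 2
    [False, True, True],   # Player 3 (lowest priority)
]
result = allocate(rewards, wants)  # -> [(2, 0.9), (1, 0.2), (3, 0.5)]
```

Here all three players want the second offer, but Player 2 has already quit and Player 1 outranks Player 3, so Player 1 receives it and Player 3 has to wait for the third offer.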

Model of the Game
In this section, we make the same assumptions and use the same notation as in Sect Under the strategy profile $\psi^m$, the reward of Player $i$, $i \in \{1, \dots, m\}$, is $G_{\sigma_m^i(\psi^m)}$ and the mean reward is Let Using Stadje's result from Theorem 2, we have that the m selected rewards which maximize the expected total reward appear no later than $\tau_1^{m,m}$. This motivates searching for a Nash equilibrium strategy in the set $D_1^m$.
From Lemma 4, we get that $\{\hat{\psi}_n^{m,i}\}$ is a sequence of 0-1 valued $\{F_n\}$-adapted random variables. Hence, from Lemma 3, we get $\hat{\psi}^{m,i} \in D_1$ (13) for $i \in \{1, \dots, m\}$. According to the above profile, Player i in the m-person game will behave in the same manner as Player i in the i-person game, that is, Proof The proof uses induction on i and Lemma 6.
It follows from Proposition 1 that, for $i \in \{1, \dots, m\}$, Player i stops playing in the m-person game at the same time as in the i-person game, that is, where $\tau_k^{i,m}$ are defined in (3).
Proof Using (9), Proposition 1 and Lemma 7, we get Hence, from Theorem 2 and (4) we get the assertion.
Note that, according to (17), the expected reward of Player i in the m-person game with profile $\hat{\psi}^m$ is equal to the expected reward of Player i in the i-person game with profile $\hat{\psi}^i$.
Proof Assume that there exists $\varphi^m \in D_1^m$ such that $V_{m,i}(\varphi^m) \ge V_{m,i}(\hat{\psi}^m)$, $i = 1, 2, \dots, m$, and at least one of the inequalities is strict. Then, from Theorem 3 and (4), On the other hand, from (9), which with (20) gives a contradiction.
In the proposition below, we will show that the players in the m-person game choose the same rewards as those that are optimal in the multiple stopping problem. However, note that the stopping time selected by Player i in the m-person game can be different from the ith stopping time in the multiple stopping problem.
In Theorem 6 below, we will show that the presented Nash equilibrium is a sub-game perfect equilibrium. Recall that a Nash equilibrium is a sub-game perfect equilibrium if, after any history, the strategy profile of all remaining players is a Nash equilibrium in the remaining part of the game. Let $V^k_{m,i}(\psi^m)$ be the conditional expected reward for Player i in the m-person game at the time of the k-th offer, that is,

Theorem 6
The profile $\hat{\psi}^m$ is a sub-game perfect equilibrium.
Summary In Theorem 4, we have proved that the strategy profile $\hat{\psi}^m$ is a Nash equilibrium. According to this strategy, Player i in the m-person game behaves as Player i in the i-person game (Eq. (14)). Moreover, the stopping times and expected rewards of Player i in the two games are equal (Proposition 1 and Eq. (17)). Additionally, the expected reward of Player i in the m-person game in the Nash equilibrium is equal to the expected reward from selling the ith good in the future, if there are i instead of i − 1 goods for sale (Lemma 8).
Note that $\tau_3(1) = \tau_1^{1,3}$. Hence from (11) we have In a similar way, we obtain that Player 2 in the three-person game will finish the game at one of the two stopping times $\tau_1^{1,2}$, $\tau_1^{2,2}$, that is, Moreover, the player with the highest priority will finish the game at $\sigma_1^1 = \tau_1^{1,1}$. Note that the ith player's stopping time is different from the optimal time of selling the ith commodity, but his expected reward is equal to the optimal expected reward which can be obtained from selling the ith commodity.

Discussion
In this section, we compare in detail the results obtained in this paper with those presented in Ferenstein and Krasnosielska [8]. Let us briefly present the solution obtained in [8]. There it is assumed that there exist continuous functions $\hat{V}_{m,i}(\cdot)$ such that $\hat{V}_{m,i}(u) \le \hat{V}_{m,i-1}(u)$, $i \in \{2, \dots, m\}$, $u \in \mathbb{R}_0^+$, where $\hat{V}_{m,i}(u)$ is an "optimal" mean reward of Player i in the m-person game with reward structure $G_n(u) = Y_n r(u + T_n)$. Next, the Markov times $\hat{\tau}_i(u) = \inf\{n \ge 1 : G_n(u) \ge \hat{V}_{m,i}(u + T_n)\}$, $i = 1, 2, \dots, m$, are defined and used to construct the following strategy: Next, it is proved that Player i stops in the i-person game at the same time as in the m-person game and, in consequence, the "optimal" expected reward of Player i in the m-person game is equal to the "optimal" expected reward of Player i in the i-person game, that is, , it was assumed $\hat{V}_{m,1}(u) = \hat{V}_{1,1}(u)$ and for $i \ge 2$ where $\hat{V}_{1,1}(u)$ is the optimal expected reward in the Elfving problem. Hence, as in the Elfving problem, for each $i \in \{2, \dots, m\}$ the differential equations describing $\hat{V}_{i,i}(\cdot)$ have been obtained. Next, it was stated (without proof) that the equations obtained have exactly one solution. Moreover, some arguments were given suggesting that the profile is a Nash equilibrium. Note that the assumption given by (22) is a modification of the one made in Elfving [6], and it was removed in Siegmund [18]. Note that the differential equations obtained in Ferenstein and Krasnosielska [8] are the same as those in Stadje [19], that is, $\hat{V}$ From (16), we have that the expected reward of Player i in the m-person game with profile $\hat{\psi}^m$ is equal to $\gamma_i(0)$. Hence, from (16), (23), and Theorem 4 we get that the profile is a Nash equilibrium.
The connections between the problem solved in Stadje [19] and the solution of the game presented in Ferenstein and Krasnosielska [8] are discussed in detail in the doctoral dissertation of Krasnosielska [14].

Proofs
Proof of Lemma 3 From (3), for i = 1, we have σ k . We will show that σ k i ≤ τ i,i k . From (11) and Lemma 2, we have The second inequality in Lemma 3 follows from (6).
Proof of Lemma 6 It is enough to show that σ k i = σ k j , j < i. The proof uses induction on i.
Proof of Lemma 7 The proof uses induction on m. Note that for each k ∈ N, we have σ k 1 = τ 1 (k) = τ 1,1 k . Hence, for m = 1, we get (15). Now assume that for each k ∈ N and given m ≥ 2, we have To prove that (15) is satisfied for k ∈ N, we will use the equality which will be proved later. From (27) and the induction assumption, we obtain where above, the middle equality follows from Lemma 2 and τ m (k) = τ 1,m k , which follows from (3). Now we will prove (27). From (11), we get where the last equality follows from (10) and (5). Adding the first and third summands appearing on the right side of the above equality, and using (5), we obtain Let i = 1, then from Lemma 5 and (11) we have σ k 1 = σ τ m (k)+1 where we used (15) where in the last but one equality we used (15).