Abstract
We consider a multi-person stopping game with players’ priorities and multiple stopping. Players observe sequential offers at random or fixed times. Each accepted offer results in a reward. Each player can obtain a fixed number of rewards. If more than one player wants to accept an offer, then the player with the highest priority among them obtains it. The aim of each player is to maximize the expected total reward. For the game defined this way, we construct a Nash equilibrium. The construction is based on the solution of an optimal multiple stopping problem. We show the connections between the expected rewards and stopping times of the players in the Nash equilibrium of the game and the optimal expected rewards and optimal stopping times in the multiple stopping problem. A Pareto optimum of the game is given. It is also proved that the presented Nash equilibrium is a sub-game perfect Nash equilibrium. Moreover, the Nash equilibrium payoffs are unique. We also present new results related to the multiple stopping problem.
1 Introduction
Multi-person stopping games with players’ priorities have been investigated by many authors. The reason is the diversity of applications of the considered models, which fit well the problems arising in economic theory and operations research (see, e.g., Heller 2012). This paper makes two main contributions to this domain of research. The first is the consideration of a game in which players can stop more than once, i.e., Player i can stop \(n_i\ge 1\) times (in the literature, it is usually assumed that each player can stop once, but such an assumption is not always realistic). The second is a construction of a Nash equilibrium for this game based on a solution of a multiple stopping problem. Moreover, this Nash equilibrium is a sub-game perfect Nash equilibrium and, at the same time, a Pareto optimum of the game. We also prove that the Nash equilibrium payoffs are unique.
To motivate the study of games with the mentioned properties, consider the following two examples. In the first example, m shops sell commodities of the same type and all the shops are situated in a shopping center. The ith shop has \(n_i\) commodities for sale. The shops are ordered according to their distance to the main entrance of the shopping center; the ordering is referred to as priority, so that 1 refers to the shop which is the closest to the entrance and m to the shop which is the farthest from the entrance. The market brings about a sequence of buying offers. A potential buyer always goes to the closest shop first and presents his offer. If the offer is accepted, the transaction is realized; if the offer is not accepted, the buyer goes to the second-closest shop, and so on. The aim of each shop is to maximize the expected total reward (profit).
Let us consider another example, one that models the assignment of tasks in a computer cluster. Clusters execute computations in a distributed way, so the computations take only a fraction of the time that would be needed if they were executed on a single computer. They are a standard tool for advanced scientific calculations as well as for processing large amounts of data in enterprises. Assume that we have a cluster that consists of m computers. We want to build a computer system that assigns tasks to computers in the cluster. Each task is associated with an estimate of the computing power needed to execute it. Furthermore, each computer in the cluster has a certain computing power (this corresponds to its priority among the computers). The goal of the system is to assign the task to an appropriate computer in the cluster or to decide that the task will be executed locally (i.e., on a computer that does not belong to the cluster; in our model we say that such a task is “rejected”). We assume that each computer in the cluster can execute a limited number of tasks (because using them is time-consuming and costly). Therefore we need to find an optimal strategy on which the system will be based. The strategy (and consequently the system) should maximize the expected amount of computation executed on each computer in the cluster separately.
The mentioned examples motivated us to consider multi-person stopping games characterized by players’ priorities and multiple stopping. A special case of the considered game has been presented in Ferenstein and Krasnosielska (2009), Krasnosielska (2011), and Krasnosielska-Kobos and Ferenstein (2013). In those papers, in contrast to this paper, the authors considered a game with the same reward structure as in the Elfving problem [see Elfving (1967) and Siegmund (1967)], under the assumption that each player has only one commodity for sale. More precisely, they assumed that the offers are independent identically distributed random variables observed at jump times of a Poisson process and the reward is equal to the value of a discounted offer. Moreover, in those papers, in contrast to this paper, the authors searched for a Nash equilibrium in a specific set of strategies, i.e., strategies where players who have not sold their commodities by a certain point in time have to accept the first available offer after this time. This point in time is the optimal stopping time of selling the last commodity in the multiple selling problem in which the number of commodities is equal to the sum of all commodities of all players. Various games with rewards observed at jump times of a Poisson process with at most one stop for each player were considered in Dixon (1993) and Saario and Sakaguchi (1992), among others. A two-person game with finite horizon where players observe a Markov process and one of the players can accept two offers was investigated in Szajowski (2002). A game with continuous time where players have the possibility to stop more than once was presented in Laraki and Solan (2005). An extensive bibliography on games can be found in Ekström and Peskir (2008), Nowak and Szajowski (1999), Peskir (2008), Ramsey and Szajowski (2008) and Solan and Vieille (2003).
General theorems on the existence and form of the solution of a multiple stopping problem were presented in Stadje (1985), Nikolaev (1999) and Kösters (2004). A multiple stopping problem with random horizon was analyzed in Krasnosielska-Kobos (2015). A multiple stopping problem based on the Elfving problem was presented in Stadje (1987) and a version without discounting in Sakaguchi (1972).
The paper is arranged as follows. Section 2 presents the multiple stopping problem. The game is formulated in Sect. 3 along with a construction of Nash equilibrium and its properties. Examples are analyzed in Sect. 4.
The idea of using the solution of a multiple stopping problem to construct a Nash equilibrium is based on Krasnosielska (2011) and Krasnosielska-Kobos and Ferenstein (2013).
2 Multiple stopping problem
In this section we will present a multiple stopping problem formulated and solved in Stadje (1985) with modifications proposed in Kösters (2004). Next we will introduce new results concerning multiple stopping problems (Proposition 1, Lemmas 1 and 2 and Theorem 3).
Let \((\Omega ,\mathcal {F},\mathbb {P})\) be a probability space and \(\{\mathcal {F}_j\}_{j=0}^\infty \) be a nondecreasing sequence of sub-\(\sigma \)-algebras of \(\mathcal {F}\). Moreover, let \(G_j\) be an \(\mathcal {F}_j\)-measurable and integrable random variable, \(j\in \mathbb {N}\). We define an n-stopping time with respect to \(\{\mathcal {F}_j\}_{j=0}^\infty \) to be a sequence \((t_1,\ldots ,t_n)\) of n stopping times with respect to \(\{\mathcal {F}_j\}_{j=0}^\infty \) such that \(t_1<\cdots <t_n<+\infty \). Let \(\mathcal {M}_k(n)\) be the set of all n-stopping times \((t_1,\ldots ,t_n)\) with respect to \(\{\mathcal {F}_j\}_{j=0}^\infty \) such that \(t_1\ge k\) and \(E(G_{t_1}+\cdots +G_{t_n})\) exists. Our aim is to find an optimal n-stopping time for \(\{G_j\}_{j=1}^\infty \), that is, an n-stopping time \((\tau ^{1,n}_1,\ldots ,\tau ^{n,n}_1)\in \mathcal {M}_1(n)\) that maximizes \(E(G_{t_{1}}+\cdots +G_{t_{n}})\) among all \((t_{1},\ldots ,t_{n})\in \mathcal {M}_1(n)\), and the optimal expected total reward \(E(G_{\tau ^{1,n}_1}+\cdots +G_{\tau ^{n,n}_1})\).
The presented problem can be interpreted as follows. n commodities are sold, where the offers are received sequentially and must be refused or accepted immediately on arrival. Acceptance of the jth offer results in the reward \(G_j\). The aim of the seller is to maximize the expected sum of n rewards.
Let \(S^{i}_{k}\) be the optimal conditional expected total reward obtained from selling i commodities when the sale of these commodities begins at the time of observation of the \(k\hbox {th}\) offer, this means
Moreover, let \(\gamma ^{i}_{k}\), \(k\in \mathbb {N}_0\), where \(\mathbb {N}_0=\mathbb {N}\cup \{0\}\), be the optimal conditional expected reward from selling the additional commodity if we have i instead of \(i-1\) commodities for sale at the time of the \(k\hbox {th}\) offer. This means
where \(S^{0}_{k+1}=0\). Note that \(\gamma ^{i}_{k}\) is a threshold below which it is not profitable to sell the first commodity among i commodities for sale at the time of observation of the \(k\hbox {th}\) offer. For \(k\in {\mathbb {N}}\) define
and \(\tau ^{i}(+\infty )=+\infty \). Let \(\tau ^{i,n}_k\), \(k\in {\mathbb {N}\cup \{+\infty \}}\), be the time of selling the \(i\hbox {th}\) commodity from n commodities for sale when the process of selling starts at the time of observation of the \(k\hbox {th}\) offer, i.e.
Note that \(\tau ^{i,n}_k\) is the first time after the time of selling the \((i-1)\)th commodity among n commodities for sale when the reward is not smaller than the threshold \(\gamma ^{n-(i-1)}_{k}\). This means that, at the stopping time \(\tau ^{i,n}_k\), the obtained reward is not smaller than the optimal conditional expected reward from selling the additional commodity if we have \(n-i+1\) instead of \(n-i\) commodities for sale.
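The threshold rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `gamma(i, j)` is a hypothetical stand-in for the threshold \(\gamma ^{i}_{j}\), evaluated at the index of the offer currently observed, and the numerical thresholds in the usage example are invented for the illustration rather than derived from any model in this paper.

```python
from typing import Callable, List, Sequence


def threshold_stopping_times(
    rewards: Sequence[float],
    n: int,
    gamma: Callable[[int, int], float],
    start: int = 1,
) -> List[int]:
    """Return the 1-based offer indices at which the threshold rule sells.

    rewards[j - 1] plays the role of G_j.
    gamma(i, j) is a hypothetical stand-in for the threshold gamma^i_j:
    with i commodities still for sale, offer j is accepted when
    G_j >= gamma(i, j).
    """
    times: List[int] = []
    remaining = n
    for j in range(start, len(rewards) + 1):
        if remaining == 0:
            break
        if rewards[j - 1] >= gamma(remaining, j):
            times.append(j)   # sell one commodity at offer j
            remaining -= 1    # the next sale can only happen at a later offer
    return times


# Illustrative thresholds (assumed): constant in j, non-increasing in i.
example_gamma = lambda i, j: {1: 5.0, 2: 3.0, 3: 1.0}[i]
example_times = threshold_stopping_times([2, 4, 1, 6, 3, 5], 3, example_gamma)
print(example_times)  # [1, 2, 4]
```

With three commodities the seller first applies the lowest threshold, and the threshold rises as the stock shrinks, so the selling times are strictly increasing by construction.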
In the rest of this paper, we assume that the following assumption holds.
Assumption 1
(ii) \((\tau ^{1,n}_k,\ldots ,\tau ^{n,n}_k)\) is finite with probability one for all \(k\in \mathbb {N}\).
The first condition above ensures the existence of all considered expectations. Other such conditions are presented in Nikolaev (1999) and Kösters (2004).
In Theorem 2 below the solution of the multiple stopping problem is given.
Theorem 2
(Stadje 1985; Kösters 2004) \((\tau ^{1,n}_1,\ldots ,\tau ^{n,n}_1)\) is an optimal n-stopping time in \(\mathcal {M}_1(n)\) for the sequence \(\{G_k\}_{k=1}^\infty \). Moreover
and
In Stadje (1987), it is proved that in the optimal stopping problem with a Poisson stream of i.i.d. offers, the functions \(\gamma _j^i\) are non-increasing with respect to i. In Proposition 1, we will show that this property holds not only in the mentioned case. This property can be interpreted as follows. The larger the number of commodities a seller is left with, the more he is inclined to sell them for a lower price.
Proposition 1
For each \(j\in \mathbb {N}_0\) and given n we have \(\gamma _j^1\ge \cdots \ge \gamma _j^n\).
Proof
It is enough to show that \(S^2_{j+1}\le 2S^1_{j+1}\) and \(S^{i+1}_{j+1}+S^{i-1}_{j+1}\le 2S^{i}_{j+1}\) for \(i=2,\ldots ,n-1\). The first inequality is obvious. To prove the second one, define \(\tilde{\tau }^1_{j+1}\le \cdots \le \tilde{\tau }^{2i}_{j+1}\) such that \(\{\tilde{\tau }^1_{j+1},\ldots ,\tilde{\tau }^{2i}_{j+1}\} =\{\tau ^{1,i-1}_{j+1},\ldots ,\tau ^{i-1,i-1}_{j+1},\tau ^{1,i+1}_{j+1},\ldots ,\tau ^{i+1,i+1}_{j+1}\}\), \(\tilde{\tau }^1_{j+1}<\tilde{\tau }^3_{j+1}<\cdots <\tilde{\tau }^{2i-1}_{j+1}\) and \(\tilde{\tau }^2_{j+1}<\tilde{\tau }^4_{j+1}<\cdots <\tilde{\tau }^{2i}_{j+1}\). Note that \((\tilde{\tau }^1_{j+1},\tilde{\tau }^3_{j+1},\dots ,\tilde{\tau }^{2i-1}_{j+1})\) and \((\tilde{\tau }^2_{j+1},\tilde{\tau }^4_{j+1},\ldots ,\tilde{\tau }^{2i}_{j+1})\) are two i-stopping times. Hence
\(\square \)
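The monotonicity in Proposition 1 can be checked numerically in a simple special case. The sketch below assumes a setting that is not taken from this paper: a finite horizon, i.i.d. offers drawn uniformly from a finite set of values, and zero reward after the horizon. The names `optimal_values` and `marginal_values` are hypothetical. Under these assumptions the optimal expected total reward follows from backward induction, and the marginal value of the ith commodity plays the role of \(\gamma ^i\).

```python
from fractions import Fraction
from typing import List, Sequence


def optimal_values(offers: Sequence[int], n: int, horizon: int) -> List[List[Fraction]]:
    """W[i][r]: optimal expected total reward with i commodities and r offers left.

    Assumed model (illustration only): each offer is uniform on `offers`,
    offers are independent, and nothing can be sold after the horizon.
    """
    p = Fraction(1, len(offers))
    W = [[Fraction(0)] * (horizon + 1) for _ in range(n + 1)]
    for r in range(1, horizon + 1):
        for i in range(1, n + 1):
            # Accept x and continue with i - 1 commodities, or reject and keep i.
            W[i][r] = sum(p * max(x + W[i - 1][r - 1], W[i][r - 1]) for x in offers)
    return W


def marginal_values(W: List[List[Fraction]], r: int) -> List[Fraction]:
    """Value added by the ith commodity when r offers remain, for i = 1, ..., n."""
    return [W[i][r] - W[i - 1][r] for i in range(1, len(W))]


W = optimal_values([1, 2, 3, 4, 5, 6], n=3, horizon=5)
for r in range(1, 6):
    m = marginal_values(W, r)
    # Marginal values should be non-increasing in the number of commodities.
    print(r, [float(v) for v in m], all(a >= b for a, b in zip(m, m[1:])))
```

Exact rational arithmetic is used so that the comparison of marginal values is not affected by floating-point rounding.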
It is convenient to allow for some of \(t_1,\ldots , t_n\) to take the value \(\infty \) with positive probability. Therefore, the following notation will be used. Let \(t_1\sqsubset t_2\) mean that \(t_1<t_2\) on \(\{t_1<\infty \}\) and \(t_1=t_2\) on \(\{t_1=\infty \}\). We define an extended n-stopping time with respect to \(\{\mathcal {F}_j\}_{j=0}^\infty \) to be a sequence \((t_1,\ldots ,t_n)\) of n extended stopping times with respect to \(\{\mathcal {F}_j\}_{j=0}^\infty \) such that \(t_1\sqsubset t_2\sqsubset \cdots \sqsubset t_n\). Note that an extended stopping time t is a stopping time if \(t<\infty \). Let \(\bar{\mathcal {M}}_k(n)\) be the set of all extended n-stopping times \((t_1,\ldots ,t_n)\) such that \(t_1\ge k\) and \(E(G_{t_1}+\cdots +G_{t_n})\) exists. Define \(G_\infty =\limsup _{k\rightarrow \infty } G_k\),
The lemma below is a generalization of Lemma 4.10 from Chow et al. (1971) to the case of multiple stopping.
Lemma 1
\(\bar{S}_\infty ^n={S}_\infty ^n=n\cdot G_\infty \).
Proof
For \(n=1\) the proof follows from Chow et al. (1971, Lemma 4.10 and Theorem 4.7). Assume that the lemma holds for \(j=1,\ldots ,n-1\). We will show that it holds for \(j=n\). Note that
where the last equality follows from Chow et al. (1971, Theorem 4.7). We get the assertion from the induction assumption. \(\square \)
Theorem 3
For all \(n\in \mathbb {N}\) and \(k\in \mathbb {N}\) we have
Proof
The theorem is true for \(n=1\) (see Chow et al. 1971, Thm 4.7). Assume that the theorem is true for \(n-1\). We will prove it for n. Let \((t_1,\ldots ,t_n)\in \bar{\mathcal {M}}_k(n)\). From the fact that \(\mathcal {F}_{t_1+1}\) is a \(\sigma \)-field, Lemma 1, and the induction assumption, we get
where we used Theorem 4.7 from Chow et al. (1971) and Theorem 2. Since the above inequality is true for all \((t_1,\ldots ,t_n)\in \bar{\mathcal {M}}_k(n)\), we get \(\bar{S}_k^n\le S_k^n\). The reverse inequality is obvious, hence we get (3). The proof of (4) is similar to the one above. \(\square \)
Note that from (4), we get that \((\tau _1^{1,n},\ldots ,\tau _1^{n,n})\) is optimal in \(\bar{\mathcal {M}}_1(n)\). Moreover, replacing all sets of extended stopping times by sets of stopping times in the proof of Theorem 3, we get
and
These two equalities were mentioned and used in Stadje (1985) and discussed in Kösters (2004) (in both papers without proof). From (5) and (6), we get that to solve the n-stopping problem we can solve n one-stopping problems with a modified structure of rewards.
The lemma below is a generalization of the result of Krasnosielska-Kobos and Ferenstein (2013, Lemma 9) to the case of extended stopping times. The following notation will be needed. Let \(t_1\doteqdot t_2\) mean that \(t_1=t_2=\infty \) on \(\{t_1=\infty , t_2=\infty \}\) and \(t_1\ne t_2\) on \(\{t_1<\infty \ \text {or}\ t_2<\infty \}\).
Lemma 2
For each \(n,k\in \mathbb {N}\) we have
and
Proof
Note that \(\bar{\mathcal {M}}_k(n)\subseteq \{(t_1,\ldots ,t_n):t_1,\ldots ,t_n\in {\bar{\mathcal {M}}_k(1)},t_i\doteqdot t_j,i\ne j\}\). Let \(t_1,\ldots ,t_n\in \bar{\mathcal {M}}_k(1)\) be any extended stopping times such that \(t_i\doteqdot t_j\) for \(i\ne j\). Define \(t_{(1)}=\min \{t_1,\ldots ,t_n\}\), \(t_{(i+1)}=\min \{t_j:t_j\sqsupset t_{(i)},j\in \{1,\ldots ,n\}\}\). Then, \((t_{(1)},\ldots ,t_{(n)})\in \bar{\mathcal {M}}_k(n)\). \(\square \)
3 The game
In this section, we will formulate the game and a Nash equilibrium. We will also present its properties. The game is a generalization of the one presented in Krasnosielska (2011) and Krasnosielska-Kobos and Ferenstein (2013).
Suppose that there are \(m>1\) ordered players. Player 1 has the highest priority and Player m has the lowest one. Players observe sequential rewards \(\{G_n\}\). Player i is allowed to obtain \(n_i\) rewards. The decision about acceptance or rejection of the reward must be made at the time of its appearance. Player i who has just decided to select \(G_n\), gets this reward if and only if he has obtained at most \(n_i-1\) rewards so far and there is no player with higher priority who has also decided to take this reward. As soon as Player i gets \(n_i\) rewards, he quits the game. All the players follow this scenario. The priorities of other players remain the same. Each player observes decisions of other players.
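The priority rule just described can be sketched as a short simulation. The function `play` and the threshold strategies used below are hypothetical illustrations: they show only how offers are allocated among players for given decisions, and they are not the equilibrium strategies constructed later in this section.

```python
from typing import Callable, List, Sequence


def play(
    rewards: Sequence[float],
    capacities: Sequence[int],
    wants: Callable[[int, int, float], bool],
) -> List[List[int]]:
    """Allocate sequential rewards among players ordered by priority.

    capacities[i] is n_i for the player with index i (index 0 has the
    highest priority). wants(i, j, g) is the decision of player i about
    the jth reward g. Returns, for each player, the 1-based indices of
    the rewards obtained.
    """
    obtained: List[List[int]] = [[] for _ in capacities]
    for j, g in enumerate(rewards, start=1):
        for i, cap in enumerate(capacities):
            # A player who already holds n_i rewards has quit the game.
            if len(obtained[i]) < cap and wants(i, j, g):
                obtained[i].append(j)
                break  # players with lower priority cannot take this reward
    return obtained


# Assumed strategies: each player accepts any reward above a fixed threshold.
thresholds = [5.0, 3.0]
result = play([4, 6, 2, 7, 3], [1, 2], lambda i, j, g: g >= thresholds[i])
print(result)  # [[2], [1, 4]]
```

In this run the second reward is wanted by both players and goes to the player with the higher priority; once that player has quit, the fourth reward goes to the remaining player.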
3.1 Model of the game
Let us formulate the game formally. We make the same assumptions and use the same notations as in Sect. 2. Moreover, let D be the set of sequences of 0–1-valued \(\{\mathcal {F}_n\}\)-adapted random variables. Let \(\psi ^{m,i}=\{\psi ^{m,i}_n\}_{n\in \mathbb {N}}\in D\) be a strategy of Player i in the m-person game. If \(\psi ^{m,i}_n=1\), then, at the time of observation of the nth offer, the decision of Player i is: I accept the reward \(G_n\). Otherwise, his decision is: I reject the reward \(G_n\).
Let us explain why we can assume that \(\psi ^{m,i}\in D\) for each i. Decisions of Player 1 do not depend on decisions of other players because of his priority. Therefore, his decision concerning acceptance of the reward \(G_n\) is a Borel function of \(\mathcal {F}_n\)-measurable random variables, so his decision is also a 0–1-valued \(\mathcal {F}_n\)-measurable random variable. This random variable is denoted by \(\psi ^{m,1}_n\). Hence, \(\psi ^{m,1}\in D\). Because of the players’ priorities, Player 2 makes his decision based on observations belonging to \(\mathcal {F}_n\) and on the sequence of decisions of Player 1 up to this time. Hence, the decision of Player 2 concerning acceptance of the reward \(G_n\) is a 0–1-valued function of \(\mathcal {F}_n\)-measurable random variables and the sequence \(\{\psi ^{m,1}_l\}_{l=1}^n\), where the random variables \(\psi ^{m,1}_1,\ldots ,\psi ^{m,1}_n\) are \(\mathcal {F}_n\)-measurable. Hence, the decision of Player 2 concerning acceptance of the reward \(G_n\) is a 0–1-valued \(\mathcal {F}_n\)-measurable random variable which is denoted by \(\psi ^{m,2}_n\). Therefore, \(\psi ^{m,2}\in D\). Analogously, the decision \(\psi ^{m,i}_n\) of Player i concerning acceptance of the reward \(G_n\) is a 0–1-valued \(\mathcal {F}_n\)-measurable random variable. Therefore, the decisions of players are sequences from D. The relation between players’ decisions in Nash equilibrium will be illustrated in Example 1.
We say that \(\psi ^{m}\) is the profile of the m-person game if \(\psi ^{m}=(\psi ^{m,1},\ldots ,\psi ^{m,m})\), where \(\psi ^{m,i}\in D\) for \(i\le m\). Let \(\sigma _{m}^{i,l}(\psi ^{m})\) be the time of selling the \(l\hbox {th}\) commodity by Player i in the m-person game with the strategy profile \(\psi ^m\). Note that on \(\{\sigma _{m}^{i,l-1}(\psi ^m)<\infty \}\) we have that \(\sigma _{m}^{i,l}(\psi ^m)\) is the first time after the time of selling the \(l-1\hbox {th}\) commodity by Player i in the m-person game such that Player i wants to sell the commodity at this time and there is no player with higher priority who also wants to sell his commodity at this time. Formally, we define \(\sigma _{m}^{i,l}(\psi ^{m})\), \(i\in \{1,\ldots ,m\}\), \(l\in \{1,\ldots ,n_i\}\), recursively as follows: \(\sigma _{m}^{i,0}(\psi ^m)=0\), on \(\{\sigma _{m}^{i,l-1}(\psi ^m)<\infty \}\)
On \(\{\sigma _{m}^{i,l-1}(\psi ^m)=\infty \}\) we set \(\sigma _{m}^{i,k}(\psi ^m)=\infty \) for \(k=l,\ldots ,n_i\). Note that if \(i_1\ne i_2\) or \(l_1\ne l_2\), then \(\sigma _{m}^{i_1,l_1}(\psi ^m)\ne \sigma _{m}^{i_2,l_2}(\psi ^m)\) on \(\{\sigma _{m}^{i_1,l_1}(\psi ^m)<\infty \ \text {or}\ \sigma _{m}^{i_2,l_2}(\psi ^m)<\infty \}\).
Under the strategy profile \(\psi ^{m}\), the total reward of Player i, \(i\in \{1,\ldots ,m\}\), is \(G_{\sigma _{m}^{i,1}(\psi ^m)}+\cdots +G_{\sigma _{m}^{i,n_i}(\psi ^m)}\) and the expected total reward is
If Player i stops \(k<n_i\) times throughout the entire game, then from (7), we have \({\sigma _{m}^{i,l}(\psi ^m)}=\infty \) for \(l=k+1,\ldots ,n_i\). Note that the expected reward of Player i does not change whether he sells a commodity at \(\infty \) or not.
Denote \(D^m=\underbrace{D\times \cdots \times D}_{m}\). Let us recall, that the strategy profile \(\varphi ^m\) is a Nash equilibrium in \(D^m\) if \(\varphi ^m\in D^m\) and it is not profitable for any of the players to change only his own strategy, assuming that all players know the equilibrium strategies of other players. In other words, \(\varphi ^m\) is a Nash equilibrium if for any profile \({\psi }^m\in D^m\)
where \(((\varphi ^m)^{-i},{\psi }^{m,i})=(\varphi ^{m,1},\ldots ,\varphi ^{m,i-1},{\psi }^{m,i},\varphi ^{m,i+1},\ldots ,\varphi ^{m,m})\).
We want to find a Nash equilibrium in the set \(D^m\) for the m-person game formulated above.
Discussion The problem can also be formulated as follows. Player 1 has to solve a multiple stopping problem and stop at the optimal stopping times \(\tau ^{1,n_1}_1,\ldots ,\tau ^{n_1,n_1}_1\) presented in Sect. 2. Player 2 also faces a standard multiple stopping problem but with a modified reward structure, i.e., his reward \(G_j^2\) is equal to \(G_j\) except at the stopping times \(\tau ^{1,n_1}_1,\ldots ,\tau ^{n_1,n_1}_1\), at which \(G_j^2\) is equal to \(H_j\), where, for example, \(H_j=E(\min \{G_j,\ldots ,G_{j+N_m}\}\mid \mathcal {F}_j)-1\). Analogously, the reward \(G_j^3\) of Player 3 is equal to \(G_j^2\) except when Player 2 stops (equivalently, \(G_j^3\) is equal to \(G_j\) except when Player 1 or Player 2 stops). At the stopping times of Player 2, we have \(G_j^3=H_j\), etc. In other words, Player 2 will stop at the stopping times that belong to the set \(\{\tau ^{1,N_2}_1,\ldots ,\tau ^{N_2,N_2}_1\}\) and do not belong to the set of stopping times of Player 1. This is ensured by the modification of the reward structure, i.e., it is not profitable to accept the reward \(H_j\) because there are at least \(N_m\) better offers to come. Consequently, the rewards accepted by Player 1 are ignored by players with lower priorities. Analogously, Player 3 will stop at the stopping times that belong to the set \(\{\tau ^{1,N_3}_1,\ldots ,\tau ^{N_3,N_3}_1\}\) and do not belong to the set of stopping times of Players 1 and 2. The idea of using a modified reward structure similar to the one above will be used in the proof of Theorem 7.
3.2 Construction of Nash equilibrium
Let us present the construction of a Nash equilibrium. Denote \(N_0=0\) and \(N_l=n_1+\cdots +n_l\) for \(l\le m\). For \(k\in {\mathbb {N}\cup \{+\infty \}}\) and \(i\in \{1,\ldots ,N_m\}\), define
where \(\sigma _{i}^{+\infty }=+\infty \). Note that on \(\{\tau ^i(k)=\infty \}\), we have \(\sigma _{i}^k=\infty \).
Note that from Proposition 1 for each k, we get \(\tau ^{n}(k)\le \cdots \le \tau ^{1}(k)\). Hence, \(\sigma _{i}^k\) are well-defined. Moreover, similarly to Krasnosielska (2011) and Krasnosielska-Kobos and Ferenstein (2013), from Assumption 1(ii), it can be shown that for each \(k\in \mathbb {N}\) and \(i\le N_m,\, \sigma _{i}^k\) is a stopping time with respect to \(\{\mathcal {F}_j\}_{j=0}^\infty \) such that \(\sigma _i^k\le \tau ^{N_m,N_m}_k<\infty \). Similarly, we have that \(\sigma _{i}^k\ne \sigma _{j}^k\) for \(i\ne j\), and for \(l,k\in \mathbb {N}\)
Note that \(\sigma _{1}^1\) is an optimal stopping time in the single stopping problem, and \(\sigma _{2}^1\) is an optimal stopping time in the two-stopping problem different from \(\sigma _{1}^1\). Thus we have that \(\sigma _2^1\in \{\tau _1^{1,2},\tau _1^{2,2}\}\) and \(\sigma _2^1\ne \tau _1^{1,1}\). Analogously, \(\sigma _{3}^1\) is an optimal stopping time in the three-stopping problem different from \(\sigma _{2}^1\) and \(\sigma _{1}^1\), etc. So \(\sigma _3^1\in \{\tau _1^{1,3},\tau _1^{2,3},\tau _1^{3,3}\}\) and \(\sigma _3^1\notin \{\tau _1^{1,1},\tau _1^{1,2},\tau _1^{2,2}\}=\{\tau _1^{1,2},\tau _1^{2,2}\}\).
For \(l=1,\ldots ,m\), define \(\hat{\psi }^l=(\hat{\psi }^{l,1},\ldots ,\hat{\psi }^{l,l})\), where \(\hat{\psi }^{l,i}=\{\hat{\psi }^{l,i}_n\}_{n=1}^\infty \) and
where \(\mathbb {I}(A)\) is the indicator function of the event A. From the definition of \(\sigma ^1_k\) we get that \(\{\hat{\psi }_n^{l,i}\}\) is a sequence of 0–1-valued \(\{\mathcal {F}_j\}\)-adapted random variables and \(\hat{\psi }^{l,i}\in D\). Note that the profile \(\hat{\psi }^l\) is a natural candidate for a Nash equilibrium. According to the strategy profile above, Player i in the l-person game will make the same decisions as Player i in the i-person game (under the assumption that the number of commodities that Player j, \(j=1,\ldots ,i\), has for sale is equal in both games), that is, \(\hat{\psi }^{l,i}=\hat{\psi }^{i,i}\).
Proposition 2
For \(l\in \{1,\ldots ,m\}\), \(i\in \{1,\ldots ,l\}\), we have
Proof
Immediate from (12) and (7) and properties of stopping times \(\sigma ^1_k\). \(\square \)
Note that
Hence for \(i\le l\) we have \({\sigma }^{i,j}_l(\hat{\psi }^{l})={\sigma }^{i,j}_i(\hat{\psi }^{i})\), that is, Player i in the l-person game will sell his \(j\hbox {th}\) commodity at the time of selling the \(j\hbox {th}\) commodity by Player i in the i-person game. Consequently,
Note that in accordance with (14), the expected total reward of Player i in the l-person game with the strategy profile \(\hat{\psi }^l\) is equal to the expected total reward of Player i in the i-person game with the strategy profile \(\hat{\psi }^i\) (under the assumption that the number of commodities that Player \(j,\, j=1,\ldots ,i\), has for sale is equal in both games). Moreover, according to (7), the decision of Player i does not influence the decisions of players with higher priority. Therefore, the existence of Nash equilibrium is intuitively clear.
Lemma 3
For \(l\in \{1,\ldots ,m\}\), we have
Proof
Using (8), (13), (11) and Theorem 2 we get (15). \(\square \)
Theorem 4
The profile \(\hat{\psi }^m\) is a Nash equilibrium in \(D^m\).
Proof
Considerations similar to those in Krasnosielska-Kobos and Ferenstein (2013, Theorem 4) and Lemma 2, Theorem 3, Lemma 3 and (14) give the assertion. \(\square \)
3.3 Properties of the constructed Nash equilibrium
In this section, we will prove some properties of the Nash equilibrium.
Let us recall that the strategy profile \(\varphi ^m\in D^m\) is Pareto-optimal if it is impossible to make any player better off without making at least one player worse off. In other words, there does not exist a profile \({\psi }^m\in D^m\) such that for all \(i\in \{1,\ldots ,m\}\)
and at least one of these inequalities is strict.
Theorem 5
The profile \(\hat{\psi }^m\in D^m\) is Pareto-optimal in \(D^m\).
Proof
Considerations similar to those in Krasnosielska-Kobos and Ferenstein (2013, Theorem 5) and Theorem 3 and Lemmas 2 and 3 give the assertion. \(\square \)
We will show that the constructed strategy profile \(\hat{\psi }^m\) is a sub-game perfect Nash equilibrium, that is, after any history, the strategy profile of the remaining players is a Nash equilibrium in the remaining part of the game.
Let \(V_{m,i}^k(\psi ^m)\) be the conditional expected reward for Player i in the m-person game at the time of the \(k\hbox {th}\) offer, that is,
Assume that we are just before the observation of the \(k\hbox {th}\) offer. The number of commodities which have been sold by Player i up to this time is an \(\mathcal {F}_{k-1}\)-measurable random variable taking values \(0,1,\ldots ,n_i\), where \(n_i\) is the number of commodities for sale at the beginning of the game. This follows from the fact that decisions of all players made before the observation of the \(k\hbox {th}\) offer are \(\mathcal {F}_{k-1}\)-measurable random variables. Assume that the number of commodities which have been sold by Player i before observation of the \(k\hbox {th}\) offer is equal to \(\tilde{n}_i\). Hence, Player i still has \(n_i-\tilde{n}_i\) commodities for sale at the time of observation of the \(k\hbox {th}\) offer. Note that Player i has \(n_i-\tilde{n}_i\) commodities for sale at the time of observation of the \(k\hbox {th}\) offer if and only if \(\sigma _{m}^{i,\tilde{n}_i}(\psi ^m)< k\) and \(\sigma _{m}^{i,\tilde{n}_i+1}(\psi ^m)\ge k\), which follows from (7). Player i has finished the game before observation of the \(k\hbox {th}\) offer if and only if \(\sigma _{m}^{i,n_i}(\hat{\psi }^m)<k\). Moreover, let \(\tilde{N}_i=\tilde{n}_1+\cdots +\tilde{n}_i\) and \(\tau _1^{0,N_n}=0\).
Lemma 4
For \(n\le m\), \(j\le N_{n}-\tilde{N}_{n}\) and \(h\ge k\), \(k\in \mathbb {N}\), we get
Proof
The proof is by induction on j. Note that for \(h\ge k\) and \(j=1\) we get that the L.H.S. of (16) is equal to
which is equal to the R.H.S. of (16) for \(j=1\). Assume that (16) is satisfied for \(j-1,\, j\in \{2,\ldots ,N_{n}-\tilde{N}_n\}\). We will show that (16) holds for j. Note that the L.H.S. of (16) is equal to
which is equal to the R.H.S. of (16). \(\square \)
Theorem 6
The profile \(\hat{\psi }^m\) is a sub-game perfect Nash equilibrium.
Proof
Assume that we are just about to observe the \(k\hbox {th}\) offer and that, up to this time, \(l\le m\) players have remained in the game, say players numbered \(i_1,\ldots ,i_l\), and they have sold \(\tilde{n}_{i_1},\ldots ,\tilde{n}_{i_l}\) commodities respectively, where \(0\le \tilde{n}_{i_j}\le n_{i_j}-1\), \(j\in \{1,\ldots ,l\}\). Define \(\mathcal {I}=\{i_1,\ldots ,i_l\}\) and
We want to prove that for \(n\in \{1,\ldots ,l\}\)
where the profile \(\psi ^m\in D^m\) and \(\{\psi ^{m,i}_j\}_{j=1}^{k-1}=\{\hat{\psi }^{m,i}_j\}_{j=1}^{k-1}\) for \(i\in \{1,\ldots ,m\}\).
Note that before the \(k\hbox {th}\) observation, Players \(i_1,\ldots , i_n\) sold together \(\tilde{N}_{i_n}\) commodities from \({N}_{i_n}\). Hence, from Proposition 2 and (11) we have
Therefore, from Proposition 2 and (11), and next from (16) and Theorem 2 we get for \(n\le l\)
where we used Theorem 3, Lemma 2, (11), (13), (7), and (8). Hence from observation that \(V_{m,i_j}^k((\hat{\psi }^m)^{-i_n},{\psi }^{m,i_n})=V_{m,i_j}^k(\hat{\psi }^m)\) for \(j\le n-1\) we get
Since (17) holds for any l, \(i_1,\ldots ,i_l\) and \(\tilde{n}_{i_1},\ldots ,\tilde{n}_{i_l}\), we get the assertion. \(\square \)
In Theorem 7, we will show that the Nash equilibrium payoff is unique.
Theorem 7
Let a profile \(\hat{\varphi }^m\in D^m\) be a Nash equilibrium. Then
Proof
Using Lemma 2, Theorem 3, (8) and (15) we get \(V_{m,1}(\hat{\varphi }^m)\le V_{m,1}(\hat{\psi }^{m})\). Note that
which results in \(V_{m,1}(\hat{\varphi }^m)= V_{m,1}(\hat{\psi }^m)\). Assume that (18) is satisfied for \(j=1,\ldots ,i-1\). We will show that (18) is satisfied for \(j=i\). Note that
where we used Lemma 2, Theorem 3 and (15). Hence, from the induction assumption, we get \(V_{m,i}(\hat{\varphi }^m)\le V_{m,i}(\hat{\psi }^m)\).
To finish the proof, we need to show that the reverse inequality holds. Let C be the set of all extended stopping times of Players \(1,\ldots ,i-1\) in the m-person game with the profile \(\hat{\varphi }^m\). Define \(H_n=E(\min \{G_n,\ldots ,G_{n+N_i}\}\mid \mathcal {F}_n)-1\). Let \(\sigma _0^*=0\). Moreover, for \(l=1,\ldots ,n_i\), let \(\tilde{G}_n^l=G_n\mathbb {I}(n\notin C\cup \{\sigma _0^*,\ldots ,\sigma _{l-1}^*\})+H_n\mathbb {I}(n\in C\cup \{\sigma _0^*,\ldots ,\sigma _{l-1}^*\})\), \(n\ge 1\), \(\tilde{G}_\infty ^l=\limsup _{n\rightarrow \infty }\tilde{G}_n^l=G_\infty \),
Note that \(\sigma _l^*\), \(l\in \{1,\ldots ,n_i\}\), is the optimal extended stopping time for the sequence \(\{\tilde{G}_n^l\}\) and \(\sigma _l^*\in \bar{\mathcal {M}}_1(1)\). Define \({\psi }^{m,i}_k=\mathbb {I}(\{\sigma _{1}^*=k\}\cup \cdots \cup \{\sigma _{n_i}^*=k\})\). Note that \(\{{\psi }^{m,i}_k\}\in D\). Since \(P(\sigma _l^*\in C\cup \{\sigma _0^*,\ldots ,\sigma _{l-1}^*\},\sigma _l^*<\infty )=0\) for \(l\ge 1\), the reward of Player i in the game with strategy \(((\hat{\varphi }^{m})^{-i},{\psi }^{m,i})\) is equal to \(\tilde{G}_{\sigma _1^*}^1+\cdots +\tilde{G}_{\sigma _{n_i}^*}^{n_i}\). Hence using the fact that \(\mathcal {M}_1(n_i)\subseteq \bar{\mathcal {M}}_1(n_i)\) we get
Hence, from the induction assumption, (15) and the result of Krasnosielska-Kobos and Ferenstein (2013, Lemma 9) we have
where we used the result of Krasnosielska-Kobos and Ferenstein (2013, Lemma 9). Hence, from the induction assumption, we get \(V_{m,i}(\hat{\varphi }^{m})\ge V_{m,i}(\hat{\psi }^{m})\). \(\square \)
4 Examples
In this section we will present a multiple stopping problem formulated and solved in Sakaguchi (1972) and Stadje (1987). Next, we will show how each player behaves in accordance with the Nash equilibrium, that is, how exactly the strategy formulated in Sect. 3 works.
Assume that \(Y_{1},Y_{2},\ldots \) are nonnegative independent random variables with a distribution function \(F(x)=1-\exp (-x)\) for \(x\ge 0\). The random variables \(Y_{1},Y_{2},\ldots \) are sequentially observed at jump times \(0<T_1<T_2<\cdots \) of a homogeneous Poisson process with intensity 1 and \(T_0=0\). Moreover, assume that the sequences \(\{Y_k\}_{k=1}^\infty \) and \(\{T_k\}_{k=1}^\infty \) are independent. We assume that \(G_k=Y_k\mathbb {I}(T_k\in [0,10))\) and \(\mathcal {F}_k=\sigma (Y_1,\ldots ,Y_k,T_1,\ldots ,T_k)\), \(k\in \mathbb {N}\), \(\mathcal {F}_0=\{\emptyset ,\Omega \}\). The solution of this multiple stopping problem is given in the theorems below.
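The offer process described above can be simulated directly. The following is a minimal sketch, assuming unit-mean exponential offers (i.e., taking the distribution function to be \(F(x)=1-\exp (-x)\)) and a unit-rate Poisson process; the function name `simulate_offers` and its parameters are hypothetical and are not part of the paper.

```python
import random

def simulate_offers(horizon=10.0, rate=1.0, seed=None):
    """Return [(T_k, Y_k)] for all Poisson jump times T_k < horizon.

    Offers arriving at or after `horizon` are dropped, since
    G_k = Y_k * I(T_k in [0, horizon)) is zero for them.
    """
    rng = random.Random(seed)
    offers = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)      # exponential inter-arrival time
        if t >= horizon:
            break
        y = rng.expovariate(1.0)        # offer value, exponential with mean 1
        offers.append((t, y))
    return offers
```

Calling `simulate_offers(seed=1)` yields one sample path of pairs \((T_k,Y_k)\); on average about 10 offers arrive before the horizon.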
Theorem 8
(Sakaguchi 1972; Stadje 1987) Random variables \(\gamma ^{i}_{k}\), \(i\in \{1,\ldots ,n\}\), \(k\in {\mathbb {N}}_0\), given in (1) are functions of one random variable \(T_k\), say \(\gamma ^{i}(T_k)\), i.e., for each \(i\in \{1,\ldots ,n\}\), \(k\in {\mathbb {N}}_0\), we have \(\gamma ^{i}_{k}=\gamma ^{i}(T_k)\), where
Theorem 9
(Stadje 1987) \(({\tau }^{1,n}_1,\ldots ,{\tau }^{n,n}_1)\), where \({\tau }^{i,n}_1\) is given in (2) is an optimal n-stopping time in \(\mathcal {M}_1(n)\) for the sequence \(\{{G}_k\}_{k=1}^\infty \). Moreover,
Note that from (14) and Lemma 3 we have
Assume that we have three commodities for sale, i.e., \(n=3\). Then
Therefore, from (19) we have that the optimal expected reward from selling three commodities is approximately 5.43.
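The value 5.43 can be checked numerically. The sketch below assumes the closed form \(\gamma ^i(t)=\ln \bigl (P_i(10-t)/P_{i-1}(10-t)\bigr )\) with \(P_i(u)=\sum _{j=0}^{i}u^j/j!\). This formula is an assumption made for illustration: it is not quoted from the theorems above, but it reproduces the values \(\ln (11)\), \(\ln (61/11)\) and \(\ln (683/183)\) used in Example 1. The names `partial_exp_sum` and `threshold` are hypothetical.

```python
import math

def partial_exp_sum(i, u):
    """P_i(u) = sum_{j=0}^{i} u**j / j!."""
    return sum(u**j / math.factorial(j) for j in range(i + 1))

def threshold(i, t, horizon=10.0):
    """Assumed closed form of gamma^i(t) for 0 <= t <= horizon."""
    u = horizon - t                      # time remaining
    return math.log(partial_exp_sum(i, u) / partial_exp_sum(i - 1, u))

# Thresholds at time 0 for i = 1, 2, 3 and their sum,
# i.e. the optimal expected reward from selling three commodities.
values = [threshold(i, 0.0) for i in (1, 2, 3)]
total = sum(values)
print([round(v, 4) for v in values], round(total, 2))
```

Under this assumption the sum telescopes to \(\ln P_3(10)=\ln (683/3)\), which rounds to 5.43.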
Now we will show how the strategy in the Nash equilibrium works. The simulated values of the offers and the times of their appearance are presented in Table 1.
In the situation presented in Table 1, we have \(\tau ^1(1)=4\), \(\tau ^2(1)=2\) and \(\tau ^3(1)=2\). Moreover, \(\tau ^{1,2}_1=\tau ^2(1)=2\) and \(\tau ^{2,2}_1=\tau ^1(\tau ^2(1)+1)=\tau ^1(3)=4\). Analogously, from (2) we have \(\tau ^{1,3}_1=2\), \(\tau ^{2,3}_1=4\) and \(\tau ^{3,3}_1=5\). To compute \(\sigma ^1_3\), note that \(\tau ^{1}({\tau ^{3}(1)+1})=4\) and \(\tau ^{2}({\tau ^{3}(1)+1})=4\). Hence, from (9) and (10) we have \(\sigma ^1_1=4\), \(\sigma ^1_2=\tau ^{1,2}_1=2\) and \(\sigma ^1_3=\tau ^{3,3}_1=5\).
Example 1
Consider a three-person game in which each player has one commodity for sale. From (20) and (21) we get \(V_{3,1}(\hat{\psi }^3)=\ln (11)\), \(V_{3,2}(\hat{\psi }^3)=\ln ({61}/{11})\), and \(V_{3,3}(\hat{\psi }^3)=\ln ({683}/{183})\). Moreover, from Proposition 2 we have \({\sigma }^{1,1}_3(\hat{\psi }^{3})=\sigma _{1}^1=4\), \({\sigma }^{2,1}_3(\hat{\psi }^{3})=\sigma _{2}^1=2\) and \({\sigma }^{3,1}_3(\hat{\psi }^{3})=\sigma _{3}^1=5\). Thus, Player 1 will sell his commodity at the stopping time \({\sigma }^{1,1}_3(\hat{\psi }^{3})=4\), i.e., at the time \(T_4=3.99\), and get the reward \(G_4=2.96\). Player 2 will sell his commodity at the stopping time \({\sigma }^{2,1}_3(\hat{\psi }^{3})=2\), i.e., at the time \(T_2=2.67\), and get the reward \(G_2=1.44\). Player 3 will sell his commodity at the stopping time \({\sigma }^{3,1}_3(\hat{\psi }^{3})=5\), i.e., at the time \(T_5=5.83\), and get the reward \(G_5=1.73\).
Note that the time at which the \(i\hbox {th}\) player ends the game can differ from the optimal time of selling the \(i\hbox {th}\) commodity (for example, \({\sigma }^{1,1}_3(\hat{\psi }^{3})=4\), whereas \(\tau ^{1,3}_1=2\)). However, his expected reward is equal to the optimal expected reward which can be obtained from selling the \(i\hbox {th}\) commodity, that is, the gain from having i instead of \(i-1\) commodities for sale.
Note that from the beginning of the game until the first player finishes, Players 1, 2, and 3 make their decisions using the threshold functions \(\gamma ^1\), \(\gamma ^2\), \(\gamma ^3\), respectively [see (9) and (10)]. In our example, Player 2 finishes the game first, at the stopping time \(\tau ^2(1)=2\). After this time, Player 1 keeps making decisions using the threshold function \(\gamma ^1\) (because of priority). However, the situation of Player 3 changes. From now until Player 1 finishes the game (i.e., until the stopping time \(\sigma _3^{1,1}(\hat{\psi }^3)=\tau ^1(1)=4\)), Player 3 uses the threshold function \(\gamma ^2\) (because \(\sigma _3^{3,1}(\hat{\psi }^3)=\sigma _3^1\) and from (10) on \(\{\tau ^2(1)=\tau ^3(1)=2\}\) we have \(\sigma _3^1=\sigma _2^3\)). In the example, Player 1 stops next, so after this time Player 3 makes decisions using the threshold function \(\gamma ^1\) (because on \(\{\tau ^2(1)=\tau ^3(1)=2, \tau ^1(3)=\tau ^2(3)=4\}\) we have \(\sigma _3^{3,1}(\hat{\psi }^3)=\sigma _1^5=\tau ^1(5)\)). Note that during the whole game Player i uses the strategy \(\hat{\psi }^{3,i}\), \(i=1,2,3\). However, the strategies of Players 2 and 3 depend on the decisions of players with higher priorities. Therefore, their threshold functions can change during the game (in the example, Player 3 changes the threshold function depending on the decisions made by the other players).
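The threshold switching described above can be sketched in a few lines for the case in which every player has one commodity. This is an illustration under stated assumptions, not the construction from Sect. 3: the closed form used in `threshold` is assumed (it matches the values quoted in Example 1), the function `play` is hypothetical, and the offers below are made-up numbers, not the data of Table 1.

```python
import math

def threshold(rank, t, horizon=10.0):
    """Assumed closed form of gamma^rank(t), 0 <= t <= horizon."""
    u = horizon - t
    p = lambda i: sum(u**j / math.factorial(j) for j in range(i + 1))
    return math.log(p(rank) / p(rank - 1))

def play(offers, players=(1, 2, 3), horizon=10.0):
    """Replay a one-commodity-per-player game on (time, value) offers.

    The j-th remaining player (in priority order) uses gamma^j; if several
    players want an offer, the one with the highest priority obtains it.
    Returns {player: (time, value)} for every player who sold.
    """
    remaining = list(players)            # highest priority first
    sold = {}
    for t, g in offers:
        if t >= horizon or not remaining:
            break
        for rank, player in enumerate(remaining, start=1):
            if g >= threshold(rank, t, horizon):
                sold[player] = (t, g)
                remaining.remove(player)  # later players move up one rank
                break
    return sold

# Hypothetical offers (time, value); NOT the values of Table 1.
hypothetical_offers = [(1.0, 0.5), (2.0, 1.6), (3.0, 0.1), (4.0, 2.5), (6.0, 1.7)]
result = play(hypothetical_offers)
print(result)
```

With these made-up offers the pattern of the example is reproduced: Player 2 sells first, Player 3 then switches from \(\gamma ^3\) to \(\gamma ^2\), Player 1 sells next, and Player 3 finishes using \(\gamma ^1\).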
Example 2
Consider a two-person game in which Player 1 has one commodity for sale and Player 2 has two commodities for sale. Hence, from (20) and (21) we have \(V_{2,1}(\hat{\psi }^2)=\ln (11)\), \(V_{2,2}(\hat{\psi }^2)=\ln ({683}/{33})\). Moreover, from Proposition 2 we have \({\sigma }^{1,1}_2(\hat{\psi }^{2})=\sigma _{1}^1=4,\, {\sigma }^{2,1}_2(\hat{\psi }^{2})=\min \{\sigma _{2}^1,\sigma _{3}^1\}=2\) and \({\sigma }^{2,2}_2(\hat{\psi }^{2})=\sigma _{3}^1=5\). Player 1 will sell his commodity at the time \(T_4=3.99\) and get the reward \(G_4=2.96\). Player 2 will sell his commodities at the times \(T_2=2.67\) and \(T_5=5.83\) and get the total reward \(G_2+G_5 =3.17\).
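The payoffs in Examples 1 and 2 can be cross-checked with exact rational arithmetic. The sketch below assumes that a player holding consecutive commodity indices \(a+1,\ldots ,b\) has equilibrium payoff \(\gamma ^{a+1}(0)+\cdots +\gamma ^{b}(0)=\ln \bigl (P_b(10)/P_a(10)\bigr )\), where \(P_i(u)=\sum _{j=0}^{i}u^j/j!\). Both the grouping rule and the closed form are assumptions that agree with the values quoted in the two examples; the function names are hypothetical.

```python
from fractions import Fraction
import math

def p(i, u):
    """P_i(u) = sum_{j=0}^{i} u**j / j!, computed exactly."""
    return sum(Fraction(u) ** j / math.factorial(j) for j in range(i + 1))

def payoff_ratios(commodities, horizon=10):
    """For each player, the ratio r such that the assumed payoff is ln(r).

    `commodities` lists n_1, ..., n_m in priority order.
    """
    ratios = []
    first = 0                            # commodities held by earlier players
    for n_i in commodities:
        ratios.append(p(first + n_i, horizon) / p(first, horizon))
        first += n_i
    return ratios

print(payoff_ratios([1, 1, 1]))   # three players, one commodity each
print(payoff_ratios([1, 2]))      # Player 1 has one, Player 2 has two
```

For Example 2 this gives the ratios 11 and 683/33, so the expected payoff of Player 2 is \(\ln (683/33)\approx 3.03\), compared with the realized total reward 3.17 on the sample path considered.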
Numerous examples of multiple stopping problems in which the functions \(\gamma ^i\) have been obtained can be found in Sakaguchi (1972), Stadje (1985, 1987) (see also Krasnosielska-Kobos and Ferenstein 2013) and Krasnosielska-Kobos (2015).
References
Chow YS, Robbins H, Siegmund D (1971) Great expectations: the theory of optimal stopping. Houghton Mifflin Company, Boston
Dixon MT (1993) Equilibrium points for three games based on the Poisson process. J Appl Probab 30:627–638
Ekström E, Peskir G (2008) Optimal stopping games for Markov processes. SIAM J Control Optim 47:684–702
Elfving G (1967) A persistency problem connected with a point process. J Appl Probab 4:77–89
Ferenstein E, Krasnosielska A (2009) Nash equilibrium in a game version of Elfving problem. In: Pierre B, Gaitsgory V, Pourtallier O (eds) Advances in dynamic games and their applications. Analytical and numerical developments. Birkhäuser, Boston, pp 399–414
Heller Y (2012) Sequential correlated equilibria in stopping games. Oper Res 60:209–224
Kösters H (2004) A note on multiple stopping rules. Optimization 53:69–75
Krasnosielska A (2011) Issues of optimal stopping inspired by Elfving problem (in Polish). Doctoral Dissertation, Warsaw University of Technology
Krasnosielska-Kobos A (2015) Multiple-stopping problems with random horizon. Optimization 64:1625–1645
Krasnosielska-Kobos A, Ferenstein E (2013) Construction of Nash equilibrium in game version of Elfving’s multiple stopping problem. Dyn Games Appl 3:220–235
Laraki R, Solan E (2005) The value of zero-sum stopping games in continuous time. SIAM J Control Optim 43:1913–1922
Nikolaev ML (1999) On optimal multiple stopping of Markov sequences. Theory Probab Appl 43:298–306
Nowak AS, Szajowski K (1999) Nonzero-sum stochastic games. In: Bardi M, Raghavan TES, Parthasarathy T (eds) Stochastic and differential games: theory and numerical methods. Birkhäuser, Boston, pp 297–343
Peskir G (2008) Optimal stopping games and Nash equilibrium. Theory Probab Appl 53:558–571
Ramsey D, Szajowski K (2008) Selection of a correlated equilibrium in Markov stopping games. Eur J Oper Res 184:185–206
Saario V, Sakaguchi M (1992) Multistop best choice games related to the Poisson process. Math Jpn 37:41–51
Sakaguchi M (1972) A sequential assignment problem for randomly arriving jobs. Rep Stat Appl Res JUSE 19:99–109
Siegmund DO (1967) Some problems in the theory of optimal stopping. Ann Math Stat 38:1627–1640
Solan E, Vieille N (2003) Deterministic multi-player Dynkin games. J Math Econ 1097:1–19
Stadje W (1985) On multiple stopping rules. Optimization 16:401–418
Stadje W (1987) An optimal k-stopping problem for the Poisson process. In: Bauer P, Konecny F, Wertz W (eds) Mathematical statistics and probability theory, vol B. D Reidel Publishing Company, Dordrecht, pp 231–244
Szajowski K (2002) On stopping games when more than one stop is possible. In: Kolchin VF, Kozlov VY, Mazalov VV, Pavlov YL, Prokhorov YV (eds) Probability methods in discrete mathematics, proceedings of the fifth international Petrozavodsk conference, May 2000. International Science Publishers, pp 57–72
Acknowledgments
The author wishes to express her special thanks to the anonymous reviewers of this manuscript for careful reading and valuable suggestions. The study is co-funded by the European Union from resources of the European Social Fund. Project PO KL “Information technologies: Research and their interdisciplinary applications”, Agreement UDA-POKL.04.01.01-00-051/10-00.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Krasnosielska-Kobos, A. Construction of Nash equilibrium based on multiple stopping problem in multi-person game. Math Meth Oper Res 83, 53–70 (2016). https://doi.org/10.1007/s00186-015-0519-8