1 Introduction

We consider the following stochastic differential game: each player can control the fluctuation intensity of her own state process, up to an individual random time horizon. A player’s controls are progressively measurable processes, with respect to a filtration modeling the player’s information flow, with values in a bounded interval \([\sigma _1, \sigma _2]\), where the bounds \(0< \sigma _1 < \sigma _2\) are the same for all players. The players whose terminal states are among the highest fraction \(p\in (0,1)\) receive a fixed prize, set equal to one. The other players receive nothing.

The game models in stylized form competitions where only the best performing agents receive a fixed reward and where every agent can choose between risky and safe actions. It thus allows us to analyze the impact of rank-based rewards on the risk appetite of the competitors. The game has multiple interpretations; we refer to Sect. 6 for more details.

As usual, we fall back on the concept of Nash equilibria for predicting the players’ behavior. Given the discontinuous rewards, it turns out to be difficult to compute explicit equilibria; it already seems difficult to prove existence by abstract means. A way out for games with many players is to pass to the game’s mean field version and to derive an approximate equilibrium from it. The equilibrium of the mean field game here consists of a control and the corresponding state distribution at the terminal time. Under regularity conditions on the rewards and on the state equation coefficients, existence of equilibria in mean field games with diffusion control and common noise is proved in [2] using a relaxed formulation of the problem and the theory of second order BSDEs. Other contributions consider mean field games with diffusion control, such as [8] in a Principal-Agent setting with applications to optimal energy demand management, or [6, 7] in the case of extended mean field games with control interactions.

In this paper, we avoid second order BSDEs and show, using a direct argument, that there exists an equilibrium with a threshold control that consists in choosing the maximal diffusion rate \(\sigma _2\) when the state is below a given threshold, and the minimal rate \(\sigma _1\) otherwise. The corresponding equilibrium distribution is the distribution of an oscillating Brownian motion at the terminal time.

The game bears similarities with the diffusion control game studied in [1]. In contrast to [1], however, we allow the agents here to be heterogeneously informed. Each player is assumed to observe her own state process, but the assumption on how much the agents know beyond this remains general. Some of the players may know, e.g., the time horizons, and some may not. Whether a player knows her own time horizon or the time horizons of the opponents turns out to have no effect on the equilibrium. Indeed, in the many player game the empirical distribution of the time horizons is close to \(\mathcal {T}\); thus, knowing \(\mathcal {T}\) is sufficient for implementing an approximate equilibrium control. Similarly, it has no effect on the equilibrium whether the agents can observe the state processes of the opponents or not. For implementing the threshold control of the equilibrium, each agent only needs to observe her own process.

By defining the state dynamics in terms of solutions to controlled martingale problems and choosing controls of open loop type, we obtain a model framework that covers general information structures, in particular situations where a player has only partial knowledge about the other players’ states and about the individual random time horizons. In the game of [1], the state dynamics are described in terms of stochastic differential equations and the players’ controls are modeled as closed loop controls.

In the appropriate mean field version of the game the representative player knows the distribution \(\mathcal {T}\) of the time horizon, but the actual time horizon arrives unannounced. The threshold describing the equilibrium control strongly depends on the distribution \(\mathcal {T}\). For the cases where \(\mathcal {T}\) is an exponential or a uniform distribution, we characterize the threshold of the equilibrium control as the unique root of a simple equation.

The article is organized as follows: in Sect. 2, we introduce the game model in more detail. For deriving a candidate for an approximate Nash equilibrium, we study the corresponding mean field game in Sect. 3 and show that an equilibrium control is given by a threshold control. In Sect. 4, we show that the n-tuple consisting of the mean field equilibrium controls is an \(\mathcal {O}(n^{-1/2})\)-Nash equilibrium of the n-player game. This means that in the approximate Nash equilibrium players use only information about their own state and choose the maximal diffusion intensity below the optimal threshold and the minimal diffusion intensity above it. All additional information about the opponents’ states or the random times is irrelevant. In Sect. 5, we analyze the winning probability of a given player depending on the time horizon T, and in Sect. 6, we provide a particular application of our model setting to online competitions. Finally, we consider in Sect. 7 an extension of the game to more general reward functions that are continuous, have exponential growth, satisfy a symmetry condition, and are convex below a certain threshold and concave above it. We show that the same tuple consisting of the mean field equilibrium controls is also an approximate Nash equilibrium for these reward functions.

2 Game model

We describe the players’ states by means of controlled martingale problems. To this end, let \(\mathbb {R}_+:=[0,\infty )\), and \(\mathcal {C}(\mathbb {R}_+,\mathbb {R}^n)\) denote the space of continuous functions \(f:\mathbb {R}_+\rightarrow \mathbb {R}^n\) equipped with the metric

$$\begin{aligned}d(\omega _1,\omega _2):= \sum _{n=1}^{\infty } \frac{1}{2^n} \frac{\sup _{t\in [0,n]} |\omega _1(t)-\omega _2(t)| }{1+\sup _{t\in [0,n]} |\omega _1(t)-\omega _2(t)| },\ \omega _1,\omega _2\in \mathcal {C}(\mathbb {R}_+,\mathbb {R}^n).\end{aligned}$$

Let \(\mathcal {B}\left( \mathcal {C}(\mathbb {R}_+,\mathbb {R}^n) \right) \) denote the corresponding Borel \(\sigma \)-algebra on \(\mathcal {C}(\mathbb {R}_+,\mathbb {R}^n)\) and denote by \(X=(X^1,\ldots ,X^n)\) the canonical process on \(\mathcal {C}(\mathbb {R}_+,\mathbb {R}^n)\), i.e., \(X_t^{i}(\omega ):=\omega ^{i}(t)\), \(t\ge 0,\) for \(i=1,\ldots ,n\) and \(\omega \in \mathcal {C}(\mathbb {R}_+,\mathbb {R}^n)\). We refer to [18], Section 1.3, for more details on the construction of this measurable space and its properties.

Let \(\mathcal {T}\) be a probability measure on \(\mathbb {R}_+\), equipped with the usual Borel \(\sigma \)-algebra \(\mathcal {B}(\mathbb {R}_+)\), that describes the distribution of the players’ random times. We suppose that \(\mathcal {T}\) satisfies:

Assumption 2.1

$$\begin{aligned}\int _{0}^{\infty }\frac{1}{\sqrt{t}}\, \mathcal {T}(d t)<\infty .\end{aligned}$$

Let \(\mathcal {B}\left( \mathbb {R}_+^n\right) \) be the Borel \(\sigma \)-algebra on \(\mathbb {R}_+^n\) and define the measure \(\mathcal {T}^n=\bigotimes _{i=1}^n \mathcal {T}\) on \(\mathbb {R}_+^n\), i.e., \(\mathcal {T}^n\) is the n-fold product of \(\mathcal {T}\). We define \(\tau =(\tau _1,\ldots ,\tau _n)\) as the canonical map on \(\mathbb {R}_+^n\), i.e., \(\tau (\omega )=(\tau _1(\omega ),\ldots ,\tau _n(\omega ))=(\omega _1,\ldots ,\omega _n)\) for any \(\omega \in \mathbb {R}_+^n\). Note that \(\tau _1,\ldots ,\tau _n\) are independent and identically distributed under \(\mathcal {T}^n\) by definition and the law of \(\tau _i\) is given by \(\mathcal {T}\).

We define a common measurable space by setting:

  1. (i)

    \(\Omega :=\mathcal {C}(\mathbb {R}_+,\mathbb {R}^n)\times \mathbb {R}_+^n\),

  2. (ii)

    \(\mathcal {F}:=\mathcal {B}\left( \mathcal {C}(\mathbb {R}_+,\mathbb {R}^n) \right) \otimes \mathcal {B}\left( \mathbb {R}_+^n\right) \), i.e., the smallest \(\sigma \)-algebra containing all sets of the form \(A\times B\) for \(A\in \mathcal {B}\left( \mathcal {C}(\mathbb {R}_+,\mathbb {R}^n) \right) \), \(B\in \mathcal {B}\left( \mathbb {R}_+^n\right) \).

We extend the definitions of X and \(\tau \) to \(\Omega \) by setting \(X(\omega _1,\omega _2):=X(\omega _1)\) and \(\tau (\omega _1,\omega _2):=\tau (\omega _2)\) for \((\omega _1,\omega _2)\in \Omega \). We define on \((\Omega ,\mathcal {F})\) the filtration \((\mathcal {F}_t)_{t\ge 0}\) by

$$\begin{aligned}\mathcal {F}_t= \sigma (\tau _1,\ldots ,\tau _n)\vee \sigma (X_s:0\le s\le t),\ t\ge 0.\end{aligned}$$

The filtration \((\mathcal {F}_t)_{t\ge 0}\) describes the overall information flow including the information about the values of the time horizons. Moreover, for Player i, we introduce the filtration \((\mathcal {F}^{i}_t)_{t\ge 0}\) describing the private information of Player i. We assume that \(\sigma \left( X_s^{i}:0\le s\le t \right) \subseteq \mathcal {F}^{i}_t\subseteq \mathcal {F}_t\) for all \(t\ge 0\).

Let \(\mathcal {A}^{i}\) denote the set of all \((\mathcal {F}_t^{i})_{t\ge 0}\)-progressively measurable \(\alpha :\Omega \times \mathbb {R}_+\rightarrow [\sigma _1,\sigma _2]\). We refer to elements of \(\mathcal {A}^{i}\) as admissible controls or strategies and write \(\mathcal {A}_n\) for the product \(\mathcal {A}^1\times \ldots \times \mathcal {A}^n\). We characterize the law of the state processes by means of martingale problems, introduced by Stroock and Varadhan. We refer to the monograph [18] for more details on martingale problems.

Definition 2.2

Let \(\alpha =(\alpha ^1,\ldots ,\alpha ^n)\in \mathcal {A}_n\). Then, a probability measure \(P^\alpha \) on \((\Omega ,\mathcal {F})\) is called a feasible state distribution if

  1. (i)

    \(P^\alpha \circ X_0^{-1}=\delta _{0}\),

  2. (ii)

    \(P^{\alpha }(\mathcal {C}(\mathbb {R}_+,\mathbb {R}^n)\times B) = \mathcal {T}^n(B)\) for all \(B\in \mathcal {B}\left( \mathbb {R}_+^n \right) \),

  3. (iii)

    for all \(f\in \mathcal {C}_c^2(\mathbb {R}^n,\mathbb {R})\), i.e., for all twice continuously differentiable \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) with compact support, the process \(M^f\), defined by

    $$\begin{aligned} M^f_s:= f(X_s) - f(0) - \frac{1}{2}\int _{0}^{s} \sum _{j=1}^{n} \left( \alpha ^j_r\right) ^2 \partial _{jj} f(X_r)\, dr,\ s\ge 0, \end{aligned}$$
    (1)

    is an \((\mathcal {F}_t)_{t\ge 0}\)-martingale under \(P^\alpha \).

We denote by \(\mathcal {Q}(\alpha )\) the set of all feasible state distributions \(P^\alpha \).

Remark 2.3

The assumptions on the tuple \(\alpha \) in the previous definition do not exclude that \(\mathcal {Q}(\alpha )\) is empty. For a tuple \(\alpha \) to be a Nash equilibrium it is necessary, however, that \(\mathcal {Q}(\alpha ) \not = \emptyset \) (see Definition 2.8 below).

Remark 2.4

Note that condition (ii) implies that each random time \(\tau _i\) has distribution \(\mathcal {T}\). Moreover, condition (iii) yields that each state process \(X^{i}\) is a local \((\mathcal {F}_t)_{t\ge 0}\)-martingale and

$$\begin{aligned} \langle X^{i}, X^j \rangle _t&= {\left\{ \begin{array}{ll} \int _{0}^{t} (\alpha _s^{i})^2\, ds,\ {} &{} \text {if } i=j,\\ 0,\ {} &{}\text {else,} \end{array}\right. }\qquad t\ge 0. \end{aligned}$$

The state processes are even true martingales that are square integrable, i.e., \(E^{P^\alpha }\left[ (X^{i}_t)^2 \right] <\infty \), \(t\ge 0\), because the controls are bounded and thus, \(E^{P^\alpha }\left[ \langle X^{i}, X^i \rangle _t \right] <\infty \) for all \(t\ge 0\) (see, e.g., [17], Section II.6, Corollary 3, p.73).

Remark 2.5

The definition of the filtrations \((\mathcal {F}_t^{i})_{t\ge 0}\) is rather general. Each filtration \((\mathcal {F}_t^{i})_{t\ge 0}\) can contain information about the other players’ states and random times. Therefore, the game setting covers the cases where players can or cannot observe each other, and where they do or do not have knowledge about the random time horizons.

If, e.g., \(\mathcal {F}_t^{i}=\mathcal {F}_t\), each player can observe the state processes of the opponents and make her strategy depend on the opponents’ state trajectories. Moreover, each player has prior knowledge of the random time horizons. If, however, \(\mathcal {F}_t^{i}=\sigma (X_s^{i}:0\le s \le t)\), then each player can only observe her own state process. Neither information about the opponents’ states nor about the random times is available.

Despite this quite general information structure, we show in Theorem 4.1 that an approximate Nash equilibrium is given by a tuple of threshold controls depending only on the position of the single player’s state process.

Remark 2.6

The state processes can be equivalently described as weak solutions to an n-dimensional SDE (or via a stochastic integral w.r.t. some Brownian motion). Indeed, for any control \(\alpha \in \mathcal {A}_n\) with \(\mathcal {Q}(\alpha )\ne \emptyset \) and any \(P\in \mathcal {Q}(\alpha )\), there exists an n-dimensional Brownian motion W on \((\Omega ,\mathcal {F},(\mathcal {F}_t)_{t\ge 0},P)\) such that the state processes \(X^1,\ldots ,X^n\) satisfy P-a.s.

$$\begin{aligned} X_t^{i} = \int _{0}^{t} \alpha ^{i}_s\, dW_s^{i},\ t\ge 0,\ i=1,\ldots ,n, \end{aligned}$$
(2)

and \((\Omega ,\mathcal {F},(\mathcal {F}_t)_{t\ge 0},P,X,W)\) is a weak solution to (2) (see, e.g., [11], Proposition 5.4.6). We use this connection in the mean field game presented in Sect. 3, and thus, characterize the state processes via stochastic integrals.

Remark 2.7

For all measurable feedback functions \(a:\mathbb {R}_+\times \mathbb {R}^n\rightarrow [\sigma _1,\sigma _2]^n\), there exists a solution \(P^{a}\) to the martingale problem (1) with \((\alpha _s)_{s\ge 0}=\left( a(s,X^1_s,\ldots ,X^n_s)\right) _{s\ge 0}\). This follows, e.g., from [13], Theorem 2.6.1, and [11], Proposition 5.4.11. If the filtration \((\mathcal {F}^{i}_t)_{t\ge 0}\) contains the information about all states, i.e., if \(\sigma \left( X_s: 0 \le s \le t \right) \subset \mathcal {F}_t^{i}\), then the control \(\left( a(s,X^1_s,\ldots ,X^n_s)\right) _{s\ge 0}\) is contained in \(\mathcal {A}^{i}\) and \(\{P^{a}\}\subseteq \mathcal {Q}(\alpha )\) for the resulting control tuple \(\alpha \). This particularly holds for control tuples where each entry is a threshold control, i.e., each entry is given by the feedback function

$$\begin{aligned} m_b(x) = {\left\{ \begin{array}{ll} \sigma _2, &{} \text {if } x\le b,\\ \sigma _1, &{} \text {if } x>b, \end{array}\right. } \end{aligned}$$
(3)

where \(b\in \mathbb {R}\). In this case, the solution to the martingale problem (1) is even unique: Remark 3.1 below implies that the SDE (7) with \(m=m_b\) has a unique strong solution and [11], Corollary 5.4.9, then implies that uniqueness for the martingale problem (1) holds.

We suppose that each player aims at maximizing the probability that her own state at her terminal random time is greater than the empirical \((1-p)\)-quantile of all states at the individual random times. More precisely, let

$$\begin{aligned} \mu ^{n} = \frac{1}{n} \sum _{i=1}^n \delta _{X^{i}_{\tau _i}} \end{aligned}$$

be the empirical distribution of the players’ states at the terminal times. We define the empirical \((1-p)\)-quantile by \(q(\mu ^n,1- p) = \inf \{r\in \mathbb {R}:\mu ^{n}((-\infty ,r])\ge 1-p \}\). Note that \(X^{i}_{\tau _i} > q(\mu ^n,1- p)\) if and only if the terminal state of Player i is among the \(\lfloor np\rfloor \) highest terminal states.

It is standard to predict or explain the players’ behavior in terms of (approximate) Nash equilibria, which are here defined as follows.

Definition 2.8

Let \(\varepsilon \ge 0\). A tuple \(\alpha = (\alpha _1, \ldots ,\alpha _n) \in \mathcal {A}_n\) with \(\mathcal {Q}(\alpha )\ne \emptyset \) is called \(\varepsilon \)-Nash equilibrium of the n-player game if for all \(i \in \{1, \ldots , n\}\) and \(P^\alpha \in \mathcal {Q}(\alpha )\), we have

$$\begin{aligned} P^{\alpha }(X^{i}_{\tau _i}> q(\mu ^n,1-p))+\varepsilon \ge \sup _{\beta \in \mathcal {A}^{i}} \sup _{P\in \mathcal {Q}(\alpha _{-i},\beta )} P(X^{i}_{\tau _i}> q(\mu ^n,1-p)), \end{aligned}$$
(4)

where \((\alpha _{-i},\beta ) = (\alpha _1, \ldots , \alpha _{i-1}, \beta , \alpha _{i+1}, \ldots , \alpha _n)\) and \(\sup \emptyset = -\infty \).

Note that for \(\varepsilon =0\), the tuple \(\alpha \) in Definition 2.8 is a Nash equilibrium in the usual sense. In the case \(\varepsilon >0\), the tuple \(\alpha \) is also called an approximate Nash equilibrium.

We do not restrict the set of controls in order to guarantee that the solutions to the martingale problem (1) are unique. Hence, we require that (4) holds for all solutions to the martingale problem (1). We emphasize that for an arbitrary control tuple \(\alpha \in \mathcal {A}_n\), uniqueness of solutions to the martingale problem (1) can fail. For example, if \(n\ge 3\), then it was shown in [15] that there exists a diffusion coefficient, and hence an operator, such that the corresponding martingale problem does not have a unique solution. Equivalently, there exists a diffusion coefficient \(\sigma :\mathbb {R}^n\rightarrow \mathbb {R}^{n\times n} \) that is uniformly elliptic and such that uniqueness in law fails for the corresponding SDE (see also [5], Example 1.24). Thus, one can interpret (4) as follows: Player i has no incentive to change her strategy from \(\alpha _i\) to \(\beta \), no matter which distribution from \(\mathcal {Q}(\alpha _{-i}, \beta )\) is chosen. We stress, however, that for the approximate Nash equilibria derived from the corresponding mean field game, uniqueness is always satisfied.

We do not compute an exact Nash equilibrium for the n-player game. As in [1], we compute an approximate Nash equilibrium for large games by considering the mean field limit of the game. We show that a mean field equilibrium strategy is given by a threshold control.

Remark 2.9

Let \(T>0\) and \(\tau _1=\ldots =\tau _n=T\), i.e., \(\mathcal {T}=\delta _T\). Set \(\mathcal {F}^{i}_t=\sigma (X_s:0\le s\le t)\), \(t\ge 0\), \(i=1,\ldots ,n\), and consider only controls of the form \(\alpha _t= a(t,X_t^1,\ldots ,X_t^n)\) for some \(a:\mathbb {R}_+\times \mathbb {R}^n\rightarrow [\sigma _1,\sigma _2]\). Then, the above model is equivalent to the game model presented in the article [1].

3 Mean field game

In this section we describe the mean field version of the game introduced in Sect. 2.

Let \(0<\sigma _1<\sigma _2\) and \((\Omega ,\mathcal {F},(\mathcal {F}_t)_{t\ge 0},P)\) be a complete filtered probability space satisfying the usual conditions. We suppose that \((\Omega ,\mathcal {F},(\mathcal {F}_t)_{t\ge 0},P)\) supports an \((\mathcal {F}_t)_{t\ge 0}\)-Brownian motion \((W_t)_{t\ge 0}\) and an \(\mathbb {R}_+\)-valued random variable \(\tau \). We assume that \((W_t)_{t\ge 0}\) and \(\tau \) are independent and

$$\begin{aligned} E\left[ \tau ^{-\frac{1}{2}} \right] <\infty , \end{aligned}$$
(5)

see Assumption 2.1 above. Let \({\tilde{\mathcal {A}}}\) be the set of all processes \(\alpha :\Omega \times \mathbb {R}_+\rightarrow [\sigma _1,\sigma _2]\) that are \((\mathcal {F}_t)_{t\ge 0}\)-progressively measurable. If an agent chooses the control \(\alpha \in {\tilde{\mathcal {A}}}\), her state process is defined by

$$\begin{aligned} X_t^{\alpha }:= \int _{0}^{t}\! \alpha _s\, dW_s,\ t\ge 0. \end{aligned}$$
(6)

Remark 3.1

All feedback controls with a feedback function \(m:\mathbb {R}\rightarrow [\sigma _1,\sigma _2]\) of bounded variation are contained in \({\tilde{\mathcal {A}}}\). Indeed, the SDE

$$\begin{aligned} dX_t = m(X_t) dB_t, \quad X_0 = 0, \end{aligned}$$
(7)

has a weak solution because of Theorem 2.6.1 in [13], and pathwise uniqueness applies according to results in [16]. Hence, there exists a unique strong solution \(X^m\) to (7) (cf. Section 5.3 in [11]) and the control \((m(X_t^m))_{t\ge 0}\) is contained in \({\tilde{\mathcal {A}}}\).
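
For illustration, the following minimal Euler–Maruyama sketch (not part of the formal development) simulates the SDE (7) for a threshold feedback function \(m_b\) as in (3); the parameter values are assumptions chosen purely for illustration.

```python
import numpy as np

# Minimal Euler-Maruyama sketch (illustration only) of the threshold-controlled
# SDE dX_t = m_b(X_t) dB_t from (7), with the feedback function m_b of (3).
# All parameter values below are assumptions chosen for illustration.
sigma1, sigma2, b = 1.0, 2.0, 0.3
T, n_steps = 1.0, 10_000
dt = T / n_steps

def m_b(x):
    # maximal volatility at or below the threshold, minimal volatility above it
    return sigma2 if x <= b else sigma1

rng = np.random.default_rng(0)
x = 0.0
for _ in range(n_steps):
    x += m_b(x) * np.sqrt(dt) * rng.standard_normal()
print("X_T =", x)
```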

Let \(p \in (0,1)\) and denote by \(q(\mu ,1-p)\) the \((1-p)\)-quantile of a probability measure \(\mu \in \mathcal {P}(\mathbb {R})\), i.e., for a Borel probability measure \(\mu \) on \(\mathbb {R}\), \(q(\mu , 1-p)=\inf \{r \in \mathbb {R}: \mu ((-\infty , r]) \ge 1- p\} \). If \(\mu =\text {Law}(X_\tau ^{\alpha })\), i.e., \(\mu \) is the law of \(X_\tau ^\alpha \) for a control \(\alpha \in {\tilde{\mathcal {A}}}\), we write \(q(X^\alpha _\tau ,1-p)\) for \(q(\mu ,1-p)\). In the mean field game that corresponds to the n-player game described in Sect. 2, a single player wants to maximize, over all admissible controls, the probability that her terminal state exceeds the population \((1-p)\)-quantile. In the mean field game, we define equilibria in the following sense:

Definition 3.2

A tuple \((\mu ^*,\alpha ^*)\in \mathcal {P}(\mathbb {R})\times {\tilde{\mathcal {A}}}\) is called mean field equilibrium if

  1. (i)

    for all \(\alpha \in {\tilde{\mathcal {A}}}\),

    $$\begin{aligned} P\big (X^{\alpha ^*}_\tau> q(\mu ^*,1-p)\big ) \ge P(X^{\alpha }_\tau > q(\mu ^*,1-p)), \end{aligned}$$
  2. (ii)

    it holds \(\mu ^*=\text {Law}(X^{\alpha ^*}_\tau )\).

Lemma 3.3

Let \(b \in \mathbb {R}\). Then

$$\begin{aligned} P(X^{m_b}_\tau> b) = \max _{\alpha \in {\tilde{\mathcal {A}}}} P(X^{\alpha }_\tau > b), \end{aligned}$$
(8)

where \(m_b\) is defined in Eq. (3).

Proof

The result follows from the diffusion control problem studied by McNamara [14]. In more detail, let \(\alpha \in {\tilde{\mathcal {A}}}\). Assume, without loss of generality, that there exists a regular conditional probability \(Q:\mathbb {R}_+\times \mathcal {F}\rightarrow [0,1]\) for \(\mathcal {F}\) given \(\tau \). Because \(\tau \) and W are independent, W is also a Brownian motion under \(Q(t,\cdot )\) for \(P_\tau \)-a.e. \(t\in \mathbb {R}_+\). Hence, we see that

$$\begin{aligned}Q(t,\{X_\tau ^{\alpha }>b\}) = Q(t,\{X_t^{\alpha }>b\})\le Q(t,\{X_t^{m_b}>b\}),\ \text {for}\ P_\tau \text {-a.e.}\ t\in \mathbb {R}_+, \end{aligned}$$

using either [14], Remark 8, or [19], Proposition C.5. Finally,

$$\begin{aligned}P(X^{\alpha }_\tau> b) = \int _{0}^{\infty } Q(t,X_t^\alpha>b)\, P_\tau (dt)\le \int _{0}^{\infty } Q(t,X_t^{m_b}>b)\, P_\tau (dt) = P(X_\tau ^{m_b}>b).\end{aligned}$$

We refer to [11], Section 5.3.C, or [10], Section 1.3, for more details on regular conditional probabilities. \(\square \)

Lemma 3.3 implies that the optimal control for (8) is the threshold control \(m_b\). In the following, we just write \(X^b\) for the state \(X^{m_b}\). The process \(X^b\) is a so-called oscillating Brownian motion (OBM) with threshold b, introduced in [12]. In more detail, OBM is defined as follows.

Definition 3.4

Let \(b\in \mathbb {R}\). We call the solution \(X^{b}\) of the SDE

$$\begin{aligned} dX_t = m_b(X_t) dB_t,\ X_0=0, \end{aligned}$$
(9)

oscillating Brownian motion (OBM) with threshold b and initial value 0.

There exists indeed a unique strong solution to the SDE (9) because of Remark 3.1. For OBMs, one can explicitly calculate the probability density function and the cumulative distribution function. For the reader’s convenience, we recall the following result on OBMs.

Proposition 3.5

Let \(b\in \mathbb {R}\) and \(X^{b}\) be an OBM with threshold b and initial value 0. Then, for all \(t > 0\), the random variable \(X^{b}_t\) has a probability density function \(p(t,-b,\cdot -b)\) with respect to the Lebesgue measure, where

$$\begin{aligned} p(t,x,y) = \left\{ \begin{array}{cc} \frac{2 \sigma _1}{\sigma _2(\sigma _1 + \sigma _2)} \frac{1}{\sqrt{2 \pi t}} e^{-( \frac{x}{\sigma _1} - \frac{y}{\sigma _2})^2 \frac{1}{2t}}, &{} \text { if } x \ge 0, y< 0,\\ \frac{2 \sigma _2}{\sigma _1(\sigma _1 + \sigma _2)} \frac{1}{\sqrt{2 \pi t}} e^{-( \frac{y}{\sigma _1} - \frac{x}{\sigma _2})^2 \frac{1}{2t}}, &{} \text { if } x< 0, y \ge 0, \\ \frac{1}{\sigma _1 \sqrt{2 \pi t}} \left( e^{- \frac{(y-x)^2}{2 \sigma _1^2 t}} + \frac{\sigma _2-\sigma _1}{\sigma _1 + \sigma _2} e^{- \frac{(y+x)^2}{2 \sigma _1^2 t}} \right) , &{} \text { if } x \ge 0, y \ge 0, \\ \frac{1}{\sigma _2 \sqrt{2 \pi t}} \left( e^{- \frac{(y-x)^2}{2 \sigma _2^2 t}} + \frac{\sigma _1-\sigma _2}{\sigma _1 + \sigma _2} e^{- \frac{(y+x)^2}{2 \sigma _2^2 t}} \right) ,&{} \text { if } x< 0, y < 0, \end{array} \right. \end{aligned}$$

for all \(x,y\in \mathbb {R}\). Moreover, the cumulative distribution function of \(X^{b}_t\) is given by

$$\begin{aligned} F^b_t(x) = {\left\{ \begin{array}{ll} \Phi \left( \frac{x}{\sigma _2 \sqrt{t}} \right) - \frac{\sigma _2-\sigma _1}{\sigma _1+\sigma _2} \Phi \left( \frac{x-2b}{\sigma _2 \sqrt{t}} \right) , &{} \text { if } x<b, b\ge 0,\\ \frac{2\sigma _2}{\sigma _1+\sigma _2} \Phi \left( \frac{ x-b\left( 1-\frac{\sigma _1}{\sigma _2} \right) }{\sigma _1 \sqrt{t}} \right) - \frac{\sigma _2-\sigma _1}{\sigma _1+\sigma _2}, &{} \text { if } x\ge b, b\ge 0,\\ \frac{2\sigma _1}{\sigma _1+\sigma _2} \Phi \left( \frac{x-b\left( 1-\frac{\sigma _2}{\sigma _1}\right) }{\sigma _2 \sqrt{t}} \right) , &{} \text { if } x<b, b<0,\\ \Phi \left( \frac{x}{\sigma _1\sqrt{t}} \right) - \frac{\sigma _2-\sigma _1}{\sigma _1+\sigma _2}\Phi \left( \frac{2b-x}{\sigma _1\sqrt{t}} \right) , &{} \text { if } x\ge b, b<0. \end{array}\right. } \end{aligned}$$
(10)

The proof of Proposition 3.5 follows either from [12], Theorem 1, or [19], Proposition B.2 and Proposition B.4. For more details on OBMs, we refer to [12] and [19], Appendix B. Using the independence of the OBM \(X^b\) and the random time \(\tau \), one can derive from Proposition 3.5 the cumulative distribution function of \(X_\tau ^b\).

Lemma 3.6

Let \(b\in \mathbb {R}\) and \(X^{b}\) be an OBM with threshold b and initial value 0. Then the cumulative distribution function of the random variable \(X_\tau ^b\), denoted by \(F_\tau ^b\), is given by

$$\begin{aligned}F_{\tau }^b(x) = \int _{0}^{\infty } F_t^b(x)\, P_\tau (dt),\ x\in \mathbb {R}.\end{aligned}$$

We refer to Proposition B.8 in [19] for the proof of Lemma 3.6. One can show that the functions \(F_t^b\) and \(F_\tau ^b\) are Lipschitz continuous in the state variable as well as in the threshold b (see Appendix B in [19] for more details).
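
For concreteness, the following sketch implements the distribution function (10) and approximates \(F_\tau ^b\) from Lemma 3.6 by averaging over draws of \(\tau \); the parameter values and the exponential choice for \(\mathcal {T}\) are assumptions made purely for illustration.

```python
from math import erf, sqrt
import numpy as np

def Phi(z):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def F_t(x, t, b, s1, s2):
    # cumulative distribution function (10) of an OBM with threshold b at time t > 0
    if b >= 0:
        if x < b:
            return Phi(x / (s2 * sqrt(t))) - (s2 - s1) / (s1 + s2) * Phi((x - 2 * b) / (s2 * sqrt(t)))
        return 2 * s2 / (s1 + s2) * Phi((x - b * (1 - s1 / s2)) / (s1 * sqrt(t))) - (s2 - s1) / (s1 + s2)
    if x < b:
        return 2 * s1 / (s1 + s2) * Phi((x - b * (1 - s2 / s1)) / (s2 * sqrt(t)))
    return Phi(x / (s1 * sqrt(t))) - (s2 - s1) / (s1 + s2) * Phi((2 * b - x) / (s1 * sqrt(t)))

def F_tau(x, b, s1, s2, t_samples):
    # Lemma 3.6: average F_t^b(x) over draws of tau (Monte Carlo version of the integral)
    return sum(F_t(x, t, b, s1, s2) for t in t_samples) / len(t_samples)

# usage with an exponentially distributed time horizon (an assumption for this illustration)
rng = np.random.default_rng(1)
taus = rng.exponential(scale=2.0, size=20_000)
print(F_tau(0.3, 0.3, 1.0, 2.0, taus))   # approximates f(0.3) from (11) for this time distribution
```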

The standard approach to solving mean field games is to consider mappings from probability distributions to the distributions of optimally controlled states and to find their fixed points, the so-called equilibrium measures (see, e.g., [3] and [4]). However, Lemma 3.3 allows us to restrict attention to the distributions of OBMs, which can be parameterized by the real-valued threshold \(b\in \mathbb {R}\). For identifying equilibria, it suffices to show that the function

$$\begin{aligned} f: \mathbb {R}\rightarrow \mathbb {R}, \quad b \mapsto q(X^b_\tau ,1-p), \end{aligned}$$

has a unique fixed point. Indeed, if \(f(b) = b\), then \(b = q(X^b_\tau , 1-p)\). Lemma 3.3 further implies that \(P(X^{b}_\tau> q(X^{b}_\tau ,1-p)) = \max _{\beta \in {\tilde{\mathcal {A}}}} P(X^{\beta }_\tau > q(X^{b}_\tau ,1-p))\); hence, \((\text {Law}(X_\tau ^b),m_b)\) is an equilibrium. The main result of this section is the following:

Theorem 3.7

There exists a unique \(b^*\in \mathbb {R}\) such that \(q\left( X_\tau ^{b^*},1-p\right) =b^*\). The tuple \((\text {Law}(X_{\tau }^{b^*}), m_{b^*})\) is a mean field equilibrium.

Proof

Recall that \(F_\tau ^b(x)=P(X_{\tau }^{b}\le x)\), \(x \in \mathbb {R}\). Consider the map \(f(b):=F_\tau ^b(b)\). Using the explicit formula for \(F_\tau ^b\) given by Proposition 3.5 and Lemma 3.6 (see also [19], Proposition B.4 and Proposition B.8), we observe that

$$\begin{aligned} f(b)= {\left\{ \begin{array}{ll} \displaystyle \int _{0}^{\infty }\left( \frac{2\sigma _2}{\sigma _2+\sigma _1} \Phi \left( \frac{b}{\sigma _2 \sqrt{t}} \right) - \frac{\sigma _2-\sigma _1}{\sigma _2+\sigma _1} \right) \, P_{\tau }(dt), &{} \text {if } b\ge 0,\\ &{} \\ \displaystyle \int _{0}^{\infty } \frac{2\sigma _1}{\sigma _2+\sigma _1} \Phi \left( \frac{b}{\sigma _1 \sqrt{t}} \right) \, P_{\tau }(dt), &{} \text {if } b< 0. \end{array}\right. } \end{aligned}$$
(11)

The function f is strictly increasing and \(f(0)=\frac{\sigma _1}{\sigma _1+\sigma _2}\). Moreover, f is Lipschitz continuous because of assumption (5) above and Propositions B.10 and B.12 in [19]. Monotone convergence implies that \(\lim _{b\rightarrow +\infty } f(b)=1\) and \(\lim _{b\rightarrow -\infty } f(b)=0\). Hence, the intermediate value theorem, combined with the strict monotonicity of f, implies that there exists a unique \(b^*\in \mathbb {R}\) such that

$$\begin{aligned}f(b^*)=1-p \in (0,1). \end{aligned}$$

We have \(f(0)\le 1-p\) if and only if \(b^*\ge 0\). We conclude that \(q\left( X_{\tau }^{b^*},1-p\right) = b^*\) because \(F_\tau ^{b^*}(b^*)=1-p\). Lemma 3.3 implies that

$$\begin{aligned} P\big ( X_\tau ^{b^*}>q\big (X_{\tau }^{b^*},1-p\big ) \big )&= P\big ( X_\tau ^{b^*}> b^*\big )\\&= \sup _{\alpha \in {\tilde{\mathcal {A}}}} P \big ( X_\tau ^{\alpha }>b^*\big ) = \sup _{\alpha \in {\tilde{\mathcal {A}}}} P \big ( X_\tau ^{\alpha } >q\big (X_{\tau }^{b^*},1-p\big ) \big ), \end{aligned}$$

and thus, \((\text {Law}(X_{\tau }^{b^*}), m_{b^*})\) is a mean field equilibrium. \(\square \)

Example 3.8

If \(\tau \) is exponentially distributed with parameter \(\lambda >0\), then the threshold \(b^*\) in Theorem 3.7 is given by

$$\begin{aligned} b^*= \left\{ \begin{array}{ll} -\frac{\sigma _2}{ \sqrt{2\lambda }}\log \left( \frac{\sigma _1+\sigma _2}{\sigma _2} p \right) ,&{} \text {if } p < \frac{\sigma _2}{\sigma _1 + \sigma _2}, \\ \frac{\sigma _1}{\sqrt{2\lambda }} \log \left( \frac{\sigma _1+\sigma _2}{\sigma _1} \left( 1-p \right) \right) , &{} \text {if } p \ge \frac{\sigma _2}{\sigma _1 + \sigma _2}. \end{array} \right. \end{aligned}$$
(12)

Note that the exponential distribution satisfies the condition (5). Proposition 3.5 and Lemma 3.6 imply that for \(b\ge 0\)

$$\begin{aligned} F_{\tau }^b(b)&=1- \frac{2\lambda \sigma _2}{\sigma _1+\sigma _2} \int _{0}^{\infty } e^{-\lambda t} \Phi \left( -\frac{b}{\sigma _2\sqrt{t}} \right) \, dt = 1-\frac{\sigma _2}{\sigma _1+\sigma _2} e^{-\frac{b}{\sigma _2}\sqrt{2\lambda }}. \end{aligned}$$

The last step follows because

$$\begin{aligned} \Phi \left( -\frac{b}{\sigma _2\sqrt{t}} \right)&= \frac{1}{2} \text {erfc}\left( \frac{b}{\sigma _2 \sqrt{2 t}} \right) ,\qquad \text {erfc}(x):= \frac{2}{\sqrt{\pi }} \int _{x}^{\infty } e^{-y^2}\, dy, \end{aligned}$$

and the Laplace transform of the right-hand side is equal to \(\frac{1}{2\lambda } e^{-\frac{b}{\sigma _2}\sqrt{2\lambda }}\) (see, e.g., [9], Chapter 8, Table 3). Similarly, we see for \(b<0\) that

$$\begin{aligned} F_{\tau }^b(b)&= \frac{\sigma _1}{\sigma _1+\sigma _2} e^{\frac{b}{\sigma _1}\sqrt{2\lambda }}. \end{aligned}$$

Therefore, the function f in (11) is given by

$$\begin{aligned}f(b)= {\left\{ \begin{array}{ll} 1-\frac{\sigma _2}{\sigma _1+\sigma _2} e^{-\frac{b}{\sigma _2}\sqrt{2\lambda }}, &{} \text {if } b\ge 0,\\ \frac{\sigma _1}{\sigma _1+\sigma _2} e^{\frac{b}{\sigma _1}\sqrt{2\lambda }}, &{} \text {if } b< 0. \end{array}\right. } \end{aligned}$$

Now, it is straightforward to verify that \(f(b^*)=1-p\) if and only if \(b^*\) satisfies Eq. (12).
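
The following short sketch (with assumed parameter values) checks the closed form (12) against the fixed-point equation \(f(b^*)=1-p\), using the explicit expression for f displayed above.

```python
from math import exp, log, sqrt

# Sketch (assumed parameter values): compare the closed-form threshold (12) with the
# fixed-point equation f(b*) = 1 - p for tau ~ Exp(lambda).
s1, s2, lam, p = 1.0, 2.0, 0.5, 0.3

def f(b):
    if b >= 0:
        return 1.0 - s2 / (s1 + s2) * exp(-b / s2 * sqrt(2 * lam))
    return s1 / (s1 + s2) * exp(b / s1 * sqrt(2 * lam))

# closed-form threshold (12)
if p < s2 / (s1 + s2):
    b_star = -s2 / sqrt(2 * lam) * log((s1 + s2) / s2 * p)
else:
    b_star = s1 / sqrt(2 * lam) * log((s1 + s2) / s1 * (1 - p))

print(b_star, f(b_star), 1 - p)   # f(b_star) should coincide with 1 - p
```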

Example 3.9

Suppose that \(\tau \) is uniformly distributed on some interval \([t_1,t_2]\) with \(0\le t_1<t_2\). Notice that condition (5) is again satisfied. The cumulative distribution function of the OBM \(X^b\) at time \(\tau \) and the point b is given by

$$\begin{aligned} F_\tau ^b(b)&= \int _{t_1}^{t_2} \frac{1}{t_2-t_1} F_t^b(b)\, dt\\&= {\left\{ \begin{array}{ll} \displaystyle \frac{1}{t_2-t_1}\frac{ 2b^2}{\sigma _2(\sigma _1+\sigma _2)} \int _{\frac{\sigma _2^2}{b^2}t_1}^{\frac{\sigma _2^2}{b^2}t_2} \Phi \left( \frac{1}{\sqrt{t}} \right) \, dt- \frac{\sigma _2-\sigma _1}{\sigma _1+\sigma _2}, &{} \text {if } b> 0,\\ \ {} &{} \\ \displaystyle \frac{\sigma _1}{\sigma _1+\sigma _2}, &{} \text {if } b=0, \\ \ {} &{} \\ \displaystyle \frac{1}{t_2-t_1} \frac{2b^2}{\sigma _1(\sigma _1+\sigma _2)} \int _{\frac{\sigma _1^2}{b^2}t_1}^{\frac{\sigma _1^2}{b^2}t_2}\left( 1- \Phi \left( \frac{1}{\sqrt{t}} \right) \right) \, dt, &{} \text {if } b< 0, \end{array}\right. }\\&\\&= {\left\{ \begin{array}{ll} \displaystyle \frac{1}{t_2-t_1} \frac{ 2b^2}{\sigma _2(\sigma _1+\sigma _2)} \left( G\left( \frac{\sigma _2^2 t_2}{b^2}\right) - G\left( \frac{\sigma _2^2 t_1}{b^2} \right) \right) - \frac{\sigma _2-\sigma _1}{\sigma _1+\sigma _2}, &{} \text {if } b> 0,\\ \ {} &{} \\ \displaystyle \frac{\sigma _1}{\sigma _1+\sigma _2}, &{} \text {if } b=0, \\ \ {} &{} \\ \displaystyle \frac{2\sigma _1}{\sigma _1+\sigma _2} - \frac{1}{t_2-t_1} \frac{2b^2}{\sigma _1(\sigma _1+\sigma _2)} \left( G\left( \frac{\sigma _1^2 t_2}{b^2}\right) - G\left( \frac{\sigma _1^2 t_1}{b^2} \right) \right) , &{} \text {if } b< 0, \end{array}\right. } \end{aligned}$$

where \(G(s):= (s+1) \Phi \left( \frac{1}{\sqrt{s}}\right) + \sqrt{s} \varphi \left( \frac{1}{\sqrt{s}} \right) \), \(s>0\), and where \(\varphi \) and \(\Phi \) denote respectively the standard Gaussian density and cumulative distribution functions. For given parameters, one can solve the equation \(F_\tau ^b(b)=1-p\) for b numerically, but one cannot find a solution in closed form.
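
As a numerical illustration of this example, the following sketch (with assumed parameter values) solves \(F_\tau ^b(b)=1-p\) by bisection, approximating the time integral of \(F^b_t(b)\) over \([t_1,t_2]\) by a Riemann sum.

```python
from math import erf, sqrt

# Sketch (assumed parameter values): solve F_tau^b(b) = 1 - p for tau ~ U[t1, t2].
s1, s2, p, t1, t2 = 1.0, 2.0, 0.3, 0.5, 1.5

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def F_t_at_b(b, t):
    # value F_t^b(b) of the OBM distribution function (10) at its own threshold
    if b >= 0:
        return 2 * s2 / (s1 + s2) * Phi(b / (s2 * sqrt(t))) - (s2 - s1) / (s1 + s2)
    return 2 * s1 / (s1 + s2) * Phi(b / (s1 * sqrt(t)))

def f(b, n=1000):
    # Riemann-sum approximation of the uniform time average of F_t^b(b)
    ts = [t1 + (k + 0.5) * (t2 - t1) / n for k in range(n)]
    return sum(F_t_at_b(b, t) for t in ts) / n

lo, hi = -10.0, 10.0            # bisection; f is strictly increasing in b
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < 1 - p else (lo, mid)
print("b* =", 0.5 * (lo + hi))
```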


4 Approximate Nash equilibrium

We return to the setting of Sect. 2 and state the main result.

Theorem 4.1

Let \(b^*\in \mathbb {R}\) be the threshold of the mean field equilibrium control, given by Theorem 3.7. Let \(\alpha ^*= (\alpha ^{1,*}, \ldots , \alpha ^{n,*})\in \mathcal {A}_n\) be the tuple of strategies, defined by

$$\begin{aligned} \alpha ^{i,*}_t:=m_{b^*}(X_t^{i})={\left\{ \begin{array}{ll} \sigma _2, &{} \text { if } X^i_t\le b^{*},\\ \sigma _1, &{} \text { if } X^i_t> b^*, \end{array}\right. } \end{aligned}$$
(13)

Then, there exists a sequence \(\varepsilon _n \ge 0\) with \(\lim _{n\rightarrow \infty } \varepsilon _n = 0\) such that \(\alpha ^*\) is an \(\varepsilon _n\)-Nash equilibrium of the n-player game. We can choose \(\varepsilon _n\in \mathcal {O}(n^{-1/2})\).

Remark 4.2

The approximate equilibrium in Theorem 4.1 does not depend on the information structure of the game. Players may or may not have information about the other players’ states and the random times. For implementing the strategy (13), Player i only needs to observe her own state. Therefore, the tuple of strategies (13) is also an approximate Nash equilibrium for the game version where the players cannot observe each other.

Moreover, the tuple of strategies (13) is an approximate Nash equilibrium for the game where players are only allowed to choose feedback strategies depending on all players’ states as in the article [1]. In the case \(\tau _1,\ldots ,\tau _n\equiv T\), Theorem 4.1 covers the main result of the n-player game with finite deterministic time horizon presented in article [1]. We refer to Remark 2.9 above for more details.

The proof of Theorem 4.1 is similar to the proof of Theorem 2 in [1], where the case with a common deterministic time horizon is treated. Nevertheless, we present the proof of Theorem 4.1 in a concise manner. More details can be found in [19], Theorem 3.2.16.

Proof of Theorem 4.1

Note that \(\mathcal {Q}(\alpha ^*)\ne \emptyset \) (see Remark 2.7). Let \(i\in \{1,\ldots ,n\}\). We compare \(\alpha ^*= (\alpha ^{1,*}, \ldots , \alpha ^{n,*})\) with the tuple where Player i deviates from \(\alpha ^{i,*}\) by choosing a strategy \(\beta \). To this end, let \(\beta \in \mathcal {A}^{i}\) such that \(\mathcal {Q}(\alpha ^{-i,*},\beta )\ne \emptyset \), and let \(P\in \mathcal {Q}(\alpha ^*)\) and \(\tilde{P}\in \mathcal {Q}(\alpha ^{-i,*},\beta )\). In the proof, we use the following notation:

  • \((\alpha ^{-i,*},\beta ) = (\alpha ^{1,*}, \ldots , \alpha ^{i-1,*},\beta , \alpha ^{i+1,*}, \ldots , \alpha ^{n,*})\),

  • \(X^{-i}=(X^1,\ldots ,X^{i-1},X^{i+1},\ldots ,X^n)\),

  • \(\tau _{-i}=(\tau _1,\ldots ,\tau _{i-1},\tau _{i+1},\ldots ,\tau _n)\),

  • \(\mu :=P\circ \left( X_{\tau _1}^{1}\right) ^{-1}\) (i.e., the law of an OBM with threshold \(b^*\) at time \(\tau _1\)),

  • \(\mu ^{n}=\frac{1}{n} \sum _{j=1}^{n} \delta _{X_{\tau _j}^{j}}\) and \(\mu ^{n-1}=\frac{1}{n-1} \sum _{j\ne i} \delta _{X_{\tau _j}^{j}}\).

We aim at showing that

$$\begin{aligned} \tilde{P}\left( X^{i}_{\tau _i}> q(\mu ^{n}, 1-p)\right) - P\left( X^{i}_{\tau _i} > q(\mu ^{n},1- p)\right) \le \frac{C}{\sqrt{n}}, \end{aligned}$$
(14)

for some constant \(C>0\) independent of \(\tilde{P}\), P, and the control \(\beta \). Note that

$$\begin{aligned} P\circ \left( X^{-i},\tau _1,\ldots ,\tau _n \right) ^{-1}=\tilde{P}\circ \left( X^{-i},\tau _1,\ldots ,\tau _n \right) ^{-1}. \end{aligned}$$
(15)

Step 1. We first estimate \(P(X_{\tau _i}^{i} > q(\mu ^{n},1- p))\) from below. Notice that \(X^{i}_{\tau _i}\) and the empirical quantile \(q(\mu ^{n},1- p)\) are not independent. Therefore, we consider the empirical measure \(\mu ^{n-1}\) and define

$$\begin{aligned} A(n):= q\left( \mu ^{n-1}, \frac{n}{n-1}(1-p) \right) , \end{aligned}$$
(16)

i.e., A(n) is the empirical \(\frac{n}{n-1}(1-p)\)-quantile of the states \(X^{-i}\). Notice that A(n) is independent of \(X^{i}_{\tau _i}\) under P and

$$\begin{aligned} A(n) \ge q(\mu ^{n}, 1-p). \end{aligned}$$

Thus,

$$\begin{aligned} P\left( X^{i}_{\tau _i}> q(\mu ^{n},1- p)\right) \ge P\left( X^{i}_{\tau _i} > A(n)\right) = E\left[ 1- F^{b^*}_{\tau } (A(n)) \right] , \end{aligned}$$
(17)

where E denotes the expectation under P and \(F^{b^*}_{\tau }\) denotes the cumulative distribution function of the OBM with threshold \(b^*\) at a random time \(\tau \) (see Lemma 3.6).

Step 2. Next, we estimate the first term in (14) from above: we replace in (14) the quantile \(q(\mu ^{n},1- p)\) with a quantile that does not depend on \(\beta \). To this end, we observe that

$$\begin{aligned} D(n) := q\left( \mu ^{n-1}, 1- \frac{n}{n-1}p \right) \le q(\mu ^{n},1- p). \end{aligned}$$
(18)

From (18), we conclude that

$$\begin{aligned} \begin{aligned} \tilde{P}\left( X^{i}_{\tau _i}> q(\mu ^{n}, 1-p)\right)&\le \tilde{P}\left( X^{i}_{\tau _i}> D(n)\right) \\&= {\tilde{E}} \left[ \tilde{P} \left( X^{i}_{\tau _i} > D(n)\, |\, X^{-i},\tau _1,\ldots ,\tau _n \right) \right] , \end{aligned} \end{aligned}$$
(19)

where \(\tilde{E}\) denotes the expectation w.r.t. the probability measure \(\tilde{P}\). If Player i knew from the very beginning the value D(n), then \(m_{D(n)}\) would be the control maximizing the probability for Player i’s state to be greater than D(n) at time \(\tau _i\). This is a consequence of the control problem studied by McNamara [14].

To be more precise, note that \(\Omega \) is a complete, separable metric space and \(\mathcal {F}\) is its Borel \(\sigma \)-algebra. Thus, there exists a regular conditional probability \(Q:\Omega \times \mathcal {F}\rightarrow [0,1]\) for \(\mathcal {F}\) given \(\sigma \left( X^{-i},\tau _1,\ldots ,\tau _n\right) \). For \(\tilde{P}\)-almost every \(\omega \in \Omega \), we make the following observations: we can construct a Brownian motion under \(Q(\omega ,\cdot )\) such that

$$\begin{aligned} X_t^{i}&= \int _{0}^{t} \beta _s\, dW_s^i,\ t\ge 0,\ Q(\omega ,\cdot )\text {-a.s.} \end{aligned}$$

Moreover, we can work on a probability space \((\Omega ,{\hat{\mathcal {F}}},({\hat{\mathcal {F}}}_t)_{t\ge 0},Q(\omega ,\cdot ))\) satisfying the usual conditions because we can consider the completed version of \(\mathcal {F}\), denoted by \(\hat{\mathcal {F}}\), and the completed and right-continuous version of \((\mathcal {F}_t)_{t\ge 0}\), denoted by \(({\hat{\mathcal {F}}}_t)_{t\ge 0}\). Now, the results of McNamara (see [14], Remark 8) imply that on the probability space \((\Omega ,{\hat{\mathcal {F}}},({\hat{\mathcal {F}}}_t)_{t\ge 0},Q(\omega ,\cdot ))\) with Brownian motion W, time horizon \(T=\tau _i(\omega )\), and \(b=D(\omega ,n)\), we have

$$\begin{aligned}Q\left( \omega ,\left\{ X_{T}^{i}> b\right\} \right) \le 1- F_T^{b}(b),\end{aligned}$$

where \(F_T^b\) denotes the cumulative distribution function of an OBM with threshold b at time T (see Proposition 3.5 above). Hence,

$$\begin{aligned} \tilde{P}\left( X^{i}_{\tau _i} > D(n)\, |\, X^{-i},\tau _1,\ldots ,\tau _n \right) (\omega ) \le 1-F_T^{b}(b)|_{b=D(\omega ,n),T=\tau _i(\omega )}, \end{aligned}$$

for \(\tilde{P}\)-almost every \(\omega \in \Omega \). This implies, using the estimate (19), that

$$\begin{aligned}&\tilde{P}\left( X^{i}_{\tau _i} > q(\mu ^{n},1- p)\right) \le {E}\left[ 1-F_\tau ^{b}(b)|_{b=D(n)} \right] , \end{aligned}$$
(20)

where we use the identity (15), the fact that D(n) only depends on \((X^{-i},\tau _{-i})\), the independence of \(\tau _1,\ldots ,\tau _n\) and \(X^{-i}\), and the connection between \(F_\tau ^{b}\) and \(F_t^{b}\) (see Lemma 3.6).

Step 3. We can now combine the estimates in (17) and (20) above leading to

$$\begin{aligned}&\tilde{P}\left( X^{i}_{\tau _i}> q(\mu ^{n}, 1-p) \right) - P\left( X^{i}_{\tau _i} > q(\mu ^{n},1- p)\right) \nonumber \\&\le E[ F_\tau ^{b^*}(A(n))] - E[F_\tau ^{D(n)} (D(n)) ] \nonumber \\&\le E \left| F_\tau ^{b^*}(A(n))-F_\tau ^{b^*}\left( q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right) \right| \end{aligned}$$
(21)
$$\begin{aligned}&\qquad + \left| F_\tau ^{b^*}\left( q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right) - F_\tau ^{b^*}\left( q\left( \mu , 1-\tfrac{n}{n-1}p \right) \right) \right| \end{aligned}$$
(22)
$$\begin{aligned}&\qquad \qquad + E\left| F^{b^*}_\tau \left( q\left( \mu , 1-\tfrac{n}{n-1}p \right) \right) -F^{b^*}_\tau (D(n)) \right| \end{aligned}$$
(23)
$$\begin{aligned}&\qquad \qquad \qquad + E\left| F^{b^*}_\tau \left( D(n)\right) -F_\tau ^{D(n)}(D(n)) \right| . \end{aligned}$$
(24)

We estimate the four terms on the right-hand side separately.

  1. (i)

    The term (21) can be estimated, using the Lipschitz continuity of \(F^{b^*}_\tau \) (see [19], Proposition B.10):

    $$\begin{aligned}&E \left| F_\tau ^{b^*}(A(n))-F_\tau ^{b^*}\left( q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right) \right| \nonumber \\&\le E \left[ C_1 \left| A(n)-q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right| \wedge 2 \right] \nonumber \\&= \int _{0}^{\infty }\! P\left( C_1\left| A(n) - q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right| \wedge 2> \varepsilon \right) \, d\varepsilon \nonumber \\&= C_1\int _{0}^{\frac{2}{C_1}}\! P\left( \left| A(n) - q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right| > \varepsilon \right) \, d\varepsilon , \end{aligned}$$
    (25)

    for \(C_1:=\frac{2}{\sigma _1\sqrt{2\pi }} E [\tau _1^{-1/{2}}]\). The \((n-1)\)-states \(X^{j}_{\tau _j}\), \(j\ne i\), are independent and identically distributed under P. Using Hoeffding’s inequality, one can deduce that

    $$\begin{aligned} P\left( \left| A(n) - q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right| > \varepsilon \right) \le 2e^{-2(n-1) C_2^2\varepsilon ^2},\ 0<\varepsilon < 2, \end{aligned}$$

    where the constant \(C_2>0\) only depends on \(C_1\). We find with (25) that

    $$\begin{aligned} \begin{aligned}&E \left| F^{b^*}_\tau (A(n))-F^{b^*}_\tau \left( q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right) \right| \\&\le 2 C_1\int _{0}^{\frac{2}{C_1}}\! e^{-2(n-1) C_2^2\varepsilon ^2}\, d\varepsilon \le \frac{C_1}{C_2 \sqrt{n-1}}\int _{0}^{\infty }\! e^{-\frac{x^2}{2}}\, dx \le \frac{\sqrt{2\pi } C_1 }{C_2} \frac{1}{\sqrt{n-1}}. \end{aligned} \end{aligned}$$
    (26)
  2. (ii)

    The term (22) can be rewritten as follows:

    $$\begin{aligned} \begin{aligned}&\left| F^{b^*}_\tau \left( q\left( \mu , \tfrac{n}{n-1}(1-p) \right) \right) - F^{b^*}_\tau \left( q\left( \mu , 1-\tfrac{n}{n-1}p \right) \right) \right| \\&= \left| \tfrac{n}{n-1}(1-p) - \left( 1-\tfrac{n}{n-1}p \right) \right| = \frac{1}{n-1}, \end{aligned} \end{aligned}$$
    (27)

    because \(F^{b^*}_\tau (x)=\mu ((-\infty ,x])\), \(x\in \mathbb {R}\).

  3. (iii)

    The term (23) can be estimated as the term (21), and thus,

    $$\begin{aligned} E\left| F^{b^*}_\tau \left( q\left( \mu , 1-\tfrac{n}{n-1}p \right) \right) -F^{b^*}_\tau (D(n)) \right| \le \frac{\sqrt{2\pi } C_1 }{C_2} \frac{1}{\sqrt{n-1}}. \end{aligned}$$
    (28)
  4. (iv)

    For the term (24), we observe, using the Lipschitz continuity of \(F_\tau ^b\) in the threshold b (see [19], Proposition B.12),

    $$\begin{aligned}&\left| F_\tau ^{b^*}\left( D(n) \right) -F_\tau ^{D(n)}(D(n))\right| \nonumber \\&\le \left( C_1|b^*-D(n)|\right) \wedge 2 \nonumber \\&\le \left( C_1 \left| b^*-q\left( \mu ,1-\tfrac{n}{n-1}p\right) \right| \right) \wedge 2+\left( C_1\left| q\left( \mu ,1-\tfrac{n}{n-1}p\right) -D(n)\right| \right) \wedge 2, \end{aligned}$$
    (29)

    with \(C_1\) defined as above in (i). The expected value of the second term can be estimated as the term (21). For the first term, we observe that there exists a constant \(C_3>0\) such that

    $$\begin{aligned} \left| b^*-q\left( \mu ,1-\tfrac{n}{n-1}p\right) \right|&= \left| (F_{\tau }^{b^*})^{-1}(1-p) -(F_{\tau }^{b^*})^{-1}\left( 1-\tfrac{n}{n-1}p\right) \right| \\&\le C_3\left| 1-p-\left( 1-\tfrac{n}{n-1}p\right) \right| = \frac{C_3 p}{n-1}. \end{aligned}$$

    We conclude that

    $$\begin{aligned} E \left| F^{b^*}_\tau (D(n))-F^{D(n)}_\tau \left( D(n)\right) \right| \le \left( \frac{\sqrt{2\pi }C_1}{C_2} + C_3 p \right) \frac{1}{\sqrt{n-1}}. \end{aligned}$$
    (30)

Finally, Eqs. (26), (27), (28), and (30) imply that

$$\begin{aligned}\tilde{P}\left( X^{i}_{\tau _i}> q(\mu ^{n},1- p) \right) - P\left( X^{i}_{\tau _i} > q(\mu ^{n}, 1-p)\right) \le \frac{C}{\sqrt{n-1}} \in \mathcal {O}\left( \frac{1}{\sqrt{n}} \right) ,\end{aligned}$$

for an appropriate constant \(C>0\) independent of \(\beta \), P, and \(\tilde{P}\). Thus, the feedback strategy \(\alpha ^*\) is an \(\mathcal {O}(n^{-1/2} )\)-Nash equilibrium. \(\square \)
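
To illustrate Theorem 4.1 numerically, the following Monte Carlo sketch (all parameters and the exponential time distribution are assumptions chosen for illustration) simulates the n-player game in which every player uses the threshold control \(m_{b^*}\) and, in a second run, lets one player deviate to the constant control \(\sigma _2\); in line with the theorem, the deviation should not increase that player’s winning frequency by more than a term of order \(n^{-1/2}\), up to Monte Carlo error.

```python
import numpy as np

# Monte Carlo sketch of the n-player game (illustration only; all parameters and
# the exponential time distribution are assumptions).  Every player uses the
# threshold control m_{b*}; in a second experiment player 0 deviates to the
# constant control sigma_2.
rng = np.random.default_rng(2)
s1, s2, p, lam = 1.0, 2.0, 0.3, 0.5
b_star = 1.597                      # threshold from Example 3.8 for these parameters
n_players, n_runs, n_steps = 500, 200, 300

def win_freq(deviate):
    wins = 0
    for _ in range(n_runs):
        taus = rng.exponential(1.0 / lam, n_players)
        dts = taus / n_steps                          # individual Euler step sizes
        x = np.zeros(n_players)
        for _ in range(n_steps):
            vol = np.where(x <= b_star, s2, s1)       # threshold control m_{b*}
            if deviate:
                vol[0] = s2                           # player 0 always plays risky
            x += vol * np.sqrt(dts) * rng.standard_normal(n_players)
        q = np.quantile(x, 1 - p)                     # proxy for q(mu^n, 1 - p)
        wins += x[0] > q
    return wins / n_runs

print("equilibrium:", win_freq(False), " deviation:", win_freq(True))
```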

Remark 4.3

The assumption that the random times \(\tau _1,\ldots ,\tau _n\) in the n-player game are independent is crucial. Theorem 4.1 does not extend to the case where the random times are non-constant and identical, because then the corresponding mean field game consists of

  1. 1.

    solving the optimization problem

    $$\begin{aligned}\sup _{\alpha } \int _{0}^{\infty } P(X_t^{\alpha }>q(\mu _t,1-p))\, P_\tau (dt), \end{aligned}$$

    and finding an optimal control \(\alpha ^*(\mu )\) for a given measure flow \(\mu =(\mu _t)_{t\ge 0}\),

  2. 2.

    finding a fixed point of the map \(\mu \mapsto \text {Law}\left( X^{\alpha ^*(\mu )}\right) \).

This mean field game is different from the one defined in Sect. 3. The control problem depends on flows of probability measures instead of a single probability measure on \(\mathbb {R}\). Therefore, the fixed point arguments used in the proof of Theorem 3.7 are not applicable.

5 The more time, the higher the winning probability

In this section we analyze how the winning probability of a player depends on the actual time horizon. We consider only the case where the competition is sufficiently fierce, i.e.

$$\begin{aligned} p<\frac{\sigma _2}{\sigma _1+\sigma _2}. \end{aligned}$$
(31)

Notice that (31) implies that the threshold level \(b^*\) of Theorem 3.7 is positive. We denote the mean field equilibrium distribution by \(\mu ^*\). Recall that \(\mu ^*\) is the law of the OBM with threshold \(b^*\) at an independent random time with distribution \(\mathcal {T}\).

Now suppose that n is large and that in the game with n players everyone controls their state with the threshold control \(m_{b^*}\). We select one player and assume that her realized time horizon is t. Then the probability for this particular player to be among the best fraction p at the end of the game is approximately given by

$$\begin{aligned} w(t):= P\left( X_t^{b^*}>q\big (\mu ^*,1-p\big )\right)&= 1-F^{b^*}_t(b^*) = \frac{2\sigma _2}{\sigma _1+\sigma _2} \left( 1-\Phi \left( \frac{ b^*}{\sigma _2 \sqrt{t}} \right) \right) . \end{aligned}$$

We refer to the function w as the winning probability. Note that the winning probability is continuous and increasing in t. Moreover, we have

$$\begin{aligned} \lim _{t\downarrow 0} w(t) = 0. \end{aligned}$$

Thus, if the actual time horizon t is small, then the winning probability is close to zero. This is plausible, since the state process starts at zero and is stopped early, and hence attains the positive level \(b^*\) only with small probability.

Next observe that

$$\begin{aligned} \lim _{t\rightarrow \infty } w(t) = \frac{\sigma _2}{\sigma _1+\sigma _2}. \end{aligned}$$

Thus, the winning probability is bounded by \(\frac{\sigma _2}{\sigma _1+\sigma _2}\), and the bound is almost attained for large time horizons t. The bound corresponds to the expected long-run fraction of time that an OBM spends above the threshold. Indeed, irrespective of the threshold b, one can show that

$$\begin{aligned} \lim _{t \rightarrow \infty } P(X^b_t \ge b) = \frac{\sigma _2}{\sigma _1+\sigma _2}. \end{aligned}$$
(32)

The bound also allows for a control-theoretic interpretation: to this end, consider the ergodic control problem with target functional

$$\begin{aligned} J(\alpha ) := \liminf _{T \rightarrow \infty } \frac{1}{T} E \int _0^T 1_{\{X^\alpha _t \ge b\}} dt, \qquad \alpha \in {{\tilde{\mathcal {A}}}}, \end{aligned}$$

where \({{\tilde{\mathcal {A}}}}\) is defined as in Sect. 3 and \(X^\alpha \) is defined as in (6). One can show that \(m_b\) is an optimal control (see Remark 8 in [14]) and hence, using (32), \(\sup _{\alpha \in {{\tilde{\mathcal {A}}}}} J(\alpha ) = \frac{\sigma _2}{\sigma _1+\sigma _2}\).
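
For illustration, the following sketch (with assumed parameter values) evaluates the winning probability w(t) for several realized time horizons and compares it with the long-run bound \(\frac{\sigma _2}{\sigma _1+\sigma _2}\).

```python
from math import erf, sqrt

# Sketch (assumed parameter values): evaluate the winning probability
# w(t) = 2*s2/(s1+s2) * (1 - Phi(b*/(s2*sqrt(t)))) and its long-run bound.
s1, s2 = 1.0, 2.0
b_star = 1.597                      # threshold from Example 3.8 with lambda = 0.5, p = 0.3

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def w(t):
    return 2 * s2 / (s1 + s2) * (1.0 - Phi(b_star / (s2 * sqrt(t))))

for t in (0.1, 1.0, 10.0, 100.0):
    print(t, round(w(t), 4))
print("long-run bound:", s2 / (s1 + s2))
```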

6 An application to online competitions

Our game formulation is generic and can correspond to a variety of practical situations. For example, the game applies to managers of mutual funds striving for their funds to be among the best performing. This application is described, for homogeneous managers, in detail in Section 7 of [1].

We here provide an alternative application to online competitions, in which teams usually have to collaborate in order to solve a problem and provide a solution within a limited time frame. Hackathons are examples of such competitions, as are the data science and machine learning competitions offered on platforms such as Kaggle, DrivenData, AICrowd, etc. During these competitions, a score, based on a given evaluation metric, can be calculated by each participant, and a "public leaderboard" displays the relative ranks during the whole length of the competition.

In this context, the private state \(X^i_t\) of player i is interpreted as her score displayed at time t. The number of teams involved in online competitions can reach several tens of thousands, enough to consider the mean field approximation.

Level of risk.

We interpret the diffusion control as the possibility to control the level of risk taken. In the online contest setting, teams can indeed choose to use well-established methods, whose robustness has already been studied and for which errors can be corrected more easily and quickly. We interpret this as low risk and diffusion coefficient \(\sigma _1\). On the other hand, the choice of diffusion coefficient \(\sigma _2\) is interpreted as trying new techniques, for which there is little or no experience. In this context, our results rigorously show that if things are going well, players play safe, whereas a team that is performing poorly on the evaluation metric has an incentive to play risky and try less common strategies.

Observability.

Participants in online competitions can submit a solution, in which case their current score is calculated and displayed to all participants. However, if a team obtains a solution and tests it offline on the provided data set, then the associated score is not visible as long as the solution is not submitted on the platform. A given team can choose to reveal its solution and score only towards the end of the submission period.

Moreover, it is possible to design online competitions with only partial observability: one could easily imagine that teams only observe the best score, or a given quantile of the scores distribution, to assess their relative performance.

Our results show that if the number of players is large, observability does not matter, at least for the particular type of discontinuous criteria that we consider. Such criteria are common in these competitions, where a fixed cash prize is offered to the best performing team. This result also holds for continuous functions of the rank satisfying the symmetry condition given in Assumption 7.1.

Terminal time.

We consider two cases. First, the case where the time horizons of all players are constant and equal to \(T \in (0, \infty )\), interpreted as the date at which a final assessment of the evaluation metric is made by the contest organizers. Second, the case where the time horizon \(\tau _i\) of team i quantifies the resources it can put into the competition, e.g., the number of working hours. For example, \(\tau _i\) can be set proportional to the product of the deterministic assessment date and the number of team members. Section 5 reveals that larger teams have an advantage compared to smaller teams.

7 Extensions

In this section, we discuss more general reward criteria for the n-player game. In particular, we consider rewards at the random time horizons that are given by measurable functions \(g:\mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}\) of the state and the population quantile, instead of the “all-or-nothing” payoff given by the function \((x,q)\mapsto \mathbbm {1}_{(q,\infty )}(x)\) before. First, we show that the mean field equilibrium control of Sect. 3 is also an equilibrium for the reward functions g, if g satisfies a symmetry and convexity condition. Then, we prove that this equilibrium provides an approximate Nash equilibrium of the n-player game.

7.1 Mean field game

Assume that we are in the setting of Sect. 3. The whole analysis in Sect. 3 depends on the optimality of the threshold control \(m_b\) for the particular choice of the reward \(\mathbbm {1}_{(b,\infty )}\) (Lemma 3.3). Results of McNamara [14] imply that \(m_b\) is not only optimal for this reward but also for more general reward functions that are continuous, have exponential growth, and satisfy a convexity condition and a symmetry condition. In more detail, we can generalize Theorem 3.7 to measurable functions \(g:\mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}\) satisfying:

Assumption 7.1

  1. (i)

    \(g(\cdot ,q)\) is continuous and has exponential growth for any \(q\in \mathbb {R}\),

  2. (ii)

    \(g(\cdot ,q)\) is convex on \((-\infty ,q]\) and concave on \([q,\infty )\) for any \(q\in \mathbb {R}\),

  3. (iii)

    for all \(x\ge 0\) and \(q\in \mathbb {R}\) it holds

    $$\begin{aligned} \sigma _2 g(\sigma _1 x+q,q)+\sigma _1 g(-\sigma _2 x +q,q) = (\sigma _1+\sigma _2) g(q,q). \end{aligned}$$

Proposition 7.2

Let \(b^*\) be given by Theorem 3.7 and g satisfy Assumption 7.1. Then, \((\text {Law}(X_\tau ^{b^*}), m_{b^*})\) is also a mean field equilibrium for the reward function g, i.e.,

$$\begin{aligned}E\left[ g\big ( X^{b^*}_\tau , q\big (X^{b^*}_\tau ,1-p\big )\big ) \right] = \sup _{\alpha \in {\tilde{\mathcal {A}}}}E\left[ g\big ( X^{\alpha }_\tau , q\big (X^{b^*}_\tau ,1-p\big )\big ) \right] .\end{aligned}$$

Proof

As in Lemma 3.3, one can show for fixed \(b\in \mathbb {R}\) that

$$\begin{aligned}E\left[ g\big ( X^{b}_\tau , b\big ) \right] =\sup _{\alpha \in {\tilde{\mathcal {A}}}}E\left[ g\left( X^{\alpha }_\tau , b\right) \right] ,\end{aligned}$$

using either [14], Theorem 6, or [19], Theorem C.5. The assumptions on g guarantee that these theorems apply. Moreover, Theorem 3.7 implies the existence of a unique fixed point \(b^*\) of the map \(b\mapsto q\left( X_{\tau }^{b},1-p \right) \). For this fixed point \(b^*\), we see that

$$\begin{aligned}E\left[ g\big ( X^{b^*}_\tau , q\big (X^{b^*}_\tau ,1-p\big )\big ) \right] = \sup _{\alpha \in {\tilde{\mathcal {A}}}}E\left[ g\big ( X^{\alpha }_\tau , q\big (X^{b^*}_\tau ,1-p\big )\big ) \right] ,\end{aligned}$$

i.e., \((\text {Law}(X_\tau ^{b^*}), m_{b^*})\) is an equilibrium for the reward function g. \(\square \)

7.2 Approximate Nash equilibrium in the n-player game

Now, in the setting of Sect. 2, we show that the n-tuple with each entry equal to the mean field equilibrium strategy provides an approximate Nash equilibrium of the n-player game with reward g.

Definition 7.3

Let \(\varepsilon > 0\). A tuple \(\alpha = (\alpha _1, \ldots ,\alpha _n) \in \mathcal {A}_n\) with \(\mathcal {Q}(\alpha )\ne \emptyset \) is called \(\varepsilon \)-Nash equilibrium of the n-player game if for all \(i \in \{1, \ldots , n\}\) and \(P^{\alpha }\in \mathcal {Q}(\alpha )\)

$$\begin{aligned} E^{P^\alpha } \left[ g\left( X^{i}_{\tau _i},q(\mu ^n,1-p)\right) \right] +\varepsilon \ge \sup _{\beta \in \mathcal {A}^{i}} \sup _{{P}\in \mathcal {Q}(\alpha _{-i},\beta )} E^{ P} \left[ g\left( X^{i}_{\tau _i},q(\mu ^n,1-p)\right) \right] , \end{aligned}$$

where \((\alpha _{-i},\beta ) = (\alpha _1, \ldots , \alpha _{i-1}, \beta , \alpha _{i+1}, \ldots , \alpha _n)\) and \(\sup \emptyset = -\infty \).

With some additional assumptions on the terminal reward g, we can show:

Proposition 7.4

Let g satisfy Assumption 7.1. In addition, assume that g is uniformly bounded, continuous, and \(g(x,\cdot )\) is monotonically decreasing for all \(x\in \mathbb {R}\). Let \(\alpha ^*\in \mathcal {A}_n\) be defined as in Theorem 4.1. Then, there exists a sequence \(\varepsilon _n\ge 0\) with \(\lim _{n\rightarrow \infty }\varepsilon _n=0\) such that the control tuple \(\alpha ^*= (\alpha ^{1,*}, \ldots , \alpha ^{n,*})\) is an \(\varepsilon _n\)-Nash equilibrium of the n-player game.

Proof

All details of the proof can be found in Proposition 3.2.25 in [19]. Let \(i\in \{1,\ldots ,n\}\). Moreover, let \(\beta \in \mathcal {A}^{i}\) such that \(\mathcal {Q}(\alpha ^{-i,*},\beta )\ne \emptyset \) and choose \(P\in \mathcal {Q}(\alpha ^*)\), \(\tilde{P}\in \mathcal {Q}(\alpha ^{-i,*},\beta )\). Using the empirical quantiles A(n) and D(n) defined in (16) and (18), respectively, we find that

$$\begin{aligned} \tilde{E} \left[ g(X_{\tau _i}^{i},q(\mu ^n,1-p)) \right]&\le \tilde{E} \left[ g(X_{\tau _i}^{i},D(n)) \right] ,\\ {E} \left[ g(X_{\tau _i}^{i},q(\mu ^n,1-p)) \right]&\ge {E} \left[ g(X_{\tau _i}^{i},A(n)) \right] , \end{aligned}$$

because of the monotonicity of \(g(x,\cdot )\). We use the notation E and \(\tilde{E}\) for the expectation w.r.t. P and \(\tilde{P}\), respectively. Define the functions

$$\begin{aligned}G(q):= \int _{0}^{\infty } \int _{-\infty }^{\infty } g(x,q)p(t,-q,x-q)\, dx\, P_{\tau }(dt)\quad \text {and}\quad {\bar{G}}(q):= E\left[ g\big (X_{\tau _1}^{1},q\big )\right] ,\ q\in \mathbb {R},\end{aligned}$$

where p denotes the probability density function of the OBM defined in Proposition 3.5; thus, G(q) is the optimal expected reward given the quantile value q, whereas \({\bar{G}}(q)\) is the expected reward of the threshold control \(m_{b^*}\) given q. Note that

$$\begin{aligned}{E} \left[ g(X_{\tau _i}^{i},A(n)) \right] = {E} \left[ {\bar{G}}(A(n)) \right] ,\end{aligned}$$

because \(X_{\tau _1}^{1},\ldots ,X_{\tau _n}^{n}\) are independent under P. Moreover, one can show that

$$\begin{aligned}\tilde{E} \left[ g(X_{\tau _i}^{i},D(n)) \right] \le \tilde{E} \left[ G(D(n)) \right] = {E} \left[ G(D(n)) \right] ,\end{aligned}$$

similarly to Step 2 in the proof of Theorem 3.2.16 in [19]. The upper bound follows from Theorem 6 in [14]: conditioned on D(n), the control maximizing the left-hand side is the threshold control with threshold D(n). We conclude that

$$\begin{aligned}&\tilde{E} \left[ g(X_{\tau _i}^{i},q(\mu ^n,1-p)) \right] - {E} \left[ g(X_{\tau _i}^{i},q(\mu ^n,1-p)) \right] \le {E} \left[ G(D(n)) \right] - {E} \left[ {\bar{G}}(A(n)) \right] . \end{aligned}$$

Note that G and \({\bar{G}}\) are bounded and continuous, that \(G(b^*)={\bar{G}}(b^*)\) by Proposition 7.2, and that the right-hand side only depends on the distribution of \(n-1\) independent OBMs with threshold \(b^*\). Moreover, A(n) and D(n) converge to \(b^*\) in probability (see Lemma 3.2.20 in [19]) and hence, also in distribution. This means

$$\begin{aligned}\lim _{n\rightarrow \infty } \left| {E} \left[ G(D(n)) \right] - {E} \left[ {\bar{G}}(A(n)) \right] \right| =0.\end{aligned}$$

Therefore, we can find a sequence \((\varepsilon _n)_{n\in \mathbb {N}}\) with the desired properties. The sequence \((\varepsilon _n)_{n\in \mathbb {N}}\) is independent of \(i\in \{1,\ldots ,n\}\), \(\beta \in \mathcal {A}\), \(P\in \mathcal {Q}(\alpha ^*)\), and \(\tilde{P}\in \mathcal {Q}(\alpha ^{-i,*},\beta )\) because the distributions of A(n) under P and of D(n) under \(\tilde{P}\) are unique. \(\square \)