A fixed-point policy-iteration-type algorithm for symmetric nonzero-sum stochastic impulse games

Nonzero-sum stochastic differential games with impulse controls offer a realistic and far-reaching modelling framework for applications within finance, energy markets and other areas, but the difficulty in solving such problems has hindered their proliferation. Semi-analytical approaches make strong assumptions pertaining very particular cases. To the author's best knowledge, the only numerical method in the literature is the heuristic one we put forward to solve an underlying system of quasi-variational inequalities. Focusing on symmetric games, this paper presents a simpler and more efficient fixed-point policy-iteration-type algorithm which removes the strong dependence on the initial guess and the relaxation scheme of the previous method. A rigorous convergence analysis is undertaken with natural assumptions on the players strategies, which admit graph-theoretic interpretations in the context of weakly chained diagonally dominant matrices. A provably convergent single-player impulse control solver, often outperforming classical policy iteration, is also provided. The main algorithm is used to compute with high precision equilibrium payoffs and Nash equilibria of otherwise too challenging problems, and even some for which results go beyond the scope of all the currently available theory.


Introduction
Stochastic differential games model the interaction between players whose objective functions depend on the evolution of a certain continuous-time stochastic process. The subclass of impulse games focuses on the case where the players only act at discrete points in time by shifting the process. In doing so, each of them incurs into costs and possibly generates 'gains' for the others at the same time. They constitute a generalization of the well-known (single-player) optimal impulse control problems [ØS09, Chpt.7-10], which have found a wide range of applications in finance, energy markets and insurance [Bas19,CCTZ06,EH88,Kor99], among plenty of other fields.
From a deterministic numerical viewpoint, an impulse control problem entails the resolution of a differential quasi-variational inequality (QVI) to compute the value function and, when possible, retrieve an optimal strategy. Policy-iteration-type algorithms [AF16,CØS02,CMS07] undoubtedly occupy an ubiquitous place in this respect, and more so in the infinite horizon case.
The presence of a second player makes matters much more challenging, as one needs to find two optimal (or equilibrium) payoffs dependent on one another, and the optimal strategies take the form of Nash equilibria (NEs). And while impulse controls give a more realistic setting than 'continuous' controls in applications as the aforementioned ones, they normally lead to less tractable and very technical models.
It is not surprising then, that the literature in impulse games is limited and mainly focused on the zero-sum case [Cos13,EAM18,Azi19b]. The more general and flexible nonzero-sum instance has only recently began to receive more attention. The authors of [ABC + 19] consider for the first time a general two-player game where both participants act through impulse controls, 1 and characterize certain type of equilibrium payoffs and NEs via a system of QVIs by means of a Verification Theorem. Using this result, they manage to provide the first example of an (almost) fully analytically solvable game, motivated by central banks competition over the exchange rate. The result is generalized to arbitrary N players in [BCG19], which also gives a semi-analytical solution (i.e., depending on several parameters found numerically) to a concrete cash management problem. 2 A different, more probabilistic, approach is taken in [FK19] to find a semi-analytical solution to a strategic pollution control problem and to prove another Verification Theorem.
The previous examples, and the lack of others, 3 give testimony of how difficult it is to explicitly solve nonzero-sum impulse games. The analytical approaches require an educated guess to start with and (with the exception of the henceforth referred to as the linear game in [ABC + 19]) several parameters need to be solved for in general from highly-nonlinear systems of equations coupled with order conditions. All of this can be very difficult, if not prohibitive, when the structure of the game is not simple enough. Further, all of them (as well as the majority of the concrete examples in the impulse control literature) assume linear costs. In general, for nonlinear costs, the state to which each player wants to shift the process when intervening is not unique, but depends on the starting point. This effectively means that infinite parameters may need to be solved for, drastically discouraging this methodology.
While the need for numerical schemes able to handle nonzero-sum impulse games is obvious, unlike in the single-player case, this is an utterly underdeveloped line of research. Focusing on the purely deterministic approach, solving the system of QVIs derived in [ABC + 19] involves handling coupled free boundary problems, further complicated by the presence of nonlinear, nonlocal and noncontractive operators. Additionally, solutions will typically be irregular even in the simplest cases such as the linear game. Moreover, the absence of a viscosity solutions framework such as that of impulse control [Sey09] means that it is not possible to know whether the system of QVIs should have a solution or not (not to mention some form of uniqueness) unless one can explicitly solve it. This is further exacerbated by the fact that even defining such system requires a priori assumptions on the solution (the unique impulse property). This is also the case in [FK19].
To the author's best knowledge, the only numerical method available in the literature is our algorithm in [ABM + 19], which tackles the system of QVIs by sequentially solving single-player impulse control problems combined with a relaxation scheme. Unfortunately, the choice of the latter scheme is not obvious in general and it was verified that the convergence of the algorithm was reliant on a good choice of the initial guess. Additionally, while a numerical validation was performed, no rigorous convergence analysis was provided at the time.
Restricting attention to the one-dimensional infinite horizon two-player case, this paper puts the focus on certain nonzero-sum impulse games which display a symmetric structure between the players. This class is broad enough to include many interesting applications; no less than the competing central banks problem (whether in its linear form [ABM + 19] or others considered in the single Bank formulation [AF16,CZ99,JP93,MØ98]), the cash management problem [BCG19] (reducing its dimension by a simple change of variables) and the generalization of many impulse control problems to the two-player case.
For this class of games, an iterative algorithm is presented which substantially improves [ABM + 19, Alg.2] by harnessing the symmetry of the problem, removing the strong dependence on the initial guess and dispensing with the relaxation scheme altogether. The result is a simpler and more intuitive and efficient routine, for which a convergence analysis is provided. It is shown that the overall routine admits a representation that strongly resembles, both algorithmically and in its properties, that of the combined fixed-point policy-iteration methods [HFL12,Cli07], albeit with nonexpansive operators. Still, a certain contraction property can still be established.
To perform the analysis, we impose assumptions on the discretization scheme used on the system of QVIs and the discrete admissible strategies. These naturally generalize those of the impulse control case [AF16] and admit graph-theoretic interpretations in terms of weakly chained diagonally dominant (WCDD) matrices and their matrix sequences counterpart [Azi19a]. We establish a clear parallel between these discrete type assumptions, the behaviour of the players and the mimicking of the Verification Theorem.
Section 1 deals with the analytical problem. Starting with an overview of the model (Section 1.1), we recall the Verification Theorem of [ABC + 19] and the system of QVIs we want to solve (Section 1.2). We then give a precise definition of the class of symmetric nonzero-sum impulse games and establish some preliminary results (Section 1.3).
Section 2 moves on the analogue discrete problem. Section 2.1 specifies a general and abstract discrete version of the system of QVIs, such that any discretization scheme compliant with the assumptions to be imposed will be subject to the same results. Section 2.2 presents the iterative algorithm subject of this paper, and shows how the impulse control problems that need to be sequentially solved have a unique solution and can be handled by policy iteration. However, the latter has costly efficiency drawbacks, which is why Section 2.3 provides a general solver for impulse control problems which outperforms classical policy iteration in many practical situations. It consists of an instance of fixed-point policy-iteration that is noncompliant with the standard assumptions [HFL12]. We prove its convergence under our framework.
Section 2.4 characterizes the overall iterative algorithm as a fixed-point policy-iterationtype method, allowing for reformulations of the original problem and results pertaining the solutions. (The necessary matrix and graph-theoretic definitions and results needed are collected in Appendix A for the reader's convenience.) Section 2.5 carries on with the overall convergence analysis, and shows to which extent different sets of reasonable assumptions are enough to guarantee convergence to solutions, convergence of strategies and boundedness of iterates. A result of theoretical interest, giving sufficient conditions for convergence, is provided. Discretization schemes within the standing framework are provided in Section 2.6.
Section 3 presents all the numerical results. In Section 3.1, a variety of symmetric nonzerosum impulse games, many seemingly too complicated to be handled analytically, are explicitly solved for equilibrium payoffs and NEs strategies with great precision. This is done on a fixed grid, while considering different performance metrics and addressing practical matters of implementation. In the absence of a viscosity solutions framework to establish convergence to analytical solutions as the grid is refined, Section 3.2 performs a numerical validation using the only examples of symmetric solvable games in the literature. Section 3.3 addresses the case of games without NEs. Section 3.4 tackles games such that the results go beyond the scope of all the currently available theory, displaying discontinuous impulses and very irregular payoffs. The latter give insight and motivate further research into this field. Lastly, Section 4 concludes.

Analytical continuous-space problem
In this section we start by reviewing a general formulation of two-player nonzero-sum stochastic differential games with impulse controls, as considered in [ABC + 19], together with the main theoretic result of the authors: a characterization of certain NEs via a deterministic system of QVIs. The indexes of the players are denoted i, j ∈ {1, 2} with i = j. Since no other type of games is considered in this paper, we will often speak simply of 'games' for brevity. Afterwards we shall specialize the discussion in the (yet to be specified) symmetric instance.
Throughout the paper, we restrict our attention to the one-dimensional infinite-horizon case. (A similar review is carried out in [ABM + 19, Sect.1].) Some of the most technical details, concerning the well-posedness of the model, are left out for the sake of briefness and can be found in [ABC + 19, Sect.2].
1.1 General two-player nonzero-sum impulse games Let (Ω, F, (F t ) t≥0 , P) be a filtered probability space under the usual conditions supporting a standard one-dimensional Wiener process W . We consider two players that observe the evolution of a state variable X, modifying it when convenient through controls of the form The stopping times (τ k i ) are their intervention times and the F τ k i -measurable random variables (δ k i ) are their intervention impulses. Given controls (u 1 , u 2 ) and a starting point X 0 − = x ∈ R, we assume X = X x;u 1 ,u 2 has dynamics for some given drift and volatility functions µ, σ : R → R, locally Lipschitz with linear growth. 4 Equation (1.1.1) states that X evolves as an Itô diffusion in between the intervention times, and that each intervention consists in shifting X by applying an impulse. It is assumed that the players choose their controls by means of threshold-type strategies of the form ϕ i = (I i , δ i ), where I i ⊆ R is a closed set called intervention (or action) region and δ i : R → R is an impulse function assumed to be continuous. The complement C i = I c i is called continuation (or waiting) region. 5 That is, player i intervenes if and only if the state variable reaches her intervention region, by applying an impulse δ i (X t − ) (or equivalently, shifting X t − to X t − +δ i (X t − )). Further, we impose a priori constraints on the impulses: for each x ∈ R there exists a set ∅ = Z i (x) ⊆ R (further specified in Section 1.3) such that δ i (x) ∈ Z i (x) if x ∈ I i . 6 We also assume the game has no end and player 1 has the priority should they both want to intervene at the same time.
(The latter will be excluded later on; see Definition 1.3.1 and the remarks that follow it.) Given a starting point and a pair strategies, the (expected) payoff of player i is given by with X = X x;u 1 ,u 2 = X x;ϕ 1 ,ϕ 2 . For player i, ρ i > 0 represents her (subjective) discount rate, f i : R → R her running payoff, c i : R 2 → (0, +∞) her cost of intervention and g i : R 2 → R her gain due to her opponent's intervention (not necessarily non-negative). The functions f i , c i , g i are assumed to be continuous. Throughout the paper, only admissible strategies are considered. Briefly, (ϕ 1 , ϕ 2 ) is admissible if it gives well-defined payoffs for all x ∈ R, X ∞ has finite moments and, although each player can intervene immediately after the other, infinite simultaneous interventions are precluded. 7 As an example, if the running payoffs, costs and gains have polynomial growth, the 'never intervene strategies' ϕ 1 = ϕ 2 = (∅, ∅ → R) are admissible and the game can be played. Given a game, we want to know whether it admits some Nash equilibrium and how to compute it. Recall that a pair of strategies (ϕ * 1 , ϕ * 2 ) is a Nash equilibrium (NE) if for every admissible (ϕ 1 , ϕ 2 ), i.e., no player can gain from a unilateral change of strategy. If one such NE exists, we refer to , as a pair of equilibrium payoffs.

General system of quasi-variational inequalities
To present the system of QVIs derived in [ABC + 19], we need to define first the intervention operators. For any V 1 , V 2 : R → R and x ∈ R, the loss operator of player i is defined as When applied to an equilibrium payoff, the loss operator M i gives a recomputed present value for player i due to the cost of her own intervention. Given the optimality of the NEs, one would intuitively expect that M i V i ≤ V i for equilibrium payoffs and that the equality is attained only when it is optimal for player i to intervene. Under this logic: Definition 1.2.1. We say that the pair (V 1 , V 2 ) has the unique impulse property (UIP) if for each i = 1, 2 and , that realizes the supremum in (1.2.1). 9 If (V 1 , V 2 ) enjoys the UIP, we define the gain operator of player i as When applied to equilibrium payoffs, the gain operator H i gives a recomputed present value for player i due to her opponent's intervention. Finally, let us denote by A the infinitesimal generator of X when uncontrolled, i.e., for any V : R → R which is C 2 at some open neighborhood of a given x ∈ R. We assume this regularity holds whenever we compute AV (x) for some V and x. The following Verification Theorem, due to [ABC + 19, Thm.3.3], states that if a regular enough solution (V 1 , V 2 ) to a certain system of QVIs exist, then it must be a pair of equilibrium payoffs, and a corresponding NE can be retrieved. We state here a simplified version that applies to the one-dimensional infinite-horizon games at hand. 10 Theorem 1.2.2 (General system of QVIs). Given a game as in Section 1.1, let V 1 , V 2 : R → R be pair of functions with the UIP, such that for any i, j ∈ {1, 2}, i = j: has polynomial growth and bounded second derivative on some reduced neighbourhood of ∂C * i . Suppose further (I * i , δ * i ) i=1,2 are admissible strategies. 11 Then, (V 1 , V 2 ) are equilibrium payoffs attained at an NE ( The first equation of system (1.2.3) states that at an equilibrium, a player cannot increase her own payoff by a unilateral intervention. One therefore expects that the equality M j V j = V j will only hold when player j intervenes, or in other words, when the value she gains can exactly compensate the cost of her intervention. Consequently, the second equation says that a gain results from the opponent's intervention. Finally, the last one, means that when the opponent does not intervene, each player faces a classical (single-player) impulse control problem.
We conclude this section with some final observations that will be relevant in the sequel: Remark 1.2.3. Intuitively, one would expect that intervening with a null impulse should be equivalent to simply not intervening. These notions are indeed the same when interpreted through system (1.2.3), under our assumption that the cost of intervention is always strictly positive. If at some state x, M j V j (x) is realized for δ = 0, this means that intervening would result in a loss:  [Cos13]. (Concave costs are also assumed in [EAM18].) That is, c i (x, δ + δ) < c i (x, δ)+c i (x+δ,δ) for all x ∈ R, δ,δ ≥ 0. This models the situation in which simultaneous interventions are more expensive than a single one to the same effect. In such cases, it is easy to see that in the context of Theorem 1.2.2, player i will only shift the state variable towards her continuation region. 12

Symmetric two-player nonzero-sum impulse games
We want to focus our study on games which present a certain type of symmetric structure between the players, generalising the linear game [ABC + 19] and the cash management game [BCG19]. 13 We shall make a slight abuse of terminology with respect to the more common use of the term 'symmetric' in games theory, but this will be consistent throughout the paper.
Notation. The type of games presented in Section 1.1 are fully defined by setting the drift, volatility, impulse constraints, discount rates, running payoffs, costs and gains. In other words, any such game can be represented by a tuple G = (µ, σ, Definitions 1.3.1. We say that a game G = (µ, σ, ,2 is symmetric (with respect to zero) if (S1) µ is odd and σ is even (i.e., µ(x) = −µ(−x) and σ(x) = σ(−x) for all x ∈ R).
11 For consistency with the strategies definition, one should assume that δ * has been continuously extended to R. The conclusion is unaffected by the choice of the extension.
12 Let x ∈ I * i and suppose y * i := x+δ * i (x) ∈ I * i . Set y * * i := y * i +δ * i (y * ). Then, by the UIP, the definitions of δ * i (x) and I * i , and the concavity of the cost: which is a contradiction. 13 The latter can be reduced to one dimension with the change of variables x = x1 − x2. Additionally, we will restrict attention to unidirectional impulses, as these yield the most relevant NE found in [BCG19].
We say that the game is symmetric with respect to s (for some s ∈ R), if the s-shifted game (µ(x + s), σ(x + s), Z i (x + s), ρ i , f i (x + s), c i (x + s, δ), g i (x + s, δ)) i=1,2 is symmetric. We refer to x = s as a symmetry line of the game.
Condition (S1) is necessary for the state variable to have symmetric dynamics. In particular, together with (S3), it guarantees symmetry between solutions of the Hamilton-Jacobi-Bellman (HJB) equations of the players when there are no interventions, i.e., Examples 1.3.2. The most common examples of Itô diffusions satisfying this assumption are the scaled Brownian motion (symmetric with respect to zero) and the Ornstein-Uhlenbeck (OU) process (symmetric with respect to its long term mean). Condition (S3) is self-explanatory, while (S2) is only partly so. Indeed, although symmetric constraints on the impulses Z 1 (x) = −Z 2 (−x) should clearly be a requirement, the rest of (ii) is in fact motivated by the numerical method to be presented and the type of problems it can handle. On the one hand, the third equation of the QVIs system (1.2.3) implies that a stochastic impulse control problem for player i needs to be solved on C * j . The unidirectional impulses assumption is a common one for the convergence of policy iteration algorithms in impulse control. 14 However, it is often too restrictive for many interesting applications, such as when the controller would benefit the most from keeping the state variable within some bounded interval instead of simply keeping it 'high' or 'low' (see, e.g., [Bas19] and [AF16, Sect.6.1]). Interestingly enough, assuming unidirectional impulses turns out to be less restrictive when there is a second player present, with an opposed objective. Indeed, it is often the case that each player needs not to intervene in one of the two directions, and can instead rely on her opponent doing so, while capitalising a gain rather than paying a cost. See examples in Section 3.1 with quadratic and degree four running payoffs.
On the other hand, Z 1 (x) = {0} = Z 2 (−x) for all x ≥ 0y means that the intervention regions of the players do not cross over the symmetry line, i.e., I 1 ⊆ (−∞, 0) and I 2 ⊆ (0, +∞) for every pair of strategies. (See Remark 1.2.3.) This assumption guarantees in particular that the players never want to intervene at the same time (the priority rule can be disregarded).
There are more reasons why the last mentioned condition is less restrictive than it first appears to be. It is not uncommon to assume connectedness of either intervention or continuation regions (or other conditions implying them) both in impulse control [Ega08] and nonzero-sum games [DAFM18, Sect.1.2.1]. The same can be said for assumptions that prevent the players from intervening in unison [DAFM18, Sect.1.2.1], [Cos13, Rmk.6.5]. 15 In the context of symmetric games and payoffs (see Lemma 1.3.7) such assumptions would necessarily imply the intervention regions need to be on opposed sides of the symmetry line. Additionally, without any further requirements, strategies such that I 1 ⊇ (−∞, 0] and I 2 ⊇ [0, +∞) would be inadmissible in the present framework, as per yielding infinite simultaneous impulses.
Definitions 1.3.3. Given a symmetric game, we say that (I i , δ i ) i=1,2 are symmetric strategies (with respect to zero) if I 1 = −I 2 and δ 1 (x) = −δ 2 (−x). Given a symmetric game with respect to some s ∈ R, we say that (I i , ξ i ) i=1,2 are symmetric strategies with respect to s if (I i −s, ξ i ) i=1,2 are symmetric, and we refer to x = s as a symmetry line of the strategies.
. We say that they are symmetric functions with respect to s (for some s ∈ R) if V 1 (x + s), V 2 (x + s) are symmetric, and we refer to x = s as a symmetry line for V 1 , V 2 . Remark 1.3.5. Definition 1.3.3 singles out strategies that share the same symmetry line with the game. For the linear game, for example, the authors find infinitely many NEs [ABC + 19, Prop.4.7], each presenting symmetry with respect to some point s, but only one for s = 0 (hence, symmetric in the sense of Definition 1.3.3). At the same time, the latter is the only one for which the corresponding equilibrium payoffs V 1 , V 2 have a symmetry line as per Definition 1.3.4. The same is true for the cash management game [BCG19].
Remark 1.3.6. Throughout the paper we will work only with games symmetric with respect to zero, to simplify the notation. Working with any other symmetry line amounts simply to shifting the game and results back and forth. Lemma 1.3.7. For any symmetric game, strategies (ϕ 1 , ϕ 2 ) and functions V 1 , V 2 : R → R: To see (iii), one can check with the recursive definition of the state variable [ABC + 19, Def.2.2], that X −x;ϕ 1 ,ϕ 2 has the same law as −X x;ϕ 1 ,ϕ 2 (recall that the continuation regions are simply disjoint unions of open intervals). Noting also that intervention times and impulses are nothing but jump times and sizes of X, one concludes that J 1 (x; ϕ 1 , ϕ 2 ) = J 2 (−x; ϕ 1 , ϕ 2 ), as intended.
Convention 1.3.8. In light of Lemma 1.3.7, for any symmetric game we will often lose the player index from the notations and refer always to quantities corresponding to player 1, 16 henceforth addressed simply as 'the player'. Player 2 shall be referred to as 'the opponent'. Statements like 'V has the UIP' or 'V is a symmetric equilibrium payoff' are understood to refer to (V (x), V (−x)). Likewise, '(I, δ) is admissible' or '(I, δ) is a NE' refer to the pair (I, δ(x)), (−I, −δ(−x)).
Due to their general lack of uniqueness, it is customary in game theory to restrict attention to specific type of NEs, depending on the problem at hand (see for instance [HS88] for a treatment within the classical theory). Motivated by Lemma 1.3.7 (iii) and (iv), and by Remark 1.3.5, one can arguably state that symmetric NEs are the most meaningful for symmetric games. Furthermore, Lemma 1.3.7 implies that for symmetric games, one can considerably reduce the complexity of the full system of QVIs (1.2.3) provided the conjectured NE (or equivalently, the pair of payoffs) is symmetric. Using Convention 1.3.8, Theorem 1.2.2 and Lemma 1.3.7 give: Corollary 1.3.9 (Symmetric system of QVIs). Given a symmetric game as in Definition 1.3.1, let V : R → R be a function with the UIP, such that: has polynomial growth and bounded second derivative on some reduced neighbourhood of ∂C * . Suppose further that (I * , δ * ) is an admissible strategy.
Then, V is a symmetric equilibrium payoff attained at a symmetric NE (I * , δ * ).
Note that system (1.3.1) also omits the equation MV − V ≤ 0 as per being redundant. Indeed, by Definition 1.3.1 and Remark 1.2.3, the player does not intervene above 0, nor the opponent below it. Thus, System (1.3.1) significantly simplifies a numerical problem which is very challenging even in cases of linear structure [ABM + 19]. In light of the previous, we will focus our attention on symmetric NEs only and numerically solving the reduced system of QVIs (1.3.1).

Numerical discrete-space problem
In this section we consider a discrete version of the symmetric system of QVIs (1.3.1) over a fixed grid, and propose and study an iterative method to solve it. As it is often done in numerical analysis for stochastic control, for the sake of generality we proceed first in an abstract fashion without making reference to any particular discretization scheme. Instead, we give some general assumptions any such scheme should satisfy for the results presented to hold. Explicit discretization schemes within our framework are presented in Section 2.6 and used in Section 3.

Discrete system of quasi-variational inequalities
From now on we work on a discrete symmetric grid In general, by an 'operator' we simply mean some F : R G → R G , not necessarily linear nor affine unless explicitly stated. We shall identify grid points with indexes, functions in R G with vectors and linear operators with matrices; e.g., S = (S ij ) with S ij = 1 if x i = −x j and 0 otherwise. The (partial) order considered in R G and R G×G is the usual pointwise order for functions (elementwise for vectors and matrices), and the same is true for the supremum, maximum and arg-maximum induced by it.
We want to solve the following discrete nonlinear system of QVIs for v ∈ R G : Some remarks are in order. Firstly, in the same fashion as the continuous-space case, the sets I * , C * form a partition of the grid and represent the (discrete) intervention and continuation regions of the player, while −I * , −C * are such regions for the opponent.
Secondly, the general representation of M follows [CMS07,AF16]. For the standard choices of B(δ), our definition of H is the only one for which a discrete version of Lemma 1.3.7 holds true (see Section 2.6). However, since B and g are row-decoupled, SB(δ * )S and g(Sδ * ) cannot be, as each row x depends on δ * (−x). For this reason and the lack of maximization over −I * , there is no obvious way to reduce problem (2.1.1) to a classical Bellman problem: like in the impulse control case [AF16], to apply Howard's policy iteration [BMZ09, Ho-1]. Furthermore, unlike in the control case, even with unidirectional impulses and good properties for L and B(δ), system (2.1.1) may have no solution as in the analytical case [ABC + 19]. Thirdly, we have defined δ * in (2.1.3) by choosing one particular maximizing impulse for each x ∈ G. The main motivation behind fixing one is to have a well defined discrete system of QVIs for every v ∈ R G . (This is not the case for the analytical problem (1.3.1) where the gain operator H is not well defined unless V has the UIP.) Being able to plug in any v in (2.1.1) and obtain a residual will be useful in practice, when assessing the convergence of the algorithm (see Section 3). Whether a numerical solution verifies, at least approximately, a discrete UIP (and the remaining technical conditions of the Verification Theorem) becomes something to be checked separately.
Remark 2.1.1. Choosing the maximum arg-maximum in (2.1.3) is partly motivated by ensuring a discrete solution will inherit the property of Remark 1.2.4. (The proof remains the same, for the discretizations of Section 2.6.) We will also motivate it in terms of the proposed numerical algorithm in Remark 2.6.4. Note that in [ABM + 19] the minimum arg-maximum is used instead for both players. Nevertheless, the replication of property (ii), Lemma 1.3.7, dictates that it is only possible to be consistent with [ABM + 19] for one of the two players (in this case, the opponent).

Iterative algorithm for symmetric games
This section introduces the iterative algorithm developed to solve system (2.1.1), which builds on [ABM + 19, Alg.2] by harnessing the symmetry of the problem and dispenses with the need for a relaxation scheme altogether. It is presented with a pseudocode that highlights the mimicking of system (2.1.1) and the intuition behind the algorithm; namely: • The player starts with some suboptimal strategy ϕ 0 = (I 0 , δ 0 ) and payoff v 0 , to which the opponent responds symmetrically, resulting in a gain for the player (first equation of (2.1.1); lines 1, 2 and 4 of Algorithm 2.2.1).
• The player improves her strategy by choosing the optimal response, i.e., by solving a single-player impulse control problem through a policy-iteration-type algorithm (second equation of (2.1.1); line 5 of Algorithm 2.2.1).
• This procedure is iterated until reaching a stopping criteria (lines 6-8 of Algorithm 2.2.1).
Notation 2.2.1. In the following: G <0 and G ≤0 represent the sets of grid points which are negative and nonpositive respectively, and Φ the set of (discrete) strategies Set complements are taken with respect to the whole grid, Id : R G → R G is the identity operator; and given a linear operator O : R G → R G R G×G , v ∈ R G and subsets I, J ⊆ G, v I ∈ R I denotes the restriction of v to I and O IJ ∈ R I×J the submatrix/operator with rows in I and columns in J.
Algorithm 2.2.1 Iterative algorithm for symmetric games Set tol, scale > 0.
break from the iteration 8: The scale parameter in line 5 of Algorithm 2.2.1, used throughout the literature by Forsyth, Labahn and coauthors [AF16,FL07,HFL12,HFL13], prevents the enforcement of unrealistic levels of accuracy for points x where v k+1 (x) ≈ 0. Additionally, note that having chosen the initial guess for the payoff v 0 , the initial guess for the strategy is taken as the one induced by v 0 .
Line 5 of Algorithm 2.2.1 assumes we have a subroutine SolveImpulseControl(w, D) that solves the constrained QVI problem: for fixed G ≤0 ⊆ D ⊆ G (approximate continuation region of the opponent) and w ∈ R G (approximate payoff due to the opponent's intervention). Although we only need to solve for v = v D , the value of v D c = w D c impacts the solution both when restricting the equations and when applying the nonlocal operator M . Hence, the approximate payoff v k+1/2 fed to the subroutine serves to pass on the gain that resulted from the opponent's intervention and as an initial guess if desired (more on this on Remark 2.3.3). The remaining of this section consists in establishing an equivalence between problem (2.2.2) and a classical (unconstrained) QVI problem of impulse control, and showing that SolveIm-pulseControl can be defined, if wanted, by policy iteration. However, we will see in the next section a variant that performs better in many practical situations and, in particular, in the examples treated in Section 3. Let us suppose from here onwards that the following assumptions hold true (see Appendix A for the relevant Definitions A.1.1): (A0) For each strategy ϕ = (I, δ) ∈ Φ and x ∈ I, there exists a walk in graphB(δ) from row x to some row y ∈ I c .
(A1) −L is a strictly diagonally dominant (SDD) L 0 -matrix and, for each δ ∈ Z, Id − B(δ) is a weakly diagonally dominant (WDD) L 0 -matrix. Note thatΦ can be identified with the Cartesian product and If (A0) and (A1) also held true for the restricted matrices and strategies (A(·),Φ), the conclusion would follow. While (A1) is clearly inherited, (A0) may fail to do so, but only in non-problematic cases. To see this, letφ = (I,δ) ∈Φ, x ∈ I ⊆ D and δ ∈ Z, some extension of δ. Note that row x of A(φ) is WDD. We want to show that there is a walk in graphA(φ) from x to an SDD row.
By (A0) there must exist some walk x → y 1 → · · · → y n ∈ I c in graphB(δ). If this is in fact a walk from x to I c ∩ D in graphB(δ), then it verifies the desired property (just as in [AF16,Thm.4.3]). If not, then there must be a first 1 ≤ m ≤ n such that the subwalk x → y 1 → · · · → y m is in graphB(δ) but y m+1 / ∈ D. Since y m → y m+1 is an edge in graphB(δ), we have B(δ) ym,y m+1 = 0 and the WDD row (by (A1)) y m ofĨd −B(δ) is in fact SDD. Meaning that the subwalk x → y 1 → · · · → y m verifies the desired property instead.  2.4)), adding an appropriately chosen scaling factor λ to improve efficiency (Remark 2.2.4). It does, however, bear some drawbacks in practice. At each iteration, one needs to solve the system −A(φ k )v k+1 + b(φ k ) = 0 for someφ k ∈Φ. While the matrixL typically has a good sparsity pattern in applications (often tridiagonal), the presence ofB(δ k ) prevents A(φ k ) from inheriting the same structure and makes the resolution of the previous system a lot more costly, even when using a good ordering technique. An exact resolution often becomes prohibitive and an iterative method, such as GMRES or BiCGSTAB with preconditioning, is relied upon.
To circumvent the previous problem, we propose to choose SolveImpulseControl as an instance of a very general class of algorithms known as fixed-point policy iteration [HFL12,Cli07].
where the previous iterate value v k is given andΨ k is the diagonal matrix with ψ k as diagonal. In other words, we split the original policy matrix A(φ) =Ã(φ) −B(φ) and we apply a one-step fixed-point approximation, if (ṽ k+1 −ṽ k )/ max{ṽ k+1 , scale} ∞ < tol then end if 12: end for Lines 1-3 of Subroutine 2.3.1 deal with restricting the constrained problem, while the rest give a routine that can be applied to any QVI of the form (2.2.3). Starting from some suboptimal v 0 and I 0 , one computes a new payoffṽ 1 by solving the coupled equationsMṽ 0 −ṽ 1 = 0 on I 0 andLṽ 1 +f = 0 outside of I 0 . A new intervention region I 1 = Lṽ 1 +f ≤ λ Mṽ 1 −ṽ 1 is defined and the procedure is iterated. (Note that this alternative expression for the intervention region will give the same as Mṽ 1 −ṽ 1 = 0 for the solution of (2.2.3).) Algorithmically, the difference with classical policy iteration is thatṽ k+1 is computed in Line 8 with a fixed obstacleM v k , changing a quasivariational inequality for a variational one. The resulting method is intuitive and simple to implement, and the linear system (2.3.1) (Line 8) inherits the sparsity pattern ofL. This will normally result in more precision and lower space and time complexity. 18 For example, for an SDD tridiagonalL, the system can be solved (exactly in exact arithmetic, and stably in floating point one) in O(n) operations, with n = |D| [Hig02, Sect.9.5]. The matrix-vector multiplyB(δ k )ṽ k can take at most O(n 2 ) operations, but will reduce to O(n) for standard discretizations of impulse operators.
It is also worth mentioning that Subroutine 2.3.1 differs from the so-called iterated optimal stopping [CØS02,ØS09] in that the latter solves max Lṽ k+1 +f ,Mṽ k −ṽ k+1 = 0 exactly at the k-th iteration (by running a full subroutine of Howard's algorithm with fixed obstacle), while the former only performs one approximation step.
To establish the convergence of Subroutine 2.3.1 in the present framework, we add the following assumption: (A2) B(δ) has nonnegative diagonal elements for all δ ∈ Z. (2.3.3) Since I 0 = ∅, and due to (A1) and (A2),B(φ 0 ) = 0 andB(φ k ) ≥ 0 for all k. Thus, (ṽ k ) k≥1 is increasing by monotonicity ofÃ(φ k ). Furthermore, it must be bounded, since for all k ≥ 1: Remark 2.3.3. In light of Theorem 2.3.2 we will set I 0 = ∅ moving forward. It is natural however to chooseṽ 0 = w D and I 0 = Lṽ 0 +f ≤ λ Mṽ 0 −ṽ 0 . The experiments performed displayed (non-monotone) convergence and usually a faster one; but this is not proved here. Additionally, exact convergence was often observed.

Overall algorithm as a fixed-point policy-iteration-type method
The QVIs system (2.1.1) cannot be reduced in any apparent way to a Bellman formulation (2.1.4) (see comments preceding equation). Notwithstanding, we shall see that Algorithm 2.2.1 does take a very similar form to a fixed-point policy iteration algorithm as in (2.3.2) for some appropriate A, B, C. Further, assumptions resembling those of the classical case [HFL12] will be either satisfied or imposed to study its convergence. This is independent of whether SolveIm-pulseControl is chosen as in Subroutine 2.3.1 or Howard's algorithmas in Theorem 2.2.3, although we shall assume that the output I k , δ k in the latter case are defined in the same way as in the former. The matrix and graph theoretic definitions and properties used throughout this section can be found in Appendix A.
The following corollary is immediate by induction. It gives a representation of the sequence of payoffs in terms of the improving strategies throughout the algorithm.
We now establish some properties of the strategy-dependent matrix coefficients that will be useful in the sequel. Given a WDD (resp. substochastic) matrix A ∈ R G×G , we define its set of 'non-trouble states' (or rows) as J[A] := {x ∈ G : row x of A is SDD} (resp.Ĵ[A] := {x ∈ G : row x of A sums less than one}), and its index of connectivity conA (resp. index of contraction conA) by computing for each state the least length that needs to be walked on graphA to reach a non-trouble one, and then taking the maximum over all states (more details in Appendix A). This recently introduced concept gives an equivalent charaterization of the WCDD property for a WDD matrix as conA < +∞, and can be efficiently checked for sparse matrices in O(|G|) operations [Azi19a]. On the other hand, if A is substochastic then conA < ∞ if and only if its spectral radius verifies ρ(A) < 1 (Theorem A.1.6). The proof of the following lemma can be found in Appendix A.
As previously mentioned, system (2.1.1) may have no solution. The matrix coefficients introduced in this section allow us to algebraically characterize the existence of such solutions through strategy-dependent linear systems of equations. For each strategy ϕ ∈ Φ, let O ϕ : (i) v * solves the system of QVIs (2.1.1).
As mentioned in Remark 2.2.2, Assumption (A0) constrains the type of strategies the player can use, but without taking into account the opponent's response. This is enough for the singleplayer constrained problems to have a solution and, therefore, for Algorithm 2.2.1 to be well defined. But we cannot expect this restriction to be sufficient in the study of the two-player game and the convergence of the overall routine.
In order to improve the result of Proposition 2.4.5 let us consider the following stronger version of (A0) reflecting the interaction between the player and the opponent. Remark 2.4.6. (Interpretation) If ϕ, −ϕ are the strategies used by the player and the opponent respectively, 20 then (A0') asserts that states in their intervention regions will eventually be shifted to the common continuation region. This precludes infinite simultaneous interventions and emulates the admissibility condition of the continuous-state case. Fixing I = ∅ we recover (A0). Additionally, (A0') together with (A1) imply that (A − B)(ϕ, ϕ) is WCDD, hence an M -matrix, which is another one of the assumptions of the classical fixed-point policy iteration [HFL12].
Under this new assumption, the ϕ * = ϕ * (v * )-dependent systems of Proposition 2.4.5 will admit a unique solution. Then solving the original problem (2.1.1) amounts to finding v * ∈ R G that solves its induced linear system of equations.

Convergence analysis
We now study the convergence properties of Algorithm 2.2.1. Henceforth, the UIP refers to the obvious discrete analogue of Definition 1.2.1, where we replace the domain R, the impulse constraint Z and the operator M by their discretizations G, Z and M respectively.
The obvious first question to address is whether when Algorithm 2.2.1 converges, it does so to a solution of the system of QVIs (2.1.1). Unlike in the classical Bellman problem (2.1.4), problem (2.1.1) is intrinsically dependent on the particular strategy chosen by the player (see Propositions 2. 4.5 and 2.4.7). Accordingly, we start with a lemma addressing what can be said about the convergence of the strategies (ϕ k ) when the payoffs (v k ) converge.
Proof. That M v k → M v * is clear by continuity of the operators B(δ) and finiteness of Z.
Let x ∈ (∂I * ) c and suppose Lv * (x)+f (x) < M v * (x)−v * (x) (the other case being analogue). By continuity of L and M there must exist some k 0 such that Lv The statement about ψ, δ is proved as before by considering appropriate subsequences. Consequently, if v * has the UIP, then necessarily δ k → δ * (v * ) and Hv k → Hv * .
As a corollary we can establish that, should the sequence (v k ) converge, its limit must solve problem (2.1.1). If convergence is not exact however (i.e., in finite iterations), then we will ask that v * verifies some of the properties of the Verification Theorem in Corollary 1.3.9. Namely, the UIP and a discrete analogue of the continuity in the border of the opponent's intervention region. We emphasise that our main motivation in solving system (2.1.1) relies in Corollary 1.3.9 and its framework. Additionally, in most practical situations and for fine-enough grids, one can intuitively expect the discretization of an equilibrium payoff as in Corollary 2.1.1 to inherit the UIP. Lastly, we note that the exact equality Lv * + f = M v * − v * will typically not be verified for any point in the grid in practice, giving ∂I * = ∅. (ii) If v * has the UIP and Lv * + f = Hv * − v * on −∂I * , then v * solves (2.1.1).
Proof. (i) is an immediate consequence of Lemma 2.5.2 and Proposition 2.4.5.
Lemma 2.5.2 shows to what extent the convergence of the payoffs imply the convergence of the strategies. The following theorem, of theoretical interest, establishes a reciprocal under the stronger assumption (A0'). In general, since the set of strategies Φ is finite, the sequence of strategy-dependent coefficients of the fixed-point equations (2.4.1) will always be bounded and with finitely many limit points. However, if the approximating strategies are such that the former coefficients convergence, then Algorithm 2.2.1 is guaranteed to converge. Further, instead of looking at the convergence of A, B, C ϕ k , ϕ k+1 , we can instead consider the weaker condition of A −1 B, A −1 C ϕ k , ϕ k+1 converging.
Moreover, under our assumptions, (A − B) ϕ, ϕ is WCDD. Then Lemma 2.4.4 and Theorem A.1.6 imply that A −1 B ϕ, ϕ is contractive for some matrix norm. Lastly, note that the sequence of payoffs (v k ) k≥k 0 now satisfies the classical (constant-coefficients) contractive fixed-point recurrence v k+1 = A −1 B ϕ, ϕ v k + b, which converges to the unique fixed-point of the equation.
The classical fixed-point policy-iteration framework [HFL12,Cli07] assumes uniform contractiveness in · ∞ of the sequence of operators. This is a natural norm to consider in a context where matrices have properties defined row by row, such as diagonal dominance. 22 However, the authors mention convergence in experiments where only · ∞ -non-expansiveness held true. The latter is the typical case in our context, for the matrices A −1 B ϕ k , ϕ k+1 , which is why Theorem 2.5.4 relies on the fact that a spectral radius strictly smaller than one guarantees contractiveness in some matrix norm.
It is natural to ask whether there is some contractiveness condition that may account for the observations in [HFL12,Cli07] and that can be generalized to our context to further the study of Algorithm 2.2.1. Imposing a uniform bound on the spectral radii would not only be hard to check, but also difficult to manipulate, as the spectral radius is not sub-multiplicative. 23 Instead, we can consider the sequential indexes of contraction and connectivity, which naturally generalize those of the previous section by means of walks in the graph of a sequence of matrices (see Appendix A for more details). As before, they can be identified with one another (see Lemma A.1.5) and, given substochastic matrices, the sequential index of contraction tells us how many we need to multiply before the result becomes · ∞ -contractive (Theorem A.1.7). Thus, let us consider a uniform bound on the following sequential indexes of connectivity: (A0") There exists m ∈ N 0 such that for any sequence of strategies (ϕ k ) ⊆ Φ, Remark 2.5.5. Given ϕ, ϕ ∈ Φ, by considering the sequence ϕ, ϕ, ϕ, ϕ, . . . , we see that (A0") implies (A0'). In fact, (A0") can be interpreted as precluding infinite simultaneous impulses even when the players can adapt their strategies (cf. Remark 2.4.6), and imposing that the number of shifts needed for any state to reach the common continuation region is bounded.
Under this stronger assumption, we have: 22 Recall that this norm can be computed as the maximum absolute value row sum. 23 ρ(AB) ≤ ρ(A)ρ(B) does not hold in general when the matrices A and B do not commute.
Proof. In a similar way to Lemma 2.4.4, one can check that under (A0"),(A1),(A2) we have the following uniform bound for the sequential indexes of contraction: for any sequence of strategies (ϕ k ) ⊆ Φ. In other words, multiplying any m + 1 of the previous substochastic matrices results in a . ∞ -contractive one. Furthermore, since Φ m+1 is finite, there must a be uniform uniform contraction constant C 1 < 1. Let C 2 > 0 be a uniform bound for A −1 C. By the representation in Corollary 2.
Given n 0 ∈ N and k > (m + 1)n 0 , the same argument of the previous proof shows that one can decompose (v k ) as v k+1 = u k + F (ϕ k−(m+1)n 0 , . . . , ϕ k ) + w k , for a fixed function F , The latter is small if n 0 is large. Hence, one could heuristically expect that the trailing strategies are often the ones dominating the convergence of the algorithm. In fact, in all the experiments carried with a discretization satisfying (A0"),(A1),(A2), a dichotomous behaviour was observed: the algorithm either converged or at some point reached a cycle between a few payoffs. In the latter case, and restricting attention to instances in which one heuristically expects a solution to exist (more details in Section 3), it was possible to reduce the residual to the QVIs and the distance between the iterates by refining the grid.
The previous motivates the study of Algorithm 2.2.1 when the grid is sequentially refined, instead of fixed. Such an analysis however, would likely entail the need of a viscosity solutions framework as in [ABL18,BS91], which does not currently exist in the literature of nonzero-sum stochastic impulse games. Consequently, this analysis and the stronger convergence results that may come out of it are inevitably outside the scope of this paper.

Discretization schemes
Let us conclude this section by showing how one can discretize the symmetric system of QVIs (1.3.1) to obtain (2.1.1) in a way that satisfies the assumptions present throughout the paper. Recall that we work on a given symmetric grid G : Firstly, we want a discretization L of the operator A − ρId such that −L is an SDD L 0matrix as per (A1). A standard way to do this is to approximate the first (resp. second) order derivatives with forward and backward (resp. central) differences in such a way that we approximate the ordinary differential equation (ODE) 1 2 σ 2 V + µV − ρV + f = 0 with an upwind (or positive-coefficients) scheme. More precisely, for each x = x i ∈ G we approximate the first derivative with a forward (resp. backward) difference if its coefficient in the previous equation is nonegative (resp. negative) in x i , if µ(x i ) < 0 and the second derivative by In the case of an equispaced grid with step size h, this reduces to For the previous stencils to be defined in the extreme points of the grid, we consider two additional points x −N −1 , x N +1 and replace V (x −N −1 ), V (x N +1 ) in the previous formulas by some values resulting from artificial boundary conditions. A common choice is to impose Neumann conditions to solve for V (x −N −1 ), V (x N +1 ) using the first order differences from before. For example, in the equispaced grid case, given LBC, RBC ∈ R we solve for V (x −N − h) (resp. V (x N + h)) from the Neumann condition The choice of LBC, RBC is problem-specific and intrinsically linked to that of x N , although it does not affect the properties of the discrete operators. See more details in Section 3. The described procedure leads to a discretization of the ODE as Lv + f = 0, with L satisfying the properties we wanted.
(The strict diagonal dominance is a consequence of ρ > 0.) Note that the values of f at x −N , x N need to be modified to account for the boundary conditions.
Remark 2.6.1. One could increase the overall order of approximation by using central differences as much as possible for the first order derivatives, provided the scheme remains upwind (see [FL07,WF08] for more details). This is not done here in order to simplify the presentation.
We now approximate the impulse constraint sets Z(x) (x ∈ R) by finite sets ∅ = Z(x) ⊆ [0, +∞) (x ∈ G), such that Z(x) = {0} if x ≥ 0, and define the impulse operators where v[[y]] denotes linear interpolation of v on y using the closest nodes on the grid, and v[[y]] = v(x ±N ) if ±y > ±x ±N (i.e., 'no extrapolation'). This univocally defines the discrete loss and gain operators M and H as per (2.1.2), as well as the optimal impulse δ * according to (2.1.3). The set of discrete strategies Φ is defined as in (2.2.1).
This general discretization scheme satisfies assumptions (A0)-(A2) and one can impose some regularity conditions on the sets Z(x) and Z(x) such that the solutions of the discrete QVI problems (2.2.2) converge locally uniformly to the unique viscosity solution of the analytical impulse control problem, as the grid is refined. 26 See [Azi17, ABL18] for more details.
In order to preclude infinite simultaneous interventions it is enough to constrain the size of the impulses so that the symmetric point of the grid cannot be reached. That is, Z(x) ⊆ [0, −2x) for any x ∈ G <0 . In this case, the scheme satisfies the stronger conditions (A0"),(A1),(A2) (and in particular, (A0')). Note that we can take m = N in (A0"), as each positive impulse will lead to a state which is at least one node closer to x 0 = 0, where no player intervenes. Practically, it makes sense to make this choice when one suspects (or wants to check whether) there is a symmetric NE with no 'far-reaching impulses', in the previous sense.
Example 2.6.3. If Z(x) = [0, +∞) for x < 0, the analogous of Example 2.6.2 is now Z( Remark 2.6.4. Consider Example 2.6.3 in the context of Remark 2.1.1. As in Proposition 2.5.6 and due to Theorem A.1.7, the less impulses needed between the two players to reach the common continuation region, the faster that the composition of the fixed-point operators of Algorithm 2.2.1 becomes contractive. Hence, one could intuitively expect that when close enough to the solution, the choice of the maximum arg-maximum in (2.1.3) improves the performance of Algorithm 2.2.1. This is another motivation for such choice.

Numerical results
This section presents numerical results obtained on a series of experiments. See Introduction and Section 1.3 for the motivation and applications behind some of them. We do not assume a priori constraints on the impulses in the analytical problem. All the results presented were obtained on equispaced grids with step size h > 0 (to be specified) and with a discretization scheme as in Section 2.6 and Example 2.6.3. The extreme points of the grid are displayed on each graph.
For the games with linear costs and gains of the form c(x, δ) = c 0 +c 1 δ and g(x, δ) = g 0 +g 1 δ, with c 0 , c 1 , g 0 , g 1 constant, the artificial boundary conditions were taken as LBC = c 1 and RBC = g 1 for a sufficiently extensive grid. They result from the observation that on a hypothetical symmetric NE of the form ϕ For other examples, LBC, RBC and the grid extension were chosen by heuristic guesses and/or trial and error. However, in all the examples presented the error propagation from poorly chosen LBC, RBC was minimal.
The initial guess was set as v 0 = 0 and its induced strategy in all cases. SolveImpulseControl was chosen as Subroutine 2.3.1 with tol = 10 −15 and λ = 1. 27 Its convergence was exact however, in all the examples, and faster (in terms of time elapsed and number of operations) when it was compared with Howard's policy iteration (not reported). Instead of fixing a terminal tolerance tol beforehand, we display the highest accuracy that was attained in each case and the number of iterations needed for it. Section 3.1 considers a fixed grid and and games where the results point to the existence of a symmetric NE as per Corollary 1.3.9. Not having an analytical solution to compare with, results are assessed by means of the percentage difference between the iterations with scale = 1 as in [AF16], and the maximum pointwise residual to the system of QVIs (2.1.1), defined for v ∈ R G by setting Section 3.2 considers the only symmetric games in the literature with (semi-) analytical solution: the central bank linear game [ABC + 19] and the cash management game [BCG19], and computes the errors made by discrete approximations, showing in particular the effect of refining the grid. Not considered here is the strategic pollution control game [FK19], due to its inherent nonsymmetric nature. Section 3.3 comments on results obtained for games without NEs. Finally, Section 3.4 shows results that go beyond the scope of the currently available theory for impulse games.

Convergence to discrete solution on a fixed grid
Throughout this section the grid step size is fixed as h = .01, unless otherwise stated (although results where corroborated by further refinements). Each figure specifies the structure, G = (µ, σ, ρ, f, c, g), of the symmetric game solved and shows the numerical solutions at the terminal iteration for the equilibrium payoff, v k , and NE. Graphs plot payoff versus state of the process. The intervention region is displayed in red over the graph of the payoff for presentation purposes.
As a general rule, we focus on games with higher costs c than gains g, as the opposite typically leads to players attempting to apply infinite simultaneous impulses [ABC + 19] (i.e., inducing a gain from the opponent's intervention is 'cheap') leading to degenerate games. The following games resulted in exact convergence in finite iterations, which guarantees a solution of (2.1.1) was reached (Corollary 2.5.3), although very small acceptable errors where reached much sooner.
The following is an example in which the accuracy stagnates. At that point, the iterates start going back and forth between a few values. Although we cannot guarantee that we are close to a solution of (2.1.1), the results seem quite convincing, with both Diff and maxResQVIs reasonably low. In fact, simply halving the step size to h = .005 produces a substantial improvement of Diff=9.16e-11% and maxResQVIs=9.19e-11 in k =33 iterations.
The previous games have a state variable evolving as a scaled Brownian motion. We now move on to a mean reverting OU process with zero long term mean. (Recall that any other value an be handled simply by shifting the game.) In general, the experiments with this dynamics converged exactly in a very small number of iterations.
Note how all the games treated in this section exhibit a typical feature known to hold for simpler symmetric games [ABC + 19, ABM + 19, BCG19]: the equilibrium payoff of the player is only C 1 at the border of the intervention region ∂I * = {AV − ρV + f = MV − V } ∩ R <0 , and only continuous at the border of the opponent's intervention region −∂I * . In floating point arithmetic, the former makes the discrete approximation of ∂I * particularly elusive, while the latter can lead to high errors when close to −∂I * . As a consequence, Subroutine 2.3.1 (or any equivalent) will often misplace a few grid nodes between intervention and continuation regions, which will in turn make the residual resQVIs 'spike' on the opponent's side. Thus, a large value of maxResQVIs can at times be misleading, and further inspection of the pointwise residuals is advisable.
As a matter of fact, halving the step size to h = .005 in the last example results in exact convergence with terminal maxResQVIs= 2510, but the residual on all grid nodes other than the 'border' of the opponent's intervention region and a contiguous one is less than 1.43e-11. This is an extreme example propitiated by the almost vertical shape of the solution close to such border. Thus, while it is useful in practice to consider a stopping criteria for Algorithm 2.2.1 based on maxResQVIs, this phenomenon needs to be minded.
The last example also shows how impulses at a NE can lead to different endpoints depending on the state of the process. This is often the case when costs are nonlinear. In fact: Lemma 3.1.1. Let (I * , δ * ) and V be a symmetric NE and equilibrium payoff as in Corollary 1.3.9. Suppose Z(x) = [0, +∞) for x < 0 and c = c(x, δ) ∈ C 2 R × (0, +∞) , and consider the re-parametrization c = c(x, y) := c(x, y − x). Suppose that y * := x + δ * (x) is constant for all x ∈ (I * ) • . 28 Then c xy (x, y * ) ≡ 0 on (I * ) • .
The result is immediate since MV (x) = sup y>x {V (y) − c(x, y)} = V (y * ) − c(x, y * ) on (I * ) • and V ∈ C 1 − C * ). (y * / ∈ −I * or there would be infinite simultaneous impulses.) While the sufficient condition c xy (x, y * ) ≡ 0 for some y * is verified for linear costs, it is not in general and certainly not for c(x, δ) = 10 + 20 √ δ as in the last example.

Convergence to analytical solution with refining grids
A convergence analysis from discrete to analytical solution with refining grids is outside the scope of this paper, and seems to be far too challenging when a viscosity solutions framework is yet to be developed. Instead, we present here a numerical validation using the solutions of the linear and cash management games. We focus first and foremost in the former, as it has an almost fully analytical solution, with only one parameter to be found numerically as opposed to four for the latter.  Table 1: Convergence to analytical solution when refining equispaced symmetric grid with step size h and endpoint x N = 4. Game: µ = 0, σ = .15, ρ = .02, f = x + 3, c = 100 + 15δ, g = 15δ. %error := (v −V )/V ∞ , with V exact solution and v discrete approximation after Its iterations. · ∞ is computed over the grid.
The exact NE of this game (up to five significant figures) is given by the intervention region I * = (−∞, −2.8238] and impulse function δ * (x) = 1.5243 − x, while the approximation given 28 A • denotes the interior of the set A ⊆ R. by Algorithm 2.2.1with h = 1/64 is (−∞, −2.8125], 1.5313 − x , with absolute errors on the parameters of no more than the step size.
The cash management game [BCG19] with unidirectional impulses can be embedded in our framework by reducing its dimension with the change of variables x = x 1 − x 2 , changing minimization by maximization and relabelling the players. With the parameter values of [BCG19, Fig.1b], it translates into the game: µ = 0, σ = 1, ρ = .5, f = −|x|, c = 3 + δ, g = −1. The authors found numerically a symmetric NE approximately equal to (−∞, −5.658], 0.686 − x , while Algorithm 2.2.1 with x N = 8 and h = 1/64 gives (−∞, −5.6563], 0.6875 − x . The absolute difference on the parameters is once again below the grid step size.

Games without Nash equilibria
It is natural to ask how does Algorithm 2.2.1 behave on games without symmetric NEs, and whether anything can be inferred from the results. For the linear game, two cases without NEs (symmetric or not) are addressed in [ABC + 19]: 'no fixed cost' and 'gain greater than cost'. Both of them yield degenerate 'equilibria' where the players perform infinite simultaneous interventions. When tested for several parameters, Algorithm 2.2.1 converged in finitely many iterations (although rather slowly) and yielded the exact same type of 'equilibria'. 29 For a fine enough grid, the latter can be identified heuristically by some node in the intervention region that would be shifted to its symmetric one (infinite alternated interventions), or to its immediate successor over and over again, until reaching the continuation region (infinite onesided interventions). 30 In the first case, we recovered the limit 'equilibrium payoffs' of [ABC + 19, Props.4.10,4.11]. Intuitively, the players in this game take advantage of free interventions, whether by no cost or perfect compensation, in order to shift the process as desired. Note that when c = g, the impulses that maximimize the net payoff are not unique.
In the second case, grid refinements showed the discrete payoffs to diverge towards infinity at every point. This is again consistent with the theory: each player forces the other one to act, producing a positive instantaneous profit. Iterating this procedure infinitely often leads to infinite payoffs for every state.
Tested games in which Algorithm 2.2.1 failed to converge (and not due to stagnation nor a poor choice of the grid extension) were characterized by iterates reaching a cycle, typically with high values of Diff and maxResQVIs regardless of the grid step size. In many cases, the cycles would visit at least one payoff inducing infinite simultaneous impulses. While this might, potentially and heuristically, be indicative of the game admitting no symmetric NE, there is not much more than can be said at this stage.

Beyond the Verification Theorem
We now present two cases in which the solution of the discrete QVIs system (2.1.1) found with Algorithm 2.2.1 (by exact convergence) does not comply with the continuity and smoothness assumptions of Corollary 1.3.9. However, the results are sensible enough to heuristically argue they may correspond to NEs beyond the scope of the Verification Theorem. In both cases h = .01. Finer grids yielded the qualitative same results.
The first one considers costs convex in the impulse. When far enough from her continuation region, it is cheaper for the player to apply several (finitely many) simultaneous impulses instead of one, to reach the state she wishes to (cf. Remark 1.2.4). In this game, the optimal impulse δ * becomes discontinuous, and its discontinuity points are those in the intervention region where the equilibrium payoff is non-differentiable. These, in turn, translate into discontinuities on the opponent's intervention region (one smoothness degree less, as with the border of the regions).
The second one considers linear gains, quadratic running payoffs and costs concave on the impulse. The latter makes the player shift the process towards her continuation region (cf. Remark 1.2.4). However, when far enough from the border, instead of shifting the process directly to her 'preferred area', the player chooses to pay a bit more to force her opponent's intervention, inducing a gain and letting the latter pay for the final move. Once again, this causes δ * to be discontinuous and leads to a non-differentiable (resp. discontinuous) point for the equilibrium payoff in the intervention (resp. opponent's intervention) region.
Under the previous reasoning, one could intuitively guess that setting g = 0 in this game should remove the main incentive the player has to force her opponent to act. This is in fact the case, as shown below. As a result, δ * is continuous and the resulting equilibrium payoff falls back into the domain of the Verification Theorem.
We remark that, should the previous solutions correspond indeed to NEs, then the alternative semi-analytical approach of [FK19] could not have produced them either, as the latter can only yield continuous equilibrium payoffs.

Concluding remarks
This paper presents a fixed-point policy-iteration-type algorithm to solve systems of quasivariational resulting from a Verification Theorem for symmetric nonzero-sum stochastic impulse games. In this context, the method substantially improves the only algorithm available in the literature while providing the convergence analysis that we were missing. Graph-theoretic assumptions relating to weakly chained diagonally dominant matrices, which naturally parallel the admissibility of the players strategies, allow to prove properties of contractiveness, boundedness of iterates and convergence to solutions. A result of theoretical interest giving sufficient conditions for convergence is also proved. Equilibrium payoffs and Nash equilibria of games too challenging for the available analytical approaches are computed with high precision on a discrete setting. Numerical validation with analytical solutions is performed when possible, with reassuring results, but it is noted that grid refinements may be needed at times to overcome stagnation. Thus, formalising the approximating properties of the discrete solutions as well as deriving stronger convergence results for the algorithm may need a viscosity solutions framework currently missing in the theory. This is further substantiated by the irregularity of the solutions, particularly those found which escape the available theoretical results. This motivates further research while providing a tool that can effectively be used to gain insight into these very challenging problems.

A Matrix and graph theoretic definitions and results
For the reader's convenience, this appendix summarizes some important algebraic and graph theoretic definitions and results used throughout the paper. More details can be found in the references given below. Henceforth, A ∈ R N ×N is a real matrix, Id ∈ R N ×N is the identity, ρ(·) denotes the spectral radius and R N , R N ×N are equipped with the elementwise order. We talk about rows and 'states' interchangeably. (D4) A is monotone if it is nonsingular and A −1 ≥ 0. Equivalently, A is monotone if Ax ≥ 0 implies x ≥ 0 for any x ∈ R N .
(D7) The directed graph of A is the pair graphA := (V, E), where V := {1, . . . , N } is the set of vertexes and E ⊆ V × V is the set of edges, such that (i, j) ∈ E iff A ij = 0.
(D9) A is weakly chained diagonally dominant (WCDD) if it is WDD and for each WDD row of A there is a walk in graphA to an SDD row (identifying vertexes and rows).
(D10) A is (right) substochastic or sub-Markov (resp. stochastic or Markov ) if A ≥ 0 and each row sums at most one (resp. exactly one). Equivalently, A is substochastic if A ≥ 0 and A ∞ ≤ 1. (Recall that A ∞ is the maximum row-sum of absolute values.) (D11) If A is a WDD (resp. substochastic) matrix, its set of 'non-trouble states' (or rows) is J It is clear that SDD =⇒ WCDD =⇒ WDD, and by definition, L-matrix =⇒ L 0 -matrix =⇒ Z-matrix. Also by definition, if A is WDD, then A is WCDD ⇐⇒ conA < +∞. For the following theorem, recall the characterization of the spectral radius ρ(A) = inf{ A : · is a matrix norm} and Gelfand's formula ρ(A) = lim n→+∞ A n 1/n , for any matrix norm · . Note also that if A is substochastic, then A n is also substochastic for any n ∈ N 0 , A n ∞ ≤ 1 and ρ(A) ≤ 1. In particular, conA < +∞ if and only if ρ(A) < 1.
The indexes of contraction and connectivity can be generalized in a natural way to sequences (A k ) ⊆ R N ×N by considering walks i 1 → i 2 → . . . such that i k → i k+1 is an edge in graphA k (see [Azi19a, App.B] for more details). Theorem A.1.6 extends in the following way: Proof of Lemma 2.4.4. For briefness, we omit the dependence on ϕ, ϕ from the notation.
A −1 B ≥ 0 since B ≥ 0 and A is monotone. To see that its rows sum up to one, let 1 ∈ R G be the vector of ones. It is easy to check that under (A1)-(A2), AA −1 B1 = B1 ≤ A1, which implies A −1 B1 ≤ 1. The WDD L 0 -property of A − B is due to (A1). 31 (·) + denotes positive part, inf ∅ = +∞ and sup ∅ = −∞. The index is the least length that needs to be walked on graphA to reach the non-trouble states when starting from an arbitrary trouble one. 32 [Azi19a, Thm.2.24] is formulated in terms of L-matrices instead. However, it is trivial to see that: WCDD L0-matrix ⇐⇒ WCDD L-matrix.
Note that by Proposition A.1.5, we can consider A −1 B and Id − A −1 B interchangeably. Let us show that J (A − B) ϕ, ϕ ⊆Ĵ A −1 B ϕ, ϕ . By monotonicity and L 0 -property, A −1 must have strictly positive diagonal elements. Indeed, if there was some index i such that A −1 ii = 0, then 1 = [AA −1 ] ii = j =i A ij A −1 ji ≤ 0. Consider now some x i ∈Ĵ A −1 B c , i.e.,