1 Introduction

Between Aliceland and Bobbesia lies a sparsely populated desert. Until recently, neither of the two countries had any interest in the desert. However, geologists have recently discovered that it contains large oil reserves. Now, both Aliceland and Bobbesia would like to annex the desert, but they worry about a military conflict that would ensue if both countries insist on annexing.

Table 1 models this strategic situation as a normal-form game. The strategy \(\mathrm {DM}\) (short for “Demand with Military”) denotes a military invasion of the desert, demanding annexation. If both countries send their military with such an aggressive mission, the countries fight a devastating war. The strategy \(\mathrm {RM}\) (for “Refrain with Military”) denotes yielding the territory to the other country, but building defenses to prevent an invasion of one’s current territories. Alternatively, the countries can choose to not raise a military force at all, while potentially still demanding control of the desert by sending only their leaders (\(\mathrm {DL}\), short for “Demand with Leader”). In this case, if both countries demand the desert, war does not ensue. Finally, they could neither demand nor build up a military (\(\mathrm {RL}\)). If one of the two countries has its military ready and the other does not, the militarized country will know and will be able to invade the other country. In game-theoretic terms, militarizing therefore strictly dominates not militarizing.

Table 1 The Demand Game

Instead of making the decision directly, the parliaments of Aliceland and Bobbesia appoint special commissions for making this strategic decision, led by Alice and Bob, respectively. The parliaments can instruct these representatives in various ways. They can explicitly tell them what to do – for example, Aliceland could directly tell Alice to play \(\mathrm {DM}\). However, we imagine that the parliaments trust the commissions’ judgments more than they trust their own and hence they might prefer to give an instruction of the type, “make whatever demands you think are best for our country” (perhaps contractually guaranteeing a reward in proportion to the utility of the final outcome). They might not know what that will entail, i.e., how the commissions decide what demands to make given that instruction. However – based on their trust in their representatives – they might still believe that this leads to better outcomes than giving an explicit instruction.

We will also imagine these instructions are (or at least can be) given publicly and that the commissions are bound (as if by a contract) to follow these instructions. In particular, we imagine that the two commissions can see each other’s instructions. Thus, in instructing their commissions, the countries play a game with bilateral precommitment. When instructed to play a game as best as they can, we imagine that the commissions play that game in the usual way, i.e., without further abilities to credibly commit or to instruct subcommittees and so forth.

It may seem that without having their parliaments ponder equilibrium selection, Aliceland and Bobbesia cannot do better than leave the game to their representatives. Unfortunately, in this default equilibrium, war is still a possibility. Even the brilliant strategists Alice and Bob may not always be able to resolve the difficult equilibrium selection problem to the same pure Nash equilibrium.

In the literature on commitment devices and in particular the literature on program equilibrium, important ideas have been proposed for avoiding such bad outcomes. Imagine for a moment that Alice and Bob will play a Prisoner’s Dilemma (Table 3) (rather than the Demand Game of Table 1). Then the default of (Defect, Defect) can be Pareto-improved upon. Both original players (Aliceland and Bobbesia) can use the following instruction for their representatives: “If the opponent’s instruction is equal to this instruction, Cooperate; otherwise Defect.” [22, 33] ([46], Sect. 10.4), [55] Then it is a Nash equilibrium for both players to use this instruction. In this equilibrium, (Cooperate, Cooperate) is played and it is thus Pareto-optimal and Pareto-better than the default.

In cases like the Demand Game, it is more difficult to apply this approach to improve upon the default of simply delegating the choice. Of course, if one could calculate the expected utility of submitting the default instructions, then one could similarly commit the representatives to follow some (joint) mix over the Pareto-optimal outcomes (\((\mathrm {RM},\mathrm {DM})\), \((\mathrm {DM}, \mathrm {RM})\), \((\mathrm {RM}, \mathrm {RM})\), \((\mathrm {DL}, \mathrm {DL})\), etc.) that Pareto-improves on the default expected utilities (see Footnote 1). However, we will assume that the original players are unable or unwilling to form probabilistic expectations about how the representatives play the Demand Game, i.e., about what would happen with the default instructions. If this is the case, then this type of Pareto improvement on the default is unappealing.

The goal of this paper is to show and analyze how even without forming probabilistic beliefs about the representatives, the original players can Pareto-improve on the default equilibrium. We will call such improvements safe Pareto improvements (SPIs). We here briefly give an example in the Demand Game.

The key idea is for the original players to instruct the representatives to select only from \(\{\mathrm {DL},\mathrm {RL}\}\), i.e., to not raise a military. Further, they tell them to disvalue the conflict outcome without military \((\mathrm {DL}, \mathrm {DL})\) as they would disvalue the original conflict outcome of war in the default equilibrium. Overall, this means telling them to play the game of Table 2. (Again, we could imagine that the instructions specify Table 2 to be how Aliceland and Bobbesia financially reward Alice and Bob.) Importantly, Aliceland’s instruction to play that game must be conditional on Bobbesia also instructing their commission to play that game, and vice versa. Otherwise, one of the countries could profit from deviating by instructing their representative to always play \(\mathrm {DM}\) or \(\mathrm {RM}\) (or to play by the original utility function).

Table 2 A safe Pareto improvement for the Demand Game

The game of Table 2 is isomorphic to the \(\mathrm {DM}\)-\(\mathrm {RM}\) part of the original Demand Game of Table 1. Of course, the original players know neither how the original Demand Game nor the game of Table 2 will be played by the representatives. However, since these games are isomorphic, one should arguably expect them to be played isomorphically. For example, one should expect that \((\mathrm {RM}, \mathrm {DM})\) would be played in the original game if and only if \((\mathrm {RL}, \mathrm {DL})\) would be played in the modified game. However, the conflict outcome \((\mathrm {DM}, \mathrm {DM})\) is replaced in the new game with the outcome \((\mathrm {DL}, \mathrm {DL})\). This outcome is harmless (Pareto-optimal) for the original players.

Contributions Our paper generalizes this idea to arbitrary normal-form games and is organized as follows. In Sect. 2, we introduce some notation for games and multivalued functions that we will use throughout this paper. In Sect. 3, we introduce the setting of delegated game playing for this paper. We then formally define and further motivate the concept of safe Pareto improvements. We also define and give an example of unilateral SPIs. These are SPIs that require only one of the players to commit their representative to a new action set and utility function. In Sect. 3.2, we briefly review the concepts of program games and program equilibrium and show that SPIs can be implemented as program equilibria. In Sect. 4.2, we introduce a notion of outcome correspondence between games. This relation expresses the original players’ beliefs about similarities between how the representatives play different games. In our example, the Demand Game of Table 1 (arguably) corresponds to the game of Table 2 in that the representatives (arguably) would play \((\mathrm {DM},\mathrm {DM})\) in the original game if and only if they play \((\mathrm {DL},\mathrm {DL})\) in the new game, and so forth. We also show some basic results (reflexivity, transitivity, etc.) about the outcome correspondence relation on games. In Sect. 4.3 we show that the notion of outcome correspondence is central to deriving SPIs. In particular, we show that a game \(\Gamma ^s\) is an SPI on another game \(\Gamma\) if and only if there is a Pareto-improving outcome correspondence relation between \(\Gamma ^s\) and \(\Gamma\).

To derive SPIs, we need to make some assumptions about outcome correspondence, i.e., about which games are played in similar ways by representatives. We give two very weak assumptions of this type in Sect. 4.4. The first is that the representatives’ play is invariant under the removal of strictly dominated strategies. For example, we assume that in the Demand Game the representatives only play \(\mathrm {DM}\) and \(\mathrm {RM}\). Moreover we assume that we could remove \(\mathrm {DL}\) and \(\mathrm {RL}\) from the game and the representatives would still play the same strategies as in the original Demand Game with certainty. The second assumption is that the representatives play isomorphic games isomorphically. For example, once \(\mathrm {DL}\) and \(\mathrm {RL}\) are removed for both players from the Demand Game, the Demand Game is isomorphic to the game in Table 2 such that we might expect them to be played isomorphically. In Sect. 4.5, we derive a few SPIs – including our SPI for the Demand Game – using these assumptions. Section 4.6 shows that determining whether there exists an SPI based on these assumptions is NP-complete. Section 5 considers a different setting in which we allow the original players to let the representatives choose from newly constructed strategies whose corresponding outcomes map arbitrarily onto feasible payoff vectors from the original game. In this new setting, finding SPIs can be done in polynomial time. We conclude by discussing the problem of selecting between different SPIs on a given game (Sect. 6) and giving some ideas for directions for future work (Sect. 7).

2 Preliminaries

We here give some basic game-theoretic definitions. We assume the reader to be familiar with most of these concepts and with game theory more generally.

An n-player (normal-form) game is a tuple \((A,\mathbf {u})\) of a set \(A=A_1\times ... \times A_n\) of (pure) strategy profiles (or outcomes) and a function \(\mathbf {u}:A \rightarrow \mathbb {R}^n\) that assigns to each outcome a utility for each player. The Prisoner’s Dilemma shown in Table 3 is a classic example of a game. The Demand Game of Table 1 is another example of a game that we will use throughout this paper.

Instead of \((A,\mathbf {u})\) we will also write \((A_1,...,A_n,u_1,...,u_n)\). We also write \(A_{-i}\) for \(\times _{j\ne i} A_j\), i.e., for the Cartesian product of the action sets of all players other than i. We similarly write \(\mathbf {u}_{-i}\) and \(\mathbf {a}_{-i}\) for vectors containing utility functions and actions, respectively, for all players but i. If \(u_i\) is a utility function and \(\mathbf {u}_{-i}\) is a vector of utility functions for all players other than i, then (even if \(i\ne 1\)) we use \((u_i,\mathbf {u}_{-i})\) for the full vector of utility functions where Player i has utility function \(u_i\) and the other players have utility functions as specified by \(\mathbf {u}_{-i}\). We use \((A_i,A_{-i})\) and \((a_i,\mathbf {a}_{-i})\) analogously.

We say that \(a_i\in A_i\) strictly dominates \(a_i'\in A_i\) if for all \(a_{-i}\in A_{-i}\), \(u_i(a_i,a_{-i})>u_i(a_{i}',a_{-i})\). For example, in the Prisoner’s Dilemma, Defect strictly dominates Cooperate for both players. As noted earlier, \(\mathrm {DM}\) and \(\mathrm {RM}\) strictly dominate \(\mathrm {DL}\) and \(\mathrm {RL}\) for both players.
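The dominance condition can be checked mechanically by enumerating the opponents' action profiles. The following is a minimal sketch; the function name `strictly_dominates` and the Prisoner's Dilemma payoffs are illustrative assumptions and are not taken from Table 3.

```python
from itertools import product

def strictly_dominates(actions, u_i, i, a, b):
    """True if player i's action `a` strictly dominates `b`.

    actions: list of per-player action lists.
    u_i: dict mapping full outcomes (tuples) to player i's utility.
    """
    others = [acts for j, acts in enumerate(actions) if j != i]
    for rest in product(*others):
        # Re-insert player i's action at position i to form full outcomes.
        with_a = rest[:i] + (a,) + rest[i:]
        with_b = rest[:i] + (b,) + rest[i:]
        if u_i[with_a] <= u_i[with_b]:
            return False
    return True

# Hypothetical Prisoner's Dilemma payoffs (assumed for illustration only).
ACTIONS = [["C", "D"], ["C", "D"]]
U1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
U2 = {("C", "C"): 3, ("C", "D"): 4, ("D", "C"): 0, ("D", "D"): 1}

print(strictly_dominates(ACTIONS, U1, 0, "D", "C"))
print(strictly_dominates(ACTIONS, U2, 1, "D", "C"))
```

Under these assumed payoffs, Defect strictly dominates Cooperate for both players, matching the example in the text.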

For any given game \(\Gamma =(A,\mathbf {u})\), we will call any game \(\Gamma '=(A',\mathbf {u}')\) a subset game of \(\Gamma\) if \(A_i'\subseteq A_i\) for \(i=1,...,n\). Note that a subset game may assign different utilities to outcomes than the original game. For example, the game of Table 2 is a subset game of the Demand Game.

We say that some utility vector \(\mathbf {y}\in \mathbb {R}^n\) is a Pareto improvement on (or is Pareto-better than) \(\mathbf {y}'\in \mathbb {R}^n\) if \(y_i\ge y_i'\) for \(i=1,...,n\). We will also denote this by \(\mathbf {y}\ge \mathbf {y}'\). Note that, contrary to convention, we allow \(\mathbf {y}=\mathbf {y}'\). Whenever we require one of the inequalities to be strict, we will say that \(\mathbf {y}\) is a strict Pareto improvement on \(\mathbf {y}'\). In a given game, we will also say that an outcome \(\mathbf {a}\) is a Pareto improvement on another outcome \(\mathbf {a}'\) if \(\mathbf {u}(\mathbf {a}) \ge \mathbf {u}(\mathbf {a}')\). We say that \(\mathbf {y}\) is Pareto-optimal or Pareto-efficient relative to some \(S\subset \mathbb {R}^n\) if there is no element of S that strictly Pareto-dominates \(\mathbf {y}\).
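As a minimal sketch, these three notions translate directly into comparisons on utility vectors. The function names are our own, and the sample vectors in the usage lines are arbitrary illustrations.

```python
def pareto_improves(y, y_prime):
    """Weak Pareto improvement: y_i >= y'_i for every player i (equality allowed)."""
    return all(a >= b for a, b in zip(y, y_prime))

def strictly_pareto_improves(y, y_prime):
    """Pareto improvement in which at least one inequality is strict."""
    return pareto_improves(y, y_prime) and any(a > b for a, b in zip(y, y_prime))

def pareto_optimal(y, S):
    """y is Pareto-optimal relative to S if no element of S strictly Pareto-improves on y."""
    return not any(strictly_pareto_improves(s, y) for s in S)

# Arbitrary utility vectors, chosen only to exercise the definitions.
S = [(1, 1), (2, 0), (0, 2), (0, 0)]
print(pareto_improves((1, 1), (1, 1)))           # equality is allowed
print(strictly_pareto_improves((1, 1), (1, 1)))  # but it is not strict
print([y for y in S if pareto_optimal(y, S)])
```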

Let \(\Gamma =(A,\mathbf {u})\) and \(\Gamma '=(A',\mathbf {u}')\) be two n-player games. Then we call an n-tuple of bijections \(\Phi =\left( \Phi _i:A_i\rightarrow A_i'\right) _{i=1,...,n}\) a (game) isomorphism between \(\Gamma\) and \(\Gamma '\) if there are vectors \(\varvec{\lambda }\in \mathbb {R}_{+}^n\) and \(\mathbf {c}\in \mathbb {R}^n\) such that

$$\begin{aligned} u_i(a_1,...,a_n) = \lambda_i u_i'(\Phi _1(a_1),...,\Phi _n(a_n))+c_i \end{aligned}$$

for all \(\mathbf {a}\in A\) and all \(i=1,\ldots,n\). If there is an isomorphism between \(\Gamma\) and \(\Gamma '\), we call \(\Gamma\) and \(\Gamma '\) isomorphic. For example, if we let \(\Gamma\) be the Demand Game and \(\Gamma ^s\) the subset game of Table 2, then \((\{\mathrm {DM},\mathrm {RM}\},\{\mathrm {DM},\mathrm {RM}\},\mathbf {u})\) is isomorphic to \(\Gamma ^s\) via the isomorphism \(\Phi\) with \(\Phi _i(\mathrm {DM})=\mathrm {DL}\) and \(\Phi _i(\mathrm {RM})=\mathrm {RL}\) and the constants \(\varvec{\lambda }=(1,1)\) and \(\mathbf {c}=(0,0)\).
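The defining equation can be verified outcome by outcome once \(\Phi\), \(\varvec{\lambda }\) and \(\mathbf {c}\) are given. The sketch below does this and additionally checks that each \(\Phi _i\) is a bijection. The payoff numbers are placeholders that we assume for illustration; they are not the entries of Table 1 or Table 2.

```python
from itertools import product

def is_isomorphism(actions, actions_prime, u, u_prime, phi, lam, c, tol=1e-9):
    """Check u_i(a) = lam_i * u'_i(Phi(a)) + c_i for all outcomes a and players i.

    u, u_prime: dicts mapping outcomes to tuples of utilities.
    phi: list of per-player dicts mapping A_i to A'_i.
    """
    n = len(actions)
    if any(l <= 0 for l in lam):
        return False  # the scaling factors must be strictly positive
    for i in range(n):
        # Each Phi_i must map A_i one-to-one onto A'_i.
        image = [phi[i][a] for a in actions[i]]
        if sorted(image) != sorted(actions_prime[i]):
            return False
    for a in product(*actions):
        mapped = tuple(phi[i][a[i]] for i in range(n))
        for i in range(n):
            if abs(u[a][i] - (lam[i] * u_prime[mapped][i] + c[i])) > tol:
                return False
    return True

# Placeholder payoffs (assumed): the DM-RM part of a Demand-Game-like game ...
A = [["DM", "RM"], ["DM", "RM"]]
u = {("DM", "DM"): (-5, -5), ("DM", "RM"): (2, 0),
     ("RM", "DM"): (0, 2), ("RM", "RM"): (1, 1)}
# ... and a relabeled copy on DL/RL with the same numbers.
A_s = [["DL", "RL"], ["DL", "RL"]]
u_s = {("DL", "DL"): (-5, -5), ("DL", "RL"): (2, 0),
       ("RL", "DL"): (0, 2), ("RL", "RL"): (1, 1)}
phi = [{"DM": "DL", "RM": "RL"}, {"DM": "DL", "RM": "RL"}]

print(is_isomorphism(A, A_s, u, u_s, phi, lam=(1, 1), c=(0, 0)))
```

With \(\varvec{\lambda }=(1,1)\) and \(\mathbf {c}=(0,0)\) the check passes for the relabeling \(\mathrm {DM}\mapsto \mathrm {DL}\), \(\mathrm {RM}\mapsto \mathrm {RL}\); swapping the two images for one player makes it fail under these assumed payoffs.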

Table 3 The Prisoner’s dilemma

3 Delegation and safe Pareto improvements

We consider a setting in which a given game \(\Gamma\) is played through what we will call representatives. For example, the representatives could be humans whose behavior is determined or incentivized by some contract à la the principal–agent literature [28]. Our principals’ motivation for delegation is the same as in that literature (namely, the agent being in a better (epistemic) position to make the choice). However, the main question asked by the principal–agent literature is how to deal with agents that have their own preferences over outcomes, by constraining the agent’s choice (e.g. [21, 25]), setting up appropriate payment schemes (e.g. [23, 29, 37, 53]), etc. In contrast, we will throughout this paper assume that the agent has no conflicting incentives.

We imagine that one way in which the representatives can be instructed is to in turn play a subset game \(\Gamma ^s=(A_1^s\subseteq A_1,...,A_n^s\subseteq A_n,\mathbf {u}^s)\) of the original game, without necessarily specifying a strategy or algorithm for solving such a game. We emphasize, again, that \(\mathbf {u}^s\) is allowed to be a vector of entirely different utility functions. For any subset game \(\Gamma ^s\), we denote by \(\Pi (\Gamma ^s)\) the outcome that arises if the representatives play the subset game \(\Gamma ^s\) of \(\Gamma\). Because it is unclear what the right choice is in many games, the original players might be uncertain about \(\Pi (\Gamma ^s)\). We will therefore model each \(\Pi (\Gamma ^s)\) as a random variable. We will typically imagine that the representatives play \(\Gamma\) in the usual simultaneous way, i.e., that they are not able to make further commitments or delegate again. For example, we imagine that if \(\Gamma\) is the Prisoner’s Dilemma, then \(\Pi (\Gamma )=(\mathrm {Defect},\mathrm {Defect})\) with certainty.

The original players trust their representatives to the extent that we take \(\Pi (\Gamma )\) to be a default way for the game to be played for any \(\Gamma\). That is, by default the original players tell their representatives to play the game as given. For example, in the Demand Game, it is not clear what the right action is. Thus, if one can simply delegate the decision to someone with more relevant expertise, that is the first option one would consider.

We are interested in whether and how the original players can jointly Pareto-improve on the default. Of course, one option is to first compute the expected utilities under default delegation, i.e., to compute \(\mathbb {E}\left[ \mathbf {u}(\Pi (\Gamma ))\right]\). The players could then let the representatives play a distribution over outcomes whose expected utilities exceed the default expected utilities. However, this is unrealistic if \(\Gamma\) is a complex game with potentially many Nash equilibria. For one, the precise point of delegation is that the original players are unable or unwilling to properly evaluate \(\Gamma\). Second, there is no widely agreed upon, universal procedure for selecting an action in the face of equilibrium selection problems. In such cases, the original players may in practice be unable to form a probability distribution over \(\Pi (\Gamma )\). This type of uncertainty is sometimes referred to as Knightian uncertainty, following Knight’s [26] distinction between the concepts of risk and uncertainty.

We address this problem in a typical way. Essentially, we require of any attempted improvement over the default that it incurs no regret in the worst case. That is, we are interested in subset games \(\Gamma ^s\) that are Pareto improvements with certainty under weak and purely qualitative assumptions about \(\Pi\) (see Footnote 2). In particular, in Sect. 4.4, we will introduce the assumptions that the representatives do not play strictly dominated actions and play isomorphic games isomorphically.

Definition 1

Let \(\Gamma ^s\) be a subset game of \(\Gamma\). We say \(\Gamma ^s\) is a safe Pareto improvement (SPI) on \(\Gamma\) if \(\mathbf {u}(\Pi (\Gamma ^s))\ge \mathbf {u}(\Pi (\Gamma ))\) with certainty. We say that \(\Gamma ^s\) is a strict SPI if furthermore, there is a player i s.t. \(u_i(\Pi (\Gamma ^s))>u_i(\Pi (\Gamma ))\) with positive probability.

For example, in the introduction we have argued that the subset game in Table 2 is a strict SPI on the Demand Game (Table 1). Less interestingly, if we let \(\Gamma =(A,\mathbf {u})\) be the Prisoner’s Dilemma (Table 3), then we would expect \((\{\mathrm {Cooperate}\},\{\mathrm {Cooperate}\},\mathbf {u})\) to be an SPI on \(\Gamma\). After all, we might expect that \(\Pi (\Gamma )=(\mathrm {Defect},\mathrm {Defect})\) with certainty, while it must be \(\Pi (\{\mathrm {Cooperate}\},\{\mathrm {Cooperate}\},\mathbf {u})=(\mathrm {Cooperate},\mathrm {Cooperate})\) with certainty, for lack of alternatives. Both players prefer mutual cooperation over mutual defection.
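To make Definition 1 concrete, one can represent the original players' beliefs by the set of pairs \((\Pi (\Gamma ),\Pi (\Gamma ^s))\) that they consider possible, without assigning probabilities to them. The sketch below checks the SPI condition on such a set. This representation, the function names, and the Prisoner's Dilemma payoffs are our own assumptions for illustration.

```python
def is_spi(u, possible_pairs):
    """SPI check over all (default_outcome, new_outcome) pairs considered possible.

    u: dict mapping outcomes to tuples of the ORIGINAL players' utilities.
    """
    return all(
        all(x >= y for x, y in zip(u[new], u[default]))
        for default, new in possible_pairs
    )

def is_strict_spi(u, possible_pairs):
    """SPI in which some player is strictly better off in some possible pair."""
    pairs = list(possible_pairs)
    return is_spi(u, pairs) and any(
        x > y
        for default, new in pairs
        for x, y in zip(u[new], u[default])
    )

# Hypothetical Prisoner's Dilemma payoffs (assumed, not taken from Table 3).
u = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
     ("D", "C"): (4, 0), ("D", "D"): (1, 1)}

# Default play is (D, D) with certainty; the restricted game forces (C, C).
pairs = [(("D", "D"), ("C", "C"))]
print(is_spi(u, pairs), is_strict_spi(u, pairs))
```

Note that the comparison uses the original utility function \(\mathbf {u}\), not \(\mathbf {u}^s\), since the SPI condition is stated in terms of the original players' payoffs.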

3.1 Unilateral safe Pareto improvements

Both SPIs given above require both players to let their representatives choose from restricted strategy sets to maximize something other than the original player’s utility function.

Definition 2

We will call a subset game \(\Gamma ^s=(A^s,\mathbf {u}^s)\) of \(\Gamma =(A,\mathbf {u})\) unilateral if for all but one \(i\in \{1,...,n\}\) it holds that \(A_i^s=A_i\) and \(u_i^s=u_i\). Consequently, if a unilateral subset game \(\Gamma ^s\) of \(\Gamma\) is also an SPI for \(\Gamma\), we call \(\Gamma ^s\) a unilateral SPI.
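Definition 2 can be checked by counting the players whose action set or utility function differs from the original. In the sketch below, utility functions are compared on the outcomes of the subset game; this reading of \(u_i^s=u_i\), the function names, and the payoffs are assumptions made for illustration.

```python
from itertools import product

def changed_players(actions, u, actions_s, u_s):
    """Indices of players whose action set or utility function was modified.

    u, u_s: lists of per-player dicts mapping outcomes to utilities.
    """
    changed = []
    for i in range(len(actions)):
        same_actions = set(actions_s[i]) == set(actions[i])
        # Compare utilities on the outcomes that exist in the subset game.
        same_utility = all(u_s[i][a] == u[i][a] for a in product(*actions_s))
        if not (same_actions and same_utility):
            changed.append(i)
    return changed

def is_unilateral(actions, u, actions_s, u_s):
    """True if all but (at most) one player keep their action set and utility."""
    return len(changed_players(actions, u, actions_s, u_s)) <= 1

# Hypothetical two-player game (assumed payoffs).
A = [["C", "D"], ["C", "D"]]
u = [{("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1},
     {("C", "C"): 3, ("C", "D"): 4, ("D", "C"): 0, ("D", "D"): 1}]

only_p1_restricted = [["C"], ["C", "D"]]
both_restricted = [["C"], ["C"]]
print(is_unilateral(A, u, only_p1_restricted, u))
print(is_unilateral(A, u, both_restricted, u))
```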

We now give an example of a unilateral SPI using the Complicated Temptation Game. (We give the not-so-complicated Temptation Game – in which we can only give a trivial example of SPIs – in Sect. 4.5.) Two players each deploy a robot. Each of the robots faces two choices in parallel. First, each can choose whether to work on Project 1 or Project 2. Player 1 values Project 1 higher and Player 2 values Project 2 higher, but the robots are more effective if they work on the same project. Second, to complete the task, the two robots need to share a resource. Robot 2 manages the resource and can choose whether to control Robot 1’s access tightly (e.g., by frequently checking on the resource, or requiring Robot 1 to demonstrate a need for the resource) or give Robot 1 relatively free access. Controlling access tightly decreases the efficiency of both robots, though the exact costs depend on which projects the robots are working on. Robot 1 can choose between using the resource as intended by Robot 2 and giving in to the temptation of trying to steal as much of the resource as possible to use it for other purposes. Regardless of what Robot 2 does (in particular, regardless of whether Robot 2 controls access or not), Player 1 prefers trying to steal. In fact, if Robot 2 controls access and Robot 1 refrains from theft, they never get anything done. Given that Robot 1 tries to steal, Player 2 prefers his Robot 2 to control access. As usual we assume that the original players can instruct their robots to play arbitrary subset games of \(\Gamma\) (without specifying an algorithm for solving such a game) and that they can give such instructions conditional on the other player providing an analogous instruction.

We formalize this game as a normal-form game in Table 4. Each action consists of a number and a letter. The number indicates the project that the agent pursues. The letter indicates the agent’s policy towards the resource. In Player 2’s action labels, C indicates tight control over the resource, while F indicates free access. In Player 1’s action labels, T indicates giving in to the temptation to steal as much of the resource as possible, while R indicates refraining from doing so.

Player 1 has a unilateral SPI in the Complicated Temptation Game. Intuitively, if Player 1 commits to refrain, then Player 2 need not control the use of the resource. Thus, inefficiencies from conflict over the resource are avoided. However, Player 1’s utilities in the resulting game of choosing between projects 1 and 2 are not isomorphic to the original game of choosing between projects 1 and 2. The players might therefore worry that this new game will result in a worse outcome for them. For example, Player 2 might worry that in this new game the project 1 equilibrium (\(R_1,F_1\)) becomes more likely than the project 2 equilibrium. To address this, Player 1 has to commit her representative to a different utility function that makes this new game isomorphic to the original game.

We now describe the unilateral SPI in formal detail. Player 1 can commit her representative to play only from \(R_1\) and \(R_2\) and to assign utilities \(u^s_1(R_1,F_1)=u_1(T_1,C_1)=4\), \(u^s_1(R_1,F_2)=u_1(T_1,C_2)=1\), \(u^s_1(R_2,F_1)=u_1(T_2,C_1)=1\), and \(u^s_1(R_2,F_2)=u_1(T_2,C_2)=2\); otherwise \(u_1^s\) does not differ from \(u_1\). The resulting SPI is given in Table 5. In this subset game, Player 2’s representative – knowing that Player 1’s representative will only play from \(R_1\) and \(R_2\) – will choose from \(F_1\) and \(F_2\) (since \(F_1\) and \(F_2\) strictly dominate \(C_1\) and \(C_2\) in Table 5). Now notice that the remaining subset game is isomorphic to the \((\{T_1,T_2\},\{C_1,C_2\})\) subset game of the original Complicated Temptation Game, where \(T_1\) maps to \(R_1\) and \(T_2\) maps to \(R_2\) for Player 1, and \(C_1\) maps to \(F_1\) and \(C_2\) maps to \(F_2\) for Player 2. Player 1’s representative’s utilities have been set to be the same between the two; and Player 2’s utilities happen to be the same up to a constant (1) between the two subset games. Thus, we might expect that if \(\Pi (\Gamma )=(T_1,C_1)\), then \(\Pi (\Gamma ^s)=(R_1,F_1)\), and so on. Finally, notice that \(\mathbf {u}(R_1,F_1)\ge \mathbf {u}(T_1,C_1)\) and so on. Hence, Table 5 is indeed an SPI on the Complicated Temptation Game.

Table 4 Complicated Temptation Game
Table 5 Safe Pareto improvement for the Complicated Temptation Game

Such unilateral changes are particularly interesting because they only require one of the players to be able to credibly delegate. That is, it is enough for a single player to instruct their representative to choose from a restricted action set to maximize a new utility function. The other players can simply instruct their representatives to play the game in the normal way (i.e., maximizing the respective players’ original utility functions without restrictions on the action set). In fact, we may also imagine that only one player i delegates at all, while the other players choose an action themselves, after observing Player i’s instruction to her representative.

One may object that in a situation where only one player can credibly commit and the others cannot, the player who commits can simply play the meta game as a standard unilateral commitment (Stackelberg) game (as studied by, e.g., [11, 52, 59]) or perhaps as a first mover in a sequential game (as solved by subgame-perfect equilibrium), without bothering with any (safe) Pareto conditions, i.e., without ensuring that all players are guaranteed a utility at least as high as their default \(\mathbf {u}(\Pi (\Gamma ))\). For example, in the Complicated Temptation Game, Player 1 could simply commit her representative to play \(R_1\) if she assumes that Player 2’s representative will be instructed to best respond.

The Stackelberg sequential play perspective is appropriate in many cases. However, we think that in many cases the player with fine-grained commitment ability cannot assume that the other players’ representatives will simply best respond. Instead, players often need to consider the possibility of a hostile response if their commitment forces an unfair payoff on the other players. In such cases, unilateral SPIs are relevant.

The Ultimatum game is a canonical example in which standard solution concepts of sequential play fail to predict human behavior. In this game, subgame-perfect equilibrium has the second-moving player walk away with arbitrarily close to nothing. However, experiments show that people often resolve the game to an equal split, which is the symmetric equilibrium of the simultaneous version of the game [38].

A policy of retaliating for unfair payoffs imposed by a first mover’s commitments can arise in a variety of ways within standard game-theoretic models. For one, we may imagine a scenario in which only one player has the fine-grained commitment and delegation abilities needed for SPIs but that the other players can still credibly commit their representatives to retaliate against any “commitment trickery” that clearly leaves them worse off. We may also imagine that other players or representatives come into the scenario having already made such commitments. For example, many people appear credibly committed by intuitions about fairness and retributivist instincts and emotions (see, e.g., [44], Chapter 6, especially the section “The Doomsday Machine”). Perhaps these features of human psychology allow human second players in the Ultimatum game to empirically outperform subgame-perfect equilibrium. Second, we may imagine that the players who cannot commit are subject to reputation effects. Then they might want to build a reputation of resisting coercion. In contrast, it is beneficial to have a reputation of accepting SPIs on whatever game would have otherwise been played.

3.2 Implementing safe Pareto improvements as program equilibria

Fig. 1 A diagram describing the meta game in the case of two players

So far, we have been vague about the details of the strategic situation that the original players face in instructing their representatives. From what sets of actions can they choose? How can they jointly let the representatives play some new subset game \(\Gamma ^s\)? Are SPIs Nash equilibria of the meta game played by the original players? If I instruct my representative to play the SPI of Table 2 in the Demand Game, could my opponent not instruct her representative to play \(\mathrm {DM}\)?

In this section, we briefly describe one way to fill this gap by discussing the concept of program games and program equilibrium [5, 13, 15, 36] ([46], Sect. 10.4) [55]. This section is essential to understanding why SPIs (especially omnilateral ones) are relevant. However, the remaining technical content of this paper does not rely on this section and the main ideas presented here are straightforward from previous work. We therefore only give an informal exposition. For formal detail, see Appendix 1.

For any game \(\Gamma =(A,\mathbf {u})\), the program equilibrium literature considers the following meta game. First, each player i writes a computer program. Each program then receives as input a vector containing everyone else’s chosen program. Each player i’s program then returns an action from \(A_i\), player i’s set of actions in \(\Gamma\). Together these actions then form an outcome \(\mathbf {a}\in A\) of the original game. Finally, the utilities \(\mathbf {u}(\mathbf {a})\) are realized according to the utility function of \(\Gamma\). The meta game can be analyzed like any other game. Its Nash equilibria are called program equilibria. Importantly, the program equilibria can implement payoffs not implemented by any Nash equilibria of \(\Gamma\) itself. For example, in the Prisoner’s Dilemma, both players can submit a program that says: “If the opponent’s chosen computer program is equal to this computer program, Cooperate; otherwise Defect.” [22, 33] ([46], Sect. 10.4), [55] This is a program equilibrium which implements mutual cooperation.
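The quoted Prisoner's Dilemma program can be illustrated with a toy program game. The sketch below makes several simplifying assumptions: programs are identified by a name that stands in for their source code, deviations are restricted to a small hand-picked set of programs rather than all programs, and the payoffs are hypothetical.

```python
# Hypothetical Prisoner's Dilemma payoffs (assumed for illustration).
U = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
     ("D", "C"): (4, 0), ("D", "D"): (1, 1)}

def equality_program(me, opponent):
    """Cooperate iff the opponent submitted the same program; otherwise Defect."""
    return "C" if opponent == me else "D"

def always_defect(me, opponent):
    return "D"

def always_cooperate(me, opponent):
    return "C"

# The names act as stand-ins for program source code.
PROGRAMS = {"equality": equality_program,
            "defect": always_defect,
            "cooperate": always_cooperate}

def play(name1, name2):
    """Each program reads the other's 'source' and returns an action."""
    a1 = PROGRAMS[name1](name1, name2)
    a2 = PROGRAMS[name2](name2, name1)
    return U[(a1, a2)]

def is_program_equilibrium(name1, name2):
    """Nash check, restricted to deviations within PROGRAMS."""
    u1, u2 = play(name1, name2)
    no_dev_1 = all(play(d, name2)[0] <= u1 for d in PROGRAMS)
    no_dev_2 = all(play(name1, d)[1] <= u2 for d in PROGRAMS)
    return no_dev_1 and no_dev_2

print(play("equality", "equality"))
print(is_program_equilibrium("equality", "equality"))
print(is_program_equilibrium("cooperate", "cooperate"))
```

Within this restricted program set, the pair of equality-checking programs is an equilibrium that yields mutual cooperation, whereas unconditional cooperation is not, because a player facing it gains by submitting the always-defect program.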

In the setting for our paper, we similarly imagine that each player i can write a program that in turn chooses from \(A_i\). However, the types of programs that we have in mind here are more sophisticated than those typically considered in the program equilibrium literature. Specifically we imagine that the programs are executed by intelligent representatives who are themselves able to competently choose an action for player i in any given game \(\Gamma ^s\), without the original player having to describe how this choice is to be made. The original player may not even understand much about this program other than that it generally plays well. Thus, in addition to the elementary instructions used in a typical computer program (branches, comparisons, arithmetic operations, return, etc.), we allow player i to use instructions of type “Play \(\Pi _i(\Gamma ^s)\)” in the program she submits. This instruction lets the representative choose and return an action for the game \(\Gamma ^s\). Apart from the addition of this instruction type, we imagine the set of instructions to be the same as in the program equilibrium literature. To jointly let the representatives play, e.g., the SPI \(\Gamma ^s\) of Table 2 on the Demand Game of Table 1, the original players can both use an instruction that says, “If the opponent’s chosen program is equal to this one, play \(\Pi _i(\Gamma ^s)\); otherwise play \(\mathrm {DM}\)”. Assuming some minimal rationality requirements on the representatives (i.e., on how the representative resolves the “play \(\Pi _i(\Gamma ^s)\)” instruction), this is a Nash equilibrium. Figure 1 illustrates how (in the two-player case) the meta game between the original players is intended to work.
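The conditional instruction for the Demand Game can be sketched as a program that calls the representative as a black box. Everything below is a hypothetical illustration: `stub_representative` merely stands in for \(\Pi _i\), whose behavior the paper deliberately leaves unspecified, and the string `SPI_SOURCE` stands in for the program's source code.

```python
def make_conditional_program(player, subset_game, fallback_action, representative):
    """Build: if the opponent's program equals mine, play Pi_i(subset_game); else fallback."""
    def program(own_source, opponent_source):
        if opponent_source == own_source:
            # Delegate the actual choice to the representative (a black box).
            return representative(player, subset_game)
        return fallback_action
    return program

def stub_representative(player, subset_game):
    """Hypothetical stand-in for Pi_i: picks the first allowed action."""
    return subset_game[player][0]

# Only the action sets of the subset game are modeled here; utilities are omitted.
GAMMA_S = (("DL", "RL"), ("DL", "RL"))
SPI_SOURCE = "if same program: play Pi_i(Gamma^s) else DM"

alice = make_conditional_program(0, GAMMA_S, "DM", stub_representative)
bob = make_conditional_program(1, GAMMA_S, "DM", stub_representative)

# Both players submit the same instruction: the representatives play Gamma^s.
print(alice(SPI_SOURCE, SPI_SOURCE), bob(SPI_SOURCE, SPI_SOURCE))
# The opponent deviates to a different program: fall back to DM.
print(alice(SPI_SOURCE, "always play DM"))
```

The design point is that the program itself contains only a comparison and a branch; all strategic judgment is confined to the call to the representative, which is reached only when both instructions match.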

For illustration consider the following two real-world instantiations of this setup. First, we might imagine that the original players hire human representatives. Each player specifies, e.g., via monetary incentives, how she wants her representative to act by some contract. For example, a player might contract her representative to play a particular action; or she might specify in her contract a function (\(u_i^s\)) over outcomes according to which she will pay the representative after an outcome is obtained. Moreover, these contracts might refer to one another. For example, Player 1’s contract with her representative might specify that if Player 2 and his representative use an analogous contract, then she will pay her representative according to Table 2. As a second, more futuristic scenario, you could imagine that the representatives are software agents whose goals are specified by so-called smart contracts, i.e., computer programs implemented on a blockchain to be publicly verifiable [8, 47].

To justify our study of SPIs, we prove that every SPI is played in some program equilibrium:

Theorem 1

Let \(\Gamma\) be a game and \(\Gamma ^s\) be an SPI of \(\Gamma\). Now consider a program game on \(\Gamma\), where each player i can choose from a set of computer programs that output actions for \(\Gamma\). In addition to the normal kind of instructions, we allow the use of the command “play \(\Pi _i(\Gamma ')\)” for any subset game \(\Gamma '\) of \(\Gamma\). Finally, assume that \(\Pi (\Gamma )\) guarantees each player i at least that player’s minimax utility (a.k.a. threat point) in the base game \(\Gamma\). Then \(\Pi (\Gamma ^s)\) is played in a program equilibrium, i.e., in a Nash equilibrium of the program game.

We prove this in Appendix 1.

As an alternative to having the original players choose contracts separately, we could imagine the use of jointly signed contracts which only come into effect once signed by all players (cf. [24, 34]). Another approach to bilateral commitment was pursued by Raub [45] based on earlier work by Sen [51]. Raub and Sen use preference modification as a mechanism for commitment. For example, in the Prisoner’s Dilemma, each player can separately instruct their representative to prefer cooperating over defecting if and only if the opponent also cooperates. If both players use this instruction, then mutual cooperation becomes the unique Pareto-optimal Nash equilibrium. On the other hand, if only one player instructs their representative to adopt these preferences and the other maintains the usual Prisoner’s Dilemma preferences, the unique equilibrium remains mutual defection. Thus, the preference modification is used to commit to cooperating conditional on the other player making an analogous commitment. Because this is slightly confusing in the context of our work – seeing as our work involves both modifying one’s preferences and mutual commitment, but generally without using the former as a means to the latter – we discuss Raub’s and Sen’s work and its relation to ours in more detail in Appendix 2.

4 Safe Pareto improvements through outcome correspondence

4.1 Multivalued functions

For sets M and N, a multi-valued function \(\Phi :M\multimap N\) is a function which maps each element \(m\in M\) to a set \(\Phi (m)\subseteq N\). For a subset \(Q\subseteq M\), we define

$$\begin{aligned} \Phi (Q){:=}\bigcup _{m\in Q} \Phi (m). \end{aligned}$$

Note that \(\Phi (Q)\subseteq N\) and that \(\Phi (\emptyset )=\emptyset\). For any set M, we define the identity function \(\mathrm {id}_M:M\multimap M:m\mapsto \{ m \}\). Also, for two sets M and N, we define \(\mathrm {all}_{M,N}:M \multimap N:m \mapsto N\). We define the inverse

$$\begin{aligned} \Phi ^{-1}:N \multimap M :n \mapsto \{ m \in M \mid n\in \Phi (m) \}. \end{aligned}$$

Note that \(\Phi ^{-1}(\emptyset )=\emptyset\) for any multi-valued function \(\Phi\). For sets M, N and Q and functions \(\Phi :M\multimap N\) and \(\Psi :N \multimap Q\), we define the composite \(\Psi \circ \Phi :M \multimap Q :m \mapsto \Psi (\Phi (m))\). As with regular functions, composition of multi-valued functions is associative. We say that \(\Phi :M\multimap N\) is single-valued if \(|\Phi (m)|=1\) for all \(m\in M\). Whenever a multi-valued function is single-valued, we can apply many of the terms for regular functions. For example, we will take injectivity, surjectivity, and bijectivity for single-valued functions to have the usual meaning. We will never apply these notions to non-single-valued functions.
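To make these definitions concrete, the following Python sketch represents a multi-valued function on finite sets as a dictionary from elements to frozensets. The dictionary representation and all function names (`image`, `inverse`, `compose`, `identity`, `all_map`, `is_single_valued`) are illustrative assumptions and not part of the formal development.

```python
# A multi-valued function Phi: M -o N is modelled (by assumption) as a dict
# that maps every m in M to a frozenset of elements of N.

def image(phi, subset):
    """Phi(Q): the union of Phi(m) over all m in Q."""
    result = set()
    for m in subset:
        result |= phi[m]
    return frozenset(result)

def inverse(phi, codomain):
    """Phi^{-1}: maps each n in N to the set of all m with n in Phi(m)."""
    return {n: frozenset(m for m in phi if n in phi[m]) for n in codomain}

def compose(psi, phi):
    """Psi o Phi: maps m to Psi(Phi(m))."""
    return {m: image(psi, phi[m]) for m in phi}

def identity(domain):
    """id_M: maps m to {m}."""
    return {m: frozenset({m}) for m in domain}

def all_map(domain, codomain):
    """all_{M,N}: maps every m to all of N."""
    return {m: frozenset(codomain) for m in domain}

def is_single_valued(phi):
    """True if |Phi(m)| = 1 for every m."""
    return all(len(values) == 1 for values in phi.values())

# Hypothetical example with M = {1, 2, 3}, N = {"a", "b"}, Q = {"x", "y"}.
phi = {1: frozenset({"a"}), 2: frozenset({"a", "b"}), 3: frozenset()}
psi = {"a": frozenset({"x"}), "b": frozenset({"x", "y"})}
phi_inv = inverse(phi, {"a", "b"})
psi_phi = compose(psi, phi)
```

In this example, `phi_inv["a"]` is `{1, 2}` and `psi_phi[3]` is empty, because `phi` maps 3 to the empty set.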

4.2 Outcome correspondence between games

In this section, we introduce a notion of outcome correspondence, which we will see is essential to constructing SPIs.

Definition 3

Consider two games \(\Gamma =(A_1,...,A_n,\mathbf {u})\) and \(\Gamma '=(A_1',...,A_n',\mathbf {u}')\). We write \(\Gamma \sim _{\Phi } \Gamma '\) for \(\Phi :A \multimap A'\) if \(\Pi (\Gamma ')\in \Phi (\Pi (\Gamma ))\) with certainty.

Note that \(\Gamma \sim _{\Phi } \Gamma '\) is a statement about \(\Pi\), i.e., about how the representatives choose. Whether such a statement holds generally depends on the specific representatives being used. In Sect. 4.4, we describe two general circumstances under which it seems plausible that \(\Gamma \sim _{\Phi } \Gamma '\). For example, if two games \(\Gamma\) and \(\Gamma '\) are isomorphic, then one might expect \(\Gamma \sim _{\Phi } \Gamma '\), where \(\Phi\) is the isomorphism between the two games.

We now illustrate this notation using our discussion of the Demand Game. Let \(\Gamma\) be the Demand Game of Table 1. First, it seems plausible that \(\Gamma\) is in some sense equivalent to \(\Gamma '\), where \(\Gamma '=(\{\mathrm {DM},\mathrm {RM}\},\{\mathrm {DM},\mathrm {RM}\},\mathbf {u})\) is the game that results from removing \(\mathrm {DL}\) and \(\mathrm {RL}\) for both players from \(\Gamma\). Again, strict dominance could be given as an argument. We can now formalize this as \(\Gamma \sim _{\Phi } \Gamma '\), where \(\Phi (a_1,a_2)=\{(a_1,a_2)\}\) if \(a_1,a_2\in \{\mathrm {DM},\mathrm {RM}\}\) and \(\Phi (a_1,a_2)=\emptyset\) otherwise. Next, it seems plausible that \(\Gamma '\sim _{\Psi } \Gamma ^s\), where \(\Gamma ^s\) is the game of Table 2 and \(\Psi\) is the isomorphism between \(\Gamma '\) and \(\Gamma ^s\).

We now state some basic facts about the relation \(\sim\), many of which we will use throughout this paper.

Lemma 2

Let \(\Gamma =(A,\mathbf {u})\), \(\Gamma '=(A',\mathbf {u}')\), \(\hat{\Gamma }=(\hat{A},\hat{\mathbf {u}})\) and \(\Phi ,\Xi :A\multimap A'\), \(\Psi :A' \multimap \hat{A}\).

  1.

    Reflexivity: \(\Gamma \sim _{\mathrm {id}_A} \Gamma\), where \(\mathrm {id}_{A} :A \multimap A :\mathbf {a} \mapsto \{\mathbf {a}\}\).

  2.

    Symmetry: If \(\Gamma \sim _{\Phi } \Gamma '\), then \(\Gamma ' \sim _{\Phi ^{-1}} \Gamma\).

  3.

    Transitivity: If \(\Gamma \sim _{\Phi } \Gamma '\) and \(\Gamma '\sim _{\Psi } \hat{\Gamma }\), then \(\Gamma \sim _{\Psi \circ \Phi } \hat{\Gamma }\).

  4.

    If \(\Gamma \sim _{\Phi } \Gamma '\) and \(\Phi (\mathbf {a})\subseteq \Xi (\mathbf {a})\) for all \(\mathbf {a}\in A\), then \(\Gamma \sim _{\Xi } \Gamma '\).

  5.

    \(\Gamma \sim _{\mathrm {all}_{A,A'}} \Gamma '\), where \(\mathrm {all}_{A,A'} :A \multimap A' :\mathbf {a} \mapsto A'\).

  6.

    If \(\Gamma \sim _{\Phi } \Gamma '\) and \(\Phi (\mathbf {a})=\emptyset\), then \(\Pi (\Gamma )\ne \mathbf {a}\) with certainty.

  7.

    If \(\Gamma \sim _{\Phi } \Gamma '\) and \(\Phi ^{-1}(\mathbf {a}')=\emptyset\), then \(\Pi (\Gamma ')\ne \mathbf {a}'\) with certainty.

Proof

  1.

    By reflexivity of equality, \(\Pi (\Gamma )=\Pi (\Gamma )\) with certainty. Hence, \(\Pi (\Gamma )\in \mathrm {id}_A(\Pi (\Gamma ))\) by definition of \(\mathrm {id}_A\). Therefore, \(\Gamma \sim _{\mathrm {id}_A} \Gamma\) by definition of \(\sim\), as claimed.

  2.

    \(\Gamma \sim _{\Phi } \Gamma '\) means that \(\Pi (\Gamma ')\in \Phi (\Pi (\Gamma ))\) with certainty. Thus,

    $$\begin{aligned} \Pi (\Gamma )\in \{ \mathbf {a}{\in } A\mid \Pi (\Gamma '){\in } \Phi (\mathbf {a}) \} =\Phi ^{-1}(\Pi (\Gamma ')), \end{aligned}$$

    where equality is by the definition of the inverse of multi-valued functions. We conclude (by definition of \(\sim\)) that \(\Gamma '\sim _{\Phi ^{-1}} \Gamma\) as claimed.

  3.

    If \(\Gamma \sim _{\Phi } \Gamma '\), \(\Gamma '\sim _{\Psi } \hat{\Gamma }\), then by definition of \(\sim\), (i) \(\Pi (\Gamma ')\in \Phi (\Pi (\Gamma ))\) and (ii) \(\Pi (\hat{\Gamma })\in \Psi (\Pi (\Gamma '))\), both with certainty. The former (i) implies \(\{ \Pi (\Gamma ')\} \subseteq \Phi (\Pi (\Gamma ))\). Hence,

    $$\begin{aligned} \Psi (\Pi (\Gamma '))=\Psi (\{\Pi (\Gamma ')\}) \subseteq \Psi (\Phi (\Pi (\Gamma ))). \end{aligned}$$

    With (ii), it follows that \(\Pi (\hat{\Gamma })\in \Psi (\Phi (\Pi (\Gamma )))\) with certainty. By definition, \(\Gamma \sim _{\Psi \circ \Phi }\hat{\Gamma }\) as claimed.

  4.

    We have

    $$\begin{aligned} \Pi (\Gamma ')\in \Phi (\Pi (\Gamma )) \subseteq \Xi (\Pi (\Gamma )) \end{aligned}$$

    with certainty. Thus, by definition \(\Gamma \sim _{\Xi } \Gamma '\).

  5.

    By definition of \(\Pi\), it is \(\Pi (\Gamma ')\in A'\) with certainty. By definition of \(\mathrm {all}_{A,A'}\), it is \(\mathrm {all}_{A,A'}(\Pi (\Gamma ))=A'\) with certainty. Hence, \(\Pi (\Gamma ')\in \mathrm {all}_{A,A'} (\Pi (\Gamma ))\) with certainty. We conclude that \(\Gamma \sim _{\mathrm {all}_{A,A'}}\Gamma '\) as claimed.

  6.

    With certainty, \(\Pi (\Gamma ')\in \Phi (\Pi (\Gamma ))\) (by assumption). Also, with certainty \(\Pi (\Gamma ')\notin \emptyset\). Hence, \(\Phi (\Pi (\Gamma ))\ne \emptyset\) with certainty. We conclude that \(\Pi (\Gamma )\ne \mathbf {a}\) with certainty.

  7.

    If \(\Gamma \sim _{\Phi }\Gamma '\), then by symmetry of \(\sim\) (Lemma 2.2) \(\Gamma '\sim _{\Phi ^{-1}}\Gamma\). If \(\Phi ^{-1}(\mathbf {a}')=\emptyset\), then by Lemma 2.6 applied to \(\Gamma '\sim _{\Phi ^{-1}}\Gamma\), \(\Pi (\Gamma ')\ne \mathbf {a}'\) with certainty.\(\square\)

Items 1-3 show that \(\sim\) has properties resembling those of an equivalence relation. Note, however, that since \(\sim\) is not a binary relationship, \(\sim\) itself cannot be an equivalence relation in the usual sense. We can construct equivalence relations, though, by existentially quantifying over the multivalued function. For example, we might define an equivalence relation R on games, where \((\Gamma ,\Gamma ')\in R\) if and only if there is a single-valued bijection \(\Phi\) such that \(\Gamma \sim _{\Phi } \Gamma '\).Footnote 3 Item 4 states that if we can make an outcome correspondence claim less precise, it will still hold true. Item 5 states that in the extreme, it is always \(\Gamma \sim _{\mathrm {all}_{A,A'}} \Gamma '\), where \(\mathrm {all}_{A,A'}\) is the trivial, maximally imprecise outcome correspondence function that confers no information. Item 6 shows that \(\sim\) can be used to express the elimination of outcomes, i.e., the belief that a particular outcome (or strategy) will never occur.

Besides an equivalence relation, we can also use \(\sim\) with quantification over the respective outcome correspondence function to construct (non-symmetric) preorders over games, i.e., relations that are transitive and reflexive (but not symmetric or antisymmetric). Most importantly, we can construct a preorder \(\succeq\) on games where \(\Gamma \succeq \Gamma '\) if \(\Gamma \sim _{\Phi } \Gamma '\) for a \(\Phi\) that always increases every player’s utilities.

4.3 A theorem connecting outcome correspondence with safe Pareto improvements

We now show that as advertised, outcome correspondence is closely tied to SPIs. The following theorem shows not only how outcome correspondences can be used to find (and prove) SPIs. It also shows that any SPI requires an outcome correspondence relation via a Pareto-improving correspondence function.

Definition 4

Let \(\Gamma =(A,\mathbf {u})\) be a game and \(\Gamma ^s = (A^s,\mathbf {u}^s)\) be a subset game of \(\Gamma\). Further let \(\Phi :A\multimap A^s\) be such that \(\Gamma \sim _{\Phi } \Gamma ^s\). We call \(\Phi\) a Pareto-improving outcome correspondence (function) if \(\mathbf {u}(\mathbf {a}^s)\ge \mathbf {u}(\mathbf {a})\) for all \(\mathbf {a}\in A\) and all \(\mathbf {a}^s\in \Phi (\mathbf {a})\).

Theorem 3

Let \(\Gamma =(A,\mathbf {u})\) be a game and \(\Gamma ^s = (A^s,\mathbf {u}^s)\) be a subset game of \(\Gamma\). Then \(\Gamma ^s\) is an SPI on \(\Gamma\) if and only if there is a Pareto-improving outcome correspondence from \(\Gamma\) to \(\Gamma ^s\).

Proof

\(\Leftarrow\): By definition, \(\Pi (\Gamma ^s )\in \Phi (\Pi (\Gamma ))\) with certainty. Hence, for \(i=1,2\),

$$\begin{aligned} u_i(\Pi (\Gamma ^s))\in u_i(\Phi (\Pi (\Gamma ))) \end{aligned}$$

with certainty. Hence, by assumption about \(\Phi\), with certainty, \(u_i(\Pi (\Gamma ^s))\ge u_i(\Pi (\Gamma ))\).

\(\Rightarrow\): Assume that \(u_{i} (\Pi (\Gamma ^s)) \ge u_i(\Pi (\Gamma ))\) with certainty for \(i=1,2\). We define

$$\begin{aligned} \Phi :A\multimap A ^s:\mathbf {a} \mapsto \left\{ \mathbf {a}^s\in A^s \mid \mathbf {u}(\mathbf {a}^s) \ge \mathbf {u} (\mathbf {a}) \right\} . \end{aligned}$$

It is immediately obvious that \(\Phi\) is Pareto-improving as required. Also, whenever \(\Pi (\Gamma )=\mathbf {a}\) and \(\Pi (\Gamma ^s)=\mathbf {a}^s\) for any \(\mathbf {a}\in A\) and \(\mathbf {a}^s\in A^s\), it is (by assumption) with certainty \(\mathbf {u}(\mathbf {a}^s)\ge \mathbf {u}(\mathbf {a})\). Thus, by definition of \(\Phi\), it holds that \(\mathbf {a}^s\in \Phi (\mathbf {a})\). We conclude that \(\Gamma \sim _{\Phi } \Gamma ^s\) as claimed.\(\square\)
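The correspondence function constructed in the second half of the proof can be computed directly for small games. The following Python sketch does so for a Prisoner's Dilemma and the subset game in which both players can only cooperate. The payoff numbers are hypothetical placeholders (they are not taken from the paper's tables), and the helper names are illustrative assumptions.

```python
def weakly_better(x, y):
    """True if payoff vector x is at least as good as y for every player."""
    return all(xi >= yi for xi, yi in zip(x, y))

def constructed_correspondence(u, outcomes, subset_outcomes):
    """Phi(a) = { a_s in A^s : u(a_s) >= u(a) }, as in the proof above."""
    return {
        a: frozenset(s for s in subset_outcomes if weakly_better(u[s], u[a]))
        for a in outcomes
    }

def is_pareto_improving(phi, u):
    """Definition 4: u(a_s) >= u(a) for all a and all a_s in Phi(a)."""
    return all(weakly_better(u[s], u[a]) for a, targets in phi.items() for s in targets)

# Hypothetical Prisoner's Dilemma payoffs (assumed for illustration only).
C, D = "Cooperate", "Defect"
u = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

phi = constructed_correspondence(u, outcomes=list(u), subset_outcomes=[(C, C)])
```

With these numbers, `phi` maps (Cooperate, Cooperate) and (Defect, Defect) to the cooperative outcome and maps the two asymmetric outcomes to the empty set. By Lemma 2.6, \(\Gamma \sim _{\Phi } \Gamma ^s\) can then hold only if the representatives never play an asymmetric outcome in \(\Gamma\), which is exactly the kind of information that the assumptions on the representatives have to supply.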

Note that the theorem concerns weak SPIs and therefore allows the case where with certainty \(\mathbf {u}(\Pi (\Gamma ))=\mathbf {u}(\Pi (\Gamma ^s))\). To show that some \(\Gamma ^s\) is a strict SPI, we need additional information about which outcomes occur with positive probability. This, too, can be expressed via our outcome correspondence relation. However, since this is cumbersome, we will not formally address strictness much to keep things simple.Footnote 4

We now illustrate how outcome correspondences can be used to derive the SPI for the Demand Game from the introduction as per Theorem 3. Of course, at this point we have not made any assumptions about when games are equivalent. We will introduce some in the following section. Nevertheless, we can already sketch the argument using the specific outcome correspondences that we have given intuitive arguments for. Let \(\Gamma\) again be the Demand Game of Table 1. Then, as we have argued, \(\Gamma \sim _{\Phi } \Gamma '\), where \(\Gamma '=(\{\mathrm {DM},\mathrm {RM}\},\{\mathrm {DM},\mathrm {RM}\},\mathbf {u})\) is the game that results from removing \(\mathrm {DL}\) and \(\mathrm {RL}\) for both players; and \(\Phi (a_1,a_2)=\{(a_1,a_2)\}\) if \(a_1,a_2\in \{\mathrm {DM},\mathrm {RM}\}\) and \(\Phi (a_1,a_2)=\emptyset\) otherwise. In a second step, \(\Gamma '\sim _{\Psi } \Gamma ^s\), where \(\Gamma ^s\) is the game of Table 2 and \(\Psi\) is the isomorphism between \(\Gamma '\) and \(\Gamma ^s\). Finally, transitivity (Lemma 2.3) implies that \(\Gamma \sim _{\Psi \circ \Phi } \Gamma ^s\). To see that \(\Psi \circ \Phi\) is Pareto-improving for the original utility functions of \(\Gamma\), notice that \(\Phi\) does not change utilities at all. The correspondence function \(\Psi\) maps the conflict outcome \((\mathrm {DM},\mathrm {DM})\) onto the outcome \((\mathrm {DL},\mathrm {DL})\), which is better for both original players. Other than that, \(\Psi\), too, does not change the utilities. Hence, \(\Psi \circ \Phi\) is Pareto-improving. By Theorem 3, \(\Gamma ^s\) is therefore an SPI on \(\Gamma\).
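The two-step argument above can be traced mechanically. The Python sketch below builds \(\Phi\) and \(\Psi\) as dictionaries, composes them, and checks that the composite is Pareto-improving in the sense of Definition 4. The utilities are hypothetical stand-ins chosen only to match the qualitative description in the text (mutual \(\mathrm {DL}\) is better for both players than mutual \(\mathrm {DM}\), and all other corresponding outcomes have equal utilities); they are not the entries of Table 1. Utilities of mixed military/leader outcomes are omitted because \(\Phi\) maps those outcomes to the empty set.

```python
MILITARY = ("DM", "RM")
ACTIONS = ("DM", "RM", "DL", "RL")

# Hypothetical original-player utilities (assumed, not from Table 1).
u = {
    ("DM", "DM"): (-5, -5), ("DM", "RM"): (2, 0),
    ("RM", "DM"): (0, 2),   ("RM", "RM"): (1, 1),
    ("DL", "DL"): (1, 1),   ("DL", "RL"): (2, 0),
    ("RL", "DL"): (0, 2),   ("RL", "RL"): (1, 1),
}

# Phi: keep outcomes in which both players militarize, eliminate the rest.
phi = {
    (a1, a2): frozenset({(a1, a2)}) if a1 in MILITARY and a2 in MILITARY else frozenset()
    for a1 in ACTIONS for a2 in ACTIONS
}

# Psi: the relabelling DM -> DL, RM -> RL, applied to both players.
relabel = {"DM": "DL", "RM": "RL"}
psi = {
    (a1, a2): frozenset({(relabel[a1], relabel[a2])})
    for a1 in MILITARY for a2 in MILITARY
}

def compose(second, first):
    """(second o first)(a) = union of second(b) over all b in first(a)."""
    return {a: frozenset(c for b in first[a] for c in second[b]) for a in first}

def is_pareto_improving(corr, utilities):
    """u(a_s) >= u(a) componentwise for every a and every a_s in corr(a)."""
    return all(
        utilities[s][i] >= utilities[a][i]
        for a, targets in corr.items() for s in targets for i in (0, 1)
    )

psi_phi = compose(psi, phi)
```

Under these assumed utilities, the only outcome whose utilities change is the conflict outcome, which `psi_phi` sends to mutual \(\mathrm {DL}\).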

In principle, Theorem 3 does not hinge on \(\Pi (\Gamma )\) and \(\Pi (\Gamma ^s)\) resulting from playing games. An analogous result holds for any random variables over \(A\) and \(A^s\). In particular, this means that Theorem 3 applies also if the representatives receive other kinds of instructions (cf. Sect. 3.2). However, it seems hard to establish non-trivial outcome correspondences between \(\Pi (\Gamma )\) and other types of instructions. Still, more complicated instructions can be used to derive different kinds of SPIs. For example, if there are different game SPIs, then the original players could tell their representatives to randomize between them in a coordinated way.

4.4 Assumptions about outcome correspondence

To make any claims about how the original players should play the meta-game, i.e., about what instructions they should submit, we generally need to make assumptions about how the representatives choose and (by Theorem 3) about outcome correspondence in particular.Footnote 5 We here make two fairly weak assumptions.

4.4.1 Elimination

Our first assumption is that the representatives never play strictly dominated actions and that removing them does not affect what the representatives would choose.

Assumption 1

Let \(\Gamma =(A,\mathbf {u})\) be an arbitrary n-player game where \(A_1,...,A_n\) are pairwise disjoint, and let \(\tilde{a}_i\in A_i\) be strictly dominated by some other strategy in \(A_i\). Then \(\Gamma \sim _{\Phi } (A_{-i}, A_i-\{ \tilde{a}_i \}, \mathbf {u}_{|(A_{-i}, A_i-\{ \tilde{a}_i \})})\), where for all \(a_{-i}\in A_{-i}\), \(\Phi (\tilde{a}_i,a_{-i})=\emptyset\) and \(\Phi (a_i,a_{-i})=\{(a_i,a_{-i})\}\) whenever \(a_i\ne \tilde{a}_i\).

Assumption 1 expresses that representatives should never play strictly dominated strategies. Moreover, it states that we can remove strictly dominated strategies from a game and the resulting game will be played in the same way by the representatives. For example, this implies that when evaluating a strategy \(a_i\), the representatives do not take into account how many other strategies \(a_i\) strictly dominates. Assumption 1 also allows (via Transitivity of \(\sim\) as per Lemma 2.3) the iterated removal of strictly dominated strategies. The notion that we can (iteratively) remove strictly dominated strategies is common in game theory [27, 41], ([39], Sect. 2.9, Chapter 12) and has rarely been questioned. It is also implicit in the solution concept of Nash equilibrium – if a strategy is removed by iterated strict dominance, that strategy is played in no Nash equilibrium. However, like the concept of Nash equilibrium, the elimination of strictly dominated strategies becomes implausible if the game is not played in the usual way. In particular, for Assumption 1 to hold, we will in most games \(\Gamma\) have to assume that the representatives cannot in turn make credible commitments (or delegate to further subrepresentatives) or play the game iteratively [4].
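The iterated removal licensed by Assumption 1 and transitivity can be sketched as a short procedure. The Python code below handles two-player games and dominance by pure strategies only, matching the wording of Assumption 1. The game encoding, the function names, and the Prisoner's Dilemma payoffs used in the example are assumptions made for illustration.

```python
def find_strictly_dominated(actions, u, player):
    """Return one action of `player` that is strictly dominated by another
    pure action of the same player, or None if there is no such action."""
    own, opponent = actions[player], actions[1 - player]

    def payoff(own_action, opp_action):
        profile = (own_action, opp_action) if player == 0 else (opp_action, own_action)
        return u[profile][player]

    for candidate in own:
        for dominator in own:
            if dominator != candidate and all(
                payoff(dominator, b) > payoff(candidate, b) for b in opponent
            ):
                return candidate
    return None

def fully_reduce(actions, u):
    """Remove strictly dominated actions one at a time until none remain."""
    remaining = [list(actions[0]), list(actions[1])]
    changed = True
    while changed:
        changed = False
        for player in (0, 1):
            dominated = find_strictly_dominated(remaining, u, player)
            if dominated is not None:
                remaining[player].remove(dominated)
                changed = True
    return remaining

# Hypothetical Prisoner's Dilemma payoffs (assumed for illustration only).
C, D = "Cooperate", "Defect"
u = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}
reduced = fully_reduce([[C, D], [C, D]], u)
```

Each single removal corresponds to one application of Assumption 1; chaining the removals corresponds to repeated use of Lemma 2.3. In the example, only mutual defection survives.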

4.4.2 Isomorphisms

Our second assumption is that the representatives play isomorphic games isomorphically when those games are fully reduced.

Assumption 2

Let \(\Gamma =(A,\mathbf {u})\) and \(\Gamma '=(A',\mathbf {u}')\) be two games that do not contain strictly dominated actions. If \(\Gamma\) and \(\Gamma '\) are isomorphic, then there exists an isomorphism \(\Phi\) between \(\Gamma\) and \(\Gamma '\) such that \(\Gamma \sim _{\Phi } \Gamma '\).

Similar desiderata have been discussed in the context of equilibrium selection, e.g., by Harsanyi and Selten ([20], Chapter 3.4), (cf. [56], for a discussion in the context of fully cooperative multi-agent reinforcement learning).

Note that if there are multiple game isomorphisms, then we assume outcome correspondence for only one of them. This is necessary for the assumption to be satisfiable in the case of games with action symmetries. (Of course, such games are not the focus of this paper.) For example, let \(\Gamma\) be Rock–Paper–Scissors. Then \(\Gamma\) is isomorphic to itself via the function \(\Phi\) that for both players maps Rock to Paper, Paper to Scissors, and Scissors to Rock. But if it were \(\Gamma \sim _{\Phi }\Gamma\), then this would mean that if the representatives play Rock in Rock–Paper–Scissors, they play Paper in Rock–Paper–Scissors. Contradiction! We will argue for the consistency of our version of the assumption in Sect. 4.4.3. Notice also that we make the assumption only for reduced games. This relates to the previous point about action-symmetric games. For example, consider two versions of Rock–Paper–Scissors and assume that in both versions both players have an additional strictly dominated action that breaks the action symmetries e.g., the action, “resign and give the opponent \(\$10\) if they play Rock/Paper”. Then there would only be one isomorphism between these two games (which maps Rock to Paper, Paper to Scissors, and Scissors to Rock for both players). However, in light of Assumption 1, it seems problematic to assume that these strictly dominated actions restrict the outcome correspondences between these two games.Footnote 6
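The Rock–Paper–Scissors argument can be checked by enumeration. The Python sketch below uses the conventional win/lose/draw payoffs of 1, -1 and 0 (an assumption, since the text does not fix payoffs) and verifies two facts: the rotation is a game isomorphism from the game to itself, and no pure outcome is mapped to itself. Since \(\Gamma \sim _{\Phi }\Gamma\) for a single-valued \(\Phi\) requires \(\Pi (\Gamma )=\Phi (\Pi (\Gamma ))\), the absence of such fixed points is what makes the stronger assumption unsatisfiable for deterministic outcomes.

```python
from itertools import product

ACTIONS = ("Rock", "Paper", "Scissors")
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}

def u(a1, a2):
    """Assumed zero-sum payoffs: 1 for a win, -1 for a loss, 0 for a draw."""
    if a1 == a2:
        return (0, 0)
    return (1, -1) if BEATS[a1] == a2 else (-1, 1)

# The rotation from the text: Rock -> Paper -> Scissors -> Rock.
ROTATE = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}

def phi(outcome):
    """Apply the rotation to both players' actions."""
    return tuple(ROTATE[a] for a in outcome)

outcomes = list(product(ACTIONS, repeat=2))

# Phi is a bijection on actions that preserves all payoffs.
is_isomorphism = (
    sorted(ROTATE.values()) == sorted(ACTIONS)
    and all(u(*phi(o)) == u(*o) for o in outcomes)
)

# Outcomes a with Phi(a) = a, i.e. outcomes consistent with Gamma ~_Phi Gamma.
fixed_points = [o for o in outcomes if phi(o) == o]
```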

One might worry that reasoning about the existence of multiple isomorphisms renders it intractable to deal with outcome correspondences as implied by Assumption 2, and in particular that it might make it impossible to tell whether a particular game is an SPI. However, one can intuitively see that the different isomorphisms between two games do analogous operations. In particular, it turns out that if one isomorphism is Pareto-improving, then they all are:

Lemma 4

Let \(\Phi\) and \(\Psi\) be isomorphisms between \(\Gamma\) and \(\Gamma '\). If \(\Phi\) is (strictly) Pareto-improving, then so is \(\Psi\).

We prove Lemma 4 in Appendix 3.

Lemma 4 will allow us to conclude from the existence of a Pareto-improving isomorphism \(\Phi\) that there is a Pareto-improving \(\Psi\) s.t. \(\Gamma \sim _{\Psi }\Gamma '\) by Assumption 2, even if there are multiple isomorphisms between \(\Gamma\) and \(\Gamma '\). In the following, we can therefore afford to be lax about our ignorance (in some games) about which outcome isomorphism induces outcome equivalence. We will therefore generally write “\(\Gamma \sim _{\Phi }\Gamma '\) by Assumption 2” as short for “\(\Phi\) is a game isomorphism between \(\Gamma\) and \(\Gamma '\) and hence by Assumption 2 there exists an isomorphism \(\Psi\) such that \(\Gamma \sim _{\Psi }\Gamma '\)”.

One could criticize Assumption 2 by referring to focal points (introduced by Schelling ([49], pp. 54–58), [48]; cf., e.g., [9, 18, 30, 54]) as an example where context and labels of strategies matter. A possible response might be that in games where context plays a role, that context should be included as additional information and not be considered part of \((A,\mathbf {u})\). Assumption 2 would then either not apply to such games with (relevant) context or would require one to, in some way, translate the context along with the strategies. However, in this paper we will not formalize context, and assume that there is no decision-relevant context.

4.4.3 Consistency of Assumptions 1 and 2

We will now argue that there exist representatives that indeed satisfy Assumptions 1 and 2, both to provide intuition and because our results would not be valuable if Assumptions 1 and 2 were inconsistent. We will only sketch the argument informally. To make the argument formal, we would need to specify in more detail what the set of games looks like and in particular what the objects of the action sets are.

Imagine that for each player i there is a bookFootnote 7 that on each page describes a normal-form game that does not have any strictly dominated strategies. The actions have consecutive integer labels. Importantly, the book contains no pair of games that are isomorphic to each other. Moreover, for every fully reduced game, the book contains a game that is isomorphic to this game. (Unless we strongly restrict the set of games under consideration, the book must therefore have infinitely many pages.) We imagine that each player’s book contains the same set of games. On each page, the book for Player i recommends one of the actions of Player i to be taken deterministically.Footnote 8

Each representative owns a potentially different version of this book and uses it as follows to play a given game \(\Gamma\). First the given game is fully reduced by iterated strict dominance to obtain a game \(\Gamma ^{\mathrm {red}}\). They then look up the unique game in the book that is isomorphic to \(\Gamma ^{\mathrm {red}}\) and map the action labels in \(\Gamma ^{\mathrm {red}}\) onto the integer labels of the game in the book via some isomorphism. If there are multiple isomorphisms from \(\Gamma ^{\mathrm {red}}\) to the relevant page in the book, then all representatives decide between them using the same deterministic procedure. Finally they choose the action recommended by the book.

It remains to show that a pair of representatives \(\Pi\) thus specified satisfies Assumptions 1 and 2. We first argue that Assumption 1 is satisfied. Let \(\Gamma\) be a game and let \(\Gamma '\) be a game that arises from removing a strictly dominated action from \(\Gamma\). By the well-known path independence of iterated elimination of strictly dominated strategies [1, 19, 41], fully reducing \(\Gamma\) and \(\Gamma '\) results in the same game. Hence, the representatives play the same actions in \(\Gamma\) and \(\Gamma '\).

Second, we argue that Assumption 2 is satisfied. Let us say \(\Gamma\) and \(\hat{\Gamma }\) are fully reduced and isomorphic. Then it is easy to see that each player i plays \(\Gamma\) and \(\hat{\Gamma }\) based on the same page of their book. Let the game on that book page be \(\tilde{\Gamma }\). Let \(\Phi :A \rightarrow \tilde{A}\) and \(\hat{\Phi }:\hat{A}\rightarrow \tilde{A}\) be the bijections used by the representatives to translate actions in \(\Gamma\) and \(\hat{\Gamma }\), respectively, to labels in \(\tilde{\Gamma }\). Then if the representatives take actions \(\mathbf {a}\) in \(\Gamma\), the actions \(\Phi (\mathbf {a})\) are the ones specified by the book for \(\tilde{\Gamma }\), and hence the actions \(\hat{\Phi }^{-1}(\Phi (\mathbf {a}))\) are played in \(\hat{\Gamma }\). Thus \(\Gamma \sim _{\hat{\Phi }^{-1} \circ \Phi } \hat{\Gamma }\). It is easy to see that \(\hat{\Phi }^{-1} \circ \Phi\) is a game isomorphism between \(\Gamma\) and \(\hat{\Gamma }\).

4.4.4 Discussion of alternatives to Assumptions 1 and 2

One could try to use principles other than Assumptions 1 and 2. We here give some considerations. First, game theorists have also considered the iterated elimination of weakly dominated strategies [17] ([31], Sect. 4.11). Unfortunately, the iterated removal of weakly dominated strategies is path-dependent ([27], Sect. 2.7.B) ([7], Sect. 5.2) ([39], Sect. 12.3). That is, for some games, iterated removal of weakly dominated strategies can lead to different subset games, depending on which weakly dominated strategy one chooses to eliminate at any stage. A straightforward extension of Assumption 1 to allow the elimination of weakly dominated strategies would therefore be inconsistent in such games, which can be seen as follows. Work on the path dependence of iterated removal of weakly dominated strategies has shown that there are games \(\Gamma =(A,\mathbf {u})\) with two different outcomes \(\tilde{\mathbf {a}},\hat{\mathbf {a}}\in A\) such that by iterated removal of weakly dominated strategies from \(\Gamma\), we can obtain both \((\{\tilde{\mathbf {a}}\},\mathbf {u})\) and \((\{\hat{\mathbf {a}}\},\mathbf {u})\). If we had an assumption analogous to Assumption 1 but for weak dominance, then (with Lemma 2.3 (transitivity)), we would obtain both that \(\Gamma \sim _{\tilde{\Phi }}(\{\tilde{\mathbf {a}}\},\mathbf {u})\) and that \(\Gamma \sim _{\hat{\Phi }}(\{\hat{\mathbf {a}}\},\mathbf {u})\), where \(\tilde{\Phi }(\mathbf {a})=\emptyset\) for all \(\mathbf {a}\ne \tilde{\mathbf {a}}\) and \(\hat{\Phi }(\mathbf {a})=\emptyset\) for all \(\mathbf {a}\ne \hat{\mathbf {a}}\). The former would mean (by Lemma 2.6) that for all \(\mathbf {a}\ne \tilde{\mathbf {a}}\) we have that \(\Pi (\Gamma )\ne \mathbf {a}\) with certainty; while the latter would mean that for all \(\mathbf {a}\ne \hat{\mathbf {a}}\) we have that \(\Pi (\Gamma )\ne \mathbf {a}\) with certainty. But jointly this means that for all \(\mathbf {a}\in A\), we have that \(\Pi (\Gamma )\ne \mathbf {a}\) with certainty, which cannot be the case as \(\Pi (\Gamma )\in A\) by definition. Thus, we cannot make an assumption analogous to Assumption 1 for weak dominance.
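Path dependence is easy to exhibit on a small game. The Python sketch below uses a hypothetical 3×2 game (the payoffs are assumptions chosen for illustration, not taken from the cited works) and runs two different elimination orders, asserting at each step that the removed action is indeed weakly dominated. In this particular game the two orders end in disjoint sets of outcomes rather than in two different single outcomes, but the resulting contradiction is of the same kind: an assumption analogous to Assumption 1 would require \(\Pi (\Gamma )\) to lie in both sets with certainty.

```python
from itertools import product

# Hypothetical payoffs: rows T, M, B for Player 1; columns L, R for Player 2.
U = {
    ("T", "L"): (1, 1), ("T", "R"): (0, 0),
    ("M", "L"): (1, 1), ("M", "R"): (2, 1),
    ("B", "L"): (0, 0), ("B", "R"): (2, 1),
}

def payoff(player, own_action, opp_action):
    profile = (own_action, opp_action) if player == 0 else (opp_action, own_action)
    return U[profile][player]

def weakly_dominated(actions, player):
    """All actions of `player` that are weakly dominated by another pure action."""
    own, opponent = actions[player], actions[1 - player]
    dominated = set()
    for candidate in own:
        for dominator in own:
            if dominator == candidate:
                continue
            gaps = [payoff(player, dominator, b) - payoff(player, candidate, b)
                    for b in opponent]
            if all(g >= 0 for g in gaps) and any(g > 0 for g in gaps):
                dominated.add(candidate)
    return dominated

def eliminate(order):
    """Remove actions in the given order, checking each step is a legal removal."""
    actions = [["T", "M", "B"], ["L", "R"]]
    for player, action in order:
        assert action in weakly_dominated(actions, player)
        actions[player] = [a for a in actions[player] if a != action]
    # Both paths below stop only once nothing is weakly dominated any more.
    assert not weakly_dominated(actions, 0) and not weakly_dominated(actions, 1)
    return set(product(*actions))

path_a = eliminate([(0, "T"), (1, "L")])  # remove T first, then L
path_b = eliminate([(0, "B"), (1, "R")])  # remove B first, then R
```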

As noted above, the iterated removal of strictly dominated strategies, on the other hand, is path-independent, and in the 2-player case always eliminates exactly the non-rationalizable strategies [1, 19, 41]. Many other dominance concepts have been shown to have path independence properties. For an overview, see Apt [1]. We could have made an independence assumption based on any of these path-independent dominance concepts. For example, elimination of strategies that are strictly dominated by a mixed strategy (or, equivalently, of so-called never-best responses) is also path independent ([40], Sect. 4.2).

With Assumptions 1 and 2, all our outcome correspondence functions are either 1-to-1 or 1-to-0. Other elimination assumptions could involve the use of many-to-1 or even many-to-many functions. In general, such functions are needed when a strategy \(\tilde{a}_i\) can be eliminated to obtain a strategically equivalent game, but in the original game \(\tilde{a}_i\) may still be played. The simplest example would be the elimination of payoff-equivalent strategies. Imagine that in some game \(\Gamma\) for all opponent strategies \(a_{-i}\in A_{-i}\) it is the case that \(\mathbf {u}(\tilde{a}_i,a_{-i})=\mathbf {u}(\hat{a}_i,a_{-i})\) and that there are no other strategies that are similarly payoff-equivalent to \(\tilde{a}_i\) and \(\hat{a}_i\). Then one would assume that \(\Gamma \sim _{\Phi } (A_i-\{\tilde{a}_i\},A_{-i},\mathbf {u})\), where \(\Phi\) maps \(\tilde{a}_i\) onto \(\{ \hat{a}_i \}\) and otherwise \(\Phi\) is just the identity function. As an example, imagine a variant of the Demand Game in which Player 1 has an additional action \(\mathrm {DM}'\) that results in the same payoffs as \(\mathrm {DM}\) for both players against Player 2’s \(\mathrm {DM}\) and \(\mathrm {RM}\) but potentially slightly different payoffs against \(\mathrm {DL}\) and \(\mathrm {RL}\). With our current assumptions we would be unable to derive a non-trivial SPI for this game. However, with an assumption about the elimination of duplicate actions in hand, we could (after removing \(\mathrm {DL}\) and \(\mathrm {RL}\) as usual) remove \(\mathrm {DM}'\) or \(\mathrm {DM}\) and thereby derive the usual SPI. Many-to-1 elimination assumptions can also arise from some dominance concepts if they have weaker path independence properties. For example, iterated elimination by so-called nice weak dominance [32] is only path-independent up to strategic equivalence. 
Like the assumption about payoff-equivalent strategies, an elimination assumption based on nice weak dominance therefore cannot assume that the eliminated action is not played in the original game at all.

4.5 Examples

In this section, we use Lemma 2, Theorem 3, and Assumptions 1 and 2 to formally prove a few SPIs.

Proposition (Example) 5

Let \(\Gamma\) be the Prisoner’s Dilemma (Table 3) and \(\Gamma ^s=(A_1^s,A_2^s,u_1^s,u_2^s)\) be any subset game of \(\Gamma\) with \(A_1^s=A_2^s=\{\mathrm {Cooperate}\}\). Then under Assumption 1, \(\Gamma ^s\) is a strict SPI on \(\Gamma\).

Proof

By applying Assumption 1 twice and Transitivity once, \(\Gamma \sim _{\Phi } \Gamma ^D\), where \(\Gamma ^D=(\{\mathrm {Defect}\},\{\mathrm {Defect}\},\mathbf {u})\) and \(\Phi (\mathrm {Defect},\mathrm {Defect})=\{ (\mathrm {Defect},\mathrm {Defect}) \}\) and \(\Phi (a_1,a_2)=\emptyset\) for all \((a_1,a_2)\ne (\mathrm {Defect},\mathrm {Defect})\). By Lemma 2.5, we further obtain \(\Gamma ^D \sim _ {\mathrm {all}} \Gamma ^s\), where \(\Gamma ^ s\) is as described in the proposition. Hence, by transitivity, \(\Gamma \sim _{\mathrm {all} \circ \Phi } \Gamma ^s\). It is easy to verify that the function \(\mathrm {all}\circ \Phi\) is Pareto-improving.\(\square\)

Proposition (Example) 6

Let \(\Gamma\) be the Demand Game of Table 1 and \(\Gamma ^s\) be the subset game described in Table 2. Under Assumptions 1 and 2, \(\Gamma ^s\) is an SPI on \(\Gamma\). Further, if \(P(\Pi (\Gamma ){=}(\mathrm {DM},\mathrm {DM}))>0\), then \(\Gamma ^s\) is a strict SPI.

Proof

Let \((A_1,A_2,u_1,u_2)=\Gamma\). We can repeatedly apply Assumption 1 to eliminate from \(\Gamma\) the strategies \(\mathrm {DL}\) and \(\mathrm {RL}\) for both players. We can then apply Lemma 2.3 (Transitivity) to obtain \(\Gamma \sim _{\Phi } \hat{\Gamma }\), where \(\hat{\Gamma }= (\{\mathrm {DM},\mathrm {RM}\},\{\mathrm {DM},\mathrm {RM}\},u_1,u_2)\) and

$$\begin{aligned} \Phi (a_1,a_2)=\left\{ \begin{array}{cl} \{(a_1,a_2)\} &{} \text {if } a_1,a_2\in \{\mathrm {DM},\mathrm {RM}\}\\ \emptyset &{} \text {otherwise}\\ \end{array}\right. . \end{aligned}$$

Next, by Assumption 2, \(\hat{\Gamma }\sim _{\Psi } \Gamma ^s\), where \(\Psi _i(\mathrm {DM})=\mathrm {DL}\) and \(\Psi _i(\mathrm {RM})=\mathrm {RL}\) for \(i=1,2\). We can then apply Lemma 2.3 (Transitivity) again, to infer \(\Gamma \sim _{\Psi \circ \Phi } \Gamma ^s\). It is easy to verify that for all \((a_1,a_2)\in A_1\times A_2\) and all \((a_1^s,a_2^s)\in (\Psi \circ \Phi )(a_1,a_2)\), we have \(\mathbf {u}(a_1^s,a_2^s) \ge \mathbf {u}(a_1,a_2)\). Hence, \(\Psi \circ \Phi\) is Pareto-improving and so by Theorem 3, \(\Gamma ^s\) is an SPI on \(\Gamma\).\(\square\)

Next, we give two examples of unilateral SPIs. We start with an example that is trivial in that the original player instructs her representative to take a specific action. We then give the SPI for the Complicated Temptation Game as a non-trivial example.

Consider the Temptation Game given in Table 6. In this game, Player 1’s T (for Temptation) strictly dominates R. Once R is removed, Player 2 prefers C. Hence, this game is strict-dominance solvable to (T, C). Player 1 can safely Pareto-improve on this result by telling her representative to play R, since Player 2’s best response to R is F and \({\mathbf {u}}(R,F)=(4,4)>(1,2)={\mathbf {u}}(T,C)\). We now show this formally.

Table 6 Simple Temptation Game
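The dominance argument for the Temptation Game can be checked mechanically. The following is a minimal sketch of iterated elimination of strictly dominated pure strategies for two-player games. Only the payoffs \(\mathbf {u}(T,C)=(1,2)\) and \(\mathbf {u}(R,F)=(4,4)\) are taken from the text; the other two entries, as well as the names `U` and `eliminate_strictly_dominated`, are assumptions chosen to be consistent with the dominance relations described above.

```python
# Hypothetical payoff table: only u(T, C) = (1, 2) and u(R, F) = (4, 4) come from
# the text; the entries for (T, F) and (R, C) are illustrative assumptions.
U = {
    ("T", "C"): (1, 2), ("T", "F"): (5, 1),
    ("R", "C"): (0, 3), ("R", "F"): (4, 4),
}


def eliminate_strictly_dominated(a1, a2, u):
    """Iteratively remove pure strategies strictly dominated by a pure strategy."""
    a1, a2 = list(a1), list(a2)
    changed = True
    while changed:
        changed = False
        # Player 1: a is removed if some other b does strictly better against all of a2.
        for a in list(a1):
            if any(all(u[b, c][0] > u[a, c][0] for c in a2) for b in a1 if b != a):
                a1.remove(a)
                changed = True
        # Player 2: c is removed if some other d does strictly better against all of a1.
        for c in list(a2):
            if any(all(u[a, d][1] > u[a, c][1] for a in a1) for d in a2 if d != c):
                a2.remove(c)
                changed = True
    return a1, a2


# Default game: reduces to the single outcome (T, C).
reduced_default = eliminate_strictly_dominated(["T", "R"], ["C", "F"], U)
# Player 1's representative is restricted to R: reduces to (R, F).
reduced_spi = eliminate_strictly_dominated(["R"], ["C", "F"], U)
```

Under these assumed payoffs, the default game reduces to (T, C) and the restricted game reduces to (R, F), which is better for both players.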

Proposition (Example) 7

Let \(\Gamma =(A_1,A_2,u_1,u_2)\) be the game of Table 6. Under Assumption 1, \(\Gamma ^s=(\{R\},A_2,u_1,u_2)\) is a strict SPI on \(\Gamma\).

Proof

First consider \(\Gamma\). We can apply Assumption 1 to eliminate Player 1’s R and then apply Assumption 1 again to the resulting game to also eliminate Player 2’s F. By transitivity, we find \(\Gamma \sim _{\Phi } \Gamma '\), where \(\Gamma '=(\{T\},\{C\},u_1,u_2)\) and \(\Phi (T,C)=\{(T,C)\}\) and \(\Phi (A_1\times A_2-\{(T,C)\})=\emptyset\).

Next, consider \(\Gamma ^s\). We can apply Assumption 1 to remove Player 2’s strategy C and find \(\Gamma ^s\sim _{\Psi } \hat{\Gamma }^s\), where \(\hat{\Gamma }^s =(\{R\},\{F\},u_1,u_2)\) and \(\Psi (R,F)=\{(R,F)\}\) and \(\Psi (R,C)=\emptyset\).

Third, \(\Gamma '\sim _{\mathrm {all}} \hat{\Gamma }^s\) by Lemma 2.5, where \(\mathrm {all}(T,C)=\{(R,F)\}\).

Finally, we can apply transitivity and the rule about symmetry and inverses (Lemma 2.2) to conclude \(\Gamma \sim _{\Xi } \Gamma ^s\), where \(\Xi = \Psi ^{-1} \circ \mathrm {all} \circ \Phi\). It is easy to verify that \(\Xi (T,C)=\{(R,F)\}\) and \(\Xi (A_1\times A_2-\{(T,C)\})=\emptyset\). Hence, \(\Xi\) is Pareto-improving and so by Theorem 3, \(\Gamma ^s\) is an SPI on \(\Gamma\).\(\square\)
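The composition used in this proof can be traced step by step by representing each outcome correspondence as a dictionary from outcomes to sets of outcomes. The sketch below is illustrative only: the helper names `compose`, `inverse` and `is_pareto_improving` are assumptions, and only the two payoffs stated in the text are needed.

```python
def compose(g, f):
    """Return g after f: a is mapped to the union of g(b) over all b in f(a)."""
    return {a: set().union(*(g.get(b, set()) for b in f[a])) for a in f}


def inverse(f, codomain):
    """Invert a correspondence: b is mapped to every a with b in f(a)."""
    inv = {b: set() for b in codomain}
    for a, images in f.items():
        for b in images:
            inv[b].add(a)
    return inv


def is_pareto_improving(corr, u_from, u_to):
    """Check u_to(b) >= u_from(a) componentwise whenever b is in corr(a)."""
    return all(
        u_to[b][k] >= u_from[a][k]
        for a, images in corr.items() for b in images for k in range(2)
    )


# Phi: Gamma -> Gamma' (only (T, C) survives elimination in the default game).
phi = {("T", "C"): {("T", "C")}, ("T", "F"): set(),
       ("R", "C"): set(), ("R", "F"): set()}
# Psi: Gamma^s -> reduced Gamma^s (only (R, F) survives).
psi = {("R", "F"): {("R", "F")}, ("R", "C"): set()}
# all: the trivial correspondence between the two one-outcome games.
all_corr = {("T", "C"): {("R", "F")}}

xi = compose(inverse(psi, [("R", "F")]), compose(all_corr, phi))

# The two payoffs stated in the text suffice for the check.
u = {("T", "C"): (1, 2), ("R", "F"): (4, 4)}
xi_is_pareto_improving = is_pareto_improving(xi, u, u)
```

Here `xi` maps (T, C) to the set containing (R, F) and every other outcome of the default game to the empty set.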

Note that in this example, Player 1 simply commits to a particular strategy R and Player 2 maximizes their utility given Player 1’s choice. Hence, this SPI can be justified with much simpler unilateral commitment setups [11, 52, 59]. For example, if the Temptation Game were played as a sequential game in which Player 1 plays first, its unique subgame-perfect equilibrium would be (R, F).

In Table 4 we give the Complicated Temptation Game, which better illustrates the features specific to our setup. Roughly, it is an extension of the simpler Temptation Game of Table 6. In addition to choosing T versus R and C versus F, the players also have to make an additional choice (1 versus 2), which is difficult in that it cannot be solved by strict dominance. As we have argued in Sect. 3.1, the game in Table 5 is a unilateral SPI on Table 4. We can now show this formally.

Proposition (Example) 8

Let \(\Gamma\) be the Complicated Temptation Game (Table 4) and \(\Gamma ^s\) be the subset game in Table 5. Under Assumptions 1 and 2, \(\Gamma ^s\) is a unilateral SPI on \(\Gamma\).

Proof

In \(\Gamma\), for Player 1, \(T_1\) and \(T_2\) strictly dominate \(R_1\) and \(R_2\). We can thus apply Assumption 1 to eliminate Player 1’s \(R_1\) and \(R_2\). In the resulting game, Player 2’s \(C_1\) and \(C_2\) strictly dominate \(F_1\) and \(F_2\), so one can apply Assumption 1 again to the resulting game to also eliminate Player 2’s \(F_1\) and \(F_2\). By transitivity, we find \(\Gamma \sim _{\Phi } \Gamma '\), where \(\Gamma '=(\{T_1,T_2\},\{C_1,C_2\},u_1,u_2)\) and

$$\begin{aligned} \Phi (a_1,a_2)=\left\{ \begin{array}{cl} \{(a_1,a_2)\} &{} \text {if } a_1\in \{T_1,T_2\} \text { and }a_2\in \{C_1,C_2\}\\ \emptyset &{} \text {otherwise}\\ \end{array}\right. . \end{aligned}$$

Next, consider \(\Gamma ^s\) (Table 5). We can apply Assumption 1 to remove Player 2’s strategies \(C_1\) and \(C_2\) and find \(\Gamma ^s\sim _{\Psi } \hat{\Gamma }^s\), where \(\hat{\Gamma }^s =(\{R_1,R_2\},\{F_1,F_2\},u_1^s,u_2)\) and

$$\begin{aligned} \Psi (a_1,a_2)=\left\{ \begin{array}{cl} \{(a_1,a_2)\} &{} \text {if } a_1\in \{R_1,R_2\}\text { and }a_2\in \{F_1,F_2 \}\\ \emptyset &{} \text {otherwise}\\ \end{array}\right. . \end{aligned}$$

Third, \(\Gamma '\sim _{\Xi } \hat{\Gamma }^s\) by Assumption 2, where \(\Xi\) decomposes into \(\Xi _1\) and \(\Xi _2\), corresponding to the two players, respectively, where \(\Xi _1(T_i)=R_i\) and \(\Xi _2(C_i)=F_i\) for \(i=1,2\).

Finally, we can apply transitivity and the rule about symmetry and inverses (Lemma 2.2) to conclude \(\Gamma \sim _{\Psi ^{-1}\circ \Xi \circ \Phi } \Gamma ^s\). It is easy to verify that \(\Psi ^{-1}\circ \Xi \circ \Phi\) is Pareto-improving.\(\square\)

4.6 Computing safe Pareto improvements

In this section, we ask how computationally costly it is for the original players to identify for a given game \(\Gamma\) a non-trivial SPI \(\Gamma ^s\). Of course, the answer to this question depends on what the original players are willing to assume about how their representatives act. For example, if only trivial outcome correspondences (as per Lemmas 2.1 and 2.5) are assumed, then the decision problem is easy. Similarly, if \(\Gamma \sim _{\Phi }\Gamma '\) for given \(\Phi\) is hard to decide (e.g., because it requires solving for the Nash equilibria of \(\Gamma\) and \(\Gamma '\)), then this could trivially also make the safe Pareto improvement problem hard to decide. We specifically are interested in deciding whether a given game \(\Gamma\) has a non-trivial SPI that can be proved using only Assumptions 1 and 2, the general properties of game correspondence (in particular Transitivity (Lemma 2.3) and Symmetry (Lemma 2.2)), and Theorem 3.

Definition 5

The SPI decision problem consists in deciding for any given \(\Gamma\), whether there is a game \(\Gamma ^s\) and a sequence of outcome correspondences \(\Phi ^1,...,\Phi ^k\) and a sequence of subset games \(\Gamma ^0=\Gamma , \Gamma ^1,...,\Gamma ^k=\Gamma ^s\) of \(\Gamma\) s.t.:

  1. (Non-triviality:) If we fully reduce \(\Gamma ^s\) and \(\Gamma\) using iterated strict dominance (Assumption 1), the two resulting games are not equal. (Of course, they are allowed to be isomorphic.)

  2. For \(i=1,...,k\), \(\Gamma ^{i-1}\sim _{\Phi ^{i}} \Gamma ^{i}\) is valid by a single application of either Assumption 1 or Assumption 2, or an application of Assumption 1 in reverse via Lemma 2.2.

  3. For all \(\mathbf {a}\in A\) and all \(\mathbf {a}^s\in (\Phi ^k \circ \Phi ^{k-1} \circ ... \circ \Phi ^{1})(\mathbf {a})\), it is the case that \(\mathbf {u}(\mathbf {a}^s) \ge \mathbf {u}(\mathbf {a})\).

For the strict SPI decision problem, we further require:

  4. There is a player i and an outcome \(\mathbf {a}\) that survives iterated elimination of strictly dominated strategies from \(\Gamma\) s.t. \(u_i((\Phi ^k \circ \Phi ^{k-1} \circ ... \circ \Phi ^{1})(\mathbf {a}))>u_i(\mathbf {a})\).

For the unilateral SPI decision problem, we further require:

  5. For all but one of the players i, \(u_i=u_i^s\) and \(A_i=A_i^s\).

Many variants of this problem may be considered. For example, to match Definition 1, the definition of the strict SPI problem assumes that all outcomes \(\mathbf {a}\) that survive iterated elimination occur with positive probability. Alternatively we could have required that for demonstrating strictness, there must be a player i such that for all \(\mathbf {a}\in A\) that survive iterated elimination, \(u_i((\Phi ^k \circ \Phi ^{k-1} \circ ... \circ \Phi ^{1})(\mathbf {a}))>u_i(\mathbf {a})\). Similarly one may wish to find SPIs that are strict improvements for all players. We may also wish to allow the use of the elimination of duplicate strategies (as described in Sect. 4.4.4) or trivial outcome correspondence steps as per Lemma 2.5. These modifications would not change the computational complexity of the problem, nor would they require new proof ideas. One may also wish to compute all SPIs, or – in line with multi-criteria optimization [14, 58] – all SPIs that cannot in turn be safely Pareto-improved upon. However, in general there may exist exponentially many such SPIs. To retain any hope of developing an efficient algorithm, one would therefore have to first develop a more efficient representation scheme (cf. [42], Sect. 16.4).

Theorem 9

The (strict) (unilateral) SPI decision problem is NP-complete, even for 2-player games.

Proposition 10

For games \(\Gamma\) with \(|A_1|+...+|A_n|=m\) that can be reduced (via iterative application of Assumption 1) to a game \(\Gamma '\) with \(|A_1'|+...+|A_n'|=l\), the (strict) (unilateral) SPI decision problem can be solved in \(O(m^l)\).

The full proof is tedious (see Appendix 4), but the main idea is simple, especially for omnilateral SPIs. To find an omnilateral SPI on \(\Gamma\) based on Assumptions 1 and 2, one has to first iteratively remove all strictly dominated actions to obtain a reduced game \(\Gamma '\), which the representatives would play the same as the original game. This can be done in polynomial time. One then has to map the actions of \(\Gamma '\) onto the original \(\Gamma\) in such a way that each outcome in \(\Gamma '\) is mapped onto a weakly Pareto-better outcome in \(\Gamma\). Our proof of NP-hardness works by reducing from the subgraph isomorphism problem, where the payoff matrices of \(\Gamma '\) and \(\Gamma\) represent the adjacency matrices of the graphs.
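The mapping step of this proof idea can be illustrated by brute force. The sketch below enumerates injective maps from the actions of a reduced two-player game into the actions of the full game and keeps those under which every outcome lands on a weakly Pareto-better outcome. It is a simplified illustration, not the algorithm of Appendix 4. The payoff table is hypothetical: it is chosen to be consistent with the description of the Demand Game, but Table 1's actual entries are not assumed. The function name `pareto_better_embeddings` is likewise an assumption.

```python
from itertools import permutations

# Hypothetical Demand Game payoffs (not Table 1's actual entries): a militarized
# country facing an unmilitarized one gets 5, the unmilitarized one gets -10.
U = {
    ("DM", "DM"): (-5, -5), ("DM", "RM"): (2, 0), ("DM", "DL"): (5, -10), ("DM", "RL"): (5, -10),
    ("RM", "DM"): (0, 2), ("RM", "RM"): (1, 1), ("RM", "DL"): (5, -10), ("RM", "RL"): (5, -10),
    ("DL", "DM"): (-10, 5), ("DL", "RM"): (-10, 5), ("DL", "DL"): (1, 1), ("DL", "RL"): (2, 0),
    ("RL", "DM"): (-10, 5), ("RL", "RM"): (-10, 5), ("RL", "DL"): (0, 2), ("RL", "RL"): (1, 1),
}


def pareto_better_embeddings(reduced, full, u):
    """List all non-identity pairs of injective action maps (m1, m2) such that
    u(m1[a], m2[b]) >= u(a, b) componentwise for every reduced outcome (a, b)."""
    (r1, r2), (a1, a2) = reduced, full
    found = []
    for image1 in permutations(a1, len(r1)):
        m1 = dict(zip(r1, image1))
        for image2 in permutations(a2, len(r2)):
            m2 = dict(zip(r2, image2))
            if all(m1[a] == a for a in r1) and all(m2[b] == b for b in r2):
                continue  # the identity map is the trivial case
            if all(u[m1[a], m2[b]][k] >= u[a, b][k]
                   for a in r1 for b in r2 for k in range(2)):
                found.append((m1, m2))
    return found


actions = ["DM", "RM", "DL", "RL"]
reduced_actions = ["DM", "RM"]  # what survives iterated strict dominance
embeddings = pareto_better_embeddings(
    (reduced_actions, reduced_actions), (actions, actions), U)
```

The number of candidate maps grows exponentially in the number of actions of the reduced game, which matches the form of the bound in Proposition 10.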

Besides being about a specific set of assumptions about \(\sim\), note that Theorem 9 and Proposition 10 also assume that the utility function of the game is represented explicitly in normal form as a payoff matrix. If we changed the game representation (e.g., to boolean circuits, extensive form game trees, quantified boolean formulas, or even Turing machines), this could affect the complexity of the SPI problem. For example, Gabarró, García and Serna [16] show that the game isomorphism problem on normal-form games is equivalent to the graph isomorphism problem, while it is equivalent to the (likely computationally harder) boolean circuit isomorphism problem for a weighted boolean formula game representation. Solving the SPI problem requires solving a subset game isomorphism problem (see the proof of Lemma 28 in Appendix 4 for more detail). We therefore suspect that the SPI problem analogously increases in computational complexity (perhaps to being \(\Sigma _2^p\)-complete) if we treat games in a weighted boolean formula representation. In fact, even reducing a game using strict dominance by pure strategies – which contributes only insignificantly to the complexity of the SPI problem for normal-form games – is difficult in some game representations ([10], Sect. 6). Note, however, that for any game representation to which 2-player normal-form games can be efficiently reduced – such as, for example, extensive-form games – the hardness result also applies.

5 Safe Pareto improvements under improved coordination

5.1 Setup

In this section, we imagine that the players are able to simply invent new token strategies with new payoffs that arise from mixing existing feasible payoffs. To define this formally, we first define for any game \(\Gamma =(A,\mathbf {u})\),

$$\begin{aligned} \mathcal {C}(\Gamma ){:=}\mathbf {u}(\Delta (A)) =\left\{ \sum _{\mathbf {a}\in A} p_{\mathbf {a}} {\mathbf {u}} (\mathbf {a}) \;|\; \sum _{\mathbf {a}\in A} p_{\mathbf {a}} =1 \text { and } \forall \mathbf {a}\in A:p_{\mathbf {a}}\in [0,1] \right\} \end{aligned}$$

to be the set of payoff vectors that are feasible by some correlated strategy. The underlying notion of correlated strategies is the same as in correlated equilibrium [2, 3], but in this paper it will not be relevant whether any such strategy is a correlated equilibrium of \(\Gamma\). Instead their use will hinge on the use of commitments (cf. [34]). Note that \(\mathcal {C}(\Gamma )\) is exactly the convex closure of \(\mathbf {u}(A)\), i.e., the convex closure of the set of deterministically achievable utilities of the original game.

For any game \(\Gamma\), we then imagine that in addition to subset games, the players can let the representatives play a perfect-coordination token game \((A^s,\mathbf {u}^s,\mathbf {u}^e)\), where for all i, \(A_i^s\cap A_i=\emptyset\) and \(u_i^s:A^s \rightarrow \mathbb {R}\) are arbitrary utility functions to be used by the representatives and \({\mathbf {u}}^e:A^s \rightarrow \mathcal {C}(\Gamma )\) are the utilities that the original players assign to the token strategies.

The instruction \((A^s,\mathbf {u}^s,\mathbf {u}^e)\) lets the representatives play the game \((A^s,\mathbf {u}^s)\) as usual. However, the strategies \(A^s\) are imagined to be meaningless token strategies which do not resolve the given game \(\Gamma\). Once some token strategies \(\mathbf {a}^s\) are selected, these are translated into some probability distribution over A, i.e., into a correlated strategy of the original game. This correlated strategy is then played by the original players, thus giving rise to (expected) utilities \({\mathbf {u}}^e(\mathbf {a}^s)\in \mathcal {C}(\Gamma )\). These distributions and thus utilities are specified by the original players.

Definition 6

Let \(\Gamma\) be a game. A perfect-coordination SPI for \(\Gamma\) is a perfect-coordination token game \((A^s,\mathbf {u}^s,\mathbf {u}^e)\) for \(\Gamma\) s.t. \(\mathbf {u}^e(\Pi (A^s,u^s))\ge \mathbf {u} (\Pi (\Gamma ))\) with certainty. We call \((A^s,\mathbf {u}^s,\mathbf {u}^e)\) a strict perfect-coordination SPI if there furthermore is a player i for whom \(u^e_i(\Pi (A^s,u^s))> u_i (\Pi (\Gamma ))\) with positive probability.

As an example, imagine that \(\Gamma\) is just the \(\mathrm {DM}\)-\(\mathrm {RM}\) subset game of the Demand Game of Table 1. Then, intuitively, an SPI under improved coordination could consist of the original players telling the representatives, “Play as if you were playing the \(\mathrm {DM}\)-\(\mathrm {RM}\) subset game of the Demand Game, but whenever you find yourself playing \((\mathrm {DM}, \mathrm {DM})\), randomize [according to some given distribution] between the other (Pareto-optimal) outcomes instead”. Formally, \(A_1^s=\{\hat{D},\hat{R}\}\) and \(A_2^s=\{\hat{D},\hat{R}\}\) would then consist of tokenized versions of the original strategies. The utility functions \(u_1^s\) and \(u_2^s\) are then simply the same as in the original Demand Game except that they are applied to the token strategies. For example, \(\mathbf {u}^s(\hat{D}, \hat{R})=(2,0)\). The utilities for the original players remove the conflict outcome. For example, the original players might specify \({\mathbf {u}}^e(\hat{D},\hat{D})=(1,1)\), representing that the representatives are supposed to play \((\mathrm {RM},\mathrm {RM})\) in the \((\hat{D},\hat{D})\) case. For all other outcomes \((\hat{a}_1,\hat{a}_2)\), it must be the case that \({\mathbf {u}}^e(\hat{a}_1,\hat{a}_2) ={\mathbf {u}}^s(\hat{a}_1,\hat{a}_2)\) because the other outcomes cannot be Pareto-improved upon. As with our earlier SPIs for the Demand Game, Assumption 2 implies that \(\Gamma \sim _{\Phi } \Gamma ^s\), where \(\Phi\) maps the original conflict outcome \((\mathrm {DM}, \mathrm {DM})\) onto the Pareto-optimal (\(\hat{D}\),\(\hat{D}\)).
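The token game of this example can be written out concretely. The sketch below is illustrative: the payoffs for \((\mathrm {DM},\mathrm {RM})\) and \((\mathrm {RM},\mathrm {RM})\) follow the values used in the text, the payoff for \((\mathrm {RM},\mathrm {DM})\) is assumed by symmetry, and the conflict payoff \((-5,-5)\) is a hypothetical placeholder. All identifiers (`TRANSLATION`, `expected_payoff`, and so on) are assumptions introduced for illustration.

```python
from fractions import Fraction

# DM-RM subset game; (DM, DM) = (-5, -5) is a hypothetical conflict payoff.
U = {("DM", "DM"): (-5, -5), ("DM", "RM"): (2, 0),
     ("RM", "DM"): (0, 2), ("RM", "RM"): (1, 1)}


def expected_payoff(dist, u):
    """Payoff vector of a correlated strategy, i.e., an element of C(Gamma)."""
    assert sum(dist.values()) == 1
    return tuple(sum(p * u[a][k] for a, p in dist.items()) for k in range(2))


# Tokenized strategies and the correlated strategy each token outcome triggers.
TOKEN = {"DM": "D_hat", "RM": "R_hat"}
TRANSLATION = {
    ("D_hat", "D_hat"): {("RM", "RM"): Fraction(1)},  # conflict is replaced
    ("D_hat", "R_hat"): {("DM", "RM"): Fraction(1)},
    ("R_hat", "D_hat"): {("RM", "DM"): Fraction(1)},
    ("R_hat", "R_hat"): {("RM", "RM"): Fraction(1)},
}
# u^e: the utilities the original players assign to the token outcomes.
U_E = {tok: expected_payoff(dist, U) for tok, dist in TRANSLATION.items()}


def is_safe_improvement(u, u_e, token):
    """Outcome-by-outcome check that u^e of the tokenized outcome is >= u."""
    return all(u_e[token[a1], token[a2]][k] >= u[a1, a2][k]
               for (a1, a2) in u for k in range(2))


safe = is_safe_improvement(U, U_E, TOKEN)
# A fair coin flip between (DM, RM) and (RM, DM) is another way to reach (1, 1).
coin_flip = expected_payoff(
    {("DM", "RM"): Fraction(1, 2), ("RM", "DM"): Fraction(1, 2)}, U)
```

The check relies on Assumption 2 only through the map `TOKEN`, which plays the role of the isomorphism \(\Phi\) between the original and the token strategies.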

Relative to the SPIs considered up until now, these new types of instructions put significant additional requirements on how the representatives interact. They now have to engage in a two-round process of first choosing and observing one another’s token strategies and then playing a correlated strategy for the original game. Further, it must be the case that this additional coordination does not affect the payoffs of the original outcomes. The latter may not be the case in, e.g., the Game of Chicken. That is, we could imagine a Game of Chicken in which coordination is possible but that the rewards of the game change if the players do coordinate. After all, the underlying story in the Game of Chicken is that the positive reward – admiration from peers – is attained precisely for accepting a grave risk.

5.2 Finding safe Pareto improvements under improved representative coordination

With these more powerful ways to instruct representatives, we can now replace individual outcomes of the default game ad libitum. For example, in the reduced Demand Game, we singled out the outcome \((\mathrm {DM}, \mathrm {DM})\) as Pareto-suboptimal and replaced it by a Pareto-optimal outcome, while keeping all other outcomes the same. This allows us to construct SPIs in many more games than before.

Definition 7

The strict perfect-coordination SPI decision problem consists in deciding for any given \(\Gamma\) whether under Assumption 2 there is a strict perfect-coordination SPI \(\Gamma ^s\) for \(\Gamma\).

Lemma 11

For a given n-player game \(\Gamma\) and payoff vector \(\mathbf {y}\in \mathbb {R}^n\), it can be decided by linear programming and thus in polynomial time whether \(\mathbf {y}\) is Pareto-optimal in \(\mathcal {C}(\Gamma )\).

For an introduction to linear programming, see, e.g., Schrijver [50]. In short, a linear program is a specific type of constrained optimization problem that can be solved efficiently.

Proof

Finding a Pareto improvement on a given \(\mathbf {y}\in \mathbb {R}^n\) can be formulated as the following linear program:

$$\begin{aligned} \text {Variables:}\quad&p_{\mathbf {a}}\in [0,1]\text{ for all }\mathbf {a}\in A\\ \text{Maximize} \quad&\sum _{i=1}^n \left( \left( \sum _{\mathbf {a}\in A} p_{\mathbf {a}} u_i(\mathbf {a}) \right) - y_i \right) \\ \text{s.t.} \quad&\sum _{\mathbf {a}\in A} p_{\mathbf {a}}=1\\&\sum _{\mathbf {a}\in A} p_{\mathbf {a}} u_i(\mathbf {a}) \ge y_i\text{ for }i=1,...,n \end{aligned}$$

The vector \(\mathbf {y}\) is Pareto-optimal in \(\mathcal {C}(\Gamma )\) if and only if this program is feasible and its optimal value is 0.\(\square\)

Based on Lemma 11, Algorithm 1 decides whether there is a strict perfect-coordination SPI for a given game \(\Gamma\).

Algorithm 1 Deciding whether there is a strict perfect-coordination SPI for a given game \(\Gamma\)

It is easy to see that this algorithm runs in polynomial time (in the size of, e.g., the normal form representation of the game). It is also correct: if it returns True, simply replace the Pareto-suboptimal outcome while keeping all other outcomes the same; if it returns False, then all outcomes are Pareto-optimal within \(\mathcal {C}(\Gamma )\) and so there can be no strict SPI. We summarize this result in the following proposition.

Proposition 12

Assuming \(\mathrm {supp}(\Pi (\Gamma ))\) is known and that Assumption 2 holds, it can be decided in polynomial time whether there is a strict perfect-coordination SPI.
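The decision procedure can be sketched as follows, reconstructed only from the description of its behavior above: return True as soon as some outcome in \(\mathrm {supp}(\Pi (\Gamma ))\) is not Pareto-optimal within \(\mathcal {C}(\Gamma )\). This is a minimal sketch restricted to two players. Instead of the linear program of Lemma 11, it uses an exact scan over line segments between pairs of payoff vectors, which suffices in two dimensions because any improving point can be taken to lie on the boundary of the convex hull. The payoff table is the hypothetical \(\mathrm {DM}\)-\(\mathrm {RM}\) table with an assumed conflict payoff of \((-5,-5)\), and all function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def best_gain_on_segment(p, q, y):
    """Maximize sum_k (z_k - y_k) over z = q + t (p - q), t in [0, 1], z >= y.
    Returns None if no point of the segment weakly dominates y."""
    lo, hi = Fraction(0), Fraction(1)
    for k in range(2):
        d = Fraction(p[k] - q[k])
        need = Fraction(y[k] - q[k])  # constraint: t * d >= need
        if d > 0:
            lo = max(lo, need / d)
        elif d < 0:
            hi = min(hi, need / d)
        elif need > 0:
            return None
    if lo > hi:
        return None
    gain = lambda t: sum(q[k] + t * (p[k] - q[k]) - y[k] for k in range(2))
    return max(gain(lo), gain(hi))  # a linear objective peaks at an endpoint


def is_pareto_optimal(y, payoffs):
    """Two-player test of whether y is Pareto-optimal in the convex hull of payoffs."""
    for p, q in combinations_with_replacement(payoffs, 2):
        g = best_gain_on_segment(p, q, y)
        if g is not None and g > 0:
            return False
    return True


def strict_pc_spi_exists(support, u):
    """True iff some outcome in the support is Pareto-suboptimal in C(Gamma)."""
    payoffs = list(u.values())
    return any(not is_pareto_optimal(u[a], payoffs) for a in support)


# Hypothetical DM-RM subset game; the conflict payoff (-5, -5) is an assumption.
U = {("DM", "DM"): (-5, -5), ("DM", "RM"): (2, 0),
     ("RM", "DM"): (0, 2), ("RM", "RM"): (1, 1)}
with_conflict = strict_pc_spi_exists(list(U), U)
without_conflict = strict_pc_spi_exists([("DM", "RM"), ("RM", "DM")], U)
```

If the conflict outcome is in the support, a strict improvement exists; if the representatives only ever play \((\mathrm {DM},\mathrm {RM})\) or \((\mathrm {RM},\mathrm {DM})\), every supported outcome is already Pareto-optimal and the procedure returns False.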

5.3 Characterizing safe Pareto improvements under improved representative coordination

From the problem of deciding whether there are strict SPIs under improved coordination at all, we move on to the question of what different perfect-coordination SPIs there are. In particular, one might ask what the cost is of only considering safe Pareto improvements relative to acting on a probability distribution over \(\Pi (\Gamma )\) and the resulting expected utilities \(\mathbb {E}\left[ \mathbf {u}(\Pi (\Gamma ))\right]\). We start with a lemma that directly provides a characterization. So far, all the considered perfect-coordination SPIs \((A^s,\mathbf {u}^s,\mathbf {u}^e)\) for a game \((A,\mathbf {u})\) have consisted in letting the representatives play a game \((A^s,\mathbf {u}^s)\) that is isomorphic to the original game, but Pareto-improves (from the original players’ perspectives, i.e., \(\mathbf {u}^e\)) at least one of the outcomes. It turns out that we can restrict attention to this very simple type of SPI under improved coordination.

Lemma 13

Let \(\Gamma =(\{a_1^1,...,a_1^{l_1}\},...,\{a_n^1,...,a_n^{l_n}\},\mathbf {u})\) be any game. Let \(\Gamma '\) be a perfect-coordination SPI on \(\Gamma\). Then we can define \({\mathbf {u}}^e\) with values in \(\mathcal {C}(\Gamma )\) such that under Assumption 2 the game

$$\begin{aligned} {\begin{matrix} \Gamma ^s=\bigg (&{} \hat{A}_1{:=}\{\hat{a}_1^1,...,\hat{a}_1^{l_1}\},...,\hat{A}_n{:=}\{\hat{a}_n^1,...,\hat{a}_n^{l_n}\},\\ &{}~~~\hat{ \mathbf {u}}:(\hat{a}_1^{i_1},...,\hat{a}_n^{i_n})\mapsto \mathbf {u}( a_1^{i_1},..., a_n^{i_n}),\mathbf {u}^e \bigg ) \end{matrix}} \end{aligned}$$

is also an SPI on \(\Gamma\), with

$$\begin{aligned} \mathbb {E}\left[ {\mathbf {u}}(\Pi (\Gamma ^s)) \;|\; \Pi (\Gamma ){=}\mathbf {a} \right] = \mathbb {E}\left[ {\mathbf {u}}(\Pi (\Gamma ')) \;|\; \Pi (\Gamma ){=}\mathbf {a} \right] \end{aligned}$$

for all \(\mathbf {a}\in A\) and consequently \(\mathbb {E}\left[ {\mathbf {u}}(\Pi (\Gamma ^s)) \right] = \mathbb {E}\left[ {\mathbf {u}}(\Pi (\Gamma ')) \right]\).

Proof

First note that \((\hat{A},\hat{\mathbf {u}})\) is isomorphic to \(\Gamma\). Thus by Assumption 2, there is an isomorphism \(\Phi\) s.t. \(\Gamma \sim _{\Phi } (\hat{A},\hat{\mathbf {u}})\). WLOG assume that \(\Phi\) simply maps \(a_1^{i_1},..., a_n^{i_n} \mapsto \hat{a}_1^{i_1},...,\hat{a}_n^{i_n}\). Then define \(\mathbf {u}^e\) as follows:

$$\begin{aligned} \mathbf {u}^e(\hat{a}_1^{i_1},...,\hat{a}_n^{i_n})=\mathbb {E}\left[ {\mathbf {u}}' (\Pi (\Gamma ')) \mid \Pi (\Gamma )=( a_1^{i_1},...,a_n^{i_n}) \right] . \end{aligned}$$

Here \({\mathbf {u}}'\) describes the utilities that the original players assign to the outcomes of \(\Gamma '\). Since \({\mathbf {u}}'\) maps into \(\mathcal {C}(\Gamma )\) and \(\mathcal {C}(\Gamma )\) is convex, \(\mathbf {u}^e\) as defined also maps into \(\mathcal {C}(\Gamma )\) as required. Note that for all \(a_1^{i_1},...,a_n^{i_n}\), whenever \(\Pi (\Gamma )=(a_1^{i_1},...,a_n^{i_n})\), we have by assumption \({\mathbf {u}}' (\Pi (\Gamma '))\ge {\mathbf {u}}(a_1^{i_1},...,a_n^{i_n})\) with certainty. Hence,

$$\begin{aligned} \mathbf {u}^e(\hat{a}_1^{i_1},...,\hat{a}_n^{i_n})&= \mathbb {E}\left[ {\mathbf {u}}' (\Pi (\Gamma ')) \mid \Pi (\Gamma )=( a_1^{i_1},...,a_n^{i_n}) \right] \\&\ge {\mathbf {u}}(a_1^{i_1},...,a_n^{i_n}), \end{aligned}$$

as required.\(\square\)
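The construction of \(\mathbf {u}^e\) in this proof is a conditional expectation and can be computed directly once a joint distribution over \((\Pi (\Gamma ),\Pi (\Gamma '))\) is fixed. The sketch below does this for two players. Everything in the example is hypothetical: the conflict payoff, the token outcomes of \(\Gamma '\) (named `to_DMRM`, `to_RMDM`, `to_RMRM`) and the joint distribution are assumptions chosen only to illustrate the definition.

```python
from fractions import Fraction


def conditional_expected_payoffs(joint, u_prime):
    """u^e(a) = E[u'(Pi(Gamma')) | Pi(Gamma) = a] for every a with positive mass.
    joint[(a, a_prime)] is the probability of Pi(Gamma) = a and Pi(Gamma') = a_prime."""
    mass, total = {}, {}
    for (a, a_prime), p in joint.items():
        mass[a] = mass.get(a, 0) + p
        prev = total.get(a, (0, 0))
        total[a] = tuple(prev[k] + p * u_prime[a_prime][k] for k in range(2))
    return {a: tuple(t / mass[a] for t in total[a]) for a in mass if mass[a] > 0}


# Hypothetical default game (DM-RM subset game with an assumed conflict payoff).
U = {("DM", "DM"): (-5, -5), ("DM", "RM"): (2, 0),
     ("RM", "DM"): (0, 2), ("RM", "RM"): (1, 1)}
# Hypothetical SPI Gamma': utilities the original players assign to its outcomes.
U_PRIME = {"to_DMRM": (2, 0), "to_RMDM": (0, 2), "to_RMRM": (1, 1)}
# Hypothetical joint distribution: conflict is resolved by either demand outcome.
JOINT = {
    (("DM", "DM"), "to_DMRM"): Fraction(1, 8),
    (("DM", "DM"), "to_RMDM"): Fraction(1, 8),
    (("DM", "RM"), "to_DMRM"): Fraction(1, 4),
    (("RM", "DM"), "to_RMDM"): Fraction(1, 4),
    (("RM", "RM"), "to_RMRM"): Fraction(1, 4),
}

U_E = conditional_expected_payoffs(JOINT, U_PRIME)
dominates = all(U_E[a][k] >= U[a][k] for a in U_E for k in range(2))
```

In this example the conflict outcome is assigned \(\mathbf {u}^e=(1,1)\), the average of the two outcomes that \(\Gamma '\) produces in its place, and all other outcomes keep their payoffs.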

Because of this result, we will focus on these particular types of SPIs, which simply create an isomorphic game with different (Pareto-better) utilities. Note, however, that without assigning exact probabilities to the distributions of \(\Pi (\Gamma ),\Pi (\Gamma ')\), the original players will in general not be able to construct a \(\Gamma ^s\) that satisfies the expected payoff equalities. For this reason, one could still conceive of situations in which a different type of SPI would be chosen by the original players and the original players are unable to instead choose an SPI of the type described in Lemma 13.

Lemma 13 directly implies a characterization of the expected utilities that can be achieved with perfect-coordination SPIs. Of course, this characterization depends on the exact distribution of \(\Pi (\Gamma )\). We omit the statement of this result. However, we state the following implication.

Corollary 14

Under Assumption 2, the set of Pareto improvements that are safely achievable with perfect coordination

$$\begin{aligned} \{ \mathbb {E}[ {\mathbf {u}}(\Pi (\Gamma ')) ]\mid \Gamma '\text { is a perfect-coordination SPI on }\Gamma \} \end{aligned}$$

is a convex polygon.

Because of this result, one can also efficiently optimize convex functions over the set of perfect-coordination SPIs. Even without referring to the distribution \(\Pi (\Gamma )\), many interesting questions can be answered efficiently. For example, we can efficiently identify the perfect-coordination SPI that maximizes the minimum improvements across players and outcomes \(\mathbf {a}\in A\).

In the following, we aim to use Lemma 13 and Corollary 14 to give maximally strong positive results about what Pareto improvements can be safely achieved, without referring to exact probabilities over \(\Pi (\Gamma )\). To keep things simple, we will do this only for the case of two players. To state our results, we first need some notation: We use

$$\begin{aligned} \mathrm {PF}(\mathcal {C}) {:=}\left\{ \mathbf {y}\in \mathcal {C} \;|\; \not \exists \mathbf {y}'{\in }\mathcal {C},i\in \{1,...,n\}: \mathbf {y}'\ge \mathbf {y}, y'_i>y_i \right\} \end{aligned}$$

to denote the Pareto frontier of a convex polygon \(\mathcal {C}\) (or more generally convex, closed set). For any real number \(x\in \mathbb {R}\), we use \(\pi _i(x,\mathcal {C}(\Gamma ))\) to denote the \({\mathbf {y}}'\in \mathcal {C}(\Gamma )\) which maximizes \(y'_{-i}\) under the constraint \(y'_{i}= x\). (Recall that we consider 2-player games, so \(y'_{-i}\) is a single real number.) Note that such a \({\mathbf {y}}'\) exists if and only if x is i’s utility in some feasible payoff vector. We first state our result formally. Afterwards, we will give a graphical explanation of the result, which we believe is easier to understand.

Theorem 15

Make Assumption 2. Let \(\Gamma\) be a two-player game. Let \({\mathbf {y}}\in \mathbb {R}^2\) be some potentially unsafe Pareto improvement on \(\mathbb {E}\left[ {\mathbf {u}}(\Pi (\Gamma )) \right]\). For \(i=1,2\), let \(x^{\mathrm {min}/\mathrm {max}}_i=\min /\max u_i\left( \mathrm {supp}(\Pi (\Gamma )) \right)\). Then:

  A) If there is some element in \(\mathcal {C}(\Gamma )\) which Pareto-dominates all of \(\mathrm {supp}(\Pi (\Gamma ))\) and if \({\mathbf {y}}\) is Pareto-dominated by an element of at least one of the following three sets:

    • \(L_1{:=}\) the line segment between \(\pi _1(x_1^{\mathrm {min}},\mathrm {PF}(\mathcal {C}(\Gamma )))\) and \(\pi _1(x_1^{\mathrm {max}},\mathrm {PF}(\mathcal {C}(\Gamma )))\);

    • \(L_2{:=}\) the segment of the curve \(\mathrm {PF}(\mathcal {C}(\Gamma ))\) between \(\pi _{1}(x^{\mathrm {max}}_1 , \mathrm {PF}(\mathcal {C}(\Gamma )))\) and \(\pi _{2}(x^{\mathrm {max}}_2, \mathrm {PF}(\mathcal {C}(\Gamma )))\);

    • \(L_3{:=}\) the line segment between \(\pi _2(x_2^{\mathrm {max}},\mathrm {PF}(\mathcal {C}(\Gamma )))\) and \(\pi _2(x_2^{\mathrm {min}},\mathrm {PF}(\mathcal {C}(\Gamma )))\);

    then there is an SPI under improved coordination \(\Gamma ^s\) such that \(\mathbb {E}\left[ {\mathbf {u}}(\Pi (\Gamma ^s)) \right] ={\mathbf {y}}\).

  B) If there is no element in \(\mathcal {C}(\Gamma )\) which Pareto-dominates all of \(\mathrm {supp}(\Pi (\Gamma ))\) and if \(\mathbf {y}\) is Pareto-dominated by an element each of \(L_1\) and \(L_3\) as defined above, then there is a perfect-coordination SPI \(\Gamma ^s\) such that \(\mathbb {E}\left[ {\mathbf {u}}(\Pi (\Gamma ^s)) \right] ={\mathbf {y}}\).
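The projection \(\pi _i\) that defines the endpoints of \(L_1\), \(L_2\) and \(L_3\) is easy to compute for two players. The sketch below computes \(\pi _i(x,\mathcal {C}(\Gamma ))\) as defined before the theorem, i.e., over the whole feasible set rather than over the Pareto frontier, by scanning the segments between pairs of payoff vectors. Players are indexed from 0 in the code, the function name `pi` is an assumption, and the payoff vectors are those of the hypothetical \(\mathrm {DM}\)-\(\mathrm {RM}\) table with an assumed conflict payoff of \((-5,-5)\).

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def pi(i, x, payoffs):
    """Return the y' in the convex hull of payoffs with y'_i = x that maximizes the
    other player's utility, or None if x is not a feasible utility for player i."""
    j = 1 - i
    best = None
    for p, q in combinations_with_replacement(payoffs, 2):
        low, high = sorted((p, q), key=lambda z: z[i])
        if not (low[i] <= x <= high[i]):
            continue  # this segment never gives player i exactly x
        if low[i] == high[i]:
            other = max(low[j], high[j])
        else:
            t = Fraction(x - low[i]) / Fraction(high[i] - low[i])
            other = low[j] + t * (high[j] - low[j])
        if best is None or other > best:
            best = other
    if best is None:
        return None
    y = [None, None]
    y[i], y[j] = x, best
    return tuple(y)


# Payoff vectors of the hypothetical DM-RM subset game.
POINTS = [(-5, -5), (2, 0), (0, 2), (1, 1)]
```

For these points, `pi(0, 1, POINTS)` is the point on the segment between \((2,0)\) and \((0,2)\) at which Player 1 receives 1, and `pi(0, 3, POINTS)` is None because no feasible payoff vector gives Player 1 a utility of 3.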

We now illustrate the result graphically. We start with Case A, which is illustrated in Fig. 2. The Pareto frontier is the solid line in the north and east. The points marked x indicate outcomes in \(\mathrm {supp}(\Pi (\Gamma ))\). The point marked by a filled circle indicates the expected value of the default equilibrium \(\mathbb {E}\left[ {\mathbf {u}} (\Pi (\Gamma )) \right]\). The vertical dashed lines starting at the two extreme x marks illustrate the application of \(\pi _1\) to project \(x_1^{\mathrm {min}/\mathrm {max}}\) onto the Pareto frontier. The dotted line between these two points is \(L_1\). Similarly, the horizontal dashed lines starting at x marks illustrate the application of \(\pi _2\) to project \(x_2^{\mathrm {min}/\mathrm {max}}\) onto the Pareto frontier. The line segment between these two points is \(L_3\). In this case, this line segment lies on the Pareto frontier. The set \(L_2\) is simply the part of the Pareto frontier that Pareto-dominates all elements of \(\mathrm {supp}(\Pi (\Gamma ))\), i.e., the part of the Pareto frontier to the north-east between the two intersections with the northern horizontal dashed line and eastern vertical dashed line. The theorem states that every Pareto improvement \({\mathbf {y}}\in \mathbb {R}^2\) in the gray area is safely achievable.

Case B of Theorem 15 is depicted in Fig. 3. Note that here the two line segments \(L_1\) and \(L_3\) intersect. To ensure that a Pareto improvement is safely achievable, the theorem requires that it is below both of these lines, as indicated again by the gray area.

Fig. 2 This figure illustrates Theorem 15, Case A

Fig. 3 This figure illustrates Theorem 15, Case B

For a full proof, see Appendix 5. Roughly, Theorem 15 is proven by re-mapping each of the outcomes of the original game as per Lemma 13. For example, the projection of the default equilibrium \(\mathbb {E}\left[ \mathbf {u}(\Pi (\Gamma ))\right]\) (i.e., the filled circle) onto \(L_1\) is obtained as an SPI by projecting all the outcomes (i.e., all the x marks) onto \(L_1\). In Case A, any utility vector \(\mathbf {y}\in L_2\) that Pareto-improves on all outcomes of the original game can be obtained by re-mapping all outcomes onto \(\mathbf {y}\). Other kinds of \(\mathbf {y}\) are handled similarly.

As a corollary of Theorem 15, we can see that all (potentially unsafe) Pareto improvements in the \(\mathrm {DM}\)-\(\mathrm {RM}\) subset game of the Demand Game of Table 1 are equivalent to some perfect-coordination SPI. However, this is not always the case:

Proposition 16

There is a game \(\Gamma =(A,\mathbf {u})\), representatives \(\Pi\) that satisfy Assumptions 1 and 2, and an outcome \(\mathbf {a}\in A\) s.t. \(u_i(\mathbf {a})>\mathbb {E}\left[ u_i(\Pi (\Gamma )) \right]\) for all players i, but there is no perfect-coordination SPI \((A^s,\mathbf {u}^s,\mathbf {u}^e)\) s.t. for all players i, \(\mathbb {E}\left[ u^e_i(\Pi (A^s,\mathbf {u}^s)) \right] = u_i(\mathbf {a})\).

As an example of such a game, consider the game in Table 7. Strategy c can be eliminated by strict dominance (Assumption 1) for both players, leaving a typical Chicken-like payoff structure with two pure Nash equilibria ((a, b) and (b, a)), as well as a mixed Nash equilibrium \((\nicefrac {3}{8}*a+\nicefrac {5}{8}*b,\nicefrac {3}{8}*a+\nicefrac {5}{8}*b)\).

Table 7 An example of a game in which – depending on \(\Pi\) – a Pareto improvement may not be safely achievable

Now let us say that in the resulting game \(P(\Pi (\Gamma ){=}(a,b))=p=P(\Pi (\Gamma ){=}(b,a))\) for some p with \(0< p \le \nicefrac {1}{2}\). Then one (unsafe) Pareto improvement would be to simply always have the representatives play (c, c) for a certain payoff of (3, 3). Unfortunately, there is no safe Pareto improvement with the same expected payoff. Notice that (3, 3) is the unique element of \(\mathcal {C}(\Gamma )\) that maximizes the sum of the two players’ utilities. By linearity of expectation and convexity of \(\mathcal {C}(\Gamma )\), if for any \(\Gamma ^s\) it is \(\mathbb {E}\left[ {\mathbf {u}}(\Pi (\Gamma ^s)) \right] =(3,3)\), it must be \({\mathbf {u}}(\Pi (\Gamma ^s))=(3,3)\) with certainty. Unfortunately, in any safe Pareto improvement the outcomes (a, b) and (b, a) must correspond to outcomes that still give utilities of (4, 0) and (0, 4), respectively, because these are Pareto-optimal within the set of feasible payoff vectors. We illustrate this as an example of Case B of Theorem 15 in Fig. 4.

Fig. 4 This figure illustrates the Game of Table 7 as an instance of Theorem 15, Case B

6 The SPI selection problem

In the Demand Game, there happens to be a single non-trivial SPI. However, in general (even without the type of coordination assumed in Sect. 5) there may be multiple SPIs that result in different payoffs for the players. For example, imagine an extension of the Demand Game in which both players have an additional action \(\mathrm {DL}'\), which is like \(\mathrm {DL}\), except that under \((\mathrm {DL}', \mathrm {DL}')\), Aliceland can peacefully annex the desert. Aliceland prefers this SPI over the original one, while Bobbesia has the opposite preference. In other cases, it may be unclear to some or all of the players which of two SPIs they prefer. For example, if in some version of the Demand Game one SPI mostly improves on \((\mathrm {DM},\mathrm {DM})\) and another mostly improves on the other three outcomes, then outcome probabilities are required for comparing the two. If multiple SPIs are available, the original players would be left with the difficult decision of which SPI to demand in their instruction.Footnote 9

This difficulty of choosing what SPI to demand cannot be denied. However, we would here like to emphasize that players can profit from the use of SPIs even without addressing this SPI selection problem. To do so, a player picks an instruction that is very compliant (“dove-ish”) w.r.t. what SPI is chosen, e.g., one that simply goes with whatever SPI the other players demand as long as that SPI cannot further be safely Pareto-improved upon.Footnote 10 In many cases, all such SPIs benefit all players. For example, optimal SPIs in bargaining scenarios like the Demand Game remove the conflict outcome, which benefits all parties. Thus, a player can expect a safe improvement even under such maximally compliant demands on the selected SPI.
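A maximally compliant instruction of this kind could be sketched as follows. This is a simplified stand-in and not the formal definition: we assume that each candidate SPI is encoded as a map from original outcomes to the payoff vector it yields, and we treat one SPI as improving on another if it is weakly better for every player at every original outcome and strictly better somewhere. The encoding, the payoff numbers and the names `improves_on` and `compliant_response` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Outcome = Tuple[str, str]
Payoffs = Tuple[int, int]
SPI = Dict[Outcome, Payoffs]  # hypothetical encoding: original outcome -> payoffs under the SPI


def improves_on(y: SPI, x: SPI) -> bool:
    """True if y is weakly better than x for both players at every original
    outcome and strictly better for some player at some outcome."""
    weakly = all(y[o][i] >= x[o][i] for o in x for i in range(2))
    strictly = any(y[o][i] > x[o][i] for o in x for i in range(2))
    return weakly and strictly


def compliant_response(demanded: SPI, candidates: List[SPI]) -> Optional[SPI]:
    """Go along with the demanded SPI unless a known candidate improves on it.
    Returns the accepted SPI, or None if the demand is rejected."""
    if any(improves_on(c, demanded) for c in candidates):
        return None
    return demanded


# HYPOTHETICAL payoffs: BETTER_SPI differs from BASE_SPI only at (DM, DM).
BASE_SPI: SPI = {("DM", "DM"): (1, 1), ("DM", "RM"): (3, 0),
                 ("RM", "DM"): (0, 3), ("RM", "RM"): (1, 1)}
BETTER_SPI: SPI = {("DM", "DM"): (2, 2), ("DM", "RM"): (3, 0),
                   ("RM", "DM"): (0, 3), ("RM", "RM"): (1, 1)}

print(compliant_response(BASE_SPI, [BASE_SPI, BETTER_SPI]) is None)          # True
print(compliant_response(BETTER_SPI, [BASE_SPI, BETTER_SPI]) is BETTER_SPI)  # True
```

The compliant player never proposes an SPI of its own. It only checks that the other side's demand leaves no further safe improvement on the table.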

In some cases there may also be natural choices of demands in the spirit of Schelling's focal points ([48], pp. 54–58). If the underlying game is symmetric, a symmetric safe Pareto improvement may be a natural choice. For example, the fully reduced version of the Demand Game of Table 1 is symmetric. Hence, we might expect that even if multiple SPIs were available, the original players would choose a symmetric one.

7 Conclusion and future directions

Safe Pareto improvements are a promising new idea for delegating strategic decision making. To conclude this paper, we discuss some ideas for further research on SPIs.

Straightforward technical questions arise in the context of the complexity results of Sect. 4.6. First, what impact on the complexity does varying the assumptions have? Our NP-completeness proof is easy to generalize at least to some other types of assumptions. It would be interesting to give a generic version of the result. We also wonder whether there are plausible assumptions under which the complexity changes in interesting ways. Second, one could ask how the complexity changes if we use more sophisticated game representations (see the remarks at the end of that section). Third, one could impose additional restrictions on the sought SPI. Fourth, we could restrict the games under consideration. Are there games in which it becomes easy to decide whether there is an SPI?

It would also be interesting to see what real-world situations can already be interpreted as utilizing SPIs, or could be Pareto-improved upon using SPIs.