Computing a Pessimistic Stackelberg Equilibrium with Multiple Followers: The Mixed-Pure Case

The search problem of computing a Stackelberg (or leader-follower) equilibrium (also referred to as an optimal strategy to commit to) has been widely investigated in the scientific literature, almost exclusively in the single-follower setting. Although the optimistic and pessimistic versions of the problem, i.e., those where the single follower breaks any ties among multiple equilibria either in favour of or against the leader, are solved with different methodologies, both cases allow for efficient, polynomial-time algorithms based on linear programming. The situation is different with multiple followers, where results are only sporadic and depend strictly on the nature of the followers' game. In this paper, we investigate the setting of a normal-form game with a single leader and multiple followers who, after observing the leader's commitment, play a Nash equilibrium. When both leader and followers are allowed to play mixed strategies, the corresponding search problem, both in the optimistic and pessimistic versions, is known to be inapproximable in polynomial time to within any multiplicative polynomial factor unless P = NP. Exact algorithms are known only for the optimistic case. We focus on the case where the followers play pure strategies (a restriction that applies to a number of real-world scenarios and which, in principle, makes the problem easier) under the assumption of pessimism (the optimistic version of the problem can be straightforwardly solved in polynomial time).
After casting this search problem (with followers playing pure strategies) as a pessimistic bilevel programming problem, we show that, with two followers, the problem is NP-hard and, with three or more followers, it cannot be approximated in polynomial time to within any multiplicative factor which is polynomial in the size of the normal-form game, nor, assuming utilities in [0, 1], to within any constant additive loss strictly smaller than 1 unless P = NP. This shows that, differently from what happens in the optimistic version, hardness and inapproximability in the pessimistic problem are not due to the adoption of mixed strategies. We then show that the problem admits, in the general case, a supremum but not a maximum, and we propose a single-level mathematical programming reformulation which asks for the maximization of a nonconcave quadratic function over an unbounded nonconvex feasible region defined by linear and quadratic constraints.
Since, due to admitting a supremum but not a maximum, only a restricted version of this formulation can be solved to optimality with state-of-the-art methods, we propose an exact ad hoc algorithm (which we also embed within a branch-and-bound scheme) capable of computing the supremum of the problem and, for cases where there is no leader's strategy where such value is attained, also an α-approximate strategy where α > 0 is an arbitrary additive loss (at most as large as the supremum). We conclude the paper by evaluating the scalability of our algorithms via computational experiments on a well-established testbed of game instances.


Introduction
In recent years, Stackelberg (or Leader-Follower) Games (SGs) and their corresponding Stackelberg Equilibria (SEs) have attracted a growing interest in many disciplines, including theoretical computer science, artificial intelligence, and operations research. SGs describe situations where one player (the leader) commits to a strategy and the other players (the followers) first observe the leader's commitment and then decide how to play. In the literature, SEs are often referred to as optimal strategies (for the leader) to commit to. SGs encompass a broad array of real-world games. A prominent example is that of security games, where a defender, acting as leader, is tasked with allocating scarce resources to protect valuable targets from an attacker, who acts as follower [3,17,28]. Besides the security domain, applications can be found in, among others, interdiction games [10,23], toll-setting problems [19], and network routing [2].
While, with only a few exceptions (see [6,8,13,18,21]), the majority of the game-theoretical investigations on the computation of SEs assume the presence of a single follower, in this work we address the multi-follower case.
When facing an SG and, in particular, a multi-follower one, two aspects need to be considered: the type of game (induced by the leader's strategy) the followers play and, in it, how ties among the multiple equilibria which could arise are broken.
As to the nature of the followers' game, and restricting ourselves to the cases which look more natural, the followers may play hierarchically one at a time, as in a hierarchical Stackelberg game [14], simultaneously and cooperatively [13], or simultaneously and noncooperatively [4].
As to breaking ties among multiple equilibria, it is natural to consider two cases: the optimistic one (often called strong SE), where the followers end up playing an equilibrium which maximizes the leader's utility, and the pessimistic one (often called weak SE), where the followers end up playing an equilibrium by which the leader's utility is minimized. This distinction is customary in the literature since the seminal paper on SEs with mixed-strategy commitments by Von Stengel and Zamir [34]. We remark that the adoption of either the optimistic or the pessimistic setting does not correspond to assuming that the followers could necessarily agree on an optimistic or pessimistic equilibrium in a practical application. Rather, by computing an optimistic and a pessimistic SE the leader becomes aware of the largest and smallest utility she can get without having to make any assumptions on which equilibrium the followers would actually end up playing if the game resulting from the leader's commitment were to admit more than a single one. What is more, while an optimistic SE accounts for the best case for the leader, a pessimistic SE accounts for the worst case. In this sense, the computation of a pessimistic SE is paramount in realistic scenarios as, differently from the optimistic one, it is robust, guaranteeing the leader a lower bound on the maximum utility she would get independently of how the followers would break ties among multiple equilibria. As we will see, though, this degree of robustness comes at a high computational cost, as computing a pessimistic SE is a much harder task than computing its optimistic counterpart.

Stackelberg Nash Equilibria
Throughout the paper, we will consider the case of normal-form games where, after the leader's commitment to a strategy, the followers play simultaneously and noncooperatively, reaching a Nash equilibrium. We refer to the corresponding equilibrium as Stackelberg Nash Equilibrium (SNE). We focus on the case where the followers are restricted to pure strategies. This restriction is motivated by several reasons. First, while the unrestricted problem is already hard with two followers (as shown in [4]), it is not known whether the restriction to followers playing pure strategies makes the problem easier or not. Secondly, many games admit pure-strategy NEs, including potential games [25], congestion games [29], and toll-setting problems [19], and, as we show in Sect. 3.3, the same also holds with high probability in many unstructured games.

Original Contributions
After briefly pointing out that an optimistic SNE (with followers restricted to pure strategies) can be computed efficiently (in polynomial time) by a mixture of enumeration and linear programming, we entirely devote the remainder of the paper to the pessimistic case (with, again, followers restricted to pure strategies). In terms of computational complexity, we show that, differently from the optimistic case, in the pessimistic one the equilibrium-finding problem is NP-hard with two or more followers, while, when the number of followers is three or more, the problem cannot be approximated in polynomial time to within any polynomial multiplicative factor nor to within any constant additive loss unless P = NP. To establish these two results, we introduce two reductions, one from Independent Set and the other one from 3-SAT.
After analyzing the complexity of the problem, we focus on its algorithmic aspects. First, we formulate the problem as a pessimistic bilevel programming problem with multiple followers. We then show how to recast it as a single-level Quadratically Constrained Quadratic Program (QCQP), which we show to be impractical to solve due to admitting a supremum, but not a maximum. We then introduce a restriction based on a Mixed-Integer Linear Program (MILP) which, while forsaking optimality, always admits an optimal (restricted) solution. Next, we propose an exact algorithm to compute the value of the supremum of the problem based on an enumeration scheme which, at each iteration, solves a lexicographic MILP (lex-MILP) where the two objective functions are optimized in sequence. Subsequently, we embed the enumerative algorithm within a branch-and-bound scheme, obtaining an algorithm which is, in practice, much faster. We also extend the algorithm (in both versions) so that, for cases where the supremum is not a maximum, it returns a strategy by which the leader can obtain a utility within an additive loss α with respect to the supremum, for any arbitrarily chosen α > 0. To conclude, we experimentally evaluate the scalability of our methods over a testbed of randomly generated instances.
The status, in terms of complexity and known algorithms, of the problem of computing an SNE (with followers playing pure or mixed strategies) is summarized in Table 1. The original results we provide in this paper are reported in boldface.

Paper Outline
The paper is organized as follows. Previous works are introduced in Sect. 2. The problem we study is formally stated in Sect. 3, together with some preliminary results. In Sect. 4, we present the computational complexity results. Sect. 5 introduces the single-level reformulation(s) of the problem, while Sect. 6 describes our exact algorithm (in its two versions). An empirical evaluation of our methods is carried out in Sect. 7. Sect. 8 concludes the paper. A preliminary version of this work appeared in [12]. Compared to it, this paper extends the complexity results by studying the inapproximability of the problem (Sect. 4), introduces and analyses a single-level QCQP reformulation and an MILP restriction of it (Sect. 5), substantially extends the mathematical details needed to establish the correctness of our algorithms, also illustrating their step-by-step execution on an example (Sect. 6 and Appendix A), and reports on an extensive set of computational results carried out to validate our methods (Sect. 7).

Previous Works
As we mentioned in Sect. 1, most of the works on (normal-form) SGs focus on the single-follower case. In such a case, as shown in [14], the follower always plays a pure strategy (except for degenerate games). In the optimistic case, an SE can be found in polynomial time by solving a Linear Program (LP) for each action of the (single) follower (the algorithm is, thus, a multi-LP). Each LP maximizes the expected utility of the leader subject to a set of constraints imposing that the given follower's action is a best-response [14]. As shown in [13], all these LPs can be encoded into a single LP, a slight variation of the LP that is used to compute a correlated equilibrium (the solution concept where all the players can exploit a correlation device to coordinate their strategies). In this case, the leader and the follower play correlated strategies under rationality constraints imposed on the follower only, maximizing the leader's expected utility. Some works study the equilibrium-finding problem (only in the optimistic version) in structured games where the action space is combinatorial. See [7] for more references.
[Table 1: complexity status and known algorithms for computing an SNE. Recoverable entries: Algorithm: multi-LP [14,34], multi-LP [34]; Complexity: P; NP-hard, inapx. [4]; n ≥ 3, |F| ≥ 2.]
As for the pessimistic single-follower case, the authors of [34] study the problem of computing the supremum of the leader's expected utility. They show that, for the latter, it suffices to consider the follower's actions which constitute a best-response to a full-dimensional region of the leader's strategy space. The multi-LP algorithm the authors propose solves two LPs per action of the follower, one to verify whether the best-response region for that action is full-dimensional (so as to discard it if full-dimensionality does not hold) and a second one to compute the best leader's strategy within that best-response region. The algorithm runs in polynomial time. While the authors limit their analysis to computing the supremum of the leader's utility, we remark that such a value does not always translate into a strategy that the leader can play as, in the general case where the leader's utility does not admit a maximum, there is no leader's strategy giving her a utility equal to the supremum. In such cases, one should rather look for a strategy providing the leader with an expected utility which approximates the value of the supremum. This aspect, which is not addressed in [34], is tackled for the multi-follower case in our work.
The multi-follower case, which, to the best of our knowledge, has only been investigated in [4,6], is computationally much harder than the single-follower case. It is, in the general case where leader and followers are entitled to mixed strategies, NP-hard and inapproximable in polynomial time to within any multiplicative factor which is polynomial in the size of the normal-form game unless P = NP. In the aforementioned works, the problem of finding an equilibrium in the optimistic case is formulated as a nonlinear and nonconvex mathematical program and solved to global optimality (within a given tolerance) with spatial branch-and-bound techniques. No exact methods are proposed for the pessimistic case.

Problem Statement and Preliminary Results
After setting the notation used throughout the paper, this section offers a formal definition of the equilibrium-finding problem we tackle in this work and illustrates some of its properties.

Notation
Let $N = \{1, \dots, n\}$ be the set of players and, for each player $p \in N$, let $A_p$ be her set of actions, of cardinality $m_p = |A_p|$. Let also $A = \times_{p \in N} A_p = A_1 \times \cdots \times A_n$. For each player $p \in N$, let $x_p \in [0, 1]^{m_p}$, with $\sum_{a_p \in A_p} x_p^{a_p} = 1$, be her strategy vector (or strategy, for short), where each component $x_p^{a_p}$ of $x_p$ represents the probability by which player $p$ plays action $a_p \in A_p$. For each player $p \in N$, let also $\Delta_p = \{x_p \in [0, 1]^{m_p} : \sum_{a_p \in A_p} x_p^{a_p} = 1\}$ be the set of her strategies, or strategy space, which corresponds to the standard $(m_p - 1)$-simplex in $\mathbb{R}^{m_p}$. A strategy is said to be pure when only one action is played with positive probability, i.e., when $x_p \in \{0, 1\}^{m_p}$, and mixed otherwise. In the following, we denote the collection of strategies of the different players (called strategy profile) by $x = (x_1, \dots, x_n)$. For the case where all the strategies are pure, we denote the collection of actions played by the players (called action profile) by $a = (a_1, \dots, a_n)$.
Given a strategy profile $x$, we denote the collection of all the strategies in it but the one of player $p \in N$ by $x_{-p}$, i.e., $x_{-p} = (x_1, \dots, x_{p-1}, x_{p+1}, \dots, x_n)$. Given $x_{-p}$ and a strategy vector $x_p$, we denote the whole strategy profile $x$ by $(x_{-p}, x_p)$. For action profiles, $a_{-p}$ and $(a_{-p}, a_p)$ are defined analogously. For the case where all players are restricted to pure strategies with the sole exception of player $p$, who is allowed to play mixed strategies, we use the notation $(a_{-p}, x_p)$.
We consider normal-form games where $U_p \in \mathbb{Q}^{m_1 \times \cdots \times m_n}$ represents, for each player $p \in N$, her (multidimensional) utility (or payoff) matrix. For each $p \in N$ and given an action profile $a = (a_1, \dots, a_n)$, each component $U_p^{a_1 \dots a_n}$ of $U_p$ corresponds to the utility of player $p$ when all the players play the action profile $a$. For ease of presentation and when no ambiguity arises, we will often write $U_p^{a}$ in place of $U_p^{a_1 \dots a_n}$. Given a collection of actions $a_{-p}$ and an action $a_p \in A_p$, we will also use $U_p^{a_{-p}, a_p}$ to denote the component of $U_p$ corresponding to the action profile $(a_{-p}, a_p)$. Given a strategy profile $x = (x_1, \dots, x_n)$, the expected utility of player $p \in N$ is the $n$-th-degree polynomial $\sum_{a \in A} U_p^{a} \, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$. An action profile $a = (a_1, \dots, a_n)$ is called pure strategy Nash Equilibrium (or pure NE, for short) if, when the players in $N \setminus \{p\}$ play as the equilibrium prescribes, player $p$ cannot improve her utility by deviating from the equilibrium and playing another action $a'_p \neq a_p$, for all $p \in N$. More generally, a mixed strategy Nash Equilibrium (or mixed NE, for short) is a strategy profile $x = (x_1, \dots, x_n)$ such that no player $p \in N$ could improve her utility by playing a strategy $x'_p \neq x_p$ assuming the other players would play as the equilibrium prescribes. A mixed NE always exists [26] in a normal-form game, while a pure NE may not. For more details on (noncooperative) game theory, we refer the reader to [32].
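The definition of a pure NE can be checked by brute force on small games. The following is a minimal sketch, not part of the original paper: the function name `pure_nash_equilibria`, the dictionary-based payoff encoding, and the two example games are all illustrative assumptions.

```python
from itertools import product


def pure_nash_equilibria(utils, sizes):
    """Enumerate the pure NEs of a normal-form game (illustrative sketch).

    utils[p] maps an action profile (tuple of action indices) to the
    payoff of player p; sizes[p] is the number of actions of player p.
    """
    equilibria = []
    for profile in product(*(range(m) for m in sizes)):
        stable = True
        for p, m in enumerate(sizes):
            current = utils[p][profile]
            for dev in range(m):
                # Unilateral deviation of player p to action `dev`.
                deviated = profile[:p] + (dev,) + profile[p + 1:]
                if utils[p][deviated] > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria


# Hypothetical two-player coordination game: two pure NEs.
coordination = [
    {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1},
]
# Hypothetical matching-pennies game: no pure NE.
pennies = [
    {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1},
    {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1},
]

print(pure_nash_equilibria(coordination, [2, 2]))
print(pure_nash_equilibria(pennies, [2, 2]))
```

The second game illustrates the remark that a pure NE may not exist even though a mixed one always does.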
Similar definitions hold for the case of SGs when assuming that only a subset of players (the followers) play an NE given the strategy the leader has committed to.

The Problem and Its Formulation
In the following, we assume that the $n$-th player takes the role of leader. We denote the set of followers (the first $n - 1$ players) by $F = N \setminus \{n\}$. For ease of notation, we also define $A_F = \times_{p \in F} A_p$ as the set of followers' action profiles, i.e., the set of all collections of followers' actions. We also assume, unless otherwise stated, $m_p = m$ for every player $p \in N$, where $m$ denotes the number of actions available to each player. This is without loss of generality, as one could always introduce additional actions with a utility small enough to guarantee that they would never be played, thus obtaining a game where each player has the same number of actions.
As we mentioned in Sect. 1, in this work we tackle the problem of computing an equilibrium in a normal-form game where the followers play a pure NE once they have observed the leader's commitment to a mixed strategy. We refer to an Optimistic Stackelberg Pure-Nash Equilibrium (O-SPNE) when the followers play a pure NE which maximizes the leader's utility, and to a Pessimistic Stackelberg Pure-Nash Equilibrium (P-SPNE) when they play a pure NE by which the leader's utility is minimized.

The Optimistic Case
Before focusing our attention entirely on the pessimistic case, let us briefly address the optimistic one.
An O-SPNE can be found by solving the following bilevel programming problem with $n - 1$ followers:

$$\begin{aligned} \max_{x_n, x_{-n}} \quad & \sum_{a \in A} U_n^{a} \, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \\ \text{s.t.} \quad & x_n \in \Delta_n \\ & x_p \in \operatorname*{argmax}_{x_p} \Big\{ \sum_{a \in A} U_p^{a} \, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} : x_p \in \Delta_p \cap \{0, 1\}^{m_p} \Big\} \quad \forall p \in F. \end{aligned} \tag{1}$$

Note that, due to the integrality constraints on $x_p$ for all $p \in F$, each follower can play a single action with probability 1. By imposing the argmax constraint for each $p \in F$, the formulation guarantees that each follower plays a best-response action $a_p$, thus guaranteeing that the action profile $a_{-n} = (a_1, \dots, a_{n-1})$, where, for each $p \in F$, $a_p$ is the action with $x_p^{a_p} = 1$, is an NE for the given $x_n$. It is crucial to note that the maximization in the upper level is carried out not only w.r.t. $x_n$, but also w.r.t. $x_{-n}$. This way, if the followers' game admits multiple NEs for the chosen $x_n$, optimal solutions to Problem (1) are guaranteed to contain followers' action profiles which maximize the leader's utility, thus satisfying the assumption of optimism.
As shown in the following proposition, computing an O-SPNE is an easy task:

Proposition 1 In a normal-form game, an O-SPNE can be computed in polynomial time by solving a multi-LP.
Proof It suffices to enumerate, in $O(m^{n-1})$, all the followers' action profiles $a_{-n} \in A_F$ and, for each of them, solve an LP to: i) check whether there is a strategy vector $x_n$ for the leader for which the action profile $a_{-n}$ is an NE and ii) find, among all such strategy vectors $x_n$, one which maximizes the leader's utility. The action profile $a_{-n}$ which, with the corresponding $x_n$, yields the largest expected utility for the leader is an O-SPNE. Given a followers' action profile $a_{-n}$, i) and ii) can be carried out in polynomial time by solving the following LP, where the best-response constraints guarantee that $a_{-n} = (a_1, \dots, a_{n-1})$ is a pure NE for the followers' game for any of its solutions $x_n$:

$$\begin{aligned} \max_{x_n} \quad & \sum_{a_n \in A_n} U_n^{a_{-n}, a_n} x_n^{a_n} \\ \text{s.t.} \quad & \sum_{a_n \in A_n} U_p^{a_{-n}, a_n} x_n^{a_n} \geq \sum_{a_n \in A_n} U_p^{a_1 \dots a'_p \dots a_{n-1} a_n} x_n^{a_n} \quad \forall p \in F, \ a'_p \in A_p \setminus \{a_p\} \\ & x_n \in \Delta_n. \end{aligned}$$
As the size of an instance of the problem is bounded from below by $m^n$, one can enumerate over the set of the followers' action profiles (whose cardinality is $m^{n-1}$) in polynomial time. The claim of polynomiality of the overall algorithm follows due to linear programming being solvable in polynomial time.
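The enumeration scheme in the proof can be sketched in a few lines for the special case where the leader has exactly two actions, so that $x_n = (1 - \rho, \rho)$ and each LP reduces to maximizing a linear function of $\rho$ over an interval. This simplification is an assumption made here to keep the sketch dependency-free; a general implementation would call an LP solver instead. The function name, the payoff encoding, and the example game are hypothetical.

```python
from fractions import Fraction
from itertools import product


def optimistic_spne_two_leader_actions(U, follower_sizes):
    """Multi-LP sketch when the leader (last player) has two actions.

    U[p][(a_1, ..., a_{n-1}, a_n)] is the payoff of player p.
    Returns (value, rho, followers_profile), or None if no leader
    strategy induces a pure NE in the followers' game.
    """
    n_f = len(follower_sizes)  # index n_f denotes the leader
    best = None
    for prof in product(*(range(m) for m in follower_sizes)):
        lo, hi = Fraction(0), Fraction(1)  # feasible interval for rho
        feasible = True
        for p in range(n_f):
            for dev in range(follower_sizes[p]):
                if dev == prof[p]:
                    continue
                alt = prof[:p] + (dev,) + prof[p + 1:]
                # Best-response constraint: d0*(1 - rho) + d1*rho >= 0.
                d0 = Fraction(U[p][prof + (0,)] - U[p][alt + (0,)])
                d1 = Fraction(U[p][prof + (1,)] - U[p][alt + (1,)])
                slope = d1 - d0
                if slope > 0:
                    lo = max(lo, -d0 / slope)
                elif slope < 0:
                    hi = min(hi, -d0 / slope)
                elif d0 < 0:
                    feasible = False
        if not feasible or lo > hi:
            continue
        # A linear objective over an interval peaks at an endpoint.
        for rho in (lo, hi):
            value = (1 - rho) * U[n_f][prof + (0,)] + rho * U[n_f][prof + (1,)]
            if best is None or value > best[0]:
                best = (value, rho, prof)
    return best


# Hypothetical game: followers earn 1 only if both match the leader's action.
profiles = list(product(range(2), repeat=3))
follower = {a: int(a[0] == a[1] == a[2]) for a in profiles}
leader = {a: 5 for a in profiles}
leader.update({(0, 0, 0): 2, (0, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 3})
U = [follower, follower, leader]

print(optimistic_spne_two_leader_actions(U, [2, 2]))
```

In this example the off-diagonal followers' profiles, although attractive for the leader, are never NEs, so the scheme selects the profile (1, 1) with $\rho = 1$.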

The Pessimistic Case
In the pessimistic case, the computation of a P-SPNE amounts to solving the following pessimistic bilevel problem with $n - 1$ followers:

$$\begin{aligned} \sup_{x_n} \min_{x_{-n}} \quad & \sum_{a \in A} U_n^{a} \, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \\ \text{s.t.} \quad & x_n \in \Delta_n \\ & x_p \in \operatorname*{argmax}_{x_p} \Big\{ \sum_{a \in A} U_p^{a} \, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} : x_p \in \Delta_p \cap \{0, 1\}^{m_p} \Big\} \quad \forall p \in F. \end{aligned} \tag{2}$$

There are two differences between this problem and its optimistic counterpart: the presence of the min operator in the objective function and the fact that Problem (2) calls for a sup rather than for a max. The former guarantees that, in the presence of many pure NEs in the followers' game for the chosen $x_n$, one which minimizes the leader's utility is selected. The sup operator is introduced because, as illustrated in Subsect. 3.3, the pessimistic problem does not admit a maximum in the general case. Throughout the paper, we will compactly refer to the above problem as

$$\sup_{x_n \in \Delta_n} f(x_n),$$

where $f$ is the leader's utility in the pessimistic case, defined as a function of $x_n$. Since a pure NE may not exist for every leader's strategy $x_n$, we define $\sup_{x_n \in \Delta_n} f(x_n) = -\infty$ whenever there is no $x_n$ such that the resulting followers' game admits a pure NE. Note that $f$ is always bounded from above when assuming bounded payoffs and, thus, $\sup_{x_n \in \Delta_n} f(x_n) < \infty$.
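For a fixed leader's strategy, the pessimistic utility $f(x_n)$ can be evaluated by enumerating the pure NEs of the induced followers' game and taking the worst one for the leader. The sketch below is illustrative only: the function name, the payoff encoding, and the example game are hypothetical, and returning $-\infty$ at a point with no pure NE is an assumption that mirrors the convention adopted above for the supremum.

```python
import math
from fractions import Fraction
from itertools import product


def pessimistic_utility(U, follower_sizes, x_leader):
    """Evaluate f(x_n): the leader's worst utility over the pure NEs of
    the followers' game induced by x_leader (the leader is the last player)."""
    n_f = len(follower_sizes)

    def expected(p, prof):
        # Expected payoff of player p when followers play `prof`.
        return sum(x * U[p][prof + (a_n,)] for a_n, x in enumerate(x_leader))

    worst = math.inf
    for prof in product(*(range(m) for m in follower_sizes)):
        is_ne = all(
            expected(p, prof) >= expected(p, prof[:p] + (dev,) + prof[p + 1:])
            for p in range(n_f)
            for dev in range(follower_sizes[p])
        )
        if is_ne:
            worst = min(worst, expected(n_f, prof))
    # Assumed convention: -inf when the induced game has no pure NE.
    return worst if worst < math.inf else -math.inf


# Hypothetical game: followers earn 1 only if both match the leader's action.
profiles = list(product(range(2), repeat=3))
follower = {a: int(a[0] == a[1] == a[2]) for a in profiles}
leader = {a: 5 for a in profiles}
leader.update({(0, 0, 0): 2, (0, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 3})
U = [follower, follower, leader]

print(pessimistic_utility(U, [2, 2], (Fraction(1, 2), Fraction(1, 2))))
print(pessimistic_utility(U, [2, 2], (Fraction(1), Fraction(0))))
```

At the pure commitment $(1, 0)$ the followers' profile (1, 1) is a (weak) NE with leader's utility 0, so the pessimistic value drops below the one obtained at the uniform commitment.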

Some Preliminary Results
Since not all normal-form games admit a pure NE, a normal-form game may not admit an (optimistic or pessimistic) SPNE. Assuming that the payoffs of the game are independent and follow a uniform distribution, and provided that the number of players' actions is sufficiently large, with high probability there exists a leader's commitment such that the resulting followers' game has at least one pure NE. This is shown in the following proposition:

Proposition 2 Given a normal-form game with $n$ players and independent uniformly distributed payoffs, the probability that there exists a leader's strategy $x_n \in \Delta_n$ inducing at least one pure NE in the followers' game approaches 1 as the number of players' actions $m$ goes to infinity.
Proof As shown in [33], in an $n$-player normal-form game with independent and uniformly distributed payoffs the probability of the existence of a pure NE can be expressed as a function of the number of players' actions $m$, say $P(m)$, which approaches $1 - \frac{1}{e}$ for $m \to \infty$. Assume now that we are given one such $n$-player normal-form game. Then, for every leader's action $a_n \in A_n$, let $P_{a_n}(m)$ be the probability that the followers' game induced by the leader's action $a_n$ admits a pure NE. Since each of the followers' games resulting from the choice of $a_n$ also has independent and uniformly distributed payoffs, all the probabilities are equal, i.e., $P_{a_n}(m) = P(m)$ for every $a_n \in A_n$. It follows that the probability that at least one of such followers' games admits a pure NE is:

$$1 - \prod_{a_n \in A_n} \big(1 - P_{a_n}(m)\big) = 1 - \big(1 - P(m)\big)^m.$$

Since this probability approaches 1 as $m$ goes to infinity, the probability of the existence of a leader's strategy $x_n \in \Delta_n$ which induces at least one pure NE in the followers' game also approaches 1 for $m \to \infty$.
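To give a feeling for how quickly this probability grows, the following sketch evaluates $1 - (1 - P)^m$, the probability that at least one of $m$ independent followers' games admits a pure NE when each does so with probability $P$. As a simplifying assumption, $P$ is set to its limiting value $1 - \frac{1}{e}$, which is only an approximation of $P(m)$ for finite $m$; the function name is hypothetical.

```python
import math


def prob_some_induced_game_has_pure_ne(p_single, m):
    """Probability that at least one of m independent followers' games
    admits a pure NE, each doing so with probability p_single."""
    return 1 - (1 - p_single) ** m


# Assumption: use the asymptotic value of P(m) for every m.
p_limit = 1 - 1 / math.e

for m in (1, 2, 5, 10):
    print(m, prob_some_induced_game_has_pure_ne(p_limit, m))
```

With this approximation the expression equals $1 - e^{-m}$, which makes the convergence to 1 explicit.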
The fact that Problem (2) may not admit a maximum is shown by the following proposition:

Proposition 3
In a normal-form game, Problem (2) may not admit a max even if the followers' game admits a pure NE for every leader's mixed strategy $x_n$.
Proof Consider a game with $n = 3$, [...]. The matrices reported in the following are the utility matrices for, respectively, the case where the leader plays action $a_3^1$ with probability 1, action $a_3^2$ with probability 1, or the strategy vector $x_3 = (1 - \rho, \rho)$ for some $\rho \in [0, 1]$ (the third matrix is the convex combination of the first two with weights $x_3$): [...] In the optimistic case, one can verify that $(a_1^1, a_2^2, a_3^2)$ is the unique O-SPNE (as it achieves the largest leader's payoff in $U_3$, no mixed strategy $x_3$ would yield a better utility).
In the pessimistic case, the leader induces the followers' game in the third matrix by playing $x_3 = (1 - \rho, \rho)$. For $\rho < \frac{1}{2}$, $(a_1^1, a_2^2)$ is the unique NE, giving the leader a utility of $5 + 5\rho$. For $\rho \geq \frac{1}{2}$, there are two NEs, $(a_1^1, a_2^2)$ and $(a_1^2, a_2^1)$, with a utility of, respectively, $5 + 5\rho$ and 1. Since, in the pessimistic case, the latter is selected, we conclude that the leader's utility is equal to $5 + 5\rho$ for $\rho < \frac{1}{2}$ and to 1 for $\rho \geq \frac{1}{2}$ (see Fig. 1 for an illustration). Thus, Problem (2) admits a supremum of value $5 + \frac{5}{2}$, but not a maximum. We remark that the result in Proposition 3 is in line with a similar result shown in [34] for the single-follower case, as well as with those which hold for general pessimistic bilevel problems [35].
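The behaviour of the leader's pessimistic utility in this example can be reproduced directly from the piecewise expression derived in the proof. The sketch below hard-codes that expression (it does not rebuild the followers' game) and shows how a strategy with additive loss at most $\alpha$ w.r.t. the supremum can be picked by moving slightly to the left of $\rho = \frac{1}{2}$. The names `f` and `alpha_approximate_rho` are illustrative and not taken from the paper.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
SUPREMUM = Fraction(15, 2)  # 5 + 5/2, approached as rho tends to 1/2 from below


def f(rho):
    """Leader's pessimistic utility in the example, as a function of rho."""
    return 5 + 5 * rho if rho < HALF else Fraction(1)


def alpha_approximate_rho(alpha):
    """Pick rho < 1/2 with SUPREMUM - f(rho) <= alpha, for alpha > 0."""
    delta = min(alpha / 5, HALF)  # keeps rho within [0, 1/2)
    return HALF - delta


alpha = Fraction(1, 10)
rho = alpha_approximate_rho(alpha)
print(rho, f(rho), SUPREMUM - f(rho))
print(f(HALF))  # the value drops to 1 exactly at rho = 1/2
```

No value of $\rho$ attains 15/2, yet any positive loss $\alpha$ can be met, which is the sense in which the problem admits a supremum but not a maximum.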
The relevance of computing a pessimistic SPNE is highlighted by the following proposition: [...] and the NE $(a_1^1, a_2^2)$ for $\rho = 0$ (with leader's utility 1). Therefore, the game admits a unique O-SPNE achieved at $\rho = 0$ (utility 1), and a unique P-SPNE achieved at $\rho = 1$ (utility $\frac{4}{\mu}$). See Fig. 2 for an illustration of the leader's utility function.
To show the first part of the claim, it suffices to observe that the ratio between the leader's utility in the unique O-SPNE, which is equal to 1, and that in a P-SPNE, which is equal to $\frac{4}{\mu}$, becomes arbitrarily large when letting $\mu \to \infty$, whereas the difference between these two quantities approaches 1 for $\mu \to \infty$.
As to the second part of the claim, after perturbing the value that $x_3$ takes in the unique O-SPNE by any arbitrarily small $\epsilon > 0$ (i.e., by considering the leader's [...]), whose ratio w.r.t. the utility of 1 in the unique O-SPNE becomes again arbitrarily large for $\mu \to \infty$, whereas the difference between these two quantities approaches 1 for $\mu \to \infty$.

Computational Complexity
Let P-SPNE-s be the search version of the problem of computing a P-SPNE for normal-form games. In Sect. 4.1, we show that solving P-SPNE-s is NP-hard for $n \geq 3$ (i.e., with at least two followers). Moreover, in Sect. 4.2 we prove that for $n \geq 4$ (i.e., for games with at least three followers) the problem is inapproximable, in polynomial time, to within any polynomial multiplicative factor or to within any constant additive loss unless P = NP. We introduce two reductions, a non-approximation-preserving one which is valid for $n \geq 3$ and another one which is only valid for $n \geq 4$ but is approximation-preserving.
In decision form, the problem of computing a P-SPNE reads:

Definition 1 (P-SPNE-d) Given a normal-form game with $n \geq 3$ players and a finite number $K$, is there a P-SPNE in which the leader achieves a utility greater than or equal to $K$?
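In Sect. 4.1, we show that P-SPNE-d is NP-complete by polynomially reducing Independent Set (IND-SET) (one of Karp's original 21 NP-complete problems [16]) to it. In decision form, IND-SET reads:

Definition 2 (IND-SET-d) Given an undirected graph $G = (V, E)$ and a positive integer $J$, does $G$ contain an independent set (i.e., a subset of vertices $V' \subseteq V$ such that $\{u, v\} \notin E$ for all $u, v \in V'$) of size greater than or equal to $J$?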
In Sect. 4.2, we prove the inapproximability of P-SPNE-s for the case with at least three followers by polynomially reducing 3-SAT (another of Karp's 21 NP-complete problems [16]) to it. 3-SAT reads:

Definition 3 (3-SAT) Given a collection $C = \{\phi_1, \dots, \phi_t\}$ of clauses (disjunctions of literals) on a finite set $V$ of Boolean variables with $|\phi_c| = 3$ for $1 \leq c \leq t$, is there a truth assignment for $V$ which satisfies all the clauses in $C$?

NP-Completeness
Before presenting our reduction, we introduce the following class of normal-form games:

Definition 4 Given two rational numbers $b$ and $c$ with $1 > c > b > 0$ and an integer $r \geq 1$, let $\Gamma_b^c(r)$ be a class of normal-form games with three players ($n = 3$), the first two having $r + 1$ actions each with action sets $A_1 = A_2 = A = \{1, \dots, r, \chi\}$ and the third one having $r$ actions with action set $A_3 = A \setminus \{\chi\}$, such that, for every third player's action $a_3 \in A \setminus \{\chi\}$, the other players play a game where:
- the payoffs on the main diagonal (where both players play the same action) satisfy [...]
No restrictions are imposed on the third player's payoffs. See Fig. 3 for an illustration of one such game $\Gamma_b^c(r)$ with $r = 3$, parametric in $b$ and $c$.
The special feature of $\Gamma_b^c(r)$ games is that, no matter which mixed strategy the third player (the leader) commits to, only the diagonal outcomes other than $(\chi, \chi)$ can be pure NEs in the resulting followers' game. Moreover, for every subset of diagonal outcomes there is a leader's strategy such that this subset precisely corresponds to the set of all pure NEs in the followers' game. This is formally stated by the following proposition:

Proposition 5 [...] $S \subseteq \{(a_1, a_1) : a_1 \in A \setminus \{\chi\}\}$ with $S \neq \emptyset$, a leader's strategy $x_3 \in \Delta_3$ such that the outcomes $(a_1, a_1) \in S$ are exactly the pure NEs in the resulting followers' game.
Proof First, observe that the followers' payoffs that are not on the main diagonal are independent of the leader's strategy $x_3$. Thus, any outcome $(a_1, a_2)$ with $a_1, a_2 \in A \setminus \{\chi\}$ and $a_1 \neq a_2$ cannot be an NE, as the first follower would deviate by playing action $\chi$ so as to obtain a utility $c > b$. Analogously, any outcome $(\chi, a_2)$ with $a_2 \in A \setminus \{\chi\}$ cannot be an NE because the second follower would deviate by playing $\chi$ (since $b > 0$). The same holds for any outcome $(a_1, \chi)$ with $a_1 \in A \setminus \{\chi\}$, since the second follower would be better off playing another action (as $b > 0$). The last outcome on the diagonal, $(\chi, \chi)$, cannot be an NE either, as the first follower would deviate from it (as she would get $c$ in it, while she can obtain $1 > c$ by deviating).
As a result, the only outcomes which can be pure NEs are those in $\{(a_1, a_1) : a_1 \in A \setminus \{\chi\}\}$. When the leader plays a pure strategy $a_3 \in A \setminus \{\chi\}$, the unique pure NE in the followers' game is $(a_3, a_3)$ as, due to providing the followers with their maximum payoff, they would not deviate from it. Outcomes $(a_1, a_1)$ with $a_1 \in A \setminus \{\chi, a_3\}$ are not NEs as, with them, the first follower would get $0 < c$. In general, if the leader plays an arbitrary mixed strategy $x_3 \in \Delta_3$, the resulting followers' game is such that the payoffs in $(a_3, a_3)$ with $a_3 \in A \setminus \{\chi\}$ are $(x_3^{a_3}, x_3^{a_3})$. Noticing that $(a_3, a_3)$ is an equilibrium if and only if $x_3^{a_3} \geq c$ (as, otherwise, the first follower would deviate by playing action $\chi$), we conclude that the set of pure NEs in the followers' game is $\{(a_3, a_3) : a_3 \in A \setminus \{\chi\}, \ x_3^{a_3} \geq c\}$.

[Fig. 3: a $\Gamma_b^c(r)$ game with $r = 3$, parametric in $b$ and $c$. The third player (the leader) selects a matrix, while the first and the second players (the followers) select rows and columns, respectively. The third player's payoffs are defined starting from the graph in Fig. 5, as explained in the proof of Theorem 1.]
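In order to guarantee that, for every possible $S \subseteq \{(a_1, a_1) : a_1 \in A \setminus \{\chi\}\}$ with $S \neq \emptyset$, there is a leader's strategy such that $S$ contains all the pure NEs of the followers' game, we must properly choose the value of $c$. Choosing $c \le \frac{1}{r}$ suffices, as, for any set $S$, the leader's strategy $x_3 \in \Delta_3$ such that $x_3^{a_3} = \frac{1}{|S|}$ for every $a_3 \in A \setminus \{\chi\}$ with $(a_3, a_3) \in S$ induces a followers' game in which all the outcomes in $S$ are NEs (since $\frac{1}{|S|} \ge \frac{1}{r} \ge c$), while no other diagonal outcome is (as the corresponding actions are played with probability $0 < c$).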
Notice that the followers' game always admits a pure NE for any leader's commitment. As shown in Fig. 4 for $r = 3$, the leader's strategy space $\Delta_3$ is partitioned into $2^r - 1$ regions, each corresponding to a subset of $\{(a_1, a_1) : a_1 \in A \setminus \{\chi\}\}$ containing those diagonal outcomes which are the only pure NEs in the followers' game. Hence, in a $\Gamma_b^c(r)$ game with $c \le \frac{1}{r}$, the number of combinations of outcomes which may constitute the set of pure NEs in the followers' game is exponential in $r$ and, thus, in the size of the game instance.
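The partition described above can be checked numerically. The following is a minimal sketch, assuming $c = \frac{1}{r}$ and indexing the leader's $r$ actions from 0; the helper names `diagonal_pure_nes` and `uniform_on` are hypothetical and only encode the condition $x_3^{a_3} \ge c$ derived in the proof.

```python
from fractions import Fraction
from itertools import combinations


def diagonal_pure_nes(x3, c):
    """Indices a such that the diagonal outcome (a, a) is a pure NE, i.e. x3[a] >= c."""
    return {a for a, prob in enumerate(x3) if prob >= c}


def uniform_on(subset, r):
    """Leader's strategy placing probability 1/|S| on every action in S and 0 elsewhere."""
    return [Fraction(1, len(subset)) if a in subset else Fraction(0) for a in range(r)]


r = 3
c = Fraction(1, r)  # assumption: any 0 < c <= 1/r would do
realised = set()
for size in range(1, r + 1):
    for subset in combinations(range(r), size):
        nes = diagonal_pure_nes(uniform_on(set(subset), r), c)
        assert nes == set(subset)  # the uniform strategy on S induces exactly S
        realised.add(frozenset(nes))

print(len(realised))  # 7 = 2**3 - 1 distinct sets of pure NEs
```

Exact rational arithmetic is used so that the boundary case $x_3^{a_3} = c$ is not affected by rounding.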
Relying on Proposition 5, we can establish the following result:

Theorem 1 P-SPNE-d is strongly NP-complete even for $n = 3$.
Proof For the sake of clarity, we split the proof over multiple steps.
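Mapping Given an instance of IND-SET-d, i.e., an undirected graph $G = (V, E)$ and a positive integer $J$, we construct a special instance $\Gamma(G)$ of P-SPNE-d of class $\Gamma_b^c(r)$ as follows. Assuming an arbitrary labelling of the vertices $\{v_1, \dots, v_r\}$, each action $a \in A \setminus \{\chi\}$ is associated with vertex $v_a$. In compliance with Definition 4, in which no constraints are specified for the leader's payoffs, we define:
- for every $a_1, a_2 \in A \setminus \{\chi\}$ with $a_1 \neq a_2$: $U_3^{a_1 a_1 a_2} = U_3^{a_2 a_2 a_1} = \frac{-1-c}{c}$ if $\{v_{a_1}, v_{a_2}\} \in E$, and $U_3^{a_1 a_1 a_2} = U_3^{a_2 a_2 a_1} = 1$ otherwise;
- for every $a_3 \in A \setminus \{\chi\}$: $U_3^{a_3 a_3 a_3} = 0$ and $U_3^{\chi \chi a_3} = 0$;
- for every $a_3 \in A \setminus \{\chi\}$ and for every $a_1, a_2 \in A$ with $a_1 \neq a_2$: $U_3^{a_1 a_2 a_3} = U_3^{a_2 a_1 a_3} = 0$.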
As an example, Fig. 5 illustrates an instance of IND-SET-d from which the game depicted in Fig. 3 is obtained by applying our reduction. Finally, let $K = \frac{J-1}{J}$. Note that this transformation can be carried out in time polynomial in the number of vertices $|V| = r$. W.l.o.g., we assume that the graph $G$ contains no isolated vertices. Indeed, it is always possible to remove all the isolated vertices from $G$ (in polynomial time), solve the problem on the residual graph, and then add the isolated vertices back to the independent set that has been found, still obtaining an independent set.
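If. We show that, if the graph $G$ contains an independent set of size greater than or equal to $J$, then $\Gamma(G)$ admits a P-SPNE with leader's utility greater than or equal to $K$. Let $V^*$ be an independent set with $|V^*| = J$. Consider the case in which the outcomes $(a_1, a_1)$ with $v_{a_1} \in V^*$ are the only pure NEs in the followers' game, and assume that the leader's strategy $x_3$ is $x_3^{a_3} = \frac{1}{|V^*|}$ if $v_{a_3} \in V^*$ and $x_3^{a_3} = 0$ otherwise. Since, by construction, $U_3^{a_1 a_1 a_3} = 1$ for all $a_3 \in A \setminus \{\chi, a_1\}$ with $v_{a_3} \in V^*$, the leader's utility at an equilibrium $(a_1, a_1)$ is:

$$\sum_{a_3 \in A \setminus \{\chi, a_1\}} U_3^{a_1 a_1 a_3} x_3^{a_3} = \frac{|V^*| - 1}{|V^*|} = \frac{J-1}{J} = K.$$

Only if. We show that, if $\Gamma(G)$ admits a P-SPNE with leader's utility greater than or equal to $K$, then $G$ contains an independent set of size greater than or equal to $J$. Due to Proposition 5, at any P-SPNE the leader plays a strategy $\bar{x}_3$ inducing a set of pure NEs in the followers' game corresponding to $S^* := \{(a_3, a_3) : a_3 \in A \setminus \{\chi\},\ \bar{x}_3^{a_3} \ge c\}$. We now show that the leader would never play two actions $a_1, a_2 \in A \setminus \{\chi\}$ with $\{v_{a_1}, v_{a_2}\} \in E$ with probability greater than or equal to $c$ in a P-SPNE. By contradiction, assume that the leader's equilibrium strategy $\bar{x}_3$ is such that $\bar{x}_3^{a_1}, \bar{x}_3^{a_2} \ge c$. When the followers play the equilibrium $(a_1, a_1)$ (the same holds for $(a_2, a_2)$), the leader's utility is:

$$\sum_{a_3 \in A \setminus \{\chi\}} U_3^{a_1 a_1 a_3} \bar{x}_3^{a_3} = \sum_{a_3 \in A \setminus \{\chi, a_1, a_2\}} U_3^{a_1 a_1 a_3} \bar{x}_3^{a_3} + U_3^{a_1 a_1 a_2} \bar{x}_3^{a_2}.$$

In the right-hand side, the first term is $< 1$ (as the leader's payoffs are $\le 1$ and $\bar{x}_3^{a_1}, \bar{x}_3^{a_2} \ge c > 0$), while the second term is at most $\frac{-1-c}{c}\, c = -1 - c$, which is strictly less than $-1$. It follows that, since $(a_1, a_1)$ (or, equivalently, $(a_2, a_2)$) always provides the leader with a negative utility, she would never play $\bar{x}_3$ in an equilibrium. This is because, by playing a pure strategy, she would obtain a utility of at least zero (as the followers' game admits a unique pure NE giving her a zero payoff when she plays a pure strategy). As a result, we have $U_3^{a_1 a_1 a_3} = 1$ for every pair of distinct actions $a_1, a_3$ with $(a_1, a_1), (a_3, a_3) \in S^*$.

Note that, in any equilibrium $(a_1, a_1) \in S^*$, the leader's utility is:

$$\sum_{a_3 \in A \setminus \{\chi\}} U_3^{a_1 a_1 a_3} \bar{x}_3^{a_3} = \sum_{a_3 \neq a_1 :\ \bar{x}_3^{a_3} \ge c} U_3^{a_1 a_1 a_3} \bar{x}_3^{a_3} + \sum_{a_3 :\ \bar{x}_3^{a_3} < c} U_3^{a_1 a_1 a_3} \bar{x}_3^{a_3},$$

where, in the first summation in the right-hand side, each payoff $U_3^{a_1 a_1 a_3}$ is equal to 1 (as $\bar{x}_3^{a_1} \ge c$ and $\bar{x}_3^{a_3} \ge c$).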
We show that the same holds for each payoff $U_3^{a_1 a_1 a_3}$ appearing in the second summation. By contradiction, assume that there exists an action $a_3 \in A \setminus \{\chi\}$ such that $\bar{x}_3^{a_3} < c$ and $U_3^{a_1 a_1 a_3} = \frac{-1-c}{c}$ for some equilibrium $(a_1, a_1) \in S^*$. By shifting all the probability that $\bar{x}_3$ places on $a_3$ to actions $a_1$ such that $(a_1, a_1) \in S^*$ (so that $\bar{x}_3^{a_3} = 0$), we obtain a new leader's strategy which induces the same set $S^*$ of pure NEs in the followers' game. Moreover, the leader's utility in any equilibrium $(a_1, a_1) \in S^*$ strictly increases if $U_3^{a_1 a_1 a_3} = \frac{-1-c}{c}$, while it stays the same when $U_3^{a_1 a_1 a_3} = 1$. This contradicts the fact that $\bar{x}_3$ is a P-SPNE. Thus, all the actions $a_3 \in A \setminus \{\chi\}$ such that $\bar{x}_3^{a_3} < c$ satisfy $U_3^{a_1 a_1 a_3} = 1$ for every $(a_1, a_1) \in S^*$.

As a result, the leader's utility at an equilibrium $(a_3, a_3) \in S^*$ is $1 - \bar{x}_3^{a_3}$. Since, due to the pessimistic assumption, the leader maximizes her utility in the worst NE, her best choice is to select an $\bar{x}_3$ such that all NEs yield the same utility, that is, $1 - \bar{x}_3^{a_1} = 1 - \bar{x}_3^{a_2}$ for every $a_1, a_2$ with $(a_1, a_1), (a_2, a_2) \in S^*$. This results in the leader playing all actions $a_3$ such that $(a_3, a_3) \in S^*$ with the same probability $\bar{x}_3^{a_3} = \frac{1}{|S^*|}$, obtaining a utility of $\frac{|S^*| - 1}{|S^*|} = K$. Therefore, the vertices in the set $\{v_{a_3} : (a_3, a_3) \in S^*\}$ form an independent set of $G$ of size $|S^*| = J$. The reduction is, thus, complete.
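The two directions of the reduction can be illustrated on a small graph. The sketch below is not the paper's construction verbatim: it assumes $c = \frac{1}{r}$, assumes that the leader's diagonal payoffs are $0$ for $a_3 = a_1$, $\frac{-1-c}{c}$ for adjacent vertices and $1$ otherwise, and uses a hypothetical 6-cycle as the IND-SET instance. The function names are illustrative.

```python
from fractions import Fraction


def leader_diag_payoff(a1, a3, edges, c):
    """Assumed U_3^{a1 a1 a3}: 0 if a3 == a1, (-1 - c)/c if {v_a1, v_a3} is an edge, 1 otherwise."""
    if a1 == a3:
        return Fraction(0)
    if frozenset((a1, a3)) in edges:
        return (-1 - c) / c
    return Fraction(1)


def pessimistic_utility(x3, edges, c):
    """Leader's utility in the worst diagonal NE (a1, a1), i.e. over all a1 with x3[a1] >= c.

    Assumes c <= 1/r, so that at least one diagonal outcome is an NE.
    """
    equilibria = [a for a, prob in enumerate(x3) if prob >= c]
    return min(
        sum(leader_diag_payoff(a1, a3, edges, c) * prob for a3, prob in enumerate(x3))
        for a1 in equilibria
    )


# Hypothetical instance: the cycle on 6 vertices, with c = 1/r.
r = 6
c = Fraction(1, r)
edges = {frozenset((i, (i + 1) % r)) for i in range(r)}

# Uniform strategy on the independent set {v_0, v_2, v_4} (J = 3).
independent = [Fraction(1, 3) if a in (0, 2, 4) else Fraction(0) for a in range(r)]
# Uniform strategy on the adjacent vertices {v_0, v_1}.
adjacent = [Fraction(1, 2) if a in (0, 1) else Fraction(0) for a in range(r)]

print(pessimistic_utility(independent, edges, c))  # 2/3, i.e. (J - 1)/J with J = 3
print(pessimistic_utility(adjacent, edges, c))     # -7/2, negative as argued in the proof
```

Under these assumptions, the uniform strategy on an independent set of size $J$ attains exactly $K = \frac{J-1}{J}$, while placing probability at least $c$ on two adjacent vertices drives the worst-case utility below zero.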
NP membership Given a triple $(a_1, a_2, x_3)$ which is encoded with a number of bits which is polynomial w.r.t. the size of the game, we can verify in polynomial time whether $(a_1, a_2)$ is an NE in the followers' game induced by $x_3$ and whether, when playing $(a_1, a_2, x_3)$, the leader's utility is at least as large as $K$. The existence of such a triple follows as a consequence of the correctness of either of the two equilibrium-finding algorithms that we propose in Sect. 6 (we refer the reader to Sect. 6.2 for a discussion on this). Therefore, we deduce that P-SPNE-d belongs to NP. Moreover, since in the game of the reduction the players' payoffs are encoded with a polynomial number of bits and due to IND-SET being strongly NP-complete, P-SPNE-d is strongly NP-complete.

Inapproximability
We show now that P-SPNE-s (the search problem of computing a P-SPNE) is not only NP-hard (due to its decision version, P-SPNE-d, being NP-complete), but it is also difficult to approximate. Since the reduction from IND-SET which we gave in Theorem 1 is not approximation-preserving, we propose a new one based on 3-SAT (see Definition 3). We remark that, differently from our previous reduction (which holds for any number of followers greater than or equal to two), this one requires at least three followers.
In the following, given a literal $l$ (an occurrence of a variable, possibly negated), we define $v(l)$ as its corresponding variable. Moreover, for a generic clause $\phi = l_1 l_2 l_3$ we denote by $L_\phi$ the ordered set of possible truth assignments to the variables $v(l_1), v(l_2), v(l_3)$, where, in each truth assignment, a variable is set to 1 if positive and to 0 if negative. Given a generic 3-SAT instance, we build a corresponding normal-form game as detailed in the following definition.

Definition 5 Given a 3-SAT instance with a collection of clauses $C = \{\phi_1, \dots, \phi_t\}$ over a set of variables $V = \{v_1, \dots, v_r\}$, the game $\Gamma(C, V)$ has four players, the fourth of which is the leader, with actions $A_4 = \{1, \dots, r\} \cup \{w\}$. Each action $a_4 \in \{1, \dots, r\}$ is associated with variable $v_{a_4}$. The other players share the same set of actions $A$, with $A = A_1 = A_2 = A_3 = \{\varphi_{ca} : c \in \{1, \dots, t\},\ a \in \{1, \dots, 8\}\} \cup \{\chi\}$, where each action $\varphi_{ca}$ is associated with one of the eight possible assignments of truth to the variables appearing in clause $\phi_c$, so that $\varphi_{ca}$ corresponds to the $a$-th assignment in the ordered set $L_{\phi_c}$. For each player $p \in \{1, 2, 3\}$, we define her utilities as follows:
- for each $a_4 \in A_4 \setminus \{w\}$ and for each $a_1 \in A \setminus \{\chi\}$ with $a_1 = \varphi_{ca} = l_1 l_2 l_3$, $U_p^{a_1 a_1 a_1 a_4} = 1$ if $v(l_p) = v_{a_4}$ and $l_p$ is a positive literal or $v(l_p) \neq v_{a_4}$ and $l_p$ is negative;
- for each $a_4 \in A_4 \setminus \{w\}$ and for each […]

The payoff matrix of the fourth player is so defined:
- for each $a_4 \in A_4$ and for each $a_1 \in A \setminus \{\chi\}$ with $a_1 = \varphi_{ca} = l_1 l_2 l_3$, $U_4^{a_1 a_1 a_1 a_4} = \varepsilon$ if the truth assignment identified by $\varphi_{ca}$ makes $\phi_c$ false (i.e., whenever, for each $p \in \{1, 2, 3\}$, the clause $\phi_c$ contains the negation of $l_p$), while $U_4^{a_1 a_1 a_1 a_4} = 1$ otherwise;
- for each $a_4 \in A_4$ and for each $a_1, a_2, a_3 \in A$ such that $a_1 \neq a_2 \vee a_2 \neq a_3 \vee a_1 \neq a_3$, with the addition of the triple $(\chi, \chi, \chi)$, $U_4^{a_1 a_2 a_3 a_4} = 0$.
Games adhering to Definition 5 have some interesting properties, which we formally state in the following Propositions 6 and 7 .
First, we give a characterization of the strategy space of the leader in terms of the set of pure NEs in the followers' game. In particular, given a game $\Gamma(C, V)$, the leader's strategy space $\Delta_4$ is partitioned according to the boundaries $x_4^{a_4} = \frac{1}{r+1}$, for $a_4 \in A_4 \setminus \{w\}$, by which $\Delta_4$ is split into $2^r$ regions, each corresponding to a possible truth assignment to the variables in $V$. Specifically, in the assignment corresponding to a region, variable $v_{a_4}$ takes value 1 if $x_4^{a_4} \ge \frac{1}{r+1}$, while it takes value 0 if $x_4^{a_4} \le \frac{1}{r+1}$. Moreover, for each $a_1 \in A \setminus \{\chi\}$ with $a_1 = \varphi_{ca}$, the outcome $(a_1, a_1, a_1)$ is an NE in the followers' game only in the regions of the leader's strategy space whose corresponding truth assignment is compatible with the one represented by $\varphi_{ca}$. For instance, if $\varphi_{ca} = \bar{v}_1 v_2 v_3$, the corresponding outcome is an NE only if $x_4^1 \le \frac{1}{r+1}$, $x_4^2 \ge \frac{1}{r+1}$, and $x_4^3 \ge \frac{1}{r+1}$ (with no further restrictions on the other probabilities). Formally, we can claim the following:

Proposition 6 Given a game $\Gamma(C, V)$ and an action $a_1 \in A \setminus \{\chi\}$ with $a_1 = \varphi_{ca} = l_1 l_2 l_3$, the outcome $(a_1, a_1, a_1)$ is an NE of the followers' game whenever the leader commits to a strategy $x_4 \in \Delta_4$ such that, for each $p \in \{1, 2, 3\}$ and for the action $a_4$ with $v(l_p) = v_{a_4}$: $x_4^{a_4} \ge \frac{1}{r+1}$ if $l_p$ is a positive literal, and $x_4^{a_4} \le \frac{1}{r+1}$ if $l_p$ is a negative literal.

Proof Observe that the followers' payoffs do not depend on the leader's strategy $x_4$ in the outcomes not in $\{(a_1, a_1, a_1) : a_1 \in A \setminus \{\chi\}\}$. Thus, for every $a_1, a_2, a_3 \in A \setminus \{\chi\}$ such that $a_1 \neq a_2 \vee a_2 \neq a_3 \vee a_1 \neq a_3$, the outcome $(a_1, a_2, a_3)$ cannot be an NE as the first follower would deviate by playing action $\chi$, obtaining a utility at least as large as $\frac{1}{r+1}$, instead of $\frac{1}{r+2}$. Also, for all $a_2, a_3 \in A \setminus \{\chi\}$ the outcome $(\chi, a_2, a_3)$ is not an NE since the second follower would be better off playing $\chi$ (as she gets $1 > 0$). Analogously, for all $a_1, a_3 \in A \setminus \{\chi\}$ the outcome $(a_1, \chi, a_3)$ cannot be an NE as the third follower would deviate to $\chi$ (getting a utility of $1 > 0$).
For all $a_3 \in A$, a similar argument also applies to the outcome $(\chi, \chi, a_3)$, as the first follower would have an incentive to deviate by playing any action different from $\chi$ (note that $(\chi, \chi, \chi)$, whose payoffs are defined in the last item of Definition 5, is included). Moreover, for all $a_1 \in A \setminus \{\chi\}$ the outcome $(a_1, \chi, \chi)$ is not an NE as the second follower would deviate to any other action (getting a utility of 1). For all $a_1, a_2 \in A \setminus \{\chi\}$, the same holds for the outcome $(a_1, a_2, \chi)$, where the first follower would deviate and play action $\chi$, and for the outcome $(\chi, a_2, \chi)$ where, for all $a_2 \in A \setminus \{\chi\}$, the second follower would deviate and play $\chi$.
Therefore, the only outcomes which can be NEs in the followers' game are those in $\{(a_1, a_1, a_1) : a_1 \in A \setminus \{\chi\}\}$. Assume that the leader commits to an arbitrary mixed strategy $x_4 \in \Delta_4$. For each $a_1 \in A \setminus \{\chi\}$ with $a_1 = \varphi_{ca} = l_1 l_2 l_3$ and for each $p \in \{1, 2, 3\}$, the outcome $(a_1, a_1, a_1)$ provides follower $p$ with a utility of $u_p$ such that:
- $u_p = x_4^{a_4}$ if $v(l_p) = v_{a_4}$ and $l_p$ is a positive literal;
- $u_p = 1 - x_4^{a_4}$ if $v(l_p) = v_{a_4}$ and $l_p$ is a negative literal.

The outcome $(a_1, a_1, a_1)$ is an NE if the following conditions hold:
- $u_p \ge \frac{1}{r+1}$ for each $p \in \{1, 2, 3\}$ such that $l_p$ is positive, as otherwise follower $p$ would deviate and play $\chi$;
- $u_p \ge \frac{r}{r+1}$ for each $p \in \{1, 2, 3\}$ such that $l_p$ is negative, as otherwise follower $p$ would deviate and play $\chi$.

The claim is proven by these conditions, together with the definition of $u_p$.
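The equilibrium conditions used in the proof translate directly into a membership test. The sketch below assumes a hypothetical encoding in which an action $\varphi_{ca}$ is given as three pairs `(variable_index, is_positive)` over distinct variables, and `x4` lists the probabilities of the $r$ variable actions only (the probability of $w$ is implied); the function name is illustrative.

```python
from fractions import Fraction


def is_assignment_outcome_ne(literals, x4, r):
    """Check the conditions u_p >= 1/(r+1) (positive literal) or u_p >= r/(r+1) (negative).

    literals: three (variable_index, is_positive) pairs describing phi_ca = l1 l2 l3.
    x4: probabilities the leader places on the r variable actions.
    """
    threshold = Fraction(1, r + 1)
    for var, positive in literals:
        # Utility of the follower associated with this literal in (a1, a1, a1).
        u = x4[var] if positive else 1 - x4[var]
        bound = threshold if positive else 1 - threshold
        if u < bound:
            return False  # this follower would deviate to chi
    return True


r = 3
x4 = [Fraction(1, 2), Fraction(0), Fraction(1, 4)]  # the remaining 1/4 is placed on w

# v1 positive, v2 negated, v3 positive: compatible with the region containing x4.
print(is_assignment_outcome_ne([(0, True), (1, False), (2, True)], x4, r))   # True
# v1 negated: incompatible, since x4[0] = 1/2 > 1/(r+1).
print(is_assignment_outcome_ne([(0, False), (1, False), (2, True)], x4, r))  # False
```

Note that a point lying exactly on a boundary $x_4^{a_4} = \frac{1}{r+1}$ passes the test for both polarities of the corresponding literal, which matches the nonstrict inequalities in the conditions above.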
The characterization of the leader's strategy space given in Proposition 6 establishes the relationship between the leader's utility in a P-SPNE of a game $\Gamma(C, V)$ and the feasibility of the corresponding 3-SAT instance. We highlight it in the following proposition.

Proposition 7 Given a game $\Gamma(C, V)$, the leader's utility in a P-SPNE is 1 if and only if the corresponding 3-SAT instance is feasible, and it is equal to $\varepsilon$ otherwise.
Proof The result follows from Proposition 6. If the 3-SAT instance is a YES instance (i.e., if it is feasible), then there exists a strategy $x_4 \in \Delta_4$ such that all the NEs of the resulting followers' game provide the leader with a utility of 1. This is because there is a region corresponding to a truth assignment which satisfies all the clauses. On the other hand, if the 3-SAT instance is a NO instance (i.e., if it is not satisfiable), then in each region of the leader's strategy space there exists an NE for the followers' game which provides the leader with a utility of $\varepsilon$. Therefore, the followers would always play such an equilibrium due to the assumption of pessimism.
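Proposition 7 can be illustrated by brute force on tiny instances. The sketch below is an exponential-time illustration only, not an algorithm from the paper: it assumes that, in the region associated with a truth assignment, the leader's worst NE payoff is $\varepsilon$ if some clause is falsified and 1 otherwise, and it encodes clauses as lists of hypothetical `(variable_index, is_positive)` pairs.

```python
from fractions import Fraction
from itertools import product


def clause_satisfied(clause, assignment):
    """A clause holds if at least one literal agrees with the assignment."""
    return any(assignment[var] == positive for var, positive in clause)


def worst_case_utility(clauses, assignment, eps):
    """Leader's payoff in the worst NE of the region associated with `assignment`."""
    return min(1 if clause_satisfied(cl, assignment) else eps for cl in clauses)


def leader_value(clauses, r, eps):
    """Best worst-case payoff over the 2^r regions (brute force, for illustration)."""
    return max(
        worst_case_utility(clauses, assignment, eps)
        for assignment in product([False, True], repeat=r)
    )


r = 3
eps = Fraction(1, 2 ** (r + 1))  # some value in (0, 1/2^r)

satisfiable = [[(0, True), (1, True), (2, True)], [(0, False), (1, True), (2, False)]]
# All eight sign patterns over three variables: every assignment falsifies one clause.
unsatisfiable = [
    [(0, s0), (1, s1), (2, s2)] for s0, s1, s2 in product([False, True], repeat=3)
]

print(leader_value(satisfiable, r, eps))    # 1
print(leader_value(unsatisfiable, r, eps))  # 1/16
```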
We are now ready to state the result.

Theorem 2 With $n \ge 4$ and unless P = NP, P-SPNE-s cannot be approximated in polynomial time to within any multiplicative factor which is polynomial in the size of the normal-form game given as input, nor (assuming the payoffs are normalized between 0 and 1) to within any constant additive loss strictly smaller than 1.
Proof Given a generic 3-SAT instance, let us build its corresponding game $\Gamma(C, V)$ according to Definition 5. This construction can be done in polynomial time as $|A_4| = r + 1$ and $|A| = |A_1| = |A_2| = |A_3| = 8t + 1$ are polynomials in $r$ and $t$, and, therefore, the number of outcomes in $\Gamma(C, V)$ is polynomial in $r$ and $t$. Furthermore, let us select $\varepsilon \in \left(0, \frac{1}{2^r}\right)$ (the polynomiality of the reduction is preserved as $\frac{1}{2^r}$ is representable in binary encoding with a polynomial number of bits).
By contradiction, let us assume that there exists a polynomial-time approximation algorithm $\mathcal{A}$ capable of constructing a solution to the problem of computing a P-SPNE with a multiplicative approximation factor $\frac{1}{\mathrm{poly}(I)}$, where $\mathrm{poly}(I)$ is any polynomial function of the size $I$ of the normal-form game given as input. By Proposition 7, it follows that, when applied to $\Gamma(C, V)$, $\mathcal{A}$ would return an approximate solution with value greater than or equal to $1 \cdot \frac{1}{\mathrm{poly}(I)} > \frac{1}{2^r}$ (for a sufficiently large $r$) if and only if the 3-SAT instance is feasible. When the 3-SAT instance is not satisfiable, $\mathcal{A}$ would return a solution with value at most $\varepsilon < \frac{1}{2^r}$. Since this would provide us with a solution to 3-SAT in polynomial time, we conclude that P-SPNE-s cannot be approximated in polynomial time to within any polynomial multiplicative factor unless P = NP.
For the additive case, observe that an algorithm $\mathcal{A}$ with a constant additive loss $\ell < 1$ would return a solution of value at least $1 - \ell$ for feasible 3-SAT instances and a solution of value at most $\frac{1}{2^r}$ for infeasible ones. As, for any $\ell < 1 - \frac{1}{2^r}$, this algorithm would allow us to decide in polynomial time whether the 3-SAT instance is feasible or not (a contradiction unless P = NP), we deduce $\ell \ge 1 - \frac{1}{2^r}$. Since $\frac{1}{2^r} \to 0$ for $r \to \infty$, this implies $\ell \ge 1$, a contradiction.
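The gap exploited in both parts of the proof can be summarised as follows. This is only a restatement of the argument in display form, where $\mathrm{val}(\mathcal{A})$ is a hypothetical shorthand for the value of the solution returned by the approximation algorithm and $\ell$ denotes the additive loss:

```latex
\begin{align*}
\text{multiplicative case:}\quad
  & \text{3-SAT feasible} \;\Rightarrow\; \mathrm{val}(\mathcal{A}) \ge \frac{1}{\mathrm{poly}(I)} > \frac{1}{2^{r}},
  && \text{3-SAT infeasible} \;\Rightarrow\; \mathrm{val}(\mathcal{A}) \le \varepsilon < \frac{1}{2^{r}}; \\
\text{additive case:}\quad
  & \text{3-SAT feasible} \;\Rightarrow\; \mathrm{val}(\mathcal{A}) \ge 1 - \ell,
  && \text{3-SAT infeasible} \;\Rightarrow\; \mathrm{val}(\mathcal{A}) \le \varepsilon < \frac{1}{2^{r}}.
\end{align*}
```

For instance, with $r = 10$ the threshold is $\frac{1}{1024}$, so an additive loss of $\ell = 0.9$ would still separate the two cases, since $1 - \ell = 0.1 > \frac{1}{1024}$.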

Single-Level Reformulation and Restriction
In this section, we propose a single-level reformulation of the problem admitting a supremum but, in general, not a maximum, and a corresponding restriction which always admits optimal (restricted) solutions.
For notational simplicity, we consider the case with $n = 3$ players. Although notationally more involved, the generalization to $n \ge 3$ is straightforward. With only two followers, Problem (2), i.e., the bilevel programming formulation we gave in Sect. 3.2, reads:

Single-Level Reformulation
In order to cast Problem (3) into a single-level problem, we first introduce the following reformulation of the followers' problem:

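Lemma 1 The following MILP, parametric in $x_3$, is an exact reformulation of the followers' problem of finding a pure NE which minimizes the leader's utility given a leader's strategy $x_3$:

$$
\begin{aligned}
\min_{y} \quad & \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} y^{a_1 a_2} \sum_{a_3 \in A_3} U_3^{a_1 a_2 a_3} x_3^{a_3} && \text{(4a)} \\
\text{s.t.} \quad & \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} y^{a_1 a_2} = 1 && \text{(4b)} \\
& y^{a_1 a_2} \sum_{a_3 \in A_3} \left( U_1^{a_1 a_2 a_3} - U_1^{a_1' a_2 a_3} \right) x_3^{a_3} \ge 0 \quad \forall a_1 \in A_1,\ a_2 \in A_2,\ a_1' \in A_1 && \text{(4c)} \\
& y^{a_1 a_2} \sum_{a_3 \in A_3} \left( U_2^{a_1 a_2 a_3} - U_2^{a_1 a_2' a_3} \right) x_3^{a_3} \ge 0 \quad \forall a_1 \in A_1,\ a_2 \in A_2,\ a_2' \in A_2 && \text{(4d)} \\
& y^{a_1 a_2} \in \{0, 1\} \quad \forall a_1 \in A_1,\ a_2 \in A_2. && \text{(4e)}
\end{aligned}
$$

Proof Note that, in Problem (3), a solution to the followers' problem satisfies $x_1^{a_1} = x_2^{a_2} = 1$ for some $(a_1, a_2) \in A_1 \times A_2$, with $x_1^{a_1'} = 0$ for all $a_1' \neq a_1$ and $x_2^{a_2'} = 0$ for all $a_2' \neq a_2$. Problem (4) encodes this in terms of the variable $y^{a_1 a_2}$ by imposing $y^{a_1 a_2} = 1$ if and only if $(a_1, a_2)$ is a pessimistic NE. Let us look at this in detail.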
Due to Constraints (4b) and (4e), y a 1 a 2 is equal to 1 for one and only one pair (a 1 , a 2 ).
Due to Constraints (4c) and (4d), for all $(a_1, a_2)$ such that $y^{a_1 a_2} = 1$ there can be no action $a_1' \in A_1$ (respectively, $a_2' \in A_2$) by which the first follower (respectively, the second follower) could obtain a better payoff when assuming that the other follower would play action $a_2$ (respectively, action $a_1$). This guarantees that $(a_1, a_2)$ is an NE. Also note that Constraints (4c) and (4d) boil down to the tautology $0 \ge 0$ for any $(a_1, a_2) \in A_1 \times A_2$ with $y^{a_1 a_2} = 0$.
By minimizing the objective function (which corresponds to the leader's utility), a pessimistic pure NE is found.
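For small games, the problem encoded by the MILP can be solved by plain enumeration, which makes the role of each constraint explicit. The sketch below is not the MILP itself: it enumerates all pairs $(a_1, a_2)$, keeps those with no profitable unilateral deviation, and returns one minimizing the leader's utility. Payoffs are stored as nested lists `U[a1][a2][a3]`, and the 2x2x2 game is a hypothetical example.

```python
from fractions import Fraction


def expected(U, a1, a2, x3):
    """Expected payoff in the outcome (a1, a2) when the leader plays the mixed strategy x3."""
    return sum(U[a1][a2][a3] * prob for a3, prob in enumerate(x3))


def pessimistic_pure_ne(U1, U2, U3, x3):
    """Return (leader_utility, (a1, a2)) for a pure NE minimizing the leader's utility.

    Returns None if the followers' game induced by x3 admits no pure NE.
    """
    m1, m2 = len(U1), len(U1[0])
    best = None
    for a1 in range(m1):
        for a2 in range(m2):
            # No profitable deviation for either follower: nonnegative regrets.
            stable_1 = all(expected(U1, a1, a2, x3) >= expected(U1, d, a2, x3) for d in range(m1))
            stable_2 = all(expected(U2, a1, a2, x3) >= expected(U2, a1, d, x3) for d in range(m2))
            if stable_1 and stable_2:
                value = expected(U3, a1, a2, x3)
                if best is None or value < best[0]:
                    best = (value, (a1, a2))
    return best


# Hypothetical game: the followers play a coordination game, whatever the leader does.
U1 = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]
U2 = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]
U3 = [[[4, 0], [0, 0]], [[0, 0], [0, 2]]]

x3 = [Fraction(1, 2), Fraction(1, 2)]
print(pessimistic_pure_ne(U1, U2, U3, x3))  # (Fraction(1, 1), (1, 1))
```

In this example both $(0, 0)$ and $(1, 1)$ are NEs for every $x_3$, and the followers select $(1, 1)$ at $x_3 = (\frac{1}{2}, \frac{1}{2})$ because it gives the leader 1 instead of 2.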
To arrive at a single-level reformulation of Problem (3), we rely on linear programming duality to restate Problem (4) in terms of optimality conditions which do not employ the min operator. First, we show the following:

Lemma 2 The linear programming relaxation of Problem (4) always admits an optimal integer solution.
Proof Let us focus on Constraints (4c) and analyse, for all $(a_1, a_2) \in A_1 \times A_2$ and $a_1' \in A_1$, the coefficient $\sum_{a_3 \in A_3} (U_1^{a_1 a_2 a_3} - U_1^{a_1' a_2 a_3}) x_3^{a_3}$ which multiplies $y^{a_1 a_2}$. The coefficient is equal to the regret the first player would suffer from by playing action $a_1'$ instead of $a_1$. If it is equal to 0, we have the tautology $0 \ge 0$. If the regret is positive, after dividing both sides of the constraint by $\sum_{a_3 \in A_3} (U_1^{a_1 a_2 a_3} - U_1^{a_1' a_2 a_3}) x_3^{a_3}$ we obtain $y^{a_1 a_2} \ge 0$, which is subsumed by the nonnegativity of $y^{a_1 a_2}$. If the regret is negative, after dividing both sides of the constraint again by $\sum_{a_3 \in A_3} (U_1^{a_1 a_2 a_3} - U_1^{a_1' a_2 a_3}) x_3^{a_3}$ we obtain $y^{a_1 a_2} \le 0$, which implies $y^{a_1 a_2} = 0$. A similar reasoning applies to Constraints (4d).
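Let us now define $O$ as the set of pairs $(a_1, a_2)$ such that there is at least an action $a_1'$ or $a_2'$ for which one of the followers suffers from a strictly negative regret. We have $O := \{(a_1, a_2) \in A_1 \times A_2 : \exists a_1' \in A_1 \text{ with } \sum_{a_3 \in A_3} (U_1^{a_1 a_2 a_3} - U_1^{a_1' a_2 a_3}) x_3^{a_3} < 0 \ \vee\ \exists a_2' \in A_2 \text{ with } \sum_{a_3 \in A_3} (U_2^{a_1 a_2 a_3} - U_2^{a_1 a_2' a_3}) x_3^{a_3} < 0\}$. Relying on $O$, Problem (4) can be rewritten as:

$$
\begin{aligned}
\min_{y} \quad & \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} y^{a_1 a_2} \sum_{a_3 \in A_3} U_3^{a_1 a_2 a_3} x_3^{a_3} \\
\text{s.t.} \quad & \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} y^{a_1 a_2} = 1 \\
& y^{a_1 a_2} = 0 \quad \forall (a_1, a_2) \in O \\
& y^{a_1 a_2} \in \{0, 1\} \quad \forall a_1 \in A_1,\ a_2 \in A_2.
\end{aligned}
$$

All variables $y^{a_1 a_2}$ with $(a_1, a_2) \in O$ can be discarded. We obtain a problem with a single constraint imposing that the sum of all the $y^{a_1 a_2}$ variables with $(a_1, a_2) \notin O$ be equal to 1. As the problem is a minimization, its linear programming relaxation always admits an optimal solution with $y^{a_1 a_2} = 1$ for the pair $(a_1, a_2) \notin O$ which achieves the smallest value of $\sum_{a_3 \in A_3} U_3^{a_1 a_2 a_3} x_3^{a_3}$ (ties can be broken arbitrarily), and with $y^{a_1 a_2} = 0$ otherwise.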
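As a consequence of Lemma 2, the following can be established:

Proof By relying on Lemma 2, we first introduce the linear programming dual of the linear programming relaxation of Problem (4). Thanks to Constraint (4b), $y^{a_1 a_2} \in \{0, 1\}$ can be relaxed w.l.o.g. into $y^{a_1 a_2} \in \mathbb{Z}_+$ for all $a_1 \in A_1$, $a_2 \in A_2$. This way, we do not have to introduce a dual variable for each of the constraints $y^{a_1 a_2} \le 1$ which would be introduced when relaxing $y^{a_1 a_2} \in \{0, 1\}$ into $y^{a_1 a_2} \in [0, 1]$. Letting $\alpha$, $\beta_1^{a_1 a_2 a_1'}$, and $\beta_2^{a_1 a_2 a_2'}$ be the dual variables of, respectively, Constraints (4b), (4c), and (4d), the dual reads:

$$
\begin{aligned}
\max_{\alpha, \beta_1, \beta_2} \quad & \alpha \\
\text{s.t.} \quad & \alpha + \sum_{a_1' \in A_1} \beta_1^{a_1 a_2 a_1'} \sum_{a_3 \in A_3} \left( U_1^{a_1 a_2 a_3} - U_1^{a_1' a_2 a_3} \right) x_3^{a_3} + \sum_{a_2' \in A_2} \beta_2^{a_1 a_2 a_2'} \sum_{a_3 \in A_3} \left( U_2^{a_1 a_2 a_3} - U_2^{a_1 a_2' a_3} \right) x_3^{a_3} \le \sum_{a_3 \in A_3} U_3^{a_1 a_2 a_3} x_3^{a_3} \quad \forall a_1 \in A_1,\ a_2 \in A_2 \\
& \alpha \text{ free}, \quad \beta_1^{a_1 a_2 a_1'} \ge 0, \quad \beta_2^{a_1 a_2 a_2'} \ge 0.
\end{aligned}
$$

A set of optimality conditions for Problem (4) can then be derived by simultaneously imposing primal and dual feasibility for the sets of primal and dual variables (by imposing the respective constraints) and equating the objective functions of the two problems.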
The dual variable α can then be removed by substituting it by the primal objective function, leading to Constraints (5e).
The result in the claim is obtained after introducing the leader's utility as objective function and then casting the resulting problem as a maximization problem (in which a supremum is sought).
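The duality argument can be checked numerically on a small instance. The sketch below assumes that the followers' problem has the form "minimize the leader's utility over pure NEs, subject to regret constraints" and that the followers' game admits at least one pure NE for the given $x_3$; it builds one dual-feasible solution whose objective $\alpha$ matches the primal optimum. The function names and the 2x2x2 game are hypothetical, and the multipliers it returns are one valid choice among many.

```python
from fractions import Fraction


def expected(U, a1, a2, x3):
    """Expected payoff in the outcome (a1, a2) under the leader's mixed strategy x3."""
    return sum(U[a1][a2][a3] * prob for a3, prob in enumerate(x3))


def regrets(U1, U2, a1, a2, x3):
    """Coefficients multiplying y^{a1 a2} in the NE constraints, one per deviation."""
    m1, m2 = len(U1), len(U1[0])
    r1 = [expected(U1, a1, a2, x3) - expected(U1, d, a2, x3) for d in range(m1)]
    r2 = [expected(U2, a1, a2, x3) - expected(U2, a1, d, x3) for d in range(m2)]
    return r1 + r2


def dual_certificate(U1, U2, U3, x3):
    """Return (alpha, beta): alpha is the primal optimum, beta one multiplier per pair."""
    m1, m2 = len(U1), len(U1[0])
    pairs = [(a1, a2) for a1 in range(m1) for a2 in range(m2)]
    reg = {p: regrets(U1, U2, p[0], p[1], x3) for p in pairs}
    cost = {p: expected(U3, p[0], p[1], x3) for p in pairs}
    # Primal optimum: smallest leader's utility over pairs with no negative regret.
    alpha = min(cost[p] for p in pairs if min(reg[p]) >= 0)
    beta = {}
    for p in pairs:
        worst = min(reg[p])
        # A single multiplier on the most negative regret suffices for dual feasibility.
        beta[p] = max(Fraction(0), (alpha - cost[p]) / -worst) if worst < 0 else Fraction(0)
        assert alpha + worst * beta[p] <= cost[p]  # dual constraint of the pair p
    return alpha, beta


# Hypothetical game: coordination among the followers; (0, 1) is bad for the leader.
U1 = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]
U2 = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]
U3 = [[[4, 0], [-5, -5]], [[0, 0], [0, 2]]]

alpha, beta = dual_certificate(U1, U2, U3, [Fraction(1, 2), Fraction(1, 2)])
print(alpha, beta[(0, 1)], beta[(1, 0)])  # 1 6 1
```

In this construction each multiplier is inversely proportional to the magnitude of the negative regret it is attached to, so it grows without bound as that regret approaches zero from below.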
Since, as shown in Proposition 3, the problem of computing a P-SPNE in a normal-form game may only admit a supremum but not a maximum, the same must hold for Problem (5) due to its correctness (established in Theorem 3).
We formally highlight this property in the following proposition, showing in the proof how this can manifest in terms of the variables of the formulation.

Proposition 8 In the general case, Problem (5) may not admit a finite optimal solution.
Proof Consider the game introduced in the proof of Proposition 3 and let $x_3 = (1 - \rho, \rho)$ for $\rho \in [0, 1]$. Adopting, for convenience, the notation $(a_1^1, a_2^1) = (1, 1)$, $(a_1^1, a_2^2) = (1, 2)$, $(a_1^2, a_2^1) = (2, 1)$, and $(a_1^2, a_2^2) = (2, 2)$, Constraints (5e) read: Note that the left-hand sides of the four constraints are all equal to the objective function (i.e., to the leader's utility). Let us consider the case $\rho < 0.5$ for which, as shown in the proof of Proposition 3, $(1, 2)$ is the unique pure NE in the followers' game. Let $\rho = 0.5 - \epsilon$ for $\epsilon \in (0, 0.5]$ (which is equivalent to assuming $\rho < 0.5$). The outcome $(1, 2)$ is obtained by letting $y^{12} = 1$ and $y^{11} = y^{21} = y^{22} = 0$, for which the left-hand sides of the four constraints become equal to $7.5 - 5\epsilon$. Note that such a value converges to the supremum as $\epsilon \to 0$. For this choice of $y$ and $\rho$, the constraints read: As $\epsilon \to 0$, we have a finite lower bound for $\beta_2^{111}$ and $\beta_1^{221}$, but we also have $\beta_1^{211} + \beta_2^{212} \ge \frac{6.5 - 5\epsilon}{\epsilon} \to \infty$, which prevents $\beta_1^{211}$ and $\beta_2^{212}$ from taking a finite value. With a similar argument, one can verify that there is no other way of achieving an objective function value approaching 7.5 as, for $\rho \ge 0.5$, the third constraint in the original system imposes an upper bound on the objective function value of 1.

A Restricted Single-Level (MILP) Formulation
As state-of-the-art numerical optimization solvers usually rely on the boundedness of their variables when tackling a problem, due to the result in Proposition 8 solving the single-level formulation in Problem (5) may be numerically impossible.
We consider, here, the option of introducing an upper bound of $M$ on both $\beta_1^{a_1 a_2 a_1'}$ and $\beta_2^{a_1 a_2 a_2'}$, for all $a_1 \in A_1$, $a_2 \in A_2$, $a_1' \in A_1$, $a_2' \in A_2$. Due to the continuity of the objective function, this suffices to obtain a formulation which, although being a restriction of the original one, always admits a maximum (over the reals) as a consequence of Weierstrass' extreme-value theorem. Quite conveniently, this restricted reformulation can be cast as an MILP, as we now show.

Theorem 4 One can obtain an exact MILP reformulation of Problem (5) when $\beta_1^{a_1 a_2 a_1'} \le M$ and $\beta_2^{a_1 a_2 a_2'} \le M$ hold for all $a_1 \in A_1$, $a_2 \in A_2$, $a_1' \in A_1$, $a_2' \in A_2$, and a restricted one when these bounds are not valid.
Proof After introducing the variable $z^{a_1 a_2 a_3}$, each bilinear product $y^{a_1 a_2} x_3^{a_3}$ in Problem (5) can be linearised by substituting $z^{a_1 a_2 a_3}$ for it and introducing the McCormick envelope constraints [24], which are sufficient to guarantee $z^{a_1 a_2 a_3} = y^{a_1 a_2} x_3^{a_3}$ if $y^{a_1 a_2}$ takes binary values [1].
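The exactness of the McCormick envelope for a binary factor is easy to verify directly. The sketch below assumes the bounds $y \in [0, 1]$ and $x \in [0, 1]$, for which the four envelope constraints are $z \ge 0$, $z \ge x + y - 1$, $z \le x$, and $z \le y$; the function name `mccormick_interval` is hypothetical.

```python
from fractions import Fraction


def mccormick_interval(y, x):
    """Values of z allowed by z >= 0, z >= x + y - 1, z <= x, z <= y (bounds in [0, 1])."""
    lower = max(Fraction(0), x + y - 1)
    upper = min(x, y)
    return lower, upper


grid = [Fraction(k, 4) for k in range(5)]  # x in {0, 1/4, 1/2, 3/4, 1}

# With binary y, the interval collapses to the single point z = y * x.
for y in (0, 1):
    for x in grid:
        assert mccormick_interval(y, x) == (y * x, y * x)

# With fractional y, the envelope is only a relaxation of z = y * x.
print(mccormick_interval(Fraction(1, 2), Fraction(1, 2)))  # (Fraction(0, 1), Fraction(1, 2))
```

This is why the linearisation is exact in the MILP, where $y^{a_1 a_2}$ is binary, but would be only a relaxation if $y^{a_1 a_2}$ were allowed to take fractional values.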
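Assuming $\beta_1^{a_1 a_2 a_1'} \in [0, M]$ for each $a_1 \in A_1$, $a_2 \in A_2$, $a_1' \in A_1$, we can restrict ourselves to $\beta_1^{a_1 a_2 a_1'} \in \{0, M\}$ (and, analogously, to $\beta_2^{a_1 a_2 a_2'} \in \{0, M\}$). This is the case also in the dual (reported in the proof of Theorem 3). Indeed, the dual problem asks for solving the following problem:

$$\max_{\beta_1, \beta_2 \ge 0}\ \min_{(a_1, a_2) \in A_1 \times A_2} \left\{ \sum_{a_3 \in A_3} U_3^{a_1 a_2 a_3} x_3^{a_3} - \sum_{a_1' \in A_1} \beta_1^{a_1 a_2 a_1'} \sum_{a_3 \in A_3} \left( U_1^{a_1 a_2 a_3} - U_1^{a_1' a_2 a_3} \right) x_3^{a_3} - \sum_{a_2' \in A_2} \beta_2^{a_1 a_2 a_2'} \sum_{a_3 \in A_3} \left( U_2^{a_1 a_2 a_3} - U_2^{a_1 a_2' a_3} \right) x_3^{a_3} \right\}.$$

The min operator ranges over functions (one for each pair $(a_1, a_2) \in A_1 \times A_2$) defined on disjoint domains (the $\beta_1$, $\beta_2$ variables contained in each such function are not contained in any of the other ones). Therefore, we can w.l.o.g. set the value of $\beta_1$ and $\beta_2$ so that each function be individually maximized. For each $(a_1, a_2) \in A_1 \times A_2$, this is achieved by setting, for each $a_1' \in A_1$ (resp., $a_2' \in A_2$), $\beta_1^{a_1 a_2 a_1'}$ (resp., $\beta_2^{a_1 a_2 a_2'}$) to its upper bound of $M$ if the regret it multiplies is negative, and to its lower bound of 0 otherwise.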
We can, therefore, introduce the variable $p$ […]

The restriction induced by $M$ is explained as follows. Assume that those upper bounds are introduced into Problem (5). If $M$ is not large enough for the chosen $x_3$ (remember that, as shown in Proposition 8, one may need $M \to \infty$ for $x_3$ approaching a discontinuity point of the leader's utility function), Constraints (5e) may remain active for some $(\hat{a}_1, \hat{a}_2)$ which is not an NE for the chosen $x_3$. Let $(a_1, a_2)$ be the worst-case NE the followers would play and assume that the right-hand side of Constraint (5e) for $(\hat{a}_1, \hat{a}_2)$ is strictly smaller than the utility the leader would obtain if the followers played the NE $(a_1, a_2)$, where that right-hand side is $\sum_{a_3 \in A_3} U_3^{\hat{a}_1 \hat{a}_2 a_3} x_3^{a_3} - \sum_{a_1' \in A_1} \beta_1^{\hat{a}_1 \hat{a}_2 a_1'} \sum_{a_3 \in A_3} (U_1^{\hat{a}_1 \hat{a}_2 a_3} - U_1^{a_1' \hat{a}_2 a_3}) x_3^{a_3} - \sum_{a_2' \in A_2} \beta_2^{\hat{a}_1 \hat{a}_2 a_2'} \sum_{a_3 \in A_3} (U_2^{\hat{a}_1 \hat{a}_2 a_3} - U_2^{\hat{a}_1 a_2' a_3}) x_3^{a_3}$. Letting $y^{a_1 a_2} = 1$, this constraint would be violated (as, with that value of $y$, the left-hand side of the constraint would be $\sum_{a_3 \in A_3} U_3^{a_1 a_2 a_3} x_3^{a_3}$, which we assumed to be strictly larger than the right-hand side). This forces the choice of a different $x_3$ for which the upper bound of $M$ on $\beta_1^{a_1 a_2 a_1'}$ and $\beta_2^{a_1 a_2 a_2'}$ is sufficiently large not to cause the same issue with the worst-case NE corresponding to that $x_3$, thus restricting the set of strategies the leader could play.
In spite of this, by solving the MILP reformulation outlined in Theorem 4 we are always guaranteed to find optimal (restricted) solutions to it (if M is large enough for the restricted problem to admit feasible solutions). Such solutions correspond to feasible strategies of the leader, guaranteeing her a lower bound on her utility at a P-SPNE.

Exact Algorithm
In this section, we propose an exact exponential-time algorithm for the computation of a P-SPNE, i.e., of $\sup_{x_n \in \Delta_n} f(x_n)$, which does not suffer from the shortcomings of the formulations we introduced in the previous section. In particular, if there is no $x_n \in \Delta_n$ where the leader's utility $f(x_n)$ attains $\sup_{x_n \in \Delta_n} f(x_n)$ (as $f(x_n)$ does not admit a maximum), our algorithm also returns, together with the supremum, a strategy $\hat{x}_n$ which provides the leader with a utility equal to an $\alpha$-approximation (in the additive sense) of the supremum, namely, a strategy $\hat{x}_n$ satisfying $\sup_{x_n \in \Delta_n} f(x_n) - f(\hat{x}_n) \le \alpha$ for any additive loss $\alpha > 0$ chosen a priori. We first introduce a version of the algorithm based on explicit enumeration, in Sect. 6.1, which we then embed into a branch-and-bound scheme in Sect. 6.3.
In the remainder of the section, we denote the closure of a set $X \subseteq \Delta_n$ relative to $\mathrm{aff}(\Delta_n)$ by $\overline{X}$, its boundary relative to $\mathrm{aff}(\Delta_n)$ by $\mathrm{bd}(X)$, and its complement relative to $\Delta_n$ by $X^c$. Note that, here, $\mathrm{aff}(\Delta_n)$ denotes the affine hull of $\Delta_n$, i.e., the hyperplane in $\mathbb{R}^m$ containing $\Delta_n$.

Computing $\sup_{x_n \in \Delta_n} f(x_n)$
The key ingredient of our algorithm is what we call outcome configurations. Letting $A_F = \times_{p \in F} A_p$, we say that a pair $(S^+, S^-)$ with $S^+ \subseteq A_F$ and $S^- = A_F \setminus S^+$ is an outcome configuration for a given $x_n \in \Delta_n$ if, in the followers' game induced by $x_n$, all the followers' action profiles $a_{-n} \in S^+$ constitute an NE and all the action profiles $a_{-n} \in S^-$ do not.
For every $a_{-n} \in A_F$, we define $X(a_{-n})$ as the set of all leader's strategies $x_n \in \Delta_n$ for which $a_{-n}$ is an NE in the followers' game induced by $x_n$. Formally, $X(a_{-n})$ corresponds to the following (closed) polytope:

$$X(a_{-n}) := \left\{ x_n \in \Delta_n : \sum_{a_n \in A_n} U_p^{a_{-n}, a_n} x_n^{a_n} \ge \sum_{a_n \in A_n} U_p^{a_{-n}', a_n} x_n^{a_n} \quad \forall p \in F,\ a_p' \in A_p \setminus \{a_p\}, \text{ with } a_{-n}' = (a_1, \dots, a_{p-1}, a_p', a_{p+1}, \dots, a_{n-1}) \right\}.$$

For every $a_{-n} \in A_F$, we also introduce the set $X^c(a_{-n})$ of all $x_n \in \Delta_n$ for which $a_{-n}$ is not an NE. For that purpose, we first define the following set for each $p \in F$ and $a_p' \in A_p \setminus \{a_p\}$:

$$D_p(a_{-n}, a_p') := \left\{ x_n \in \Delta_n : \sum_{a_n \in A_n} U_p^{a_{-n}, a_n} x_n^{a_n} < \sum_{a_n \in A_n} U_p^{a_{-n}', a_n} x_n^{a_n}, \text{ with } a_{-n}' = (a_1, \dots, a_{p-1}, a_p', a_{p+1}, \dots, a_{n-1}) \right\}.$$

The set $D_p(a_{-n}, a_p')$, which is neither an open nor a closed polytope (as it has a missing facet, the one corresponding to its strict inequality), is the set of all values of $x_n$ for which player $p$ would achieve a better utility by deviating from $a_{-n}$ and playing a different action $a_p' \in A_p$. For every $p \in F$, $a_{-n} \in A_F$, and $a_p' \in A_p$, we call the corresponding set $D_p(a_{-n}, a_p')$ degenerate if $U_p^{a_{-n}, a_n} = U_p^{a_{-n}', a_n}$ for each $a_n \in A_n$ (recall that $a_{-n}' = (a_1, \dots, a_{p-1}, a_p', a_{p+1}, \dots, a_{n-1})$). In a degenerate $D_p(a_{-n}, a_p')$, the constraint $\sum_{a_n \in A_n} U_p^{a_{-n}, a_n} x_n^{a_n} < \sum_{a_n \in A_n} U_p^{a_{-n}', a_n} x_n^{a_n}$ reduces to $0 < 0$. Since, in principle, any player could deviate from $a_{-n}$ by playing any action not in $a_{-n}$, $X^c(a_{-n})$ is the following disjunctive set:

$$X^c(a_{-n}) := \bigcup_{p \in F}\ \bigcup_{a_p' \in A_p \setminus \{a_p\}} D_p(a_{-n}, a_p').$$

Notice that, since any point in $\mathrm{bd}(X^c(a_{-n}))$ which is not in $\mathrm{bd}(\Delta_n)$ would satisfy, for some $a_p'$, the (originally strict) inequality of $D_p(a_{-n}, a_p')$ as an equation, such a point is not in $X^c(a_{-n})$ and, hence, $\mathrm{bd}(X^c(a_{-n})) \cap X^c(a_{-n}) \subseteq \mathrm{bd}(\Delta_n)$. The closure $\overline{X^c(a_{-n})}$ of $X^c(a_{-n})$ is obtained by discarding any degenerate $D_p(a_{-n}, a_p')$ and by turning the strict constraint in the definition of each nondegenerate $D_p(a_{-n}, a_p')$ into a nonstrict one.
Note that degenerate sets are discarded as, for such sets, turning their strict inequality into a nonstrict one would result in turning the empty set $D_p(a_{-n}, a_p')$ (whose closure is the empty set) into $\Delta_n$. An illustration of $X(a_{-n})$ and $X^c(a_{-n})$, together with the closure $\overline{X^c(a_{-n})}$ of the latter, is reported in Fig. 6.
By leveraging these definitions, we can now focus on the set of all leader's strategies which realize the outcome configuration $(S^+, S^-)$, namely:

$$X(S^+) \cap X(S^-), \quad \text{with } X(S^+) := \bigcap_{a_{-n} \in S^+} X(a_{-n}) \text{ and } X(S^-) := \bigcap_{a_{-n} \in S^-} X^c(a_{-n}).$$

In general, $X(S^+) \cap X(S^-)$ is not an open nor a closed set. Due to $X(S^+)$ being closed, the only points of $\mathrm{bd}(X(S^+) \cap X(S^-))$ which are not in $X(S^+) \cap X(S^-)$ itself are the very points in $\mathrm{bd}(X(S^-))$ which are not in $X(S^-)$. As a consequence, $\overline{X(S^+) \cap X(S^-)} = X(S^+) \cap \overline{X(S^-)}$.

Let us define the set $P := \{(S^+, S^-) : S^+ \in 2^{A_F} \wedge S^- = A_F \setminus S^+\}$, which contains all the outcome configurations of the game. The following theorem highlights the structure of $f(x_n)$, suggesting an iterative way of expressing the problem of computing $\sup_{x_n \in \Delta_n} f(x_n)$. We will rely on it when designing our algorithm.
Theorem 5 Let $\psi(x_n; S^+) := \min_{a_{-n} \in S^+} \sum_{a_n \in A_n} U_n^{a_{-n}, a_n} x_n^{a_n}$. The following holds:

$$\sup_{x_n \in \Delta_n} f(x_n) = \max_{(S^+, S^-) \in P :\ X(S^+) \cap X(S^-) \neq \emptyset}\ \ \max_{x_n \in X(S^+) \cap \overline{X(S^-)}} \psi(x_n; S^+).$$

Proof Let $\Delta_n'$ be the set of leader's strategies $x_n$ for which there exists a pure NE in the followers' game induced by $x_n$, namely, $\Delta_n' := \{x_n \in \Delta_n : f(x_n) > -\infty\}$. Since, by definition, $f(x_n) = -\infty$ for any $x_n \notin \Delta_n'$ and the supremum of $f(x_n)$ is finite due to the finiteness of the payoffs (and assuming the followers' game admits at least a pure NE for some $x_n \in \Delta_n$), we can, w.l.o.g., focus on $\Delta_n'$ and solve $\sup_{x_n \in \Delta_n'} f(x_n)$. In particular, the collection of the sets $X(S^+) \cap X(S^-) \neq \emptyset$ which are obtained for all $(S^+, S^-) \in P$ forms a partition of $\Delta_n'$. Due to the fact that at any $x_n \in X(S^+) \cap X(S^-)$ the only pure NEs induced by $x_n$ in the followers' game are those in $S^+$, $f(x_n) = \psi(x_n; S^+)$. Since the supremum of a function defined over a set is equal to the largest of the suprema of that function over the subsets of such a set, we have:

$$\sup_{x_n \in \Delta_n'} f(x_n) = \max_{(S^+, S^-) \in P :\ X(S^+) \cap X(S^-) \neq \emptyset}\ \ \sup_{x_n \in X(S^+) \cap X(S^-)} \psi(x_n; S^+).$$

What remains to show is that the following relationship holds for all $X(S^+) \cap X(S^-) \neq \emptyset$:

$$\sup_{x_n \in X(S^+) \cap X(S^-)} \psi(x_n; S^+) = \max_{x_n \in X(S^+) \cap \overline{X(S^-)}} \psi(x_n; S^+).$$

Since $\psi(x_n; S^+)$ is a continuous function (it is the point-wise minimum of finitely many continuous functions), its supremum over $X(S^+) \cap X(S^-)$ equals its maximum over the closure $\overline{X(S^+) \cap X(S^-)}$ of that set. Hence, the relationship follows due to $\overline{X(S^+) \cap X(S^-)} = X(S^+) \cap \overline{X(S^-)}$.

In particular, Theorem 5 shows that $f(x_n)$ is a piecewise function with a piece for each set $X(S^+) \cap X(S^-)$, each of which corresponds to the (continuous over its domain) piecewise-affine function $\psi(x_n; S^+)$. It follows that the only discontinuities of $f(x_n)$ (due to which $f(x_n)$ may admit a supremum but not a maximum) are those where, in $\Delta_n'$, $x_n$ transitions from a set $X(S^+) \cap X(S^-)$ to another one.
We show how to translate the formula in Theorem 5 into an algorithm by proving the following theorem:

Theorem 6 There exists a finite, exponential-time algorithm which computes $\sup_{x_n \in \Delta_n} f(x_n)$ and, whenever $\sup_{x_n \in \Delta_n} f(x_n) = \max_{x_n \in \Delta_n} f(x_n)$, also returns a strategy $x_n^*$ with $f(x_n^*) = \max_{x_n \in \Delta_n} f(x_n)$.

Proof The algorithm relies on the expression given in Theorem 5. All pairs $(S^+, S^-) \in P$ can be constructed by enumeration in time exponential in the size of the instance. In particular, the set $P$ contains $2^{m^{n-1}}$ outcome configurations, each corresponding to a bi-partition of the outcomes of the followers' game into $S^+$ and $S^-$ (there are $m^{n-1}$ such outcomes, due to having $m$ actions and $n - 1$ followers).
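The enumeration step of the proof can be sketched in a few lines. The generator below is a hypothetical helper that lists every bi-partition $(S^+, S^-)$ of the followers' outcomes, assuming each of the $n - 1$ followers has the same $m$ actions, indexed from 0.

```python
from itertools import combinations, product


def outcome_configurations(m, n):
    """Yield every pair (S_plus, S_minus) partitioning the m**(n-1) followers' outcomes."""
    outcomes = list(product(range(m), repeat=n - 1))
    for size in range(len(outcomes) + 1):
        for s_plus in combinations(outcomes, size):
            s_minus = [o for o in outcomes if o not in s_plus]
            yield set(s_plus), set(s_minus)


configs = list(outcome_configurations(m=2, n=3))
print(len(configs))  # 16 = 2 ** (2 ** 2)
```

The count doubles with every additional followers' outcome, which is why explicit enumeration is only viable for very small games.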
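For every $p \in F$, let us define the following sets, parametric in $\epsilon \geq 0$:

$$D_p(a_{-n}, a'_p; \epsilon) := \Big\{ x_n \in \Delta_n : \sum_{a_n \in A_n} U_p^{a_{-n},a_n} x_n^{a_n} + \epsilon \leq \sum_{a_n \in A_n} U_p^{a'_{-n},a_n} x_n^{a_n} \Big\},$$

with $a'_{-n} = (a_1, \ldots, a_{p-1}, a'_p, a_{p+1}, \ldots, a_{n-1})$.

We can verify whether $X(S^+) \cap X^c(S^-) \neq \emptyset$ by verifying whether there exists some $\epsilon > 0$ such that $X(S^+) \cap X^c(S^-; \epsilon) \neq \emptyset$. This can be done by solving the following problem and checking the strict positivity of $\epsilon$ in its solution:

$$\max_{\epsilon \geq 0,\, x_n} \ \epsilon \quad \text{s.t.} \quad x_n \in X(S^+) \cap X^c(S^-; \epsilon). \tag{7}$$

In the algorithm, to verify whether $f(x_n)$ admits $\max_{x_n \in \Delta_n} f(x_n)$ (and to compute it if it does), we solve the following problem (rather than the aforementioned $\max_{x_n \in X(S^+) \cap \overline{X^c(S^-)}} \psi(x_n; S^+)$):

$$\max_{\epsilon \geq 0,\, x_n} \ \big[\psi(x_n; S^+);\ \epsilon\big] \quad \text{s.t.} \quad x_n \in X(S^+) \cap X^c(S^-; \epsilon). \tag{10}$$

This problem calls for a pair $(x_n, \epsilon)$ with $x_n \in X(S^+) \cap X^c(S^-; \epsilon)$ such that, among all pairs which maximize $\psi(x_n; S^+)$, $\epsilon$ is as large as possible. This way, in any solution $(x_n, \epsilon)$ with $\epsilon > 0$ we have $x_n \in X(S^+) \cap X^c(S^-)$ (rather than $x_n \in X(S^+) \cap \overline{X^c(S^-)}$).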
Since, there, $\psi(x_n; S^+) = f(x_n)$, we conclude that $f(x_n)$ admits a maximum (equal to the value of the supremum) if $\epsilon > 0$, whereas it only admits a supremum if $\epsilon = 0$. Problem (10) can be solved in, at most, exponential time by solving the following lex-MILP:

$$\max_{\eta,\, \epsilon \geq 0,\, x_n} \ [\eta;\ \epsilon] \quad \text{s.t.} \quad \eta \leq \sum_{a_n \in A_n} U_n^{a_{-n},a_n} x_n^{a_n} \quad \forall a_{-n} \in S^+, \tag{11}$$

together with the MILP constraints which impose $x_n \in X(S^+) \cap X^c(S^-; \epsilon)$, where $\eta$ is maximized first, and $\epsilon$ second. In practice, it suffices to solve two MILPs in sequence: one in which the first objective function is maximized, and then another one in which the second objective function is maximized after constraining the first objective function to be equal to its optimal value.
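The two-MILP procedure can be written out explicitly. In the sketch below, $\Omega(S^+, S^-)$ is shorthand introduced here (it is not the paper's notation) for the set of triples $(\eta, \epsilon, x_n)$ that satisfy all the constraints of the lex-MILP for the outcome configuration at hand.

```latex
\begin{align*}
  \eta^{*}      &= \max \bigl\{\, \eta \;:\; (\eta, \epsilon, x_n) \in \Omega(S^+, S^-) \,\bigr\}, \\
  \epsilon^{*}  &= \max \bigl\{\, \epsilon \;:\; (\eta, \epsilon, x_n) \in \Omega(S^+, S^-),\ \eta = \eta^{*} \,\bigr\}.
\end{align*}
```

The first problem fixes the best value of $\psi(x_n; S^+)$ for the configuration; the second selects, among the strategies achieving it, one with the largest margin $\epsilon$, so that $\epsilon^{*} > 0$ signals a strategy lying in $X(S^+) \cap X^c(S^-)$ rather than only in its closure.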

Finding an α-Approximate Strategy
For those cases where $f(x_n)$ does not admit a maximum, we look for a strategy $\hat{x}_n$ such that, for any given additive loss $\alpha > 0$, $\sup_{x_n \in \Delta_n} f(x_n) - f(\hat{x}_n) \leq \alpha$, i.e., for an (additively) α-approximate strategy $\hat{x}_n$. Its existence is guaranteed by the following lemma:

Lemma 3 Consider the sets $X \subseteq \mathbb{R}^n$, for some $n \in \mathbb{N}$, and $Y \subseteq \mathbb{R}$, and a function $f : X \to Y$ with $s := \sup_{x \in X} f(x)$ satisfying $s < \infty$. Then, for any $\alpha \in (0, s]$, there exists an $x \in X$ such that $s - f(x) \leq \alpha$.
Proof By negating the conclusion, we deduce the existence of some $\alpha \in (0, s]$ such that, for every $x \in X$, $s - f(x) > \alpha$. Then, $f(x) < s - \alpha$ for all $x \in X$. This implies $s = \sup_{x \in X} f(x) \leq s - \alpha < s$: a contradiction.
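A small numerical illustration of Lemma 3 may help. The function below is a made-up example (not one from the paper) whose supremum $s = 1$ over $[0, 1]$ is not attained because of a discontinuity at the right endpoint; the names `f` and `alpha_approximate` are hypothetical. Exact rational arithmetic is used so that the inequality $s - f(x) \leq \alpha$ can be checked without rounding effects.

```python
from fractions import Fraction


def f(rho: Fraction) -> Fraction:
    """Toy utility on [0, 1]: increasing up to, but excluding, rho = 1."""
    return rho if rho < 1 else Fraction(0)


SUPREMUM = Fraction(1)  # approached as rho -> 1, but f(1) = 0


def alpha_approximate(alpha: Fraction) -> Fraction:
    """Return a point whose value is within alpha of the supremum."""
    return 1 - alpha


alpha = Fraction(1, 100)
rho_hat = alpha_approximate(alpha)
loss = SUPREMUM - f(rho_hat)
```

No point attains the supremum here, yet for every $\alpha \in (0, 1]$ the point $1 - \alpha$ loses at most $\alpha$, which is the situation the lemma describes.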
After running the algorithm we outlined in the proof of Theorem 6 to compute the value of the supremum, an α-approximate strategy $\hat{x}_n$ can be computed a posteriori thanks to the following result:

Theorem 7 Assume that $f(x_n)$ does not admit a maximum over $\Delta_n$ and that, according to the formula in Theorem 5, $s := \sup_{x_n \in \Delta_n} f(x_n)$ is attained at some outcome configuration $(S^+, S^-)$. Then, an α-approximate strategy $\hat{x}_n$ can be computed for any $\alpha > 0$ in at most exponential time by solving the following MILP:

$$\max_{\epsilon \geq 0,\, x_n} \ \epsilon \quad \text{s.t.} \quad \sum_{a_n \in A_n} U_n^{a_{-n},a_n} x_n^{a_n} \geq s - \alpha \quad \forall a_{-n} \in S^+, \qquad x_n \in X(S^+) \cap X^c(S^-; \epsilon). \tag{12}$$

Proof Let $x_n^* \in X(S^+) \cap \overline{X^c(S^-)}$ be the strategy where the supremum is attained according to the formula in Theorem 5, namely, where $\psi(x_n^*; S^+) = \max_{x_n \in X(S^+) \cap \overline{X^c(S^-)}} \psi(x_n; S^+) = s$. Problem (12) calls for a solution $x_n$ of value at least $s - \alpha$ (thus, for an α-approximate strategy) belonging to $X(S^+) \cap X^c(S^-; \epsilon)$ with $\epsilon$ as large as possible, whose existence is guaranteed by Lemma 3. Let $(\hat{x}_n, \hat{\epsilon})$ be an optimal solution to Problem (12). As $\hat{\epsilon} > 0$, we have $\hat{x}_n \in X(S^+) \cap X^c(S^-)$. Thus, $f(x_n)$ is continuous at $x_n = \hat{x}_n$, implying $\psi(\hat{x}_n; S^+) = f(\hat{x}_n)$. Therefore, by playing $\hat{x}_n$ the leader achieves a utility of at least $s - \alpha$.

Outline of the Explicit Enumeration Algorithm
The complete enumerative algorithm is detailed in Algorithm 1. In the pseudocode, CheckEmptyness($S^+$, $S^-$) is a subroutine which looks for a value of $\epsilon \geq 0$ which is optimal for Problem (7), while Solve-lex-MILP($S^+$, $S^-$) is another subroutine which solves Problem (11). Note that Problem (7) may be infeasible. If this is the case, we assume that CheckEmptyness($S^+$, $S^-$) returns $\epsilon = 0$, so that the outcome configuration $(S^+, S^-)$ is discarded. Let us also observe that (in Algorithm 1) Problem (11) cannot be infeasible, as it is always solved for an outcome configuration $(S^+, S^-)$ whose corresponding Problem (7) is feasible. Due to the lexicographic nature of the algorithm, $f(x_n)$ admits a maximum if and only if the algorithm returns a solution with best.$\epsilon^* > 0$. If best.$\epsilon^* = 0$, $x_n^*$ is just a strategy where $\sup_{x_n \in \Delta_n} f(x_n)$ is attained (in the sense of Theorem 5). In the latter case, an α-approximate strategy is found by invoking the procedure Solve-MILP-approx(best.$S^+$, best.$S^-$, best_value), which solves Problem (12) on the outcome configuration (best.$S^+$, best.$S^-$) on which the supremum has been found.
In Appendix A.1, we report the illustration of the execution of Algorithm 1 on a normal-form game with two followers.

On The Polynomial Representability of P-SPNEs
The algorithm that we have presented is based on solving Problem (11) a number of times, once per outcome configuration $(S^+, S^-) \in P$.
As Problem (11) is an MILP, its solutions can be computed by a standard branch-and-bound algorithm based on solving, in an enumeration tree, a set of linear programming relaxations of Problem (11) in which the value of (some of) its binary variables is fixed to either 0 or 1. We remark that both Problem (11) and its linear programming relaxations with fixed binary variables contain a polynomial (in the size of the game) number of variables and constraints. Moreover, all the coefficients in the problem are polynomially bounded, as they are produced by adding/subtracting the players' payoffs.
Since an extreme solution of a linear programming problem can be encoded by a number of bits which is also bounded by a polynomial function of the instance size (see Lemma 8.2, page 373, in [9]), we have that any $x_n$ which (for some followers' action profile $a_{-n}$) constitutes a P-SPNE can be succinctly encoded by a polynomial number of bits. This observation completes the proof of Theorem 1, showing that P-SPNE-d belongs to NP.

Branch-and-Bound Algorithm
Clearly, computing $\sup_{x_n \in \Delta_n} f(x_n)$ with the enumerative algorithm can be impractical for any game of interesting size, as it requires the explicit enumeration of all the outcome configurations of a game, many of which will, incidentally, yield empty regions $X(S^+) \cap X^c(S^-)$. A more efficient algorithm, albeit one still running in exponential time in the worst case, can be designed by relying on a branch-and-bound scheme.

Computing $\sup_{x_n \in \Delta_n} f(x_n)$
Rather than defining $S^- = A_F \setminus S^+$, assume now $S^- \subseteq A_F \setminus S^+$. In this case, we call the corresponding pair $(S^+, S^-)$ a relaxed outcome configuration.
Starting from any followers' action profile $a_{-n} \in A_F$ with $X(a_{-n}) \neq \emptyset$, the algorithm constructs and explores, through a sequence of branching operations, two search trees, whose nodes correspond to relaxed outcome configurations. One tree accounts for the case where $a_{-n}$ is an NE and contains the relaxed outcome configuration $(S^+, S^-) = (\{a_{-n}\}, \emptyset)$ as root node. The other tree accounts for the case where $a_{-n}$ is not an NE, featuring as root node the relaxed outcome configuration $(S^+, S^-) = (\emptyset, \{a_{-n}\})$.

If $S^- \subset A_F \setminus S^+$ (which can often be the case when relaxed outcome configurations are adopted), solving $\max_{x_n \in X(S^+) \cap \overline{X^c(S^-)}} \psi(x_n; S^+)$ might not give a strategy $x_n$ for which the only pure NEs in the followers' game it induces are those in $S^+$, even if $x_n \in X(S^+) \cap X^c(S^-)$ (rather than $x_n \in X(S^+) \cap \overline{X^c(S^-)}$). This is because, due to $S^+ \cup S^- \subset A_F$, there might be another action profile, say $a'_{-n} \in A_F \setminus (S^+ \cup S^-)$, which is an NE and provides the leader with a utility strictly smaller than that corresponding to all the action profiles in $S^+$. Since, if this is the case, the followers would respond to $x_n$ by playing $a'_{-n}$ rather than any of the action profiles in $S^+$, $\max_{x_n \in X(S^+) \cap \overline{X^c(S^-)}} \psi(x_n; S^+)$ could be, in general, strictly larger than $\sup_{x_n \in \Delta_n} f(x_n)$, thus not being a valid candidate for the computation of the latter.
In order to detect whether one such $a'_{-n}$ exists, it suffices to carry out a feasibility check (on $x_n$). This corresponds to looking for a pure NE in the followers' game, different from those in $S^-$ (which may become NEs on $\mathrm{bd}(X(S^+) \cap X^c(S^-))$), which minimizes the leader's utility; this can be done by inspection in $O(m^{n-1})$. If the feasibility check returns some $a'_{-n} \notin S^+$, the branch-and-bound tree is expanded by performing a branching operation. Two nodes are introduced: a left node with $(S^+_L, S^-_L)$ where $S^+_L = S^+ \cup \{a'_{-n}\}$ and $S^-_L = S^-$ (which accounts for the case where $a'_{-n}$ is a pure NE), and a right node with $(S^+_R, S^-_R)$ where $S^+_R = S^+$ and $S^-_R = S^- \cup \{a'_{-n}\}$ (which accounts for the case where $a'_{-n}$ is not a pure NE). If, instead, $a'_{-n} \in S^+$, then $\psi(x_n; S^+)$ represents a valid candidate for the computation of $\sup_{x_n \in \Delta_n} f(x_n)$ and, thus, no further branching is needed (and $(S^+, S^-)$ is a leaf node).
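The feasibility check and the branching step described above can be sketched as follows. All names (`expected`, `is_pure_ne`, `feasibility_check`, `branch`), the dictionary-based payoff encoding, and the toy coordination game are assumptions introduced for this illustration; the sketch inspects every followers' action profile, skips those in $S^-$, and returns the pure NE that is worst for the leader.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Profile = Tuple[int, ...]                # one action index per follower
Payoff = Dict[Profile, Sequence[float]]  # profile -> payoff per leader action


def expected(payoff: Payoff, a: Profile, x_n: Sequence[float]) -> float:
    """Expected payoff at followers' profile a when the leader plays x_n."""
    return sum(u * x for u, x in zip(payoff[a], x_n))


def is_pure_ne(a: Profile, x_n: Sequence[float],
               follower_payoffs: List[Payoff], m: int) -> bool:
    """True if no follower gains from a unilateral pure deviation from a."""
    for p, payoff in enumerate(follower_payoffs):
        current = expected(payoff, a, x_n)
        for dev in range(m):
            deviated = a[:p] + (dev,) + a[p + 1:]
            if expected(payoff, deviated, x_n) > current:
                return False
    return True


def feasibility_check(x_n: Sequence[float], s_minus: FrozenSet[Profile],
                      leader_payoff: Payoff, follower_payoffs: List[Payoff],
                      m: int) -> Optional[Profile]:
    """Worst pure NE for the leader at x_n, ignoring the profiles in S-."""
    worst, worst_value = None, float("inf")
    for a in product(range(m), repeat=len(follower_payoffs)):
        if a in s_minus or not is_pure_ne(a, x_n, follower_payoffs, m):
            continue
        value = expected(leader_payoff, a, x_n)
        if value < worst_value:
            worst, worst_value = a, value
    return worst


def branch(s_plus: FrozenSet[Profile], s_minus: FrozenSet[Profile], a: Profile):
    """Left child: a is assumed to be an NE; right child: a is assumed not to be."""
    return (s_plus | {a}, s_minus), (s_plus, s_minus | {a})


# Toy instance (made up): two followers playing a coordination game whose
# payoffs do not depend on the leader; the leader has two actions.
coordination = {(0, 0): [1, 1], (0, 1): [0, 0], (1, 0): [0, 0], (1, 1): [1, 1]}
followers = [coordination, coordination]
leader = {(0, 0): [1, 0], (0, 1): [0, 0], (1, 0): [0, 0], (1, 1): [0, 1]}
x = [0.25, 0.75]

worst = feasibility_check(x, frozenset(), leader, followers, m=2)
left, right = branch(frozenset(), frozenset(), worst)
```

In the toy game both $(0, 0)$ and $(1, 1)$ are pure NEs for every leader's strategy, so the check returns whichever is worse for the leader at `x`, and `branch` produces the two children in which that profile is added to $S^+$ and to $S^-$, respectively.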
The bounding aspect of the algorithm is a consequence of the following proposition:

Proposition 9
Solving $\max_{x_n \in X(S^+) \cap \overline{X^c(S^-)}} \psi(x_n; S^+)$ for some relaxed outcome configuration $(S^+, S^-)$ gives an upper bound on the leader's utility under the assumption that all followers' action profiles in $S^+$ constitute an NE and those in $S^-$ do not.
Proof Due to $(S^+, S^-)$ being a relaxed outcome configuration, there could be outcomes not in $S^+$ which are NEs for some $x_n \in X(S^+) \cap \overline{X^c(S^-)}$. Due to $\psi(x_n; S^+)$ being defined as $\min_{a_{-n} \in S^+} \sum_{a_n \in A_n} U_n^{a_{-n},a_n} x_n^{a_n}$, ignoring any such NE at any $x_n \in X(S^+) \cap \overline{X^c(S^-)}$ can only result in the min operator considering fewer outcomes $a_{-n}$, thus overestimating $\psi(x_n; S^+)$ and, ultimately, $f(x_n)$. Thus, the claim follows.
As a consequence of Proposition 9, optimal values obtained when computing the value of $\max_{x_n \in X(S^+) \cap \overline{X^c(S^-)}} \psi(x_n; S^+)$ throughout the search tree can be used as bounds as in a standard branch-and-bound method.
Since $\max_{x_n \in X(S^+) \cap \overline{X^c(S^-)}} \psi(x_n; S^+)$ is not well-defined for nodes where $S^+ = \emptyset$, for them we solve, rather than an instance of Problem (11), a restriction of the optimistic problem (see Sect. 3) with constraints imposing that all followers' action profiles in $S^-$ are not NEs. We employ the following formulation, which we introduce directly for the lexicographic case:

$$\max_{y,\, x_n,\, \epsilon} \ \Big[ \sum_{a \in A} U_n^{a_{-n},a_n}\, y^{a_{-n}} x_n^{a_n};\ \epsilon \Big] \tag{13a}$$

where, in the constraints, $a'_{-n} = (a_1, \ldots, a_{p-1}, a'_p, a_{p+1}, \ldots, a_{n-1})$.

The problem can be turned into a lex-MILP by linearising each bilinear product $y^{a_{-n}} x_n^{a_n}$ by means of McCormick's envelope constraints and by restating Constraint (13f) as done in the MILP Constraints (8).
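For reference, the McCormick linearisation of a single bilinear product takes the following standard form. The auxiliary variable $z$ is a name introduced here for illustration, and the sketch assumes that $y$ is binary (as a variable selecting a followers' action profile would be) and that $x \in [0, 1]$ (as any component of a mixed strategy is).

```latex
\begin{align*}
  z &\le y,        & z &\le x, \\
  z &\ge x + y - 1, & z &\ge 0 .
\end{align*}
```

Under these assumptions the envelope is exact: if $y = 0$ the constraints force $z = 0$, and if $y = 1$ they force $z = x$, so $z = yx$ at every feasible point and the bilinear term can be replaced by $z$ without loss.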

Finding an α-Approximate Strategy
In the context of the branch-and-bound algorithm, an α-approximate strategy $\hat{x}_n$ cannot be found by just relying on the a posteriori procedure outlined in Theorem 7. This is because, when $(S^+, S^-)$ is a relaxed outcome configuration, there might be an action profile $a'_{-n} \in A_F \setminus (S^+ \cup S^-)$ (i.e., one not accounted for in the relaxed outcome configuration) which not only is an NE in the followers' game induced by $\hat{x}_n$, but which also provides the leader with a utility strictly smaller than $\psi(\hat{x}_n; S^+)$. If this is the case, the strategy $\hat{x}_n$ found with the procedure of Theorem 7 may return a utility arbitrarily smaller than the supremum $s$ and, in particular, smaller than $s - \alpha$.
To cope with this shortcoming and establish whether such an $a'_{-n}$ exists, we first compute $\hat{x}_n$ according to the a posteriori procedure of Theorem 7 and, then, perform a feasibility check. If we obtain an action profile $a'_{-n} \in S^+$, $\hat{x}_n$ is then an α-approximate strategy and the algorithm halts. If, instead, we obtain some $a'_{-n} \notin S^+$ for which the leader obtains a utility strictly smaller than $\psi(\hat{x}_n; S^+)$, we carry out a new branching operation, creating a left and a right child node in which $a'_{-n}$ is added to, respectively, $S^+$ and $S^-$. This procedure is then reapplied on both nodes, recursively, until a strategy $\hat{x}_n$ for which the feasibility check returns an action profile in $S^+$ is found. Such a strategy is, by construction, α-approximate.
Observe that, due to the correctness of the algorithm for the computation of the supremum, there cannot be at $x_n^*$ an NE $a'_{-n}$ worse than the worst-case one in $S^+$. If a new outcome $a'_{-n}$ becomes the worst-case NE at $\hat{x}_n$, due to the fact that it is not a worst-case NE at $x_n^*$ there must be a strategy $\tilde{x}_n$ which is a convex combination of $x_n^*$ and $\hat{x}_n$ where either $a'_{-n}$ is not an NE or, if it is, it yields a leader's utility not worse than that obtained with the worst-case NE in $S^+$. An α-approximate strategy is thus guaranteed to be found on the segment joining $\hat{x}_n$ and $x_n^*$ by applying Lemma 3 with $X$ equal to that segment. Thus, the algorithm is guaranteed to converge.

Outline of the Branch-and-Bound Algorithm
The complete outline of the branch-and-bound algorithm is detailed in Algorithm 2. F is the frontier of the two search trees, containing all nodes which have yet to be explored. Initialize() is a subprocedure which creates the root nodes of the two search trees, while pick() extracts from F the next node to be explored. FeasibilityCheck($x_n$, $S^-$) performs the feasibility check operation for the leader's strategy $x_n$, looking for the worst-case pure NE in the game induced by $x_n$ and ignoring any outcome in $S^-$. CreateNode($S^+$, $S^-$) (detailed in Algorithm 3) adds a new node to F, also computing its upper bound and the corresponding values of $x_n$ and $\epsilon$. More specifically, CreateNode($S^+$, $S^-$) performs the same operations of a generic step of the enumerative procedure in Algorithm 1 for a given $S^+$ and $S^-$, with the only difference that, here, we invoke the subprocedure Solve-lex-MILP-Opt($S^+$, $S^-$) whenever $S^+ = \emptyset$ to solve Problem (13), while we invoke Solve-lex-MILP($S^+$, $S^-$) to solve Problem (11) if $S^+ \neq \emptyset$. In the last part of the algorithm, Solve-MILP-approx(best.$S^+$, best.$S^-$, best_val) attempts to compute an α-approximate strategy as done in Algorithm 1. In case the feasibility check fails for it, we call the procedure Branch-and-Bound-approx(best.$S^+$, best.$S^-$, best.$x_n^*$), which runs a second branch-and-bound method, as described in Sect. 6.3.2, until an α-approximate solution is found.
In Appendix A.2, we report the illustration of the execution of Algorithm 2 on a normal-form game with two followers.
For MILP and BnB-α, we report the results for different values of $M$ and $\alpha$. BnB-sup and BnB-α are initialized with an outcome which results in an O-SPNE for some leader's strategy. Specifically, we add it to $S^+$ in the starting node with empty $S^-$ and to $S^-$ in the starting node with empty $S^+$. The next node to explore is always selected according to a best-bound rule.

We generate a testbed of random normal-form games with payoffs independently drawn from a uniform distribution over $[1, 100]$, using GAMUT [27]. The results are then normalized to the interval $[0, 1]$. The experiments are run on a UNIX machine with a total of 32 cores working at 2.3 GHz, equipped with 128 GB of RAM. The computations are carried out on a single thread, with a time limit of 3600 s per instance. Table 2 reports the results on games with two followers ($n = 3$) and $m \leq 30$, comparing QCQP, MILP (with $M = 10, 100, 1000$), BnB-sup, and BnB-α (with $\alpha = 0.001, 0.01, 0.1$).
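A best-bound selection rule of the kind mentioned above is commonly implemented with a priority queue keyed on the upper bound of each node. The sketch below is one possible realisation under that assumption; the class `Frontier` and its methods are hypothetical names (only `pick()` echoes the name used in Algorithm 2), and nodes are represented by plain labels.

```python
import heapq
from itertools import count


class Frontier:
    """Open nodes of the search trees, explored in best-bound order."""

    def __init__(self):
        self._heap = []
        self._tie_breaker = count()  # keeps ordering stable on equal bounds

    def push(self, upper_bound, node):
        # heapq is a min-heap, so the bound is negated to pop the largest first.
        heapq.heappush(self._heap, (-upper_bound, next(self._tie_breaker), node))

    def pick(self):
        """Remove and return the node with the largest upper bound."""
        negated_bound, _, node = heapq.heappop(self._heap)
        return -negated_bound, node

    def __len__(self):
        return len(self._heap)


frontier = Frontier()
frontier.push(0.4, "node-a")
frontier.push(0.9, "node-b")
frontier.push(0.7, "node-c")
first = frontier.pick()
second = frontier.pick()
```

Exploring the node with the largest upper bound first means that, when the search stops at the time limit, the largest bound left in the frontier is as small as this rule can make it, which is what determines the reported gap between UB and the best LB found.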

Experimental Results with Two Followers
QCQP can be solved only for instances with up to $m = 18$ due to BARON running out of memory on larger games. With $m \leq 18$, feasible solutions are found, on average, in 91% of the cases, but their quality is quite poor (the additive gap is equal to 0.34 on average). The time limit is reached on almost every instance, even those with $m = 4$, with the sole exception of those with $m = 18$, on which the solver halts prematurely due to memory issues.

Table 2 Experimental results for games with $n = 3$ (two followers)

Footnote 6: When solving QCQP and MILP, Gap corresponds to the gap "internal" to the solution method. Since QCQP and MILP impose artificial restrictions (present by design in MILP and introduced automatically by the solver in QCQP), such a value is, in general, not valid for the original, unrestricted problem. This is not the case for BnB-sup and BnB-α, for which Gap is a correct estimate of the difference between the best found LB and the value of the supremum (overestimated by UB).

MILP performs much better than QCQP, handling instances with up to $m = 30$ actions per player. $M = 100$ seems to be the best choice, for which we obtain, on average, LBs of 0.68 and gaps of 0.28, with a computing time slightly smaller than 2600 s. For $M = 1000$, the number of feasible solutions found increases from 94% to 97%, but LBs and gaps become slightly worse, possibly due to the fact that MILP solvers are typically quite sensitive to the magnitude of "big M" coefficients (which, if too large, can lead to large condition numbers, resulting in numerical issues).
BnB-sup substantially outperforms QCQP and MILP, finding not just feasible solutions but optimal ones for every game instance with $m \leq 25$ and solving to optimality 47% of the instances with $m = 30$. The average computing time is 359 s, and it reduces to 126 s if we only consider the instances with $m \leq 25$ (all solved to optimality). BnB-sup shows that the supremum of the leader's utility is very large on the games in our testbed, equal to 0.96 on average on the instances with $m \leq 25$ for which the supremum is computed exactly.
The time taken by BnB-α to find an α-approximate strategy is, in essence, unaffected by the value of $\alpha$. Since, in its implementation, BnB-α requires a relaxed outcome configuration on which the value of the supremum has been attained to compute an α-approximate strategy, we have run it only on instances with $m \leq 25$ (on which the supremum has always been computed by BnB-sup).

Table 3 reports further results obtained with BnB-sup for games with $n = 3$ and up to $m = 70$ actions per player. As the table shows, while some optimal solutions can still be found for $m = 35$, optimality is lost for game instances with $m \geq 40$. Nevertheless, BnB-sup still manages to find feasible solutions for instances with up to $m = 70$, obtaining solutions with an average LB of 0.55 and an average additive gap of 0.44. Under the conservative assumption that games with $35 \leq m \leq 70$ admit suprema of value close to 1 (which is empirically true when $m \leq 30$), BnB-sup provides, on average, solutions that are less than 50% away from optimal ones.

Experimental Results with More Followers and Final Observations
Results obtained with BnB-sup with more than two followers ($n = 4, 5$) are reported in Table 4 for $m \leq 14$. For the sake of comparison, we also report the results obtained for the same values of $m$ and $n = 3$ that are contained in Tables 2 and 3.
As the table illustrates, computing the value of the supremum of the leader's utility becomes very hard already for $m = 12$ with $n = 4$, for which the algorithm manages to find an optimal solution in only 60% of the cases. For $m = 14$ and $n = 4$, no instance is solved to optimality within the time limit. For $n = 5$, the problem becomes hard already for $m = 8$, where only 53% of the instances are solved to optimality. With $m = 12$ and $n = 5$, no instances at all are solved to optimality.
We do not report results on game instances with n = 4, 5 and m > 14 as such games are so large that, on them, BnB-sup incurs memory problems when solving the MILP subproblems.
In spite of the problem of computing a P-SPNE being a nonconvex pessimistic bilevel program, with our branch-and-bound algorithm we can find solutions with an additive optimality gap $\leq 0.01$ for three-player games with up to $m = 20$ actions (containing three payoff matrices with 8000 entries each), which are comparable, in size, to those solved in previous works which solely tackled the problem of computing a single NE maximizing the social welfare, see, e.g., [31].

Conclusions and Future Works
We have shown that the problem of computing a pessimistic Stackelberg equilibrium with multiple followers playing pure strategies simultaneously and noncooperatively (reaching a pure Nash equilibrium) is NP-hard with two or more followers and inapproximable in polynomial time (to within multiplicative polynomial factors and constant additive losses) when the number of followers is three or more unless P = NP. We have proposed an exact single-level QCQP reformulation for the problem, with a restricted version which we have cast into an MILP, and an exact exponential-time algorithm (which we have then embedded in a branch-and-bound scheme) for finding the supremum of the leader's utility and, in case there is no leader's strategy where such value is attained, also an α-approximate strategy.
Future developments include establishing the approximability status of the problem with two followers, the generalization to the case with both leader and followers playing mixed strategies, partially addressed in [4,5] (even though we conjecture that this problem could be much harder, probably $\Sigma_2^p$-hard), and the study of structured games (e.g., congestion games beyond the special case of singleton games with monotonic costs which are shown to be polynomially solvable in [11,20]).
The algorithms we have proposed can constitute a useful framework for developing solution methods for games in which the normal-form representation cannot be assumed as input. Retaining the main structure of our algorithms, such games could be tackled by adapting the subproblems that are solved for each (relaxed) outcome configuration to the case where the followers' actions cannot be all taken into account explicitly. For outcomes in $S^+$, a cutting plane method could be employed to generate a best response for each of the followers iteratively, without having to generate all of them a priori. For outcomes in $S^-$, one could adopt a column generation approach to iteratively add sets $D_p(a_{-n}, a'_p)$ for different followers $p \in F$ and action profiles $a_{-n} \in S^-$, thus iteratively enlarging the set of strategies the leader could play to improve her utility while guaranteeing that the outcomes in $S^-$ are not Nash equilibria.
One could also address solution concepts in which, in case the followers' game admitted multiple Nash equilibria, the followers would choose one which maximizes a sequence of objective functions in the lexicographic sense. For instance, they could, first, look for an equilibrium which maximizes the social welfare or their total utility, breaking ties by choosing one which also maximizes (optimistic case) or minimizes (pessimistic case) the leader's utility. Our algorithm could be extended to this case by casting the subproblem which is solved for each (relaxed) outcome configuration as a bilevel programming problem where the leader looks for a strategy x n which maximizes her utility at either the best (optimistic case) or the worst (pessimistic case) equilibrium for the followers among those which maximize their collective utility (social welfare or total utility).

A.1 Illustration of the Explicit Enumeration Algorithm
We show how Algorithm 1 works on the example provided above. The algorithm iterates over all the outcome configurations $(S^+, S^-) \in P$ by enumerating all the subsets of followers' action profiles $S^+ \subseteq A_F$, with $S^- = A_F \setminus S^+$. For ease of presentation, we denote by 1, 2, 3, and 4 the followers' action profiles $(a_1^1, a_2^1)$, $(a_1^1, a_2^2)$, $(a_1^2, a_2^1)$, and $(a_1^2, a_2^2)$, respectively. When convenient, we represent a leader's strategy $x_3 \in \Delta_3$ via a single parameter $\rho \in [0, 1]$, letting $x_3 = (1 - \rho, \rho)$. The following is a detailed description of all the iterations performed by the algorithm. Note that the iteration corresponding to $S^+ = \emptyset$ can always be omitted, as, in that case, $S^- = \{1, 2, 3, 4\} = A_F$ and $f(x_3) = -\infty$ for any $x_3 \in X^c(S^-)$ since the followers' game for $x_3$ has no pure NEs.

Fig. 7 The leader's utility in the normal-form game used for the illustration of the algorithms, plotted as a function of $\rho$, where the leader's strategy is