Computing mixed strategies equilibria in presence of switching costs by the solution of nonconvex QP problems

In this paper we address a game theory problem arising in the context of network security. In traditional game theory problems, given a defender and an attacker, one searches for mixed strategies which minimize a linear payoff functional. In the problem addressed in this paper an additional quadratic term is added to the minimization problem. Such term represents switching costs, i.e., the costs for the defender of switching from a given strategy to another one at successive rounds of the game. The resulting problem is a nonconvex QP problem with linear constraints. We prove that, though simple when one of the two terms in the objective function is missing, the problem is, in general, a NP-hard one. The proof is based on a reduction from the decision version of the clique problem. Next, we discuss different reformulations and bounds for such problem. Finally, we show, through an extensive set of computational experiments, that the most promising approach to tackle this problem appears to be a branch-and-bound approach where a predominant role is played by an optimality-based domain reduction, based on the multiple solutions of LP problems at each node of the branch-and-bound tree. The computational cost per node is, thus, high but this is largely compensated by the rather small size of the corresponding tree.


Introduction
Consider a finite two-person zero-sum game Γ , composed from a player set N = {1, 2}, each member thereof having a finite strategy space S 1 , S 2 associated with it, and a utility function u i : S 1 × S 2 → R for all i ∈ N . We assume a zero-sum competition, making u 2 := −u 1 hereafter. The game is then the triple Γ = (N, S = {S 1 , S 2 }, H = {u 1 , −u 1 }), and is most compactly represented by giving only the payoff function u 1 in matrix form (since the strategy spaces are finite) as A ∈ R |S1|×|S2| = (u 1 (x, y)) (x,y)∈S1×S2 .
An equilibrium in Γ is a simultaneous optimum for both players w.r.t. u 1 . Assuming a maximizing first player, an equilibrium is a pair (x * , y * ) satisfying the saddle-point condition It is well known that many practical games do not have such an equilibrium point; as one of the simplest instances, consider the classical rock-scissors-paper game, represented by the payoff matrix rock scissors paper rock This game has no equilibria in pure strategies: any fixed choice of rock, scissors or paper would imply a constant loss for the first player (and likewise for the second player). This means that player 1 is forced to randomize its actions in every round of the game, and this concept leads to the idea of mixed extensions of a game, which basically changes the above optimization problem into one over the convex hulls ∆(S 1 ), ∆(S 2 ) of the action spaces, rather than the finite sets S 1 , S 2 . An element of ∆(S i ) is then a probability distribution over the elements of the support S i , and prescribes to pick a move at random whenever the game is played.
The game rewards its players after each round, and upon every new round, both players are free to choose another element from their action space at random. Implicitly, this choice is without costs, but what if not? Many real life instances of games do incur a cost for changing one's action from a 1 ∈ S 1 in the first to some distinct a 2 ∈ S 1 in the next round. The area of system security [1,19] offers rich examples of such instances, such as (among many): -Changing passwords [14]: if the currently chosen password is p 1 and we are obliged to pick a fresh password (say, different from the last couple of passwords that we had in the past), the use of the new password p 2 = p 1 induces quite some efforts, as we have to memorize the password, while choosing it as hard as possible to guess. -Patrolling and surveillance [2,13]: consider a security guard on duty to repeatedly check a few locations, say A, B, . . . , E, which are connected at distances as depicted in Fig. 1. This is a chasing-evading game with the guard acting as player 1 against an intruder being player 2, and with the payoff function u being an indicator of whether the guard caught the intruder at location i ∈ {A,. . . ,E}, or whether the two missed each other. This is yet another instance of a game with all equilibria in mixed strategies, but with the unpleasant side-effect for the guard that gets the prescription to randomly spot check distant locations to "play" the equilibrium x * , the guard would have to move perhaps long distances between the locations. For example, if it is at A in round 1 and the next sample from the random distribution x * ∈ ∆({A,. . . ,E}) tells to check point E next, the shortest path would be of length 1 + 3 + 2 = 6 over C. Starting from A, however, it would be shorter and hence more convenient for the guard to check location B first along the way, but this would mean deviating from the equilibrium! A normal game theoretic equilibrium calculation does not consider this kind of investment to change the current strategy. This may not even count as bounded rationality, but simply as acting "economic" from the guard's perspective. But acting economically here is then not governed by a utility maximizing principle, but rather by a cost minimization effort.
-Generalizing the patrolling game example, the issue applies to all sorts of moving target defense: for example, changing the configuration of a computer system so as to make it difficult for an attacker to break in, often comes with high efforts and even risks for the defending player 1 (the system administrator), since it typically means taking off machines from the network, reconfiguring them to close certain vulnerabilities, and then putting them back to work hoping that everything restarts and runs smoothly again. A normal game theoretic model accounts only for the benefits of that action, but not for the cost of taking the action.
Including the cost to switch from one action to the next is more complicated than just assigning a cost function c : S 1 → R and subtracting this from the utilities to redefine them as u 1 (i, j) = u 1 (i, j) − c(i), since the cost to play a i will generally depend on the previous action a j played in the previous round.
We can model this sort of payoff by another function s : S 1 × S 1 → R that we call the switching cost. The value of s(i, j) is then precisely the cost incurred to change the current action i ∈ S 1 into the action j ∈ S 1 in the next round of the game. Intuitively, this adds another payoff dimension to the game, where a player, w.l.o.g. being player 1 in the following, plays "against itself", since the losses are implied by its own behavior. While the expected payoffs in a matrix game A under mixed strategies x ∈ ∆(S 1 ), y ∈ ∆(S 2 ) are expressible by the bilinear functional x T Ay, the same logic leads to the hypothesis that the switching cost should on average be given by the quadratic functional x T Sx, where the switching cost matrix is given, like the payoff matrix above, as This intuition is indeed right [15], but for a rigorous problem statement, let us briefly recap the derivation given independently later by [24] to formally state the problem.

The problem
Let the game come as a matrix A ∈ R n×m with equilibrium (x * , y * ), and let it be repeated over the time t ∈ N. At each time t, let X t ∼ x * be the random action sampled from the equilibrium distribution over the action space (with x * being the optimal distribution). By the implicit yet standard assumption of stochastic independence between two repetitions of the game, we have Pr(X t−1 = i, X t = j) = Pr(X t−1 = i) · Pr(X t = j), so that any future system state remains equally predictable whether or not the current state of the system is known. Hence, the switching cost can be written as With this, player 1's payoff functional becomes vector-valued now as and the game is multi-objective for the first player. As we are interested mostly in the best behavior for player 1 and the analysis would be symmetric from player 2's perspective, we shall not explore the view of the second player hereafter.

Remark 1
The game could be equally well multi-objective for the second player too, and in fact a practical instance of such a situation may also come from security: it could be in an adversary's interest to "keep the defender busy", thus causing much friction by making the defender move fast from one place to the other. This is yet just another instance of a denial-of-service attack, to which such a game model would apply.
For the sake of computing a multi-objective equilibrium, more precisely a Pareto-Nash equilibrium, the algorithm in [16] based on the method laid out in [11] proceeds by scalarizing (1) by choice of some α ∈ (0, 1), to arrive at the real-valued goal function for the first player to optimize. Now, the usual way from here to an optimization problem for player 1 involving a rational opponent applies as for standard matrix games [15]: let e i ∈ R m be i-th unit vector, then arg max y∈∆(S2) (x T Ay) = arg max i (x T Ae i ).
which is almost the familiar optimization program to be solved for a Nash equilibrium in a finite matrix game. It differs from the well known linear program only in the quadratic term, and in fact, the equilibrium problem for matrix games is recovered by substituting α = 1 in (2). The problem would again become trivial for α = 0, since in that case, only the switching cost matters and hence every degenerate distribution corresponding to a strategy i ∈ S 1 that never changes is directly an equilibrium. Excluding the standard equilibria obtained at α = 1 and the meaningless results expected for α = 0, problem (2) is interesting only for values of α strictly between 0 and 1.
Note that the matrix S in the quadratic term will (in most cases) have a zero diagonal, be indefinite and not symmetric in general (patrolling game example given above already exhibits a variety of counterexamples leading to nonsymmetric distance matrices S if the graph is directed). Of course, symmetry of S can be easily recovered, so in what follows we will assume that S is symmetric.
Remark 2 (On Alternatives to Solve the Switching Cost Problem) Game models that reward players based on their past actions are stochastic games. Extending equilibria to this class of games is possible [18], yet can render such models practically infeasible to set up (the model then contains the specification of a stochastic chain whose states equal different payoff structures, and the modeler is required to specify transition probabilities between different such games; an information that people in practice may lack or find difficult to reliably give).
The dependence of next actions on past ones extends to other scenarios too: for example, if the game is about coordination in wireless settings (e.g., collaborative drones), the players, e.g., drones, share a common communication channel. Every exchange of information occupies that channel for a limited period of time, thus constraining what the other players can do at the moment. Such effects can be described by stochastic games, but depending on how far the effect reaches in the future, backward inductive solution methods may become computationally infeasible [8]; likewise, extending the strategy space to plan ahead a fixed number of k steps (to account for one strategy determining the next k repetitions of the game) may exponentially enlarge the strategy space (by a factor of 2 O(k) , making the game infeasible to analyze if k is large). Games with switching cost offer a neat bypass to that trouble: if an action is such that it occupies lots of resources for a player, thus preventing it from taking further moves in the next round of the game, we can express this as a switching cost. Assume, for instance, that an action in a game Γ is such that the player is blocked for the next k rounds, then the switching cost is k-times the expected utility u (with the expectation taken over the equilibrium distribution played by the participants) that these next k rounds would give. Virtually, the situation is thus like if the player would have paid the total average gain over the next rounds where it is forced to remain idle (thus gaining nothing): Expression (3) will in practice be only an approximate identity, since we assumed that the game, viewed as a stochastic process, has already converged to stationarity (so that the equilibrium outcome u is actually rewarded). The speed of convergence, indeed, can itself be of interest to be controlled in security applications using moving target defenses [24]. The crucial point of modeling a longer lasting effect of the current action like described above, however, lies in the avoidance of complexity: expression (3) has no issues with large k, while more direct methods of modeling a game over k rounds, or including a dependency on the last k moves, is relatively more involved (indeed, normal stochastic games consider a first-order Markov chain, where the next state of the game depends on the last state; the setting just described would correspond to an order k chain, whose conversion into a first order chain is also possible, but complicates matters significantly).

Paper Outline
The paper is structured as follows. In Section 3 we show that the problem to be solved is a NP-hard one through a reduction from the decision version of the clique problem. In Section 4 we discuss different reformulations (bilinear, MILP) of the problem and possible bounds for it. In Section 5 we present a (spatial) branch-and-bound approach for the problem, putting a particular emphasis on the bound-tightening procedure, which turns out to be the most effective to attack it. In Section 6 we present some computational experiments. We first describe the set of test instances. Next, we discuss the performance of existing solvers over these instances. Finally, we present and comment the computational results attained by the proposed approach.

Complexity Result
In this section we consider the complexity of the problem to be solved and we show that its corresponding decision problem is NP-complete. Let where S ii = 0 for each i = 1, . . . , n and S ij ≥ 0 for each i = j, while A k ≥ 0, k = 1, . . . , m. Moreover, let ∆ n = {x ∈ R n + : e T x = 1} be the n-dimensional unit simplex. After incorporating α and (1 − α) respectively into S and A k , k = 1, . . . , m, problem 2 introduced in Section 2 is equivalent to min x∈∆n f (x). Then, we would like to establish the complexity of the following decision problem: As already remarked in Section 2, the problem would be trivial for the quadratic part of f alone (the minimum of the quadratic form in f over ∆ n is equal to 0 and is attained at each vertex of the unit simplex). It would also be polynomially solvable for the piece-wise linear part of f alone (one only needs to solve a linear problem). However, it turns out that the overall problem is difficult, as proved in the following theorem.
Proof. We prove the result by providing a polynomial transformation of the max clique decision problem, i.e., given a graph G = (V, E) and a positive integer k ≤ |V |, does there exist a clique C in G with cardinality at least k? We define the following instance of the decision problem (5). Let Moreover, let m = n and for each k = 1, . . . , n let A k = e k , where e k is the vector with all components equal to 0, except the k-th one, which is equal to 1. Stated in another way, the piecewise linear part is max k=1,...,n x k . Finally, let ξ = 1 k . We claim that the minimum value of f over ∆ n is not larger than ξ = 1 k if and only if G contains a clique with cardinality k. The if part is very simple. Indeed, let us consider the feasible solution where C is a clique of cardinality k, and let x i = 0 otherwise. Then, the value of f at this point is equal to 1 k . Indeed , the value of the quadratic part is 0, while the value of the piece-wise linear part is 1 k . The proof of the only if part is a bit more complicated. We would like to prove that, in case no clique with cardinality at least k exists, then the minimum value of f over the unit simplex is larger than 1 k . Let us denote by x * the minimum of f over the unit simplex. Let and let C be the maximum clique over the sub-graph induced by K, whose cardinality is at most k − 1. We first remark that if, for some i ∈ K \ C, it holds that then the quadratic part contains the term which concludes the proof. Therefore, we assume that either Now, let which concludes the proof. Then, we assume that for each k 1 , k 2 ∈ K 1 , (k 1 , k 2 ) ∈ E, i.e., K 1 itself is a clique. Now let us consider the following subset of C is also a clique with cardinality larger than C, which is not possible in view of the fact that C has maximum cardinality. Then, in view of (6) we have that , and, moreover, by definition of K 1 , we also have and, consequently, taking into account that |T | ≤ k − 1, for at least one index j ∈ T it must hold that so that the piece-wise linear part of f is larger than 1 k , which concludes the proof.

A review of different reformulations and bounds
In the next subsections we will briefly present and discuss different reformulations and bounds that we tested to attack the problem of minimizing the objective function (4) over the unit simplex. Later on, in Section 5 we will discuss in more detail the reformulation and the related bounding procedure which turned out to be the best one in practice. Before proceeding, we remark that, by a standard procedure, the convex piece-wise linear term of the objective function can be replaced by a single variable with the addition of linear constraints, i.e., where we recall that ∆ n denotes the n-dimensional unit simplex.

Bilinear reformulation
A simple reformulation of problem (7), with bilinear objective function and linear constraints, is the following: where S i denotes the i-th row of matrix S. If lower bounds xi , yi and upper bounds u xi , u yi are available for x i and y i , respectively, then the well known McCormick underestimating function (see [12]) can be employed to limit from below the bilinear term In fact, it turns out that McCormick underestimating function is the convex envelope of the bilinear term over the given rectangle. Then, after introducing the additional variables f i , we have that the optimal value of the following LP problem is a lower bound for problem (7) when variables x i and y i , i = 1, . . . , n, are bounded to belong to the intervals [ xi , u xi ] and [ yi , u yi ], respectively: In spite of the fact that reformulation (8) and lower bound (9) are rather simple ones, a branch-andbound approach based on them turned out to be, by far, the best performing of all the approaches we tested to attack problem (7). However, as we will see through some computational experiments, efficiency of such an approach strongly depends on the use of suitable bound-tightening techniques, which will be discussed in Section 5.1.

Reformulation based on KKT conditions
Similarly to what already done in [9,23] for box-constrained quadratic programming problems and, very recently, in [7,25], for Standard Quadratic Programming (StQP) problems, one can exploit KKT conditions to reformulate problem (7) as a Mixed Integer Linear Programming (MILP) problem. The KKT conditions for problem (7) are the following Let us denote by X lin the region defined by the linear constraints of the KKT conditions (in fact, all of them except the complementarity conditions), and by X comp the region defined by the complementarity conditions. Now let us consider a point (x, v, µ, γ, λ) fulfilling the KKT conditions, i.e., belonging to X lin ∩X comp . For each i ∈ {1, . . . , n}, let us multiply the first equation by x i and then sum the result over all i. We end up with In view of the complementarity conditions we also have so that at KKT points the objective function of the problem turns out to be equal to 1 2 (λ + v). Then, since the global minimizer is a KKT point, problem (7) can be rewritten as follows min (x,v,γ,µ,λ)∈Xlin∩Xcomp In this reformulation the objective function is a linear one and the difficulty of the problem now lies in the complementarity conditions. In order to deal with them, we can introduce the binary variables δ i , i = 1, . . . , n, and ρ j , j = 1, . . . , m, and we will introduce suitable constraints such that In order to introduce suitable constraints we need suitable upper bound values u xi , u µi , u γj , u v−A T j x respectively for x i , µ i , γ j , and v − A T j x. Now, we can reformulate problem (7) as a MILP: An obvious upper bound for x i is u xi = 1. In order to compute an upper bound for µ i and γ i we first remark, from (10) and recalling that S has nonnegative entries, that for x ∈ ∆ n so that we can compute upper bounds for µ i and γ i by solving the following linear problems For what concerns an upper bound for v − A T j x, if an upper bound GU B for the optimal value of problem (7) and, thus, also for v, is available, then we can compute it by solving the following linear problem In fact, all these upper bounds can be strengthened through the bound-tightening techniques described in Section 5.1. The reformulation of problem (7) as a MILP is quite appealing and allows to employ powerful solvers like CPLEX or GUROBI to solve it. However, while this reformulation turned out to be effective when solving StQP problems (see [7,25]), the computational experiments in Section 6.2 will show that it does not appear to be an effective approach for the solution of problem (7).

Semidefinite relaxation and convex envelope
Following a standard approach for quadratic problems, problem (7) can be reformulated as follows This immediately leads to the classical semidefinite relaxation for quadratic problems by simply replacing the equality constraint X = xx T with the semidefinite constraint X xx T . Unfortunately, the bound obtained by solving the semidefinite relaxation is a trivial one, namely the solution of the linear problem min v v ≥ A T j x j = 1, . . . , m x ∈ ∆ n .
To see this, it is enough to observe that for each x ∈ ∆ n , we can always find a diagonal matrix X such that X xx T holds and, thus, X is feasible for the semidefinite relaxation. In view of the structure of matrix S, with all diagonal elements equal to 0, it turns out that S • X = 0 for such diagonal matrix, so that we are left with the simple bound (12).
We end up with the trivial bound (12) even when we replace the quadratic part of the objective function of problem (7) with its convex envelope over ∆ n . Indeed, function x T Sx is edge-concave (see [20,21]), i.e., it is concave over any segment parallel to an edge of the unit simplex. To see this it is enough to show that for all i, j ∈ {1, . . . , n}, i = j, we have that where e i and e j are the i-th and j-th vertex of the unit simplex (recall that all diagonal elements of S are null and all off-diagonal elements are nonnegative). Then, it is well known (see [20,21]) that edge-concave functions have a vertex polyhedral convex envelope, namely, their convex envelope is the affine function interpolating x T Sx at the vertices of the unit simplex ∆ n . Since S has a null diagonal, this means that the convex envelope is the null function, so that by replacing the quadratic term with its convex envelope over the unit simplex we are back to (12).

A branch and bound approach
As already discussed in Section 4.1, given the lower bounds xi , yi and the upper bounds u xi , u yi for variables x i and y i respectively, i ∈ {1, . . . , n}, a lower bound for problem (7) over the box can be computed by solving the LP problem (9). In the proposed branch-and-bound approach the size of the boxes is progressively reduced through the branching strategy discussed in Section 5.2. However, a reduction merely based on the branching strategy would lead to a quite inefficient algorithm. It turns out that performance can be strongly enhanced by a bound-tightening technique. Such technique, described in the following Section 5.1, is expensive but, as we will see through the computational experiments, extremely rewarding. Next, in Section 5.2 we will briefly discuss the simple branching strategy. Finally, in Section 5.3 we will discuss a technique to strengthen the lower bound, based on the solution of a (small-dimensional) MILP. Note that we do not discuss convergence of the proposed branch-and-bound approach, since it easily follows by rather standard and general arguments which can be found in [10].

Bound-tightening technique
At each node, associated to a box , we refine the bounds by recomputing xi , yi , u yi , u xi . All bounds are computed by solving LP problems having the feasible set defined by constraints (9b)-(9h) and the additional constraint stating that we are only interested at feasible solutions where the underestimating function, i.e., the left-hand side of the constraint, corresponding to the objective function (9a), is not larger than the current upper bound GU B. In other words, we are applying an optimality-based domain reduction (see, e.g., [5,22]), removing feasible points which do not allow to improve the current best feasible solution. The 4n LP problems to be solved have the following objective functions: 1. min y i for computing yi ; 2. max y i for computing u yi ; 3. min x i for computing xi ; 4. max x i for computing u xi .
Once we have strengthened the bounds, we recompute the lower bound by solving (9) with the new lower and upper limits for the variables. Note that the underestimating function and, thus, the lower bound depends on such limits. But also the converse is true: the limits depend on the underestimating function. Thus, once we have computed the lower bound, we can solve again the LP problems in order to further reduce the limits. These can be iteratively reduced until the difference between the (non-decreasing) lower bounds at two consecutive iterations fall below a given threshold (we set = 1.e −3 in our experiments). Note that rather than solving all 4n LP problems with the same feasible set, we could solve each of them with a different feasible region by incorporating all previously computed new limits in the definition of the feasible region for the next limit to be computed. That leads to sharper bounds, however we excluded this opportunity since we observed that LP solvers strongly benefit from the opportunity of solving problems over the same feasible region. The described bound-tightening technique is quite expensive. In fact, what we observed through our computational experiments is that it is not necessary to solve all 4n LPs but it is enough to concentrate the effort on the most 'critical' variables. More precisely, in order to reduce the number of LPs without compromising the performance, we employed the following strategies. First, we computed the quantities: where x * and f * are computed at the solution of problem (9). In other words, g i measures the distance between the real and underestimated value of the bilinear term x i y i . The larger the distance, the higher is the need for a more accurate underestimation of the bilinear term. Then, we solved the following LP problems.
-0.2n LP problems with objective function min y i , for all i corresponding to the 0.2n largest g i values; a fixed number T (T = 10 in our experiments) of LP problems with objective function max y i , for all i corresponding to the T largest g i values; again T LPs problems with objective function max x i , for all i corresponding to the T largest g i values; no LP problem with objective function min x i . These choices have been driven by some experimental observations. In particular, we noticed that the lower limit for y i is the most critical for the bound computation or, stated in another way, constraint is often the active one. For this reason a larger budget of LP problems is allowed to improve this lower limit with respect to the upper limits. Instead, we never try to improve the lower limit xi because it is experimentally observed that this limit can seldom be improved. This way, the overall number of LPs to be solved at each iteration is reduced to 0.2n + 2T (note that the overall number of LPs to be solved at each node can be computed by multiplying this value by the number of iterations, which, however, is not known in advance).
In spite of the reduction of the number of LPs solved, the computational cost per node is still remarkable. But, as we will see in the section about computational experiments, this recursive bound-tightening procedure allows to considerably reduce the number of branch-and-bound nodes.

Branching Strategy
The branching strategy we employed is a rather standard one. Given a node associated to a box B , and the optimal solution (x * , y * , f * , v * ) of the corresponding relaxation (9), we first select an index r ∈ {1, . . . , n} such that r ∈ arg max i=1,...,n g i , where g i is defined in (14). In other words, we select the index corresponding to the bilinear term where we have the largest error at the optimal solution of the relaxation. Next, we might define the following branching operations: Branching on x and y: Define four children nodes by adding constraints {x r ≤ x * r , y r ≤ y * r }, {x r ≤ x * r , y r ≥ y * r }, {x r ≥ x * r , y r ≤ y * r }, {x r ≥ x * r , y r ≥ y * r }, respectively (quaternary branching); Branching on x: Define two children nodes by adding constraints x r ≤ x * r and x r ≥ x * r , respectively (binary branching); Branching on y: Define two children nodes by adding constraints y r ≤ y * r and y r ≥ y * r , respectively (binary branching).
Note that all choices above, with the new McCormick relaxation given by the new limits on the variables, reduce to zero the error for bilinear term x r y r at the optimal solution (x * , y * , f * , v * ). It is worthwhile to remark that the computed lower bound tends to become exact even when branching is always performed with respect to variables of the same type (say, always variables x i , i = 1, . . . , n). Indeed, only one of the two boxes B x or B y needs to converge to a single point in order to let the underestimating function values converge to the original objective function values. This is a consequence of the fact that the McCormick underestimation function tends to a given node of the branch-and-bound tree is small compared to the time needed for the boundtightening operation, so that we can improve the quality of the bound without significatively increasing the computational cost per node. Note that the intervals [ xi , u xi ] and [ yi , u yi ] could be split into more than two sub-intervals with the addition of suitable binary variables. However, our computational experiments suggest that this does not lead to significative improvements.
6 Numerical Results

Test instances description
The game is about spot checking a set of n places to guard them against an adversary. The places are spatially scattered, with a directed weighted graph describing the connections (direct reachability) of place v from place u by an edge v → u with a random length.
The payoffs in the game are given by an n × n-Matrix A (so m = n in the above description), and are interpreted as the loss that the defending player 1 suffers when checking place i while the attacker is at place j. Thus, the defender can: either miss the attacker (i = j) in which case there will be a Weibull-distributed random loss with shape parameter 5 and scale parameter 10.63 (so that the variance is 5); or hit the attacker at i = j, in which case there is zero loss.
The defender is thus minimizing, and the attacker is maximizing. The problem above is that of the defender. The Nash equilibrium then gives the optimal random choice of spot checks to minimize the average loss. To avoid trivialities, the payoff matrices are constructed not to admit pure strategy equilibria, so that the optimum (without switching cost) is necessarily a mixed strategy.
As for the switching cost, if the defender is currently at position i and next -according to the optimal random choice -needs to check the (non-adjacent) place j, then the cost for the switch from i to j is the shortest path in the aforementioned graph (note that, since the graph is directed, the matrix S is generally nonsymmetric).
For the random instances, the matrix S is thus obtained from a (conventional) all-shortest path algorithm applied to the graph. Note that the graph is an Erdös-Renyi type graph with n nodes and p = 0.3 chance of any two nodes having a connection.
The weights in the graph were chosen exponentially distributed with rate parameter λ = 0.2, and the Weibull distribution for the losses has a shape parameter 5 and scale parameter ∼ 10.63, so that both distributions have the same variance of 5.
The graph sizes considered are n = 50, 75, 100 and for each graph size we consider ten random instances.
We shift the matrix S to make less relevant the quadratic part that is more difficult to bound. More in detail, given the vector c ∈ R n with we have on the feasible set (i.e., exploiting e T x = 1) that c n e T    Then our objective function in the practical implementation becomes Note that all the data of the test instances are available at the web site http://www.iasi. cnr.it/~liuzzi/StQP.

A comparison with the existing QP literature
The problem discussed in this paper belongs to the class of nonconvex QP problems with linear constraints, which is a quite active research area. Even well-known commercial solvers, like CPLEX and GUROBI, have recently included the opportunity of solving these problems. In [25] different solution approaches have been compared over different nonconvex QP problems, namely: Standard Quadratic Programming problems (StQP), where the feasible region is the unit simplex; BoxQP, where the feasible region is a box; and general QPs, where the feasible region is a general polytope. The tested approaches have been the nonconvex QP solver of CPLEX, quadprogBB (see [6]), BARON (see [17]), and quadprogIP, introduced in [25] itself. According to the computational results reported in that work, depending on the class of QP problems considered, the best performing approaches are the nonconvex QP solver of CPLEX and quadprogIP. For this reason we ran these solvers over our test instances, with the addition of the nonconvex QP solver of GUROBI.
For these experiments we employed an Intel R Core i7-8550U CPU at 1.8GHz with 4 cores and 16GB main memory. Since even at dimension n = 50 and for different α values most instances were not solved within a time limit of 3 hours, we do not report extensive results. Rather, we only discuss the behavior over a single instance with n = 50 and α = 0.5, which, however, is representative of the behavior over the other instances. We first ran the nonconvex QP solvers both of CPLEX and of GUROBI. Both were unable to terminate within 3 hours. The relative percentage gap still to be closed after that time was 3.61% and 5.63% for CPLEX and GUROBI, respectively.
Next, we ran quadprogIP. This is based on a MILP reformulation of nonconvex QP problems with linear constraints, which in some cases appears to be a rather promising way to tackle them. Since this solver requires lower and upper bounds for all variables, we needed to introduce a lower and upper bound for variable v. Due to nonnegativity of vectors A j , j = 1, . . . , m, a simple lower bound for v is 0, while a valid upper bound, fulfilled by optimal solutions of the problem, is max j=1,...,m, i=1,...,n A ji .
After 3 hours the percentage gap still to be closed was 17.88%. However, we made a further experiment. After observing that a critical aspect of quadprogIP is the computation of valid upper bounds for primal and dual variables, we introduced a pre-processing phase, where all the upper bounds u xi , u µi , u γj , u v−A T j x , respectively for x i , µ i , γ j , and v − A T j x, appearing in the MILP reformulation (11) have been strengthened via the optimality-based bound-tightening technique discussed in Section 5.1 (an estimate GU B for the globally optimal value was obtained by running some local searches from randomly generated points). After that we ran GUROBI and CPLEX to solve the MILP reformulation (11). The pre-processing phase definitely helped, but still none of the two solvers was able to terminate within the 3 hours time limit. In this case GUROBI performed better and terminated after 3 hours with a percentage gap of 0.25 % still to be closed, while CPLEX terminated after the same time with a percentage gap of 1.66 % still to be closed.
Finally, we ran the approach proposed in this paper and it terminated after 114 seconds returning an optimal solution with a relative tolerance value ε = 10 −6 . The main reason for the much better performance of our approach lies in the optimality-based bound-tightening technique. To confirm this fact, we made a further experiment by running the same approach but without activating the bound-tightening procedure. After 3 hours the gap still to be closed was equal to 4.73%.
In the following subsection we report the results attained by our approach for different n and α values and we comment the results.

Computational results over the proposed test instances
In this section we show through some computational experiments that our approach successfully solves the problem for the sizes (50 ≤ n ≤ 100) of the real instances arising from the game theory application discussed in Section 2. Moreover, we show it scales well with the dimension, outperforming all the available software for solving non convex QPs, as already observed in the previous subsection.
The algorithm has been coded using the Julia [4] language (version 1.3.1). Doing the implementation we parallelized as much as possible the bound-tightening procedure discussed in Section 5.1, where many LPs with the same feasible region need to be solved. The code is available for download at the URL http://www.iasi.cnr.it/~liuzzi/StQP. All the experiments have been carried out on an Intel R Xeon R gold 6136 CPU at 3GHz with 48 cores and 256GB main memory. This machine is more powerful with respect to the one employed in the experiments discussed in the previous subsection. Just to give an idea of the relative performance, we remark that with the machine used for the previous experiments, the largest time to solve instances with n = 50 is approximately 600 seconds, while with the machine employed for the experiments in this section the largest time is approximately 120 seconds.
We have solved ten instances for three different sizes n = 50, 75, 100 and for the following values of α = {0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. Note that lower and larger values of α with respect to those we tested give rise to much simpler instances (recall that the problem becomes polynomially solvable for the extreme values α = 0 and α = 1). We set a time limit of 10800s for all instances. Differently from the previous subsection, where we stopped when a relative tolerance ε = 10 −6 was used to comply with the one of the existing solvers, in this section we fixed a relative tolerance ε = 10 −3 , which is considered adequate for practical applications. For n = 50 and n = 75 we solve all the instances to optimality (in fact, the largest time to solve an instance with n = 50 is about 2 minutes, whereas for n = 75 the largest time is below 1 hour, but most of the problems are solved within 10 minutes). In Figures 2a-2c we report the box plot of number of nodes, number of LPs and CPU time needed for the different values of α when n = 50. The figure shows that the hardest instances are the ones corresponding to the central values of α and this will turn out to hold true also at larger dimensions. We also observe that the overall number of nodes is extremely limited, thus confirming that, while computationally expensive, the bound-tightening procedure allows to considerably reduce the size of the branch-and-bound tree (again, this fact is observed also at larger dimensions). We also stress that n = 50 is a typical size for real instances of this problem arising from the game theory application discussed in Section 2. In Figures 3, we report the different box plots for all the instances at n = 75. It is worthwhile to remark that we solve most of them within ten minutes and exploring less than 300 nodes. Finally, in Figure 4 we report the performance of our method on problems of dimension n = 100. In this case there are seven instances we are not able to solve within the time limit. These occur for values α ∈ {0.6, 0.7, 0.8}, thus confirming that the central values of this parameter give rise to the most challenging instances. With respect to n = 50 and n = 75, we have the additional box plot displayed in Figure 4d reporting the final percentage gap when the time limit is reached. Note that it is never larger than 1.2% and most of the times it is lower than 0.5%, thus showing that, even when the algorithm does not terminate, the quality of the returned solution is guaranteed to be high. We also tried to apply the bound strategy described in Section 5.3 on the seven instances that we did not manage to solve to optimality. The result are shown in Table 1, and it can be observed that the MILP-based  approach manages to reduce the gap within the three hours, but does not improve the solution enough to close the gap. In general, when the bound tightening approach is used, the gap decreases faster, but the price to pay at each node is clearly higher, and the gain is not significant enough to close the gap a lot faster. It may be worth to use this strategy when one has a limited time budget since it allows to produce a better quality solution. But, on the other hand, it appears that only a limited number of nodes which are not closed by the bound-tightening procedure can instead be closed by the MILP-based bound. This is remarkable since, as previously commented in Section 5.3, the MILP-based bound, in fact, corresponds to the lowest bound obtained after five consecutive quaternary branching operations (without bound-tightening), i.e., after the generation of 1024 nodes.