Mean-field-game model for Botnet defense in Cyber-security

We initiate the analysis of the response of computer owners to various offers of defence systems against a cyber-hacker (for instance, a botnet attack), as a stochastic game of a large number of interacting agents. We introduce a simple mean-field game that models their behavior. It takes into account both the random process of the propagation of the infection (controlled by the botnet herder) and the decision-making process of customers. Its stationary version turns out to be exactly solvable (but not at all trivial) under the additional natural assumption that customers' decisions (say, switching the defence system on or off) are executed much faster than infections spread.


Introduction
A botnet, or zombie network, is a network of computers infected with a malicious program that allows cybercriminals to control the infected machines remotely without the users' knowledge. Botnets have become a source of income for entire groups of cybercriminals, since running a botnet is cheap and the risk of getting caught is relatively small, because other people's assets are used to launch attacks. The interaction between attackers and defenders can be modeled as a game. Game theory has recently been used extensively to model attacker-defender interaction in the computer security domain; see [5], [22] and [24] and the bibliography therein for more details. Two aspects are important: the contamination effect and the large number of computers. So, in fact, one deals with a stochastic game of a large number of interacting agents, which is amenable to mean-field theory. Investigating this approach is the main objective of this paper. Our model takes into account both the random process of the propagation of the infection (controlled by the botnet herder) and the decision-making process of customers. We develop a stationary version which turns out to be exactly solvable (but not at all trivial) under the additional natural assumption that customers' decisions (say, switching the defense system on or off) are executed much faster than infections spread.
Similar models can be applied to the analysis of defense against a biological weapon, for instance by adding the active agent (principal interested in spreading the disease), into the general mean-field epidemic model of [23] that extends the well established SIS (susceptible-infectious-susceptible) and SIR (susceptible-infectious-recovered) models.
The paper is organized as follows. In the next section we introduce our model and formulate the basic mean-field game (MFG) consistency problem in its dynamic and stationary versions, leading to a precise formulation of our main problem of characterizing the stable solutions (equilibria) of the stationary problem. This problem is a consistency problem between an HJB equation for the stochastic control of individual players and a fixed point problem for an evolutionary dynamics. These two preliminary problems are fully analyzed in Sections 3 and 4 respectively. Section 5 is devoted to the final synthesis of the stationary MFG problem from the solutions to these two preliminary problems. In particular, the phase transitions and the bifurcation points changing the number of solutions are explicitly found. In the last section further perspectives are discussed.

The model
Assume that any computer can be in one of four states: DI, DS, UI, US, where the first letter, D or U, indicates whether the computer is defended (by some system whose effectiveness we are trying to analyze) or unprotected, and the second letter, S or I, indicates whether it is susceptible or infected. The change between D and U is subject to the decisions of the computer owner (though the precise time at which her intent is executed is noisy), and the changes between S and I are random, with distributions depending on the level of effort $v_H$ of the Herder and on the state D or U of the computer.
Let $n_{DI}, n_{DS}, n_{UI}, n_{US}$ denote the numbers of computers in the corresponding states, with $N = n_{DS} + n_{DI} + n_{UI} + n_{US}$ the total number of computers. By a state of the system we shall mean either the 4-vector $n = (n_{DI}, n_{DS}, n_{UI}, n_{US})$ or its normalized version $x = (x_{DI}, x_{DS}, x_{UI}, x_{US}) = n/N$. The fraction of defended computers $x_{DI} + x_{DS}$ represents the analogue of the control parameter $v_D$ from [5], the level of defense of the system, though here it results as a compound effect of the individual decisions of all players.
The control parameter $u$ of each player may take two values, 0 and 1, meaning respectively that the player is happy with her level of defense (D or U) or that she prefers to switch from one to the other. When the updating decision 1 is made, the update effectively occurs after an exponential time with parameter $\lambda$ (measuring the speed of the response of the defense system). The limit $\lambda \to \infty$ corresponds to immediate execution.
The recovery rates (the rates of change from I to S) are given constants $q^D_{rec}$ and $q^U_{rec}$ for defended and unprotected computers respectively, and the rates of infection from direct attacks are $v_H q^D_{inf}$ and $v_H q^U_{inf}$ respectively, with constants $q^D_{inf}$ and $q^U_{inf}$. The rates of infection spreading from infected to susceptible computers are $\beta_{UU}/N$, $\beta_{UD}/N$, $\beta_{DU}/N$, $\beta_{DD}/N$, with numbers $\beta_{UU}, \beta_{UD}, \beta_{DU}, \beta_{DD}$, where the first (resp. second) letter in the index refers to the state of the infected (resp. susceptible) computer (the scaling $1/N$ is necessary to make the rates of unilateral changes and binary interactions comparable in the $N \to \infty$ limit).
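Thus if all computers use the strategy $u_{DS}, u_{DI}, u_{US}, u_{UI}$, with each $u \in \{0, 1\}$, and the level of attack is $v_H$, the evolution of the frequencies $x$ in the limit $N \to \infty$ can be described by the following system of ODEs:

$$
\begin{aligned}
\dot x_{DI} &= x_{DS}\, q^D_{inf} v_H - x_{DI}\, q^D_{rec} + x_{DI} x_{DS} \beta_{DD} + x_{UI} x_{DS} \beta_{UD} + \lambda (x_{UI} u_{UI} - x_{DI} u_{DI}),\\
\dot x_{DS} &= -x_{DS}\, q^D_{inf} v_H + x_{DI}\, q^D_{rec} - x_{DI} x_{DS} \beta_{DD} - x_{UI} x_{DS} \beta_{UD} + \lambda (x_{US} u_{US} - x_{DS} u_{DS}),\\
\dot x_{UI} &= x_{US}\, q^U_{inf} v_H - x_{UI}\, q^U_{rec} + x_{DI} x_{US} \beta_{DU} + x_{UI} x_{US} \beta_{UU} - \lambda (x_{UI} u_{UI} - x_{DI} u_{DI}),\\
\dot x_{US} &= -x_{US}\, q^U_{inf} v_H + x_{UI}\, q^U_{rec} - x_{DI} x_{US} \beta_{DU} - x_{UI} x_{US} \beta_{UU} - \lambda (x_{US} u_{US} - x_{DS} u_{DS}).
\end{aligned} \tag{1}
$$

where $v_D$ is interpreted as the defender group's combined defense effort, then summing up the first and the third equations in (1) leads to the equation for the total fraction of infected computers $x = x_{DI} + x_{UI}$. This equation coincides (up to some constants) with equation (2) from [5], which is the starting point of the analysis of paper [5].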
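The limiting dynamics can be explored numerically. The following is a minimal sketch, assuming the right-hand side of (1) is assembled from the recovery, direct-attack, contagion and switching rates listed above; the parameter values, the names `Params`, `rhs` and `integrate`, and the explicit Euler scheme are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Params:
    # Illustrative values only; the paper does not fix numerical parameters.
    q_rec_D: float = 1.0
    q_rec_U: float = 1.0
    q_inf_D: float = 0.2
    q_inf_U: float = 1.0
    b_UU: float = 2.0  # beta_UU: infected U infects susceptible U
    b_UD: float = 0.5  # beta_UD: infected U infects susceptible D
    b_DU: float = 2.0  # beta_DU: infected D infects susceptible U
    b_DD: float = 0.5  # beta_DD: infected D infects susceptible D
    lam: float = 5.0   # switching speed lambda
    v_H: float = 1.0   # herder effort


def rhs(x, u, p):
    """Right-hand side of the ODE system; components ordered (DI, DS, UI, US)."""
    x_DI, x_DS, x_UI, x_US = x
    u_DI, u_DS, u_UI, u_US = u
    # Total infection rates felt by a susceptible D / U computer.
    alpha = p.q_inf_D * p.v_H + p.b_DD * x_DI + p.b_UD * x_UI
    beta = p.q_inf_U * p.v_H + p.b_DU * x_DI + p.b_UU * x_UI
    sw_I = p.lam * (x_UI * u_UI - x_DI * u_DI)  # net flow UI -> DI
    sw_S = p.lam * (x_US * u_US - x_DS * u_DS)  # net flow US -> DS
    return (
        alpha * x_DS - p.q_rec_D * x_DI + sw_I,
        -alpha * x_DS + p.q_rec_D * x_DI + sw_S,
        beta * x_US - p.q_rec_U * x_UI - sw_I,
        -beta * x_US + p.q_rec_U * x_UI - sw_S,
    )


def integrate(x0, u, p, t_end=20.0, dt=0.001):
    """Explicit Euler integration up to time t_end (a deliberately simple scheme)."""
    x = tuple(x0)
    for _ in range(int(t_end / dt)):
        dx = rhs(x, u, p)
        x = tuple(xi + dt * di for xi, di in zip(x, dx))
    return x
```

With the control `(1, 1, 0, 0)` (every defended computer switches to unprotected), the defended fractions decay to zero and the total mass stays equal to one, as expected for a system whose four derivatives sum to zero.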
It is instructive to see how evolution (1) can be deduced rigorously as the limit of the Markov processes specifying the random dynamics of $N$ players. The generator of this Markov evolution on the states $n$ is (where the unchanged values in the arguments of $F$ on the r.h.s. are omitted) $+\lambda n_{DI} u_{DI} F(n_{DI} - 1, n_{UI} + 1) + \lambda n_{UI} u_{UI} F(n_{UI} - 1, n_{DI} + 1)$, or in terms of $x$ as where $\{e_j\}$ is the standard basis in $\mathbb{R}^4$. If $F$ is a differentiable function, the generator $L_N$ turns to the generator in the limit $N \to \infty$. This is a first order partial differential operator. Its characteristics are given precisely by the ODE (1). A rigorous derivation showing that the solutions to (1) describe the limit of the Markov chain generated by (3) can be found e.g. in [17].
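The finite-$N$ Markov chain can be simulated directly. The sketch below assumes the eight jump types implied by the rates described in the model (infection, recovery, and the four switches executed at rate $\lambda$ when the corresponding control equals 1); the Gillespie-type scheme, the dictionary-based parameters and the function names are illustrative assumptions rather than the construction used in [17].

```python
import random


def transitions(n, u, par):
    """Return (rate, source, target) for every jump available from counts n."""
    N = sum(n.values())
    # Per-computer infection rates: direct attack plus contagion scaled by 1/N.
    alpha = par["q_inf_D"] * par["v_H"] + (par["b_DD"] * n["DI"] + par["b_UD"] * n["UI"]) / N
    beta = par["q_inf_U"] * par["v_H"] + (par["b_DU"] * n["DI"] + par["b_UU"] * n["UI"]) / N
    lam = par["lam"]
    return [
        (n["DS"] * alpha, "DS", "DI"),
        (n["US"] * beta, "US", "UI"),
        (n["DI"] * par["q_rec_D"], "DI", "DS"),
        (n["UI"] * par["q_rec_U"], "UI", "US"),
        (lam * n["DI"] * u["DI"], "DI", "UI"),
        (lam * n["UI"] * u["UI"], "UI", "DI"),
        (lam * n["DS"] * u["DS"], "DS", "US"),
        (lam * n["US"] * u["US"], "US", "DS"),
    ]


def simulate(n0, u, par, t_end, seed=0):
    """Gillespie-type simulation of the N-agent chain up to time t_end."""
    rng = random.Random(seed)
    n, t = dict(n0), 0.0
    while True:
        moves = transitions(n, u, par)
        total = sum(rate for rate, _, _ in moves)
        if total <= 0.0:
            return n
        t += rng.expovariate(total)  # exponential waiting time to the next jump
        if t > t_end:
            return n
        pick = rng.uniform(0.0, total)
        for rate, src, dst in moves:
            pick -= rate
            if pick <= 0.0 and rate > 0.0:
                n[src] -= 1  # one computer leaves the source state
                n[dst] += 1  # and enters the target state
                break
```

Dividing the resulting counts by $N$ gives an empirical frequency vector that can be compared with a numerical solution of (1) for growing $N$.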
We shall now use the Markov model above to assess the actions of individual players. If $x(t)$ and $v_H(t)$ are given, the dynamics of each individual player is the Markov chain on 4 states with the generator $L^{ind} g(DI) = \lambda u^{ind}(DI)\,(g(UI) - g(DI)) + q^D_{rec}\,(g(DS) - g(DI))$, depending on the individual control $u^{ind}$.
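The individual generator can be written as a $4 \times 4$ rate matrix. The sketch below keeps the DI row exactly as stated above and fills in the other three rows by analogy from the model's rates; those three rows, the state ordering and the function names are assumptions made for illustration.

```python
STATES = ("DI", "DS", "UI", "US")


def individual_generator(x, u, par):
    """Rate matrix Q of one player's chain, given frequencies x and control u."""
    x_DI, x_DS, x_UI, x_US = x
    # Infection rates felt by a susceptible defended / unprotected computer.
    alpha = par["q_inf_D"] * par["v_H"] + par["b_DD"] * x_DI + par["b_UD"] * x_UI
    beta = par["q_inf_U"] * par["v_H"] + par["b_DU"] * x_DI + par["b_UU"] * x_UI
    lam = par["lam"]
    rates = {
        ("DI", "UI"): lam * u["DI"], ("DI", "DS"): par["q_rec_D"],
        ("DS", "US"): lam * u["DS"], ("DS", "DI"): alpha,
        ("UI", "DI"): lam * u["UI"], ("UI", "US"): par["q_rec_U"],
        ("US", "DS"): lam * u["US"], ("US", "UI"): beta,
    }
    Q = [[0.0] * 4 for _ in range(4)]
    for (src, dst), rate in rates.items():
        i, j = STATES.index(src), STATES.index(dst)
        Q[i][j] = rate
        Q[i][i] -= rate  # diagonal entry makes each row sum to zero
    return Q


def apply_generator(Q, g):
    """Compute (L g)(s) = sum_t Q[s][t] * g(t) for every state s."""
    return [sum(Q[i][j] * g[j] for j in range(4)) for i in range(4)]
```

Applying `apply_generator` to a test function `g` reproduces, in the DI component, the expression $\lambda u^{ind}(DI)(g(UI) - g(DI)) + q^D_{rec}(g(DS) - g(DI))$.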
Assuming that an individual pays a fee $k_D$ per unit of time for the defense system and $k_I$ per unit of time for losses resulting from being infected, her cost during a period of time $T$, which she tries to minimize, is where $\mathbf{1}_D$ (resp. $\mathbf{1}_I$) is the indicator function of the states DI, DS (resp. of the states DI, UI). Assuming that the Herder has to pay $k_H v_H$ per unit of time when using efforts $v_H$ and receives the income $f(x)$ depending on the distribution $x$ of the states of the computers, her payoff, which she tries to maximize, is Therefore, starting with some control the Herder can find her optimal strategy $v_H(t)$ by solving the deterministic optimal control problem with dynamics (1) and payoff (7), finding both the optimal $v_H$ and the trajectory $x(t)$. Once $x(t)$ and $v_H(t)$ are known, each individual should solve the Markov control problem (5) with costs (6), thus finding the individual optimal strategy The basic MFG consistency equation can now be explicitly written as

Instead of analyzing this rather complicated dynamic problem, we shall look for a simpler problem of consistent stationary strategies.
There are two standard stationary problems naturally linked with a dynamic one: the search for the average payoff of a long-period game, and the search for the discounted optimal payoff. The first is governed by the solutions of HJB of the form $(T - t)\mu + g$, linear in $t$ (with $\mu$ then describing the optimal average payoff), so that $g$ satisfies the stationary HJB equation: where min is over the two values $\{0, 1\}$. We shall denote by $u = (u_{DI}, u_{UI}, u_{DS}, u_{US})$ the argmin in this solution.
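The discounted optimal payoff (with the discounting coefficient $\delta$) satisfies the stationary HJB The analysis of these two settings is mostly analogous. We shall concentrate on the first one. Introducing the coefficients

$$
\alpha = q^D_{inf} v_H + \beta_{UD}\, x_{UI} + \beta_{DD}\, x_{DI}, \qquad \beta = q^U_{inf} v_H + \beta_{UU}\, x_{UI} + \beta_{DU}\, x_{DI},
$$

the stationary HJB equation (8) rewrites as

$$
\begin{aligned}
\min\bigl(\lambda(g(UI) - g(DI)),\, 0\bigr) + q^D_{rec}(g(DS) - g(DI)) + k_I + k_D &= \mu,\\
\min\bigl(\lambda(g(US) - g(DS)),\, 0\bigr) + \alpha(g(DI) - g(DS)) + k_D &= \mu,\\
\min\bigl(\lambda(g(DI) - g(UI)),\, 0\bigr) + q^U_{rec}(g(US) - g(UI)) + k_I &= \mu,\\
\min\bigl(\lambda(g(DS) - g(US)),\, 0\bigr) + \beta(g(UI) - g(US)) &= \mu,
\end{aligned} \tag{11}
$$

where the choice of the first term as the infimum in these equations corresponds to the choice of control $u = 1$.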
The stationary MFG consistency problem consists in finding $x = (x_{DI}, x_{DS}, x_{UI}, x_{US})$ and $u = (u_{DI}, u_{DS}, u_{UI}, u_{US})$, where $x$ is the stationary point of evolution (1), that is with $u = (u_{DI}, u_{DS}, u_{UI}, u_{US})$ giving the minimum in the solution to (8) or (11). Thus $x$ is a fixed point of the limiting dynamics of the distribution of a large number of agents such that the corresponding stationary control is individually optimal subject to this distribution. In other words, $x = (x_{DI}, x_{DS}, x_{UI}, x_{US})$ and $u = (u_{DI}, u_{DS}, u_{UI}, u_{US})$ solve (8), (12) simultaneously.
Fixed points can practically model a stationary behavior only if they are stable. Thus we are interested in stable solutions (x, u) to the stationary MFG consistency problem (12), (8), where a solution is stable if the corresponding stationary distribution x is a stable equilibrium to (1) (with u fixed by this solution).
Apart from stability, the fixed points can be classified via their efficiency. Namely, let us say that a solution to the stationary MFG is efficient (or globally optimal) if the corresponding average cost $\mu$ is minimal among all solutions.
Talking about strategies, let us reduce the discussion to non-degenerate situations, where the minima in (11) are achieved on a single value of $u$ only. In principle, there are 16 possible pure stationary strategies (functions from the state space to $\{0, 1\}$). But not all of them can be realized as solutions to (11). In fact, if $u_{DI} = 1$, then $g(UI) < g(DI)$ (equality is possible only in the degenerate case) and thus $u_{UI} = 0$. This argument forbids all but four strategies as possible solutions to (11), namely

(i) $u_{UI} = u_{US} = 0$, $u_{DI} = u_{DS} = 1$;
(ii) $u_{DI} = u_{DS} = 0$, $u_{UI} = u_{US} = 1$;
(iii) $u_{UI} = u_{DS} = 0$, $u_{DI} = u_{US} = 1$;
(iv) $u_{UI} = u_{DS} = 1$, $u_{DI} = u_{US} = 0$. (13)

The first two strategies, either always choose U or always choose D, are acyclic, that is, the corresponding Markov processes are acyclic in the sense that there does not exist a cycle in a motion subject to these strategies. The other two strategies choose between U and D differently depending on whether the computer is infected or not.
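The reduction from 16 strategies to four can be checked numerically for given coefficients. The sketch below assumes the stationary HJB equation has the form $\min(\lambda(g(s') - g(s)), 0) + (\text{jump terms}) + \text{cost} = \mu$ in each state $s$, where $s'$ is the state reached by switching. For every fixed $u$ it solves the resulting linear system for $(g, \mu)$ with the normalization $g(US) = 0$ and keeps $u$ only if it reproduces the minimizing choice in every state. The values of $\alpha$, $\beta$ and the other parameters, and all function names, are illustrative assumptions.

```python
from itertools import product

# Controls are ordered (u_DI, u_DS, u_UI, u_US); u = 1 means "switch".
FOUR = {"i": (1, 1, 0, 0), "ii": (0, 0, 1, 1), "iii": (1, 0, 0, 1), "iv": (0, 1, 1, 0)}


def solve_linear(A, b):
    """Gauss-Jordan elimination with partial pivoting; None if A is singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(u, lam, alpha, beta, q_rec_D, q_rec_U, k_D, k_I):
    """Solve for (g_DI, g_DS, g_UI, mu) under the fixed control u, with g_US = 0."""
    u_DI, u_DS, u_UI, u_US = u
    A = [  # columns: g_DI, g_DS, g_UI, mu; rows: states DI, DS, UI, US
        [-(lam * u_DI + q_rec_D), q_rec_D, lam * u_DI, -1.0],
        [alpha, -(lam * u_DS + alpha), 0.0, -1.0],
        [lam * u_UI, 0.0, -(lam * u_UI + q_rec_U), -1.0],
        [0.0, lam * u_US, beta, -1.0],
    ]
    b = [-(k_I + k_D), -k_D, -k_I, 0.0]
    return solve_linear(A, b)


def consistent_strategies(**par):
    """Map each self-consistent control u to its average cost mu."""
    found = {}
    for u in product((0, 1), repeat=4):
        sol = evaluate(u, **par)
        if sol is None:  # e.g. u = (0, 0, 0, 0): D and U never communicate
            continue
        g_DI, g_DS, g_UI, mu = sol
        g_US = 0.0
        # Switching is chosen exactly when it strictly lowers g.
        best = (int(g_UI < g_DI), int(g_US < g_DS), int(g_DI < g_UI), int(g_DS < g_US))
        if best == u:
            found[u] = mu
    return found
```

For instance, with an expensive defense (`k_D = 5`, `k_I = 1`, `lam = 10`, `alpha = 0.5`, `beta = 2`, both recovery rates equal to 1), the always-unprotected control `(1, 1, 0, 0)` passes the check with $\mu = \beta k_I / (\beta + q^U_{rec}) = 2/3$.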
Of course, allowing degenerate strategies, more possibilities arise. To complete the model, let us observe that the natural assumptions on the parameters of the model arising directly from their interpretation are as follows: We shall always assume that (14) holds. Two additional natural simplifying assumptions that we shall use sometimes are the following: the infection rate does not depend on the level of defense of the computer transferring the infection, but only on the level of defence of the susceptible computer, that is, instead of four coefficients $\beta$ one has only two of them and the recovery rate does not depend on whether a computer is protected against the infection or not: As we shall see, a convenient assumption, which is weaker than (16), turns out to be

Finally, it is reasonable to assume that customers can switch their regime of defence rather quickly (once they are willing to), meaning that we are effectively interested in the asymptotic regime of large $\lambda$. As we shall show, in this regime the stationary MFG problem above can be completely solved analytically. In this sense the present model is more complicated than a related mean-field game model of corruption with three basic states developed in [19], where a transparent analytic classification of stable solutions is available already for arbitrary finite $\lambda$.

Analysis of the stationary HJB equation
Let us start by solving HJB equation (11).
Consider strategy (i) of (13), so that being unprotected is always optimal. Then (11) becomes As the solution $g$ is defined up to an additive constant we can set $g(US) = 0$. Then (18) becomes From the third and fourth equations we find

$$
g(UI) = \frac{k_I}{\beta + q^U_{rec}}, \qquad \mu = \frac{\beta k_I}{\beta + q^U_{rec}}. \tag{20}
$$

Substituting these values in the first and second equations we obtain and the conditions $g(UI) \le g(DI)$, $g(US) = 0 \le g(DS)$ become respectively.

Consider strategy (ii) of (13), so that being defended is optimal. Then (11) becomes Setting $g(DS) = 0$ yields $\lambda(g(DI) - g(UI)) + q^U_{rec}(g(US) - g(UI)) + k_I = \mu$, $-\lambda g(US) + \beta(g(UI) - g(US)) = \mu$. From the first and second equations we find

$$
g(DI) = \frac{k_I}{\alpha + q^D_{rec}}, \qquad \mu = k_D + \frac{\alpha k_I}{\alpha + q^D_{rec}}. \tag{25}
$$

Substituting these values in the third and fourth equations we obtain (26), and the conditions $g(UI) \ge g(DI)$, $g(US) \ge g(DS) = 0$ turn to respectively.

Consider strategy (iii) of (13). Then (11) becomes Setting $g(DS) = 0$ yields $\mu = \alpha g(DI) + k_D$ from the second equation, then $\lambda g(UI) = g(DI)(\alpha + \lambda + q^D_{rec}) - k_I$ from the first equation and $q^U_{rec}\, g(US) = q^U_{rec}\, g(UI) + \mu - k_I$ from the third one. Plugging these expressions in the fourth equation of (28) we find (after many cancelations) $g(DI)$ and then the other values of $g$: Hence

from the first one. Plugging these expressions in the second equation of (31) we find $g(UI)$ and then the other values of $g$: Hence the conditions $g(UI) \ge g(DI)$, $g(DS) \ge g(US) = 0$ rewrite as

We are now interested in finding out how many solutions equation (11) may have for a given $x$. The first observation in this direction is that the interior of the domain defined by (22) (that is, with a solution of case (i)) and the interior of the domain defined by (30) (that is, with a solution of case (iii)) do not intersect, because the first inequality in (22) contradicts the second inequality in (30) (apart from the boundary). Similarly, the interior of the domain defined by (22) (that is, with a solution of case (i)) and the interior of the domain defined by (33) (that is, with a solution of case (iv)) do not intersect, and the interior of the domain defined by (27) (that is, with a solution of case (ii)) does not intersect with the domains having solutions in cases (iii) or (iv).
Next we find that one can distinguish two natural domains of $x$ classifying the solutions to HJB equation (11): More explicitly, By (14) it is seen that under the natural additional simplifying assumptions (16) or even (17), all positive $x$ belong to $D_1$ (or its boundary), so that $D_2$ is empty.
Under additional assumption (15) the condition $x \in D_1$ gets a simpler form To link with the conditions for cases (i)-(iv) one observes the following equivalent forms of the main condition of being in $D_1$: From here it is seen that if $x$ belongs simultaneously to the interiors of the domains specified by (22) and (27) (that is, with solutions in cases (i) and (ii) simultaneously), then necessarily $x \in D_1$ (that is, for $x \in D_2$ the conditions specifying cases (i) and (ii) are incompatible). On the other hand, if $x$ belongs simultaneously to the interiors of the domains specified by (30) and (33) (that is, with solutions in cases (iii) and (iv) simultaneously), then necessarily $x \in D_2$ (that is, for $x \in D_1$ the conditions specifying cases (iii) and (iv) are incompatible).
Denoting $\kappa = k_D/k_I$, we can summarize the properties of HJB equation (11) as follows (uniqueness is always understood up to the shifts in $g$).
then there exists a unique solution to (11) belonging to case (iii) and there are no other solutions to (11).
then there exists a unique solution to (11) belonging to case (iv) and there are no other solutions to (11).
(3) A solution belonging to case (i) exists if and only if
and is unique if this holds. A solution belonging to case (ii) exists if and only if and .
which obviously hold. (2) When two solutions exist simultaneously one can discriminate them by the values of the average payoff µ. One sees from (20) and (25), that µ arising from cases (i) and (ii) are different (apart from a single value of κ). (3) The uniqueness result under (16) is quite remarkable, as it does not seem to follow a priori from any intuitive arguments.
Again directly from the argument above one can conclude the following.
then there exists a unique solution to (11) belonging to case (i) and there are no other solutions to (11).
then there exists a unique solution to (11) belonging to case (ii) and there are no other solutions to (11).
(3) A solution belonging to case (iii) exists if and only if
and is unique if this holds. A solution belonging to case (iv) exists if and only if and is unique if this holds. Either of conditions (44) or (45) is incompatible with either (42) or (43). In particular, equation (11) may have at most two solutions (if (44) and (45) hold simultaneously).
Essential simplifications that allow eventually for a full classification of the stationary MFG consistency problem occur in the limit of large λ. For a precise formulation in case one needs further decomposition of the domains D 1 , D 2 . Namely, for j = 1, 2, let respectively. In particular, solutions of case (ii) become impossible.
(2) Suppose x ∈ D 1 and (46) holds. If x ∈ D 11 , there exists a unique solution to (11), which belongs to cases (ii), (iii), (i) for respectively. If x ∈ D 12 , solutions from case (iii) do not exist and there exist two solutions to (11) for belonging to cases (i) and (ii), and only one solution otherwise.
(3) Suppose x ∈ D 2 . If x ∈ D 22 , solutions from case (iii) do not exist and there is always a unique solution to (11) belonging to case (ii), (iv) or (i), for respectively. If x ∈ D 21 , then there are two solutions to (11) for which belong to cases (iii) and (iv), and one solution otherwise. This unique solution belongs to case (ii) or (i) for , κ > δ β + q U rec respectively and to case (iv) otherwise.

Analysis of the fixed points
Next we solve the fixed point system (12).
In case (i), that is with $u_{UI} = u_{US} = 0$, $u_{DI} = u_{DS} = 1$, equation (12) takes the form Adding the first two equations we get $x_{DI} = x_{DS} = 0$, and the system reduces to the single equation $x_{US}\,\beta - x_{UI}\, q^U_{rec} = 0$. Substituting the value of $\beta$ yields Denoting $y = x_{UI}$ it follows that $x_{US} = 1 - y$ and thus This equation has a unique solution on the interval $(0, 1)$: (55)

The stability of the fixed point $x = (0, 0, x^*, 1 - x^*)$ means its stability as a fixed point of the dynamics We rewrite it by shifting the variables by the value of the stationary point, that is, in terms of $x_{DI}$, $x_{DS}$, $y = x_{UI} - x^*$, $z = x_{US} - (1 - x^*)$. Since these shifted variables sum to zero, we have effectively a system of three equations on the variables $x_{DI}, x_{DS}, y$: Its linear approximation around the fixed point $(0, 0, 0)$ is and the corresponding characteristic equation for the eigenvalues $\xi$ is The free term cancels in the second multiplier and we get the eigenvalues The second and the third eigenvalues being negative, the condition of stability is reduced to the negativity of the first eigenvalue, that is, to the condition But it always holds for $x^*$ of form (55). Thus we proved the first part of the following statement and the second is analogous.
(1) There exists a unique solution to system (12) with the strategy U being individually optimal (that is, with the first acyclic stationary strategy $u_{UI} = u_{US} = 0$, $u_{DI} = u_{DS} = 1$) and it is stable. It equals $x = (0, 0, x^*_{UI}, 1 - x^*_{UI})$ with $x^*_{UI}$ given by (55).
(2) There exists a unique solution to system (12) with the strategy D being individually optimal (that is, with the second acyclic stationary strategy) and it is stable. It equals $x = (x^*_{DI}, 1 - x^*_{DI}, 0, 0)$ with $x^*_{DI}$ being the unique solution of equation on the interval $(0, 1)$, that is

Let us consider case (iii): $u_{UI} = u_{DS} = 0$, $u_{DI} = u_{US} = 1$. Then (12) takes the form By adding the first two equations we get $x_{DI} = x_{US}$ with two independent equations left: This rewrites as two equations on the two independent variables $x_{DI}, x_{UI}$ as Solving the second equation with respect to $x_{UI}$, and substituting in the first one, leads to a fourth order equation on $y = x_{DI}$. This equation does not seem to be very revealing in general. Of course it can be fully analyzed by numerical methods, but we shall turn now to the large $\lambda$ asymptotics, which yields more manageable results. For large $\lambda$ we get directly from (63) that But this implies that $x_{DI}$ is small of order $O(\lambda^{-1})$, so that Substituting this in the first equation of (62) yields which is of the same type as equation (59) up to terms of order $\lambda^{-1}$ (and coincides with it under (15), (16)). Therefore, for large $\lambda$, there exists a unique solution to (65) from the interval $(0, 1)$:

The stability of the fixed point $x = (x^*_{DI},\ x^*_{DS} = 1 - \tilde x^*_{UI} - 2x^*_{DI},\ \tilde x^*_{UI},\ x^*_{US} = x^*_{DI})$ means its stability as a fixed point of the dynamics In terms of independent variables with . Linearized around the fixed point $(0, 0, 0)$ system (68) takes the form up to terms of order $O(\lambda^{-1})$. Thus the matrix of the linear approximation divided by $\lambda$ is The first order approximation of this matrix in $\lambda^{-1}$ is and has eigenvalue $-1$ of double multiplicity and a zero eigenvalue. Hence all eigenvalues of $M(\lambda)$ are negative if and only if its determinant $\det(M(\lambda))$ is negative. As seen directly and is negative for large $\lambda$ if and only if $\beta_{UD}(2\tilde x^*_{UI} - 1) > q^U_{rec} + q^D_{rec} v_H$, which always holds by (66).
Thus we proved the first part of the following statement and the second part is analogous.
(1) For large $\lambda$ there exists a unique solution to system (12) in case (iii), that is with $u_{UI} = u_{DS} = 0$, $u_{DI} = u_{US} = 1$, and it is stable. It has the form $x = (0, 1 - \tilde x^*_{UI}, \tilde x^*_{UI}, 0)$ up to corrections of order $\lambda^{-1}$, with $\tilde x^*_{UI}$ being the unique solution of equation (65) on $(0, 1)$ given by (66).
(2) For large $\lambda$ there exists a unique solution to system (12) in case (iv), that is with $u_{UI} = u_{DS} = 1$, $u_{DI} = u_{US} = 0$, and it is stable. It has the form $x = (\tilde x^*_{DI}, 0, 0, 1 - \tilde x^*_{DI})$ up to corrections of order $\lambda^{-1}$, with $\tilde x^*_{DI}$ being the unique solution of equation on $(0, 1)$.
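The acyclic fixed point of case (i) can be computed in closed form. The sketch below assumes that, with $x_{DI} = x_{DS} = 0$ and $y = x_{UI}$, the reduced equation $x_{US}\,\beta - x_{UI}\, q^U_{rec} = 0$ reads $(1 - y)(q^U_{inf} v_H + \beta_{UU}\, y) - q^U_{rec}\, y = 0$, a quadratic in $y$ that is negative at $y = 0$ side of the root and changes sign once on $(0, 1)$. The function names and the generic arguments `b`, `q_rec`, `c` are illustrative; the same function would apply to the large-$\lambda$ equation of case (iii) only under the further assumption that it has this form with $(\beta_{UD}, q^U_{rec}, q^D_{inf} v_H)$ in place of $(\beta_{UU}, q^U_{rec}, q^U_{inf} v_H)$.

```python
import math


def residual(y, b, q_rec, c):
    """Net inflow into the infected state: (1 - y)(c + b*y) - q_rec*y."""
    return (1.0 - y) * (c + b * y) - q_rec * y


def interior_root(b, q_rec, c):
    """Positive root of b*y^2 + (q_rec + c - b)*y - c = 0 (requires b > 0, c > 0)."""
    a = b - q_rec - c
    return (a + math.sqrt(a * a + 4.0 * b * c)) / (2.0 * b)


def slope_at(y, b, q_rec, c):
    """Derivative of the residual in y; a negative value means the root attracts."""
    return b * (1.0 - 2.0 * y) - c - q_rec


# Illustrative numbers: beta_UU = 2, q_rec_U = 1, q_inf_U * v_H = 1.
y_star = interior_root(2.0, 1.0, 1.0)
```

Here `slope_at` is the derivative of the one-dimensional reduced dynamics at the root; its negativity is equivalent to $2\beta_{UU}\, y^* > \beta_{UU} - q^U_{rec} - q^U_{inf} v_H$, which the square root in the root formula guarantees.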

Solutions to the stationary MFG problem
Combining Propositions 4.1, 4.2 and 3.3 allows one to fully characterize the solutions to our stationary MFG consistency problem for large λ.
The most straightforward general conclusion is the following.
Theorem 5.1. For large $\lambda$ there may exist up to 4 solutions to the stationary MFG problem, with only one in each of the cases (i)-(iv). All these solutions are stable.
Remark 3. Notice that already this statement is not at all obvious a priori, and may not be true for finite λ, where solutions to case (iii) or (iv) are found from an equation of fourth order.
As an example of more precise classification, let us present it under assumption (17) that ensures that all solutions lie in the domain D 1 .
Let us introduce the function First let (16) hold. It is seen from Propositions 4.1 and 4.2 that for large $\lambda$ (and apart from $\kappa$ from negligible intervals of size of order $\lambda^{-1}$ that we shall ignore), a solution of the stationary MFG problem exists in case (i) if $\kappa > \kappa^* = \kappa(x^*_{UI})$, and a solution of the stationary MFG problem exists in case (iii) if $\kappa < \tilde\kappa^* = \kappa(\tilde x^*_{UI})$, where $x^*_{UI}$ and $\tilde x^*_{UI}$ are given by (55) and (66) respectively. Thus one can have up to two (automatically stable) solutions to the stationary MFG problem. Let us make this number precise.

various directions. For instance, it is practically important to allow for the choice of various competing protection systems, leading to a model with $2d$ basic states: $iI$ and $iS$, where $i \in \{1, \dots, d\}$ denotes the $i$th defense system available (which can be alternatively interpreted as the levels of protection provided by a single or different firms), while S and I denote again the susceptible or infected state, with all other parameters depending on $i$. On the other hand, in the spirit of papers [18], [17] that concentrate on modeling myopic behavior (rather than rational optimization) of players, one can consider the set of computer owners consisting of two groups, rational optimizers and those changing their strategies by copying their neighbors.
The main theoretical question arising from our results concerns the rigorous relation between stationary and dynamic MFG solutions, which in general is at the forefront of research in the mean-field game literature. We hope that working with our simple model, whose stationary version is fully solved, can help to get new insights in this direction. In the present context the question can be formulated as follows. Suppose that, at some moment of time, $N$ players are distributed according to a certain frequency vector $x$ among the four basic states, each player chooses the optimal strategy $u$ arising from the solution of the stationary problem for fixed $x$ (fully described in Section 3), and the Markov evolution continues according to the generator $L$. When two solutions are available, players may be supposed to choose the one with the lowest $\mu$ (see Remark 2 (2)). The resulting changes in $x$ induce the corresponding changes of $u$, specifying a well-defined Markov process on the states of $N$ agents. Intuitively, we would expect this evolution to stay near our stationary MFG solutions for large $N$ and $t$. Can one prove something like that?