Mean Field Games on Prosumers

In the realm of dynamic demand, prosumers are agents that can both produce and consume goods. The problem we address involves a large population of prosumers, each of whom has to decide whether to produce or consume a certain amount of goods. The strategy of each agent depends on the average behavior of the population. We set the problem in a game-theoretic framework by modeling every prosumer as a player. For every player, a cost functional is designed to incentivize cooperation among the players. In the limit of a large population, a mean field game arises. The contributions of this paper are as follows. Firstly, we formulate the problem as first-order and second-order mean field games; the latter arises when we take stochastic disturbances into account. Secondly, mean field equilibria are derived by studying the corresponding linear-quadratic optimal control problem. Thirdly, results on the stability of the first-order and second-order equilibria are established. A numerical study illustrating our findings concludes the paper.


Introduction
In this article we study dynamic demand by means of mean field games on a large population of prosumers. A prosumer is modeled as a player with two actions: to produce or to consume goods. The strategy of the prosumer (to produce or to consume) depends on the average behavior of the other players.
In the model proposed in this paper, we assume that there is a continuous demand. In order to meet the demand, the prosumer can choose to produce goods at certain time periods. However, producing goods is costly and we assume that a prosumer cannot produce indefinitely (for example, a machine needs maintenance, or workers have ended their shift, et cetera.) On top of that, producing too much and having an overstock is unfavorable as well, as there might be warehouse costs to store the goods, or the quality of goods is declining the longer they are stored. For the prosumer it is thus of interest to know when (s)he should "switch on" to produce, and when to "switch off" to consume.
In mean field games, the underlying structured network interconnecting the players does not play a role. The number of players tends to infinity, and thus it is not of interest to study which player does what, but rather how many players do what. In other words, when players decide their next action, they do not consider the strategies of their neighboring players, but base their strategy on the average behavior of the population.
In the mean field game we will develop, we construct a cost function in such a way that it is advantageous to produce goods when many other prosumers are not producing, and to consume goods when many prosumers are producing. The cross-coupling mean field term serves as a feedback that influences the policy making of the prosumer.
The model in this paper is now as follows. Each prosumer is characterized by three state variables: the inventory level, the probability of being in the producer mode, and the probability of being in the consumer mode. The dynamics of these variables are captured by differential equations. Every player is furthermore equipped with a finite horizon cost functional, which involves consumption costs and possible error costs. The error costs can be regarded as deviations from the mean behavior, and thus this error serves as cross-coupling term. Every prosumer wants to minimize his/her own cost functional, which depends on the mean behavior through the error costs, and thus the mean field game framework comes into play.
The highlights in this article are as follows. We formulate dynamic demand of prosumers as a mean field game, and to the best of our knowledge, this has not been done before. Although a preliminary version of this paper has appeared as a conference proceeding [1], the conference article only addressed first-order mean field games. In this paper, we incorporate stochastic disturbances, driven by Brownian motion, on the dynamics. This leads to the study of second-order mean field games. As a second addition, we have included exogenous disturbances on the state so that we can derive robustness results. Thirdly, we provide stability results on the mean field equilibria, for both the first-order and second-order cases.
Both first-order and second-order mean field games were formulated by Guéant et al. in [2] and Lasry and Lions [3], and independently by Huang et al. in [4,5] and [6]. For more information on mean field games in general, we refer the reader to [7] and [8]. Mean field games in the classical game theory notion can also be found in [9] and [10]. In [11], mean field games on storage in smart grids are studied. Mean field games in combination with dynamic marketing are also presented in [12], where the situation is applied to electrical power grids. Another example of an application of mean field games can be found in [13], where mean field games are used to analyze density flows in a network. Another study of prosumers in a game-theoretic setting has been conducted in [14], where the authors used a community-based market mechanism. For general information behind the economic models of prosumers and supply and demand, we refer to [15].
We denote by ℝ, ℝ^n and ℝ^{n×m} the real numbers, the real n-dimensional vector space and the set of all real n × m matrices, respectively. For a set of measured values x_i, i = 1, …, n, the mean is denoted by x̄. The transpose of a matrix A is denoted by A^T, and by 0_{n×n} we denote the n × n zero matrix. Finally, ⟨⋅, ⋅⟩_F denotes the Frobenius inner product of two matrices.
The paper is organized as follows. The underlying model will be presented in Sect. 2 and first-order mean field games are introduced in Sect. 3. Next, we analyze the first-order mean field equilibrium and its stability in Sect. 4. A second-order mean field game, which arises naturally when we include stochastic disturbances, is formulated in Sect. 5. Two cases of stochastic disturbance are considered, with state-dependent and state-independent variance, and results on the second-order mean field equilibrium can be found in Sects. 6 and 7, respectively. Finally, in Sect. 8 we run a numerical experiment covering our theoretical results. Conclusions and directions for future research are presented in Sect. 9.

Population of Prosumers
We will now describe the model of the prosumers, which will turn into a mean field game later on. For every prosumer we construct a cost functional, and the aim is to incentivize prosumers to cooperate.
Let us now consider a large population of prosumers and a finite time horizon window [0, T]. Each prosumer is characterized by the following continuous states: the inventory level x(t), the probability of being in the producer mode, y1(t) ∈ [0, 1], and the probability of being in the consumer mode, y2(t) ∈ [0, 1], at time t ∈ [0, T]. In other words, when y1(t) = 1 the prosumer is producing, while when y2(t) = 1 the prosumer is consuming.
The inventory level of each prosumer evolves according to a stock-based differential equation, in which the rates p and d are given positive scalars representing the production and demand rates, respectively, and α is a positive decay constant. Note that in the model we assume that there is always a demand present.
Next, we set the problem in a stochastic framework, where each prosumer is either in the mode of producer or in the mode of consumer, with probabilities y1 ∈ [0, 1] and y2 ∈ [0, 1], respectively. The expected inventory level x(t) then evolves according to the differential equation

ẋ(t) = p y1(t) − d − αx(t).

The modes y1(t) and y2(t) are characterized by evolving dynamics in the form of differential equations as well. To formulate our problem as a control problem, we assume that y1(t) and y2(t) can be controlled by input variables u1(t) and u2(t), as follows:

ẏ1(t) = u1(t),  ẏ2(t) = u2(t).

For a mean field game formulation, we need to consider a probability density function m(⋅) that describes the density of the prosumers and satisfies ∫_ℝ ∫_{[0,1]×[0,1]} m(x, y, t) dx dy = 1 for every t. Note that y is shorthand notation for (y1, y2), so by dy we mean dy1 dy2. The function m is a probability density function and describes the distribution of the prosumers, i.e., the way in which the prosumers are distributed over the states (inventory level and mode). Let us now define m_p(t) and m_c(t), the producer and consumer distributions, respectively.
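As an illustrative sketch, the expected dynamics above can be integrated with a simple forward Euler scheme. The drift ẋ = p·y1 − d − α·x and the mode dynamics ẏ1 = u1 follow the forms used later in the numerical section; the feedback and all parameter values below are placeholders for illustration, not the paper's optimal control.

```python
# Minimal sketch of one prosumer's expected dynamics (assumed forms:
# x' = p*y1 - d - a*x and y1' = u1, where 'a' plays the role of the
# positive decay constant alpha).
def simulate_prosumer(p, d, a, u1, x0, y10, T, steps):
    dt = T / steps
    x, y1 = x0, y10
    for k in range(steps):
        t = k * dt
        x += dt * (p * y1 - d - a * x)
        y1 += dt * u1(t, x, y1)
        y1 = min(max(y1, 0.0), 1.0)  # keep the probability in [0, 1]
    return x, y1

# With a constant mode y1 = d/p (zero input), the drift of x reduces to
# x' = -a*x, so the inventory settles at zero.
x_end, y_end = simulate_prosumer(p=5.0, d=3.0, a=1.0,
                                 u1=lambda t, x, y1: 0.0,
                                 x0=2.0, y10=0.6, T=40.0, steps=4000)
```
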
To capture the mismatch between producers and consumers, we define the error at every time t as the rate-weighted difference between the fractions of producers and consumers,

e(t) = p m_p(t) − d m_c(t).   (5)

Next, the running cost c(⋅) is given by

c(X, u, m, t) = ½q1 x(t)² + ½q2 (y1(t) − ½)² + ½r1 u1(t)² + ½r2 u2(t)² + y1(t)S e(t) + y1(t)W.   (6)

Note that these costs depend on the distribution m(⋅) through the error e(t). In the running cost, q1, q2, r1, r2, S and W are appropriately chosen positive scalars.
The running cost (6) is built up as follows. The first term ½q1 x(t)² is a penalty on the inventory level, since we assume that it is desirable to have no overstock. The second term ½q2 (y1(t) − ½)² is a penalty on y1 and is introduced so that y1 will not deviate too much from the center of [0, 1]. This term is a scaled second-order Taylor approximation, around y1 = ½, of an inverse barrier function on [0, 1]. In this way, the axiom of probability y1 ∈ [0, 1] will not be violated. The terms ½r1 u1(t)² and ½r2 u2(t)² are costs on the inputs. The cost term y1(t)Se(t) can be regarded as a cost for following the crowd behavior. To see this, note that if the error e(t) > 0 is positive, i.e., more is produced than consumed, minimizing the cost y1(t)Se(t) leads to y1 → 0. Hence, the prosumer will not produce. On the other hand, if more is consumed than produced, the error is negative, and minimizing the cost leads to y1 → 1, so the prosumer will produce. Lastly, we have a producing cost y1(t)W. Adding all of this gives us the running cost c(⋅) as in (6), and the total cost for a prosumer consists of the running cost plus some terminal cost g(X(T)), where X(T) denotes the state of the prosumer at the final time T and g is yet to be specified.

We will study the relation between y1(t) and y2(t) in more detail. In our model, we assume that y2(t) = 1 − y1(t), that is, the probability of being in the mode of consumer is simply the complement of the probability of being in the mode of producer. By imposing y1(t) + y2(t) = 1, the variable y2(t) becomes redundant and the model simplifies accordingly; the complete dynamics are governed by

Ẋ(t) = AX(t) + Bu(t) + C,   (8)

with state X(t) = [x(t), y1(t)]^T. With the constraint y2(t) = 1 − y1(t), we note that m(⋅) = m(x, y1, t), and for the error we now have that

e(t) = (p + d)m_p(t) − d.   (9)
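As a small sanity check of the crowd-following mechanism described above, the running cost can be evaluated term by term. The function below assumes the term-by-term form enumerated in the text; the function name and all numeric values are illustrative.

```python
# Sketch of the running cost c(.) as enumerated in the text:
# inventory penalty + mode-centering penalty + input costs
# + crowd-following cost + producing cost.
def running_cost(x, y1, u1, u2, e, q1, q2, r1, r2, S, W):
    return (0.5 * q1 * x**2
            + 0.5 * q2 * (y1 - 0.5)**2
            + 0.5 * r1 * u1**2
            + 0.5 * r2 * u2**2
            + y1 * S * e          # cost for following the crowd
            + y1 * W)             # producing cost

# The crowd term rewards producing (y1 near 1) when the error e is
# negative (more consumed than produced) and penalizes it when e > 0.
low = running_cost(0, 1.0, 0, 0, e=-1.0, q1=1, q2=1, r1=1, r2=1, S=10, W=0)
high = running_cost(0, 1.0, 0, 0, e=+1.0, q1=1, q2=1, r1=1, r2=1, S=10, W=0)
```
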
The problem we consider is now the following. For a given finite horizon T > 0 and an initial distribution m0 : ℝ × [0, 1] → [0, ∞), subject to the dynamics in system (8), we want to minimize the cost functional

J(X, u) = ∫₀ᵀ c(X(t), u(t), m(t)) dt + g(X(T))

over all possible inputs u(⋅).

First-order Mean Field Game
We will first review the concept of a first-order mean field game, in which the microscopic dynamics are deterministic and the resulting mean field game involves only the first derivatives of the value function and of the density function. Consider the generic cost

J(X, u, t) = ∫ₜᵀ c(X(s), u(s), m(s)) ds + g(X(T)),   (10)

subject to the dynamics

Ẋ(t) = F(X(t), u(t)).

Here, we write X(t) = [x(t), y1(t)]^T for the state, we denote the running cost by c(⋅), and the terminal cost is g(X(T)). The input u(⋅) is any state-feedback closed-loop control. Let the optimal value of J(X, u, t) be denoted by the value function v(X, t). Following the technical results in [3], the following mean field game can be formulated:

∂v/∂t + min_u { ⟨∇_X v, F(X, u)⟩ + c(X, u, m) } = 0,
v(X, T) = g(X(T)),
u*(X) = argmin_u { ⟨∇_X v, F(X, u)⟩ + c(X, u, m) },   (11)

∂m/∂t + div_X ( m F(X, u*(X)) ) = 0,  m(X, 0) = m0(X).   (12)

These two coupled differential equations form the mean field game. The first partial differential equation (11) is the Hamilton-Jacobi-Bellman equation, which returns the value function v(X, t) for a given distribution m(X, t). With the boundary condition v(X, T) = g(X(T)) at the final time T, this partial differential equation has to be solved backwards in time. The third line of (11) describes the optimal closed-loop control u*(X(t)).
The second partial differential equation (12) represents the transport equation. For a given optimal control u*(X(t)), we know the vector field F(X, u*(X)), and (12) gives us the distribution m(X, t) by solving the differential equation forward in time with boundary condition m(X, 0) = m0(X) at the initial time. Finally, once we know m(X, t), it can be substituted into the running cost c(X, u, m) and the error can be obtained from (5) with (9). We remark that the mean state X̄(t) is obtained by averaging the closed-loop dynamics, and hence a mean field equilibrium solution is any pair (v(X, t), X̄(t)) that satisfies (11) and (12).
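The equilibrium concept above can be illustrated with a toy fixed-point iteration: each agent best-responds to the population error e, and e is then recomputed from the average response. The static best response below (minimizing ½q2(y1 − ½)² + y1(Se + W) over y1 ∈ [0, 1]) is a deliberately simplified caricature of the dynamic problem, used only to show the damped fixed-point iteration underlying a mean field equilibrium; all parameter values are illustrative.

```python
# Toy mean-field fixed point: best response to the error e, then update
# e from the average response, with damping to ensure convergence.
def best_response(e, q2, S, W):
    y1 = 0.5 - (S * e + W) / q2           # unconstrained minimizer
    return min(max(y1, 0.0), 1.0)         # project onto [0, 1]

p, d, q2, S, W = 5.0, 3.0, 10.0, 10.0, 10.0
y1 = 0.5
for _ in range(200):
    e = (p + d) * y1 - d                  # population error for mean y1
    y1 += 0.1 * (best_response(e, q2, S, W) - y1)   # damped update
# At the fixed point, y1 solves y1 = 1/2 - (S*((p+d)*y1 - d) + W)/q2,
# i.e. y1 = 2.5/9 for these numbers.
```

The damping factor 0.1 is needed because the undamped best-response map is expansive here; damped fixed-point iterations of this kind are a common device in mean field game numerics.
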
We will now proceed as follows. In the next section, the aim is to find a mean field equilibrium in the case of prosumers. To do so, we look at the mean field best response u* to a given distribution m(⋅). We note that the problem we consider is of the form (10) if we substitute the running cost (6) together with the dynamics (8), where X(t) = [x(t), y1(t)]^T and u(t) = [u1(t), u2(t)]^T.

First-order Mean Field Equilibrium
Considering the problem of minimizing the cost in (6) subject to the dynamics in (8), we note that we have a linear-quadratic optimization problem (13) of the form (10), where we recall that X(t) = [x(t), y1(t)]^T, with x(t) the inventory level and y1(t) the probability of being in the mode of producer. The matrices and weights of this linear-quadratic problem are read off from the dynamics (8) and the running cost (6). The problem posed in (13) is easily solved by the use of calculus of variations [16], and we are now in a position to state and prove the main result.

Theorem 1
The mean field game described by (11) and (12) admits an equilibrium in terms of v and X̄. The value function is of the form

v(X, t) = ½X^T P(t)X + Ψ(t)^T X + χ(t),

the mean X̄ is given by the closed-loop dynamics

d/dt X̄(t) = (A − BR⁻¹B^T P)X̄(t) − BR⁻¹B^T Ψ + C,   (14)

and the unknowns P, Ψ can be found by solving the set of Riccati equations

Ṗ + PA + A^T P − PBR⁻¹B^T P + Q = 0,   (15)

together with a linear differential equation for Ψ, with boundary conditions at the final time T. In addition to this, for the strategy of the mean field equilibrium we have that

u*(X, t) = −R⁻¹B^T (P(t)X + Ψ(t)).

Proof We consider a value function of the quadratic form above, where P is an unknown symmetric matrix, Ψ is an unknown vector, and χ is an unknown scalar function, so that ∇_X v = PX + Ψ. Now, the optimal state feedback is given by the minimizer in the Hamilton-Jacobi-Bellman equation (11), and with substitution of (13) we obtain an expression that is quadratic in u. Finding u* is straightforward, by taking the gradient of this expression with respect to u and setting it equal to zero in order to find the minimum:

u* = −R⁻¹B^T (PX + Ψ).

Substituting the expression for u* into the corresponding Hamilton-Jacobi equation, expanding the brackets, and performing algebraic manipulations, using that X^T PAX is a scalar and that P is symmetric, we can group the terms into brackets multiplying the quadratic, linear and constant parts in X. Now, if the terms between brackets in the resulting expression are zero, then certainly the right-hand side equals zero; requiring the expressions between brackets to be zero is therefore a sufficient condition for a solution, and these are the required Riccati equations. Substituting the optimal control u* in Eq. (13) to obtain the closed-loop dynamics, and averaging both sides, one obtains Eq. (14), and this completes the proof. ◻

Next, we will derive a result on the stability of the mean field equilibrium governed by the closed-loop dynamics (14). Let 𝒳 denote the set of equilibria of (14), where Ψ depends on the error e. We introduce the function V(X(t)) to denote the distance from X(t) to this set 𝒳, so

V(X) = ‖X − Π_𝒳(X)‖,

where Π_𝒳(X) denotes the projection of X onto the set 𝒳. We are now ready to state and prove the following corollary on stability, which shows that the distance between the state and the set of equilibria converges to zero; hence, an equilibrium is reached asymptotically.
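The backward-in-time solution of the Riccati equation in the proof can be sketched numerically with a plain Euler scheme; the matrices below are illustrative scalars, not the paper's actual A, B, Q, R.

```python
import numpy as np

# Sketch: integrate the matrix Riccati ODE of the theorem backwards,
#   dP/dt = -(P A + A^T P - P B R^{-1} B^T P + Q),   P(T) = 0,
# with forward Euler in reversed time (terminal cost assumed zero here).
def riccati_backward(A, B, Q, R, T, steps):
    n = A.shape[0]
    dt = T / steps
    P = np.zeros((n, n))                   # terminal condition P(T) = 0
    Rinv = np.linalg.inv(R)
    for _ in range(steps):
        dP = -(P @ A + A.T @ P - P @ B @ Rinv @ B.T @ P + Q)
        P = P - dt * dP                    # step backwards in time
    return P

A = np.array([[-1.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]);  R = np.array([[1.0]])
P0 = riccati_backward(A, B, Q, R, T=20.0, steps=20000)
# For a long horizon, P(0) approaches the stabilizing root of the
# algebraic Riccati equation -2P - P^2 + 1 = 0, i.e. P = sqrt(2) - 1.
```
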

Corollary 2 Under the following inequality
the closed-loop dynamics are asymptotically stable; in other words, we have that lim_{t→∞} V(X(t)) = 0.
Proof We differentiate V(X(t)) along the closed-loop dynamics (14); the first equality in this computation comes from the chain rule. We note that in the final expression the second term is negative by assumption. Hence, we derive that V̇(X(t)) < 0 away from the set of equilibria, so V(X(t)) converges to zero asymptotically, and this completes the proof. ◻

To practically implement the result of Theorem 1, we make the following remark. Often a discretized version of Theorem 1 is considered, which is useful in numerical simulations, of which we will give an example in Sect. 8. In the spirit of receding horizon control, at every iteration, instead of solving the Riccati Eq. (15) over a finite horizon window, one solves the Riccati equations corresponding to the infinite-horizon control problem; for P, this is the algebraic Riccati equation

PA + A^T P − PBR⁻¹B^T P + Q = 0.

So, at every time instance these equations are solved and consequently the optimal control input u* is computed. This input is fed through the dynamics, giving us a new state, and we repeat the procedure. We will also use this algorithm when we perform the numerical experiment in Sect. 8.
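The receding-horizon scheme described above can be sketched as follows: solve the algebraic Riccati equation (here once, by iterating the Riccati ODE to steady state, since the data are constant), apply the feedback u* = −R⁻¹B^T P x, and step the state forward. The scalar data are illustrative placeholders.

```python
# Receding-horizon sketch with a simple scalar ARE solver.
def solve_are_scalar(a, b, q, r, iters=10000, dt=1e-3):
    # Iterate the Riccati ODE to its steady state, which solves
    # 2*a*P - (b*P)^2 / r + q = 0 (the scalar ARE).
    P = 0.0
    for _ in range(iters):
        P += dt * (2 * a * P - (b * P) ** 2 / r + q)
    return P

a, b, q, r = -1.0, 1.0, 1.0, 1.0
P = solve_are_scalar(a, b, q, r)

x, dt = 2.0, 0.01
for _ in range(2000):                 # closed loop, repeated each instance
    u = -(b / r) * P * x              # u* = -R^{-1} B^T P x
    x += dt * (a * x + b * u)         # feed the input through the dynamics
# The closed loop x' = (a - b^2 P / r) x is stable, so x decays to zero.
```
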

Second-order Mean Field Game
We now consider the general formulation of our problem, in which a stochastic disturbance driven by Brownian motion is incorporated in the dynamics, together with an exogenous disturbance w on the state. This yields a second-order mean field game, which involves second-order derivatives of the value function and of the density function. The dynamics are now described by the stochastic differential equation

dX(t) = (AX(t) + Bu(t) + C + Dw(t)) dt + Σ(X) dB(t),   (17)

where B(t) denotes the Brownian motion and Σ(X) is the stochastic coefficient matrix, and the cost functional (18) is the expected value of the running and terminal costs, which we minimize over the control input u while maximizing over the disturbance w. In the above, c(⋅) denotes the running cost, g(X(T)) is the terminal penalty, and u(⋅) and w(⋅) are any closed-loop control and disturbance. Again as before, we denote by v(X, t) the value function which optimizes J(⋅). Corroborating the technical results in [7] and [3], the second-order mean field game can now be formulated as follows:

∂v/∂t + min_u max_w { ⟨∇_X v, F(X, u, w)⟩ + c(X, u, m) } + ½⟨Σ(X)Σ(X)^T, ∇²_X v⟩_F = 0,
v(X, T) = g(X(T)),
u*(X) = argmin_u { ⟨∇_X v, F(X, u, w*)⟩ + c(X, u, m) },
w*(X) = argmax_w { ⟨∇_X v, F(X, u*, w)⟩ + c(X, u, m) },   (19)

∂m/∂t + div_X ( m F(X, u*, w*) ) − ½ Σ_{i,k} ∂²/∂X_i∂X_k [ (Σ(X)Σ(X)^T)_{ik} m ] = 0,  m(X, 0) = m0(X).   (20)

Here, ⟨⋅, ⋅⟩_F denotes the Frobenius inner product of two matrices, and Σ_ik(X) denotes the ik-th entry of the matrix Σ(X) (not to be confused with the summation symbol that is also present in Eq. (20)). These two coupled differential equations form the second-order mean field game, since second-order derivatives of the value function v and of the distribution m appear in the Hamilton-Jacobi-Bellman Eq. (19) and the transport Eq. (20), respectively. The rest is similar to the first-order formulation.

So, as before, the first partial differential equation in (19) is the Hamilton-Jacobi-Bellman equation, which needs to be solved backwards in time. For a given distribution m, it returns the value function v(X, t). The third and fourth lines of (19) are the optimal closed-loop control u*(X(t)) as minimizer and the worst-case disturbance w*(X(t)) as maximizer of the Hamiltonian function on the right-hand side.
The partial differential Eq. (20) represents the transport equation and needs to be solved forward in time. Once the optimal closed-loop control u*(X(t)) and the worst-case disturbance w*(X(t)) are provided, and consequently the vector field F(X, u*(X), w*(X)) is known, it returns the distribution m. Once we know m(X, t), it can be substituted into the running cost c(X, u, m) and the error can be obtained from (5) with (9). We remark that the mean state X̄(t) is obtained by averaging the closed-loop dynamics, and hence a mean field equilibrium solution is any pair (v(X, t), X̄(t)) that satisfies (19) and (20).
We will now proceed as follows. We look at the mean field best response u* and worst-case disturbance w* to a given distribution m(⋅). We note that the problem we consider is of the form (17) if we substitute the running cost (6) together with the dynamics (8), where X(t) = [x(t), y(t)]^T, u(t) = [u1(t), u2(t)]^T, and w(t) = [w1(t), w2(t)]^T. For simplicity, in our analysis we let the stochastic matrix be diagonal,

Σ(X) = diag(σ1(x), σ2(y)),

so the first entry σ1(x) only depends on the inventory level x, and the last entry σ2(y) only depends on the fraction of producers y. Having this in mind, note that the Frobenius inner product appearing in (19) simplifies to

½⟨Σ(X)Σ(X)^T, ∇²_X v⟩_F = ½σ1(x)² ∂²v/∂x² + ½σ2(y)² ∂²v/∂y².

As before, we consider a value function of the form

v(X, t) = ½X^T P(t)X + Ψ(t)^T X + χ(t),

where P(t) is a symmetric matrix.

With this in mind, note that the Hamilton-Jacobi-Bellman Eq. (19) reduces accordingly, with boundary condition v(X, T) = g(X(T)) at the final time.
We will now describe two stochastic mean field equilibria, for different choices of the stochastic matrix Σ(X).

Variance Linearly Dependent on the State
In a first scenario, suppose that the Brownian motion enters linearly in the state, as

Σ(X) = diag(s1 x, s2 y),

where s1, s2 are some positive constants. This means that if the state is close to zero, the additional stochastic disturbance is small, while for a large deviation from the equilibrium there is a large disturbance. In the presence of such a Brownian motion, we have the following result.
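A possible Euler–Maruyama discretization of the dynamics with this state-dependent noise is sketched below; the drift is the closed-loop form used in the numerical section, and the simple proportional feedback u1 is a placeholder, not the paper's optimal control. All parameter values are illustrative.

```python
import numpy as np

# Euler-Maruyama sketch with state-dependent (multiplicative) noise:
#   dx  = (p*y1 - d - a*x) dt + s1*x  dB1,
#   dy1 = u1 dt              + s2*y1 dB2.
rng = np.random.default_rng(0)
p, d, a, s1, s2 = 5.0, 3.0, 1.0, 0.2, 0.1
x, y1 = 2.0, 0.5
dt, steps = 0.01, 4000
for _ in range(steps):
    u1 = 0.5 * (d / p - y1)                  # drive y1 toward d/p
    dB1, dB2 = rng.normal(0.0, np.sqrt(dt), size=2)
    x += (p * y1 - d - a * x) * dt + s1 * x * dB1
    y1 += u1 * dt + s2 * y1 * dB2
    y1 = min(max(y1, 0.0), 1.0)              # keep the probability valid
# The multiplicative noise shrinks near (x, y1) = (0, d/p), so the state
# settles in a neighborhood of that equilibrium.
```
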

Theorem 3 The mean field game described by (19) and (20) admits an equilibrium of the following form.
Here, P is an unknown symmetric matrix, Ψ is an unknown vector, and χ is an unknown scalar function.

Proof As in the proof of Theorem 1, we consider the quadratic value function above. The optimal state feedback is given by the minimizer of the Hamiltonian, and with substitution of (18) we obtain an expression that is quadratic in u. Finding u* is straightforward: we take the gradient of this expression and set it equal to zero to find the minimum, which is

u* = −R⁻¹B^T (PX + Ψ).

Similarly, the worst-case disturbance is given by the maximizer of the Hamiltonian; with substitution of (18), w* is found by taking the gradient and setting it equal to zero to find the maximum. Substituting the expressions for u* and w* into the corresponding Hamilton-Jacobi equation, performing various algebraic manipulations and noting that

½s1²x²P11 + ½s2²y²P22 = ½X^T diag(s1²P11, s2²P22) X,

we can group the terms. Now, if the terms between brackets in the resulting expression are zero, then certainly the right-hand side equals zero. Thus, requiring the expressions between brackets to be zero is a sufficient condition for a solution, and these are the required Riccati equations.
Substituting the optimal control u* and the worst-case disturbance w* in Eq. (17) to obtain the closed-loop dynamics, and averaging both sides, one obtains the mean dynamics. Note that the Brownian motion no longer appears, since the average of this signal is zero. We have now derived Eq. (23), and this completes the proof. ◻

We are now in a position to derive a stability result on these dynamics. Substituting the mean field control u* and disturbance w* in (17), we obtain closed-loop stochastic dynamics. For this stochastic differential equation, we let 𝒳 denote the set of equilibria. Analogously to what we have done in Sect. 4, we let V(X(t)) denote the distance from X(t) to this set 𝒳, so

V(X) = ‖X − Π_𝒳(X)‖,

where Π_𝒳(X) denotes the projection of X onto the set 𝒳. We have the following result on stability.

Corollary 4 Let the following inequality hold true
In this case, the distance function V(X(t)) is second-moment bounded.
What this means is that the second moment of the distance between the state and the set of equilibria remains bounded, with a bound depending on the variance of the disturbance. The proof is as follows.
Proof For dynamics governed by (17), the infinitesimal generator of X is an operator 𝒜 acting on V,

𝒜V(X) = ⟨∇_X V(X), F(X)⟩ + ½⟨Σ(X)Σ(X)^T, ∇²_X V(X)⟩_F,

where ∇_X denotes the gradient and ⟨⋅, ⋅⟩_F denotes the Frobenius inner product. In our case, this expression can be evaluated along the closed-loop dynamics.
We then note that, under the inequality of Corollary 4, 𝒜V(X) is negative for large deviations, which yields second-moment boundedness of the distance V(X(t)). This completes the proof. ◻
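The generator computation in the proof can be illustrated on a scalar example: for dX = −X dt + s dB and V(x) = x², the formula above gives 𝒜V(x) = −2x² + s², which can be checked against the defining limit (E[V(X_{t+h})] − V(x))/h using the exact Gaussian moments of one Euler–Maruyama step. The example dynamics are illustrative, not the paper's.

```python
# Scalar illustration of the infinitesimal generator: for dX = -X dt + s dB
# and V(x) = x^2,
#   A V(x) = V'(x)*(-x) + (1/2) s^2 V''(x) = -2 x^2 + s^2.
def generator_V(x, s):
    return -2.0 * x * x + s * s

# Defining limit (E[V(X_{t+h})] - V(x)) / h, evaluated exactly for the
# one-step Gaussian X_h = x(1 - h) + s*sqrt(h)*Z with Z standard normal:
# E[X_h^2] = x^2 (1 - h)^2 + s^2 h.
def one_step_rate(x, s, h):
    EV = (x * (1.0 - h)) ** 2 + s * s * h
    return (EV - x * x) / h

x0, s, h = 1.0, 0.5, 1e-6
# one_step_rate(x0, s, h) differs from generator_V(x0, s) by O(h).
```
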

Constant Variance
In another scenario, suppose that the matrix Σ in Eq. (17) is constant, by which we mean that it is independent of the state:

Σ = diag(s1, s2).

Here, s1, s2 are some positive constants. In such a case, we have the following result.
Proof We have already established, in the proof of Theorem 3, that the optimal input u* and the worst-case disturbance w* are found by setting the gradients of the Hamiltonian to zero. Substituting the expressions for u* and w* in the corresponding Hamilton-Jacobi equation and expanding the brackets, we recall that x^T PAx is a scalar and thus equal to its transpose, so that x^T PAx = (x^T PAx)^T = x^T A^T Px since P is symmetric. Furthermore, note that ½s1²P11 + ½s2²P22 = ½trace(ΣΣ^T P). After some additional algebraic manipulations we can group the terms as before, which yields the required Riccati equations, and we note that 𝒜V(X) < 0 if the condition holds. This completes the proof. ◻

Numerical Experiment
In this section we corroborate the theoretical results by performing a numerical study. We analyze a population of n = 200 prosumers and we set a time frame of T = 40. For each prosumer, the production and demand rates are p = 5 and d = 3, respectively, and we have set α = 1. The penalties on the input are set to r1 = r2 = 10, we set q2 = 10, and the producing cost is W = 10.
As a first experiment, we show the validity of Theorem 1. To run the numerical experiment, we apply a discretized version of (14). At every iterative step, for every prosumer, we compute the best response u* for the infinite-horizon control problem, which is then fed through to compute the next state. Equivalently to solving the Riccati equation using the built-in Matlab command care, we can also use the direct result established in Theorem 1. In a first case, we take a high penalty on the inventory level with q1 = 100, while setting a relatively low penalty on the error term with S = 10. We have plotted the dynamics of every agent in Fig. 1. Due to the large value of the penalty q1 on the inventory level, the inventory levels are driven to zero, as expected. Here, having no inventory is the desired case, as a positive inventory level leads to storage or warehouse costs, while a shortage in inventory means that one cannot match the demand and the revenue is lower. Both of these situations are undesirable, and by introducing a relatively high cost q1 = 100 on the inventory level x(t), we penalize these cases.
We also note that the mode y1, i.e., the probability of being in the mode of producer, converges to a value of 0.60. This is to be expected, as the dynamics are governed by ẋ = p y1 − d − x, and for an inventory level of zero, ẋ = 0 holds if y1 = d/p = 0.60.
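The equilibrium values follow from simple arithmetic, which can be verified directly using the experiment's parameters p = 5, d = 3 (with α = 1) and the error formula e = (p + d)ȳ1 − d:

```python
p, d = 5.0, 3.0

# Zero-inventory equilibrium: x' = p*y1 - d - x = 0 at x = 0 gives y1 = d/p.
y1_inventory = d / p

# Zero-error equilibrium: e = (p + d)*y1 - d = 0 gives y1 = d/(p + d),
# and the inventory then settles where x' = 0, i.e. x = p*y1 - d.
y1_error = d / (p + d)
x_error = p * y1_error - d
```
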
In a second experiment we take a relatively high penalty on the error, S = 100, while the penalty on the inventory level is quite low with q1 = 10. The results are shown in Fig. 2. We expect that in this case the error is steered to zero, due to the high penalty on the error. Since e = (p + d)ȳ1 − d, we note that e = 0 if and only if ȳ1 = d/(p + d) = 0.375. We observe from the plotted results in Fig. 2 that the mode y1 converges to this value, as expected. The converging value of the inventory level in this case is as follows. We have ẋ = p y1 − d − x, and as y1 converges to 0.375, the inventory level converges to x = p y1 − d = −1.125. This can be seen from the simulation results in Fig. 2 as well.

Finally, we run simulations in which we add stochastic disturbances in the form of a Brownian motion. The Brownian motion can be regarded as the limit of a random walk, where at each step one adds an infinitesimal random increment. For the simulations, we consider the first scenario, where we have q1 = 100 and S = 10. We have shown before how in this scenario we converge to the equilibrium x = 0 and y1 = 0.60. We now show how the performance changes when we incorporate the Brownian motion. First, we add a state-dependent variance. The simulation results are shown in Fig. 3. We observe that we still reach an attractive neighborhood around the equilibrium (x, y1) = (0, 0.60). We do, however, not completely converge to a single value, as we constantly add a stochastic disturbance interfering with the dynamics.
In a second case, we add a state-independent variance. The simulation results are shown in Fig. 4. We again observe that we reach an attractive neighborhood around the equilibrium (x, y1) = (0, 0.60), without complete convergence to a single value, as we constantly add a stochastic disturbance interfering with the dynamics. The disturbance in this case is larger than in the previous case, because here the variance is state-independent, whereas a state-dependent variance goes to zero at the equilibrium.

Conclusion
In this paper we analyzed mean field games on a large population of prosumers. In our context, a prosumer is an agent that can either produce or consume, and its decision to produce or consume depends on the average behavior of the population. A prosumer is characterized by an inventory level, the probability of being in the producer mode, and the probability of being in the consumer mode. For simplicity, in this paper we have set the probability of being in consumer mode equal to the complement of the probability of being in producer mode. We studied both first-order and second-order mean field games, with and without disturbances. These disturbances can be either exogenous on the state or stochastic on the dynamics. In each case, we derived a formulation for the mean field equilibrium and we established corresponding stability results on the equilibria.
In future research, robustness properties are of interest. We would also like to study the case in which the probability of being in consuming mode is not the complement of the probability of being in producing mode. In that case, a third state corresponding to being idle is added, in which the prosumer is neither producing nor consuming.

Availability of Data and Materials Not applicable.
Code Availability The randomly generated datasets and corresponding code are available from the corresponding author on reasonable request.