Diffusion and Localization of Relative Strategy Scores in The Minority Game

We study the equilibrium distribution of relative strategy scores of agents in the asymmetric phase ($\alpha \equiv P/N \gtrsim 1$) of the basic Minority Game using sign-payoff, with N agents holding two strategies over P histories. We formulate a statistical model that makes use of the gauge freedom with respect to the ordering of an agent's strategies to quantify the correlation between the attendance and the distribution of strategies. The relative score $x \in \mathbb{Z}$ of the two strategies of an agent is described in terms of a one dimensional random walk with asymmetric jump probabilities, leading either to a static and asymmetric exponential distribution centered at $x = 0$ for fickle agents or to diffusion with a positive or negative drift for frozen agents. In terms of scaled coordinates $x/\sqrt{N}$ and $t/N$ the distributions are uniquely given by α and in quantitative agreement with direct simulations of the game. As the model avoids the reformulation in terms of a constrained minimization problem it can be used for arbitrary payoff functions with little calculational effort and provides a transparent and simple formulation of the dynamics of the basic Minority Game in the asymmetric phase.


Introduction
A minority game can be exemplified by the following simple market analogy: an odd number N of traders (agents) must at each time step choose between two options, buying or selling a share, with the aim of picking the minority group. If sell is in minority and buy in majority one may expect the price to go up to satisfy demand, and vice versa if buy is in minority, thus motivating the minority character of the game. Clearly, there is no way to make everyone content; more than half of the agents will inevitably end up in the majority group each round. As the losing agents will try to improve their lot there is no static equilibrium. Instead, agents might be expected to adapt their buy or sell strategies based on perceived trends in the history of outcomes [1-12].
The Minority Game proposed by Challet and Zhang [2,3] formalizes this type of market dynamics where agents of limited intellect compete for a scarce resource by adapting to the aggregate input of all others [1,12]. Each agent has a set of strategies that, depending on the recent past history of minority groups going m time steps back, gives a prediction of the next minority being buy or sell. At each time step the agent uses her highest scoring strategy, i.e. the one that has historically predicted the minority group most accurately. The state space of the game is given by the strategy scores of each agent together with the recent history of minority groups, and the discrete time evolution in this space represents an intricate dynamical system.
What makes the game appealing from a physics perspective is that it can be described using methods for the statistical physics of disordered systems, with the set of randomly assigned strategies corresponding to quenched disorder [5,8,13-17]. In particular Challet, Marsili, and co-workers showed that the model can be formulated in terms of the gradient descent dynamics of an underlying Hamiltonian [13], plus noise. The asymptotic dynamics corresponds to minimizing the Hamiltonian with respect to the frequency at which agents use each strategy, a problem which in turn can be solved using the replica method [8,17,18]. In a complementary development Coolen solved the statistical dynamics of the problem in its full complexity using generating functionals [14-16].
The game is controlled by the parameter α = P/N, where $P = 2^m$ is the number of distinct histories that agents take into account, which tunes the system through a phase transition (for N → ∞) at a critical value $\alpha_c = 0.3374\ldots$. In the symmetric (or crowded) phase, $\alpha < \alpha_c$, the game is quasi-periodic with period 2P where a given history gives alternately one or the other of the outcomes for minority group [4,19]. A somewhat oversimplified characterization of the dynamics is that the information about the last winning minority group for a given history gives a crowding effect [20] where many agents want to repeat the last winning outcome, which then counterproductively puts them in the majority group. The crowding also gives large fluctuations of the size of the minority group.
In the asymmetric (or dilute) phase, $\alpha > \alpha_c$, agents are sufficiently uncorrelated that crowding effects are not important and there is no periodic behavior. Instead, as exemplified in Fig. 1, the score dynamics is random but with a net correlation between agents that makes fluctuations in the size of the minority group small. The dilute occupation of the full strategy space gives rise to a non-uniform frequency distribution of histories which can be beneficial for agents with strategies that are tuned to this asymmetry.
In this paper we study the dynamics of the Minority Game in the asymmetric phase by formulating a simplified statistical model, focusing on finding probability distributions for the relative strategy scores. In particular, we study the original formulation of the game with sign-payoff for which quantitative results are challenging to derive. By sorting the strategies based on how strongly they are correlated with the average over all strategies in the game, we find that sufficient statistical information can be extracted to formulate a quantitatively accurate model for $\alpha \gtrsim 1$.
We discuss how the relative score for each agent can be derived from the master equation of a random walk on a chain with asymmetric jump probabilities to nearest neighbor sites, and how these jump probabilities can be calculated from the basic dynamic update equation of the scores. The corresponding probability distributions of scores are either of the form of exponential localization or diffusion with a drift. In the appendices we show that the model is related to but independent from the Hamiltonian formulation and we show how it can also be readily applied to the game with linear payoff where the master equation has long-range hopping.
Although the MG is well understood from the classic works discussed above, it is our hope that the simplified model of the steady state attendance and score distributions presented in this paper provides an alternative and readily accessible perspective on this fascinating model.

Definition of the Game and Outline
In order to give an overview of our results and for completeness we start by providing the formal definition of the Minority Game and some basic properties [2,3,10,11].
At each discrete time step t every agent gives a binary bid $a_i(t) = \pm 1$, all of which are collected into a total attendance $A_t = \sum_{i=1}^{N} a_i(t)$ (Eq. 1, with N odd), and the winning minority group is then identified through $-\mathrm{sign}(A_t)$. A binary string of the m past winning bids, called a history μ, is provided as global information to each agent upon which to base her decision for the following round. There are thus $P = 2^m$ different histories, μ = 1, ..., P. At her disposal each agent has two randomly assigned strategies (a.k.a. strategy tables) that provide a unique bid for each history. The bid of strategy j = 1, 2 of agent i = 1, ..., N in response to history μ is given by $a^{\mu}_{i,j} = \pm 1$ and the full strategy is the P dimensional random binary vector $a_{i,j}$. There are thus a total of $2^P$ distinct strategies available.
The agent uses at each time step the strategy that has made the best predictions for minority group historically. This is decided by a score $U_{i,j}(t)$ for each strategy, which is updated according to $U_{i,j}(t+1) = U_{i,j}(t) - a^{\mu(t)}_{i,j}\,\mathrm{sign}(A^{\mu}_t)$, irrespective of whether the strategy is actually being used or not. (Here the superscript μ on $A_t$ just indicates that the attendance will depend on the history μ(t) giving the bids at time t.) Ties, i.e. $U_{i,1} = U_{i,2}$, are decided by a coin toss.
Since it is only the relative score between an agent's two strategies that is important in deciding which strategy to use, one may focus on the relative score $x_i(t) = (U_{i,1}(t) - U_{i,2}(t))/2$. This is updated according to $x_i(t+1) = x_i(t) + \Delta_i(t)$ (Eq. 4), where $\Delta_i(t) = -\xi^{\mu}_i\,\mathrm{sign}(A^{\mu}_t)$ and where $\xi_i = (a_{i,1} - a_{i,2})/2$ is an agent's "difference vector" that takes values ±1 or 0 for each history μ.
To make the dynamics generated by these equations more concrete, Fig. 1 shows the scores of the strategies of four particular agents, $U_{i,1}$ and $U_{i,2}$ for i = 1, ..., 4, for one realization of a game with N = 101, $P = 2^7$, together with the corresponding relative scores $x_i$ (inset), over a limited time interval. As exemplified by this figure agents come in two flavors, known as "frozen" and "fickle" [5,14]. An agent is frozen if one of her strategies performs consistently better than the other, such that on average the score difference is diverging, whereas fickle agents have a relative score that meanders around x = 0, switching their used strategy. The motion of $x_i$ for both fickle and frozen agents is a random walk with a bias towards or away from x = 0. A basic problem is to characterize and understand this random walk and derive the corresponding probability distribution $P_i(x, t)$, the probability to find agent i at position x at time t [10,16].
Fig. 1 At each time step every agent uses the one of her two strategies which has the highest momentary score, given by how well the strategy has predicted the past minority groups. The corresponding score difference $x_i(t)$ (inset) shows the distinction between frozen agents that consistently use a single strategy, and fickle agents that switch between strategies
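The definitions above can be turned into a short simulation. The following is a minimal sketch in Python, not the code used for the figures in this paper; the function name `simulate_minority_game`, its parameters, and the bit encoding of the history are illustrative assumptions. It implements the sign-payoff update of the relative score, $x_i(t+1) = x_i(t) - \xi^{\mu}_i\,\mathrm{sign}(A_t)$, with ties decided by a coin toss and with the actual (endogenous) history fed back to the agents.

```python
import numpy as np

def simulate_minority_game(n_agents=101, m=7, n_steps=2000, seed=0):
    """Basic two-strategy Minority Game with sign payoff (illustrative sketch).

    Returns the attendance A_t and the relative scores x_i(t) after each step.
    """
    rng = np.random.default_rng(seed)
    n_hist = 2 ** m                                   # P = 2^m histories
    # strategies[i, j, mu] is the bid (+1 or -1) of strategy j of agent i for history mu
    strategies = rng.choice([-1, 1], size=(n_agents, 2, n_hist))
    xi = (strategies[:, 0, :] - strategies[:, 1, :]) // 2   # difference vectors, values -1, 0, 1
    x = np.zeros(n_agents, dtype=int)                 # relative scores x_i
    mu = int(rng.integers(n_hist))                    # random initial history
    attendance = np.empty(n_steps, dtype=int)
    scores = np.empty((n_steps, n_agents), dtype=int)
    for t in range(n_steps):
        # strategy 1 (index 0) is used if x > 0, strategy 2 (index 1) if x < 0, coin toss if x == 0
        coin = rng.integers(0, 2, size=n_agents)
        use_second = np.where(x == 0, coin, (x < 0).astype(int))
        bids = strategies[np.arange(n_agents), use_second, mu]
        a_t = int(bids.sum())                         # odd for odd N, hence never zero
        x = x - xi[:, mu] * np.sign(a_t)              # score update with sign payoff
        attendance[t] = a_t
        scores[t] = x
        winner_bit = 1 if a_t < 0 else 0              # minority bid -sign(A_t), stored as one bit
        mu = ((mu << 1) | winner_bit) % n_hist        # keep only the last m winning bids
    return attendance, scores
```

Plotting a few columns of `scores` against time should show the two kinds of behavior described above: columns that drift steadily away from zero (frozen agents) and columns that keep returning to zero (fickle agents).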

Outline and Results
As presented in Sect. 3 we can quantify the correlation between an agent's strategies, specified by $\xi^{\mu}_i$, and the total attendance $A^{\mu}_t$, which in turn allows for characterizing the mean (time averaged) step size $\Delta_i = \langle x_i(t+1) - x_i(t) \rangle$ in terms of a distribution over agents $P(\Delta_i)$. In agreement with earlier work we find that $\Delta_i$ has two contributions: a center (x = 0) seeking bias term which arises from self interaction (the used strategy contributes to the attendance and as such is more likely to be in the majority group [17]), and a fitness term which reflects the relative adaptation of the agent's two strategies to the time averaged stochastic environment of the game. The distribution of step sizes over the population of agents is shown in Fig. 3, where frozen agents are simply those where the fitness overcomes the bias, such that $\Delta_i > 0$ for x > 0 or $\Delta_i < 0$ for x < 0, whereas for fickle agents $\Delta_i < 0$ for x > 0 and vice versa.
Knowing the mean step size of an agent allows for a formulation in terms of a one dimensional random walk (Fig. 4) with corresponding jump probabilities, as presented in Sect. 4. Depending on whether it is more likely to jump towards the center or not (fickle or frozen respectively) the master equation on the chain can be solved in terms of a stationary exponential distribution centered at x = 0 or (in the continuum limit) a normal distribution with a variance and mean that grow linearly in time (diffusion with drift). These are the distributions $P_i(x, t)$ depending on $\Delta_i$.
In simulations over many agents it is natural to consider the full distribution $P(x, t) = \frac{1}{N}\sum_{i} P_i(x, t)$, thus the probability of finding an agent at time t with relative score x. In terms of scaled coordinates $x/\sqrt{N}$ and $t/N$ we find that the distribution only depends on α. The model distributions show excellent agreement with direct numerical simulations (Figs. 5 and 6) with no fitting parameters. This result for the full distribution of relative scores, together with its systematic derivation for the original sign-payoff game, represents the main result of this paper.
In Appendix 2 we discuss the relation between the model presented in this work and the formulation in terms of a minimization problem of a Hamiltonian generator of the asymptotic dynamics [8,13]. We find that one way to view the present model is as a reduced ansatz for the ground state where the only parameters are the fraction of positively and negatively frozen agents (solved for self-consistently) instead of the full space of the frequency of use of each strategy. With this ansatz closed expressions can be derived for the steady state distributions irrespective of the form of the Hamiltonian.
In Appendix 3 we show how the model applies to the game with linear payoff

Statistical Model
We will now turn to describing the statistical model in some detail and derive the results discussed in the previous section. We define for each agent the sum and difference of strategies for each bid, $\omega_i = (a_{i,1} + a_{i,2})/2$ and (as discussed above) $\xi_i = (a_{i,1} - a_{i,2})/2$ [5]. Clearly $\omega^{\mu}_i$, being half the sum of two random numbers ±1, is distributed over (−1, 0, 1) with probability (1/4, 1/2, 1/4). A non-zero value of $\omega^{\mu}_i$ means that agent i always has the same bid for history μ independently of which strategy she has in play. The sum over all agents, $\Omega = \sum_{i=1}^{N} \omega_i$, thus gives a constant history dependent but time independent background contribution to the attendance. (In the sense that every time history μ occurs in the time series it gives the same contribution.) This background $\Omega^{\mu}$ is, for large N, normally distributed with mean zero and variance $\sigma^2_{\Omega} = N/2$. An interesting property of the Minority Game is that there is a "$Z_2$ gauge" freedom with respect to an arbitrary choice of which strategy is called 1 and which is called 2, corresponding to a change of sign of $\xi_i$. Such a sign change will simply result in a change of sign of $x_i(t)$, having no consequence on which strategy is actually in play. (It is the strategy in play which is an observable, not whether it is labeled by 1 or 2.) Nevertheless, it turns out that making a consistent definition of the order of strategies is helpful in formulating a simple statistical model. Explicitly we order the two strategies ("fix the gauge") of all agents i such that $\xi_i \cdot \Omega \le 0$. Shortly we will describe the distribution over agents of $\xi^{\mu}_i$, to quantify its anticorrelation with $\Omega^{\mu}$. To proceed we write the attendance at a time step t with history μ as $A^{\mu}_t = \Omega^{\mu} + \sum_{i=1}^{N} s_i(t)\,\xi^{\mu}_i$, where $s_i(t) = \pm 1$ depending on which strategy agent i is playing [5]. Again, the relative strategy score $x_i$ of agent i is updated according to Eq. 4.
Given the background contribution Ω to the attendance we expect there to be a surplus of $s_i = 1$ in the steady state with our choice of gauge, because strategy 1 is expected to be favored by the score update function.
(In other words, strategy 1 is expected to have a higher fitness.) However, this correlation is not trivial as the accumulated score also depends on the dynamically generated contribution to the attendance. As discussed previously some fraction φ of the agents are frozen, in the sense of always using the same strategy, $s_i$ = constant. We make an additional distinction (made significant by our choice of gauge) and separate the group of frozen agents into those with $s_i(t) = 1$ (fraction $\phi_1$), and those with $s_i(t) = -1$ (fraction $\phi_2$). Clearly, we expect the former to be more plentiful than the latter. We will now derive steady state distributions over agents for the mean step size $\Delta_i$. For this purpose we will write the attendance as $A^{\mu}_t = \Omega^{\mu} + S_t + X^{\mu} + Y^{\mu}$, where $S_t = \sum_{i \in \mathrm{fickle}} s_i(t)\,\xi^{\mu}_i$, $X^{\mu} = \sum_{i \in \mathrm{frozen},\, s_i = 1} \xi^{\mu}_i$, and $Y^{\mu} = -\sum_{i \in \mathrm{frozen},\, s_i = -1} \xi^{\mu}_i$, corresponding to the three categories of agents discussed previously. We will make the following simplifying approximations for these three components. The fickle component we will model as completely disordered, such that $s_i(t) = \pm 1$ is random, and correspondingly (for large N) $S_t$ is normally distributed with mean zero and variance $\sigma^2_S = \varphi N/2$, with $\varphi = 1 - \phi$ the fraction of fickle agents. (We thus neglect that the fickle agents would also have a net anticorrelation with the background Ω.) We will assume the frozen components to simply be sums of independent random variables drawn from the distribution of ξ, thus neglecting that the agents that are frozen may come from the extremes of this distribution.
To proceed, we need to find the distribution of $\xi_i$, i.e. how it varies over the set of agents. (Henceforth we will usually drop the index i and regard the objects as drawn from a distribution.) Begin by defining ψ = Random(±1) ξ, which is thus disordered with respect to the sign of Ω · ψ. (Note that what we here refer to as ψ is what is called ξ in the literature [5]. In this paper we reserve ξ for the object where strategies are ordered such that $\Omega \cdot \xi_i \le 0$, corresponding to ξ = −sign(Ω · ψ) ψ.) The object $\psi^{\mu}$ is independent of $\Omega^{\mu}$ (ignoring 1/N corrections due to $\Omega^{\mu} \neq 0$ limiting the available bids ±1), taking values (1, 0, −1) with probability (1/4, 1/2, 1/4), which gives mean zero and variance 1/2. Consider the joint object $h = \frac{1}{P}\,\Omega \cdot \psi$; for large P this becomes normally distributed with mean zero and variance $\sigma^2_h = \frac{1}{P}(N/2)(1/2) = 1/(4\alpha)$ [5]. Now, to quantify the correlation between ξ and Ω we define the object $\tilde h = \frac{1}{P}\,\Omega \cdot \xi = -|h|$, which consequently has mean $\langle \tilde h \rangle = -\int dh\, P(h)\,|h| = -1/\sqrt{2\pi\alpha}$ and $\langle \tilde h^2 \rangle = \sigma^2_h$. We will represent this distribution by assuming that the components $\xi^{\mu}$ are independent Gaussian random variables with a mean that is linearly dependent on $\Omega^{\mu}$. With this assumption we find the conditional distribution $P(\xi^{\mu}|\Omega^{\mu}) = \mathcal{N}_{\xi}\!\left(-c(\alpha)\,\Omega^{\mu}/N,\ \sigma_{\xi}\right)$ (Eq. 11), where $c(\alpha) = \sqrt{2/(\pi\alpha)}$ and $\sigma^2_{\xi} = 1/2$, and where we write the normal distribution over x with mean $\bar x$ and variance $\sigma^2$ as $\mathcal{N}_x(\bar x, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\bar x)^2/(2\sigma^2)}$. This quantifies that $\xi^{\mu}$ is on average anticorrelated with $\Omega^{\mu}$, which is expected to place strategy 1 in the minority group more often than strategy 2.
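Using Eq. 11 we can also calculate the distributions of $X^{\mu}$ ($Y^{\mu}$) as the sum of $\phi_1 N$ ($\phi_2 N$) independent variables drawn from the distribution of ξ, with conditional variances $\sigma^2_{X|\Omega} = \phi_1 N/2$ ($\sigma^2_{Y|\Omega} = \phi_2 N/2$).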
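The gauge fixing and the resulting anticorrelation with the background can be checked numerically. The sketch below is an illustration only; the function name `gauge_fixed_overlaps` and its parameters are assumptions made for this example. It draws random strategies, builds the background Ω from the sum vectors ω, orders the two strategies of each agent such that ξ · Ω ≤ 0, and returns the overlaps (1/P) Ω · ψ before and (1/P) Ω · ξ after gauge fixing, so that the mean of the latter can be compared with $-1/\sqrt{2\pi\alpha}$.

```python
import numpy as np

def gauge_fixed_overlaps(n_agents=401, n_hist=1024, seed=0):
    """Overlap of each agent's difference vector with the background, before and after gauge fixing."""
    rng = np.random.default_rng(seed)
    a = rng.choice([-1, 1], size=(n_agents, 2, n_hist))   # random strategy tables
    omega = (a[:, 0, :] + a[:, 1, :]) // 2                 # sum vectors, values -1, 0, 1
    psi = (a[:, 0, :] - a[:, 1, :]) // 2                   # difference vectors with random labeling
    background = omega.sum(axis=0)                         # Omega^mu, one value per history
    h = psi @ background / n_hist                          # (1/P) Omega . psi for every agent
    flip = np.where(h > 0, -1, 1)                          # swap labels 1 and 2 where the overlap is positive
    xi = psi * flip[:, None]                               # gauge-fixed difference vectors
    h_gauge = xi @ background / n_hist                     # (1/P) Omega . xi <= 0 for every agent
    return background, h, h_gauge

background, h, h_gauge = gauge_fixed_overlaps()
alpha = 1024 / 401
print("variance of background / (N/2):", background.var() / (401 / 2))
print("mean gauge-fixed overlap:", h_gauge.mean(), " model value:", -1 / np.sqrt(2 * np.pi * alpha))
```

For the sizes chosen here the two printed comparisons should agree within sampling noise; the model value neglects corrections of relative order 1/N.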

Distribution of Step Sizes
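Given the model expressions for the distributions of all the components of the score update equation (Eq. 4) we will find the distribution of mean (time averaged) step sizes. As a first step we integrate out the fast variable $S_t$ to get a time averaged step size conditional on μ, $\Delta^{\mu} = \langle \Delta(t) \rangle|_{\mu}$. (Over a long time series of the game every history μ will occur many times; we thus average over all those occurrences of a single history.) This corresponds to $\Delta^{\mu} = -\xi^{\mu} \int dS\, P(S) \left[ \mathrm{sign}(\Omega^{\mu} + S + X^{\mu} + Y^{\mu}) + \delta\!\left(\tfrac{1}{2}(\Omega^{\mu} + S + X^{\mu} + Y^{\mu})\right) \mathrm{sign}(x)\,\xi^{\mu} \right]$. The second term, which is a self-interaction, follows from the discrete nature of the original problem. It gives a negative bias for the used strategy coming from the fact that if the net attendance from all other agents is zero, the used strategy puts the agent in the majority group.
(The factor 1/2 in the delta function is to account for the fact that the attendance, as defined in Eq. 1, changes in steps of two, and the factor $\mathrm{sign}(x)\,\xi^{\mu}$ comes from the fact that only the used strategy enters the attendance.) Integrated this gives $\Delta^{\mu} = -\xi^{\mu}\,\mathrm{erf}\!\left(\frac{\Omega^{\mu} + X^{\mu} + Y^{\mu}}{\sqrt{2}\,\sigma_S}\right) - \mathrm{sign}(x)\,(\xi^{\mu})^2\,\frac{2}{\sqrt{2\pi}\,\sigma_S}\, e^{-(\Omega^{\mu} + X^{\mu} + Y^{\mu})^2/(2\sigma_S^2)} \equiv \Delta^{\mu}_{\mathrm{fit}} + \Delta^{\mu}_{\mathrm{bias}}$ (Eq. 15), where we have identified the first term as a fitness $\Delta_{\mathrm{fit}}$, which quantifies the relative fitness of the agent's two strategies, and the second as a negative bias $\Delta_{\mathrm{bias}}$ for the used strategy as discussed previously.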
To calculate the distribution of mean step sizes we will assume that histories occur with the same frequency, such that $\Delta = \frac{1}{P}\sum_{\mu} \Delta^{\mu}$. This is in fact not the case for a single realization of the game in the dilute phase: some histories occur more often than others, as one can see directly from any simulation in this regime. Nevertheless, for large P we will assume that this variation of occurrences of μ averages out. As discussed extensively in the literature the overall behavior of the game is insensitive to whether the actual history is used (endogenous information) as input to the agents or if a random history is supplied (exogenous information) [10,11,16,21,22]. This is also confirmed by the present work through the good agreement between the model using exogenous information and simulations in which we use the actual history.
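Assuming large P and given the assumption of independence of the distributions of Ω, ξ, X, Y for different μ we expect the distribution P(Δ) to approach a Gaussian (by the central limit theorem) with mean $\bar\Delta = \int d\Omega\, d\xi\, dX\, dY\, P(\Omega)\, P(\xi|\Omega)\, P(X|\Omega)\, P(Y|\Omega)\, \Delta^{\mu}$ (Eq. 16), with $\Delta^{\mu}$ as in Eq. 15, and with variance $\sigma^2_{\Delta} = \frac{1}{P}\left(\langle (\Delta^{\mu})^2 \rangle - \bar\Delta^2\right)$. The integrals are readily done analytically as described in Appendix 1, but the expressions are very lengthy. The main features can be expressed in the following form: $\bar\Delta = \bar\Delta_{\mathrm{bias}} + \bar\Delta_{\mathrm{fit}}$ with $\bar\Delta_{\mathrm{bias}} = -\mathrm{sign}(x)\,\tilde\Delta_{\mathrm{bias}}(\alpha, \phi_1, \phi_2)/\sqrt{N}$ and $\bar\Delta_{\mathrm{fit}} = \tilde\Delta_{\mathrm{fit}}(\alpha, \phi_1, \phi_2)/\sqrt{\alpha N}$ (Eq. 17), where $\tilde\Delta_{\mathrm{bias/fit}} > 0$ are functions that only depend on N and P through α = P/N, change slowly as a function of the arguments in the physically relevant regime $0 \le \phi_1 + \phi_2 \le 1$ (Fig. 7) and which satisfy $\tilde\Delta_{\mathrm{bias}}(\alpha, 0, 0) = 1/\sqrt{2\pi}$ and $\tilde\Delta_{\mathrm{fit}}(\alpha, 0, 0) = 1/\pi$. As seen from Eq. 17, the mean bias is towards x = 0 (the used strategy is penalized), while the mean fitness is positive, acting to increase the relative score x, consistent with our choice of gauge as discussed earlier.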
The only appreciable contribution to the variance comes from the fitness term, scaling as 1/P, whereas the bias has a variance that scales with 1/(NP) and is thus negligible (as is the cross term). The variance can be written $\sigma^2_{\Delta} = \tilde\sigma^2(\alpha, \phi_1, \phi_2)/(\alpha N)$ (Eq. 19), where $\tilde\sigma > 0$ also changes slowly in the relevant regime (Fig. 7) and satisfies $\tilde\sigma(\alpha, 0, 0) = 1/\sqrt{6}$. The width of the fitness distribution explains the fact that even though $\bar\Delta_{\mathrm{fit}} > 0$, consistent with $\phi_1 \neq 0$, there are also some agents with a large negative fitness, which implies $\phi_2 \neq 0$. The fact that ξ · Ω < 0 thus does not necessarily imply that strategy 1 is more successful than strategy 2, as the correlation with the other frozen agents is also an important factor. For large α, both the mean and variance of the fitness vanish, as can be understood as a result of there being too few agents compared to the number of possible outcomes to maintain any appreciable correlation between an agent's strategies and the aggregate background, ξ · Ω ≈ 0. In this limit, since the bias term always penalizes the used strategy, there can be no frozen agents. We also see that both the mean and width of the distribution for given α scale with $1/\sqrt{N}$, consistent with simulations (Fig. 3).
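The limiting values quoted above for $\phi_1 = \phi_2 = 0$ can be checked with a short numerical integration. The sketch below rests on assumptions that should be read as our interpretation of the model rather than as a transcription of the lengthy analytic expressions: the history-conditioned fitness is taken as $-\xi\,\mathrm{erf}(\Omega/(\sqrt{2}\sigma_S))$, the bias magnitude as $\xi^2\,(2/(\sqrt{2\pi}\sigma_S))\,e^{-\Omega^2/(2\sigma_S^2)}$, with $\sigma_S^2 = N/2$, Ω normal with variance N/2, and ξ conditionally normal with mean $-c(\alpha)\Omega/N$ and variance 1/2. The function name `limiting_step_sizes` is hypothetical. The average over Ω is done by Gauss-Hermite quadrature.

```python
import numpy as np
from math import erf, pi, sqrt

def limiting_step_sizes(n_agents=1001, alpha=2.0, order=80):
    """Scaled mean fitness and bias magnitude when all agents are fickle (phi_1 = phi_2 = 0)."""
    c = sqrt(2.0 / (pi * alpha))
    sigma_s2 = n_agents / 2.0                      # variance of S_t with fickle fraction 1
    sigma_omega = sqrt(n_agents / 2.0)             # standard deviation of the background
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    omega = sqrt(2.0) * sigma_omega * nodes        # quadrature points for a normal variable
    weights = weights / sqrt(pi)                   # weights now sum to one
    mean_xi = -c * omega / n_agents                # conditional mean of xi given Omega
    mean_xi2 = 0.5 + mean_xi ** 2                  # conditional second moment of xi
    erf_term = np.array([erf(o / sqrt(2.0 * sigma_s2)) for o in omega])
    fit = np.sum(weights * (-mean_xi) * erf_term)
    density_at_zero = 2.0 / sqrt(2.0 * pi * sigma_s2) * np.exp(-omega ** 2 / (2.0 * sigma_s2))
    bias = np.sum(weights * mean_xi2 * density_at_zero)
    # scale out the N and alpha dependence to compare with the constants in the text
    return fit * sqrt(alpha * n_agents), bias * sqrt(n_agents)

fit_scaled, bias_scaled = limiting_step_sizes()
print(fit_scaled, 1 / pi)                # scaled mean fitness against 1/pi
print(bias_scaled, 1 / sqrt(2 * pi))     # scaled bias magnitude against 1/sqrt(2 pi)
```

Under these assumptions the scaled fitness reproduces 1/π and the scaled bias reproduces $1/\sqrt{2\pi}$ up to a correction of relative order 1/N from the squared conditional mean of ξ.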

Fraction of Frozen Agents
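For each agent the score difference $x_i$ moves with a mean step per unit time of $\Delta_{\pm} = \Delta_{\mathrm{fit}} \mp |\bar\Delta_{\mathrm{bias}}|$ for $x_i > 0$ and $x_i < 0$ respectively, where $\Delta_{\mathrm{fit}}$ is drawn from the distribution $\mathcal{N}(\bar\Delta_{\mathrm{fit}}, \sigma_{\mathrm{fit}})$. If the fitness is high, such that $\Delta_+ > 0$, the agent will have a net positive movement and the agent is frozen, with $x_i > 0$ and growing unbounded. The fraction of positive frozen agents is given by $\phi_1 = \int_{|\bar\Delta_{\mathrm{bias}}|}^{\infty} d\Delta_{\mathrm{fit}}\, \mathcal{N}(\bar\Delta_{\mathrm{fit}}, \sigma_{\mathrm{fit}}) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{|\bar\Delta_{\mathrm{bias}}| - \bar\Delta_{\mathrm{fit}}}{\sqrt{2}\,\sigma_{\mathrm{fit}}}\right)$ (Eq. 21). Similarly, if the fitness is relatively very poor, such that $\Delta_- < 0$, the agent is frozen (with $x_i < 0$) with magnitude growing unbounded. The fraction of negatively frozen agents is given by $\phi_2 = \int_{-\infty}^{-|\bar\Delta_{\mathrm{bias}}|} d\Delta_{\mathrm{fit}}\, \mathcal{N}(\bar\Delta_{\mathrm{fit}}, \sigma_{\mathrm{fit}}) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{|\bar\Delta_{\mathrm{bias}}| + \bar\Delta_{\mathrm{fit}}}{\sqrt{2}\,\sigma_{\mathrm{fit}}}\right)$ (Eq. 22), and correspondingly the complete fraction of frozen agents $\phi = \phi_1 + \phi_2$ and fickle agents $\varphi = 1 - \phi$ are found. Since $\tilde\Delta_{\mathrm{fit}}$, $\tilde\Delta_{\mathrm{bias}}$, and $\tilde\sigma$ are functions of α, $\phi_1$, and $\phi_2$, the two equations allow for solving for $\phi_1(\alpha)$ and $\phi_2(\alpha)$ as a function of the only parameter α. We find that the solutions are readily found by forward iteration, and the results are plotted and compared to direct simulations of the game in Fig. 2. The fit is good, but there is no indication of a phase transition for small α in this simplified model. From simulations we can also measure the distribution of mean step sizes to compare to the model, which is shown in Fig. 3. There we show an intermediate value of α; the fit in terms of mean and width is not as good close to $\alpha_c$ and almost perfect for large α, but everywhere the data seems well represented by a normal distribution. We also use the mean step size distributions from simulations to calculate the fraction of frozen agents, Fig. 2. (The naive way to distinguish between frozen and switching agents, namely to introduce a cut-off $x_{\mathrm{cut}}$ at some time t, with any agents with $|x_t| > x_{\mathrm{cut}}$ considered frozen, makes it difficult to distinguish between frozen and switching agents with Δ near 0.)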

Distributions Over x
We now use the fact that each agent is characterized by an average step size per unit time, specified by the fitness $\Delta_{\mathrm{fit}}$, to describe the movement of the relative score x on the set of integers. Consider that the agent at time step t has score difference x; what is the probability that at time t + 1 the score difference is x′? In each time step, x can only change by −1, 0, 1 as given by the basic score update Eq. 4. We specify the respective probabilities $p_-$, $p_0$, $p_+$ with $p_- + p_0 + p_+ = 1$ for x > 0 and $q_-$, $q_0$, $q_+$ for x < 0. The mean probability that x remains unchanged is $p_0 = q_0 = 1/2$, as this corresponds to $\xi^{\mu}_i = 0$, meaning that the agent's two strategies have the same bid, which on average (over μ) will be the case for half of the histories. It should also be clear that the stepping probabilities cannot depend on the magnitude of x, only the sign, because the difference in score between strategies does not enter the game, only which strategy is currently used. The case x = 0 has to be treated separately; we toss a coin to decide which strategy is used, thus the probability for a +1 increment is $(p_+ + q_+)/2$ and for a −1 increment is $(p_- + q_-)/2$. The movement of x thus corresponds to a one-dimensional random walk on a chain, with asymmetric jump probabilities, as sketched in Fig. 4.
To relate the probabilities to the mean step size we note that for x > 0, $\Delta_+ = 1 \cdot p_+ + 0 \cdot p_0 - 1 \cdot p_-$, which together with the conservation of probability and the fact that $p_0 = 1/2$ gives $p_{\pm} = \frac{1}{4} \pm \frac{\Delta_+}{2}$ and $q_{\pm} = \frac{1}{4} \pm \frac{\Delta_-}{2}$ (Eq. 23), where the results for q follow from the same analysis for x < 0. Keeping in mind that for a fickle agent $\Delta_+ < 0$ and $\Delta_- > 0$, this is of course consistent with $p_+ < p_-$ and $q_- < q_+$. A frozen agent is instead given by $p_+ > p_-$ or $q_- > q_+$.
Fig. 4 The movement of the relative strategy score x of an agent is described by a random walk on a chain with jump probabilities $p_+$, $p_-$, $p_0$ for x > 1 (i.e. strategy 1 in play) and $q_+$, $q_-$, $q_0$ for x < −1 (i.e. strategy 2 in play). At the boundary x = −1, 0, 1, due to the coin toss choice of strategy, the probabilities are altered as in the figure
With the known probabilities we can write down a master equation on the chain for the probability distribution $P_x(t)$ (implicit $\Delta_{\mathrm{fit}}$ dependence), $P_x(t+1) = p_0 P_x(t) + p_+ P_{x-1}(t) + p_- P_{x+1}(t)$ for x > 1 and $P_x(t+1) = q_0 P_x(t) + q_+ P_{x-1}(t) + q_- P_{x+1}(t)$ for x < −1, and at the boundary $P_1(t+1) = p_0 P_1(t) + \frac{p_+ + q_+}{2} P_0(t) + p_- P_2(t)$, $P_0(t+1) = \frac{p_0 + q_0}{2} P_0(t) + q_+ P_{-1}(t) + p_- P_1(t)$, and $P_{-1}(t+1) = q_0 P_{-1}(t) + \frac{p_- + q_-}{2} P_0(t) + q_+ P_{-2}(t)$. Assuming that the distribution is stationary, such that $P_x(t) = P_x$, and concentrating on x > 0, we find after some manipulations the equation $p_- P_{x+1} = p_+ P_x$, which has the exponential solution $P_x \propto (p_+/p_-)^x = e^{-x \ln(p_-/p_+)} \approx e^{-4|\Delta_+| x}$. In the last step we used Eq. 23 and the fact that from Eq. 17 the mean step size is small, such that $|\Delta_+| \sim 1/\sqrt{N} \ll 1$. From this we can identify a decay length $x_+ = 1/(4|\Delta_+|) \sim \sqrt{N}$, which characterizes the range of positive excursions of the score difference of the fickle agent. Clearly, this solution requires $p_- > p_+$ ($\Delta_+ < 0$) to be bounded, as is the case for fickle agents. From the same analysis for x < 0 the fickle agents with $q_- < q_+$ have the distribution $P_x \propto e^{-4\Delta_- |x|}$, with decay length $x_- = 1/(4\Delta_-)$. What remains is to match up the solutions for positive and negative x at the interface. This can be solved exactly, but given that the exponential prefactor is small we settle for the approximate expression $P_x \approx \frac{4|\Delta_+|\Delta_-}{|\Delta_+| + \Delta_-}\, e^{-4|\Delta_{\pm}||x|}$, with $\Delta_+$ for x > 0 and $\Delta_-$ for x < 0. From this expression we see that the distribution is asymmetric, such that, given that on average $|\Delta_+| < \Delta_-$, agents are more likely to be found with x > 0.
This opens up for a more sophisticated modelling (left for future work) where this aspect is fed back into the initial statistical description of the sum of fickle agents through the dynamical variable S t , the total attendance of the fickle agents, acquiring a mean depending on μ.
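The localization of a fickle agent can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the helper names `jump_probabilities` and `stationary_fickle` are hypothetical, the jump probabilities are taken as 1/4 ± Δ/2 with probability 1/2 of standing still, the site x = 0 uses the coin-toss average of the p and q probabilities as described in the text, and the chain is truncated at a finite half width. Because the walk only has nearest-neighbor jumps, the stationary distribution follows from balancing the probability flow across each bond.

```python
import numpy as np

def jump_probabilities(mean_step):
    """Return (p_minus, p_zero, p_plus) for a given mean step, with p_zero = 1/2."""
    return 0.25 - mean_step / 2, 0.5, 0.25 + mean_step / 2

def stationary_fickle(delta_plus, delta_minus, half_width=300):
    """Stationary distribution on the chain for a fickle agent (delta_plus < 0 < delta_minus)."""
    p_minus, _, p_plus = jump_probabilities(delta_plus)    # used for x > 0
    q_minus, _, q_plus = jump_probabilities(delta_minus)   # used for x < 0
    x = np.arange(-half_width, half_width + 1)
    # jump probabilities to the right and to the left from every site; x = 0 uses the coin-toss average
    right = np.where(x > 0, p_plus, np.where(x < 0, q_plus, (p_plus + q_plus) / 2))
    left = np.where(x > 0, p_minus, np.where(x < 0, q_minus, (p_minus + q_minus) / 2))
    # zero net flow across each bond: P[k] * right[k] = P[k+1] * left[k+1]
    log_ratio = np.log(right[:-1]) - np.log(left[1:])
    log_p = np.concatenate(([0.0], np.cumsum(log_ratio)))
    prob = np.exp(log_p - log_p.max())
    return x, prob / prob.sum()

x, prob = stationary_fickle(delta_plus=-0.02, delta_minus=0.05)
i0 = int(np.argmax(x == 0))
decay_length = -1 / np.log(prob[i0 + 2] / prob[i0 + 1])
print(decay_length, 1 / (4 * 0.02))        # measured decay length against 1/(4|Delta_+|)
print(prob[x > 0].sum(), prob[x < 0].sum())  # more weight at x > 0 since |Delta_+| < Delta_-
```

The measured decay length on the positive side is $1/\ln(p_-/p_+)$, which approaches $1/(4|\Delta_+|)$ for small mean steps.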
For the frozen agents the master equation is the same, but given $p_+ > p_-$ (or $q_- > q_+$) we expect a drift of the mean of the distribution. Thus focusing on long times we can consider one or the other of Eqs. 25 depending on whether the agent is frozen with x > 0 or x < 0. For x > 0 and assuming that the agent at time t = 0 is at site x = 0 (neglecting the influence of any excursions to x < 0) we can write down an exact expression for $P_x(t)$ in terms of a multinomial distribution. Alternatively, and simpler, we can take the continuum limit $P_x(t+1) = P(x, t) + \frac{\partial P}{\partial t}$ and $P_{x \pm 1}(t) = P(x, t) \pm \frac{\partial P}{\partial x} + \frac{1}{2}\frac{\partial^2 P}{\partial x^2}$ to find the Fokker-Planck equation $\frac{\partial P}{\partial t} = -(p_+ - p_-)\frac{\partial P}{\partial x} + \frac{p_+ + p_-}{2}\frac{\partial^2 P}{\partial x^2} = -\Delta_+ \frac{\partial P}{\partial x} + \frac{1}{4}\frac{\partial^2 P}{\partial x^2}$. Given the initial condition P(x, 0) = δ(x) this has the solution $P(x, t) = \mathcal{N}_x(\bar x, \sigma_t)$ with $\bar x = (p_+ - p_-)t = \Delta_+ t$ and $\sigma^2_t = (p_+ + p_-)t = \frac{1}{2}t$, thus describing diffusion with a drift.
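The continuum approximation can be compared with the exact lattice distribution, which is obtained by iterating the master equation from a delta function at x = 0. The sketch below is illustrative; `frozen_walk_distribution` and `gaussian` are hypothetical helper names, and the same jump probabilities are used on the whole chain, which corresponds to neglecting excursions to x < 0 as in the text.

```python
import numpy as np

def frozen_walk_distribution(mean_step, n_steps):
    """Exact P_x(t) after n_steps for jumps -1, 0, +1 with probabilities p_minus, 1/2, p_plus."""
    p_plus = 0.25 + mean_step / 2
    p_minus = 0.25 - mean_step / 2
    kernel = np.array([p_minus, 0.5, p_plus])   # one time step of the master equation
    dist = np.array([1.0])                      # P_x(0) is a delta function at x = 0
    for _ in range(n_steps):
        dist = np.convolve(dist, kernel)        # support grows by one site on each side
    x = np.arange(-n_steps, n_steps + 1)
    return x, dist

def gaussian(x, mean, var):
    """Normal density with the given mean and variance."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

mean_step, n_steps = 0.05, 400
x, dist = frozen_walk_distribution(mean_step, n_steps)
mean = (x * dist).sum()
var = ((x - mean) ** 2 * dist).sum()
print(mean, mean_step * n_steps)   # drift: mean grows as Delta_+ t
print(var, n_steps / 2)            # diffusion: variance close to t/2
print(np.max(np.abs(dist - gaussian(x, mean_step * n_steps, n_steps / 2))))
```

The exact variance of the lattice walk is $t(1/2 - \Delta_+^2)$, so the continuum value t/2 is accurate as long as the mean step is small, which is the regime $|\Delta_+| \sim 1/\sqrt{N}$ considered here.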

Full Score Distributions
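Given that we now have a description of the relative score distribution of a single agent in terms of an asymmetric exponential decay or diffusion, we can also consider the full distribution of relative scores over all agents, by integrating over the distribution of mean step sizes. Defining the scaled variables $\tilde x = x/\sqrt{N}$ and $\tilde t = t/N$ we write $P(\tilde x, \tilde t) = P_{\mathrm{fi}}(\tilde x) + P_{\mathrm{fr},+}(\tilde x, \tilde t) + P_{\mathrm{fr},-}(\tilde x, \tilde t)$, corresponding to the stationary distribution of the fickle agents and diffusive distributions of the frozen agents with x > 0 and x < 0 respectively. The first component is $P_{\mathrm{fi}}(\tilde x) = \int_{-b_{\alpha}}^{b_{\alpha}} d\tilde\Delta\; \mathcal{N}_{\tilde\Delta}\!\left(\tilde\Delta_{\mathrm{fit}}/\sqrt{\alpha},\ \tilde\sigma/\sqrt{\alpha}\right) \frac{2(b_{\alpha}^2 - \tilde\Delta^2)}{b_{\alpha}}\, e^{-4(b_{\alpha} \pm \tilde\Delta)|\tilde x|}$, where $\tilde\Delta = \sqrt{N}\,\Delta_{\mathrm{fit}}$ is the scaled fitness, where ± corresponds to x < 0 and x > 0 respectively, and where $b_{\alpha} = |\tilde\Delta_{\mathrm{bias}}|$. For the frozen agents we have $P_{\mathrm{fr},+}(\tilde x, \tilde t) = \int_{b_{\alpha}}^{\infty} d\tilde\Delta\; \mathcal{N}_{\tilde\Delta}\!\left(\tilde\Delta_{\mathrm{fit}}/\sqrt{\alpha},\ \tilde\sigma/\sqrt{\alpha}\right) \mathcal{N}_{\tilde x}\!\left((\tilde\Delta - b_{\alpha})\,\tilde t,\ \sigma_{\tilde t}\right)$ and $P_{\mathrm{fr},-}(\tilde x, \tilde t) = \int_{-\infty}^{-b_{\alpha}} d\tilde\Delta\; \mathcal{N}_{\tilde\Delta}\!\left(\tilde\Delta_{\mathrm{fit}}/\sqrt{\alpha},\ \tilde\sigma/\sqrt{\alpha}\right) \mathcal{N}_{\tilde x}\!\left((\tilde\Delta + b_{\alpha})\,\tilde t,\ \sigma_{\tilde t}\right)$ (Eq. 31), where $\sigma^2_{\tilde t} = \tilde t/2$. These expressions are compared to direct simulations of the game for intermediate α ≈ 4 in Fig. 5. The simulations are averaged over a specific time window and the diffusive component Eq. 31 is integrated over the corresponding scaled time window. The agreement is excellent over the complete stationary and diffusive components of the distribution and shows the data collapse in terms of scaled coordinates. In Fig. 6 we also show a comparison for large α ≈ 80 where the simulations have no frozen agents and all fickle agents are localized by a length close to the α → ∞ value $x_0 = \sqrt{\pi N/8}$.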
The asymmetry of these plots is an artefact of our gauge choice $\xi_i \cdot \Omega \le 0$, which implies that on average agents will use strategy 1 (x > 0) more frequently than strategy 2 (x < 0). To restore the full symmetry is simply a matter of symmetrizing the distributions around x = 0.
Finally, we remark that the formal solution in terms of an exponential distribution of strategy scores for fickle agents was derived in [13] from a Fokker-Planck equation for the linear payoff game. See Appendices 2 and 3 for a further discussion of the comparison between the present model and the Hamiltonian formulation.

Summary
We have studied the asymmetric phase of the basic Minority Game, focusing on the statistical distribution of relative strategy scores and the original sign-payoff formulation of the game. We formulate a statistical model for the attendance that relies on a specific gauge choice in which the two strategies of each agent are ordered with respect to the background ($\xi_i \cdot \Omega \le 0$ for all agents i). Using this model we can derive a distribution of the mean step per time increment for the relative scores, specified in terms of a bias for the used strategy and the relative fitness of the two strategies. The relative strategy score for each agent is conveniently described as a random walk on an integer chain, where the jump probabilities are calculated from the mean step. The probability distribution of observing the agent at some position on the chain at a given time is given either by a static asymmetric exponential localized around x = 0 for fickle agents or by diffusion with a drift for frozen agents. Excellent agreement with direct simulations of the game for the score distribution confirms the basic validity of the modelling. At the same time, as discussed in the appendix, the fluctuations of the attendance are overestimated by the model. By contrasting with the Hamiltonian formulation of the dynamics the reason for this discrepancy is readily understood from viewing the model as a crude ansatz for the full minimization problem. This also opens up for improving the model by introducing some variational parameters without having to confront the full complexity of the minimization of a non-quadratic Hamiltonian for general payoff functions.
We thank Erik Werner for valuable discussions. Simulations were performed on resources at Chalmers Centre for Computational Science and Engineering (C3SE) provided by the Swedish National Infrastructure for Computing (SNIC).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1: Solving for Mean and Variance of Step Size
The integrals needed to calculate the mean and variance of the distribution of average step sizes, Eq. 16, are Gaussian integrals involving the error function. To solve these we first rescale each variable by its standard deviation ($X/\sigma_X \to X$, etc.) and perform the integral over the distribution of agents $\xi$, which evaluates to $\langle \xi \rangle = -c(\alpha)/\sqrt{2N}$, with $c(\alpha) = \sqrt{2/(\pi\alpha)}$, and $\langle \xi^2 \rangle = 1/2$. We are left with two integrals: one for the bias term $\Delta_{\text{bias}}$, which carries a prefactor $-\mathrm{sign}(x)$, and one for the fitness term $\Delta_{\text{fit}}$. To evaluate these we use three integral formulas for Gaussian integrals containing the error function, in which $A$ is a symmetric (positive definite) matrix and $b$ and $c$ are real constants.

The bias term follows from a direct application of the first integral formula to a $3 \times 3$ matrix. The fitness term follows from the substitutions $X \to X + \sqrt{\varphi_1}\,c(\alpha)$ and $Y \to Y - \sqrt{\varphi_2}\,c(\alpha)$, after which the second integral formula is applied to the third integration variable and the first integral formula to the remaining $2 \times 2$ matrix. The variance can be calculated by replacing the third integration variable with $z$, defined as that variable plus $\sqrt{\varphi_1}\,X + \sqrt{\varphi_2}\,Y$, then integrating out $X$ and $Y$, and finally applying the third integral formula over $z$.

The actual expressions are quite lengthy$^{3}$, but the important features can be represented according to Eqs. 17 and 19 in terms of the functions $\tilde{\Delta}_{\text{bias}}(\alpha, \varphi_1, \varphi_2)$, $\tilde{\Delta}_{\text{fit}}(\alpha, \varphi_1, \varphi_2)$, and $\tilde{\sigma}(\alpha, \varphi_1, \varphi_2)$. After solving for the fractions of frozen agents $\varphi_1(\alpha)$ and $\varphi_2(\alpha)$ using Eqs. 21 and 22, we can consider these functions as dependent only on the control parameter $\alpha$. The dependence on $\alpha$ is plotted in Fig. 7, to point out that these functions change little over the whole relevant range $\alpha > \alpha_c \approx 0.3$.
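As a minimal numerical illustration of the kind of identity involved, the sketch below checks a standard one-dimensional special case: the average of erf(aX + b) over a unit normal variable X equals erf(b / sqrt(1 + 2a^2)). This scalar identity is not one of the matrix formulas referred to above, and the function names and parameter values are hypothetical, chosen only for illustration.

```python
import math


def erf_gaussian_average_closed(a, b):
    """Closed form of E[erf(a*X + b)] for X ~ N(0, 1)."""
    return math.erf(b / math.sqrt(1.0 + 2.0 * a * a))


def erf_gaussian_average_numeric(a, b, half_width=10.0, steps=20000):
    """Midpoint-rule estimate of the same average (illustrative check only)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += math.erf(a * x + b) * density * h
    return total
```

The multi-dimensional formulas used in the appendix generalize this pattern, with the scalar a replaced by a vector and the unit variance by a positive definite matrix.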
This we can alternatively write (using Eq. 38) as $\sigma^2 = \frac{1}{P}\sum_\mu \langle A|\mu\rangle^2 + \sigma_S^2 = H + \sigma_S^2$. Here $H$, the predictability, also has an alternative form (using Eq. 6). Correspondingly, the variance of the rapidly fluctuating field $S_t$ follows using $\sigma_\xi^2 = 1/2$. The latter expression has no contribution from frozen agents (as expected) and, assuming that the distribution of $m_i$ is quite strongly centred at 0, it will be close to, but always lower than, our assumed value of $\phi N/2$.
Consider now the fixed-history, time-averaged step size for agent $i$. The aim is to find a Hamiltonian generator $H$ of the long-time dynamics such that the time- and history-averaged update is given by $\bar{\Delta}_i = -\partial H/\partial m_i$. (Note that this expression is not equivalent to Eq. 16. The latter is the mean of a distribution, whereas the present object represents the full distribution of average step sizes over agents corresponding to different $i$.) A function that does this is $H = \int dS\, P(S)\, G(\langle A|\mu\rangle + S)$, where $G(x) = x\,\mathrm{sign}(x)$ such that $dG/dx = \mathrm{sign}(x)$, and which can be evaluated explicitly.

Thinking of the long-time evolution of the score difference $x_i$ of agent $i$, which has an average step size $\bar{\Delta}_i$, we find that if $\bar{\Delta}_i > 0$ the agent will be frozen positive, with $m_i = 1$, and similarly if $\bar{\Delta}_i < 0$ it will be frozen negative, with $m_i = -1$. Only if $\bar{\Delta}_i = 0$ will the agent be fickle, with $-1 < m_i < 1$. Considering that $\bar{\Delta}_i = -\partial H/\partial m_i$, we find the three cases: $m_i = 1$ corresponds to $\partial H/\partial m_i < 0$, $m_i = -1$ corresponds to $\partial H/\partial m_i > 0$, and $-1 < m_i < 1$ corresponds to $\partial H/\partial m_i = 0$. The solution thus corresponds to finding the minimum of $H$ with respect to $\{m_i\}$.
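To make the role of G concrete, the following sketch evaluates the average of G(a + S) = |a + S| over S for a fixed value a of the conditional attendance. It assumes, for illustration only and not as a statement taken from the text above, that P(S) is a zero-mean Gaussian with standard deviation `sigma_s`. Under that assumption the average is the mean of a folded normal distribution, and its derivative with respect to a is the S-average of sign(a + S). All function names are hypothetical.

```python
import math


def mean_abs_shifted_gaussian(a, sigma_s):
    """E|a + S| for S ~ N(0, sigma_s^2): the mean of a folded normal."""
    gauss_part = sigma_s * math.sqrt(2.0 / math.pi) * math.exp(-a * a / (2.0 * sigma_s ** 2))
    return gauss_part + a * math.erf(a / (sigma_s * math.sqrt(2.0)))


def mean_sign_shifted_gaussian(a, sigma_s):
    """E[sign(a + S)], which is the derivative of E|a + S| with respect to a."""
    return math.erf(a / (sigma_s * math.sqrt(2.0)))


def mean_abs_numeric(a, sigma_s, steps=40000):
    """Midpoint-rule check of E|a + S| over S in [-10*sigma_s, 10*sigma_s]."""
    h = 20.0 * sigma_s / steps
    total = 0.0
    for k in range(steps):
        s = -10.0 * sigma_s + (k + 0.5) * h
        density = math.exp(-s * s / (2.0 * sigma_s ** 2)) / (sigma_s * math.sqrt(2.0 * math.pi))
        total += abs(a + s) * density * h
    return total
```

Because the averaged sign is a smooth function of a rather than a step, the resulting H is not quadratic in a, which is one way to see why the sign-payoff minimization is harder than the linear-payoff one.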
The minimization of Eq. 39, however, looks like a formidable problem in the thermodynamic limit, and we are not aware that it has been pursued in the literature. (Note that $\langle A|\mu\rangle \sim \sqrt{N} \sim \sigma_S$, such that an expansion in $\langle A|\mu\rangle$ is not appropriate.) This is in contrast to the case of linear payoff (see Appendix 3), where $H_{\text{linear}} = H = \frac{1}{P}\sum_\mu \langle A|\mu\rangle^2$, which is a quadratic form in the variables $m_i$. For the latter case the minimization problem has been solved using the replica method [8,17,18]. The equilibrium score distributions that we focus on in the present work have been solved for in [13] but, to the best of our knowledge, not for the sign-payoff game. Also, it appears that these distributions have not been discussed or studied in any detail, or compared to simulations, in earlier work.

Here, as before, $c = c(\alpha) = \sqrt{2/(\pi\alpha)}$, and $\varphi_1$ and $\varphi_2$ are the respective fractions of frozen agents. We note that the step size is of order 1 for the linear payoff, compared to order $1/\sqrt{N}$ for the sign-payoff game. In both cases the fitness drops out for large $\alpha$, ensuring that there are no frozen agents. For moderate $\alpha$ the fractions of frozen agents need to be solved for self-consistently. As for the sign-payoff game, the results from solving these equations numerically are in good agreement with simulation data in the dilute phase, as shown in Fig. 8. (Note, compared to Fig. 2, that both the data and model results for the fraction of frozen agents are very similar and quite insensitive to whether sign-payoff or linear payoff is used.) The fluctuations of attendance, $\sigma^2 = \langle A^2 \rangle = H + \phi N/2$ with $H = \frac{1}{P}\sum_\mu \langle A|\mu\rangle^2 = \frac{N}{2}\left(1 - c\,(\varphi_1 - \varphi_2)\right)^2$, are compared to simulations in Fig. 9. These are clearly significantly overestimated by the model. (Similar results are found for the sign-payoff game and model.) Following the exposition in Appendix 2, the reason for this discrepancy is quite clear.
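The model expression for the attendance fluctuations is simple to evaluate once the frozen fractions are known. The sketch below computes it per agent, sigma^2/N = (1/2)(1 - c(phi1 - phi2))^2 + phi_fickle/2, under the assumption (an illustrative reading, not stated explicitly above) that the factor multiplying N/2 in the second term is the fraction of fickle agents, 1 - phi1 - phi2. The frozen fractions are taken as inputs; the self-consistency equations are not solved here. Function and argument names are hypothetical.

```python
import math


def c_of_alpha(alpha):
    """c(alpha) = sqrt(2 / (pi * alpha)), as defined in the text."""
    return math.sqrt(2.0 / (math.pi * alpha))


def sigma2_per_agent(alpha, phi1, phi2):
    """Model estimate of sigma^2 / N for given frozen fractions phi1 and phi2."""
    c = c_of_alpha(alpha)
    # Assumption: every agent that is not frozen is fickle.
    fickle_fraction = 1.0 - phi1 - phi2
    # Predictability per agent, H / N = (1/2) * (1 - c * (phi1 - phi2))^2.
    predictability = 0.5 * (1.0 - c * (phi1 - phi2)) ** 2
    return predictability + 0.5 * fickle_fraction
```

With no frozen agents the function returns 1 for any alpha, which is the value of sigma^2/N for agents choosing at random.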
The model always overestimates the fluctuations $S_t$, and since we assume that only the frozen agents contribute to $\langle A|\mu\rangle$ we also miss the contribution of the fickle agents to reducing $H$. There seems to be a quite clear path to improving the model along these lines, which is left for future work. Here we opt for the simplicity of solving the present model and the fact that it does give quantitative agreement with the distribution of relative strategy scores.

As a next step we can find the score distributions by solving the master equation on an integer chain. In contrast to the sign-payoff game, where scores are only updated by 0 or $\pm 1$, we now have to consider longer-range hopping where scores are updated by integer steps in the range $-N$ to $N$. Taking into account the individual time-averaged step size $\Delta_\pm = \Delta_{\text{fit}} \mp \frac{1}{2}$ (for $x > 0$ and $x < 0$ respectively) and the fact that $\xi^{\mu(t)} A_t$ has variance $N/2$, we expect that the jump probabilities (for a jump from $x$ to $x'$) are well represented by a normal distribution. Taking the continuum limit of the master equation over space and ignoring complications due to the boundary $x = 0$, it can be solved in terms of exponential localization for fickle agents ($\Delta_+ < 0$ and $\Delta_- > 0$) and diffusion with a drift for frozen agents ($\Delta_+ > 0$ or $\Delta_- < 0$). For fickle agents the score distributions are given by $P(x) \sim e^{\mp 4|\Delta_\pm| x/N}$ (Eq. 47) for $x > 0$ and $x < 0$ respectively, which in the large-$\alpha$ limit reduces to $P(x) \sim e^{\mp 2x/N}$. For frozen agents the distributions are diffusive with a drift, for positively and negatively frozen agents respectively.
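The exponential form for fickle agents can be checked directly against a Gaussian jump kernel. In the minimal sketch below, one step of the master equation is applied to a profile P(x) proportional to exp(lam * x) far from the boundary at x = 0 (which is ignored, as in the text), with integer jumps drawn from a discretized normal distribution of mean `delta` and variance N/2. The profile is stationary when the returned ratio equals 1, which happens for lam = 4 * delta / N; with a negative step for x > 0 and a positive step for x < 0 this gives decay away from the origin on both sides. The function names and the truncation of jumps to the range -N to N are illustrative assumptions.

```python
import math


def stationary_exponent(delta, n_agents):
    """Exponent lam such that P(x) ~ exp(lam * x) is stationary for mean step delta."""
    return 4.0 * delta / n_agents


def one_step_ratio(delta, n_agents, lam):
    """Return P_new(x) / P(x) after one master-equation step for P(x) = exp(lam * x)."""
    variance = n_agents / 2.0
    jumps = range(-n_agents, n_agents + 1)
    # Discretized normal jump weights with mean delta and variance N/2.
    weights = [math.exp(-(d - delta) ** 2 / (2.0 * variance)) for d in jumps]
    norm = sum(weights)
    # P_new(x) = sum_d W(d) * P(x - d), so the ratio is sum_d W(d) * exp(-lam * d).
    return sum(w * math.exp(-lam * d) for w, d in zip(weights, jumps)) / norm
```

For example, with N = 100 and delta = -0.5 (the large-alpha value for x > 0), the stationary exponent is -0.02, corresponding to P(x) ~ exp(-2x/N), while an exponent twice as large gives a ratio visibly different from 1.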