Sampling Dynamics of a Symmetric Ultimatum Game

We propose a dynamic three-strategy symmetric model of the Ultimatum Game with players using a sampling procedure. We allow an intermediate strategy, interpreted as a social norm, to evolve in time according to beliefs of players about an average offer. We show that a social norm converges to a self-consistent offer of about 15 % in the unique globally asymptotically stable equilibrium of our model.


Introduction
Cooperation between unrelated individuals in animal and human societies is an intriguing issue in biology and the social sciences, cf. [4,5,11,17]. Usually, it is addressed within game-theoretic models such as the Prisoner's Dilemma and the Snowdrift games. In economics, one of the fundamental questions concerns bargaining problems. The essential features of this social dilemma are present in the Ultimatum Game. It is a nonsymmetric game where the goal is to divide a fixed prize, of unit worth, between two players. The first player, called the proposer, makes an offer, that is, a share of the prize. The second player, called the responder, either accepts or rejects the offer. If the offer is accepted, the responder gets the offer and the proposer gets the rest of the prize; otherwise both players get nothing. It is easy to see that any offer may be supported by a Nash equilibrium. Let the proposer make an offer α ∈ [0, 1] and the responder reject any offer strictly lower than α and accept all offers not smaller than α. Any such pair of strategies constitutes a Nash equilibrium. However, these equilibria are based on empty threats. Suppose that the proposer deviates from the equilibrium and makes a positive offer α′ < α. Then the responder rejects the offer even though she would be better off accepting it.
A stronger notion of the subgame perfect equilibrium is used to select one from many Nash equilibria. If the prize is perfectly divisible, that is the strategy space of the proposer is continuous, there is a single subgame perfect equilibrium in which the proposer offers α = 0 and the responder accepts any offer. If the prize has a grid with a size g > 0, that is g is the smallest nonzero offer, and consequently the strategy space of the proposer is discrete, there are two subgame perfect equilibria. The first one is the same as in the continuous case. In the second equilibrium, the proposer offers α = g and the responder accepts any positive offer.
There is a vast body of literature on an experimental treatment of the Ultimatum Game. The precise analysis of this literature is beyond the scope of this note. However, the single most important finding is that the offers observed during experiments are different from predictions based on the concept of the subgame perfect equilibrium and vary between 20 % and 40 %. An excellent survey is in [8].
There were many ways of explaining this systematic violation of theoretical predictions including dependence of subjects' preferences on payoffs of other players in various ways and all sorts of models of adaptation and learning, including our previous paper [10]. Here, we offer a different dynamic approach using a notion of the sampling equilibrium introduced in [12] and further developed in [16].
The rest of the note is organized as follows. In Sect. 2, the main model is derived. It is studied and discussed in Sect. 3. We conclude in Sect. 4.

Ultimatum Game
We propose a symmetric version of the Ultimatum Game. Game roles, the proposer or the responder, are assigned to players at random with equal probabilities. Therefore, a pure strategy must define an action in each role and so we define a pure strategy to be a pair (α, β), where α is an offer made while a player is a proposer and β is a minimal accepted offer while a player is a responder. To make things simple, we restrict possible offers to the following three possibilities.
A strategy is called egoistic if α = 0, altruistic if α = 1, or of an intermediate type if α = δ, δ ∈ (0, 1). Also, to keep the number of pure strategies conveniently small, we need to relate the acceptance levels β to the offer levels α. To do so, we assume that players are symmetric across their roles, α = β. That is, if a player offers a share α in the role of the proposer, then he expects the same offer while being the responder. Obviously, a player accepts all higher offers and rejects all lower offers. In short, players of our symmetric Ultimatum Game have three pure strategies at their disposal: (0, 0), (δ, δ), and (1, 1). Payoffs are therefore given by two matrices P and R for the proposer and the responder, respectively,
$$P = \begin{pmatrix} 1 & \mathbf{0} & \mathbf{0} \\ 1-\delta & 1-\delta & \mathbf{0} \\ 0 & 0 & 0 \end{pmatrix}, \qquad R = \begin{pmatrix} 0 & \mathbf{0} & \mathbf{0} \\ \delta & \delta & \mathbf{0} \\ 1 & 1 & 1 \end{pmatrix}, \tag{1}$$
where P_ij and R_ij are the payoffs of the proposer (the row player) and the responder (the column player), respectively, if the proposer plays the ith strategy and the responder the jth one (payoffs in bold are rejections).
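The acceptance rule described above (an offer α_i is accepted exactly when it is at least the responder's threshold β_j) can be tabulated mechanically. The following is a minimal sketch, not the authors' code; the function name `payoff_matrices` and the value δ = 1/4 are illustrative assumptions.

```python
from fractions import Fraction


def payoff_matrices(delta):
    """Return (P, R, rejected) for the strategies (0,0), (delta,delta), (1,1).

    Rows index the proposer's strategy, columns the responder's strategy.
    This is an illustrative helper, not part of the original paper.
    """
    levels = [Fraction(0), Fraction(delta), Fraction(1)]  # alpha = beta per strategy
    P, R, rejected = [], [], []
    for offer in levels:              # proposer's strategy i
        p_row, r_row, rej_row = [], [], []
        for threshold in levels:      # responder's strategy j
            accepted = offer >= threshold
            p_row.append(1 - offer if accepted else Fraction(0))
            r_row.append(offer if accepted else Fraction(0))
            rej_row.append(not accepted)
        P.append(p_row)
        R.append(r_row)
        rejected.append(rej_row)
    return P, R, rejected


P, R, rejected = payoff_matrices(Fraction(1, 4))
for name, matrix in (("P", P), ("R", R)):
    print(name, [[str(entry) for entry in row] for row in matrix])
```

The `rejected` mask distinguishes a zero payoff caused by a rejection from a zero payoff of an accepted offer, which is the distinction the bold entries make.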
The game we consider is a very simplified version of the Ultimatum Game and resembles the cardinal Ultimatum Game introduced in [3]. The cardinal Ultimatum Game is an extensive form game where the first player (the proposer) has only two strategies. She can offer an equal share α = 1/2 or offer a share close to the subgame perfect equilibrium of the game. The second player (the responder) can accept or reject any offer.
It is noted in [3] that the game is derived by "abstracting the crucial features of the full ultimatum game ("crucial" relative to observed patterns of lab play)." We make some changes to the cardinal Ultimatum Game. They are dictated by some modeling considerations on one hand and technical necessities on the other one. We start with a theoretical model and try to say something about the possible values of the parameter δ. In fact, we treat the intermediate strategy, given by the value of the parameter δ as a social norm, admittedly simplified, where δ plays the role of the actual average offer and players' beliefs about an average offer which are equal at the equilibrium. Therefore, we do not want to use any data to formulate the prior of the model. This is why we add the altruistic strategy so that the possible strategies span the whole spectrum of possibilities even if this strategy is not (or extremely rarely) observed during experiments.
Also, we make the game symmetric. This is because we consider the evolution of the social norm within the population in the very long run. It seems to us that it would be very questionable to assume that, in the long run, some part of the population consists of proposers and the other part of responders. Such an assumption is perfectly valid in biological scenarios where players are animals and so may be of different species. It can also make sense in some economic scenarios, e.g., a population of universities and a population of prospective students, but we do not see any compelling reason why, in the context of bargaining, an asymmetric scenario should be of any interest in explaining the evolution of the social norm. Consequently, we make the game symmetric. For technical reasons we need to narrow down the set of pure strategies and so we remove asymmetric strategies. In summary, the model of the game is more general than the cardinal Ultimatum Game but at the same time the set of strategies is restricted to the symmetric ones. However, our model still allows for the same game patterns as the original cardinal Ultimatum Game proposed in [3], hence it can reproduce typical patterns of play observed in laboratory experiments.

Sampling Dynamics of the Ultimatum Game
A mixed strategy is a probability distribution over the set of pure strategies. The set of probability distributions is denoted by Δ. A mixed strategy x ∈ Δ can be interpreted as a distribution of pure strategies in a large population of players that are randomly matched into pairs to play a symmetric normal form game, that is, we assume a standard evolutionary type setting, cf. [6,18]. Now we construct our dynamical model. We assume that players use a testing or sampling procedure introduced in [12]. We restrict ourselves to the 1-sampling procedure: players use (test) each pure strategy once against randomly chosen opponents and adopt a strategy with the highest payoff. If the probability that the ith strategy is a winning one is equal to the current fraction of that strategy in the population, then we say that the population is at the sampling equilibrium.
More precisely, we denote by w(i, x) the probability that the ith strategy will provide the highest payoff in the population where the distribution of pure strategies is x. Each pure strategy i defines a random variable v_i(x) that takes the value a_ij with probability x_j (obviously, care must be taken if there are tied payoffs). These random variables are independent. We get
$$w(i, x) = \mathrm{E}\left[\frac{\mathbf{1}\{i \in \arg\max_j v_j(x)\}}{\#\arg\max_j v_j(x)}\right], \tag{2}$$
where # arg max_j v_j(x) is the number of best alternatives. In other words, the winning probability w(i, x) is the probability that the random variable associated with the ith pure strategy yields the highest payoff when all random variables v_i are sampled once. In the case of a tie, the probability is split equally among the best alternatives. The vector of winning probabilities is denoted by w(x). A mixed strategy x̂ is a sampling equilibrium if x̂ = w(x̂). It is not difficult to see that w(x) is a polynomial in x, hence for any game there exists a sampling equilibrium by Brouwer's fixed-point theorem.
Sampling dynamics, a formal model of the above dynamic sampling process, was postulated in [16] and some further properties were studied in [14]. It is a system of ordinary differential equations of the form
$$\dot{x}_i = w(i, x) - x_i \quad \text{for each pure strategy } i. \tag{3}$$
In vector notation, it becomes ẋ = w(x) − x. It can be easily verified that the simplex Δ is forward invariant under the sampling dynamics. Also, the set of all critical points of the vector field w(x) − x is the set of all sampling equilibria. This dynamics is well behaved since the vector field is a polynomial function of x and so for any initial condition x there exists a unique solution ξ(t, x), t ≥ 0.
It is important to note a unique feature of the sampling equilibrium concept. In contrast to the notion of Nash equilibrium, a sampling equilibrium depends only on inequalities between payoffs and so the notion of sampling equilibrium is an "ordinal" concept. A perturbation of payoffs does not change the sampling equilibria of the game as long as it preserves the inequalities between the payoffs, and consequently their order. In other words, the concept of sampling equilibrium assumes only that the players prefer higher payoffs to lower payoffs and, unlike the Nash equilibrium, does not take into account the differences between the payoffs.
Now we would like to justify the use of the sampling dynamics (3). One reason is the learning procedure behind it. It is based not on imitation, like most dynamics applied to the Ultimatum Game (mostly in the form of the replicator dynamics), but on the procedure of comparing private outcomes in a sequence of games. From the descriptions of the experiments, it seems that subjects cannot observe choices made by other participants and so this excludes any model based on imitation. The second reason is that the model of the game is severely simplified and so the use of a learning procedure that discerns the finest differences in payoffs seems inappropriate.
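The model of the sampling procedure, and consequently the somewhat grainy notion of sampling equilibrium, seems more suitable. Now comes the crucial part of our construction. We symmetrize the Ultimatum Game by constructing random variables v_i(x). These random variables depend on the value of δ in general, but it is enough to write down only a single table. Table 1 presents these random variables. The construction details are discussed in Appendix A.

Table 1 Probabilities Pr[v_i(x) = a] of the payoff values a (columns) for the random variables v_i(x) (rows)
$$\begin{array}{c|cccc}
 & 0 & \delta & 1-\delta & 1 \\ \hline
v_1(x) & \tfrac12 & \tfrac12 x_2 & 0 & \tfrac12 (x_1 + x_3) \\
v_2(x) & \tfrac12 (x_1 + x_3) & \tfrac12 x_2 & \tfrac12 (x_1 + x_2) & \tfrac12 x_3 \\
v_3(x) & \tfrac12 + \tfrac12 (x_1 + x_2) & 0 & 0 & \tfrac12 x_3
\end{array}$$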
Depending on the value of the parameter δ there are three distinct cases. For δ < 1/2, we have δ < 1 − δ and the values in Table 1 are in ascending order. For δ > 1/2, the two middle columns should be interchanged, while for δ = 1/2 the two middle columns should be added. The value of the parameter δ does change the winning probabilities w(i, x) required for the sampling dynamics (3). To get the winning probabilities, we have to consider all 24 possible realizations of the random vector (v_1, v_2, v_3) and calculate probabilities according to (2). This results in three different sampling dynamics for the symmetric Ultimatum Game: (4) for δ < 1/2, (5) for δ = 1/2, and (6) for δ > 1/2.
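The enumeration of the 24 realizations can be sketched in a few lines. The sketch below assumes the distributions of v_1, v_2, v_3 derived in Appendix A (value 0, δ, 1 − δ, or 1, with the probabilities computed there); the function names `table1` and `winning_probabilities` are illustrative, not taken from the paper. Comparing payoff values directly handles all three cases of δ at once, including the tie δ = 1 − δ at δ = 1/2.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def table1(x, delta):
    """Distributions of v_1, v_2, v_3 as lists of (payoff, probability) pairs.

    Assumes the probabilities derived in Appendix A; zero-probability
    entries are omitted.
    """
    x1, x2, x3 = x
    return [
        [(0, HALF), (delta, HALF * x2), (1, HALF * (x1 + x3))],
        [(0, HALF * (x1 + x3)), (delta, HALF * x2),
         (1 - delta, HALF * (x1 + x2)), (1, HALF * x3)],
        [(0, HALF * (1 + x1 + x2)), (1, HALF * x3)],
    ]


def winning_probabilities(x, delta):
    """Winning probabilities w(x); ties are split equally among the best."""
    w = [Fraction(0)] * 3
    for outcome in product(*table1(x, delta)):  # 3 * 4 * 2 = 24 realizations
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        payoffs = [value for value, _ in outcome]
        best = max(payoffs)
        winners = [i for i, value in enumerate(payoffs) if value == best]
        for i in winners:
            w[i] += prob / len(winners)
    return w


# A population of egoists only: hand enumeration gives (7/12, 1/3, 1/12).
print(winning_probabilities((Fraction(1), 0, 0), Fraction(1, 4)))
```

Exact rational arithmetic is used so that the result can be compared with a hand calculation; the right-hand side of the dynamics at a state x is then `w[i] - x[i]`.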

Results and Discussion
In the previous section, we constructed the sampling dynamics (4)-(6). Here, we discuss its qualitative behavior. In particular, we look at stationary points, that is, time-independent solutions of (4)-(6). Such points provide distributions of pure strategies in the population at the equilibrium. We would like to interpret the intermediate pure strategy (δ, δ) as the social norm and hence the believed δ should be the mean offer in the population at the equilibrium. To make the concept of social norm sensible, we require the equilibrium to satisfy the following three conditions.
Property 1 (Uniqueness and global stability)
The sampling dynamics has a unique globally asymptotically stable equilibrium x̂_δ.
Property 1 is necessary to claim that an equilibrium supports the social norm. Suppose that there were two asymptotically stable equilibria and only one of them satisfied the remaining two conditions. We would then run into trouble, as we would have to provide a story explaining why the initial condition should belong to the basin of attraction of the correct equilibrium. This is not to say that different populations may not have different social norms.

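Property 2 (Mean-consistency)
At the equilibrium x̂_δ, we require that
$$\bar{\delta}(\delta) = \delta, \tag{7}$$
where δ̄(δ) is the average offer at the equilibrium.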
Property 2 is a consistency condition. If we want to interpret the parameter δ as the average value of an offer in a population at the equilibrium, then the value predicted by the model needs to be consistent with the assumed value, i.e., Eq. (7) has to be satisfied. We use Eq. (7) to find the exact value of δ predicted by the model. Also, one may interpret the parameter δ as the players' beliefs about the actual average offer in the population. Then the consistency condition is an exact analog of the equilibrium condition in [13].
There is a different way of interpreting condition (7). In fact, we have defined a class of models indexed by the parameter δ. If we assume a particular value of this parameter, then we arrive at the corresponding equilibrium, given that Property 1 is satisfied. At the equilibrium, we can calculate the mean offer δ̄, which gives rise to another model. We are looking for a fixed point of this process, that is, a fixed point in the space of models.

Property 3 (Mode-consistency)
At the equilibrium x̂_δ, we require that
$$(\hat{x}_\delta)_2 \ge (\hat{x}_\delta)_i, \quad i = 1, 3, \tag{8}$$
that is, the intermediate strategy has the largest share of the population.
Property 3 is required if we want to interpret the intermediate strategy as the social norm. If this condition is violated, then it amounts to a statement that we have an equilibrium at which the social norm is not used by the majority of the population. What kind of a norm would that be then?
There is an interplay between the last two properties. Property 2 can give a very sharp prediction but it is based on a statistic that is not robust. On the other hand, Property 3 usually leads to a set of values of the parameter δ but is based on a robust statistic. We want to have an equilibrium satisfying both properties, but we are interested in the predicted set as well as in the particular predicted value of δ, understanding that the predicted sharp value of δ can be quite far away from some of the experimental data.
It is obvious that the crucial property required for further discussion is the uniqueness and the global stability of equilibrium. We prove in Appendix B that our sampling dynamics (4)-(6) has Property 1 for any value of the parameter δ.
We note that it is not possible to calculate the exact analytical form of the equilibrium. Therefore, in the further discussion we resort to numerical solutions. We find numerically an approximation of the equilibrium x̂_δ. The equilibrium x̂_δ depends on the value of the parameter δ in a noncontinuous way, as given by formulae (9). The equilibrium is constant on the intervals (0, 1/2) and (1/2, 1) because a change of δ within either interval does not change the inequalities defining the winning probabilities. This results from the sampling procedure being "grainy", as we have noted before.
A quick look at the formulae (9) shows that Property 3 is satisfied only for δ ≤ 1/2. For δ > 1/2, the egoistic strategy has the largest share of the population. Consequently, we conclude that our model predicts that the average offer should not be larger than 50 %. This result corresponds nicely with the experimental results.
To check Property 2, we use formula (7) and find that there is a unique value δ* of the parameter satisfying the equation δ̄(δ*) = δ*, namely δ* ≈ 0.146. Figure 1(d) shows the dependence of the average offer on the value of the parameter δ. This value fits nicely into the region δ ≤ 1/2 where Property 3 is satisfied.
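The fixed-point condition can be made explicit under one assumption that the text does not spell out: that the average offer is the population-weighted mean of the three offers 0, δ, and 1, with weights given by the equilibrium shares x̂_1, x̂_2, x̂_3. Since the equilibrium is constant on the interval (0, 1/2), the condition then reduces to a linear equation in δ*:

```latex
\begin{align*}
\bar{\delta}(\delta) &= 0 \cdot \hat{x}_1 + \delta \cdot \hat{x}_2 + 1 \cdot \hat{x}_3, \\
\bar{\delta}(\delta^*) = \delta^*
  &\iff \delta^* \, (1 - \hat{x}_2) = \hat{x}_3
  \iff \delta^* = \frac{\hat{x}_3}{\hat{x}_1 + \hat{x}_3}.
\end{align*}
```

The resulting value is admissible only if it lies in the same interval of δ for which the shares x̂_i were computed, here (0, 1/2).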
Concluding, all three Properties 1-3 are satisfied for δ ≈ 0.146. Hence, the model's predictions correspond nicely with the experimental results and are far better than the predictions of the fully rational game theory, i.e. a subgame perfect equilibrium. Figures 1(a)-(c) show the behavior of the sampling dynamics for all three distinct cases.
Our model allows us to study the evolution of beliefs (and consequently of the strategy (δ, δ) interpreted as a social norm) by iterating the function δ ↦ δ̄(δ). In this setting, a mean-consistent social norm results from a fixed point of this function. We can consider the following scenario. A system starts with any value of beliefs δ_0. After a while, the state of the system converges to the corresponding equilibrium x̂_{δ_0}. Once near the equilibrium, the society eventually learns that the mean offer differs from the beliefs δ_0 and adapts its beliefs to δ_{t_1} = δ̄(δ_0). This leads to a new equilibrium x̂_{δ_{t_1}} and the process is repeated. We can use the recurrence equation
$$\delta_{t_{n+1}} = \bar{\delta}(\delta_{t_n}), \quad n \in \mathbb{N}, \quad \delta_0 \in [0, 1] \text{ given}, \tag{10}$$
to model the evolution of beliefs. Starting with any initial value δ_0, after at most two steps we have δ_{t_n} < 1/2, and then we know that the population eventually converges to the equilibrium x̂_δ for δ < 1/2, which is the unique globally asymptotically stable equilibrium. Consequently, the beliefs δ_{t_n} converge to the actual average offer and the intermediate strategy (δ, δ) converges to the mean-consistent and mode-consistent social norm (δ*, δ*). The actual timing t_n of the steps in the recurrence equation (10) is irrelevant as long as t_n → ∞; thus we can think of two different time scales, where the time scale for the distribution of strategies within a population is fast and the time scale of the evolution of beliefs (or a social norm) is slow. Our model predicts the convergence of beliefs and a social norm to a value corresponding well with the observed behavior, in a process where at each step the population adjusts its beliefs to the current equilibrium x̂_{δ_{t_n}} given the current social norm (δ_{t_n}, δ_{t_n}). Figure 1(d) shows this behavior for the initial values δ_0 = 0 and δ_0 = 1. For δ_0 close to 1, two steps are required to have δ_{t_n} < 1/2.

Conclusions
We presented a model of a symmetric Ultimatum Game and analyzed the sampling dynamics describing the evolution of a population of players using the sampling procedure. The model is parameterized by a parameter δ interpreted as an average offer at the equilibrium. The intermediate strategy (δ, δ) is interpreted as the "social norm" strategy.
The sampling procedure leads to dynamics with a unique globally asymptotically stable equilibrium x̂_δ that depends on δ in a noncontinuous way but is constant over two separate intervals. We showed that the natural property of mode-consistency leads to the selection of δ not larger than 1/2. The property of mean-consistency results in a particular choice of the value of the parameter, δ* ≈ 0.146. Both results correspond nicely with the observed behavior in experiments.
We would like to stress that the reported experimental mean offers vary wildly and are sometimes quite far away from our mean-consistent value of δ * . This may be for several reasons. Firstly, the model presented here is an extreme simplification and as such should not be construed as an exercise in fitting. Rather, we start with a model and want to derive some bounds on the predicted average offer in experiments based on certain "natural conditions" of mean-consistency and mode-consistency. Secondly, the mean statistics is not robust and may be severely distorted by outliers, and even more so in small samples. The more robust statistics, a mode, gives rise to predictions that are less precise but at the same time more consistent with the data. It is clear that, even taking into account the statistical properties of mean, the derived mean-consistent value of δ * is certainly too low in comparison to the observed modal offer of around 40 %, cf. [9]. It was argued that the focal point, cf. [15], of the Ultimatum Game is the social norm of equal split, cf. [7], which is a better explanation of the observed offers than our model. However, as noted in [2], "social comparisons activate the norm of equity: responders expect to be treated like others in like circumstances." In our model, each player compares the received offer only with what he offers while being the proposer. There are neither comparisons between players nor any knowledge about the mean offer and so the focal point of the equal split is never triggered. Our model takes into account only the I want to be treated the way I treat others norm but not the I want to be treated like others are norm. This is probably not enough to warrant the value of δ close to the even split social norm.
As mentioned before, our model is an extreme simplification. Although it does capture the basic features of simple bargaining situations, more detailed models are needed. Further analysis of the Ultimatum Game within the sampling dynamics framework can proceed by introducing more intermediate strategies, with the limit case containing all intermediate strategies (α, α), α ∈ (0, 1). This may lead to infinite-dimensional dynamical systems or partial differential equations describing the evolution of a density function over the set of strategies. Another possible extension is the inclusion of direct comparisons between players (perhaps through studying a game on a network) or the inclusion of information about the mean offer in the players' payoff function. The construction and analysis of such systems are left for future work; however, it should be noted that such an analysis may be extremely difficult.

Appendix A: Construction of Random Variables
It is crucial for the construction of the random variables v i (x) displayed in Table 1 to realize that it is required to look at both matrices P and R in parallel. This is because the game roles are assigned at random. A player testing a pure strategy i may happen to be a proposer or a responder with equal probabilities.
We start with the construction of the random variable v_1(x) associated with the first (egoistic) pure strategy. For this strategy we need to look at the first row of the matrix P and the first column of the matrix R, that is, at
$$P_{1\cdot} = (1, \mathbf{0}, \mathbf{0}), \qquad R_{\cdot 1} = (0, \delta, 1)^{\mathsf{T}}.$$
The first row of the matrix P is used with probability 1/2 and the first column of the matrix R is used with probability 1/2. The probability of the event v_1(x) = 0 is the sum of the probabilities of the following two events: the player is assigned the role of the proposer and runs into either an intermediate player or an altruistic player, or the player is assigned the role of the responder and runs into an egoistic player. The probability of the first event is (1/2)(x_2 + x_3), while the probability of the second event is (1/2)x_1.

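Together we get
$$\Pr[v_1(x) = 0] = \tfrac12 (x_2 + x_3) + \tfrac12 x_1 = \tfrac12.$$
In the same way, we obtain
$$\Pr[v_1(x) = \delta] = \tfrac12 x_2,$$
because the payoff δ may be realized only if the player is assigned the role of the responder and runs into an intermediate player.
Obviously, we have Pr[v_1(x) = 1 − δ] = 0, and finally
$$\Pr[v_1(x) = 1] = \tfrac12 (x_1 + x_3).$$
The whole distribution of the random variable v_1(x) over the values 0, δ, 1 − δ, 1 reads
$$\left( \tfrac12, \; \tfrac12 x_2, \; 0, \; \tfrac12 (x_1 + x_3) \right),$$
which is the first row of Table 1.
Similarly, we see that for the random variable v_2(x) we need to look at the second row of the matrix P and the second column of the matrix R:
$$P_{2\cdot} = (1 - \delta, 1 - \delta, \mathbf{0}), \qquad R_{\cdot 2} = (\mathbf{0}, \delta, 1)^{\mathsf{T}}.$$
We have
$$\Pr[v_2(x) = 0] = \tfrac12 (x_1 + x_3), \quad \Pr[v_2(x) = \delta] = \tfrac12 x_2, \quad \Pr[v_2(x) = 1 - \delta] = \tfrac12 (x_1 + x_2), \quad \Pr[v_2(x) = 1] = \tfrac12 x_3.$$
Finally, for the random variable v_3(x) we look at the last row of the matrix P and the last column of the matrix R:
$$P_{3\cdot} = (0, 0, 0), \qquad R_{\cdot 3} = (\mathbf{0}, \mathbf{0}, 1)^{\mathsf{T}}.$$
We have
$$\Pr[v_3(x) = 0] = \tfrac12 + \tfrac12 (x_1 + x_2), \quad \Pr[v_3(x) = \delta] = 0, \quad \Pr[v_3(x) = 1 - \delta] = 0, \quad \Pr[v_3(x) = 1] = \tfrac12 x_3.$$
This concludes the calculations for all entries in Table 1.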
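The construction in this appendix can be checked mechanically: for strategy i, combine row i of P (proposer role) and column i of R (responder role), each weighted by 1/2 and by the opponent's population share. The sketch below assumes the acceptance rule "offer accepted when it is at least the responder's threshold"; the function names and the test point x = (1/2, 1/3, 1/6), δ = 1/4 are illustrative choices, not values from the paper.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def payoff_matrices(delta):
    """Proposer (P) and responder (R) payoffs; rows are the proposer's strategy."""
    levels = [Fraction(0), delta, Fraction(1)]
    P = [[(1 - a) if a >= b else Fraction(0) for b in levels] for a in levels]
    R = [[a if a >= b else Fraction(0) for b in levels] for a in levels]
    return P, R


def distribution(i, x, delta):
    """Distribution of v_i(x) as a dict {payoff: probability}."""
    P, R = payoff_matrices(delta)
    dist = {}
    for j, xj in enumerate(x):
        # Proposer role: row i of P against a responder playing strategy j.
        dist[P[i][j]] = dist.get(P[i][j], 0) + HALF * xj
        # Responder role: column i of R against a proposer playing strategy j.
        dist[R[j][i]] = dist.get(R[j][i], 0) + HALF * xj
    return dist


x = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
delta = Fraction(1, 4)
for i in range(3):
    row = distribution(i, x, delta)
    print(f"v_{i + 1}:", {str(k): str(p) for k, p in sorted(row.items())})
```

Payoff values that a strategy can never realize, such as 1 − δ for the egoistic strategy, simply do not appear as keys, which corresponds to probability 0.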

Appendix B: Proof of Property 1
The global behavior of all systems (4)-(6) is described by a proposition that follows but we start with a lemma first.