## Abstract

The Individual Evolutionary Learning (IEL) model explains human subjects’ behavior in a wide range of repeated games that have unique Nash equilibria. Using a variation of ‘better response’ strategies, IEL agents quickly learn to play Nash equilibrium strategies, and their dynamic behavior resembles that of human subjects. In this paper we study whether IEL can also explain behavior in games with gains from coordination. We focus on the simplest such game: the two-person repeated Battle of Sexes game. In laboratory experiments, two patterns of behavior often emerge: players either converge rapidly to one of the stage game Nash equilibria and stay there, or learn to coordinate their actions and alternate between the two Nash equilibria every other round. We show that IEL explains this behavior if the human subjects are truly in the dark and neither know nor believe they know their opponent’s payoffs. To explain the behavior when agents are not in the dark, we need to modify the basic IEL model and allow some agents to begin with a good idea about how to play. We show that if the proportion of inspired agents with good ideas is chosen judiciously, the behavior of IEL agents looks remarkably similar to that of human subjects in laboratory experiments.


## Notes

- 1.
- 2.
- 3.
Basically, it is any adjustment algorithm that describes out-of-equilibrium adaptation of the following form: \(X^*_k = X^*_{k-1} + \lambda (X_k - X^*_{k-1})\), where \(\lambda \in [0,1]\) is a so-called ‘relaxation parameter’, \(X^*_k\) is the estimate of the equilibrium value of an economic variable at the \(k^{th}\) iteration, and \(X_k\) is the unrelaxed value generated at that iteration. See Sargent (1993) for a definition and a detailed description of the ‘relaxation’ algorithm.

- 4.
Our preliminary simulation results as well as pilot experimental sessions indicate that the main results in this paper would also be true if \(A^i_t = \{0,1\}\).

- 5.
See Boylan and El Gamal (1993) for an experimental evaluation of these models.

- 6.
- 7.
McKelvey and Palfrey (2002) used this terminology to refer to their set of experiments in which human subjects did not know the payoff matrix of the players they were matched with.

- 8.
In fact, subjects could only choose in increments of 0.01.

- 9.
Across all 80 experiments conducted for this paper, only 10% of the pairs did not converge to integer values for \(m_1, m_2 \) and \( m_3\). In all of these cases, \(m_3 < 3.5\), and so we classified them as Other.

- 10.
If we were to reclassify those as Alternate and recompute everything below, it would change the quantitative calculations a little, but it would not affect the qualitative conclusions.

- 11.
This set can and will contain duplicates.

- 12.
These examples were generated by beginning a simulation and taking the first instance of each. We could have generated many and then picked out the ones that “looked the best”, but we decided that going with the first gives the reader a better idea of how IEL really performs.

- 13.
We provide all of our experimental data in the Supplemental Material.

- 14.
We admit that this was a particularly lucky outcome for us. We would not expect to get the exact same proportions if we replicated the experiment with another 20 pairs. But we would expect to get something similar.

- 15.
In this paper we do not model how these good ideas come into being. One possible source is a game-theoretic analysis. Another is social learning through prior experience, as modeled in Hanaki et al. (2005).

- 16.
But this replacement will not happen instantaneously. Thus, this model is also consistent with the agent beginning with no inspiration and having an ‘aha’ moment sometime in the early rounds.

- 17.
When an uninspired agent plays another uninspired agent with symmetric payoffs, it is exactly the same as the simulation in Sect. 2.2 when \(K=2\). We use those results here.

- 18.
The Mahalanobis distance is \(d(\overrightarrow{x},\overrightarrow{y}) = \sqrt{(\overrightarrow{x}-\overrightarrow{y})^T S^{-1}(\overrightarrow{x}-\overrightarrow{y})},\) where \(\overrightarrow{x}\) is the data point, \(\overrightarrow{y}\) is the mean of the distribution, and *S* is the covariance matrix of the distribution. This is basically a variance-adjusted RMSD. See Mahalanobis (1936) and McLachlan (1992) for definitions and uses of the Mahalanobis distance.

- 19.
For a normal distribution in 2 dimensions, the square of the distance of an observation, \(d^2,\) is chi-square distributed with 2 degrees of freedom, whose cumulative distribution function is \(1-\exp (\frac{-x}{2})\). So the probability that \(d<t\) is \(1-\exp (\frac{-t^2}{2})\).

- 20.
We thank Catherine Eckel for reminding us of this observation.

- 21.
Also see Bednar et al. (2012) for similar findings.

- 22.
As the reader will see, a plan is not a strategy. A strategy is a function describing the choice of an action, or probability density over actions, contingent on history. A plan is simply a commitment to a sequence of actions, independently of what others will do. It may be part of the implementation of a strategy but it is not a strategy in and of itself. For example, “Tit for Tat” is a strategy while “Cooperate today and Defect tomorrow” is a plan.

- 23.
We use the symbol \(X^{(k)}\) to denote the *k*-fold Cartesian product of *X*.

- 24.
If \(p^i_{t,j} = p^i_{t,j}(1)\), then *j* can only be expanded. If \(p^i_{t,j} = [p^i_{t,j}(1),\ldots , p^i_{t,j}(K)]\), then *j* can only be contracted.

- 25.
If some of the \( W^i_{t,j} \) are less than zero, we need to adjust this equation so as to produce a probability in [0, 1]. We let \( \pi ^i_{t,j} = \frac{W^i_{t,j}-\epsilon ^i_t}{\sum _{k=1}^{J} (W^i_{t,k}-\epsilon ^i_t)}\) where \(\epsilon ^i_t = \min \{0, \min _j W^i_{t,j}\}\).
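The relaxation form described in Note 3 can be illustrated with a short Python sketch. It is a minimal example, not taken from Sargent (1993) or from this paper: the map `update`, which produces the unrelaxed value \(X_k\) from the previous estimate, is a hypothetical stand-in for whatever adjustment rule a given model uses.

```python
from typing import Callable, List


def relax(update: Callable[[float], float], x0: float, lam: float, iters: int) -> List[float]:
    """Iterate X*_k = X*_{k-1} + lam * (X_k - X*_{k-1}).

    `update` is a hypothetical rule mapping the previous estimate X*_{k-1}
    to the unrelaxed value X_k; lam in [0, 1] is the relaxation parameter.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    path = [x0]
    for _ in range(iters):
        prev = path[-1]
        target = update(prev)                      # unrelaxed value X_k
        path.append(prev + lam * (target - prev))  # move only a fraction lam toward it
    return path


# Example: the rule x -> 0.5 * x + 1 has fixed point 2.
path = relax(lambda x: 0.5 * x + 1.0, x0=0.0, lam=0.5, iters=200)
print(round(path[-1], 6))  # 2.0
```

With `lam = 1` the estimate jumps straight to the unrelaxed value each iteration; with `lam = 0` it never moves. Intermediate values damp the adjustment.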

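Notes 18 and 19 can be checked numerically. The sketch below handles the two-dimensional case only: it computes the Mahalanobis distance with an explicit 2×2 inverse, and the probability that \(d<t\) when \(d^2\) is chi-square with 2 degrees of freedom. The function names are illustrative and not from the paper.

```python
import math
from typing import Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mahalanobis_2d(x: Sequence[float], y: Sequence[float], S: Matrix2) -> float:
    """d(x, y) = sqrt((x - y)^T S^{-1} (x - y)) for 2-vectors and a 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("S must be positive definite")
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Quadratic form using the closed-form 2x2 inverse (1/det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


def prob_within(t: float) -> float:
    """P(d < t) when d^2 is chi-square with 2 degrees of freedom: 1 - exp(-t^2 / 2)."""
    return 1.0 - math.exp(-t * t / 2.0)


print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))  # 5.0
```

With the identity covariance the distance reduces to the Euclidean distance, which is why the example prints 5.0; a larger variance along one axis shrinks distances along that axis.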
## References

Arifovic, J., & Ledyard, J. (2004). Scaling up learning models in public good games. *Journal of Public Economic Theory*, *6*(2), 203–238.

Arifovic, J., & Ledyard, J. (2007). Call market book information and efficiency. *Journal of Economic Dynamics and Control*, *31*, 1971–2000.

Arifovic, J., & Ledyard, J. (2011). A behavioral model for mechanism design: Individual evolutionary learning. *Journal of Economic Behavior and Organization*, *78*(3), 374–395. https://doi.org/10.1016/j.jebo.2011.01.021

Arifovic, J., & Ledyard, J. (2012). Individual evolutionary learning, other-regarding preferences, and the voluntary contributions mechanism. *Journal of Public Economics*, *96*, 808–823. https://doi.org/10.1016/j.jpubeco.2012.05.013

Bednar, J., Chen, Y., Liu, T. X., & Page, S. (2012). Behavioral spillovers and cognitive load in multiple games: An experimental study. *Games and Economic Behavior*, *74*(1), 12–31.

Boylan, R., & El Gamal, M. (1993). Fictitious play: A statistical study of multiple economic experiments. *Games and Economic Behavior*, *5*, 205–222.

Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. *Psychological Review*, *58*, 313–323. https://doi.org/10.1037/h0054388

Camerer, C. F., & Ho, T. (1999). Experience-weighted attraction in games. *Econometrica*, *67*, 827–874.

Cason, T., Lau, S., & Mui, V. (2013). Learning, teaching, and turn taking in the repeated assignment game. *Economic Theory*, *54*, 335–357.

Cheung, Y., & Friedman, D. (1997). Individual learning in normal form games: Some laboratory results. *Games and Economic Behavior*, *55*, 340–371.

Erev, I., Ert, E., & Roth, A. E. (2010). A choice prediction competition for market entry games: An introduction. *Games*, *1*, 117–136.

Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. *American Economic Review*, *88*, 848–881.

Fischbacher, U. (2007). z-Tree: Zurich toolbox for ready-made economic experiments. *Experimental Economics*, *10*, 171–178.

Hanaki, N., Sethi, R., Erev, I., & Peterhansl, A. (2005). Learning plans. *Journal of Economic Behavior and Organization*, *56*, 523–542.

Ioannou, C., & Romero, J. (2014). A generalized approach to belief learning in repeated games. *Games and Economic Behavior*, *87*, 178–203.

Mahalanobis, P. C. (1936). On the generalised distance in statistics. *Proceedings of the National Institute of Sciences of India*, *2*(1), 49–55.

McKelvey, R. D., & Palfrey, T. R. (2002). *Playing in the dark: Information, learning, and coordination in repeated games*. Caltech working paper.

McLachlan, G. J. (1992). *Discriminant analysis and statistical pattern recognition* (p. 12). Wiley Interscience. ISBN 0-471-69115-1.

Myung, N., & Romero, J. (2013). *Computational testbeds for coordination games*. Working paper.

Rapoport, A., Melvin, J. G., & Gordon, D. G. (1978). *The 2 × 2 game*. Ann Arbor: University of Michigan Press.

Sargent, T. (1993). *Bounded rationality in macroeconomics*. Oxford: Oxford University Press.

Sonsino, D., & Sirota, J. (2003). Strategic pattern recognition—Experimental evidence. *Games and Economic Behavior*, *44*, 390–411.

Van Huyck, J. B., Battalio, R. C., & Beil, R. O. (1990). Tacit coordination games, strategic uncertainty, and coordination failure. *The American Economic Review*, *80*, 234–248.

Van Huyck, J. B., Battalio, R. C., & Beil, R. O. (1991). Strategic uncertainty, equilibrium selection, and coordination failure in average opinion games. *The Quarterly Journal of Economics*, *106*, 885–910.

Van Huyck, J. B., Battalio, R. C., & Rankin, F. W. (2007a). Selection dynamics and adaptive behavior without much information. *Economic Theory*, *33*, 53–65.

Van Huyck, J. B., Battalio, R. C., & Walters, M. F. (2007b). Evidence on learning in coordination games. *Experimental Economics*, *10*, 205–220.

Van Huyck, J. B., Cook, J. P., & Battalio, R. C. (1994). Selection dynamics, asymptotic stability, and adaptive behavior. *Journal of Political Economy*, *102*, 975–1005.

Van Huyck, J. B., Cook, J. P., & Battalio, R. C. (1997). Adaptive behavior and coordination failure. *Journal of Economic Behavior and Organization*, *32*, 483–503.


## Additional information

We thank Sarah Deretic, Kevin James, Brian Merlob and Heng Sok for their excellent research assistance. We would also like to thank John Duffy, Tim Cason, Julian Romero, participants at the Workshop in Memory of John van Huyck, Southern Methodist University, 2015, participants at the Southern Economic Association Meetings, New Orleans, 2015, as well as two referees and an editor. Jasmina Arifovic gratefully acknowledges financial support from CIGI-INET Grant #5553. John Ledyard thanks the Moore Foundation whose grant to Caltech for Experimentation with Large, Diverse and Interconnected Socio-Economic Systems, Award #1158, supported the experimental work.


## Appendices

### Appendix 1: Description of IEL

IEL is based on an evolutionary process which is individual, and not social. IEL is particularly well-suited to repeated games with large action spaces such as convex subsets of multi-dimensional Euclidean space. At the heart of each IEL agent’s strategy is a collection of possible plans of action that they carry in their memory. These remembered plans are continually evaluated and the better ones are implemented with higher probability. In previous applications, IEL agents used particularly simple plans: those that involved only one period. In this paper, we allow them to consider multi-period plans.^{Footnote 22}

A *plan of action of length k for agent i* is \(p^i = [p^i(1),\ldots , p^i(k)]\), where \(p^i(r) \in A^i\) and \(1 \le k \le K\). A finite *set of plans for i at t* is \(\mathcal {P}_t^{i} =\{p^i_j\}_{j=1}^J\) where^{Footnote 23} \(\mathcal {P}^i_t \subset A^{i(1)} \cup \cdots \cup A^{i(K)}\).

*Initialization* At the start of play, *i*’s initial set \(\mathcal {P}^i_0\) is created by randomly selecting *J* elements from \(A^{i(1)} \cup \cdots \cup A^{i(K)}\). For \(j=1,\ldots ,J\), a value for the length *k* is chosen from \(\{1,\ldots ,K\}\) with equal probability and, then, for each \(p_j(d)\) with \(d=1,\ldots ,k\), a value is randomly generated from [0, 1]. Agent *i* next selects a plan \(p_{0j}\) from \(\mathcal {P}^i_0\) with probability 1 / *J*. Finally *i* implements the plan so selected. If the selected plan is \(j^*\) then \(a^i_0 = p^i_{0,j^*}(1)\).
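The initialization step can be sketched in Python as follows. This is a hedged illustration, not the authors’ code: it assumes \(A^i=[0,1]\) and uniform draws for the entries (the text says only that values are “randomly generated from [0, 1]”), and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Plan = List[float]


def init_plans(J: int, K: int, rng: random.Random) -> List[Plan]:
    """Initial set P_0: J plans, each with length uniform on {1, ..., K}
    and entries drawn (by assumption, uniformly) from [0, 1]."""
    plans = []
    for _ in range(J):
        k = rng.randint(1, K)  # randint is inclusive on both ends
        plans.append([rng.uniform(0.0, 1.0) for _ in range(k)])
    return plans


def first_action(plans: List[Plan], rng: random.Random) -> Tuple[int, float]:
    """Pick j* with probability 1/J and return (j*, first entry of that plan)."""
    j_star = rng.randrange(len(plans))
    return j_star, plans[j_star][0]


rng = random.Random(0)
P0 = init_plans(J=180, K=2, rng=rng)
j_star, a0 = first_action(P0, rng)
```

Passing an explicit `random.Random` instance keeps simulations reproducible from a seed.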

As the repeated game progresses, when a plan \(p^i_{t,j}= (p^i_{t,j}(1),\ldots ,p^i_{t,j}(k))\) is selected, the IEL agent implements it in its entirety. That is, the entry \(p^i_{tj}(1)\) is played in period *t* and, in general, the entry \(p^i_{tj}(d)\) is played in period \(t+d-1\), for \(d=1,\ldots ,k\), so the plan runs through period \(t+k-1\). An IEL agent’s strategy implements a plan, lets it play out, and then chooses a new plan to implement. At the beginning of a round *t* in which *i* is not continuing the implementation of a previously selected plan, *i* computes a new set of plans \(\mathcal {P}_{t}^{i}\) and valuations \(W_{t}^{i}\). This computation is at the heart of our behavioral model and consists of three pieces: *rotation*, *experimentation* and *replication*.
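The timing rule can be sketched with a small queue-based runner. The class `PlanRunner` is a hypothetical helper, not part of the paper; it only tracks which entry of the selected plan is due in each period and signals when a new plan must be chosen.

```python
from collections import deque
from typing import Deque, Sequence


class PlanRunner:
    """Plays a selected plan p = (p(1), ..., p(k)) in periods t, ..., t+k-1."""

    def __init__(self) -> None:
        self._pending: Deque[float] = deque()

    def needs_new_plan(self) -> bool:
        # True at the start of a round in which no earlier plan is still running,
        # i.e. when rotation, experimentation and replication would be run.
        return not self._pending

    def start(self, plan: Sequence[float]) -> None:
        if not plan:
            raise ValueError("a plan must contain at least one action")
        self._pending = deque(plan)

    def next_action(self) -> float:
        # p(d) is played in period t + d - 1
        return self._pending.popleft()


runner = PlanRunner()
runner.start([0.0, 1.0])  # a length-2 "alternate" plan
actions = [runner.next_action(), runner.next_action()]
print(actions, runner.needs_new_plan())  # [0.0, 1.0] True
```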

*Rotation* This step sets things up for the next round by repositioning every plan in the considered set. Every plan \(p^i_{t-1,j} = (p^i_{t-1,j}(1),\ldots ,p^i_{t-1,j}(k)) \in \mathcal {P}^i_{t-1}\) is replaced with a new plan \(p^{*i}_{t-1,j} = (p^i_{t-1,j}(2),\ldots ,p^i_{t-1,j}(k), p^i_{t-1,j}(1))\). That is, all plans with length greater than one are rotated so as to remain consistent with the timing of play.
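Rotation is a one-position cyclic shift. A minimal Python sketch, with plans represented as lists of floats (a representation assumed here for illustration):

```python
from typing import List

Plan = List[float]


def rotate(plan: Plan) -> Plan:
    """(p(1), ..., p(k)) -> (p(2), ..., p(k), p(1)); a length-1 plan is unchanged."""
    return plan[1:] + plan[:1]


def rotate_all(plans: List[Plan]) -> List[Plan]:
    return [rotate(p) for p in plans]


print(rotate_all([[0.1, 0.2, 0.3], [0.7]]))  # [[0.2, 0.3, 0.1], [0.7]]
```

For a length-2 alternating plan such as `[0.0, 1.0]`, rotation yields `[1.0, 0.0]`, so the plan’s first entry always refers to the upcoming period.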

*Experimentation* Experimentation introduces new, alternative plans that otherwise might never have a chance to be tried. This ensures that a certain amount of diversity in possible plans is maintained.

For each \(j = 1, \dots , J\), with probability \(\rho _{val}\), \(p^i_{t-1,j} \in \mathcal {P}^i_{t-1}\) is chosen for value experimentation. If *j* is not chosen for value experimentation, then \(p^i_{tj} = p^i_{t-1,j}\). If *j* is chosen, then for all *d*, \(p^i_{tj}(d) \sim N(p^i_{t-1,j} (d),\sigma |A^i)\). That is, the new value for the *d*th entry of *p* is drawn from the normal distribution centered at the old value, truncated to \(A^i\). We now have a new, temporary set \(X_t = \{p^i_{tj}\}_{j=1}^J.\)
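A sketch of value experimentation, assuming \(A^i=[0,1]\) and treating \(\sigma\) as the standard deviation of the normal draw (an interpretation, not stated explicitly above). Truncation is done here by rejection sampling, which is cheap when \(\sigma = 0.05\) is small relative to [0, 1]; a library sampler such as SciPy’s `scipy.stats.truncnorm` would be an alternative. Function names are hypothetical.

```python
import random
from typing import List

Plan = List[float]


def truncated_normal(mean: float, sigma: float, rng: random.Random,
                     lo: float = 0.0, hi: float = 1.0) -> float:
    """Draw from N(mean, sigma) truncated to [lo, hi] by rejection sampling."""
    while True:
        v = rng.gauss(mean, sigma)
        if lo <= v <= hi:
            return v


def value_experiment(plans: List[Plan], rho_val: float, sigma: float,
                     rng: random.Random) -> List[Plan]:
    """With probability rho_val, redraw every entry of a plan around its old value."""
    out = []
    for p in plans:
        if rng.random() < rho_val:
            out.append([truncated_normal(x, sigma, rng) for x in p])
        else:
            out.append(list(p))  # unchanged copy
    return out
```

The plan’s length is never altered in this step; length changes are handled separately.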

Next, experimentation with length occurs. For each \(j = 1, \dots , J\), with probability \(\rho _{len}\), \(p^i_{tj} \in X_t \) is expanded or contracted by 1 with equal probability, if that is possible.^{Footnote 24} If *j* is expanded, the new value is appended at the end of the plan. The new element is either 0 or 1, whichever has the higher foregone utility.
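Length experimentation can be sketched as follows. Two details are not pinned down in the text and are assumptions here: `foregone_utility` is a hypothetical callable that scores a candidate plan, and contraction drops the last entry. Only feasible moves are considered, so a length-1 plan can only grow and a length-*K* plan can only shrink.

```python
import random
from typing import Callable, List

Plan = List[float]


def length_experiment(plans: List[Plan], rho_len: float, K: int, rng: random.Random,
                      foregone_utility: Callable[[Plan], float]) -> List[Plan]:
    """With probability rho_len, lengthen or shorten a plan by one entry."""
    out = []
    for p in plans:
        p = list(p)
        if rng.random() < rho_len:
            moves = []
            if len(p) < K:
                moves.append("expand")
            if len(p) > 1:
                moves.append("contract")
            if moves:
                if rng.choice(moves) == "expand":
                    # Append 0 or 1, whichever has the higher foregone utility
                    # (ties go to 0 in this sketch).
                    u0 = foregone_utility(p + [0.0])
                    u1 = foregone_utility(p + [1.0])
                    p.append(1.0 if u1 > u0 else 0.0)
                else:
                    p.pop()  # ASSUMPTION: contraction drops the last entry
        out.append(p)
    return out
```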

*Replication* Replication follows experimentation. Replication reinforces plans that would have been good choices in previous rounds. It does that by creating a value for each plan and allowing higher valued plans to replace those that are valued less. Replication begins with the set \(X_t\) left after experimentation and creates a new set \(\mathcal {P}^i_t\).

First, for each \(p^i_{t,j} \in X_t,\) compute valuations \(W^i_{t,j}\). These valuations are based on the average utility that plan would have attained over the past *L* periods had it been played. For each \(p^i_{tj} \) in \(X_t\)

Here \(a^{-i}_{t-d+1-L}\) is the action played by all other agents in period \(t-d+1-L\). If \(p^i_j = (p^i_j(1),\ldots ,p^i_j(k))\), then the index *z* of \(p_j^i(z)\) is taken mod *k*. In this paper we let \(L=K\).
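The defining equation for \(W^i_{t,j}\) does not appear in this text, so the following is only a hedged sketch of the verbal description: the average payoff a plan would have earned over the last *L* periods against the other agent’s actual actions, with plan entries indexed mod *k*. The alignment of plan entries with past periods, the empty-history default, and the toy `payoff` function are all assumptions.

```python
from typing import Callable, List, Sequence

Plan = List[float]


def valuation(plan: Plan, opp_history: Sequence[float], L: int,
              payoff: Callable[[float, float], float]) -> float:
    """Average payoff `plan` would have earned over the last L periods.

    opp_history[-s] is the other agent's action in period t - s.
    ASSUMPTION: because plans are rotated each round, the entry that would
    have been played in period t - s is p((1 - s) mod k), i.e. 0-based
    index (-s) mod k.
    """
    k = len(plan)
    n = min(L, len(opp_history))
    if n == 0:
        return 0.0  # no history yet; this default is also an assumption
    total = 0.0
    for s in range(1, n + 1):
        total += payoff(plan[(-s) % k], opp_history[-s])
    return total / n


pay = lambda a, b: -abs(a - b)  # hypothetical toy payoff: rewards matching the opponent
history = [0.0, 1.0]            # opponent played 0 in t-2 and 1 in t-1
print(valuation([0.0, 1.0], history, 2, pay), valuation([1.0, 0.0], history, 2, pay))  # 0.0 -1.0
```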

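Next, for \(j=1,\ldots , J\), pick two members of \(\{1,\ldots ,J\}\) randomly (with uniform probability) with replacement. Let these be *k* and \(k^\prime \). Then,

\[ p^i_{j,t} = \begin{cases} p^i_{k,t-1} & \text{if } W^i_{t,k} \ge W^i_{t,k^\prime},\\ p^i_{k^\prime,t-1} & \text{otherwise.} \end{cases} \]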

Note that \(p_{j,t}^{i} \in \mathcal {P}^i_t\) while \(p_{k,t-1}^{i} \) and \(p_{k^\prime ,t-1}^{i} \in X_t\). After \(\mathcal {P}^i_t\) is created, the respective valuations \(W^i_j\) are computed for those elements.

Replication for *t* favors alternatives with a lot of replicates at \(t-1\) and favors alternatives that would have done well over the past *L* periods, had they been used. Because the replicates carried into *t* are themselves the outcome of earlier comparisons, replication is a process with a form of averaging over *all* past periods, not just the last *L*. If the actual actions of others would have provided a favorable situation for a particular plan \( p_{j}^{i}\) on average, then that plan will tend to accumulate replicates in \(\mathcal {P}_{t}^{i}.\) It is fondly remembered and will be more likely to be actually implemented. Over time, the sets \(\mathcal {P}_{t}^{i}\) will become more homogeneous as most of their plans become replicates of the best-performing alternative.
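The pairwise comparison behind replication can be sketched as follows: each of the *J* slots receives a copy of whichever of two uniformly drawn plans has the higher valuation. The tie rule (first draw wins) and the function name are assumptions. When a plan is the unique best, it fills each slot with probability \(1-(1-1/J)^2 \approx 2/J\), roughly doubling its expected share, which is the homogenizing force described above.

```python
import random
from typing import List, Sequence

Plan = List[float]


def replicate(plans: Sequence[Plan], values: Sequence[float],
              rng: random.Random) -> List[Plan]:
    """Pairwise tournament: each of the J slots is filled with the higher-valued
    of two plans drawn uniformly with replacement (ties go to the first draw)."""
    J = len(plans)
    new_plans = []
    for _ in range(J):
        k, k_prime = rng.randrange(J), rng.randrange(J)
        winner = k if values[k] >= values[k_prime] else k_prime
        new_plans.append(list(plans[winner]))  # copy, so replicates are independent lists
    return new_plans
```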

After updating the set of plans and their valuations, the IEL agent selects a plan for *t*, \(p^i_t\).

*Selection* The set of plans and their valuations, \((\mathcal {P}^i_t,W_{t}^{i})\), induce a mixed plan on \(\Delta (A^i)\) at *t*, and for some future rounds. In round *t*, *i* *selects* a plan \(p^i_j \in \mathcal {P}^i_t\) with probability^{Footnote 25} \( \pi ^i_{t,j} = \frac{W^i_{t,j}}{\sum _{k=1}^{J} W^i_{t,k}}\). The selected plan is then implemented in its entirety.
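The selection probabilities, including the shift in Note 25 for negative valuations, can be sketched as follows. Function names are hypothetical, and the uniform fallback for the degenerate case in which all shifted valuations are zero is an assumption not covered by the text.

```python
import random
from typing import List, Sequence


def selection_probs(W: Sequence[float]) -> List[float]:
    """pi_j = (W_j - eps) / sum_k (W_k - eps), with eps = min{0, min_j W_j}.

    When every W_j is non-negative, eps = 0 and this is W_j / sum_k W_k.
    """
    eps = min(0.0, min(W))
    shifted = [w - eps for w in W]
    total = sum(shifted)
    if total == 0.0:
        # Degenerate case; uniform choice is an assumption of this sketch.
        return [1.0 / len(W)] * len(W)
    return [s / total for s in shifted]


def select_plan(plans: Sequence[List[float]], W: Sequence[float],
                rng: random.Random) -> List[float]:
    j = rng.choices(range(len(plans)), weights=selection_probs(W), k=1)[0]
    return plans[j]


print(selection_probs([1.0, 3.0]))   # [0.25, 0.75]
print(selection_probs([-1.0, 1.0]))  # [0.0, 1.0]
```

Note that after the shift the lowest-valued plan receives probability zero whenever some valuation is negative.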

*Parameters* IEL is completely specified by 5 parameters: \((J, \rho _{val},\rho _{len}, \sigma , K)\). We let \(J= 180\), \(\rho _{val} = 0.033\), \(\sigma = 0.05\), and \(\rho _{len} = 0.0033\). Results from our previous IEL applications show that the same parameter values for \((J, \rho _{val}, \sigma )\) have worked well across many different environments (call market experiments in Arifovic and Ledyard 2007; Groves-Ledyard experiments in Arifovic and Ledyard 2004, 2011; voluntary contribution mechanisms in Arifovic and Ledyard 2012). Further, those results have also shown that the behavior of an IEL agent is reasonably robust to changes in these parameters.
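One compact way to carry these values through a simulation is a frozen dataclass. The class name is hypothetical. *K* has no default because this appendix does not fix it (Note 17 mentions a simulation with \(K=2\), which the example uses), and the valuation window *L* is tied to *K* as stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IELParams:
    """The five IEL parameters (J, rho_val, rho_len, sigma, K)."""
    K: int
    J: int = 180
    rho_val: float = 0.033
    rho_len: float = 0.0033
    sigma: float = 0.05

    def __post_init__(self) -> None:
        if self.K < 1 or self.J < 1:
            raise ValueError("K and J must be positive")
        for p in (self.rho_val, self.rho_len):
            if not 0.0 <= p <= 1.0:
                raise ValueError("rates must be probabilities")

    @property
    def L(self) -> int:
        # The valuation window is set equal to K in this paper.
        return self.K


params = IELParams(K=2)
```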

We now have a complete model of behavior for a general repeated game.

### Appendix 2: Instructions

### Instructions for dark information

**Introduction** You are about to participate in a session on decision making and you will be paid for your participation. This is part of a study intended to provide insight into certain features of decision processes. What you earn depends partly on your decisions and partly on the decisions of others. If you follow the instructions carefully and make good decisions you may earn a considerable amount of money. You will be paid in cash at the end of the experiment.

You will be asked to make choices in a number of rounds. You will be randomly paired with another person, your partner, for a sequence of rounds. A sequence of rounds is referred to as a **match**.

**Match** The length of a match is 40 rounds. In each round, you make a choice by entering a number between 0 and 1. The person you are matched with, your partner, also makes a choice. The payoff you receive for each round depends on your and your partner’s choices. For each round, you will see on the computer screen the information about your choice, your partner’s choice, and the payoff that you earned in that round.

**Payoffs** All the payoffs are stated in experimental currency units (ECU) and can be converted into cash at a rate of $1 per 100 ECUs at the end of the experiment. The payoffs are accumulated over the rounds. In each round, you will see the cumulative payoff for all the rounds that you have already played at the bottom of the computer screen.

You can check your payoff for various combinations of your choices using the *payoff tables* that are in your folder or using the *what-if-calculator*.

**Payoff Table** For each match you can look up a payoff table to see what payoff you would receive for different choices that the two of you make. Your choices are given on the vertical axis and your partner’s on the horizontal axis. The choices are given in increments of 0.1. Each cell gives your payoff for a given combination of choices.

For example, in case of match 1, if you choose 0.6 and your partner chooses 0.5, your payoff is 7.8.

**What-if-Calculator** You will find a what-if-calculator on your computer screen. You can use it at the beginning of each round to calculate your payoff for various combinations of your choices.

The payoff tables and the what-if-calculator provide the same type of information about your payoffs for various combinations of choices.

### Instructions for full information

**Introduction** You are about to participate in a session on decision making and you will be paid for your participation. This is part of a study intended to provide insight into certain features of decision processes. What you earn depends partly on your decisions and partly on the decisions of others. If you follow the instructions carefully and make good decisions you may earn a considerable amount of money. You will be paid in cash at the end of the experiment.

You will be asked to make choices in a number of rounds. You will be randomly paired with another person, your partner, for a sequence of rounds. A sequence of rounds is referred to as a **match**.

**Match** The length of a match is 40 rounds. In each round, you make a choice by entering a number between 0 and 1. The person you are matched with, your partner, also makes a choice. The payoff you receive for each round depends on your and your partner’s choices. For each round, you will see on the computer screen the information about your choice, your partner’s choice, and the payoff that you each earned in that round.

**Payoffs** All the payoffs are stated in experimental currency units (ECU) and can be converted into cash at a rate of $1 per 100 ECUs at the end of the experiment. The payoffs are accumulated over the rounds. In each round, you will see the cumulative payoff for all the rounds that you have already played at the bottom of the computer screen.

You can check your payoff for various combinations of your choices using the *payoff tables* that are in your folder or using the *what-if-calculator*.

**Payoff Table** For each match you can look up a payoff table to see what payoff you would receive for different choices that the two of you make. Your choices are given on the vertical axis and your partner’s on the horizontal axis. The choices are given in increments of 0.1. Each cell gives your payoff for a given combination of choices.

For example, in case of match 1, if you choose 0.6 and your partner chooses 0.5, your payoff is 7.8.

**What-if-Calculator** You will find a what-if-calculator on your computer screen. You can use it at the beginning of each round to calculate your payoff for various combinations of your choices.

The payoff tables and the what-if-calculator provide the same type of information about your payoffs for various combinations of choices.


## About this article

### Cite this article

Arifovic, J., Ledyard, J. Learning to alternate.
*Exp Econ* **21**, 692–721 (2018). https://doi.org/10.1007/s10683-018-9568-1


### Keywords

- Battle of Sexes
- Alternation
- Learning

### JEL Classification

- C72
- C73
- D83