1 Introduction

Subjects manage to coordinate their decisions with a surprisingly high frequency in laboratory experiments on pure coordination games (e.g. see Schelling 1960; Mehta et al. 1994; Bardsley et al. 2010). This is both impressive and important. It is impressive because the games are symmetric in pay-offs. Strategies have different labels and each player’s objective is to match the choice of the other, defined in terms of its label. The pay-off from matching is always the same, whichever label the match occurs on, and it is zero otherwise. There is nothing in terms of pay-offs to distinguish the strategy labels and so it seems that subjects are able to exploit some asymmetry in the labels themselves when they manage to match to a surprising degree. It is important because many real-world interactions appear to have multiple equilibria. The selection of one involves solving a coordination game and it seems we have a surprising capacity to do this in the laboratory, even in the difficult cases of pure coordination. This paper is concerned with this apparent capacity to solve pure coordination games.

First, we consider whether the capacity to coordinate in the laboratory depends on restricting the number of strategy options. This is potentially important because laboratory experiments often restrict the strategy options, whereas choices outside the laboratory are more typically unconstrained. As a result, the observed capacity to coordinate in the laboratory could to some degree be an artifact of the laboratory setting and its constraint on choice. Towards this end, our first experiment tests for differences in coordination when otherwise identical coordination problems have either restricted or unrestricted options. Second, we examine with the aid of another experiment how the observed levels of coordination in the first experiment might be explained by appealing to the idea that people follow rules that are based on the labels of the strategies in such games.

The difference between restricted and unrestricted options seems likely to be important because the scope for dis-coordination appears to grow as the number of objects in the choice set increases. For instance, when there is one option, the only outcome is a coordinated one; when there are two options, only half of the possible outcomes are coordinated; when there are three options, the proportion of possible outcomes that are coordinated drops to one third; and so on. Our first Hypothesis follows from this reasoning.

H1: Coordination is higher when the choice set is restricted.
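
The counting argument behind H1 can be written out as a simple benchmark. As an illustrative assumption (not a claim about how subjects actually choose), suppose both players pick independently and uniformly at random from \(n\) labels. The probability of a match is then

$$\begin{aligned} \Pr (\text {match})=\sum _{g=1}^{n}\frac{1}{n}\cdot \frac{1}{n}=\frac{1}{n}, \end{aligned}$$

so the benchmark is 1 for one option, 1/2 for two, 1/3 for three, and 0.2 for a set of five labels.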

We test H1 in our first experiment by asking some subjects to play 16 pure coordination games with a restricted number of strategy options and others to play 16 corresponding games with no restrictions. In each coordination game, the options belong to a particular category (i.e. a class of objects). For example, one category is car manufacturers. In the restricted version for this category, we ask subjects to choose one of the following options: {HONDA, MERCEDES, FORD, FERRARI, BMW}. The identified car manufacturers are the strategy labels. The analogous unrestricted version of this coordination game asks subjects to choose a car manufacturer. In both cases, the subjects are told that they will be randomly matched with another subject and ‘if you give the same answer as the other person you will win some MONEY. If not, you will win NOTHING’.

In fact, we reject H1. Contrary to expectation, coordination is typically higher when options are unrestricted than when they are restricted. It seems in this respect that laboratory experiments with restricted sets may understate the likely coordination when there are unrestricted options outside the laboratory. The result sets a problem that we take up in our second experiment: why is coordination higher when options are unrestricted?

We address this problem from a particular perspective. We follow Schelling (1960) and Sugden (1995) in assuming that individuals use a selection rule based on the labels attached to the strategies in such games. In particular, subjects are randomly matched and we elicit their application of seven rules in a ‘naïve’ and in a ‘strategic’ way. Thus, suppose the rule is ‘favourites’ and is applied to the category of car manufacturers. In the ‘naïve’ elicitation, we ask the subject to choose their favourite: ‘Circle/write down your favourite car manufacturer’. When applied ‘strategically’, we ask the subject to choose the other person’s favourite: ‘Circle/write down the other person’s favourite car manufacturer’. The seven rules are drawn from the psychology literature.

We use these elicitations to assess whether any of three conceptually distinct reasons, within this framework, explains why one observes more frequent coordination in unrestricted than restricted sets of the first experiment. One possible reason is that individuals are using the same rule and coordination improves because the individual interpretations of that rule become closer to each other when there are unrestricted options. For example, with an ‘odd one out’ rule it may help such convergence in interpretation when selecting a number from the unrestricted set of all positive integers as compared with a restricted set like {1, 2, 3, 7, 9}. This is because, in the restricted set, both ‘1’ and ‘9’ might stand out as, respectively, the lowest and highest while ‘2’ could do the same as the only even number. However, in the unrestricted set there is no highest number and there are lots of even numbers, so only the lowest number, ‘1’ stands out. This is the basis of our second Hypothesis. We define the average own rule concordance rate for rule i as the average coordination rate that would be achieved across all categories when all subjects use rule i in the manner suggested by the elicitations of rule i in the second experiment.

H2: The average own rule concordance rate is higher with unrestricted than with restricted options.

Another possible reason is that each individual could be using a constant rule across the categories but the constant rule is not the same for all individuals and greater coordination is achieved because the application of these different rules yields greater concordance on the unrestricted than the restricted options. For example, people who select the object with the most features associated with that category of objects (i.e. the ‘most similar’ rule) could plausibly choose ‘1’ on both the unrestricted and restricted positive integer sets because in both cases every number is divisible by ‘1’ (but not by any other). As a result, if, as suggested above, people using the ‘odd one out’ rule are more likely to select ‘1’ on the unrestricted than the restricted set, then they will achieve more coordination with those using the ‘most similar’ rule on the unrestricted than on the restricted set. This is our third Hypothesis. We define the average cross-rule concordance rate between rule i and j as the average coordination rate that would be achieved across the categories when one individual uses rule i and the other individual uses rule j (in the manner suggested by the rule elicitations in the second experiment). H3 is distinct from H2, but they are not mutually exclusive.

H3: The average cross-rule concordance rate is higher with unrestricted rather than restricted options.
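
The concordance rates defined for H2 and H3 can be illustrated with a short Python sketch. It assumes, as one plausible reading of the definitions, that the cross-rule concordance in a category is the probability that a randomly drawn response under rule i matches a randomly drawn response under rule j, and that the average is an unweighted mean over categories. The function names and the example responses are hypothetical.

```python
from collections import Counter


def cross_rule_concordance(choices_i, choices_j):
    """Probability that a random rule-i response equals a random rule-j response."""
    counts_i, counts_j = Counter(choices_i), Counter(choices_j)
    pairs = len(choices_i) * len(choices_j)
    # A Counter returns 0 for labels that never occur under rule j.
    return sum(counts_i[g] * counts_j[g] for g in counts_i) / pairs


def average_concordance(by_category_i, by_category_j):
    """Unweighted mean of the category-level concordance rates."""
    rates = [
        cross_rule_concordance(by_category_i[cat], by_category_j[cat])
        for cat in by_category_i
    ]
    return sum(rates) / len(rates)


# Hypothetical elicited choices for the positive-number category.
odd_one_out = ["1", "9", "2", "1"]
most_similar = ["1", "1", "1", "7"]
print(cross_rule_concordance(odd_one_out, most_similar))  # 6/16 = 0.375
```

For the own rule concordance rate, the two responses would be drawn from the same rule without replacement.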

The final possible explanation builds on the second. Individual subjects use different rules and the frequency of their use of each rule changes between the unrestricted and restricted options sets. For example, the ‘odd one out rule’ might seem obviously applicable in the restricted set of car manufacturers like {HONDA, MERCEDES, FORD, FERRARI, BMW} when subjects come from the US because FORD is the only US manufacturer while all others originate from foreign countries. But, this rule may not be as attractive to use on an unrestricted domain of car manufacturers because being a domestic manufacturer no longer singles out one option and there are no obvious alternative bases for an ‘odd one out rule’ that would work on car labels. H4 concerns whether there is evidence of different frequency of rule use on restricted and unrestricted options.

H4: Rules are used with different frequencies by individuals when options are unrestricted as compared with when they are restricted.

We find no evidence for H2. Indeed for each of our rules when used by all, the concordance rate is higher with the restricted options than the unrestricted ones. Nor do we find evidence in favour of H3: the cross-rule concordance between different rules in our experiment is very similar and almost always slightly higher when options are restricted rather than unrestricted. Finally in relation to H4, we find evidence that subjects use rules with different frequencies on the restricted as compared with the unrestricted sets. Using the different estimated frequencies of use, we are able to generate the predicted coordination rates when subjects are given restricted and unrestricted options. These predictions are below, but close to what was observed in the first experiment and the predicted coordination rate is higher, albeit only marginally so, when the options are unrestricted. In short, there is evidence that different rules are triggered when options are restricted as compared with when they are unrestricted and this contributes to the explanation of why we observe higher coordination when options are unrestricted.

The first experiment on restricted and unrestricted coordination games and its results are described in Sects. 2 and 3. Sections 4 and 5 do the same for the second experiment on the possible explanation of the differences in coordination observed in the first experiment. Section 6 discusses these results and concludes.

2 Experiment I

Individuals play 16 pure coordination games. Each game is distinguished by the category of labels used to identify possible strategy choices. The categories were selected because each has more than five possible labels and subjects are likely to have sufficient knowledge of the category to give access to at least this number of label options. The categories are set out in Table 1. Flowers, car manufacturers, positive numbers and colours have been used in previous experiments on focal points.

Table 1 Labels for strategies in restricted sets

Subjects play a block of eight games in an unrestricted version where there is no restriction on the labels (e.g. “Choose the same colour as the other person”) and the remaining block of eight games in a restricted version where there are five labels (e.g. “Choose the same colour as the other person: {green, black, red, yellow, blue}”). The blocks and versions are counterbalanced within each session. Further details and screenshots appear in the Supplementary Material. Subjects see the restricted version labels in a row and the order of the labels is independently randomized for each subject. They are randomly and anonymously matched to another subject throughout the experiment and feedback is provided only at the end of the experiment. Choices are incentivized through the creation of a ‘pot’ (£5 for each person in that session). This is equally divided at the end of the experiment between those pairs of players who choose the same label for a randomly chosen game. This incentive mechanism leads to a separation of coordination from other-regarding motivations. In practice, the unrestricted version could create disagreements between experimenter and subjects about outcomes. For example, ‘soccer’ and ‘football’ typically refer to the same sport for British citizens but different ones for American citizens. As a result, we did not actually use the unrestricted version for payment. Nevertheless, both versions are incentivized using Bardsley’s (2000) conditional information lottery by truthfully telling subjects that, as only one category is for payment and it will be announced at the end of the experiment, it is in their best interest to play as if all categories were for payment.

Table 2 Summary results: Experiment I

The five labels were those that were chosen most frequently in the pilot of the unrestricted version of each game. This selection procedure prevents any discretion by the experimenter, avoids possible confounds and increases the statistical power of tests for difference. For instance, while a random selection of labels would avoid experimenter discretion, it creates a possible confound because only some of the labels in the restricted set might be recognized by subjects. Likewise, if the design included a random selection of labels for the restricted version, there might be no common labels for comparison or the frequency of choices of these common labels would be too small to perform a non-parametric test. Table 1 also gives the five labels that were available in the restricted version of each game. They are listed in the order of frequency from left (most chosen) to right (least chosen).

The experiment was conducted in seven sessions at the University of East Anglia in March 2009. One hundred subjects from the general student population were recruited randomly through an email via the distribution list of the university. Subjects participated anonymously at computer workstations. Instructions were read aloud (Appendix A). Every subject received £2 for participating and could expect to get £5 from the ‘pot’ for an average of 45 minutes’ work.

3 Results: Experiment I

Table 2 presents a summary of the results for each category. Column (1) presents the number of labels chosen by at least one subject in the unrestricted version. The only category where fewer than five labels were mentioned is Sports; the median number across all categories is 6.5 (the average is 6.75). Columns (2) and (3) show the modal label of the distribution, and columns (4) and (5) show the degree of coordination (c) for the unrestricted and restricted versions, respectively, defined as:

$$\begin{aligned} c=\sum _g m_g \left( {m_g -1} \right) /\left[ {N\left( {N-1} \right) } \right] \end{aligned}$$
(1)

where \(m_g\) is the number of subjects choosing strategy labelled g for a given category and version, and N is the total number of subjects facing that decision problem. It measures, given the actual responses of subjects, the probability of two randomly picked individuals choosing the same label.
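
Equation (1) is straightforward to compute from a list of responses. The following Python sketch is illustrative only: the function name `c_index` and the example responses are hypothetical and are not taken from the experiment.

```python
from collections import Counter


def c_index(choices):
    """Probability that two distinct, randomly picked subjects chose the same label."""
    n = len(choices)
    if n < 2:
        raise ValueError("need at least two subjects")
    counts = Counter(choices)  # m_g for every label g that was chosen
    matches = sum(m * (m - 1) for m in counts.values())
    return matches / (n * (n - 1))


# Hypothetical responses for one category and version.
responses = ["red", "red", "red", "blue", "green"]
print(c_index(responses))  # 3*2 / (5*4) = 0.3
```

Because the two individuals are drawn without replacement, a label chosen by a single subject contributes nothing to the index.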

To test H1, we use a bootstrap method with 20,000 simulated samples to compare, for every category, whether the c-index in the restricted version can be obtained from the distribution of choices in the unrestricted version. Column (6) in Table 2 shows the levels of significance in a two-tailed test and the direction in which the null hypothesis is rejected. There is only one category where coordination is higher in restricted than unrestricted at 95% or higher confidence levels (Metal). The remaining 15 categories tell against the hypothesis: the coordination level is significantly higher in unrestricted than restricted sets for 6 and there is no difference at 95% confidence levels in 9 categories. So, we reject H1.
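
A bootstrap comparison of this kind can be sketched in Python as follows. The text does not spell out the resampling scheme, so the sketch rests on two assumptions: each simulated sample redraws N responses with replacement from the unrestricted responses, and the two-tailed p value is twice the smaller tail share. The function names and the example data are hypothetical.

```python
import random
from collections import Counter


def c_index(choices):
    """Degree of coordination from Eq. (1)."""
    n = len(choices)
    counts = Counter(choices)
    return sum(m * (m - 1) for m in counts.values()) / (n * (n - 1))


def bootstrap_p_value(unrestricted, c_restricted, n_draws=20000, seed=0):
    """Two-tailed p value for the restricted c-index under the unrestricted distribution."""
    rng = random.Random(seed)
    n = len(unrestricted)
    # Assumption: resample n responses with replacement for every simulated sample.
    simulated = [c_index(rng.choices(unrestricted, k=n)) for _ in range(n_draws)]
    share_below = sum(c <= c_restricted for c in simulated) / n_draws
    share_above = sum(c >= c_restricted for c in simulated) / n_draws
    return min(1.0, 2 * min(share_below, share_above))


# Hypothetical responses for one category, not data from the experiment.
unrestricted = ["rose"] * 7 + ["daisy"] * 3 + ["tulip"] * 2
print(bootstrap_p_value(unrestricted, c_restricted=0.30))
```

Fixing the seed makes the simulated distribution reproducible; other conventions for a two-tailed bootstrap p value would be equally defensible.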

4 Experiment II

4.1 The rules

We consider seven possible rules and locate each briefly in the experimental and broader psychology literatures. In different ways, they cash out why something might ‘stand out’. Their precise operationalization for the naïve elicitation of Experiment II is given in parentheses. Of course, the list is selective but we shall provide a cross check on their plausibility by considering whether the recommendation of each rule would be good advice for someone playing against the population from Experiment I.

Favourite (“Choose your favourite”) has a long history in the experimental literature on coordination games (see Mehta et al. 1994). Bardsley et al. (2010), for instance, provide evidence of a correlation between choices in coordination games and a post-experiment questionnaire on favourites.

Odd-one-out (“Choose the least similar”) is central to Schelling’s (1960) original work and it has been elaborated by Bacharach (1993, 2006) as a rarity preference. Bacharach and Bernasconi (1997) and Bardsley et al. (2010) find some support for such choices: e.g. Mannheim was chosen in the set {Mannheim, Berlin, Brussels, Lisbon, Madrid} and glass in the set {glass, diamond, emerald, sapphire}.

Prominence (“Choose the top of the most natural ranking”) can be understood as what is pre-eminent in a natural ranking for a category. What makes a mountain a good mountain? Height. What makes a footballer a good footballer? Skill at playing football. So, Everest is the top of the most natural ranking of mountains, Maradona or Pelé is the top of the most natural ranking of footballers, and so on. This might explain why Schelling found that Grand Central Station was the most salient place for people to meet in New York: it was the place where most people pass through on a single day.

Typicality I and II (“Choose the best known” and “Choose the most frequently mentioned”) are two versions of an availability heuristic in cognitive psychology and may be used because something that appears frequently in the world comes more readily to mind (Tversky and Kahneman 1973). For example, Tesco has the largest number of supermarket stores in the UK and so may come most easily to mind. Likewise, there are more Ford cars than the other makes and so on. This rule is also suggested by Sugden’s (1995) normative theory of focal points where subjects should choose more mentioned items because this increases the probability of coordination under team reasoning assumptions.

Prototypicality (“Choose an example”) is another availability heuristic (see Rosch 1977). The idea is that the world is highly organized (e.g. creatures with feathers are more likely to also have wings than creatures with fur) and the process for perceiving and storing that information is accordingly highly structured. Prototypes are the most characteristic members of the set and so they require least cognitive effort to retrieve. This may explain why, when children are asked to draw a flower, they draw something similar to a daisy and why, in Mehta et al. (1994), John is the most chosen boy’s name and Ford is the most chosen car manufacturer.

Similarity (“Choose the most similar”). Objects typically share an overlapping set of features (see Tversky 1977) and the one that contains most of these elements may stand out for this reason. For example, flowers usually have scent, they are colourful, often come in different varieties of the genus which are given identifying names by horticulturalists who enter them in competitions, are displayed in vases, are often given as gifts on special days, and so on. Thus, roses might be chosen with this rule because they exemplify all these characteristics, whereas the other flowers in one respect or another fall short.

4.2 Naïve (‘pick’) and strategic (‘guess’) choices

In each session, the 16 categories are divided into two blocks of 8. The subjects are divided into two groups and each group is presented in the first stage with one of the blocks. They are asked in the naïve version to pick using each of the seven rules: for example, with the favourite rule on the colour category, they are asked ‘Choose your favourite colour’. Once this stage has been completed, they receive the instructions for the second, strategic stage. In this stage, the blocks are swapped and subjects are asked to guess the label that a randomly selected subject (from the other group) chose in the first stage: e.g. ‘Choose the other person’s favourite colour’. The choices in the second stage are incentivized in a similar way to those in Experiment I. At the end of the experiment, a question is randomly selected and the subjects correctly choosing their randomly matched partner’s choice share a ‘pot’. The only difference is that each group has its own pot.

The sessions differ only with respect to whether the labels in each category are an unrestricted or restricted set. When the labels are restricted, they are the same five labels for each category as in Experiment I.

The experiment was conducted at the University of East Anglia in December 2009. A total of 198 subjects were recruited through an email via the distribution list of the University of East Anglia. Subjects were recruited from the general student population and participated anonymously at a workstation. Each subject was assigned one of four booklet versions with a randomly determined order of psychological rules, categories and, in the restricted version, a random order of labels (see Supplementary Material). Instructions were read aloud (Appendix A). Every subject received £2 for participating and £5 in expectation from the pot; the experiment lasted 45 minutes on average.

5 Results: Experiment II

With four booklets, order or learning effects would produce different distributions of answers across the booklets. However, the distribution was only significantly different for 31 of the 416 questions (i.e. 7.5%). Thus, we discount the influence of order and learning effects.

To check on the plausibility of our chosen rules, we consider whether each rule would constitute good advice to someone who was playing against the population from Experiment I. Our criterion of good advice is that someone using a rule would do significantly better in terms of coordination than would be achieved if they followed conventional game theory by selecting their strategy randomly. Tables 3 and 4 give the levels of coordination that would be achieved with each rule on each category. In addition, the coordination rate for each rule is bootstrapped following the protocol described earlier to check whether it is significantly different from the level that would be achieved when someone uses a randomizing strategy. With five options, the success rate for randomizing is 0.2. It is not well defined for unrestricted sets, but we also use 0.2 there in the test for significance to keep comparability. The test, therefore, errs on the side of toughness in unrestricted sets. Almost all rules are plausible in the sense that a person would almost always do better by using the rule. The one exception is the ‘Odd-one-out’ rule: on unrestricted sets, it is never significantly different from random, and on restricted sets it is significantly better than random in only 6 of the 16 categories.

Table 3 Coordination rate between Exp II rule user and players from Exp I for each rule (unrestricted)
Table 4 Coordination rate between Exp II rule user and players from Exp I for each rule (restricted)

One can also compare the level of coordination that is achieved when using a rule in play with the population from Experiment II with the coordination level that was actually achieved in Experiment I. Figures 1 and 2 give this in the form of scatter diagrams for the unrestricted and restricted versions, respectively, where each observation is a category. When the observations fall around a straight line, this suggests that the use of the rule tracks well the differences in coordination across categories in Experiment I, and when they fall around a 45\(^\circ \) ray from the origin, the use of the rule is also close to the actual level of coordination achieved in Experiment I. Again, it is apparent that the ‘Odd-one-out’ rule does least well in these respects.

Fig. 1 Own rule concordance rate for each rule in Exp II and actual coordination in Exp I (unrestricted)
Fig. 2 Own rule concordance rate for each rule in Exp II and actual coordination in Exp I (restricted)

We test H2 by comparing the level of coordination that would be achieved by each rule on the restricted and unrestricted versions of labels, had it been used by all players in either the ‘naïve’ or ‘strategic’ form. Table 5 gives the average own concordance rate that would have been achieved by each rule. For each rule, the average rate is higher in the restricted version than the unrestricted version. This is true for both naïve and strategic elicitations. Looking at each rule in each category, we find that the number of instances in which the restricted version produces a higher c-index is 159 out of 192 (87 in naïve choices and 72 in strategic choices). This would not be generated randomly (binomial test p value <0.01). We also note that the own concordance rate for each rule is always higher in the ‘strategic’ than in the ‘naïve’ elicitation.
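
The binomial test on the 159 out of 192 instances can be checked with a few lines of Python. The sketch assumes a null hypothesis under which the restricted version is as likely as the unrestricted one to give the higher c-index in any instance (probability 0.5), and a two-tailed p value obtained by doubling the upper tail; neither detail is stated in the text, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumed null: each of the 192 instances favours either version with probability 0.5.
p_value = min(1.0, 2 * binomial_upper_tail(159, 192))
print(p_value < 0.01)  # True
```

With an expected count of 96 under this null, 159 lies far in the upper tail, which is consistent with the reported p value below 0.01.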

Table 5 Average probability of coordination (c-index) across categories for a given rule
Table 6 Average concordance between rules (c-index) across categories (strategic elicitation)

We test H3 by generating a prediction for the coordination rate that would be observed between subjects using each possible pair of rules, based on what Experiment II suggests subjects using those rules will choose. Table 6 gives these average cross-rule concordance rates for the 16 categories when the options are unrestricted and restricted. It is apparent that the cross concordance rates for ‘strategic’ elicitation are almost always higher on the restricted version. There is only one pair, ‘Typicality I (Best known)’ and ‘Prominence’, where the concordance rate is actually higher on the unrestricted options set, and then only very marginally. This counts against H3.

Table 7 Estimated frequency of each rule use on ‘unrestricted’ in the aggregate and by category
Table 8 Estimated frequency of each rule use on ‘restricted’ in the aggregate and by category

We test H4 by first estimating what weights, when attached to each rule for each category, can best explain the choice of labels in Experiment I under the assumption that each rule yields the distribution of choices revealed by Experiment II (see Crawford and Iriberri 2007, for a similar approach). Using maximum likelihood, we estimate these for each category individually as well as in the aggregate by pooling all the decision problems. The comparison of the aggregate with the average estimate for each category gives some indication of how stable the estimates of weights are across categories. Tables 7 and 8 give these weights for the unrestricted and restricted versions of the coordination problems. They suggest that while both versions of the ‘Typicality’ rule (‘Best known’ and ‘Frequent’) are used with similar frequency under both conditions, the ‘Odd-one-out’, ‘Favourite’ and ‘Prominence’ rules are used much more frequently when the options are restricted. The corresponding rules that are much more popular in the unrestricted condition are ‘Prototypicality’ (of course, by construction) and ‘Similarity’. This is evidence in favour of subjects using the rules with different frequencies when options are restricted/unrestricted.
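
One way to obtain maximum likelihood weights of this kind is to treat the Experiment I choices as draws from a mixture of the rule-specific distributions and to fit the mixture weights by expectation maximization (EM). The Python sketch below is an illustration under that assumption and is not the estimation routine used here. The function name, the small probability floor (which keeps the likelihood defined for labels that no rule predicts) and the example numbers are all hypothetical.

```python
def estimate_rule_weights(observed_counts, rule_dists, n_iter=500, floor=1e-9):
    """EM estimate of mixture weights with the rule distributions held fixed.

    observed_counts: label -> number of subjects choosing it (Experiment I style)
    rule_dists: rule -> {label -> probability} (Experiment II style)
    """
    rules = list(rule_dists)
    weights = {r: 1.0 / len(rules) for r in rules}
    total = sum(observed_counts.values())
    for _ in range(n_iter):
        expected = {r: 0.0 for r in rules}
        for label, count in observed_counts.items():
            # E-step: posterior probability that each rule generated this label.
            joint = {
                r: weights[r] * max(rule_dists[r].get(label, 0.0), floor)
                for r in rules
            }
            norm = sum(joint.values())
            for r in rules:
                expected[r] += count * joint[r] / norm
        # M-step: new weights are the expected shares of choices per rule.
        weights = {r: expected[r] / total for r in rules}
    return weights


# Hypothetical inputs for one category, not estimates from the experiment.
observed = {"FORD": 30, "BMW": 12, "FERRARI": 8}
rules = {
    "Favourite": {"FERRARI": 0.5, "BMW": 0.4, "FORD": 0.1},
    "Best known": {"FORD": 0.8, "BMW": 0.15, "FERRARI": 0.05},
}
print(estimate_rule_weights(observed, rules))
```

The estimated weights sum to one by construction. Pooling all decision problems would amount to summing the expected shares over categories before the M-step.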

We examine whether these differences in rule use can account for the higher coordination when options are unrestricted by using these weights to generate predicted coordination rates under these assumptions. We do this both with and without errors in the application of a rule. The results are given in Table 9. The predicted coordination rate on average is higher when the options are unrestricted, albeit only marginally so.
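
A predicted coordination rate can be assembled from rule weights and concordance rates as sketched below. The sketch assumes that two matched subjects draw their rules independently according to the estimated weights and ignores errors in applying a rule, so it corresponds at most to the ‘without errors’ case. The function name and all numbers are hypothetical.

```python
def predicted_coordination(weights, concordance):
    """Sum over rule pairs (i, j) of w_i * w_j * concordance between i and j."""
    rules = list(weights)
    return sum(
        weights[i] * weights[j] * concordance[i][j] for i in rules for j in rules
    )


# Hypothetical weights and concordance rates, not values from Tables 5 to 8.
weights = {"A": 0.5, "B": 0.5}
concordance = {
    "A": {"A": 0.6, "B": 0.2},  # own rule rate on the diagonal
    "B": {"A": 0.2, "B": 0.4},
}
print(predicted_coordination(weights, concordance))  # approximately 0.35
```

The off-diagonal terms show why a rule that does well against itself but poorly against other rules can lower the overall prediction when rule use is diverse.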

Table 9 Estimated average coordination rate across categories using the estimated frequencies of rule use

6 Discussion and conclusion

Our first experiment produced a surprising result: when the options in a pure coordination game are restricted to five, subjects coordinate less well than when their choice set is unrestricted. This is important because it suggests that the artifact of reducing the choice options in the laboratory will not bias upwards the estimate from these experiments of coordination when there are no such restrictions (e.g. outside the laboratory). Nevertheless, it is surprising because the scope for dis-coordination seems likely to grow as the number of options available for choice increases.

Our second experiment addresses this puzzle from a particular perspective. It develops a new method for investigating the principles that underlie salience. Under the assumption that people use one of our seven rules and that they use the same rule, we cannot explain our ‘surprising’ result that subjects appear to coordinate better when the options are unrestricted than when they are restricted by appealing to the greater precision of a rule on the unrestricted set of options. This is because own rule concordance is actually lower for all rules in the unrestricted version. Likewise, under the assumption that people use our seven rules with the same frequency in both the restricted and unrestricted versions but that this frequency differs across individuals, we cannot explain the surprising result because the cross-rule concordance is almost always higher on the restricted version. In short, our first two possible explanations of the surprising result fail. Instead, however, we do have evidence, again if we assume that subjects use our seven rules, that they use these rules with different frequencies under the unrestricted and restricted conditions. Given the own and cross-rule concordances that we generate, this difference in use could explain, albeit only in part, the otherwise ‘surprising’ result from our first experiment.

In particular, while some rules, like the two versions of ‘Typicality’ (‘Best known’ and ‘Frequent’), are used in both the restricted and unrestricted versions, the ‘Odd-one-out’ and ‘Favourite’ rules are only used when options are restricted. This is an interesting difference. It also makes sense. In the restricted version of our decision problems, everyone has the same options to choose from, while in the unrestricted case, the options that come to subjects’ minds need not be the same. This does not matter for a rule like ‘Typicality I and II’ (‘Best known’ or ‘Frequent’) where there is, in principle, one object that is most frequently mentioned or best known among all possibilities or among the restricted set of possibilities, but it does for a rule like the ‘Odd-one-out’.

The ‘Odd-one-out’ rule depends critically on the full set of options that come to a subject’s mind because one cannot identify the ‘odd one out’ independently of this set. Hence if the options differ across individuals, as they can on the unrestricted domain, so will their likely choice of an ‘odd one out’. For this reason, the rule is not as well suited to the unrestricted as to the restricted version of the coordination problems (because, through construction, everyone has the same options in mind in the restricted case). So the rule could function in the restricted version of the experiment. Indeed this is what Table 6 reveals. It is a good rule for coordination in restricted sets when used against itself but not in unrestricted ones. The same argument could apply to the ‘Favourite’ rule in the unrestricted case because some people may think, for instance, of their favourite footballer from among those currently playing while others may think of footballers from any time (that they have known).

If this explains the use of these rules on restricted but not unrestricted sets, it also helps explain why our subjects did better on the unrestricted coordination problems. The ‘Odd-one-out’ rule, in particular, does well against itself but it does much worse than other rules when playing against other rules (see Table 6). Thus, in so far as there is a diversity of rule use (as our estimates suggest), coordination is impaired by the use of the ‘Odd-one-out’ rule in restricted coordination games. This loss of coordination does not occur in the unrestricted versions of the problem because, again, if our estimates of rule use are correct, the ‘Odd-one-out rule’ is not used.

That we might be able to explain the differences in this way in the use of rules across the restricted and unrestricted versions of these coordination problems is reassuring in the sense that if people follow rules, then the selection of the rule should be intelligible as part of the process of rule following. On one account, rules might be generated through an evolutionary process where rules that coordinate well survive. On another, they might be selected by team reasoners reflecting on which rule would coordinate best. Both make success in coordination crucial for the adoption of a rule and this is how we have attempted to make sense of the difference in the use of these rules across the restricted and unrestricted coordination problems.