Investigating the failure to best respond in experimental games

We examine strategic sophistication using eight two-person 3 × 3 one-shot games. To facilitate strategic thinking, we design a ‘structured’ environment where subjects first assign subjective values to the payoff pairs and state their beliefs about their counterparts’ probable strategies, before selecting their own strategies in light of those deliberations. Our results show that a majority of strategy choices are inconsistent with the equilibrium prediction, and that only just over half of strategy choices constitute best responses to subjects’ stated beliefs. Allowing for other-regarding considerations increases best responding significantly, but the increase is rather small. We further compare patterns of strategies with those made in an ‘unstructured’ environment in which subjects are not specifically directed to think strategically. Our data suggest that structuring the pre-decision deliberation process does not affect strategic sophistication.


Introduction
The experimental game literature has produced a number of studies showing that a substantial proportion of individuals' strategy choices neither correspond to equilibrium predictions nor are best responses, as judged according to the belief-weighted values of the available options, especially in environments where games are complex and learning opportunities are limited (e.g., see Camerer, 2003; Costa-Gomes & Weizsäcker, 2008; Danz et al., 2012; Hoffmann, 2014; Polonio & Coricelli, 2019; Sutter et al., 2013).
Various possibilities have been suggested to explain deviations from equilibrium behaviour, including subjects' naivety, low engagement, or limited understanding of the strategic environment, especially when subjects are inexperienced and a game is presented for the first time. Some have suggested that there may be a lack of game-form recognition, i.e., a failure to understand correctly the relationship between possible strategies, outcomes and payoffs (e.g., Bosch-Rosa & Meissner, 2020; Cason & Plott, 2014; Chou et al., 2009; Cox & James, 2012; Fehr & Huck, 2016; Rydval et al., 2009; Zonca et al., 2020). Studies using choice process data (e.g., Brocas et al., 2014; Devetag et al., 2016; Hristova & Grinberg, 2005; Polonio et al., 2015; Stewart et al., 2016) suggest that when choosing a strategy, subjects often pay disproportionately more attention to their own payoffs or to specific salient matrix cells, and a non-negligible fraction of subjects never look at the opponent's payoffs, thereby completely disregarding the strategic nature of the game they are playing. As a result, some part of the observed inconsistency might be driven by a failure to incorporate all relevant information, or by the use of heuristic rules that correspond to a simplified (and often incorrect) version of the actual game in question.
Another possible source of the seeming failure to reason strategically might lie in the existence of other-regarding motives (see e.g., Cooper & Kagel, 2016; Fehr & Schmidt, 2006; Sobel, 2005 for overviews of the literature). If individuals' strategy choices are not solely driven by self-interest but involve social preferences, it should not be surprising that subjects depart from optimal behaviour calculated on the basis of own payoffs only. To date, the role of other-regarding motives in normal-form games has been investigated mainly indirectly by monitoring the patterns of information acquisition using eye- or mouse-tracking and connecting the revealed search patterns to different types of other-regarding preferences (e.g., Costa-Gomes et al., 2001; Devetag et al., 2016; Polonio & Coricelli, 2019; Polonio et al., 2015). While evidence from this literature suggests that other-regarding motives may interact with the observed levels of strategic sophistication, the correspondence between strategy choices and process data in establishing causal links is less than perfect, since all the inferences are drawn via subjects' information search patterns. Ideally, what is needed is an explicit measure of individuals' other-regarding propensities as related to the payoff structures of the games under consideration.
In this paper, we focus primarily on two questions. First, can we increase best response rates by encouraging subjects to think more systematically about the subjective values they assign to the payoff pairs and about the weights they attach to other players' possible actions? Second, will allowance for other-regarding considerations increase best response rates?
We try to answer these questions by conducting a laboratory experiment in which subjects are presented with a set of eight one-shot two-person 3 × 3 normal-form games. 1 To encourage strategic sophistication, we design a Structured treatment, where we prompt players to think about their evaluation of payoff pairs in conjunction with their beliefs about other players' possible strategy choices, after which they choose their own strategies in the light of those deliberations and with that information still in front of them.
Specifically, using a ranking task, we first ask subjects to think about the subjective value they attach to the various cells in each 3 × 3 game. The joint evaluation of payoffs requires subjects to attend more closely to their counterparts' incentives and motives in general. This increased attention towards others might translate into a higher degree of strategic thinking by helping less sophisticated subjects to become better at predicting their counterparts' strategies. 2 At the same time, based on subjects' stated rankings, we may infer something about their other-regarding considerations, thus relaxing the assumption of pure self-interest (as used by much of the literature) in cases where it does not seem appropriate. Some previous studies (e.g., Bayer & Renou, 2016) have tried to infer subjects' social preferences from their behaviour in a scenario such as a modified dictator game and then 'import' this information into the games that are their main focus. However, other-regarding motives are highly context-dependent and findings from a different decision environment might not carry over (see Galizzi and Navarro-Martínez, 2019, for evidence and a meta-analysis). By contrast, our ranking task explores the expression of at least some other-regarding preferences in a format intrinsic to the actual games under consideration. Thus, the purpose of the joint evaluation of payoffs is two-fold: to increase strategic sophistication, while providing some measure of other-regarding preferences. 3 Second, using a belief-elicitation task, we ask subjects to quantify the chances of each of the opponents' strategies being played. Importantly, we do not require those estimates to conform to any particular assumptions about the rationality of reasoning of the other players; we simply ask subjects for their best judgements. 
Previous literature investigating the effect of belief elicitation on equilibrium play has obtained mixed results: some studies find that belief elicitation does not affect game play (e.g., Costa-Gomes & Weizsäcker, 2008; Ivanov, 2011; Nyarko & Schotter, 2002; Polonio & Coricelli, 2019), whereas others show that whether belief elicitation influences game play depends on the properties of the game (e.g., Hoffmann, 2014; see also Schlag et al., 2015, and Trevino, 2014, for reviews). In contrast to these studies, here we investigate the effect of beliefs in conjunction with stated rankings over payoffs, rather than the effect of beliefs alone. As such, the addition of the ranking task might influence consistency rates as well as game play.
1 All data and code supporting the findings of the experiment are available from the Open Science Framework, accessible at https://osf.io/zhufe/.
2 The joint evaluation feature of the ranking task is of particular importance given previous evidence suggesting that a non-negligible fraction of subjects neglect their counterpart's payoffs in strategic environments or focus only on a subset of all possible game outcomes (e.g., Devetag et al., 2016; Polonio et al., 2015). See also the discussion in Weizsäcker (2003), where adding context to the usually abstract matrix representations of normal-form games is suggested as a way to increase attention to others' incentives.
Our main results can be summarized as follows. In line with previous literature (e.g., Costa-Gomes & Weizsäcker, 2008; Danz et al., 2012; Hoffmann, 2014; Polonio & Coricelli, 2019; Sutter et al., 2013) we find that a sizeable proportion of players do not choose equilibrium strategies, and fail to best respond to their own stated beliefs. Importantly, while we find that the rate of best response increases significantly when we allow for other-regarding preferences as revealed by the ranking task, the overall magnitude of the difference is relatively small (54% vs. 57%), although for the subset of subjects we classify as inequity averse, best response rates increase from 54% to 61% when using rankings instead of own payoffs. This finding reveals that the observed increase at the aggregate level is almost entirely driven by those subjects who are not motivated only by their own earnings. Still, even for this subset, a significant share of non-optimal play remains, which is in line with evidence from some previous studies showing that, in contexts different from ours, other-regarding preferences alone cannot fully explain deviations from equilibrium behaviour (e.g., Andreoni, 1995; Andreoni & Blanchard, 2006; Fischbacher et al., 2009; Krawczyk & Le Lec, 2015).
Our results also suggest that the likelihood of choosing non-optimally is decreasing in the costs of doing so. That is, while non-optimal strategy choices are relatively common when the expected payoffs from the different strategies are very similar, such choices become less and less likely as the difference in expected payoffs between strategies grows, whether measured in terms of foregone monetary payoffs or in terms of foregone ranking values. This appears to be compatible with the notion of Quantal Response Equilibrium (McKelvey & Palfrey, 1995) or some other 'error' model (see, for instance, Harrison, 1989, for a discussion of the 'flat maximum' critique).
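As a rough illustration of why such a pattern fits an 'error' model, the following minimal sketch computes logit choice probabilities over a set of expected payoffs. It is not the estimation used in this paper, and it is a simple logit response to fixed expected payoffs rather than a full Quantal Response Equilibrium fixed point. The function names, the sensitivity parameter `lam` and the example payoffs are all hypothetical.

```python
import math
from typing import List


def logit_choice_probs(expected_payoffs: List[float], lam: float = 1.0) -> List[float]:
    """Logit choice probabilities; `lam` is a hypothetical payoff-sensitivity parameter."""
    top = max(expected_payoffs)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(lam * (v - top)) for v in expected_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def prob_non_optimal(expected_payoffs: List[float], lam: float = 1.0) -> float:
    """Probability of picking any strategy other than the payoff-maximising one."""
    probs = logit_choice_probs(expected_payoffs, lam)
    best = max(range(len(probs)), key=lambda i: expected_payoffs[i])
    return 1.0 - probs[best]


# Hypothetical expected payoffs: nearly flat versus clearly separated.
flat = prob_non_optimal([5.0, 4.8, 4.6])
steep = prob_non_optimal([5.0, 2.0, 1.0])
print(flat > steep)  # prints True
```

Under these assumptions, non-optimal choices are frequent when the expected payoffs are close together and become rare as the cost of a mistake grows.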
To examine more systematically how far and in what ways our intervention alters strategic sophistication, if at all, we compare the patterns of strategy choice in the Structured sample with the responses of a different sample drawn from the same population who we shall call the Unstructured sample and who were presented with exactly the same eight one-shot games but were asked to make their decisions without any prior structured tasks. Since the great majority of game experiments to date have been conducted in this unstructured manner, it is of interest to see how the patterns of choice might be affected. We find overall no impact of structuring the decision process: our attempts to induce 'harder thinking' do not affect chosen strategies.
The rest of the paper is organized as follows. In Sect. 2, we describe the design and implementation of the experiment. We present our main results in Sect. 3. Section 4 discusses and concludes.

The experiment
We chose a set of eight one-shot 3 × 3 normal-form games that were adjusted versions of the games used in Colman et al. (2014). The games are displayed in Fig. 1. 4 In each cell, the first number indicates the payoff to the BLUE (row) player and the second number indicates the payoff to the RED (column) player. All payoffs are in UK pounds. The games were chosen because they have no obvious payoff-dominant solutions and because they were explicitly designed to differentiate between competing theoretical explanations (see Table 1). Furthermore, previous evidence from the patterns of chosen strategies in these games suggests relatively low-effort thinking, which gives us enough room to detect whether structuring responses leads to higher levels of strategic reasoning (see Colman et al., 2014; Pulford et al., 2018). Table 1 summarizes the strategic structure for each game and player role. Besides the Nash equilibrium prediction, we additionally consider Level-k models, which often out-predict equilibrium play (e.g., Camerer et al., 2004; Costa-Gomes & Crawford, 2006; Ho et al., 1998; Nagel, 1995; Stahl & Wilson, 1994), to allow for bounded depth of reasoning. The Level-1 model assumes that a player best responds to the belief that assigns uniform probabilities to their counterpart's actions. The Level-2 model predicts that a player best responds to the belief that their counterpart is playing according to the Level-1 model.
At the beginning of the experiment, subjects were randomly allocated either the role of the BLUE (row) or the RED (column) player, and they remained in that role throughout the whole experiment. Subjects were then presented with each game in turn, proceeding at their own speed. The order in which the eight games were displayed was randomized and subjects received no feedback about others' responses until the end of the experiment.
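The Level-1 and Level-2 predictions defined earlier in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to generate Table 1: the payoff matrices are hypothetical and are not taken from Fig. 1, the function names are our own, and ties are broken in favour of the first strategy listed.

```python
from typing import List

Matrix = List[List[float]]


def best_response(payoffs: Matrix, belief: List[float]) -> int:
    """Index of the own action with the highest expected payoff against `belief`.

    payoffs[i][j] is the payoff from own action i when the counterpart plays j.
    Ties go to the lowest index (an assumption made for this sketch).
    """
    expected = [sum(x * p for x, p in zip(row, belief)) for row in payoffs]
    return max(range(len(expected)), key=lambda i: expected[i])


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def level1(own: Matrix) -> int:
    """Best response to a uniform belief over the counterpart's actions."""
    n = len(own[0])
    return best_response(own, [1.0 / n] * n)


def level2(own: Matrix, other: Matrix) -> int:
    """Best response to a counterpart who plays their Level-1 action.

    other[i][j] is the counterpart's payoff when we play i and they play j.
    """
    counterpart_l1 = level1(transpose(other))
    belief = [1.0 if j == counterpart_l1 else 0.0 for j in range(len(own[0]))]
    return best_response(own, belief)


# Hypothetical 3 x 3 game (row player's and column player's payoffs).
row_payoffs = [[8, 2, 0], [6, 6, 1], [3, 5, 7]]
col_payoffs = [[6, 4, 0], [1, 5, 3], [2, 6, 4]]
print(level1(row_payoffs), level2(row_payoffs, col_payoffs))  # prints 2 1
```

In this hypothetical game the row player's Level-1 and Level-2 actions differ, which is the property that allows choice data to discriminate between the two models.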
For each game in the Structured treatment, subjects completed three different tasks: a ranking, a belief-elicitation and a strategy-choice task. 5 The purpose of the first two tasks was to structure subjects' decision-making process and to induce them to think about all relevant aspects of a game before choosing their preferred strategy. In all tasks, we recorded how much time subjects spent before submitting their decisions. To make each response incentive compatible, one out of the eight games was randomly selected for payment, and another random draw directed whether subjects' earnings were determined according to the ranking, the belief-elicitation, or the strategy-choice task. We now describe each of those tasks in turn, together with the mechanism designed to motivate subjects to give thoughtful and accurate responses.
Table 1 Structure of the games and models' predicted actions. We indicate whether a game has a unique Nash equilibrium or not (Unique Nash), whether another cell constitutes a strict Pareto improvement (Nash Pareto dominated), whether it is symmetric or not (Symmetric), as well as predictions according to the Nash, Level-1 and Level-2 models. For Game 2 that has multiple Nash equilibria we report predictions on the Pareto optimal
In the ranking task, subjects were asked to rank all possible unique payoff combinations in a particular game from their most preferred to their least preferred one. Figure 2 shows a screenshot of the interface of this task as shown to the subjects. At the top of the screen, subjects were shown the game being played. In the bottom left corner of the screen, they were shown all the possible payoff pairs, ordered as they appear in the game from top left to bottom right. 6 They stated their ranking of these pairs by typing in a number between 1 and 9, where 1 corresponded to the pair they liked best and 9 indicated their least preferred pair.
In the event that the ranking task was selected for payment, one player in a pair of players (either RED or BLUE) was selected as the decision maker. We then randomly picked two of the possible payoff combinations from the selected game and paid both players according to the combination that the randomly selected decision maker had ranked as more preferable. For example, if the randomly selected player was the BLUE player, and the two randomly selected payoff pairs were (8, 6) and (10, 4), we looked at which payoff pair the BLUE player ranked higher and paid both players accordingly (either £8 for BLUE and £6 for RED, or £10 for BLUE and £4 for RED player).
Once subjects had submitted their rankings, they proceeded automatically to the belief-elicitation task. In this task, players were asked to think about the ten players participating in the same session who had been assigned to the role of the other colour and to state their best estimates of how many of these ten players would choose each of the three strategies available to them. 7 A screenshot of the interface used in the belief-elicitation task is shown in Fig. 3. In the event that the belief-elicitation task was selected for payment, we randomly picked one of the three strategies available to the other colour of player and then compared the subject's stated belief about the frequency of that strategy choice with the actual frequency among the ten players assigned to that colour in the session. If both numbers matched, subjects were paid £5; otherwise, they received no payoff. 8
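The payment rule for the belief-elicitation task can be sketched as follows. This is a minimal illustration rather than the z-Tree implementation: the function names and the example counts are hypothetical, and only the £5 prize and the exact-match rule come from the description above.

```python
import random
from typing import List

PRIZE_POUNDS = 5.0  # paid only when the stated and actual counts match exactly


def belief_task_payment(stated: List[int], actual: List[int], drawn: int) -> float:
    """Payment when strategy index `drawn` has been selected for comparison."""
    return PRIZE_POUNDS if stated[drawn] == actual[drawn] else 0.0


def draw_and_pay(stated: List[int], actual: List[int], rng: random.Random) -> float:
    """Pick one of the counterpart's strategies at random, then apply the rule."""
    drawn = rng.randrange(len(stated))
    return belief_task_payment(stated, actual, drawn)


# Hypothetical example: the subject expects 5, 3 and 2 of the ten counterparts
# to choose each strategy, while the actual counts are 4, 3 and 3.
stated_counts = [5, 3, 2]
actual_counts = [4, 3, 3]
print(belief_task_payment(stated_counts, actual_counts, drawn=1))  # prints 5.0
print(belief_task_payment(stated_counts, actual_counts, drawn=0))  # prints 0.0
```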

Fig. 4 Screenshot of the Structured treatment's strategy-choice task
8 We chose this incentive mechanism instead of the quadratic scoring rule because of its simplicity and to avoid distortion due to the possibility of participants reporting beliefs away from extreme probabilities (see the discussion in Schlag et al., 2015).
Finally, on the last screen of each game subjects had to indicate their preferred strategy in the strategy-choice task (see Fig. 4 for a screenshot of the interface) on the usual understanding that if this task were selected as the basis for payment, they would be paired at random with another participant and each member of the pair would be paid according to the intersection of their chosen strategies. For instance, if the BLUE player chose strategy A, and the RED player strategy D, the BLUE player would receive £8 and the RED player £6.
Importantly, in order to reinforce the effect of previous deliberation on the selected strategy and to control for differences in working memory capacity (see e.g., Devetag et al., 2016), at the time they were choosing their strategy, subjects could see their responses in the ranking and in the belief-elicitation tasks (as shown in Fig. 4), making it as easy as we could for them to think strategically in light of their previous deliberations, if that is what they wished to do.
In order to measure whether structuring subjects' decision-making process and inducing them to think about all relevant aspects of a game before making their decisions had any substantial systematic effect on strategy choice at all, we ran a separate Unstructured control treatment. Using the exact same eight one-shot games, subjects were simply asked to state, without any prior deliberation tasks, which of the three available strategies they wanted to play. In this unstructured environment, the sequence of the eight games was repeated twelve times. In the present paper we discuss only the data from the first sequence. Since subjects had no information about the number of sequences, the repetition could not have affected their strategy choices when they saw each game for the first time. To avoid any income effects on subjects' behaviour in the repetitions of the Unstructured treatment, we randomly selected for payment a game from any sequence, and not necessarily from the first. We then paid subjects according to the intersection of their chosen strategies. 9
The experiment was run at the CeDEx laboratory at the University of Nottingham using students from a wide range of disciplines recruited through ORSEE (Greiner, 2015). The experiment was computerized using z-Tree (Fischbacher, 2007). We conducted ten sessions (five per treatment) with a total of 194 subjects (100 in the Structured treatment, 94 in the Unstructured treatment, 61% of them female, average age 20.8 years). At the beginning of each session, subjects were informed about their role (BLUE or RED) and about the payment procedure that would follow at the end of the experiment. Before the experiment started, subjects were asked to read some preliminary instructions for an example 3 × 3 game and to demonstrate that they fully understood the required tasks by answering a series of control questions before they could proceed to the actual experiment.


Results
We organise our results as follows. In Sect. 3.1, we start with a descriptive analysis of chosen strategies and stated beliefs in the Structured treatment, and discuss whether behaviour in this environment exhibits a high degree of strategic sophistication, assuming subjects are only interested in maximizing their own payoff. In Sect. 3.2, we turn our focus to the ranking task, analysing the extent to which subjects are motivated by other-regarding concerns. We then investigate whether accounting for such social preferences increases the proportion of optimal strategies. In Sect. 3.3, we discuss possible determinants of non-optimal behaviour. Finally, in Sect. 3.4 we compare patterns of chosen strategies across the Structured and the Unstructured treatments.

Strategy choices, beliefs and best responses
We start our analysis by focusing on the average proportion of chosen strategies that were predicted by the Nash equilibrium together with the Level-1 and Level-2 models. This is shown in the left side of Table 2 (see the Strategy-choice task columns). On average, the proportion of Nash equilibrium strategy choices varies considerably across games with a minimum of 0.09 (Game 7) and a maximum of 0.49 (Game 6). The average rate of equilibrium strategies in games with a unique Nash equilibrium
Table 2 The left side shows the average proportion of strategy choices in accordance with the different models' predictions. The right side shows the average probability with which subjects estimated each of the different models' predictions to be chosen by their counterpart. For Game 2, which has two Nash equilibria, we show the rate with which subjects chose the Pareto dominant Nash equilibrium, and the rate with which they estimate their counterpart will choose the Pareto dominant Nash equilibrium. Tables A1 and A2 in Online Appendix A report these rates separately for the row and column players. The total average for the Nash equilibrium is calculated over the games with unique equilibrium (excluding Game 2)
Next, we contrast the patterns of chosen strategies with the average probability with which subjects estimated that the counterpart is playing each model's predictions. As depicted by the right side of Table 2 (see the Belief-elicitation task columns), for stated beliefs we observe a similar pattern as for chosen strategies: most subjects do not expect their counterparts to play according to equilibrium. The average probability with which subjects estimated that their counterparts are playing the equilibrium strategy (in games with a unique Nash equilibrium) amounts to only 0.28. 
Instead, following the same pattern as the chosen strategies, subjects most often predicted their counterparts would play according to the Level-1 model (probability = 0.52), followed by the Level-2 model (probability = 0.34).
The fact that most subjects expect that their counterpart will play their Level-1 strategy, and at the same time, as a response, they also choose their Level-1 strategy, already provides a first indication at the aggregate level that subjects often did not best respond. If they were best responding to their own stated beliefs, they should have moved one step up in the hierarchy and chosen strategies predicted by the Level-2 model. This is not what we observe in our data, since the proportion of Level-2 chosen strategies (probability = 0.41) is lower than the proportion of Level-1 beliefs (probability = 0.52) (see e.g., Polonio & Coricelli, 2019 for similar evidence).
To provide more in-depth evidence on subjects' best response behaviour, in the following, we investigate the level of consistency between chosen strategies and stated beliefs at the individual level. To test whether subjects best respond to their stated beliefs, we calculate a player's expected payoff for each possible strategy available to them on the basis of those stated beliefs, assuming either linear utility of payoffs, or some degree of risk aversion. More specifically, we use the power law function $x^{\alpha}$ with $\alpha = 1$, $\alpha = 0.8$, and $\alpha = 0.5$. We then simply count how often a subject chooses the strategy that gives them the highest expected utility. The results are given in Table 3. Table 3 reveals that with linear utilities of payoffs ($\alpha = 1$), the average proportion of best responses varies from a minimum of 0.42 (Game 8) to a maximum of 0.68 (Game 7). Averaged over all games, the best response rate is 0.54. Although this is significantly higher than predicted by chance (t test, p < 0.001), it means that almost half of all strategy choices are non-optimal. Furthermore, the average best response rate does not change if we allow for some degree of risk aversion (see last two columns of Table 3). These data are in line with results from previous literature (e.g., Costa-Gomes & Weizsäcker, 2008; Polonio & Coricelli, 2019; Rey-Biel, 2009; Sutter et al., 2013), which report consistency levels ranging from 54% to 67%. We add to this literature by showing that, despite having their personal rankings of payoff pairs and their stated beliefs about their counterparts' strategies on display at the time they are choosing their own strategy, subjects often choose strategies which do not maximise the expected utility of their own payoffs.
Table 3 Frequency of best responses using expected payoffs
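The best-response check described above can be sketched as follows. This is a minimal illustration, not the analysis code behind Table 3: the payoff matrix and belief counts are hypothetical, the function names are our own, the exponent is written as `alpha`, and a choice that ties for the highest expected utility is counted as a best response (an assumption).

```python
from typing import List


def expected_utilities(own_payoffs: List[List[float]],
                       belief_counts: List[int],
                       alpha: float = 1.0) -> List[float]:
    """Expected utility of each own strategy under power utility x ** alpha.

    own_payoffs[i][j] is the own payoff from strategy i when the counterpart
    plays j; belief_counts[j] is the stated number of counterparts (out of
    ten in the experiment) expected to choose strategy j.
    """
    total = sum(belief_counts)
    probs = [c / total for c in belief_counts]
    return [sum(p * (x ** alpha) for x, p in zip(row, probs))
            for row in own_payoffs]


def is_best_response(chosen: int, own_payoffs: List[List[float]],
                     belief_counts: List[int], alpha: float = 1.0,
                     tol: float = 1e-9) -> bool:
    """True if the chosen strategy attains the highest expected utility."""
    eu = expected_utilities(own_payoffs, belief_counts, alpha)
    return eu[chosen] >= max(eu) - tol


def best_response_rate(flags: List[bool]) -> float:
    """Share of choices that are best responses."""
    return sum(flags) / len(flags)


# Hypothetical game and stated beliefs (5, 3 and 2 out of ten counterparts).
payoffs = [[8, 2, 0], [6, 6, 1], [3, 5, 7]]
beliefs = [5, 3, 2]
for alpha in (1.0, 0.8, 0.5):
    print(alpha, [is_best_response(s, payoffs, beliefs, alpha) for s in range(3)])
```

With linear utility the expected payoffs in this example are roughly 4.6, 5.0 and 4.4, so only the second strategy counts as a best response.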
All in all, these results suggest that our effort to influence subjects' strategic thinking via increased involvement with the game at hand did not translate to either a high frequency of choices that are part of a Nash equilibrium strategy or high best response rates.

The role of other-regarding preferences
A possible explanation for the seeming failure to 'think like a game theorist' (Croson, 2000) as described above might be that individuals do not only care about their own payoff, but also incorporate the payoffs to others into their utility function (see e.g., Bolton & Ockenfels, 2000; Charness & Rabin, 2002; Dufwenberg & Kirchsteiger; Falk & Fischbacher, 2006; Fehr & Schmidt, 1999; Rabin, 1993 for social preference models and Cooper & Kagel, 2016; Fehr & Schmidt, 2006; Sobel, 2005 for empirical overviews of the literature). It could be that strategy choices which appear non-optimal under the assumption of pure self-interest might be fully rational once we allow for subjects' other-regarding preferences.
Table 4 Mean ranking for pairs of own and other's payoffs. Ranking from 1 (most preferred) to 9 (least preferred). Payoff pairs that did not appear in any of our games are displayed by '-'. In some games there were less than nine unique payoff pairs as some payoff pairs appeared multiple times. In particular, in games 3, 4, and 5 there are only eight unique payoff pairs, and in game 7 there are only seven. In this case, subjects had to rank each payoff pair only once and hence, only ranks between 1 and 8 or 1 to 7, respectively, were possible. To correct for this, in Table A3 in Online Appendix A we display an alternative version of Table 3
To examine this possibility, we turn to the results of the ranking task, in which subjects in the Structured treatment were asked to rank the different payoff combinations in each game from most preferred (1) to least preferred (9). Table 4 shows the mean ranks for all own-other payoff pairs, averaged over all games. 10 Not surprisingly, subjects generally prefer more money over less. That is, holding constant the other player's payoff, the mean ranking score generally decreases as own payoff increases. At the same time, we find that the other player's payoff systematically affects subjects' rankings. In particular, our results reveal that subjects are, on average, inequity averse (e.g., Bolton & Ockenfels, 2000; Fehr & Schmidt, 1999). That is, in Table 4, holding constant the subject's own payoff (i.e., fixing a row), the most preferred pair lies on the main diagonal (as highlighted by the italicized cells) where both players obtain the same positive payoff. The exception occurs in the first row where both payoffs are zero: on average, when their own payoff is zero, subjects prefer unequal payoffs even though this involves the other player receiving more than they do.
These results are further corroborated by OLS regressions, in which we use the rank as the dependent variable and own payoff as well as the absolute difference between own and other's payoff as independent variables. The results are reported in Table 5. We find that increasing own payoff has a strong and significant negative effect on stated ranks, consistent with people preferring more money over less. At the same time, the absolute difference between their own and their counterpart's payoff has a significant positive effect, indicating that, ceteris paribus, people dislike inequity. 11
To explore whether other-regarding concerns might account for some or many of the departures from own-payoff best responses, we re-calculate optimal strategy choices based on subjects' stated beliefs and rankings (rather than payoffs). That is, similar to the analysis above, we first calculate the expected ranking for each possible strategy, assuming linear utilities in rankings. We then simply count how often a subject chooses the strategy that gives them the most preferred expected ranking.
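The ranking-based calculation can be sketched in the same way. This is again a minimal illustration under stated assumptions: the game cells, the ranking and the belief counts are hypothetical, and the function names are our own. Because rank 1 is the most preferred, the optimal strategy is the one with the lowest expected rank.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (own payoff, other's payoff)


def expected_ranks(cells: List[List[Pair]], ranks: Dict[Pair, int],
                   belief_counts: List[int]) -> List[float]:
    """Belief-weighted mean rank of each own strategy (lower is better)."""
    total = sum(belief_counts)
    return [sum(ranks[pair] * c / total for pair, c in zip(row, belief_counts))
            for row in cells]


def rank_best_responses(cells: List[List[Pair]], ranks: Dict[Pair, int],
                        belief_counts: List[int], tol: float = 1e-9) -> List[int]:
    """All strategies that attain the most preferred (lowest) expected rank."""
    er = expected_ranks(cells, ranks, belief_counts)
    best = min(er)
    return [i for i, v in enumerate(er) if v <= best + tol]


# Hypothetical game: cells[i][j] is the payoff pair when the subject plays i
# and the counterpart plays j.
cells = [[(8, 6), (2, 4), (0, 0)],
         [(6, 1), (6, 5), (1, 3)],
         [(3, 2), (5, 6), (7, 4)]]
# Hypothetical ranking by a subject who dislikes unequal splits such as (6, 1).
ranks = {(8, 6): 1, (6, 5): 2, (7, 4): 3, (5, 6): 4, (6, 1): 5,
         (3, 2): 6, (2, 4): 7, (1, 3): 8, (0, 0): 9}
beliefs = [5, 3, 2]
print(rank_best_responses(cells, ranks, beliefs))  # prints [0]
```

In this constructed example the strategy with the lowest expected rank is the first one, whereas the own payoffs alone (8, 2, 0; 6, 6, 1; 3, 5, 7) would favour the second strategy under the same beliefs. The two criteria can therefore disagree for a subject whose ranking reflects the other player's payoff.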
The results reveal that best response rates do indeed increase relative to the case when only own payoffs are considered. In particular, in 6 out of 8 games the fraction of best response rates is higher when using subjects' rankings rather than their own payoffs, with the difference between the two ranging from one to eleven percentage points (see the All players column of Table A5 in Online Appendix A). On average, however, the best response rate increases only moderately from 54% to 57%, a difference that is nevertheless statistically significant (Wilcoxon Signed-rank test, p = 0.004; paired t test, p = 0.025). 12 To shed some further light on the role of other-regarding motives, we explore the underlying heterogeneity in social preferences. In particular, while the analysis above indicates that subjects are on average inequity averse, previous literature has shown that individuals typically differ with regard to their other-regarding concerns (see e.g., the discussion in Iriberri & Rey-Biel, 2013). To test for potential heterogeneity, as a simple measure of a subject's social type, we re-estimate the model from Table 5 separately for each individual. We then use the sign and the significance of the coefficient for the absolute difference between own and other's payoffs to classify subjects into different distributional preference types: Selfish (if a subject's ranking is not significantly affected by differences between own and other's payoffs), Inequity Averse (if a subject's ranking is significantly increasing in payoff differences), and Inequity Seeking (if a subject's ranking is significantly decreasing in payoff differences). 13 On this basis, 55 out of 100 subjects are classified as Selfish, while 44 subjects are classified as Inequity Averse. We find no subject to be Inequity Seeking. 
14 This classification allows us to re-calculate the best response rates separately for each type (see the Selfish and Inequity Averse players columns of Table A5 in Online Appendix A). For Selfish individuals we find that, on average, the best response rate amounts to 55%, irrespective of whether using expected payoffs or expected rankings (Wilcoxon Signed-rank test, p = 0.372; paired t test, p = 0.766). For individuals classified as Inequity Averse, in contrast, we find that including their responses in the ranking task significantly increases their level of best response from 54% to 61% (Wilcoxon Signed-rank test, p = 0.004; paired t test, p = 0.011). Thus, the observed increase in best response rates at the aggregate level when using rankings instead of own payoffs is almost entirely driven by those subjects who are not motivated only by their own earnings. This is a reassuring result, as it shows that the ranking task is picking up something that feeds into subjects' strategy choices.
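The type classification described above reduces to a simple rule on the sign and significance of each subject's coefficient for the absolute payoff difference. The sketch below assumes that the individual-level coefficient and its p-value have already been estimated elsewhere. The 5% significance threshold, the function name and the example values are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def classify_type(coef_abs_diff: float, p_value: float,
                  threshold: float = 0.05) -> str:
    """Classify a subject from the payoff-difference coefficient.

    Rank 1 is the most preferred, so a significantly positive coefficient
    means unequal pairs are ranked worse (Inequity Averse). The 0.05
    threshold is an assumption made for this sketch.
    """
    if p_value >= threshold:
        return "Selfish"
    return "Inequity Averse" if coef_abs_diff > 0 else "Inequity Seeking"


# Hypothetical (coefficient, p-value) estimates for four subjects.
estimates: List[Tuple[float, float]] = [
    (0.42, 0.003), (0.05, 0.610), (-0.31, 0.020), (0.18, 0.090),
]
print(Counter(classify_type(c, p) for c, p in estimates))
```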
In sum, while we find that taking into account individuals' attitudes to unequal payoffs can indeed improve best response rates, the differences are quite small. The face value interpretation of our data therefore gives only rather limited support for the idea that such preferences are a major explanation for non-optimal behaviour, as conventionally judged in terms of maximising expected own-payoff values. However, some caveats are in order here. In particular, although the ranking task may do a good job of tapping into players' attitudes to inequity, there may be other dimensions of social preferences not captured by this instrument affecting players' strategy choices. For example, revealed preferences in the ranking task might be context dependent: when some pairing of own-other payoffs is being evaluated in the context of where that cell is located in the game matrix, there may be considerations of reciprocity or (un)kindness that are absent from the ranking exercise. Similarly, the order by which players encounter the ranking and the belief-elicitation task might influence their chosen strategies. If the order is reversed, and subjects first consider how specific outcomes depend on their counterparts' intentions, their subsequent ranking may be more aligned with the perceived (un)kindness of these outcomes in the context of the game. Furthermore, some subjects might be prosocial and can reveal this preference in a static scenario, but when the decision becomes strategic, they might believe that their counterparts are selfish, and choose a more selfish strategy instead. We cannot exclude these possibilities, and our data do not allow us to examine any particular consideration of this kind in a systematic manner. Thus, caution should be exercised in extrapolating insights to environments that measure a wider range of social preferences besides inequity concerns.

Possible factors associated with non-optimal play
In this section, we try to shed some light on possible determinants of non-optimal play other than attitudes to inequity. As a first step, we provide some descriptive statistics of the underlying heterogeneity of non-optimal play. At the individual level, we find substantial variation in the degree to which subjects best respond. While the majority of people (67% when using expected payoffs and 77% when using expected rankings) choose optimal strategies in at least half of the games, only very few people do so in all eight games (see Figure A1 in Online Appendix A for the full distribution). The mean (median) number of optimal strategy choices is 4.33 (4) when using expected payoffs, and 4.59 (4.5) when using expected rankings, a difference that is statistically significant (Wilcoxon Signed-rank test, p = 0.004; paired t test, p = 0.025). This confirms that considering subjects' attitudes to inequity increases best response behaviour, but that this effect is only modest in size.
Next, we look at the cost of non-optimal play. We compare the expected payoff (expected ranking) of the chosen strategy with that of the strategy that would have been optimal given a subject's stated beliefs. Figure 5 shows the percentage of non-optimal strategy choices as a function of the foregone expected payoff (left panel) or foregone expected ranking (right panel). It appears that non-optimal strategies are particularly likely when the loss resulting from such choices is small, and that they become less likely as the size of the loss grows. On average, conditional on choosing non-optimally, subjects forego £2 of expected payoffs (median £1.8) and 1.4 points in expected rankings (median 1).
Table 6 Regression analysis of optimal strategies. This table reports coefficient estimates from logistic regressions. The dependent variable is whether the chosen strategy is optimal with regard to the expected payoff (Models (1)-(3)) or the expected ranking (Models (4)-(6)). We only use data from cases in which the optimal strategy was unique, i.e., we exclude cases in which, based on subjects' stated beliefs, two or more options were optimal. When using expected payoffs, this leaves us with 761 out of 800 cases; when using expected rankings, it leaves us with 767 out of 800 cases. Standard errors clustered at the individual level are reported in parentheses. ***p < 0.01, **p < 0.05, *p < 0.10
To provide more detail, Table 6 reports results from a series of logistic regressions with choosing optimally as the binary dependent variable. In Model (1), we use the standard deviation of the expected earnings across the three available strategies as the explanatory variable. The results show a significant positive coefficient, indicating that the more dissimilar the available strategies are (with regard to their expected earnings), the higher the likelihood of choosing optimally: intuitively, the more one strategy stands out as the best, the easier it is to choose optimally. This finding is further reflected in response times: the bigger the advantage of the best option, the faster subjects reach a decision (see Table A6 in Online Appendix A).
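The quantities used in this part of the analysis (belief-weighted expected payoffs, the optimal strategy, the foregone expected payoff, and the dispersion of expected earnings across strategies) can be illustrated with a small sketch. The payoff matrix, beliefs, and function names below are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which variant was used.

```python
from statistics import pstdev


def expected_payoffs(own_payoffs, beliefs):
    """own_payoffs[i][j] is the own payoff when playing strategy i against
    opponent strategy j; beliefs[j] is the stated probability of j."""
    return [sum(p * b for p, b in zip(row, beliefs)) for row in own_payoffs]


def best_response_summary(own_payoffs, beliefs, chosen):
    """Summarise one decision: optimal strategies, whether the choice was
    optimal, the foregone expected payoff, and the Model (1)-style regressor."""
    ev = expected_payoffs(own_payoffs, beliefs)
    best = max(ev)
    optimal = [i for i, v in enumerate(ev) if abs(v - best) < 1e-9]
    return {
        "expected": ev,
        "optimal": optimal,
        "unique_optimum": len(optimal) == 1,  # non-unique cases are excluded in Table 6
        "chose_optimally": chosen in optimal,
        "foregone": best - ev[chosen],
        "sd_expected": pstdev(ev),  # assumption: population SD
    }


# Hypothetical 3 x 3 own-payoff matrix and stated beliefs
payoffs = [[10, 2, 6], [4, 8, 4], [6, 6, 6]]
beliefs = [0.5, 0.25, 0.25]
summary = best_response_summary(payoffs, beliefs, chosen=2)
```

In this made-up case the first strategy is the unique best response, so choosing the third strategy counts as non-optimal and the foregone amount is the gap between the two expected payoffs. Replacing the own payoffs with a subject's rankings of the payoff pairs would give the expected-ranking counterpart.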
In Model (2), we add the standard deviation of the three possible own earnings within the optimal strategy as a second explanatory variable. The coefficient is significantly negative, indicating that the higher the variation in own payoffs within the strategy that is optimal given the stated beliefs, the less likely subjects are to choose optimally. Our finding that strategy variance might act as a determinant of choice is in line with previous studies (see e.g., Devetag et al., 2016; Di Guida & Devetag, 2013) showing that choice behaviour is susceptible to the influence of out-of-equilibrium features of the games under consideration. Di Guida and Devetag (2013), for example, show that increasing the payoff variance in the strategy with the highest expected payoff significantly shifts choice behaviour away from that strategy. It is not clear, however, from the aforementioned studies, whether this shift reflects a tendency to pick a strategy that is both attractive and relatively safe or whether it is simply an attempt to avoid the worst possible payoff.
In Model (3), we separate this effect by including two dummy variables indicating whether the optimal strategy contains the lowest or highest possible payoff within a given game. The results reveal that containing the minimum possible payoff has a strong negative impact on the likelihood of choosing optimally. At the same time, containing the maximum possible payoff only has a small positive and insignificant effect. It thus seems that the negative effect of variation in own payoffs is mainly driven by subjects trying to avoid the worst possible payoff, even if this means deviating from the optimal strategy. This result bears some resemblance to recent evidence from Avoyan and Schotter (2020), who study the allocation of attention in experimental games. In line with our results, they find that when presented with a pair of games, subjects pay more attention to the game with the greatest minimum payoff, although in their case the game with the largest maximum payoff also receives increased attention.
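The additional regressors in Models (2) and (3) can be constructed along the following lines. This is a sketch under assumptions: the function name and the example matrix are hypothetical, the minimum and maximum are taken over the player's own payoffs in the game, and the population standard deviation is used because the text does not specify the variant.

```python
from statistics import pstdev


def optimal_strategy_features(own_payoffs, optimal):
    """Features of the belief-optimal strategy (row index `optimal`).

    own_payoffs[i][j] is the own payoff for own strategy i against
    opponent strategy j.
    """
    all_payoffs = [p for row in own_payoffs for p in row]
    row = own_payoffs[optimal]
    return {
        # Model (2): dispersion of own payoffs within the optimal strategy
        "sd_within_optimal": pstdev(row),
        # Model (3): does the optimal strategy contain the game's extremes?
        "contains_min": int(min(all_payoffs) in row),
        "contains_max": int(max(all_payoffs) in row),
    }


# Hypothetical own-payoff matrix; strategy 0 is assumed to be optimal
payoffs = [[10, 2, 6], [4, 8, 4], [6, 6, 6]]
features = optimal_strategy_features(payoffs, optimal=0)
```

In this example the optimal strategy contains both the lowest and the highest own payoff in the game, which is the kind of case where, according to the estimates above, the presence of the minimum would be expected to pull choices away from the optimum.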
In Models (4) to (6), we repeat the same analysis but now using optimal strategies calculated based on subjects' rankings of payoffs rather than their own payoffs only. The results corroborate our previous findings.

Structuring the decision process has no significant effect on chosen strategies
Our analysis so far has looked at behaviour only in the structured environment, where subjects were encouraged to invest more effort in thinking about the decision situation. Our results have revealed that despite our attempt to induce subjects to engage extensively with all relevant elements of the game at hand, strategic sophistication (as measured by the fraction of Nash-consistent chosen strategies) as well as best response behaviour were rather low. To test whether structuring subjects' decision-making process has any influence on strategy choices at all, in the following we compare the patterns of choice in the Structured treatment with those in the Unstructured treatment. The results are reported in Table 7. Overall, we do not find any evidence that structuring players' decision processes has a significant effect on actual game play. That is, we find no significant differences in the rate at which subjects play according to the Nash, Level-1, or Level-2 predictions across the two treatments. On average, chosen strategies in the Unstructured treatment are actually slightly more likely to be consistent with the Nash prediction in games with a unique Nash equilibrium (31% vs. 27%) and slightly less likely to be consistent with the Level-1 (46% vs. 50%) and Level-2 (38% vs. 41%) prediction, but none of these differences is statistically significant (all p values > 0.139). 15 These results hold if we compare chosen strategies separately for each game and player type (BLUE or RED player) (see Tables B1 and B2 in Online Appendix B). In sum, in line with the results of Costa-Gomes and Weizsäcker (2008) who find no effect of belief elicitation on actual game play, we find no strong effect even when we elicit payoff rankings in conjunction with beliefs. 16
Table 7 Comparison of game play based on different models' predictions. Average proportion of chosen strategies in accordance with the different models' predictions in the Structured (S) and the Unstructured (U) treatment. p values from logistic regressions with robust standard errors (clustered at the individual level). For Game 2, which has two Nash equilibria, we show the rate at which subjects chose the Pareto dominant Nash equilibrium. The total average for the Nash equilibrium is calculated over the games with a unique equilibrium (excluding Game 2).
15 We also do not find any systematic differences when we compare the overall distribution of chosen strategies across treatments for either of the players: out of the sixteen possible comparisons between the Structured and the Unstructured treatment, only one yields a weakly significant result (RED players in game six, p = 0.074). All other comparisons are not significant at the 10% level.
16 It has been suggested to us that the lack of difference in subjects' behaviour across the two treatments may have been driven by our incentivizing procedure. In the Structured treatment where we employ a single sequence of the eight games using three different tasks, each decision in a game has a probability

Concluding remarks
Our study sought to examine the extent to which strategic thinking might increase if subjects' possible unfamiliarity with the strategic environment were offset by asking them to focus in turn on their ranking of payoff pairs and on their beliefs about the other players' probable choices before selecting their strategies. Although the ranking exercise was not guaranteed to capture all social preferences that might affect strategic choice, it was intended to encourage players to pay attention to all of their own and the other player's payoffs, with considerations of the other player's payoffs hopefully leading to more thought being given to the other player's likely choice of strategy. Despite structuring the decision process in this way, a majority of chosen strategies were not consistent with equilibrium behaviour. Furthermore, subjects failed to best respond to their own stated beliefs almost half of the time when judged in terms of maximising own-payoff expected values. Allowing for different levels of risk aversion using a standard utility function form did not help much, although avoiding the worst possible payoff did appear to carry some weight. We further examined the possible role of subjects' attitudes to inequity: we found that making allowance for such preferences increased best response rates, but only marginally.
We went to considerable lengths, within the usual experimental time and budget constraints, to structure the decision process in a way that would facilitate strategic thinking. The fact that, despite such assistance, just under half of the chosen strategies were non-optimal might be taken to indicate some intrinsic limit to the precision with which the expected utilities of options can be judged by players. It might be that some degree of variability or noise enters into strategic choice, with the result that in some proportion of cases an option other than the best response may be chosen. Kahneman et al. (2016) put this view concisely, when they observed: 'Where there is judgment, there is noise - and usually more of it than you think'. Thus, some irreducible minimum level of imprecision might generate some proportion of sub-optimal choices. Indeed, the evidence that the likelihood of non-optimal responses falls as the opportunity loss increases is consistent with various models of noise and stochastic behaviour. Quantal Response Equilibrium is probably the best known of these, but others that apply accumulator mechanisms (e.g., Golman et al., 2020) or preferences for deliberate randomization (e.g., Agranov & Ortoleva, 2017; Cerreia-Vioglio et al., 2019; Dwenger et al., 2018) may also be candidates for further consideration.
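To illustrate how a noisy choice model links opportunity loss to the chance of a non-optimal response, the sketch below applies a logit choice rule, the response function commonly used in Quantal Response Equilibrium models, to fixed expected values. It is a single-player illustration only, not an equilibrium computation and not a model fitted to the data: the precision parameter `lam` and the expected values are hypothetical.

```python
import math


def logit_choice_probs(values, lam):
    """Logit (softmax) choice probabilities over options with the given
    expected values; lam >= 0 is a hypothetical precision parameter."""
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def prob_non_optimal(values, lam):
    """Probability of choosing anything other than the highest-value option."""
    probs = logit_choice_probs(values, lam)
    best = max(range(len(values)), key=lambda i: values[i])
    return 1.0 - probs[best]


# Same best option, but a small versus a large loss from the alternatives
small_gap = prob_non_optimal([7.0, 6.5, 6.0], lam=1.0)
large_gap = prob_non_optimal([7.0, 4.0, 2.0], lam=1.0)
```

Under this rule the probability of a non-optimal choice is higher when the alternatives are close in expected value than when they are far apart, which matches the qualitative pattern described above for foregone expected payoffs.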
Another possibility might be the failure to fully understand the structure of the game and how combinations of strategies produce outcomes (e.g., see Bosch-Rosa & Meissner, 2020; Chou et al., 2009 for similar evidence). Even though the purpose of the ranking and the belief-elicitation exercise was to provide subjects with a more thorough rendering of the strategic environment, the manipulation might not have been equally effective for all. A proportion of subjects might still have focused their attention solely on the strategy-choice task, owing to a cognitive inability to process the insights gained from the pre-decision tasks at the same time (see e.g., Bosch-Rosa et al., 2018; Carpenter et al., 2013; Fehr & Huck, 2016 for related evidence on how cognitive abilities matter for strategic sophistication). Such cognitive limitations might explain the use of heuristic approaches that do not comply with the logic of equilibrium behaviour, for instance avoiding the worst possible payoff. Perhaps these possibilities are complementary, and non-equilibrium behaviour is simply too heterogeneous to be captured by a single explanation. We leave this open for future research.
Footnote 16 (continued) of 1/24 to be payoff relevant (3 tasks × 8 games). In the Unstructured treatment, each decision in a game has a probability of 1/96 to be chosen for payment (12 repetitions × 8 games). Given that subjects were not aware of the total number of games in either of the treatments, we think it is unlikely that the lack of significant differences in subjects' chosen strategies is driven by our incentivizing procedure. However, we cannot rule out this possibility.
Finally, we want to comment on the lack of any significant difference in the overall patterns of chosen strategies between the Structured and Unstructured treatments. Our finding may reassure researchers that simply asking participants to pick their preferred strategy is an adequate way for experiments to be conducted. Dispensing with demanding prior ranking and belief-elicitation procedures does not, on this evidence, greatly affect the quality of the data. Scarce laboratory time and money can instead be devoted to collecting larger and more powerful datasets.