
Predicting human cooperation in the Prisoner’s Dilemma using case-based decision theory

Published in Theory and Decision

Abstract

In this paper, we show that case-based decision theory (CBDT), proposed by Gilboa and Schmeidler (Q J Econ 110(3):605–639, 1995), can explain the aggregate dynamics of cooperation in the repeated Prisoner’s Dilemma, as observed in the experiments performed by Camera and Casari (Am Econ Rev 99:979–1005, 2009). Moreover, we find that CBDT provides a better fit to the dynamics of cooperation than the existing Probit model, the first time such a result has been found. We also find that humans aspire to a payoff above the mutual defection outcome but below the mutual cooperation outcome, which suggests they hope, but are not confident, that cooperation can be achieved. Finally, our best-fitting parameters suggest that circumstances with more details are easier to recall. We make a prediction for future experiments: if the repeated PD were run for more periods, then we would begin to see an increase in cooperation, most dramatically in the second treatment, where history is observed but identities are not. This is the first application of CBDT to a strategic context and the first empirical test of CBDT in such a context. It is also the first application of bootstrapped standard errors to an agent-based model.


Notes

  1. The free parameters of CBSA include two kinds of forgetfulness and an aspiration level. See Sect. 3 for details.

  2. Like other specification tests, passing the test does not mean that the model is necessarily correctly specified; only that failing the test would have been evidence that it is misspecified.

  3. Incorporating cognitive constraints into economic models is a hallmark of work in the intersection of economics and psychology, such as Simon (1957), Simon et al. (2008), Tyson (2008), Hanoch (2002), Ballinger et al. (2011), and Cappelletti et al. (2011).

  4. In particular, the ‘SHJ’ series of classification learning experiments, starting with Shepard et al. (1961) and including Nosofsky (1986) and Nosofsky et al. (1994).

  5. The experimental literature on the PD and other repeated games is vast. A sample follows; also see the discussion in Sect. 6. Brosig (2002) shows that signaling in face-to-face experiments may be effective at encouraging cooperation. In one-shot games, there exists a low level of cooperation (Bereby-Meyer and Roth 2006). Agents may learn to cooperate in repeated games, especially when monitoring of other players’ actions is allowed (Selten and Stoecker 1986; Andreoni and Miller 1993; Hauk and Nagel 2001), although cooperation breaks down during the course of the game. Evidence on the altruism motivation is mixed, with some papers finding evidence for it (e.g., Kreps et al. 1982) and some against (e.g., Cooper et al. 1996). Other papers of note include Ellison (1994), Bó (2005), and Bó and Fréchette (2011). Chong et al. (2006) and Camerer and Hua Ho (1999) use an ‘experience-weighted attraction’ model to study learning in a repeated trust game. This model postulates that players remember the history of previous play, form beliefs about what other players will do in the future, and are also reinforced by how successful previous strategies have been. Monterosso (2002) uses false feedback to disrupt equilibria and measures the effects; this work seems well suited for analysis by CBSA, and analyzing these data is a possible future extension.

  6. See Camera and Casari (2009) for details about why these payoff values were selected.

  7. Randomly paired with a uniform probability.

  8. The excluded treatment involves an additional stage game after the PD is played. We hope to study this treatment in a future extension.

  9. This is akin to the player’s information set, but there is an important difference: the information set usually includes the history of play. That is not necessarily the case here, because ‘history’ is typically handled by the player’s memory, that is, set of past learning experiences, and not the problem vector.

  10. These need not be defined for a decision theory, so are not a formal part of CBDT, and need only be formally defined when one seeks to generate simulated choice behavior to compare to empirical data.

  11. Known exogenous forces acting on the agent are part of the problem vector.

  12. When \(p_\text {recall}{}=1\), the agent has perfect recall. It then corresponds to CBDT as it appears in Gilboa and Schmeidler (1995).

  13. For another point of view on this criterion, consider two competing theories that attempt to explain the same phenomenon: theory \(a\) and theory \(b\). Suppose that there is some empirical data available about the phenomenon. Suppose theory \(a\) has many more free parameters than theory \(b\) does. Now suppose we calibrate theories \(a\) and \(b\) to the data, and we find, after calibration, that theory \(a\) and theory \(b\) both explain the same fraction of the variation in the data and explain the same qualitative phenomena. Then, under the model complexity criterion, theory \(b\) is preferred.

  14. i.e., the equivalent of \(\hat{\beta } = (X'X)^{-1}X'Y\).

  15. Thanks to an anonymous referee for this suggestion.

  16. To the authors’ knowledge, this is the first example of bootstrapping the standard errors of parameters in an agent-based model.

  17. There are also parameters we do not choose to vary. For example, we do not vary the functional form of similarity: we choose only the ‘accumulative’ form of similarity over ‘average similarity,’ as average similarity is found in Pape and Kurtz (2013) to cause the counterfactual behavior of individuals believing actions to be irrelevant. We also do not vary from the functional form of inverse exponential weighted Euclidean distance. There is a strong empirical case for this functional form in psychology, which is bolstered by the results found in Pape and Kurtz (2013). Please see that paper and Sect. 3 of this paper for details.

  18. An alternative modeling choice is to posit two types of agents: those who always cooperate when indifferent and those who always defect. One then calibrates the relative sizes of the two populations to the known cooperation rate in the first round. This alternative modeling strategy does not make a large difference in the results, so it is not presented here.

  19. In the case of concept learning, the outcome variable in question is the rate of misclassification of particular objects over time in a supervised learning environment.

  20. e.g., Nosofsky and Palmeri (1996), Love et al. (2004), Kurtz (2007) and Vigo (2013).

  21. They cite Bishop et al. (1975) as a statistical source for this method.

  22. This is possible if one notes that these computational structures of CBSA could ‘encode strategies’ in the way that a computer program encodes program behavior.

  23. Moreover, Matsui (2000) shows that Case-based Decision Theory and Expected Utility Theory can both represent the same choice behavior almost always. If the Probit can be likened to an expected utility perspective, then Matsui’s result would suggest that both the Probit and CBSA could match.

  24. It is interesting to consider running the models on each others’ outcomes: what strategies does a Probit suggest that the CBSA has encoded? And, in the other direction, what CBSA parameters emerge when CBSA seeks to predict the implied Probit strategies? We intend to consider these questions in a future extension.

  25. Camera and Casari provide their own interpretation of the Probit parameter estimates, so there is no need to reproduce it here.

  26. This is also predicted in the long run by CBSA; see Sect. 5.4 below.

  27. We do not interpret \(p\), the problem set, and \(\alpha \), the probability of cooperating when indifferent, because they are set by theory; their values are therefore not a “finding,” and it is not appropriate to interpret them as one would estimated parameters. Please see Sect. 3 for how those parameters were chosen.

  28. One could also interpret the value as being induced by some ‘hope distribution’ over all four payoff values, \(5, 10, 25,\) and \(30\).

  29. The agent-based modeling literature describes these two behaviors as ‘exploration’ versus ‘exploitation,’ so aspiration levels determine the switch from exploration to exploitation.

  30. We intend to develop this relationship in future work.

  31. Pape and Kurtz (2013) investigate alternative similarity functions, including ones that involve ‘no’ similarity, and these do not capture the relative difficulty of problems observed in human subjects.

  32. With \(1100\) simulation runs for each of the six problem category types, case-based decision theory finds the ordering that is observed in humans, with statistically significant differences. (This echoes the results of Pape and Kurtz (2013)). However, with the same number of runs, reinforcement learning finds all problems of equal difficulty (at least, not statistically significantly different in difficulty) and the point estimates do not follow the human ordering. Please see Appendix 3 for details.
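To make the functional forms discussed in the notes above concrete — inverse exponential weighted Euclidean similarity and the ‘accumulative’ valuation — here is a minimal sketch. The problem vectors, weights, and memory contents are hypothetical illustrations, not the calibrated model; the payoffs are drawn from the experiment’s four values (5, 10, 25, 30).

```python
import math

def similarity(p, q, weights):
    """Inverse exponential of weighted Euclidean distance between problem vectors."""
    dist = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))
    return math.exp(-dist)

def cbdt_value(action, problem, memory, weights):
    """'Accumulative' CBDT valuation: sum the similarity-weighted payoffs of all
    remembered cases in which this action was taken (summing, not averaging)."""
    return sum(similarity(problem, past_problem, weights) * payoff
               for past_problem, past_action, payoff in memory
               if past_action == action)

# Hypothetical memory of past cases: (problem vector, action, realized payoff).
memory = [((1.0, 0.0), "cooperate", 25.0),
          ((0.0, 1.0), "defect", 30.0),
          ((1.0, 1.0), "cooperate", 5.0)]
weights = (1.0, 1.0)
problem = (1.0, 0.0)  # the current decision problem

values = {a: cbdt_value(a, problem, memory, weights) for a in ("cooperate", "defect")}
best = max(values, key=values.get)  # the action with the higher case-based value
```

In the full model, an aspiration level would enter by evaluating each payoff relative to it before accumulating, and the forgetfulness parameters would enter by including each remembered case only with some recall probability; both are omitted here for brevity.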

References

  • Andreoni, J., & Miller, J. H. (1993). Rational cooperation in the finitely repeated prisoner’s dilemma: Experimental evidence. The Economic Journal, 103(418), 570–585.

  • Axelrod, R. (1980). Effective choice in the prisoner’s dilemma. Journal of Conflict Resolution, 24(1), 3–25.

  • Ballinger, T. P., Hudson, E., Karkoviata, L., & Wilcox, N. T. (2011). Saving behavior and cognitive abilities. Experimental Economics, 14(3), 349–374.

  • Bereby-Meyer, Y., & Erev, I. (1998). On learning to become a successful loser: A comparison of alternative abstractions of learning processes in the loss domain. Journal of Mathematical Psychology, 42(2), 266–286.

  • Bereby-Meyer, Y., & Roth, A. E. (2006). The speed of learning in noisy games: Partial reinforcement and the sustainability of cooperation. The American Economic Review, 96(4), 1029–1042.

  • Billot, A., Gilboa, I., & Schmeidler, D. (2008). Axiomatization of an exponential similarity function. Mathematical Social Sciences, 55(2), 107–115.

  • Bishop, Y. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.

  • Bleichrodt, H., Filko, M., Kothiyal, A., & Wakker, P. P. (2012). Making case-based decision theory directly observable. New York: Mimeo.

  • Bó, P. D. (2005). Cooperation under the shadow of the future: Experimental evidence from infinitely repeated games. The American Economic Review, 95(5), 1591–1604.

  • Bó, P. D., & Fréchette, G. R. (2011). The evolution of cooperation in infinitely repeated games: Experimental evidence. The American Economic Review, 101(1), 411–429.

  • Brosig, J. (2002). Identifying cooperative behavior: Some experimental results in a prisoner’s dilemma game. Journal of Economic Behavior & Organization, 47(3), 275–290.

  • Camera, G., & Casari, M. (2009). Cooperation among strangers under the shadow of the future. The American Economic Review, 99(3), 979–1005.

  • Camerer, C., & Hua Ho, T. (1999). Experience-weighted attraction learning in normal form games. Econometrica, 67(4), 827–874.

  • Cappelletti, D., Güth, W., & Ploner, M. (2011). Being of two minds: Ultimatum offers under cognitive constraints. Journal of Economic Psychology, 32(6), 940–950.

  • Chong, J. K., Camerer, C. F., & Ho, T. H. (2006). A learning-based model of repeated games with incomplete information. Games and Economic Behavior, 55(2), 340–371.

  • Cooper, R., DeJong, D. V., Forsythe, R., & Ross, T. W. (1996). Cooperation without reputation: Experimental evidence from Prisoner’s Dilemma games. Games and Economic Behavior, 12(2), 187–218.

  • Ellison, G. (1994). Cooperation in the Prisoner’s Dilemma with anonymous random matching. The Review of Economic Studies, 61(3), 567–588.

  • Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, 75(4), 643–669.

  • Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review, 88(4), 848–881.

  • Erev, I., & Roth, A. E. (2001). Simple reinforcement learning models and reciprocation in the prisoner’s dilemma game. In G. Gigerenzer & R. Selten (Eds.), Bounded rationality: The adaptive toolbox (pp. 215–231). Cambridge, MA: MIT Press.

  • Erev, I., Bereby-Meyer, Y., & Roth, A. E. (1999). The effect of adding a constant to all payoffs: Experimental investigation, and implications for reinforcement learning models. Journal of Economic Behavior & Organization, 39(1), 111–128.

  • Friedman, J. W. (1971). A non-cooperative equilibrium for supergames. The Review of Economic Studies, 38, 1–12.

  • Fudenberg, D., & Kreps, D. M. (1993). Learning mixed equilibria. Games and Economic Behavior, 5(3), 320–367.

  • Fudenberg, D., & Kreps, D. M. (1995). Learning in extensive-form games I: Self-confirming equilibria. Games and Economic Behavior, 8(1), 20–55.

  • Fudenberg, D., & Maskin, E. (1986). The folk theorem in repeated games with discounting or with incomplete information. Econometrica, 54, 533–554.

  • Gayer, G., Gilboa, I., & Lieberman, O. (2007). Rule-based and case-based reasoning in housing prices. The B.E. Journal of Theoretical Economics (Advances), 7(1), 1–35.

  • Gilboa, I., & Schmeidler, D. (1995). Case-based decision theory. The Quarterly Journal of Economics, 110(3), 605–639.

  • Gilboa, I., & Schmeidler, D. (1996). Case-based optimization. Games and Economic Behavior, 15, 1–26.

  • Golosnoy, V., & Okhrin, Y. (2008). General uncertainty in portfolio selection: A case-based decision approach. Journal of Economic Behavior & Organization, 67(3), 718–734.

  • Hanoch, Y. (2002). Neither an angel nor an ant: Emotion as an aid to bounded rationality. Journal of Economic Psychology, 23(1), 1–25.

  • Hauk, E., & Nagel, R. (2001). Choice of partners in multiple two-person prisoner’s dilemma games: An experimental study. Journal of Conflict Resolution, 45(6), 770–793.

  • Hume, D. (1748). Philosophical essays concerning human understanding. London: A. Millar.

  • Kreps, D. M., Milgrom, P., Roberts, J., & Wilson, R. (1982). Rational cooperation in the finitely repeated Prisoners’ Dilemma. Journal of Economic Theory, 27(2), 245–252.

  • Kurtz, K. J. (2007). The divergent autoencoder (DIVA) model of category learning. Psychonomic Bulletin & Review, 14(4), 560–576.

  • Kydland, F. E., & Prescott, E. C. (1996). The computational experiment: An econometric tool. The Journal of Economic Perspectives, 10(1), 69–85.

  • Love, B., Medin, D., & Gureckis, T. (2004). SUSTAIN: A network model of category learning. Psychological Review, 111(2), 309.

  • Luce, R. D., & Raiffa, H. (1957). Games and decisions. New York: Wiley.

  • Matsui, A. (2000). Expected utility and case-based reasoning. Mathematical Social Sciences, 39(1), 1–12.

  • Miller, J. H. (1996). The coevolution of automata in the repeated Prisoner’s Dilemma. Journal of Economic Behavior & Organization, 29(1), 87–112.

  • Monterosso, J. (2002). The fragility of cooperation: A false feedback study of a sequential iterated Prisoner’s Dilemma. Journal of Economic Psychology, 23(4), 437–448.

  • Nachbar, J. H. (1997). Prediction, optimization, and learning in repeated games. Econometrica, 65, 275–309.

  • Nash, J. (1950). The bargaining problem. Econometrica, 18(2), 155–162.

  • Nash, J. (1953). Two-person cooperative games. Econometrica, 21(1), 128–140.

  • Nosofsky, R. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology, 115(1), 39–57.

  • Nosofsky, R., & Palmeri, T. (1996). Learning to classify integral-dimension stimuli. Psychonomic Bulletin & Review, 3, 222–226.

  • Nosofsky, R., Gluck, M., Palmeri, T., McKinley, S., & Glauthier, P. (1994). Comparing models of rule-based classification learning: A replication and extension of Shepard, Hovland, and Jenkins (1961). Memory and Cognition, 22, 352–369.

  • Ossadnik, W., Wilmsmann, D., & Niemann, B. (2012). Experimental evidence on case-based decision theory. Theory and Decision, 75, 1–22. doi:10.1007/s11238-012-9333-4.

  • Pape, A. D., & Kurtz, K. J. (2013). Evaluating case-based decision theory: Predicting empirical patterns of human classification learning. Games and Economic Behavior, 82, 52–65.

  • Roth, A. E., & Erev, I. (1995). Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term. Games and Economic Behavior, 8(1), 164–212.

  • Savage, L. J. (1954). The foundations of statistics. New York: Wiley.

  • Selten, R., & Stoecker, R. (1986). End behavior in sequences of finite prisoner’s dilemma supergames: A learning theory approach. Journal of Economic Behavior & Organization, 7(1), 47–70.

  • Shepard, R. (1987). Toward a universal law of generalization for psychological science. Science, 237(4820), 1317–1323.

  • Shepard, R., Hovland, C., & Jenkins, H. (1961). Learning and memorization of classifications. Psychological Monographs, 75, 1–41.

  • Simon, H. A. (1957). Models of man: Social and rational. New York: Wiley.

  • Simon, H. A., Egidi, M., & Marris, R. L. (2008). Economics, bounded rationality and the cognitive revolution. Northampton, MA: Edward Elgar Publishing.

  • Tesfatsion, L. (2006). Handbook of computational economics. Amsterdam: Elsevier B.V.

  • Tyson, C. J. (2008). Cognitive constraints, contraction consistency, and the satisficing criterion. Journal of Economic Theory, 138(1), 51–70.

  • Vigo, R. (2013). The GIST of concepts. Cognition, 129(1), 138–162.

  • von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.

  • Wooldridge, J. (2012). Introductory econometrics: A modern approach. Andover, MA: Cengage Learning.


Acknowledgments

This work is supported by the USDA National Institute of Food and Agriculture, Hatch project 1005053.

Author information


Correspondence to Andreas Duus Pape.

Additional information

The order of the authors is not indicative of effort or contribution towards this article.

Appendices

Appendix 1: Details about the constrained probit

Following Camera and Casari (2009), this is a Probit regression that identifies the marginal effects of the different strategies. Table 2 shows the results from the ‘constrained Probit,’ which differs from the one in Camera and Casari (2009) only in that individual and cycle fixed effects are excluded. To make the construction of the included variables clear: the grim trigger is coded as 1 for all periods following a defection. The lag variables control for the five periods following a defection: lag 1 contains a 1 for the first period after a defection by an opponent and 0 for all other periods, lag 2 contains a 1 for the second period after a defection by an opponent, and so on. If a player chooses to defect after observing a defection by an opponent, then we would expect a negative coefficient on the grim trigger or on at least one of the lags.
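The coding scheme just described can be sketched as follows; the function and variable names are hypothetical illustrations, not the authors’ actual data-handling code:

```python
def build_regressors(opponent_defected, n_lags=5):
    """opponent_defected: list of 0/1 flags, one per period, indicating whether
    the opponent defected in that period. Returns (grim, lags): grim[t] is 1 in
    every period after the first observed defection, and lags[k][t] is 1 exactly
    k + 1 periods after a defection, 0 otherwise."""
    T = len(opponent_defected)

    # Grim trigger: 1 in all periods following the first defection.
    grim = [0] * T
    seen_defection = False
    for t in range(T):
        if seen_defection:
            grim[t] = 1
        if opponent_defected[t]:
            seen_defection = True

    # Lag dummies: lag k marks the (k+1)-th period after each defection.
    lags = [[0] * T for _ in range(n_lags)]
    for t, defected in enumerate(opponent_defected):
        if defected:
            for k in range(n_lags):
                if t + k + 1 < T:
                    lags[k][t + k + 1] = 1
    return grim, lags

# A seven-period history with a single defection in period 2 (index 1).
grim, lags = build_regressors([0, 1, 0, 0, 0, 0, 0])
```

With this history, the grim trigger switches on from the period after the defection and stays on, while each lag dummy fires in exactly one period.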

Appendix 2: The explicit problem-result map corresponding to Camera and Casari (2009)

Fig. 7: The Camera and Casari Prisoner’s Dilemma experiment PRM

Appendix 3: Case-based decision theory versus reinforcement learning

Table 2 Probit regression on individual choice to cooperate: marginal effects

Table 3 contains the results from CBSA versus reinforcement learning on the six canonical learning problems tested in the concept-learning literature starting with Shepard et al. (1961). CBSA was tested against these problem types in Pape and Kurtz (2013). Standard CBSA with perfect memory and the standard similarity function was tested against reinforcement learning on these six problems and compared against the data from Nosofsky and Palmeri (1996). The ordering of the columns, \( I < \textit{IV} < \textit{III} < V < \textit{II} < \textit{VI}\), indicates the relative difficulty of the problems as humans find them (in the data of Nosofsky and Palmeri). Each cell contains 1100 simulation runs. Note that the order of the means of the CBSA results matches the human ordering, while the order of the means of the reinforcement learning results does not. Moreover, the CBSA ordering is statistically significant (all neighboring pairs pass t tests at the \(1\,\%\) level). We attribute this to CBSA’s similarity function, which allows extrapolation from some problems to others.
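The neighboring-pair t tests mentioned here can be reproduced in outline with a standard Welch two-sample t statistic; the data below are illustrative stand-ins, not the actual simulation output:

```python
import math

def welch_t(xs, ys):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    nx, ny = len(xs), len(ys)
    mean_x, mean_y = sum(xs) / nx, sum(ys) / ny
    var_x = sum((x - mean_x) ** 2 for x in xs) / (nx - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (ny - 1)
    return (mean_x - mean_y) / math.sqrt(var_x / nx + var_y / ny)

# Hypothetical misclassification rates for two neighboring problem types;
# with 1100 runs per cell, even modest mean differences yield large |t|.
t = welch_t([0.10, 0.12, 0.11, 0.13], [0.20, 0.22, 0.21, 0.23])
```

A negative t here indicates that the first problem type has the lower error rate, i.e., it is the easier of the pair.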

Table 3 CBSA versus reinforcement learning: Nosofsky and Palmeri (1996)

Cite this article

Guilfoos, T., Pape, A.D. Predicting human cooperation in the Prisoner’s Dilemma using case-based decision theory. Theory Decis 80, 1–32 (2016). https://doi.org/10.1007/s11238-015-9495-y
