The Cooperative Origins of Epistemic Rationality?


Recently, both evolutionary anthropologists and some philosophers have argued that cooperative social settings unique to humans play an important role in the development of both our cognitive capacities and what Michael Tomasello terms the “construction” of “normative rationality” or “a normative point of view as a self-regulating mechanism.” In this article, I use evolutionary game theory to evaluate the plausibility of the claim that cooperation fosters epistemic rationality. Employing an extension of signal-receiver games that I term “telephone games,” I show that cooperative contexts work as advertised: under plausible conditions, these scenarios favor epistemically rational agents over irrational agents that are designed to do just as well in non-interactive contexts. I then show that the basic results are strengthened by introducing complications that make the game more realistic.


  1. There are some noteworthy differences between the treatment given by anthropologists and that given by philosophers. Tomasello (2014) and Mercier and Sperber (2017) are primarily focused on why we recognize epistemic norms. Dogramaci (2012, 2015), Sharadin (2018), and Graham (2020) are more concerned with normative questions. My focus in this essay will be on the former.

  2. This is true even of some (but not all) of those philosophers suspicious of traditional understandings of objectivity; see, e.g., Douglas (2009).

  3. As one reviewer helpfully reminded me, the game is often called “Chinese whispers,” and there’s a fairly long history of analogies between evolutionary mechanisms of various sorts and “Chinese whispers” (see, e.g., Ridley, 2001). I have yet to encounter a formalization using game-theoretic tools along the lines developed here, but it would not surprise me to discover one.

  4. As noted above, there are interesting debates about how to understand the sense of objectivity involved in epistemic rationality, and there may well be cases in which there’s no such thing as an objective probability, which might complicate the analysis. In such cases, it looks like the proposed epistemic norms suffer a kind of presupposition failure. For present purposes, however, I’ll assume that the cases of interest are ones in which there are objective probabilities in the form of well-defined long-run frequencies. The question is whether hewing to such probabilities is evolutionarily advantageous.

  5. In all of the simulations conducted in this paper, the standard deviation for observation power was .05. Test runs with other amounts of variance indicated that, as expected, increasing/decreasing the variance simply served to smooth/roughen the observed effect.

  6. Note that we could interpret this game differently: we could understand the rational agents as honest and the wishful ones as dishonest (in a particular way). That is: in cooperative scenarios, wishful thinking can be seen as akin to a kind of dishonesty.

  7. Well, ok, that’s not quite right: essentially, the wishful agent maximizes the expected utility of their beliefs relative to a restricted subset of possible outcomes, namely non-cooperative scenarios. One way of viewing the results that follow in this paper is as showing that this restricted subset is not representative: an agent well-designed for non-cooperative scenarios is not necessarily well-designed for cooperative ones. Even the wishful thinkers that I’ve designed here, who are unrealistically good at managing traditional problems with overconfidence in the setting of individual belief-formation, can have trouble in social contexts, because the latter present pitfalls for wishful thinkers that are unpredictable from a purely individualist perspective.

  8. For discussion of the modeling assumptions involved in the use of the replicator dynamics, see O’Connor (2020). For discussion of the mathematical differences between different versions of the discrete replicator dynamics, see Pandit, Mukhopadhyay, and Chakraborty (2018).

  9. As one reviewer rightly noted, we should expect that agents in this kind of cooperative situation would develop mechanisms to determine what kind of agent was providing them with testimony. Since these would essentially be cheater-detection mechanisms, I take it that their development and operation will introduce a familiar arms-race between detectors and cheaters. While it’s an interesting question whether this arms-race will play out in the same way as in more familiar scenarios, issues of space require that question to be left to future work. (See also the end of Sect. 5.)


  • Bicchieri, C. (1997). Learning to cooperate. In C. Bicchieri, R. Jeffrey, & B. Skyrms (Eds.), The dynamics of norms (pp. 17–46). Cambridge University Press.

  • Bicchieri, C. (2006). The grammar of society: The nature and dynamics of social norms. Cambridge University Press.

  • Bicchieri, C. (2017). Norms in the wild: How to diagnose, measure, and change social norms. Oxford University Press.

  • Dogramaci, S. (2012). Reverse engineering epistemic evaluations. Philosophy and Phenomenological Research, 84(3), 513–30.

  • Dogramaci, S. (2015). Communist conventions for deductive reasoning. Noûs, 49(4), 776–99.

  • Douglas, H. (2009). Science, policy, and the value-free ideal. University of Pittsburgh Press.

  • Forber, P. (2010). Confirmation and explaining how possible. Studies in History and Philosophy of Science Part C, 41(1), 32–40.

  • Forber, P., & Smead, R. (2015). Evolution and the classification of social behavior. Biology & Philosophy, 30(3), 405–21.

  • Gigerenzer, G. (2008). Rationality for mortals: How people cope with uncertainty. Oxford University Press.

  • Graham, P. J. (2020). Assertions, handicaps, and social norms. Episteme (online first).

  • Hieronymi, P. (2005). The wrong kind of reason. Journal of Philosophy, 102(9), 437–57.

  • Lewis, D. (1969). Convention. Harvard University Press.

  • McLoone, B., Fan, W. T. L., Pham, A., Smead, R., & Loewe, L. (2018). Stochasticity, selection, and the evolution of cooperation in a two-level Moran model of the snowdrift game. Complexity, 2018, 1–14.

  • McLoone, B., & Smead, R. (2014). The ontogeny and evolution of human collaboration. Biology & Philosophy, 29(4), 559–576.

  • Mercier, H., & Sperber, D. (2017). The enigma of reason. Harvard University Press.

  • Morton, J. M. (2017). Reasoning under scarcity. Australasian Journal of Philosophy, 95(3), 543–59.

  • O’Connor, C. (2020). Games in the philosophy of biology. Cambridge University Press.

  • Okasha, S. (2013). The evolution of Bayesian updating. Philosophy of Science, 80(5), 745–57.

  • Okasha, S. (2018). Agents and goals in evolution. Oxford University Press.

  • Pandit, V., Mukhopadhyay, A., & Chakraborty, S. (2018). Weight of fitness deviation governs strict physical chaos in replicator dynamics. Chaos, 28(3), 1–12.

  • Ridley, M. (2001). The cooperative gene: How Mendel’s demon explains the evolution of complex beings. Free Press.

  • Roberts, G. (2005). Cooperation through interdependence. Animal Behaviour, 70(4), 901–8.

  • Sharadin, N. (2018). Epistemic instrumentalism and the reason to believe in accord with the evidence. Synthese, 195(9), 3791–3809.

  • Skyrms, B. (1996). Evolution of the social contract. Cambridge University Press.

  • Skyrms, B. (2004). The stag hunt and the evolution of social structure. Cambridge University Press.

  • Skyrms, B. (2010). Signals: Evolution, learning, and information. Oxford University Press.

  • Sterelny, K. (2012). The evolved apprentice: How evolution made humans unique. The MIT Press.

  • Tomasello, M. (2014). A natural history of human thinking. Harvard University Press.

  • Tomasello, M. (2016). A natural history of human morality. Harvard University Press.

  • Tomasello, M. (2017). Becoming human: A theory of ontogeny. Harvard University Press.

  • Tomasello, M. (2020). The ontogenetic foundations of epistemic norms. Episteme (online first).

  • Tomasello, M., Melis, A. P., Tennie, C., Wyman, E., & Herrmann, E. (2012). Two key steps in the evolution of cooperation: The interdependence hypothesis. Current Anthropology, 53(6), 673–92.

  • Zollman, K. J. S. (2015). Modeling the social consequences of testimonial norms. Philosophical Studies, 172(9), 2371–2383.

  • Zollman, K. J. S. (2017). Learning to collaborate. In T. Boyer-Kassem, C. Mayo-Wilson, & M. Weisberg (Eds.), Scientific collaboration and collective knowledge: New essays (pp. 65–77). Oxford University Press.


I’d like to thank Hannah Rubin for comments on an earlier version of this paper, and both Hannah Rubin and Mike Schneider for a number of extremely fruitful conversations on both the topics pursued here and the methods employed.

Author information

Corresponding author

Correspondence to Corey Dethier.

Appendix: Technical Details

Defining Agential Behavior

Let \(Pr_s\) indicate the agent’s subjective probability, \(Pr_o\) the objective probability, and u the utility function for that agent. Our rational agents have perfect uptake of the objective probabilities (i.e., for all \(s_i \in S\), \(Pr_s(s_i|o) = Pr_o(s_i|o)\)), act to maximize expected utility, and are “credulous” in the sense that when they receive a report regarding the probability distribution from another agent, their subjective probability assignment is the same as the assignment that they receive (letting \(Pr_r\) indicate the probability assignment received: for all \(s_i \in S\), \(Pr_s(s_i) = Pr_r(s_i)\)). Credulity is an idealization, obviously, but it may be a good one (Zollman, 2015).

Wishful agents are a little more complicated. First, they assign probabilities as follows:

$$\begin{aligned} Pr_s(s_i|o)&= \frac{Pr_o(s_i|o)u(s_i)}{\sum _{s_j \in S} Pr_o(s_j|o)u(s_j)} \end{aligned}$$

which is to say that their probability assignment is equivalent to their expected utility for each state normalized so as to behave like probabilities. Similarly, when receiving a probability assignment, they incorporate their utilities into their assignment of probabilities like so:

$$\begin{aligned} Pr_s(s_i)&= \frac{Pr_r(s_i)u(s_i)}{\sum _{s_j \in S} Pr_r(s_j)u(s_j)} \end{aligned}$$

In other words, wishful agents are “credulous” in the same sense that rational agents are: they don’t discount the reports of others by taking into account the trustworthiness of the sender or by incorporating their own priors. Instead, they take the probabilities at “face value,” only incorporating their preferences, just as they do in response to an observation. Finally, in order to avoid double-counting their utilities, they determine actions a little bit differently, namely by picking the action that has the highest “probability” according to their assignment.

Note that this particular way of cashing out what “wishful” means only results in identical behavior with rational agents in non-interactive scenarios due to stipulations about the payoff structure. In particular, it requires that action x and state x are guaranteed to have the same expected utility.
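The two belief rules can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical, and the payoff structure (action i pays u(s_i) if state i obtains and nothing otherwise) is an assumption made only so that the stipulation just mentioned holds; it is not taken from the model's actual code.

```python
def wishful_beliefs(probs, utilities):
    """Reweight a probability assignment by utilities, then renormalize."""
    weighted = [p * u for p, u in zip(probs, utilities)]
    total = sum(weighted)
    return [w / total for w in weighted]


def rational_action(beliefs, utilities):
    # Assumed payoff structure: action i pays utilities[i] when state i
    # obtains and 0 otherwise, so expected utility is beliefs[i] * utilities[i].
    expected = [p * u for p, u in zip(beliefs, utilities)]
    return max(range(len(expected)), key=expected.__getitem__)


def wishful_action(beliefs):
    # Wishful agents pick the state with the highest (already
    # utility-weighted) "probability" to avoid double-counting utilities.
    return max(range(len(beliefs)), key=beliefs.__getitem__)


# Hypothetical two-state example.
objective = [0.6, 0.4]
utilities = [1.0, 2.0]

solo = wishful_beliefs(objective, utilities)   # wishful agent observes directly
relayed = wishful_beliefs(solo, utilities)     # a second wishful agent receives the report

print(rational_action(objective, utilities), wishful_action(solo))
print(solo, relayed)
```

In this toy case both kinds of agent choose the same action when acting on their own observation, while a report passed through two wishful agents has the utilities applied twice, which is the kind of distortion the telephone game is meant to expose.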

Determining Expected Utilities

In any given scenario, every agent has an expected utility. So, for example, in a degree-2 cardinality-2 game with one rational agent and one wishful agent, mean observation power set to .75, and observation power \(\sigma ^2\) of .05, the expected utilities of the rational agent (\(u_R\)), wishful agent (\(u_W\)), and actor (\(u_A\)) are \(u_R(1,1) = 1.19\), \(u_W(1,1) = 1.29\), and \(u_A(1,1) = 1.29\), where the numbers in parentheses indicate the number of rational and wishful agents in the chain respectively. To determine these values, I ran one million simulations for each scenario.

The overall expected utility for an agent is then determined by how likely they are to find themselves in each possible scenario, which depends on the agent themselves and the proportion of agents playing the rational strategy. Letting R indicate this latter value, the overall expected utility for a rational agent in any degree-2 cardinality-2 game is

$$\begin{aligned} u_R&= \frac{2}{3}\Big [\big [R * u_R(2,0)\big ] + \big [(1-R) * u_R(1,1)\big ]\Big ] \nonumber \\&\quad +\,\frac{1}{3}\Big [\big [R^2 * u_A(2,0)\big ] + \big [2* R*(1-R)* u_A(1,1)\big ] \nonumber \\&\quad +\,\big [(1-R)^2 * u_A(0,2)\big ]\Big ] \end{aligned}$$

Spelling this out: any agent in a degree-2 cardinality-2 game has a 2 in 3 probability of being a testifier rather than an actor; given that they are a testifier, their probability of being paired with a rational agent is given by R and their probability of being paired with a wishful agent by \(1-R\). These facts allow us to weight the probability of different scenarios that the rational agent can find themselves in, giving us their total expected utility when entering into the game. The expected utility for a wishful agent, while identical in the second term, differs from the above in that the first term is

$$\begin{aligned} u_W&= \frac{2}{3}\Big [\big [R * u_W(1,1)\big ] + \big [(1-R) * u_W(0,2)\big ]\Big ] + \cdots \end{aligned}$$

The difference stems from the features of the individual agent themselves: wishful agents can’t find themselves in a group of only rational agents, because they themselves are part of the group (and similarly for rational agents).

The Initial Model

To run the model itself, I set R to .5 and then allowed it to evolve according to a discrete version of the replicator dynamics. That is, at each step n, \(R(n+1)\) was calculated by adding the weighted advantage of rational agents over the average agent like so:

$$\begin{aligned} R(n+1)&= R(n) + R(n)\big [u_R - [R(n)*u_R + (1-R(n))*u_W]\big ] \end{aligned}$$

The simulation halted when one of three conditions was met: (a) one strategy was played by \(99.99\%\) or more of agents, (b) the percentage of agents playing each strategy after a given round was identical (to the hundredth of a percent) to the percentage playing after the previous round, (c) 1000 rounds had passed. (Why 1000 rounds? I found that there are three different behaviors that I wanted to distinguish: (a) halting at a percentage substantially under 100%, (b) relatively quick but ultimately asymptotic approach of 100%, and (c) very slow movement towards 100%. At 500, it’s hard to distinguish between (a) and (c). At 1500 it starts to get hard to distinguish between (b) and (c). 1000 is a happy medium.)
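The update rule and the three halting conditions can be sketched in Python as follows. This is a minimal reconstruction under stated assumptions rather than the code used for the paper: `utilities` is a hypothetical callable that returns the pair of overall expected utilities for a given R, and the constant utilities in the usage line are invented for illustration.

```python
def replicator_step(R, u_R, u_W):
    # Add the weighted advantage of rational agents over the average agent.
    average = R * u_R + (1 - R) * u_W
    return R + R * (u_R - average)


def run_model(utilities, R0=0.5, max_rounds=1000):
    """Iterate the discrete replicator dynamics until a halting condition is met.

    `utilities` is a hypothetical callable mapping R to (u_R, u_W).
    Returns the final proportion of rational agents and the number of rounds.
    """
    R = R0
    for n in range(1, max_rounds + 1):
        u_R, u_W = utilities(R)
        new_R = replicator_step(R, u_R, u_W)
        # (a) one strategy is played by 99.99% or more of agents
        if new_R >= 0.9999 or new_R <= 0.0001:
            return new_R, n
        # (b) percentages identical to the hundredth of a percent
        if round(new_R * 100, 2) == round(R * 100, 2):
            return new_R, n
        R = new_R
    # (c) the round limit has been reached
    return R, max_rounds


# Hypothetical constant utilities giving rational agents a small edge.
final_R, rounds = run_model(lambda R: (1.2, 1.1))
print(final_R, rounds)
```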

A summary of the results for the first game can be found in Table 2.

Table 2 Summary of simulation results for games of cardinality 2, 3, and 4, where “\(\%\)R” indicates the percentage of the population playing the rational strategy and “Turns” the number of turns the model took to shut off

Complicating the Model

The complications introduced in Sect. 5 work as follows. First, I introduced non-cooperators who do not interact with other agents: they simply take a guaranteed payoff, which ensures that their expected utility is equal to 1 + 1/the cardinality of the game (the second term represents their chance of getting a “preferred” state via this guaranteed payoff). Second, I had cooperators of both the wishful and rational varieties randomly attempt to cooperate with five other individuals, with the degree of the game depending on how many of those individuals were also cooperators and failure to find any other cooperators resulting in a payoff of 0. This captures the element of risk associated with the cooperative strategy: the more non-cooperators in the population, the higher the probability a cooperator gets nothing. Third, to capture the benefits of cooperation, I increased the payoffs depending on the number of cooperators in the group: it’s hard to get five cooperators together (and doing so risks massive information loss), but the rewards for success in such cases are dramatic. Specifically, I set the advantage equal to 1 + the number of total cooperators in the interaction divided by 2. (This particular choice of advantage is largely arbitrary, but the overall pattern of when cooperation is favored doesn’t depend on it.) The expected utility of a rational agent is then given by:

$$\begin{aligned} u_R&= \big [Pr(1\ \text {cooperator}) *\ \text {ex. payoff in d-1 game} *\ \text {adv. of d-1 game}\big ]\nonumber \\&\quad + \big [Pr(2\ \text {cooperators}) *\ \text {ex. payoff in d-2 game} *\ \text {adv. of d-2 game}\big ]\ +\cdots \nonumber \\&= \big [5(1-N)(N)^4 * u_{R_1} * 1.5\big ]\ +\ \big [10(1-N)^2(N)^3 * u_{R_2} * 2\big ]\ +\cdots \end{aligned}$$

where N is the proportion of the population who are not cooperating and \(u_{R_x}\) (or, in the wishful case, \(u_{W_x}\)) the expected utility of agents in games of degree-x. The extension to the wishful case is straightforward.
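The pattern in the two displayed terms can be written as a binomial sum over the number of cooperators found among the five individuals approached. The Python sketch below assumes that the elided terms continue this pattern up to degree 5, with advantage 1 + k/2 for a degree-k game (matching the 1.5 and 2 shown); the per-degree utilities in the usage lines are hypothetical placeholders, not simulation results.

```python
from math import comb


def advantage(k):
    # Matches the displayed terms: 1.5 for a degree-1 game, 2 for degree-2.
    return 1 + k / 2


def cooperator_utility(N, degree_utils, partners=5):
    """Expected utility of a cooperator when a proportion N do not cooperate.

    degree_utils[k] is the agent's expected utility in a degree-k game
    (u_{R_k} or u_{W_k}). Finding no cooperators (k = 0) pays 0, so that
    term is omitted from the sum.
    """
    return sum(
        comb(partners, k) * (1 - N)**k * N**(partners - k)
        * degree_utils[k] * advantage(k)
        for k in range(1, partners + 1)
    )


def noncooperator_utility(cardinality):
    # Guaranteed payoff: 1 + 1/cardinality.
    return 1 + 1 / cardinality


# HYPOTHETICAL per-degree utilities, for illustration only.
u_R_by_degree = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}
print(cooperator_utility(0.5, u_R_by_degree), noncooperator_utility(2))
```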

Introducing the additional kind of agent required slightly altering the dynamics. Rather than the update rule (3) for \(R(n+1)\) above, \(R(n+1)\) and \(N(n+1)\) were calculated like so:

$$\begin{aligned} R(n+1)= R(n) + (1-N(n))*R(n)\big [u_R - [R(n)*u_R + (1-R(n))*u_W]\big ] \end{aligned}$$
$$\begin{aligned} N(n+1)&= N(n) + N(n)\Big [u_N - \big [N(n)u_N + (1-N(n)) \nonumber \\&\quad *[R(n)*u_R + (1-R(n))*u_W]\big ]\Big ] \end{aligned}$$

The second equation is standard: \(N(n+1)\) is calculated by adding the weighted advantage of non-cooperators over the average agent. The first is modified to account for the assumption that the difference between rational and wishful agents is only relevant to cooperators.
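A single step of these modified dynamics might be sketched in Python as follows. The function name and example values are hypothetical; R is read here as the proportion of rational agents among cooperators, which is how it enters the population average in the second equation.

```python
def step_with_noncooperators(R, N, u_R, u_W, u_N):
    """One round of the two-variable update.

    R: proportion of cooperators playing the rational strategy.
    N: proportion of the whole population not cooperating.
    """
    cooperator_average = R * u_R + (1 - R) * u_W
    population_average = N * u_N + (1 - N) * cooperator_average
    # The rational/wishful difference only matters among cooperators,
    # so the adjustment to R is scaled by the cooperating share (1 - N).
    new_R = R + (1 - N) * R * (u_R - cooperator_average)
    # Standard replicator update for non-cooperators.
    new_N = N + N * (u_N - population_average)
    return new_R, new_N


# Hypothetical utilities, for illustration only.
print(step_with_noncooperators(0.5, 0.2, 1.2, 1.1, 1.0))
```

With N set to 0 the first update reduces to the original single-variable rule, which is a convenient sanity check on the modification.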

A summary of the results for the second game can be found in Table 3.

Table 3 Summary of simulation results for the more complicated games. As before, %R indicates the percentage of rationals, %N indicates the percentage of non-cooperators. The initial proportion of rational to wishful agents in all three cardinalities was 1:1

Cite this article

Dethier, C. The Cooperative Origins of Epistemic Rationality?. Erkenn (2021).
