When we investigate biological cases in which various forms of cooperation have evolved, a pattern often appears—these examples seem particularly vulnerable to “freeloading” or “cheating.” Consider the case of predator inspection in guppies (Dugatkin and Alfieri 1991). In some species of fish, a small number of individuals in a school (the “inspectors”) will leave the group to slowly approach and gather information about a much larger possible predator. Now, consider the possible fitness advantages and disadvantages that might hold between each pair of fish in the group. If both fish choose to inspect a predator, they gain information about the predator while sharing the risk. If neither inspects, they lack the information about the predator, but neither incurs any risk. But the best outcome of all for any individual fish is the one in which the other inspects (incurring all the risk) while it itself stays back. It is best, it seems, for any individual fish to be a freeloader, to refuse to contribute to the group's mutual benefit.
This structure of incentives, as it turns out, is quite common in nature—examples include food gathering, tree height, the expansion of plant roots, body size, and even the replication of virus populations (see Easley and Kleinberg 2010, Chap. 7). Axelrod and Hamilton (1981, p. 1392) describe the same pattern in the differential response of bacteria to environmental change and in the behavior of primates. It is also apparent in human behavior, such as deciding whether to shoot at the enemy during trench warfare in World War I (Axelrod 1984, pp. 73–87). An example of particular pedagogical importance is the evolution of behavior that conforms to moral norms—it is easy to see that being a “moral cheater” while those around you do the right thing offers many of the same advantages as the sort of freeloading described here. The importance of teaching the evolution of morality has been stressed by Allchin (2009a, b).
What's more, this network of incentives is equivalent to a well-studied problem in game theory: the prisoner's dilemma. In this example, we are asked to consider two imprisoned members of a gang, one of whom has committed a crime. The state lacks enough evidence to convict either one, so it attempts to get each prisoner to turn the other in. Each prisoner (confined separately and unable to communicate with his partner) is given the choice to remain silent (to “cooperate” with his partner) or to turn state's evidence and testify against the other (to “defect” against his partner). If both cooperate (remain silent), they will each receive a small jail term. If both defect (turn the other in), the state knows that one of them must be lying and gives both a moderate jail sentence. If one defects and the other cooperates, however, the defector walks free in return for his assistance, and the cooperator receives the maximum possible sentence.
We can now formalize this structure using the tools of game theory. We turn the various possible “outcomes” for the two prisoners into numerical “payoff” values, with higher payoffs corresponding to lower jail time. If both prisoners choose to cooperate, they each receive a payoff of three (the light sentence). Should one cooperate and the other defect, the cooperator gets the “sucker's payoff” of zero (the maximum sentence), and the defector gets five (walking away free, the best of all). Should they both defect, however, they both receive a payoff of one (the moderate sentence; see Table 1). So it is better for both if both cooperate than if both defect.
Table 1 Payoffs for each player (“A, B”) in the traditional prisoner's dilemma (Axelrod 1984, p. 8)

                   B cooperates    B defects
A cooperates       3, 3            0, 5
A defects          5, 0            1, 1

But now, consider whether it is more beneficial for me to cooperate or to defect. If my opponent chooses to defect, then I should defect, to receive a payoff of one instead of zero. On the other hand, if my opponent chooses to cooperate, then I should also defect, to receive a payoff of five instead of three. Defection, therefore, is (in the terminology of game theory) strongly dominant—regardless of what my opponent does, defecting gives me a higher payoff than cooperating, so it seems that I should choose to defect even if I don't know what he'll do. Similar reasoning on my opponent's part leads to the conclusion that he should defect as well. Mutual defection is thus recommended by dominance reasoning, despite the fact that it would be better for both of us if we both were to cooperate rather than defect.
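To make the dominance reasoning concrete, the payoffs in Table 1 can be written out and checked directly. The short Python sketch below is purely illustrative (the dictionary and variable names are mine, not drawn from any of the works cited): it encodes the four payoffs from a single player's point of view and verifies that defection earns strictly more than cooperation against either choice the opponent might make.

# PAYOFFS[(my_move, their_move)] gives my payoff, using the values
# described above: C stands for "cooperate" and D for "defect."
PAYOFFS = {
    ("C", "C"): 3,  # mutual cooperation: the light sentence
    ("C", "D"): 0,  # the "sucker's payoff": the maximum sentence
    ("D", "C"): 5,  # defecting against a cooperator: walking free
    ("D", "D"): 1,  # mutual defection: the moderate sentence
}

# Defection is strongly dominant: whatever the opponent does,
# defecting yields a strictly higher payoff than cooperating.
for their_move in ("C", "D"):
    assert PAYOFFS[("D", their_move)] > PAYOFFS[("C", their_move)]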
The evolution of cooperation in situations with the structure of the prisoner's dilemma therefore poses an interesting problem for evolutionary theory. For when phenomena in nature have the structure of the prisoner's dilemma, it seems that cooperation cannot evolve: defecting is always individually beneficial, regardless of what the partner's action is, and defection will thus be evolutionarily favored (see footnote 3). And yet, cooperation seems to have evolved in some of these situations—guppies do inspect predators (see footnote 4). How can we explain these instances of evolved cooperation?
One method for escaping the dilemma was brought to the fore by Axelrod and Hamilton (1981) and led to Axelrod's seminal book, The Evolution of Cooperation (1984). The key, as Axelrod and Hamilton argue, is to move to the iterated prisoner's dilemma. In many real-world cases, including the example of predator inspection in guppies, individuals each interact repeatedly with a limited number of others, and so (1) each individual is involved in many prisoner's dilemma-type situations with each other individual, and (2) each individual remembers how other individuals behaved in past interactions. If we add to the model the assumptions that the game is played more than once and that the players keep track of what happened during their previous interactions with each other, then we can allow players' choices in a particular interaction to depend on what happened in the past. And this might give individuals an incentive to cooperate: if they can elicit future cooperation by cooperating now, the long-run payoff associated with cooperating now might be higher than that associated with defecting now.
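To see how repetition changes the incentives, a small illustrative calculation helps. Suppose (purely for the sake of the example) that our partner cooperates in the first interaction and thereafter simply copies whatever we did in the previous one; the payoff labels and the round count of ten below are my own choices, not taken from the sources cited. Against such a partner, unconditional defection grabs the high one-time payoff but forfeits the stream of mutual-cooperation payoffs.

# Payoffs as in Table 1: temptation, reward, punishment, sucker's payoff.
T, R, P, S = 5, 3, 1, 0
n = 10  # number of repeated interactions (an arbitrary illustration)

# Our partner cooperates first, then copies our previous move.
always_cooperate = n * R         # we cooperate every round: 3 per round
always_defect = T + (n - 1) * P  # we defect every round: 5 once, then 1 per round

print(always_cooperate, always_defect)  # 30 versus 14

Once there are at least three interactions, the cooperative stream outweighs the one-time temptation, which is exactly the sense in which repetition can make cooperating now the better long-run choice.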
As long as there is some chance that two individuals will meet again, there is no best strategy independent of the strategy used by the other player. In particular, unless the other player is completely insensitive to what an individual does (duplicating, in effect, the behavior of the non-iterated prisoner's dilemma), “always defect” is not a very attractive strategy. Strategies that can successfully elicit future cooperation from the other player will net a higher payoff. (And strategies that can elicit future cooperation from the other player while cooperating only minimally themselves will net the highest payoff of all.) There is an enormous number of possible strategies, so the question of which strategy generally does best—which strategy performs well against a wide variety of strategies that may be employed by other players—is difficult to answer analytically.

Luckily, though, we can shed some light on it experimentally. To do so, Axelrod (1984) ran a very large iterated prisoner's dilemma tournament. He solicited entrants—computer programs each containing a strategy for playing the iterated prisoner's dilemma—from professional game theorists and received 14 entries in total. He then had each entrant compete head-to-head against each other entrant, for five games of 200 moves each. Each entrant's final score was the sum of its scores in each pair-wise matchup (see footnote 5). The strategies with the highest final scores, surprisingly enough, possessed some very “cooperative” characteristics. Most strikingly, the top eight strategies, and none of the others, were “nice”: they never played “defect” before their opponent did. Of these top strategies, the most successful were “retaliatory” but “forgiving”: they punished what Axelrod called “uncalled for” defection (1984, p. 44) but retained a “propensity to cooperate” with that opponent nonetheless (1984, p. 36). The highest-performing strategy, “Tit-for-Tat,” has all three of these features: it cooperates on the first round, and then on each subsequent round repeats the previous action of the player with which it interacts. This implies that it is both retaliatory (if the opponent defects, Tit-for-Tat will defect in the next interaction) and forgiving (if the opponent cooperates, even after a defection, Tit-for-Tat will cooperate in the next interaction).
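The logic of the tournament itself is easy to reproduce in miniature. The Python sketch below rests on several simplifying assumptions of mine: the five entrants are illustrative stand-ins rather than the fourteen programs actually submitted, and each pairing plays a single 200-move game rather than five. As in Axelrod's setup, each strategy also plays a copy of itself, and the payoffs follow Table 1.

# A simplified round-robin in the spirit of Axelrod's first tournament.
from itertools import combinations_with_replacement

PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_history, their_history):
    # Nice, retaliatory, forgiving: cooperate first, then copy the
    # opponent's previous move.
    return "C" if not their_history else their_history[-1]

def suspicious_tit_for_tat(my_history, their_history):
    # Like Tit-for-Tat, but defects on the first move.
    return "D" if not their_history else their_history[-1]

def grudger(my_history, their_history):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in their_history else "C"

def always_defect(my_history, their_history):
    return "D"

def always_cooperate(my_history, their_history):
    return "C"

def play_game(strategy_a, strategy_b, rounds=200):
    """Play one game of a fixed number of moves; return the two totals."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

entrants = {"Tit-for-Tat": tit_for_tat,
            "Suspicious Tit-for-Tat": suspicious_tit_for_tat,
            "Grudger": grudger,
            "Always defect": always_defect,
            "Always cooperate": always_cooperate}

totals = {name: 0 for name in entrants}
for (name_a, strat_a), (name_b, strat_b) in combinations_with_replacement(entrants.items(), 2):
    a, b = play_game(strat_a, strat_b)
    if name_a == name_b:
        # A strategy's game against its own twin counts once toward its total.
        totals[name_a] += a
    else:
        totals[name_a] += a
        totals[name_b] += b

for name, score in sorted(totals.items(), key=lambda item: -item[1]):
    print(name, score)

Run as written, Tit-for-Tat finishes at the top of this toy field and the “nice” strategies outscore the unconditional defector, though how well any strategy does always depends on the mix of strategies it faces.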
Axelrod then ran a second tournament with newly solicited entries (this time 62). In this tournament, each pair-wise matchup again played the iterated game five times, but the number of rounds in each game was no longer fixed in advance: instead, it was determined by setting the probability that two strategies would meet again (i.e., that the game would continue after a given move) to 0.99654 (see footnote 6). Interestingly enough, Tit-for-Tat won again. And this was true even though the results of the first tournament were publicly available and programmers could attempt to devise a strategy specifically to outperform it. The success of Tit-for-Tat appears to be extremely robust. Here, it seems, is an opening for the evolution of cooperative behavior.
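The probabilistic game length of the second tournament is equally easy to simulate. The fragment below merely draws game lengths under that continuation rule; the helper name is mine, and the value of w is the one reported above.

import random

# After each move, play continues with probability w = 0.99654, a value
# chosen so that the median game length is about 200 moves.
w = 0.99654

def game_length(continue_probability, rng=random):
    """Draw one game length: keep playing while the continuation 'coin' succeeds."""
    rounds = 1
    while rng.random() < continue_probability:
        rounds += 1
    return rounds

print([game_length(w) for _ in range(5)])  # e.g., the five games of one pairing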
The simple model developed by Axelrod has been discussed extensively throughout the literature. Skyrms has described both the prisoner's dilemma results and a related game known as the stag hunt in the context of the evolution of cooperation (Skyrms 2003), and has also applied these insights to the development of the social contract (Skyrms 1996). Sigmund (2010) has connected the prisoner's dilemma and a handful of other games to phenomena like learning, reputation, repetition, and public goods (such as in the tragedy of the commons).
Further, researchers have repeatedly extended the basic framework presented here in order to provide more robust models of real-world behavior. The tournament has been modified to include choice and refusal of partners (Stanley et al. 1994), to cope with noise in signaling the choice to cooperate or defect (Wu and Axelrod 1995), to allow interactions among more than two players (Yao and Darwen 1995), and to include the effects of the spatial organization of players in the interactions (Ferriere and Michod 1995). Related models in evolutionary game theory have been investigated that explore the punishment of defection (Boyd et al. 2010), the choice to join either a group that punishes or a group that fails to punish (Hauert et al. 2007), the development of reward rather than punishment systems (Rand et al. 2009), and the dispensation of rewards to strangers who have in turn been kind to others (Ule et al. 2009). All these various extensions and expansions derive from the fundamental idea of the iterated prisoner's dilemma tournament.