The consequences of switching strategies in a two-player iterated survival game

We consider two-player iterated survival games in which players are able to switch from a more cooperative behavior to a less cooperative one at some step of an n-step game. Payoffs are survival probabilities and lone individuals have to finish the game on their own. We explore the potential of these games to support cooperation, focusing on the case in which each single step is a Prisoner’s Dilemma. We find that incentives for or against cooperation depend on the number of defections at the end of the game, as opposed to the number of steps in the game. Broadly, cooperation is supported when the survival prospects of lone individuals are relatively bleak. Specifically, we find three critical values or cutoffs for the loner survival probability which, in concert with other survival parameters, determine the incentives for or against cooperation. One cutoff determines the existence of an optimal number of defections against a fully cooperative partner, one determines whether additional defections eventually become disfavored as the number of defections by the partner increases, and one determines whether additional cooperations eventually become favored as the number of defections by the partner increases. We obtain expressions for these switch-points and for optimal numbers of defections against partners with various strategies. These typically involve small numbers of defections even in very long games. We show that potentially long stretches of equilibria may exist, in which there is no incentive to defect more or cooperate more. We describe how individuals find equilibria in best-response walks among n-step strategies.


Introduction
In a two-player iterated survival game, individuals may or may not survive each step and an individual whose partner has died must continue alone (Eshel and Weinshall 1988). It is a game against Nature (Lewontin 1961) such as when individuals have to fend off repeated attacks by a predator (Garay 2009; De Jaegher and Hoyer 2016) or face other sorts of adversity (Emlen 1982; Harms 2001; Smaldino et al. 2013; De Jaegher 2019). As Darwin (1859, p. 69) had noted: "When we reach the Arctic regions, or snow-capped summits, or absolute deserts, the struggle for life is almost exclusively with the elements." Observing animals living together under harsh physical and biological conditions, Kropotkin (1902) suggested that mutual aid is inevitable in evolution. Iterated survival games are a simple way to model these scenarios, and they do show that self-sacrificing cooperative behaviors can be strongly favored when the prospects for lone individuals are not great (Eshel and Weinshall 1988; Eshel and Shaked 2001; Garay 2009; Wakeley and Nowak 2019).
We consider iterated survival games of fixed length n. We assume that there are two possible single-step strategies or behaviors: C and D. The probability an individual lives through a single step is given by Table 1, and the game is symmetric in that both players receive payoffs (live or die in each step) according to this matrix. We assume that a > d, so individuals in CC pairs fare better than individuals in DD pairs. Total payoffs, which are overall survival probabilities, accrue multiplicatively across the n steps. These depend on the overall, n-step strategies of individuals, which are fixed strings of Cs and Ds. We limit our attention to strategies which switch from the more cooperative behavior (C) to the less cooperative behavior (D) once, at some step of the game. Our goal is to understand the consequences of this, both for individual survival within a game and for overall strategy choice by individuals.
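The effect of multiplicative accrual is easy to quantify. A short sketch, with an assumed per-step survival probability of 0.95 (an illustrative value, not from the text), shows how a mild per-step risk compounds into a severe overall challenge as n grows:

```python
# Survival probabilities multiply across steps, so a mild per-step risk
# becomes a mortal challenge in a long game (0.95 is an assumed value).
p_step = 0.95
for n in (10, 50, 100):
    print(n, round(p_step ** n, 4))
```

Over 100 steps, an individual who survives each step with probability 0.95 finishes the game alive less than 1% of the time.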
From the standpoint of behavioral biology or mathematical ecology, this is a phenomenological rather than a mechanistic model (Geritz and Kisdi 2012). It is described plainly in terms of the relative survival of types in different combinations, and skirts any details about 'who helps whom achieve what' (Rodrigues and Kokko 2016). Survival is an obviously crucial kind of utility for individuals, which also combines in various ways with fertility to produce evolutionary fitness (Argasinski and Broom 2013). Here, we do not consider differences in fertility. The only payoff is survival: one if the individual survives to the end of the game, zero otherwise. We allow any values between 0 and 1 for all five single-step payoffs (a, b, c, d, a_0), but we assume that they are fixed for the entire game and that survival outcomes are statistically independent, both in different steps and for different players in a single step. The consequent multiplicative accrual of payoffs turns relatively mild single-step games into mortally challenging iterated games as n increases. This naturally produces strong interdependence between individuals, which is known to favor cooperation and is purposely assumed in other models (Roberts 2005). When both players are present, then depending on the magnitudes of a versus c and b versus d, each step will fall into one of the four well-known classes of symmetric two-player games.

Table 1: The single-step payoff (a, b, c, d or a_0) in a symmetric two-player survival game is the probability of survival of an individual when the individual and partner have specified single-step strategies, either C or D, or when the individual is playing alone because the partner has died (Ø). The loner survival probability, a_0, does not depend on the individual's strategy. All five payoff values are strictly between 0 and 1.

               Partner: C    Partner: D    Partner: Ø
Individual C       a             b            a_0
Individual D       c             d            a_0
Ignoring the possibility that some payoffs might be equal: a < c and b < d defines the class of games represented by the Prisoner's Dilemma (Tucker 1950; Rapoport and Chammah 1965); a > c and b < d defines the class represented by the Stag Hunt (Skyrms 2004); a < c and b > d defines the class represented by the Hawk-Dove game (Maynard Smith and Price 1973; Maynard Smith 1978); and a > c and b > d defines the class which was recently dubbed the Harmony Game (De Jaegher and Hoyer 2016). In the case of the Prisoner's Dilemma, a corresponds to the "reward" payoff, b to the "sucker's" payoff, c to the "temptation" payoff, and d to the "punishment" payoff (Rapoport and Chammah 1965). Wakeley and Nowak (2019) considered individuals with constant strategies, all-C or all-D, and studied how the relative frequency of all-C changes over time in a well-mixed population due to differential death in the two-player iterated survival game. Depending especially on the number of iterations n and the loner survival probability a_0, the n-step game may be of a different type than the single-step game, with obvious implications for the evolution of cooperation. For example, if n is large and a_0 is small, the n-step game may be a Harmony Game even if the single-step game is a Prisoner's Dilemma. Then cooperation is favored despite the fact that it seems better to defect in any given step. On the other hand, if a_0 is large, the n-step game may favor all-D even if the single-step game is a Harmony Game.
Here we study the problem of strategy choice for a broader range of n-step strategies, specifically ones which switch from C to D at some step of the game. Strategy S_i plays D for the final i steps of the game (and C for the first n − i steps). Thus, S_0 is all-C and S_n is all-D. The series of single-step strategies between an individual with strategy S_j and a partner with strategy S_i may be depicted as in (1), for the case j ≥ i. Considering all i, j ∈ {0, . . . , n}, we ask how the individual's probability of survival depends on j given i, as well as on the other six parameters (a, b, c, d, a_0, n). We wish to understand how cooperation may be supported in these games. We describe the structure of incentives for increasing or decreasing the number of end-game defections, and we identify optima for which there is no incentive for the individual (or the partner) to change strategy.
We focus on the case where each single-step game is a Prisoner's Dilemma and ask how the incentive to defect may be undermined upon iteration when loners are at a relative disadvantage. However, we describe these games over the full range of a_0 and present some results for all four classes of single-step games. Restricting attention to strategies of the form S_i allows us to work with closed-form expressions for overall payoffs in a simple space of strategies. We assume that individuals possess one of the n + 1 possible pure strategies and make choices among these based on overall payoffs. With respect to single-step Prisoner's Dilemmas, we know that S_1 is favored over S_0, but we do not know how far back into the game such advantages extend. Although we do not consider mixed strategies or frequencies of strategies in populations, our results have immediate consequences for Nash equilibria and evolutionary stability (Hofbauer and Sigmund 1998; Cressman 2005; Sandholm 2010).

Markov model of individual survival and preliminary calculations
The survival game is symmetric, so we can focus on one player, nominally the individual of Table 1. The individual is in one of three possible situations: alive with a partner, alive without a partner or dead. We use a Markov chain to model transitions among these three states. The probabilities of surviving to the next round are given by Table 1, symmetrically for both players, and players live or die independently of one another in each step of the game. The chain is non-homogeneous because transition probabilities depend on the single-step strategies of the individual and partner in each round of the game, and these may change, for example as in (1).
There are four ways the individual can be alive with a partner, or four possible pairs of single-step strategies, with the individual listed first and the partner listed second: CC, DC, CD, and DD. We use these to index the four corresponding single-step transition matrices, A_CC, A_DC, A_CD and A_DD in (2) through (5). We use Ø to denote that one of the players has died and * as a placeholder for the partner when the individual has died. The game always starts with two players, but then changes state randomly according to these matrices.

Fig. 1: Two-event decomposition of a single step in the iterated survival game when both players are present, illustrating Individual-Partner dependence. The diagram can be used to compute the first-row transition probabilities in the matrices in (2) through (5) by replacing I and P with strategies C or D, then assigning probabilities to the arrows.

Note that the column labels in (2) through (5) denote the situation at the end of the current step of the game, before any switch from C to D occurs. The single-step strategies of the individual and partner in the next round of the game are as specified by their overall, n-step strategies S_j and S_i. The second and third rows of all four matrices are identical, due to our assumption of a single loner survival probability regardless of strategy and because the state Ø* is absorbing for the individual. The transitions described by the first rows of the matrices are more complex because they involve two events, one for the individual and one for the partner. Although payoffs are awarded simultaneously to both players in determining the transition probabilities in the first rows, this two-fold structure of the single-step game between two players lends itself to depiction as an extensive-form game (von Neumann 1928; Kuhn 1953; Cressman 2005). This is illustrated in Fig. 1 and underscores the strong dependence between players in an iterated survival game.
Figure 1 is also a probability tree diagram because the transition probabilities in the first rows in (2) through (5) can be obtained by multiplying probabilities associated with the arrows given specified single-step strategies C or D.
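The matrices in (2) through (5) can be assembled directly from Table 1 and the two-event decomposition of Fig. 1. A minimal sketch, with assumed example payoff values, builds them; rows and columns index the individual's three situations (alive with partner, alive alone, dead):

```python
import numpy as np

a, b, c, d, a0 = 0.9, 0.1, 0.95, 0.5, 0.6   # assumed example values

def step_matrix(p_self, p_other, a0):
    """Single-step transition matrix over the individual's three states.
    The first row is the two-event product read off Fig. 1; rows 2 and 3
    are the same in all four matrices because a lone individual survives
    with a0 and death is absorbing."""
    return np.array([
        [p_self * p_other, p_self * (1 - p_other), 1 - p_self],
        [0.0, a0, 1 - a0],
        [0.0, 0.0, 1.0],
    ])

A_CC = step_matrix(a, a, a0)   # both cooperate: each survives with a
A_DC = step_matrix(c, b, a0)   # individual D vs partner C: c and b
A_CD = step_matrix(b, c, a0)   # individual C vs partner D: b and c
A_DD = step_matrix(d, d, a0)   # both defect: each survives with d
```

Because the matrices are upper triangular, their eigenvalues are the diagonal entries (a^2, bc or d^2 as the case may be, together with a_0 and 1), which is what makes the decomposition in the next paragraph straightforward.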
An individual with a partner may die, in which case the game is over for the individual regardless of what happens to the partner. This event is represented by the first down-arrow in Fig. 1. Having a large survival probability when the partner is present is the only protection against this fate for the individual. Here, the usual comparisons of a versus c and b versus d describe the consequence of switching strategies against a partner with a given strategy. However, the future state of the individual also depends on what happens to the partner. If the partner dies (second down-arrow in Fig. 1), the individual ends up alone and will be subject to the loner survival probability in every remaining step of the game.
The only way to remain in state 1 of the Markov chain is for both players to survive (both up-arrows in Fig. 1). The probability of this combined event is given by the upper-left or (1,1) entries in each matrix, which depend on the strategies of both players. Thus, the consequences of switching strategies will also depend on the comparisons of a^2 versus bc and bc versus d^2. This can be understood in terms of the number of cooperators in each possible pair of single-step strategies. Switching from D to C against a D partner changes the number of cooperators in the pair from zero to one, and switching from D to C against a C partner changes it from one to two. The inclusion of the first cooperator in a pair has effect bc − d^2, whereas the inclusion of a second cooperator has effect a^2 − bc. Then, for example, an individual who suffers a cost b − d < 0 in a Prisoner's Dilemma might also enjoy the benefit of not having to survive alone, if it is also true that bc − d^2 > 0.
Our goal is to understand the overall survival probability of an individual whose strategy is S_j given a partner with strategy S_i, for all i, j ∈ {0, . . . , n}. Any such game can be partitioned into three phases: both players having strategy C, one C and one D, and both D. The ordered series of these will determine the overall transition matrix. For the example in (1), we have the product A_CC^{n−j} A_DC^{j−i} A_DD^{i}. We employ the following decomposition, exemplified by the case CC in which both players have strategy C, in order to compute the powers of the four matrices:

A_CC = V diag(a^2, a_0, 1) V^{−1},  where  V = [[1, a(1−a)/(a_0 − a^2), 1], [0, 1, 1], [0, 0, 1]].   (6)
The diagonal elements in the middle matrix in (6), and in A_CC itself, are the eigenvalues of A_CC. The two outer matrices in (6) are the inverses of each other. Then for any number of steps, k = 0, 1, 2, . . ., the kth power of A_CC follows immediately. Applying the same technique to A_DC, A_CD and A_DD, we obtain the matrix powers in (8) through (11). Note that some quotients in (8) through (11) and in many payoff functions which follow are indeterminate for specific choices of single-step payoffs, for example if a_0 = d^2 in (11). However, these terms represent sums of geometric series which are well-defined for all payoff values. With these preliminary calculations, we can determine the n-step payoff of S_j versus S_i, which will be the focus of our analysis. We call this payoff A(S_j; S_i) and note that it is equal to the probability that the individual with strategy S_j is still alive after the n steps of the game. For the case j ≥ i, we have

A(S_j; S_i) = a^{2(n−j)} (bc)^{j−i} d^{2i} + a(1−a) (a_0^n − a_0^j a^{2(n−j)})/(a_0 − a^2) + c(1−b) a^{2(n−j)} a_0^i (a_0^{j−i} − (bc)^{j−i})/(a_0 − bc) + d(1−d) a^{2(n−j)} (bc)^{j−i} (a_0^i − d^{2i})/(a_0 − d^2).   (12)

For the case where j ≤ i, we get the result symmetric in b and c, as well as in i and j,

A(S_j; S_i) = a^{2(n−i)} (bc)^{i−j} d^{2j} + a(1−a) (a_0^n − a_0^i a^{2(n−i)})/(a_0 − a^2) + b(1−c) a^{2(n−i)} a_0^j (a_0^{i−j} − (bc)^{i−j})/(a_0 − bc) + d(1−d) a^{2(n−i)} (bc)^{i−j} (a_0^j − d^{2j})/(a_0 − d^2).   (13)

The four terms in (12) and (13) correspond to particular sub-events: the first is when the partner also stays alive during the whole game, while the remaining three are when the partner dies either when both players have strategy C, when one has C and one has D, or when both have D.
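The payoffs A(S_j; S_i) can also be obtained by iterating the Markov chain numerically, which provides a cross-check on the closed forms. This is a minimal sketch with assumed example values; `payoff` multiplies the single-step matrices in the order dictated by the two strategies:

```python
import numpy as np

def payoff(j, i, a, b, c, d, a0, n):
    """A(S_j; S_i): survival probability of an S_j individual against an
    S_i partner, by iterating the three-state chain (paired, alone, dead)."""
    v = np.array([1.0, 0.0, 0.0])            # the game starts with both alive
    for t in range(1, n + 1):
        me = t > n - j                       # S_j defects in the last j steps
        yo = t > n - i                       # S_i defects in the last i steps
        if not me and not yo:   p, q = a, a
        elif me and not yo:     p, q = c, b
        elif not me and yo:     p, q = b, c
        else:                   p, q = d, d
        v = v @ np.array([[p * q, p * (1 - q), 1 - p],
                          [0.0, a0, 1 - a0],
                          [0.0, 0.0, 1.0]])
    return v[0] + v[1]                       # alive, with or without partner

a, b, c, d, n = 0.9, 0.1, 0.95, 0.5, 10      # assumed example values
# Sanity check: with a0 = a, the partner's fate is irrelevant and A = a^n.
print(payoff(0, 0, a, b, c, d, a, n), a ** n)
# The symmetry between (12) and (13): swap i with j and b with c.
print(payoff(7, 3, a, b, c, d, 0.5, n), payoff(3, 7, a, c, b, d, 0.5, n))
```

The second check reflects the fact that exchanging the roles of the two players relabels the same stochastic process, exchanging b with c in the mixed-strategy steps.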

Playing with a fully cooperative partner
Here we consider an individual with strategy S_j and a partner with strategy S_0. We ask what strategy the individual should adopt to maximize survival given the specific game parameters (a, b, c, d, a_0, n). We introduce methods which we extend to S_j versus general S_i in Sects. 4 and 5. In Sect. 3.1, we illustrate differences among the four well-known classes of (single-step) games, highlighting the importance of the loner survival probability a_0 in determining broad patterns of strategy choice in iterated survival games. In Sect. 3.2, we focus on the case in which the single-step game is a Prisoner's Dilemma, and ask how far the notion of backward induction may be applied to iterated survival games.
The n-step payoff of S_j against S_0 is obtained by putting i = 0 in (12):

A(S_j; S_0) = a(1−a) a_0^n/(a_0 − a^2) + a^{2n} [ (1 − c(1−b)/(a_0 − bc)) (bc/a^2)^j + (c(1−b)/(a_0 − bc) − a(1−a)/(a_0 − a^2)) (a_0/a^2)^j ].   (14)

Thus, A(S_j; S_0) depends on three individual survival probabilities (a, b, c), as well as on the pair survival probabilities (a^2, bc) and the loner survival probability (a_0), which are eigenvalues of the single-step matrices in (2) and (3). It does not depend on d because there are no steps in which both players use strategy D. The dependence on n is simple: A(S_j; S_0) tends to zero as n tends to infinity. Surviving longer is always less likely. Conveniently for our purposes, A(S_j; S_0) depends on j only through the terms in the brackets, which do not include n. We focus on these terms and treat n implicitly, noting of course that j ≤ n. Because the terms in brackets may increase as j increases, it should be noted that A(S_j; S_0) is a probability, so it can never exceed 1, and that if j = n and n tends to infinity, A(S_j; S_0) tends to zero. We wish to know the value of j ∈ {0, . . . , n} which maximizes the survival probability of the individual for given parameters (a, b, c, a_0). Although j is discrete, in order to find the optimum we treat (14) as a continuous function of j ∈ [0, n]. Three cases can occur, because there is at most one change in sign of the slope. The maximum can be reached when j = 0, which would happen for example when a^2 > a_0 > c. Then the fully cooperative behavior has the greatest chance of survival, no matter how many rounds are being played. Alternatively, the supremum of the function may be in the limit j → ∞. Then, for large enough n, the best j would be n. In this case S_n, or all-D, would have the greatest chance of survival against S_0. A third case is that the function has a maximum at some intermediate value, specifically at the real-valued point j_opt given in (15); the optimal integer number of defections, J_opt in (16), is then whichever of the two integers on either side of the real-valued j_opt gives the higher payoff. If n ≤ j_opt, then all-D (S_n) is the best strategy against all-C (S_0). If n > j_opt, then the optimal number of defections is J_opt, which does not depend on n.
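The three cases can be seen numerically by scanning all j against an all-C partner. The parameter sets below are assumed for illustration: a Harmony Game with a^2 > a_0 > c, and a Prisoner's Dilemma with small and then large a_0; `payoff` iterates the Markov chain directly rather than using (14):

```python
import numpy as np

def payoff(j, i, a, b, c, d, a0, n):
    """A(S_j; S_i) computed by iterating the three-state Markov chain."""
    v = np.array([1.0, 0.0, 0.0])
    for t in range(1, n + 1):
        me, yo = t > n - j, t > n - i
        if not me and not yo:   p, q = a, a
        elif me and not yo:     p, q = c, b
        elif not me and yo:     p, q = b, c
        else:                   p, q = d, d
        v = v @ np.array([[p * q, p * (1 - q), 1 - p],
                          [0.0, a0, 1 - a0],
                          [0.0, 0.0, 1.0]])
    return v[0] + v[1]

def best_j(a, b, c, d, a0, n):
    """Number of end-game defections maximizing survival against S_0."""
    return int(np.argmax([payoff(j, 0, a, b, c, d, a0, n) for j in range(n + 1)]))

n = 50
j_harmony  = best_j(0.9, 0.6, 0.5, 0.4, 0.7, n)    # Harmony, a^2 > a0 > c
j_pd_small = best_j(0.9, 0.1, 0.95, 0.5, 0.5, n)   # PD, small a0
j_pd_large = best_j(0.9, 0.1, 0.95, 0.5, 0.95, n)  # PD, large a0
print(j_harmony, j_pd_small, j_pd_large)
```

With these assumed values the Harmony Game favors full cooperation (j = 0), the Prisoner's Dilemma with a small loner survival probability has an intermediate optimum, and the Prisoner's Dilemma with a large loner survival probability favors all-D (j = n).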
In the Prisoner's Dilemma and the Hawk-Dove game, it will always be advantageous to defect in the final round of the game, because c > a. But when j opt exists, additional end-game defections will be favored only up to J opt even against a partner who commits to full cooperation in an arbitrarily long game.  Axelrod (1984). For all four games in Fig. 2a the relationship of the eigenvalues is a 2 > bc > a 0 . In Fig. 2b it is a 0 > a 2 > bc. Again, we are interested in whether the highest survival occurs at one or the other extreme, j = 0 or j = n, or at some intermediate J opt . An optimal intermediate strategy exists in these examples only for the Prisoner's Dilemma and the Hawk-Dove game with small a 0 (Fig. 2a). When a 0 is the smallest eigenvalue, there is a high cost to a player being alone for a long stretch. The optimal strategy balances the increased chance of paying this cost against the increase in survival from switching from C to D in the Prisoner's Dilemma and the Hawk-Dove game. If, as in the Stag Hunt and Harmony Game in Fig. 2a, switching from C to D does not directly increase survival, then S 0 (all-C) is best. On the other hand, when a 0 is large, a lone individual may have an advantage. This is true in all four examples in Fig. 2b, where a 0 is the largest individual payoff. For large j, the term in brackets in (14) is strictly increasing in j. Provided the game is long enough, S n (all-D) will be the best strategy in all four cases. Here, additional defections by the individual put the partner at greater risk because b < a and b < c. But in the Harmony Game and the Stag Hunt it is also true that c < a, meaning that the individual pays a cost to put the partner at risk. This causes minima of survival at intermediate j in these two games. The individual only sees the benefits of the high loner payoff at larger j in longer games. The Prisoner's Dilemma and the Hawk-Dove game do not show this dip in survival for small j because they both have c > a. 
Note that changing the level of risk to the partner in the Harmony Game and the Stag Hunt can drastically alter these results. For example, putting b = 0.98 in this Harmony Game completely removes the risk to the partner, while preserving the order of eigenvalues and the fact that a_0 is the largest individual payoff. Now any increase in j will be disadvantageous, because the coefficient of the increasing term in brackets in (14) is negative. The four-fold classification of games based on the comparison of a to c and b to d, together with the rough criterion of large versus small a_0, is not enough to determine the potential advantages of switching strategies from C to D at some point in the game. The order of the eigenvalues (a^2, bc, d^2, a_0) is crucial. The example games in Fig. 2 all have a^2 > bc > d^2, but it could be otherwise. For some games, we might have a^2 > d^2 > bc and for others bc > a^2 > d^2.

Comparison of the four types of games
The assumption that C is the more cooperative and D the less cooperative strategy, hence a > d, guarantees that a^2 > d^2. But in all cases, a_0 could be anywhere in the order of eigenvalues. In what follows, we focus on the classic challenge to cooperation, the Prisoner's Dilemma of Tucker (1950) and Rapoport and Chammah (1965). This is a restricted version of what we have been calling the Prisoner's Dilemma class of games, specifically satisfying conditions (17) and (18) below. Our aim is to determine in detail when a late defection might be optimal or when an early one would be better, depending especially on the magnitude of the loner survival probability, a_0.

Defection against a fully cooperative partner in the Prisoner's Dilemma
When the single-step game is a Prisoner's Dilemma, playing D in the final step of an n-step game will always increase the survival probability of an individual. If payoffs accrued additively, as in the classical repeated Prisoner's Dilemma (Rapoport and Chammah 1965; Axelrod 1984), then by backward induction the same logic would apply to every preceding step of the game. Seeing an uninterrupted sequence of increased chances of survival, an all-C individual facing an all-C partner would switch to all-D. But payoffs do not accrue additively in an iterated survival game. We have already established that an optimal number of defections, J_opt in (16), may exist. Here we study in detail how this depends on the loner survival probability.
We make use of the classical assumptions of the Prisoner's Dilemma, described for example by Rapoport and Chammah (1965, p. 34):

c > a > d > b.   (17)

In addition, we assume that

a^2 > bc.   (18)

The broader class of games which includes this Prisoner's Dilemma is defined just by c > a and d > b. Again, a > d in (17) guarantees that a^2 > d^2, so the survival probability of the pair is higher when both players cooperate than when both defect.
The additional restriction to a^2 > bc in (18) means that pairs survive better when both players cooperate than when just one player cooperates. This is not a major restriction, as 90% of the parameter space of survival games (0 < a, b, c, d < 1) for which (17) is satisfied also has a^2 > bc (Wakeley and Nowak 2019). Note that (17) and (18) do not determine the relationship between bc and d^2.
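The 90% figure is easy to spot-check by Monte Carlo. This is a rough sketch; the sample size and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d = rng.random((4, 500_000))      # uniform single-step payoffs
pd = (c > a) & (a > d) & (d > b)           # the ordering in (17)
frac = np.mean((a ** 2 > b * c)[pd])       # fraction also satisfying (18)
print(pd.sum(), round(frac, 3))
```

Among uniformly drawn payoff quadruples satisfying (17), roughly nine in ten also satisfy (18), in line with the figure quoted above.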
We base our detailed analysis on the payoff difference

A(S_j; S_0) − A(S_0; S_0) = a^{2n} [ (1 − c(1−b)/(a_0 − bc)) (bc/a^2)^j + (c(1−b)/(a_0 − bc) − a(1−a)/(a_0 − a^2)) (a_0/a^2)^j − (1 − a(1−a)/(a_0 − a^2)) ].   (19)

When this difference is positive, there is incentive for an individual currently playing all-C against an all-C partner to switch strategies and defect for the final j rounds of the game. When it is negative, the individual is better off sticking with all-C, or S_0. The j for which this difference is the largest will be the optimal number of end-game defections given a fully cooperative partner. As in (14), there is a separation of n and j. The same two exponential terms are present within the brackets, which will increase, decrease or remain constant as j increases, depending on the ratios of eigenvalues, bc/a^2 and a_0/a^2. Again, the slope changes sign at most once. It is straightforward to compute

A(S_1; S_0) − A(S_0; S_0) = a^{2(n−1)} (c − a).

Then for the Prisoner's Dilemma (i.e. with c > a), the payoff difference increases with j when j is small. The question is whether it continues to increase or reaches a peak and starts to decrease as j grows. Owing to (18), bc/a^2 is strictly less than one. But the parameter a_0 is free to vary between 0 and 1, so a_0/a^2 may be less than, greater than, or equal to one.
If a_0 < a^2, then both exponential terms in (19) will be decreasing in j and will eventually go to zero. At some point as j increases, assuming n is large enough, the difference A(S_j; S_0) − A(S_0; S_0) will turn negative and converge to the constant −a^{2n} (a_0 − a)/(a_0 − a^2). Too many defections will ultimately hurt the player because the loner survival probability is small. Again, defecting just once at the end of the game is always favored because c > a. Therefore an optimal strategy will exist, with J_opt given by (15) and (16). But if n is not large enough, then j will always be less than this optimum and the best strategy against S_0 will be S_n.
Instead, if a_0 > a^2, then the difference A(S_j; S_0) − A(S_0; S_0) will eventually be dominated by the middle term in (19). Depending on the sign of this term, A(S_j; S_0) − A(S_0; S_0) will be increasing or decreasing when j is large. As there is at most one change in sign of the slope and the initial slope is positive, either the best strategy is complete defection or there exists an optimal intermediate strategy. The first occurs if and only if the middle term in (19) is positive. This induces a cutoff for a_0 as it varies between a^2 and 1. There is a shift in the behavior of A(S_j; S_0) − A(S_0; S_0) as j increases, from having an intermediate optimum to always increasing, at

a_0^* = ((c − a) a^2 + (a^2 − bc) a) / ((c − a) + (a^2 − bc)).   (20)

The cutoff a_0^* is the largest value of a_0 such that full defection might not be favored (i.e. there is a finite optimum j) against a fully cooperative partner. Again, if n ≤ j_opt, then full defection would still be the best strategy, even if a_0 < a_0^*. But if a_0 > a_0^*, then full defection will always be favored, for any n.
The two survival differences which determine the coefficients of a^2 and a in (20) can be understood with reference to Fig. 1 and (2) and (3). The first, c − a > 0, is the classic change in payoff for defecting against a cooperative partner, which here is the difference in the single-step survival probability of the individual regardless of what happens to the partner. The second, a^2 − bc > 0, expresses as a positive term the difference in the probability that both the individual and the partner survive. It is a single-step cost in pair survival but may be either a cost or a benefit to the individual depending on the values of a_0 and n. The coefficients in (20) sum to one, so the cutoff a_0^* is an average falling between a^2 and a.
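As a numerical check on the cutoff in (20), with assumed Prisoner's Dilemma values, the coefficient of the growing exponential term in the payoff difference (a form derived from the model, written here with the geometric-series quotients) changes sign exactly at a_0^*:

```python
a, b, c = 0.9, 0.1, 0.95        # assumed values with c > a and a^2 > bc

# The cutoff (20): the average of a^2 and a weighted by c - a and a^2 - bc.
a0_star = ((c - a) * a ** 2 + (a ** 2 - b * c) * a) / ((c - a) + (a ** 2 - b * c))

def mid_coeff(a0):
    """Coefficient of (a0/a^2)^j in the payoff difference, valid for a0 > a^2."""
    return c * (1 - b) / (a0 - b * c) - a * (1 - a) / (a0 - a ** 2)

print(a0_star, mid_coeff(a0_star - 1e-3), mid_coeff(a0_star + 1e-3))
```

Just below a_0^* the coefficient is negative (an intermediate optimum exists for long enough games); just above it the coefficient is positive and additional defection is always favored.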
The cutoff a_0^* is closer to a^2, and therefore smaller, when the benefit in individual survival, c − a, is large relative to the cost in pair survival, a^2 − bc. When this is true, even a fairly small value of the loner survival probability a_0 cannot prevent S_n from being the best strategy against S_0. On the other hand, a_0^* is closer to a, and therefore larger, when the cost in pair survival is relatively big. When this is true, there may be an intermediate optimum strategy even when the loner survival probability is fairly large. Taking derivatives of a_0^* provides some intuition about the effects of changing specific parameters, when other parameters are held constant. As long as the assumptions in (17) continue to be met, a_0^* increases as a increases, but decreases when either b or c increases. In addition, if b increases and c decreases, together at the same rate so that bc approaches a^2, then a_0^* will decrease toward a^2.

So far, we have considered two possibilities: a_0 < a^2 and a_0 > a^2. In the first case, a^2 is the largest eigenvalue, so a pair of cooperators survives a single step of the game better than any other pair and better than a lone individual. In this case, both terms that depend on j in (19) decrease to zero and the payoff difference A(S_j; S_0) − A(S_0; S_0) converges to a finite negative constant, so there exists an optimum number of end-game defections, J_opt in (16). In the second case (a_0 > a^2), a lone individual survives a single step better than any pair of individuals. But even when this is true, continuing to increase the number of end-game defections is advantageous only when a_0 exceeds a_0^*, which is larger than a^2. If a^2 < a_0 < a_0^*, there is a J_opt which may be relevant depending on the total number of steps in the game, n. Note that when a_0 = a_0^* there is still a growing interest in defecting, but the dependence on j is different because the middle term in (19) vanishes.

Fig. 3: The optimal, real-valued point of defection j_opt increases without bound as a_0 approaches a_0^*, for the same single-step Prisoner's Dilemma game as in Fig. 2.

In the special case that a^2 = a_0, we cannot use the results for geometric series which gave (8) through (11); the corresponding sums are arithmetic, and the payoff difference contains a term linear in j. As bc < a^2 < c, the derivative will ultimately become negative, so there will be some optimal point of defection. Thus, a_0 = a^2 is not pathological and belongs with the case a^2 ≤ a_0 < a_0^*. For technical reasons we distinguished three cases (a_0 < a^2, a^2 ≤ a_0 < a_0^*, a_0^* ≤ a_0), but the important point is whether an optimum exists. For this we have just two cases: J_opt exists when a_0 < a_0^* and does not exist when a_0^* ≤ a_0. We turn now to the question of how j_opt and J_opt depend on a_0 when a_0 < a_0^*. Because larger a_0 indicates a smaller cost of being alone, it is intuitive that both quantities should increase with a_0. Figure 3 displays this for j_opt and suggests that both j_opt and J_opt are increasing functions of a_0. Examination of j_opt in (15), and of the corresponding integer optimum J_opt from (16), when a_0 is close to either of the extremes, 0 or a_0^*, confirms this limiting behavior. In Appendix 1, we prove that J_opt increases with a_0 for a_0 < a_0^*. Beyond this point, i.e. for a_0 ≥ a_0^*, we may also say that J_opt is infinite because regardless of n it will always be beneficial to increase the number of defections.
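The monotonicity proved in Appendix 1 can be checked numerically for an assumed Prisoner's Dilemma; here the optimum is found by brute force over j, so it is automatically capped at n when the real-valued optimum exceeds the length of the game:

```python
import numpy as np

def payoff(j, i, a, b, c, d, a0, n):
    """A(S_j; S_i) computed by iterating the three-state Markov chain."""
    v = np.array([1.0, 0.0, 0.0])
    for t in range(1, n + 1):
        me, yo = t > n - j, t > n - i
        if not me and not yo:   p, q = a, a
        elif me and not yo:     p, q = c, b
        elif not me and yo:     p, q = b, c
        else:                   p, q = d, d
        v = v @ np.array([[p * q, p * (1 - q), 1 - p],
                          [0.0, a0, 1 - a0],
                          [0.0, 0.0, 1.0]])
    return v[0] + v[1]

a, b, c, d, n = 0.9, 0.1, 0.95, 0.5, 40      # assumed Prisoner's Dilemma values
J = []
for a0 in np.linspace(0.05, 0.85, 9):        # grid below the cutoff a0*
    vals = [payoff(j, 0, a, b, c, d, a0, n) for j in range(n + 1)]
    J.append(int(np.argmax(vals)))
print(J)
```

For these assumed values the optimal number of end-game defections is small and never decreases as a_0 grows, consistent with the claim above that optima typically involve few defections even in long games.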

Behavioral equilibria
Here we lift the restriction that the partner is fully cooperative, and ask whether there is an incentive to defect more or to cooperate more when the partner has strategy S_i. We will assume that A(S_j; S_i) ≠ A(S_i; S_i) unless j = i, both for simplicity and because we imagine A(S_j; S_i) = A(S_i; S_i) to be unlikely in our model. Since the number of possible strategies {S_j : j ∈ {0, . . . , n}} is finite, there will always be an optimal one against S_i. We are interested in identifying stable strategies, such that the individual cannot increase their probability of survival against a partner who has the same strategy. Strategy S_i is optimal in this sense when A(S_i; S_i) > A(S_j; S_i) for all j ≠ i. Then S_i is a strict Nash equilibrium and therefore an evolutionarily stable strategy, or ESS (Maynard Smith and Price 1973; Maynard Smith 1982; Hofbauer and Sigmund 1998; Cressman 2005). A full account of Nash equilibria and ESSs would require the analysis of A(S_j; S_i) = A(S_i; S_i), which we do not pursue here. Due to (12) and (13), the cases j > i and j < i must be analyzed separately. Also, note there may be many equilibrium strategies. In this section we focus on local equilibria, meaning that the only options open to the individual are to defect one more time or cooperate one more time. Strategy S_i is locally stable if and only if

A(S_{i+1}; S_i) < A(S_i; S_i)   (28)

and

A(S_{i−1}; S_i) < A(S_i; S_i)   (29)

for i ∈ {1, . . . , n − 1}, with just (28) and (29) respectively at the endpoints i = 0 and i = n. In Sect. 5, we consider global equilibria, for A(S_j; S_i) over all i, j ∈ {0, . . . , n}.

General results
We base our analysis of local stability on the two key differences

A(S_{i+1}; S_i) − A(S_i; S_i) = a^{2(n−i−1)} [ −(a^2 − bc) (a_0 − d)/(a_0 − d^2) d^{2i} + ( (c − a) + (a^2 − bc)(a_0 − d)/(a_0 − d^2) ) a_0^i ]   (30)

and

A(S_{i−1}; S_i) − A(S_i; S_i) = a^{2(n−i)} [ (bc − d^2) (a_0 − d)/(a_0 − d^2) d^{2(i−1)} + ( (b − d) + (d^2 − bc)(a_0 − d)/(a_0 − d^2) ) a_0^{i−1} ].   (31)

Similar to (19), these two formulas show a separation of i and n. Their signs may depend on i but will not depend on n. Both formulas are sums of two exponential functions in i, with coefficients that depend on the game parameters (a, b, c, d, a_0). They can change sign at most once. Therefore, the conditions for local stability in (28) and (29) will each be met, corresponding, respectively, to (30) and (31) being negative, either for a stretch of i or for no values of i. The set of locally stable i is the intersection of these two (possibly empty) stretches. In the case of defecting one more time, the stretch may range from 0 to +∞. In the case of cooperating one more time, it may range from 1 to +∞. Then, the locally stable strategies are a stretch of integers whose boundaries range from 1 to +∞ (which may be empty) plus possibly 0. For the smallest i, (30) and (31) reduce to

A(S_1; S_0) − A(S_0; S_0) = a^{2(n−1)} (c − a)

and

A(S_0; S_1) − A(S_1; S_1) = a^{2(n−1)} (b − d).

Strategy S_0, or all-C, is locally stable if and only if c < a, which means that the single-step game is either a Harmony Game or a Stag Hunt (cf. Table 1). As in Sect. 3, we treat n implicitly in what follows, keeping in mind that any stretch of equilibria will depend on n in that n fixes the upper boundary of the interval. Our primary concern is to understand how the stretch of locally stable states depends on the other game parameters, in particular the loner survival probability a_0.
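The stretch structure can be seen by brute force for assumed parameter values: evaluate (30) and (31) as raw payoff differences and collect the locally stable i, with only one condition applying at each endpoint:

```python
import numpy as np

def payoff(j, i, a, b, c, d, a0, n):
    """A(S_j; S_i) computed by iterating the three-state Markov chain."""
    v = np.array([1.0, 0.0, 0.0])
    for t in range(1, n + 1):
        me, yo = t > n - j, t > n - i
        if not me and not yo:   p, q = a, a
        elif me and not yo:     p, q = c, b
        elif not me and yo:     p, q = b, c
        else:                   p, q = d, d
        v = v @ np.array([[p * q, p * (1 - q), 1 - p],
                          [0.0, a0, 1 - a0],
                          [0.0, 0.0, 1.0]])
    return v[0] + v[1]

a, b, c, d, a0, n = 0.9, 0.1, 0.95, 0.5, 0.3, 30   # assumed values

def locally_stable(i):
    """Conditions (28) and (29), with one condition at each endpoint."""
    ok_d = i == n or payoff(i + 1, i, a, b, c, d, a0, n) < payoff(i, i, a, b, c, d, a0, n)
    ok_c = i == 0 or payoff(i - 1, i, a, b, c, d, a0, n) < payoff(i, i, a, b, c, d, a0, n)
    return ok_d and ok_c

stretch = [i for i in range(n + 1) if locally_stable(i)]
print(stretch)
```

For these assumed Prisoner's Dilemma values the locally stable strategies form a single contiguous stretch of integers, and i = 0 is excluded, as the general results above require.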

Focusing on the Prisoner's Dilemma
As in Sect. 3.2, we focus on the Prisoner's Dilemma. Again we assume (17) and (18), namely that c > a > d > b and a² > bc. In the following subsections, we first study the incentives (or disincentives) to either defect more or cooperate more, then consider the overlap of these two sets of results in order to identify equilibria, and finally provide a summary and interpretation of outcomes.

Incentives to defect more or cooperate more against S i
Under the assumption that the single-step game is a Prisoner's Dilemma, we have c > a, so that (30) is positive at i = 0. Thus, i = 0 is never a locally stable state when the single-step game is a Prisoner's Dilemma. Since (30) starts off positive for small i and will change sign at most once, we define the real-valued cutoff i_D to be the point at which defecting one more time becomes disadvantageous as i increases. If (30) never changes sign, then i_D does not exist and additional defection is always favored. When i > i_D, the strategy S_i is a candidate for local stability. Similarly, since A(S_{i−1}; S_i) − A(S_i; S_i) in (31) starts off negative for small i and changes sign at most once, we define i_C to be the point at which increased cooperation first becomes advantageous. Here too i_C may not exist. When i < i_C, the second criterion for local stability of strategy S_i is met. Both criteria (28) and (29) are satisfied when i ∈ (i_D, i_C), but this interval will be empty if i_D > i_C.
We begin with the case of increasing defection. If a_0 < d², then A(S_{i+1}; S_i) − A(S_i; S_i) in (30) will ultimately become negative, because the first term inside the brackets will come to dominate as i grows and this term is negative owing to our assumption that a² > bc. If a_0 > d², then (30) will ultimately become negative if and only if (a² − bc)(a_0 − d)/(a_0 − d²) + c − a < 0. Analogous to the situation in Sect. 3.2 with the cutoffs a*_0 and j_opt, here we require this condition and find an associated cutoff i_D for i, which exists if a_0 < ā_0, the cutoff given in (36). There is an advantage to defecting one more time only when i < i_D. For larger i it is disadvantageous. However, if a_0 ≥ ā_0, then i_D does not exist and defecting one more time is advantageous for all i.
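The cutoff behavior described here can be probed numerically. Below, the integer stand-in for i_D is the first i at which (30) turns negative, and ā_0 is written as the weighted average of d² and d described around (36); both the survival recursion and the closed form for ā_0 are our reconstructions of the model, not code quoted from the paper.

```python
def survival(j, i, n, a, b, c, d, a0):
    """P(focal with strategy S_j survives n steps vs S_i); reconstructed model:
    independent per-step survival a (C vs C), b (C vs D), c (D vs C),
    d (D vs D), and a0 once the partner has died."""
    s = {('C', 'C'): a, ('C', 'D'): b, ('D', 'C'): c, ('D', 'D'): d}
    both, alone = 1.0, 0.0
    for t in range(1, n + 1):
        xf = 'D' if t > n - j else 'C'
        xp = 'D' if t > n - i else 'C'
        sf, sp = s[(xf, xp)], s[(xp, xf)]
        alone = alone * a0 + both * sf * (1.0 - sp)
        both = both * sf * sp
    return both + alone

def i_D_int(n, a, b, c, d, a0):
    """First integer i at which defecting once more stops paying, or None."""
    for i in range(n):
        if survival(i + 1, i, n, a, b, c, d, a0) < survival(i, i, n, a, b, c, d, a0):
            return i
    return None

def a0_bar(a, b, c, d):
    """Reconstructed cutoff (36): an average of d^2 and d, with weight
    c - a on d^2 and weight a^2 - b*c on d (an assumption here)."""
    return ((a * a - b * c) * d + (c - a) * d * d) / ((a * a - b * c) + (c - a))
```

With a = 0.97, b = 0.93, c = 0.98, d = 0.95, the scan gives an i_D that increases with a_0 and disappears once a_0 exceeds ā_0 ≈ 0.938, matching the divergence illustrated in Fig. 4.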
In the special case a_0 = d², we obtain an expression which starts off positive for i = 0 then turns negative for some larger i. Thus a_0 = d² is not a pathological case but belongs with a_0 < d² and d² < a_0 < ā_0. Like a*_0 in (20), the cutoff ā_0 in (36) is an average. Previously, i was the number of defections the individual was considering against an all-C partner. Here, i is the fixed number of DD rounds the individual must face when considering whether to defect one more time against an S_i partner. As a result, ā_0 is an average falling between d² and d instead of between a² and a. However, the coefficients determining where it falls are the same as before, because the individual is making the same switch, from C to D, when the partner has strategy C in that step. Thus, ā_0 is closer to d², i.e. smaller, when the resulting gain in individual survival (c − a) is large relative to the loss in pair survival (a² − bc), and closer to d, i.e. larger, when the opposite is true. Figure 4 illustrates that i_D is an increasing function of a_0, growing from 0 to +∞ as a_0 goes from 0 to ā_0. As before, this fits with intuition about the balance between the benefit of defecting while the partner is still alive and the drawback of having to survive alone. The bigger a_0 is, the smaller this drawback becomes. The extremes of i_D can be obtained from (37). In Appendix 2, we prove that i_D is indeed an increasing function of a_0.

Fig. 4 i_D is the point above which defecting once more would become disadvantageous. It increases with a_0 toward +∞ as a_0 approaches ā_0. The parameters here are the same as in Fig. 3 (a = 0.97, …)
Turning to the case of increasing cooperation, recall that A(S_{i−1}; S_i) − A(S_i; S_i) in (31) is negative for the smallest value, i = 1. Additional cooperations will continue to be disfavored unless A(S_{i−1}; S_i) − A(S_i; S_i) changes sign and becomes positive at some i_C. If i_C exists, then for any larger i it will be advantageous for the individual to cooperate one more time. Then for all i > i_C, strategy S_i cannot be locally stable, whereas for i < i_C it might be. Note that if the individual changes strategy from S_i to S_{i−1} against an S_i partner, the pair-survival probability in the switched step changes from d² to bc, and the individual survival probability changes from d to b. The net effect of the latter is negative (b − d < 0). This direct disadvantage to additional cooperation may be offset by increased pair survival, but only if bc > d². Again, the assumptions in (17) and (18) do not determine the relationship of bc to d².
When bc ≤ d², the sign of A(S_{i−1}; S_i) − A(S_i; S_i) never changes, because the net effect on pair survival, bc − d², is at most zero and cannot offset the direct, individual disadvantage of cooperating one more time. In this case i_C does not exist, so all strategies are candidates for local stability, the upper limit being set only by n. When bc > d², the sign of the payoff difference may change, giving a finite i_C, but this will depend on the loner survival probability. If a_0 < d², then A(S_{i−1}; S_i) − A(S_i; S_i) will eventually become positive. The case a_0 = d² gives the same result, but it is again necessary to compute the difference in probability without using the results for geometric series, as we did previously for the condition on A(S_{i−1}; S_i) − A(S_i; S_i). If instead a_0 > d², the payoff difference will ultimately become positive if and only if (bc − d²)(a_0 − d)/(a_0 − d²) + (d − b) < 0, but only when i is greater than the associated cutoff i_C, given in (42). Even when the loner survival probability is small, it will be disadvantageous to cooperate one more time if i < i_C. Using an approach like the one for i_D in Appendix 2, it can be shown that i_C is an increasing function of a_0 in the interval (0, â_0). Intuitively, the larger a_0 is, the lower the danger of a long stretch of mutual defection, so the individual is less inclined to risk a low probability of individual survival (b) in a given step for a greater chance of pair survival (bc). As a_0 approaches â_0, having to survive alone ceases to be a sufficient drawback, no matter how large i becomes.
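The same numerical scan applies to i_C: look for the first i ≥ 1 at which cooperating once more pays against S_i. As before, the survival recursion and the weighted-average form of the cooperation cutoff â_0 are our reconstructions under the stated model assumptions (independent per-step survival a, b, c, d while both players live, and a_0 alone).

```python
def survival(j, i, n, a, b, c, d, a0):
    """P(focal with strategy S_j survives n steps vs S_i); reconstructed model."""
    s = {('C', 'C'): a, ('C', 'D'): b, ('D', 'C'): c, ('D', 'D'): d}
    both, alone = 1.0, 0.0
    for t in range(1, n + 1):
        xf = 'D' if t > n - j else 'C'
        xp = 'D' if t > n - i else 'C'
        sf, sp = s[(xf, xp)], s[(xp, xf)]
        alone = alone * a0 + both * sf * (1.0 - sp)
        both = both * sf * sp
    return both + alone

def i_C_int(n, a, b, c, d, a0):
    """First integer i >= 1 at which cooperating once more pays, or None."""
    for i in range(1, n + 1):
        if survival(i - 1, i, n, a, b, c, d, a0) > survival(i, i, n, a, b, c, d, a0):
            return i
    return None

def a0_hat(a, b, c, d):
    """Reconstructed cooperation cutoff: an average of d and d^2, with weight
    b*c - d^2 on d and weight d - b on d^2 (meaningful only when b*c > d^2)."""
    return ((b * c - d * d) * d + (d - b) * d * d) / ((b * c - d * d) + (d - b))
```

With b lowered to 0.91 so that bc < d², no amount of extra cooperation ever pays, consistent with the case in which i_C does not exist.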

Stretches of locally stable strategies
The stretch of locally stable strategies is the set of integers which satisfy the two conditions, (28) and (29). This set is {⌈i_D⌉, …, ⌊i_C⌋} but is empty when i_D > i_C. There are three different cases to consider. The first case is d² ≥ bc, such that i_C does not exist regardless of a_0. With an upper limit of n, the integer interval begins as {1, …, n} when a_0 is close to 0, then shrinks to an empty set as a_0 increases, because the lower boundary, i_D, grows without bound as a_0 approaches the cutoff ā_0 in (36) and becomes infinite (does not exist) when a_0 ≥ ā_0. The second and third cases occur under the condition bc > d², when i_C may exist. Here, if a_0 is close to 0, then i_D = i_C = 1, so S_1 is the only locally stable strategy for small a_0. When the chance of surviving alone is very small, cooperation will be advantageous except in the final step of the game. As a_0 increases, both i_C and i_D increase without bound, but with different consequences depending on whether â_0 < ā_0 or â_0 > ā_0. The latter two cases differ owing to the different rates of increase of the two boundaries i_D and i_C as a_0 increases. For simplicity, we focus on the continuous interval [i_D, i_C], which has length i_C − i_D. We again treat n implicitly, knowing the picture will look different for n < i_D, i_D < n < i_C and n > i_C.

Fig. 5 In orange, i_C for a given a_0 is the point above which an additional round of cooperation is favored. In blue, i_D for a given a_0 is the point below which an additional round of defection is favored. The game parameters are a = 0.97, b = 0.93, c = 0.98, d = 0.95, which are related to those used previously, e.g. in Fig. 4, by subtracting 0.01 from b and c, which makes â_0 < ā_0 while keeping bc > d². For any given a_0, the stretch of locally stable states is the set of integer values of i falling between the two lines, where increased cooperation and increased defection are both disfavored.
If â_0 < ā_0, then i_C diverges before i_D and i_C − i_D will increase as a_0 increases. If â_0 > ā_0, then i_D diverges before i_C and i_C − i_D will decrease as a_0 increases. In the case of shrinking i_C − i_D, since i_D = i_C = 1 when a_0 is close to 0, there will be at most one locally stable state, which will exist over those values of a_0 for which [i_D, i_C] contains an integer. Local stability becomes impossible when a_0 is large enough that i_D exceeds i_C. Figure 5 illustrates the case â_0 < ā_0, in which i_C diverges before i_D. In Appendix 3, we prove that the stretch of equilibria grows with a_0 in this case. The stretch of locally stable equilibria {⌈i_D⌉, …, ⌊i_C⌋} increases in length, with its two boundaries drifting towards n as a_0 grows. The upper limit ⌊i_C⌋ will reach n for some a_0 < â_0, after which the stretch of equilibria will be {⌈i_D⌉, …, n}, which starts closing as the lower boundary increases with a_0. Eventually the stretch will be reduced to the single point n for some a_0 < ā_0. The stretch will disappear as a_0 approaches ā_0, meaning that there will always be an incentive to defect once more. But since there are only n rounds in the game, S_n will remain a stable strategy for all larger values of a_0.
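Combining the two single-step tests gives the stretch of locally stable strategies directly: an integer i is locally stable when neither one more defection nor one more cooperation pays. A sketch under the same reconstructed model (independent per-step survival a, b, c, d while both players live, and a_0 once alone); the helper names are ours.

```python
def survival(j, i, n, a, b, c, d, a0):
    """P(focal with strategy S_j survives n steps vs S_i); reconstructed model."""
    s = {('C', 'C'): a, ('C', 'D'): b, ('D', 'C'): c, ('D', 'D'): d}
    both, alone = 1.0, 0.0
    for t in range(1, n + 1):
        xf = 'D' if t > n - j else 'C'
        xp = 'D' if t > n - i else 'C'
        sf, sp = s[(xf, xp)], s[(xp, xf)]
        alone = alone * a0 + both * sf * (1.0 - sp)
        both = both * sf * sp
    return both + alone

def stable_stretch(n, a, b, c, d, a0):
    """Integers i for which S_i meets the local-stability conditions (28)-(29)."""
    stable = []
    for i in range(n + 1):
        ok = True
        if i < n:  # (28): defecting once more must not pay
            ok = ok and survival(i + 1, i, n, a, b, c, d, a0) < survival(i, i, n, a, b, c, d, a0)
        if i > 0:  # (29): cooperating once more must not pay
            ok = ok and survival(i - 1, i, n, a, b, c, d, a0) < survival(i, i, n, a, b, c, d, a0)
        if ok:
            stable.append(i)
    return stable
```

With the Fig. 5 parameters, S_1 is the only locally stable strategy when a_0 is near zero, and the stretch lengthens as a_0 grows, in line with the description above.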
Using the same techniques, the opposite behavior can be shown to hold when ā_0 < â_0. Here, the stretch decreases in length, with at most one locally stable state, until it disappears at some a_0 < ā_0 when the curves for i_C and i_D cross. Figure 6 shows an example. For a_0 larger than the value for which i_C = i_D, no stretch of locally stable equilibria can exist. We might call this value ȧ_0 and for reference give its formula, obtainable using (37) and (42), in (45). For the parameters in Fig. 6, we have ȧ_0 ≈ 0.43. As long as n is large enough, there will be three zones: for small i there will only be an incentive to defect more, for intermediate i increased defection and increased cooperation will both be favored over keeping the same strategy, and for large i there will only be an incentive to cooperate more. These three zones will drift towards larger i, so that eventually, for some a_0 < â_0, there will only be an advantage to defecting one more time. Then only S_n will remain a stable strategy.

Summary and interpretation of cases
Our analyses in the previous two sections (4.2.1 and 4.2.2) establish that when neither i_D nor i_C exists, there is an incentive to defect one more time against a partner with strategy S_i, regardless of i. When i_D exists, additional defections are favored if i < i_D. We focused on the possibility of a non-empty stretch of local equilibria, {⌈i_D⌉, …, ⌊i_C⌋}, existing when i > i_D and i < i_C. We also described the possibility of a stretch of what we may call 'disequilibria', where increased defection and increased cooperation are both favored. Here we point out a third case, that the stretch contains no integers, that is, when ⌊i_D⌋ = ⌊i_C⌋, so incentives switch between ⌊i_D⌋ and ⌈i_D⌉. In all scenarios, we established that when i is outside the stretch there is an incentive to move toward it, by increasing the number of defections if i < i_D and increasing the number of cooperations if i > i_C. Table 2 delineates ten possibilities, showing the parameter ranges and resulting incentive structures for each, assuming n is large enough that n ≥ i_D and n ≥ i_C whenever i_D and i_C exist. The first level of classification creates three major divisions: in the first, only i_D may exist, whereas in the second and third both i_D and i_C may exist. The second, finer level emphasizes the importance of a_0. Among the ten possibilities listed, we recognize five basic types of incentive structure.
Table 2 Parameter regions, largely determined by the relative magnitude of the loner survival probability a_0, which produce different incentives for an individual with strategy S_i to cooperate once more, defect once more, either, or neither, against a partner with the same strategy S_i:

1. bc ≤ d²; a_0 ≥ ā_0: additional defection always favored (1)
2. bc ≤ d²; a_0 < ā_0: stretch of equilibria capped only by n (2)
3. bc > d² and â_0 ≤ ā_0; a_0 ≥ ā_0: additional defection always favored (1)
4. bc > d² and â_0 ≤ ā_0; â_0 ≤ a_0 < ā_0: stretch of equilibria capped only by n (2)
5. bc > d² and â_0 ≤ ā_0; a_0 < â_0: bounded stretch of equilibria (3a)
6. bc > d² and ā_0 < â_0; a_0 ≥ â_0: additional defection always favored (1)
7. bc > d² and ā_0 < â_0; ā_0 ≤ a_0 < â_0: unbounded stretch of disequilibria (4)
8. bc > d² and ā_0 < â_0; ȧ_0 < a_0 < ā_0: bounded stretch of disequilibria (5a)
9. bc > d² and ā_0 < â_0; a_0 < ā_0 and ⌊i_D⌋ = ⌊i_C⌋: incentives switch between ⌊i_D⌋ and ⌈i_D⌉ (5b)
10. bc > d² and ā_0 < â_0; a_0 < ȧ_0: single equilibrium point (3b)

In all cases it is assumed that c > a > d > b and a² > bc. In the second-to-last line, the incentives switch from favoring additional defection if i ≤ ⌊i_D⌋ to favoring additional cooperation if i ≥ ⌈i_D⌉ = ⌈i_C⌉. The parenthesized labels bin the ten possible incentive structures into the five basic types. Roughly speaking, large a_0 causes additional defection to be favored regardless of i (Type 1), and small a_0 leads to the existence of equilibria which may be unbounded and capped only by n (Type 2) or bounded by i_C (Type 3). For some intermediate values of a_0, disequilibria arise which may be unbounded (Type 4) or bounded (Type 5). These intermediate values of a_0 occur when bc > d² and ā_0 < â_0, as in Fig. 6, and a_0 is larger than the value for which i_D and i_C cross, namely ȧ_0 in (45). However, we do not use ȧ_0 to classify incentive structures in Table 2 because incentive structures depend on ⌊i_D⌋ and ⌊i_C⌋, not simply on i_D and i_C. Following the discussion of Fig. 1 in Sect. 2, we interpret the possibilities outlined in Table 2 as a balance between individual survival and pair survival. The first major division of Table 2 has already been discussed. It is based on the assumption that the order of eigenvalues is a² > d² ≥ bc, with a_0 falling somewhere between 0 and 1. Here an additional round of cooperation does not benefit the individual (b − d < 0) or the pair (bc − d² ≤ 0).
Thus the only criterion for stable states is whether additional defections remain favored. They are favored for small i but become disfavored at some larger value of i = i_D, which increases with a_0. For a_0 ≥ ā_0, additional defections are favored for all i, so none of the S_i are stable.
The second and third major divisions of Table 2 are for a² > bc > d², in which case the interval of locally stable states is bounded for small a_0 and then shifts toward larger integers as a_0 increases. As it shifts, both ends of the continuous interval [i_D, i_C] grow smoothly with a_0, while its width i_C − i_D either expands or shrinks depending on whether ā_0 > â_0, so that i_C diverges first as in Fig. 5, or ā_0 < â_0, so that i_D diverges first as in Fig. 6. In the latter case, as a_0 increases there may be a series of single equilibrium points (Type 3b) with ⌈i_D⌉ = ⌊i_C⌋ = 1, 2, 3, …, which are separated by short intervals of a_0 for which incentives switch between ⌊i_D⌋ and ⌈i_D⌉ (Type 5b), before the final one of these intervals occurs around ȧ_0 and is followed by a series of expanding bounded stretches of disequilibria (Type 5a) as a_0 approaches ā_0. In the simple example of Fig. 6, i = 1 is the single equilibrium point for a_0 ∈ (0, 0.323), incentives switch between i = 1 and i = 2 for a_0 ∈ (0.323, 0.547), and there are expanding stretches of disequilibria for a_0 ∈ (0.547, 0.919 = ā_0).
Putting the criterion for a shrinking stretch of equilibria, and consequently the chance for disequilibria, in terms of individual versus pair survival, we have ā_0 < â_0, which is equivalent to (a² − bc)/(c − a) < (bc − d²)/(d − b). (46) On the one hand, the advantages of increased defection extend to smaller a_0 when the resulting cost to pair survival (a² − bc) is low compared to the gain in individual survival (c − a). On the other hand, the advantages of increased cooperation extend to larger a_0 when the resulting gain in pair survival (bc − d²) is high compared to the cost in individual survival (d − b). When this criterion (46) is met, there is a range of loner survivability for which the threat of having to survive alone is enough to support additional cooperation but not enough to prevent additional defection.

Global properties of A(S j ; S i )
Here we return to the payoff matrix A(S_j; S_i) for all i, j ∈ {0, …, n}, given by (12) for j ≥ i and by (13) for j ≤ i. To recap: in Sect. 3 we fixed i = 0 and asked whether an optimal response j = J_opt existed, and in Sect. 4 we focused on j = i and considered in detail the neighboring states where j and i differ by 1. Our findings about J_opt, i_D and i_C retain their importance in this section, where we study the full payoff matrix A(S_j; S_i). In the subsections which follow, we investigate the global stability of locally stable strategies, show how A(S_i; S_i) depends on i, and ascertain key features of a best-response walk on the surface A(S_j; S_i). We continue to assume that the single-step game is a Prisoner's Dilemma, so we have c > a > d > b and a² > bc.

Global versus local stability
Global stability is defined as follows: strategy S_i is globally stable if A(S_i; S_i) > A(S_j; S_i) for all j ≠ i. This, again, is in the sense of a strict Nash equilibrium. A globally stable state is obviously a locally stable one. Here we prove that the converse is also true.
We begin with the case of increasing cooperation. Specifically, we compare the payoffs of two individuals, one who cooperates k additional times and one who cooperates k − 1 additional times, both having a partner with strategy S_i. Using (13) and simplifying, we obtain the difference in (48), in which k ranges from 1 to i. Equation (48) is negative when k = 1, due to local stability, and will change sign at most once as k increases from 1 to i. We need only check the endpoint, k = i, where (49) shows that the difference remains negative. No additional number of cooperations is favorable against a locally stable strategy.
In the case of increasing defection, we compare the payoff of an individual who defects k + 1 times to that of an individual who defects k times, against a partner with strategy S_i. Here k ranges from 0 to n − 1, but we must consider all k ≥ 0 because n may take any value. Using (12) and simplifying, we may write this difference as in (50), in which a*_0 is the cutoff given by (20), which was derived in the consideration of an optimal number of defections against a partner with strategy S_0, and which does not depend on k. Local stability means that (50) is negative when k = 0. If it remains negative for all k > 0, then no additional defections will be favored against a partner with strategy S_i. This will depend on the comparison of H and the second term inside the brackets in (50). If a_0 < bc, this second term is positive, so from the case k = 0 we know H must be negative. Also, the second term will shrink to zero as k increases, because a_0/bc < 1. Therefore, the whole of (50) remains negative for all k if a_0 < bc. Alternatively, if bc < a_0 < a*_0, then the second term in (50) is negative and increases in absolute value as k increases. Here too (50) remains negative for all k. We do not need to consider a_0 > a*_0, because local stability requires a_0 < ā_0 and we have ā_0 ≤ a*_0. If S_i is locally stable, there is no increased number of defections which is better.
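The equivalence of local and global stability can be spot-checked by brute force, maximizing A(S_j; S_i) over all j ∈ {0, …, n} at a locally stable i. A sketch, again under our reconstructed survival model (independent per-step survival a, b, c, d while both live, and a_0 alone); the function names are ours.

```python
def survival(j, i, n, a, b, c, d, a0):
    """P(focal with strategy S_j survives n steps vs S_i); reconstructed model."""
    s = {('C', 'C'): a, ('C', 'D'): b, ('D', 'C'): c, ('D', 'D'): d}
    both, alone = 1.0, 0.0
    for t in range(1, n + 1):
        xf = 'D' if t > n - j else 'C'
        xp = 'D' if t > n - i else 'C'
        sf, sp = s[(xf, xp)], s[(xp, xf)]
        alone = alone * a0 + both * sf * (1.0 - sp)
        both = both * sf * sp
    return both + alone

def best_response(i, n, a, b, c, d, a0):
    """The j maximizing A(S_j; S_i) over j in {0, ..., n}."""
    return max(range(n + 1), key=lambda j: survival(j, i, n, a, b, c, d, a0))
```

For a = 0.97, b = 0.93, c = 0.98, d = 0.95 with a_0 = 0.05, the locally stable S_1 is also its own best response among all n + 1 strategies, while all-C is not.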
Taking both cases together, we have proven that locally stable states and globally stable states are the same. For brevity, we have omitted the detailed treatment of special cases, such as a_0 = a², and simply note that these do not alter our conclusion. In sum, globally stable states form the same stretches as the locally stable states described previously in Sect. 4.2.2.

The diagonal A(S i ; S i )
Although potentially long stretches of local equilibria may exist, not all A(S_i; S_i) are equivalent. In the single-step survival game, or in the usual Prisoner's Dilemma with a > d, C is a better choice than D if both players take the same strategy. Here we are interested in whether S_0 is the best strategy in this sense in the n-step game. We base our analysis on the one-step difference A(S_{i+1}; S_{i+1}) − A(S_i; S_i), which upon simplification may be written as in (52), with (53) giving the result for the smallest i. The difference A(S_{i+1}; S_{i+1}) − A(S_i; S_i) will remain negative for larger i unless the second term in the brackets in (52) becomes too large in the negative direction. Of course a + d > 0. This second term in the brackets is a decreasing function of a_0, which begins positive for 0 < a_0 < d, then becomes negative when a_0 > d and continues to decrease as a_0 approaches 1. It is straightforward to check that even with a_0 = 1, (52) is negative. Thus, A(S_i; S_i) is a decreasing function of i. The fully cooperative strategy S_0 is the best if both players are restricted to having the same strategy.
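The monotone decrease of the diagonal, so that all-C is best among identical-strategy pairs, is easy to confirm numerically, even at the extreme a_0 = 1, using our reconstructed survival recursion (independent per-step survival a, b, c, d while both live, and a_0 alone).

```python
def survival(j, i, n, a, b, c, d, a0):
    """P(focal with strategy S_j survives n steps vs S_i); reconstructed model."""
    s = {('C', 'C'): a, ('C', 'D'): b, ('D', 'C'): c, ('D', 'D'): d}
    both, alone = 1.0, 0.0
    for t in range(1, n + 1):
        xf = 'D' if t > n - j else 'C'
        xp = 'D' if t > n - i else 'C'
        sf, sp = s[(xf, xp)], s[(xp, xf)]
        alone = alone * a0 + both * sf * (1.0 - sp)
        both = both * sf * sp
    return both + alone

def diagonal(n, a, b, c, d, a0):
    """A(S_i; S_i) for i = 0, ..., n."""
    return [survival(i, i, n, a, b, c, d, a0) for i in range(n + 1)]
```

The resulting list is strictly decreasing in i, so A(S_0; S_0) is largest, as claimed.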

A best-response walk on the surface A(S j ; S i )
To better understand the full payoff matrix A(S_j; S_i) for all i, j ∈ {0, …, n}, we studied the best responses of an individual to their partner's current strategy. We assume that both players initially have the same strategy, that the partner follows suit with the same best response as the individual, and that this process is repeated. The resulting walk is well defined in the sense that none of the A(S_j; S_i) are equal, considering all j ∈ {0, …, n} for a given i. Because the walk is deterministic and has a finite number of possibilities (exactly n + 1 states), it cannot be injective, and ultimately it will end in a cycle, which might consist of a single globally stable strategy. As we do not consider mixed strategies or population frequencies of strategies, these best-response walks should not be confused with best-response dynamics as the latter are usually defined (Hofbauer and Sigmund 1998; Cressman 2005; Sandholm 2010). The analysis of i_D and i_C in Sect. 4.2, based on single-step changes in strategy, shows that there is an incentive to move toward a stretch of equilibria for any partner strategy outside the stretch, by increasing defection when i ≤ i_D and by increasing cooperation when i ≥ i_C. The same analysis shows that there is an incentive to move toward a stretch of disequilibria, or toward a stretch of equilibria which is empty. Here we investigate how best-response walks on the surface A(S_j; S_i) depend on the initial value of i, how stretches of equilibria or disequilibria are approached from above and below in steps which may be greater than one, and how these walks converge on single points or enter into larger cycles. Figure 7 illustrates this for two survival games of length n = 20, one with a stretch of equilibria and one with a stretch of disequilibria. The first (Fig. 7a, c) has a_0 < â_0 < ā_0 and 0 < i_D < i_C < n, and so exemplifies the fifth of the ten possibilities listed in Table 2, with a stretch of equilibria for i ∈ {4, …, 15}. The second (Fig.
7b, d) has ȧ_0 < a_0 < ā_0 and 0 < i_C < i_D < n, and so exemplifies the eighth of the ten possibilities listed in Table 2, with a stretch of disequilibria for i ∈ {5, …, 13}. Panels A and B give 3D depictions of A(S_j; S_i) as a continuous surface. Panels C and D show the same surfaces, viewed from above, and display all possible best-response walks using arrows. Each possible walk starts at some point on the diagonal. It follows the vertical arrow, which goes either up or down to the optimal strategy S_j against S_i. Then it follows the horizontal arrow, which goes back to the diagonal. It continues in like manner, repeating the same procedure. Figure 7 shows the characteristic features of walks when i_D and i_C exist. In particular, if i ≤ i_D the best response (on the side of additional defection) is an increasing function of i, whereas if i ≥ i_C the best response (on the side of additional cooperation) does not depend on i. When there is a stretch of equilibria, the points in its interior are their own best responses, and walks which begin outside the stretch converge on its endpoints, ⌈i_D⌉ from below and ⌊i_C⌋ from above. When there is a stretch of disequilibria, incentives to defect more send walks into the interior and then through the stretch, toward i_D, but these are opposed by incentives to cooperate more, which always leap over the stretch, directly to ⌊i_C⌋. In this case, walks may converge on cycles of two or more states.
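The walk just described can be simulated directly: from diagonal state i, move to the best response j, let the partner follow suit so the pair lands on diagonal state j, and repeat until a state repeats. A sketch under our reconstructed survival model (independent per-step survival a, b, c, d while both live, and a_0 alone); the function names are ours.

```python
def survival(j, i, n, a, b, c, d, a0):
    """P(focal with strategy S_j survives n steps vs S_i); reconstructed model."""
    s = {('C', 'C'): a, ('C', 'D'): b, ('D', 'C'): c, ('D', 'D'): d}
    both, alone = 1.0, 0.0
    for t in range(1, n + 1):
        xf = 'D' if t > n - j else 'C'
        xp = 'D' if t > n - i else 'C'
        sf, sp = s[(xf, xp)], s[(xp, xf)]
        alone = alone * a0 + both * sf * (1.0 - sp)
        both = both * sf * sp
    return both + alone

def walk(i0, n, a, b, c, d, a0):
    """Best-response walk from diagonal state i0; returns (path, terminal cycle)."""
    def best(i):
        return max(range(n + 1), key=lambda j: survival(j, i, n, a, b, c, d, a0))
    path, first_seen = [i0], {i0: 0}
    while True:
        nxt = best(path[-1])
        if nxt in first_seen:  # a state repeats: the walk has entered a cycle
            return path, path[first_seen[nxt]:]
        first_seen[nxt] = len(path)
        path.append(nxt)
```

With the Fig. 8a parameters the walk climbs to S_n and stays there, while with bc > d² and a_0 near zero a walk started at i = 10 is drawn to the single equilibrium S_1.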
We can use (48) and (50) in Sect. 5.1 to obtain the best responses for i ≥ i_C and i ≤ i_D, respectively. In the first case, we put j = i − k in (48) and rewrite it for our purposes here as (54), in which j now ranges from 0 to i. We know that (54) is negative when j = 0, from (49), which holds for all i. In addition, because here we are assuming i ≥ i_C, we know that (54) is positive when j = i. We treat j as continuous and solve for the value j*, given in (55), which makes (54) equal to zero. Then, the best response falls in the interval (j*, j* + 1) and must be equal to ⌊j*⌋ + 1. Writing (55) in this way emphasizes that we are considering the case a_0 < â_0, namely when i_C exists. In fact, it is straightforward to show that j* = i_C − 1, so that ⌊j*⌋ + 1 = ⌊i_C⌋. Thus, for partner strategies with i ≥ i_C, the optimal strategy of an individual is to defect only in the final ⌊i_C⌋ steps of the game. If there is a stretch of equilibria, this best response is at the upper end of the stretch, whereas if there is a stretch of disequilibria, it is just beyond the lower end of the stretch. In the second case, i ≤ i_D, we similarly set (50) equal to zero and solve to obtain k*(i), in which the dependence on i is through H, given by (51). The best response is captured by the interval (i + k*(i), i + k*(i) + 1) and is equal to i + ⌈k*(i)⌉. The full expression for k*(i) is cumbersome, but for the smallest i it provides another route to the optimal number of defections against a fully cooperative partner (Sect. 3.2), because ⌈k*(0)⌉ = J_opt. For larger i, we find that k*(i) decreases with i, finally reaching zero for i = i_D. As Fig. 7 shows, the optimal total number (i + ⌈k*(i)⌉) of end-game defections against partner strategies with i ≤ i_D increases with i. The largest integer-valued i which still favors increased defection is i = ⌊i_D⌋, and this would motivate one additional defection by the individual, up to j = ⌈i_D⌉. If there is a stretch of equilibria, this largest value is at the lower end of the stretch, whereas if there is a stretch of disequilibria, it is just beyond the upper end of the stretch. However, in the latter case, as the walk moves through the stretch, it may happen, as in Fig. 7d, that it never reaches j = ⌈i_D⌉ and instead turns downward, because there is an even stronger incentive for additional cooperation. The examples in Fig. 7 represent just two of the five distinct outcomes among the ten total possibilities listed in Table 2, namely when there is either a bounded stretch of equilibria (Type 3a) or a bounded stretch of disequilibria (Type 5a), and n is large enough that the entire stretch is apparent within the game. Figure 8 shows three more outcomes: a case in which additional defection is favored for all i (Fig. 8a), a case in which there is a stretch of equilibria capped by n (Fig. 8b), and a case in which there is a single equilibrium point (Fig. 8c). These are the first (Type 1), fourth (Type 2), and tenth (Type 3b) of the ten possibilities in Table 2. The ninth possibility in Table 2 (Type 5b), when incentives switch between ⌊i_D⌋ = ⌊i_C⌋ and ⌈i_D⌉ = ⌈i_C⌉, is not depicted but will simply result in a cycle between those two adjacent states.

Fig. 8 Three additional examples of best-response walks on the surface A(S_j; S_i) for a game of length n = 20. In a, (a, b, c, d) = (0.97, 0.91, 0.98, 0.95) and a_0 = 0.95, so bc < d² and a_0 > ā_0, and additional defection is always favored. In b, (a, b, c, d) = (0.97, 0.93, 0.98, 0.95) and a_0 = 0.92, so bc > d² and â_0 < a_0 < ā_0, and all i ≥ 12 are stable. In c, (a, b, c, d) = (0.97, 0.93, 0.99, 0.949) and a_0 = 0.78, so bc > d², a_0 < ȧ_0 < ā_0 and i_D < i_C, and there is a single stable state at i = 4. Thus, these correspond to the first, fourth and last of the ten possibilities listed in Table 2.
We have also not depicted an example of the seventh possibility (Type 4), but note that it produces multi-state cycles similar to those shown in Fig. 7b, d.
We may also consider k*(i) and j* separately, for j ≥ i above the diagonal and j ≤ i below the diagonal. Best responses on the side of increasing defection (j ≥ i) are tentative in the sense that they proceed incrementally, the optimum for a given i being j = i + ⌈k*(i)⌉. When the upper limit to increasing defection i_D exists, the best-response walk leads to it, but generally in increments of decreasing size, as in Fig. 8b, rather than all at once. When i_D does not exist, a richer set of possibilities can occur, depending on the value of a_0. If a_0 < d, then k*(i) still decreases with i, from ⌈k*(0)⌉ = J_opt down to its lower limit of ⌈k*(i)⌉ = 1 for some larger i. If a_0 = d, then k*(i) is a constant (J_opt) as in Fig. 8a. If a_0 > d, then k*(i) is an increasing function of i. We prove these statements in Appendix 4. In sum, the best-response number of defections, j = i + ⌈k*(i)⌉, is an increasing function of i, approaching ⌈i_D⌉ in steps if i_D exists and otherwise stopping only when it hits the cap n. Note (cf. Fig. 8c) that we use increasing to mean non-decreasing, as opposed to strictly increasing.
In contrast, the result for j* says that when i_C exists, then for any partner strategy with i ≥ i_C, the best-response number of defections is ⌊i_C⌋. There is no incentive to try some intermediate number of defections, but only to jump straight to the endpoint of the best-response walk on the side of increasing cooperation. Of course, i_C might not exist, and then there is never an incentive to cooperate more. But when i_C does exist, the best option is to cooperate from the beginning of the game and to continue cooperating for n − ⌊i_C⌋ steps, even if the partner is going to defect in every round. Note that a decision to cooperate more comes from a position of both players already defecting too much (i > i_C), so the prospect and resulting cost of having to survive alone are high. Additional cooperation by the individual also directly benefits the partner, which is different from the case of increasing defection when i < i_D, where the interests of the individual and the partner are not aligned.

The parameter space of Prisoner-Dilemma survival games
Throughout this work, we have been particularly interested in whether cooperation can become favored upon iteration, due to low loner survivability, despite a single-step Prisoner's Dilemma. In this section, we characterize the parameter space of single-step games to see how broadly this holds. We sample parameter sets randomly, subject to (17) and (18), i.e. c > a > d > b and a² > bc, then bin them according to the five possible incentive structures introduced in Table 2 and explored in terms of best-response walks in Sect. 5. Here, we describe these five qualitatively different incentive structures simply, as in Table 3. We may say that cooperation is supported, in the sense of there being at least some checks on defection, under four of the five incentive structures (Types 2-5), but is clearly not supported when additional defection is always favored regardless of the partner's strategy (Type 1). Table 3 gives the results when the survival probabilities (a, b, c, d, a_0) are sampled uniformly at random under two different models. The first model draws from the entire parameter space. This admits many cases where having a partner at all is an obvious disadvantage (a_0 > a, b, c, d) and where the prospects of surviving even a single round may be low. The second model samples over two narrower ranges: (0.9, 1) for a, b, c and d, and (0.7, 1) for a_0. This focuses on games which are not too harsh in a single step, and in which a_0 may be relatively small but may still exceed a, b, c, and d. Note that the cutoffs for cooperation possibly being favored fall between pairwise and individual survival probabilities: a² < a*_0 < a and d² < ā_0, â_0 < d. In this second model, the lower bound for all pairwise survival probabilities is 0.9² = 0.81, so there is a strong chance a loner will be worse off than a pair of individuals.
We generated one million random samples of parameters for each model. For each sample, we took four uniform random numbers in the appropriate range, then sorted and labeled them so that c > a > d > b. Thus, all one million initial samples satisfied assumption (17). We then excluded samples which did not meet assumption (18), namely a^2 > bc. This excluded about 10% of samples in the first model and about 24% in the second model. For each remaining sample, we generated an a_0 uniformly at random, again in the range for each model. Finally, we checked the samples against the criteria in Table 2, binned them into the five qualitatively different incentive structures, and computed the percentages of samples falling into each bin.

Table 3 reports the percent outcomes for one million parameter sets sampled uniformly at random under the two models, in which the single-step game is a Prisoner's Dilemma (c > a > d > b and a^2 > bc): the first with a, b, c, d ∈ (0, 1) and a_0 ∈ (0, 1), the second with a, b, c, d ∈ (0.9, 1) and a_0 ∈ (0.7, 1). The descriptions of incentive structures 2 through 5 hold when the cap n is large enough (n > i_D, i_C).

Under the first model, across the entire parameter space of these games, additional defection is favored against any partner strategy about two-thirds of the time. Most of the other one-third is occupied by games with unbounded stretches of equilibria. Games with stretches of disequilibria are rare. Under the second model, with a restricted parameter space which should tend to favor cooperation, defection is favored regardless of partner strategy less than one-quarter of the time. Bounded stretches of equilibria or disequilibria are more frequent. Unbounded stretches of disequilibria remain rare, which makes sense because this requires a_0 to fall in a narrow interval between two of the cutoffs.
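The sampling and filtering steps just described can be sketched in a few lines. This is a minimal reimplementation under our own naming, not the authors' code, and it stops at the filtering step; binning samples into the five incentive structures would additionally require the criteria of Table 2.

```python
import random

def sample_game(lo, hi, rng):
    """Draw four uniform step-survival probabilities on (lo, hi) and label
    them so that c > a > d > b, i.e. assumption (17)."""
    b, d, a, c = sorted(rng.uniform(lo, hi) for _ in range(4))
    return a, b, c, d

def rejection_rate(lo, hi, n_samples=200_000, seed=1):
    """Fraction of labeled samples violating assumption (18), a^2 > bc."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_samples):
        a, b, c, d = sample_game(lo, hi, rng)
        if a * a <= b * c:
            rejected += 1
    return rejected / n_samples
```

Run with the two models' ranges, rejection_rate(0.0, 1.0) and rejection_rate(0.9, 1.0) should land near the roughly 10% and 24% exclusion rates quoted above.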
We may also quantify the extent of the checks on the number of end-game defections in cases where cooperation is supported. Due to the shapes of i_D and i_C as functions of a_0 (recall Figs. 5 and 6), which remain relatively flat until diverging sharply as a_0 approaches its cutoffs, there are essentially two kinds of games. On the one hand, if a_0 exceeds both cutoffs, then defection is clearly favored. On the other hand, if a_0 lies below either cutoff, then there are rather strong checks on defection. Across all cases in which i_D or i_C existed under the first model in Table 3, the median i_D was 0.3 and the 90th percentile i_D was 2.0. The median i_C was 1.6 and the 90th percentile i_C was 4.6. In the second model, the median i_D was 2.4 and the 90th percentile i_D was 16.8; the median i_C was 4.5 and the 90th percentile i_C was 22.5.

Discussion
We studied the effects of switching from C to D at some point during an iterated, two-player survival game. We focused on the case where each single step is a canonical Prisoner's Dilemma (c > a > d > b, a^2 > bc) and asked how the temptation to defect on one's partner might be offset by the threat of having to survive alone. We found three critical values for the loner survival probability, a_0^* and the two cutoffs associated with i_D and i_C, which establish broad patterns of incentives to cooperate more or defect more. When a_0 is small relative to these values, choosing to defect early in the game is disadvantageous and more cooperative strategies are supported. The opposite is true when a_0 is large.
Our model of strategy choice in an iterated survival game complements previous ones which have assumed that individuals possess fixed single-step strategies. Eshel and Weinshall (1988) modeled single-step strategies as probabilistic mixtures of C and D, with n geometrically distributed and (a, b, c, d) drawn from a distribution with non-zero probabilities for Harmony Games (a ≥ c, b ≥ d) as well as Prisoner's Dilemmas (c > a > d > b). Eshel and Shaked (2001) added the possibility of non-independence of players' survival within each step. Garay (2009) considered mixtures like those of Eshel and Weinshall (1988) but in a game of fixed length and with constant single-step payoffs. Wakeley and Nowak (2019) studied the choice between the two pure, single-step strategies, C and D, in games of fixed length.
We constrained our players to choose among the pure n-step strategies S_i, in which i is the number of end-game defections. This limits their options from the 2^n possible n-step strategies to an array of n + 1 strategies. Their only question is when to start defecting, and they decide this before the game begins rather than reactively during the game. Also, though they may end up alone, the only way for this to happen is for their partner to die. When a_0 is large, our model suggests defecting early in hopes of becoming a loner soon. Throughout this work, we have been motivated by the other possibility, that individuals might want to stay in the game to avoid becoming a loner when a_0 is not too large, in which case our model is much more palatable. But in either scenario, it would be of interest to loosen our restrictions and allow individuals both to react and to become loners without causing their partner's demise.
Outside the context of iterated survival games, two lines of related work have shown how cooperation can be supported when individuals may opt out of interactions. In the first approach, individuals may decide to opt out of a game when loners receive a separate, potentially viable payoff (Hauert et al. 2002a, b; Garcia et al. 2015; Rossine et al. 2020). In our model, a decision to terminate a partnership would result in both players becoming loners, each surviving with probability a_0 in each of the remaining steps of the game. Depending on how this was implemented, opting out could become more attractive than defecting (and waiting for the partner to die) when a_0 is large.
In the second approach, individuals may opt out depending on what the partner has done, especially if the partner defects, and then obtain a new partner (Izquierdo et al. 2010, 2014; Zhang et al. 2016; Zheng et al. 2017). This provides a mechanism for positive assortment in interactions, which is known generally to support cooperation (Eshel and Cavalli-Sforza 1982; Taylor and Nowak 2006). In our model, rejecting one partner and getting another at some step of the game seems at odds with the basic principle that loner survivability should matter. But if assortment took the form of propensities, so that the payoff for S_j was an average over partner strategies S_i which favored i = j, we might infer from Sect. 5.2 that additional defection would be even less advantageous for small a_0 than we find here.
A two-player iterated survival game is a game against Nature in which individuals may succeed by working together. Our players can do this even when choosing strategies which maximize individual survival at the possible expense of the partner, but their success is limited. We have shown that stretches of equilibria may exist (Table 2, Types 2 and 3) in which individuals have no incentive to change strategy. We have also shown (Sect. 5.2) that both players simultaneously cooperating one more time is advantageous regardless of their current strategy. The fact that best-response walks get stuck at i = i_C when they start with i > i_C, and never move if they begin inside a stretch of equilibria (i_D < i < i_C), illustrates the shortcomings of the rational, myopic, noncooperative players of traditional game theory (Binmore 1987).
One way in which players could access better equilibria would be if they were able to collaborate (Sugden 1993; Bacharach 1999; Newton 2017; Rusch 2019). They might, for example, make a rational agreement to consider joint changes in strategy which benefited both players if no unilateral change was advantageous. If such players were constrained to choose among adjacent strategies, as in Sect. 4, this would allow them to proceed through potentially long stretches of equilibria, increasing cooperation until they reached i = i_D. They would then cycle between i_D − 1 and i_D. If instead such players could choose among all strategies, as in Sect. 5, they would jump from any equilibrium strategy straight to the best pairwise strategy, at i = 0, where they would then see the advantageous unilateral move to J_opt. The end result would be a cycle including i = i_D and i = 0 (cf. Fig. 7c and Fig. 8b, c).
Another way to allow the exploration of more favorable equilibria would be for strategy selection to occur via evolution in a finite population (Young and Foster 1991; Kandori et al. 1993; Binmore et al. 1995; Amir and Berninghaus 1996; Binmore and Samuelson 1997; Nowak et al. 2004; Fudenberg et al. 2006; Sandholm 2010). This could be done by including multiple strategies and mutation in the finite-population model of Wakeley and Nowak (2019). Although the details would depend on the size and structure of the population as well as on the structure of mutation (e.g. nearest-neighbor as in Sect. 4, or distributed in some way over all possible strategies), we expect that the attainment of mutually beneficial states like i = i_D would be facilitated relative to our best-response walk or to the infinite-population replicator dynamic (Taylor and Jonker 1978).
This extension to finite populations would not be as simple as the Moran model (Moran 1958, 1962) of Binmore et al. (1995), because both individuals may die in our survival game, but we expect similar results would hold. At stationarity with high mutation rates, multiple strategies would be present in the population at once, whereas with low mutation rates, the population would be monomorphic most of the time. In both cases, the full range of equilibrium strategies could be explored, and in the latter case the population would be concentrated on particular ones of these (Binmore et al. 1995; Fudenberg et al. 2006; Sandholm 2007, 2010). We might predict that individuals' strategies would vary around i_D when stretches of equilibria exist. We might also speculate that when stretches of disequilibria exist, so that i_C is the best state in the stretch when both players have the same strategy, finite-population dynamics could offset the tendency to cycle back to less mutually favorable states.

Our model of switching strategies in a two-player iterated survival game essentially provides a mechanism for 'by-product mutualism' (West-Eberhard 1975; Brown 1983; Mesterton-Gibbons and Dugatkin 1992), specifically one in which cooperative behaviors are supported if the longer-term consequences of cooperation versus defection are taken into account (Mesterton-Gibbons and Dugatkin 1997; Garay 2009; Smaldino et al. 2013; De Jaegher and Hoyer 2016). We have shown how iteration can change the game so as to favor cooperation (De Jaegher 2019), notably without any details of behavior or ecology, only the multiplicative accrual of survival payoffs.
Both i = i_D and a_0 = d^2 are solutions of (63). The solution a_0 = d^2 holds for all i. We want to know how the other solution depends on a_0, and for this we write i_D(a_0). We use a graphical method, depicted in Fig. 9. Specifically, the two solutions of (63) are the points at which a curve, plotted as a function of a_0, crosses the diagonal. This curve increases with a_0 and, for a given a_0 < d^2, it increases with i. Then, because these curves are anchored at a_0 = d^2, the other points at which they cross the diagonal, which we call a_0(i), must also increase with i. Considering two values of i, with i_1 > i_2 > 0, we find that a_0(i_2) < d^2 ⇒ a_0(i_1) > a_0(i_2) and a_0(i_2) > d^2 ⇒ a_0(i_1) > d^2, and that a_0(i_2) > d^2 ⇒ a_0(i_1) > a_0(i_2). Finally, because a_0(i) is a positive, strictly increasing function, its inverse i_D(a_0) is also strictly increasing, which is what we set out to prove.

Appendix 3
A graphical proof shows that the stretch of equilibria grows with a_0 when a_0 is below its cutoff. Assume some k > 0; then (66) follows. As shown in Fig. 10, graphing the two sides of the right-hand equality in (66) as functions of a_0 shows that the two curves intersect at a_0 = d^2 regardless of k. This point anchors all the curves, though it is not a permissible solution of (66) because of the assumption under which (66) was derived; it is marked in Fig. 10 by the thin vertical line at a_0 = d^2, which is close to the cutoff in this case. For any given k, the two curves intersect again at another a_0, which is the solution of (66) and which increases with k. We call this value a_0(k). Then for k_1 > k_2 > 0, we have a_0(k_2) ≤ d^2 ⇒ a_0(k_1) ≥ a_0(k_2) and a_0(k_2) > d^2 ⇒ a_0(k_1) > d^2. Further, a_0(k_2) > d^2 ⇒ a_0(k_1) > a_0(k_2). Therefore a_0(k) is an increasing function, which means that the bigger the difference between i_C and i_D, the bigger a_0 has to be. This proves that the length of the equilibrium stretch, i_C − i_D, increases with a_0, approaching infinite length as a_0 approaches its cutoff. When a_0 lies between the two cutoffs, the situation is like the first case, d^2 ≥ bc, which also has infinite i_C, and the interval of equilibria (i_D, n) shrinks until it disappears once a_0 exceeds the upper cutoff.
The second term in the brackets in (71) is always positive, thanks to the diagonal behavior described in Sect. 5.2. The first term in the brackets is also positive, because l − 1 < i_D, meaning there is a local incentive to defect one more time. Note that here we used the fact that i_D does not depend on the number of rounds in the game (l or n). To finish the proof, we further note that k*(i) = max{k ≥ 1 | A(S_{i+k}, S_i) < A(S_{i+k−1}, S_i)}.