The Evolution of Cooperation Through Institutional Incentives and Optional Participation

Rewards and penalties are common practical tools that can be used to promote cooperation in social institutions. The evolution of cooperation under reward and punishment incentives in joint enterprises has been formalized and investigated, mostly by using compulsory public good games. Recently, Sasaki et al. (2012, Proc Natl Acad Sci USA 109:1165–1169) considered optional participation as well as institutional incentives and described how the interplay between these mechanisms affects the evolution of cooperation in public good games. Here, we present a full classification of these cases of evolutionary dynamics. Specifically, whenever penalties are large enough to cause the bi-stability of both cooperation and defection in cases in which participation in the public good game is compulsory, these penalties will ultimately result in cooperation if participation in the public good game is optional. The global stability of coercion-based cooperation in this optional case contrasts strikingly with the bi-stability that is observed in the compulsory case. We also argue that optional participation is not as effective under rewards as under punishment.

Obviously, whether rewards or penalties, sufficiently large incentives can transform freeloaders into full cooperators, and incentives with small impact do nothing on the outcomes [50]. However, incentivizing is costly, and such heavy incentives often incur serious costs on those who provide the incentives, whether in a peer-to-peer or institutional manner. Previous game-theoretic studies on the evolution of cooperation with incentives have focused on public good games with compulsory participation, and revealed that the intermediate degrees of punishment lead to a couple of stable equilibria, full defection and full cooperation [7,8,42,47,50,54]. In this bi-stable dynamics, establishing full cooperation requires an initially sufficient fraction of cooperators, or ex ante adjustment to overcome the initial condition [8,42]. This situation is a coordination game [57], which is a model of great interest for analyzing a widespread coordination problem (e.g., in choosing distinct technical standards).
In contrast to the traditional case of compulsory participation, another approach to the evolution of cooperation allows individuals to opt out of joint enterprises [1,6,10,18,24,25,32,35,40,49,52,62,65]. The option to opt out can relax the freeloader problem: individuals can exit a joint venture when stuck in a state in which all freeload off one another ("economic stalemate") and then pursue a stand-alone project; if a joint venture with mutual cooperation is more profitable than isolation, individuals who have exited will switch to contributing to the venture. In this situation, however, defection again becomes attractive. Thus, joint enterprises with optional participation can give rise to a rock-paper-scissors cycle [24,25,35,52].
Recently, Sasaki et al. [50] revealed that considering optional participation as well as institutional incentives can effect fully cooperative outcomes for the intermediate ranges of incentives. They demonstrated that opting-out combined with rewarding is not very effective at establishing full cooperation, but opting-out combined with punishment is very effective at establishing cooperation. Although there are a series of existing papers on the interplay of punishment and opting-out mechanisms [9,13,16,26,55,56,61], the main points of these earlier studies comprise solving the puzzling issue of second-order freeloading: the exploitation of the efforts of others to uphold incentives for cooperation [7,38,41,43,63]. Sasaki et al. [50] consider incentives controlled exclusively by a centralized authority (like the empire or state) [2,4,12,31], and thus, their model is already free from the second-order freeloader problem.
Here we analytically provide a full classification of the replicator dynamics in a public good game with institutional incentives and optional participation. We clarify when and how cooperation can be selected over defection in a bi-stable situation associated with institutional punishment without requiring any ability to communicate among individuals. In particular, assuming that the penalties are large enough to cause bi-stability with both full cooperation and full defection (no matter what the basins of attraction are) in cases of compulsory participation, cooperation will necessarily become selected in the long term, regardless of the initial conditions. The paper is organized as follows. In Sect. 2, we formalize optional public good games with institutional incentives and determine the average payoffs for the three strategies: cooperation, defection, and non-participation. In Sect. 3, based on analytical results from compulsory games (Sect. 3.1), we explore the interior equilibrium (Sect. 3.2) and in detail classify global dynamics for the three strategies (Sect. 3.3). Finally, in Sect. 4 we provide further discussion and concluding remarks.

Social Dilemmas
To describe our institutional-incentive model, we start from public good games with group size n ≥ 2. The n players in a group are given the opportunity to participate in a public good game. We assume that participation requires paying a fixed entrance fee σ > 0 to the sanctioning institution, whereas non-participation yields nothing. We denote by m the number of players who are willing to participate (0 ≤ m ≤ n) and assume that at least two participants are required for the game to occur [9,13,24,26,55]. If the game does take place, each of the m participants in the group can decide whether to invest a fixed amount c > 0 into a common pool, knowing that each contribution will be multiplied by r > 1 and then shared equally among all m − 1 other participants in the group. Thus, participants have no direct gain from their own investments [13,15,55,56,63]. If all of the participants invest, they obtain a net payoff (r − 1)c > 0. The game is a social dilemma regardless of the value of r, because participants can always improve their payoffs by withholding their contribution.
Let us next assume that the total incentive stipulated by a sanctioning institution is proportional to the group size m and hence of the form mδ, where δ > 0 is the (potential) per capita incentive. If rewards are employed to incentivize cooperation, these funds will be shared among the so-called "cooperators" who contribute (see [48] for a voluntary reward fund). Hence, each cooperator will obtain a bonus of $m\delta/n_C$, where $n_C$ denotes the number of cooperators in the group of m participants. If penalties are employed to incentivize cooperation, "defectors" who do not contribute will analogously have their payoffs reduced by $m\delta/n_D$, where $n_D$ denotes the number of defectors in the group of m participants ($m = n_C + n_D$).
We consider an infinitely large and well-mixed population of players, from which n players are randomly sampled to form a group for each game. Our analysis of the underlying evolutionary game is based on the replicator dynamics [30] for the three corresponding strategies of the cooperator, defector, and non-participant, with respective frequencies x, y, and z. The combination of all possible values of (x, y, z) with x, y, z ≥ 0 and x + y + z = 1 forms the triangular state space Δ. We denote by C, D, and N the three vertices of Δ that correspond to the three homogeneous states in which all cooperate (x = 1), defect (y = 1), or are non-participants (z = 1), respectively. For Δ, the replicator dynamics is defined by

$$\dot{x} = x\left(P_C^s - \bar{P}^s\right), \qquad \dot{y} = y\left(P_D^s - \bar{P}^s\right), \qquad \dot{z} = z\left(P_N^s - \bar{P}^s\right), \tag{1}$$

where $\bar{P}^s$ denotes the average payoff in the entire population; $P_C^s$, $P_D^s$, and $P_N^s$ denote the expected payoff values for cooperators, defectors, and non-participants, respectively; and s = o, r, p is used to specify one of three different incentive schemes, namely, "without incentives," "with rewards," and "with punishment," respectively. Because non-participants have a payoff of 0, $P_N^s = 0$, and thus, $\bar{P}^s = xP_C^s + yP_D^s$. We note that if (r − 1)c > σ, the three edges of the state space Δ form a heteroclinic cycle without incentives: N → C → D → N (Figs. 2a or 3a). Defectors dominate cooperators because of the cost of contribution c, and non-participants dominate defectors because of the cost of participation σ. Finally, cooperators dominate non-participants because of the net benefit from the public good game with (r − 1)c > σ. In the interior of Δ, all of the trajectories originate from and converge to N, which is a non-hyperbolic equilibrium. Hence, cooperation can emerge only in brief bursts, sparked by random perturbations [13,25].

Payoffs
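Here, we calculate the average payoff for the whole population and the expected payoff values for cooperators and defectors. In a group with m − 1 co-participants (m = 2, ..., n), a defector or a cooperator obtains from the public good game an average payoff of rcx/(1 − z) [13]. Hence,

$$P_D^o = \left(\frac{rcx}{1-z} - \sigma\right)\left(1 - z^{n-1}\right).$$

Note that $z^{n-1}$ is the probability of finding no co-players and, thus, of being reduced to non-participation. In addition, cooperators contribute c with probability $1 - z^{n-1}$, and thus,

$$P_C^o - P_D^o = -c\left(1 - z^{n-1}\right),$$

so that $\bar{P}^o = xP_C^o + yP_D^o = \left(1 - z^{n-1}\right)\left[(r-1)cx - \sigma(1-z)\right]$.

We now turn to the cases with institutional incentives. First, we consider penalties. Because cooperators never receive penalties, we have $P_C^p = P_C^o$. In a group in which the m − 1 co-participants include k cooperators (and thus, m − 1 − k defectors), switching from defecting to cooperating implies avoiding the penalty mδ/(m − k). Hence,

$$P_C^p - P_D^p = -c\left(1 - z^{n-1}\right) + \delta\sum_{m=2}^{n}\binom{n-1}{m-1}(1-z)^{m-1}z^{n-m}\sum_{k=0}^{m-1}\binom{m-1}{k}\left(\frac{x}{1-z}\right)^{k}\left(\frac{y}{1-z}\right)^{m-1-k}\frac{m}{m-k} = (\delta - c)\left(1 - z^{n-1}\right) + \delta\,\frac{x}{y}\left[1 - (1-y)^{n-1}\right],$$

and thus,

$$\bar{P}^p = \bar{P}^o - \delta\left\{y\left(1 - z^{n-1}\right) + x\left[1 - (1-y)^{n-1}\right]\right\}.$$

Next, we consider rewards. It is now the defectors who are unaffected, implying $P_D^r = P_D^o$. In a group with m − 1 co-participants, including k cooperators, switching from defecting to cooperating implies obtaining the reward mδ/(k + 1). Hence,

$$P_C^r - P_D^r = -c\left(1 - z^{n-1}\right) + \delta\sum_{m=2}^{n}\binom{n-1}{m-1}(1-z)^{m-1}z^{n-m}\sum_{k=0}^{m-1}\binom{m-1}{k}\left(\frac{x}{1-z}\right)^{k}\left(\frac{y}{1-z}\right)^{m-1-k}\frac{m}{k+1} = (\delta - c)\left(1 - z^{n-1}\right) + \delta\,\frac{y}{x}\left[1 - (1-x)^{n-1}\right],$$

and thus,

$$\bar{P}^r = \bar{P}^o + \delta\left\{x\left(1 - z^{n-1}\right) + y\left[1 - (1-x)^{n-1}\right]\right\}.$$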
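The payoff difference between cooperators and defectors can be checked numerically. The following is a minimal sketch, not the authors' code: it sums the incentive terms over all group compositions exactly as described in the text, and compares the result with a closed form obtained by carrying out the binomial sums. The function names and the closed-form expression are assumptions of this sketch.

```python
from math import comb

def payoff_gap_enumerated(x, y, z, n, c, delta, scheme):
    """P_C - P_D, summed over group sizes m and cooperating co-participants k."""
    f = x / (1.0 - z)                        # cooperator share among participants
    gap = 0.0
    for m in range(2, n + 1):                # m participants, focal player included
        p_m = comb(n - 1, m - 1) * (1 - z) ** (m - 1) * z ** (n - m)
        for k in range(m):                   # k cooperators among m - 1 co-participants
            p_k = comb(m - 1, k) * f ** k * (1 - f) ** (m - 1 - k)
            if scheme == "p":                # penalty avoided by cooperating
                incentive = m * delta / (m - k)
            else:                            # reward gained by cooperating
                incentive = m * delta / (k + 1)
            gap += p_m * p_k * (incentive - c)
    return gap

def payoff_gap_closed(x, y, z, n, c, delta, scheme):
    """Closed form of the same sum (an assumption of this sketch); needs x, y > 0."""
    base = (delta - c) * (1 - z ** (n - 1))
    if scheme == "p":
        return base + delta * x * (1 - (1 - y) ** (n - 1)) / y
    return base + delta * y * (1 - (1 - x) ** (n - 1)) / x
```

For interior states (x, y > 0 and z < 1) the two functions should agree up to rounding error, which gives a quick consistency check on any hand-derived payoff expression.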

Coordination and Coexistence for Compulsory Participation
We investigated the interplay of institutional incentives and optional participation. As a first step, we considered replicator dynamics along the three edges of the state space Δ. On the DN-edge (x = 0), this dynamics is always D → N because the payoff for non-participating is better than that for defecting by at least the participation fee σ, regardless of whether penalties or rewards are in place. On the NC-edge (y = 0), it is obvious that if the public good game is too expensive (i.e., if σ ≥ (r − 1)c under penalties or σ ≥ (r − 1)c + δ under rewards), players will opt for non-participation rather than cooperation. Indeed, N becomes a global attractor because ż > 0 holds in Δ \ {z = 0}. We do not consider these cases further but assume that the dynamics of the NC-edge is always N → C.
On the CD-edge (z = 0), the dynamics corresponds to compulsory participation, and Eq. (1) reduces to

$$\dot{x} = x(1-x)\left(P_C^s - P_D^s\right).$$

Clearly, both ends, C (x = 1) and D (x = 0), are fixed points. Under penalties, the term for the payoff difference is

$$P_C^p - P_D^p = -c + \delta\sum_{i=0}^{n-1}x^i.$$

Under rewards, it is

$$P_C^r - P_D^r = -c + \delta\sum_{i=0}^{n-1}(1-x)^i.$$

Because δ > 0, $P_C^p - P_D^p$ strictly increases, and $P_C^r - P_D^r$ strictly decreases, with x. The condition under which there exists an interior equilibrium R on the CD-edge is

$$\delta_- < \delta < \delta_+, \qquad \text{with } \delta_- = \frac{c}{n} \text{ and } \delta_+ = c.$$

Next, we summarize the game dynamics for compulsory public good games (Fig. 1). For δ so small that $\delta < \delta_-$, defection is the unique outcome; D is globally stable, and C is unstable. For δ so large that $\delta > \delta_+$, cooperation is the unique outcome; C is globally stable, and D is unstable. For the intermediate values of δ, cooperation evolves in different ways under penalties versus rewards, as follows. Under penalties (Fig. 1a), as δ crosses the threshold $\delta_-$, C becomes stable, and an unstable interior equilibrium R splits off from C. The point R separates the basins of attraction of C and D. Penalties cause bi-stable competition between cooperators and defectors, which is often exhibited as a coordination game [57]; one or the other norm will become established, but there can be no coexistence. With increasing δ, the basin of attraction of D becomes increasingly smaller, until δ attains the value of $\delta_+$. Here, R merges with the formerly stable D, which becomes unstable.
In contrast, under rewards (Fig. 1b), as δ crosses the threshold $\delta_-$, D becomes unstable, and a stable interior equilibrium R splits off from D. The point R is a global attractor. Rewards give rise to the stable coexistence of cooperators and defectors, which is a typical result in a snowdrift game [58]. As δ increases, the fraction of cooperators within the stable coexistence becomes increasingly larger. Finally, as δ reaches the other threshold $\delta_+$, R merges with the formerly unstable C, which becomes stable. We note that neither $\delta_+$ nor $\delta_-$ depends on whether we take into account rewards or penalties.

Fig. 1 (a) Punishment. When δ increases beyond the threshold $\delta_-$, an unstable interior equilibrium R enters the state space at x = 1, moves left, and eventually exits it at x = 0 for $\delta = \delta_+$. (b) Rewards. When δ increases beyond the threshold $\delta_-$, a (globally) stable interior equilibrium R enters the state space at x = 0, moves right, and eventually exits it at x = 1 for $\delta = \delta_+$. Consequently, for the interval $\delta_- < \delta < \delta_+$ (gray-colored region), punishment results in bi-stability of both pure states; rewards lead to a stable mixture independent of the initial state. Parameters: n = 5, r = 3, c = 1, and σ = 0.5
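The location of the boundary equilibrium R in the compulsory game can be illustrated with a short numerical sketch. It assumes that, on the CD-edge, the payoff difference equals −c + δ(1 + u + ... + u^(n−1)), with u = x under penalties and u = 1 − x under rewards; this form is an assumption of the sketch, chosen because it reproduces the thresholds c/n and c discussed in the text. The function names are hypothetical.

```python
def edge_gap(x, n, c, delta, scheme):
    """Assumed P_C - P_D on the CD-edge (z = 0) at cooperator frequency x."""
    u = x if scheme == "p" else 1.0 - x
    return -c + delta * sum(u ** i for i in range(n))

def boundary_equilibrium(n, c, delta, scheme):
    """Return x_R in (0, 1), or None when delta lies outside (c/n, c)."""
    if not (c / n < delta < c):
        return None
    lo, hi = 0.0, 1.0                        # bisect on u; the sum increases with u
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(mid ** i for i in range(n)) < c / delta:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return u if scheme == "p" else 1.0 - u
```

Under these assumptions the two schemes are mirror images: for the same δ, the reward equilibrium sits at one minus the penalty equilibrium. The sign of `edge_gap` just above R then shows the difference in stability, positive under penalties (R repels) and negative under rewards (R attracts).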

The Interior Equilibrium Q for Optional Participation
Now, we consider the interior of the state space Δ. We start by exploring the fixed point in the interior. For this purpose, we introduce the coordinate system (f, z) in Δ \ {z = 1}, with f = x/(x + y), and we rewrite Eq. (1) as

$$\dot{f} = f(1-f)\left(P_C^s - P_D^s\right), \qquad \dot{z} = -z\bar{P}^s. \tag{10}$$

Dividing the right-hand side of Eq. (10) by $1 - z^{n-1}$, which is positive in Δ \ {z = 1}, corresponds to a change in velocity and does not affect the orbits in Δ [30]. Using Eqs. (3)-(6), this transforms Eq. (10) into the following. Under penalties, Eq. (10) becomes

$$\dot{f} = f(1-f)\left[-c + \delta\left(1 + fH(f,z)\right)\right], \qquad \dot{z} = z(1-z)\left[\sigma - (r-1)cf + \delta(1-f)\left(1 + fH(f,z)\right)\right], \tag{11}$$

whereas under rewards, it becomes

$$\dot{f} = f(1-f)\left[-c + \delta\left(1 + (1-f)H(1-f,z)\right)\right], \qquad \dot{z} = z(1-z)\left[\sigma - (r-1)cf - \delta f\left(1 + (1-f)H(1-f,z)\right)\right], \tag{12}$$

where

$$H(f,z) = \frac{1 - \left[z + f(1-z)\right]^{n-1}}{(1-f)\left(1 - z^{n-1}\right)}. \tag{13}$$

Note that $H(f,0) = \sum_{i=0}^{n-2} f^i$ and $H(f,1) = 1$ (the latter in the limit z → 1). At an interior equilibrium $Q = (f_Q, z_Q)$, the three different strategies must have equal payoffs, which, in our model, means that they all must equal 0. The conditions $P_C^o = P_C^p = 0$ under penalties and $P_D^o = P_D^r = 0$ under rewards imply that $f_Q$ is given by

$$f_{Q(p)} = \frac{c + \sigma}{rc} \qquad \text{and} \qquad f_{Q(r)} = \frac{\sigma}{rc}, \tag{14}$$

respectively. Thus, if it exists, the interior equilibrium Q must be located on the line given by $f = f_Q$. From Eqs. (11) and (12), Q must satisfy

$$H\left(f_{Q(p)}, z_Q\right) = \frac{c - \delta}{\delta f_{Q(p)}} \ \text{(penalties)}, \qquad H\left(1 - f_{Q(r)}, z_Q\right) = \frac{c - \delta}{\delta\left(1 - f_{Q(r)}\right)} \ \text{(rewards)}. \tag{15}$$

When there are only two players (i.e., pairwise interactions with n = 2), there are either no interior equilibria or else a line of interior equilibria that connects R and N (the latter situation can arise for only one choice of δ). A summary of the dynamics for n = 2 is given in Sect. 3.4. Here we analyze the general case of a public good game with more than two players (i.e., n > 2). Then, if Q exists, it is uniquely determined and a saddle point, whether incentives are penalties or rewards (see Appendices A.1 and A.2 for detailed proofs of the uniqueness and the saddle, respectively). As δ increases, Q splits off from R (with $x_R = f_Q$), moves across the state space along the line given by Eq. (14), and finally exits this space through N. The function H decreases with increasing z, and the right-hand side of Eq. (15) decreases with increasing δ, which implies that $z_Q$ increases with δ. By substituting Eq. (13) into Eq. (15), we find that the threshold values of δ for Q's entrance into (z = 0) and exit from (z = 1) the state space are respectively given by

$$\underline{\delta}_s = \frac{c}{\sum_{i=0}^{n-1} B^i} \qquad \text{and} \qquad \bar{\delta}_s = \frac{c}{1 + B}, \tag{16}$$

where $B = f_{Q(p)}$ (and s = p) under penalties, and $B = 1 - f_{Q(r)}$ (and s = r) under rewards. We note that $\delta_- < \underline{\delta}_s \le \bar{\delta}_s < \delta_+$, where $\underline{\delta}_s = \bar{\delta}_s$ holds only for n = 2.
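To make the path of Q concrete, the following sketch computes the line f = f_Q, the two thresholds of δ between which Q exists, and the height z_Q for a given δ. It relies on assumptions that are not legible in the text: a closed form for H chosen to reproduce the stated boundary values H(f, 0) = 1 + f + ... + f^(n−2) and H(f, 1) = 1, the equilibrium condition H(B, z_Q) = (c − δ)/(δB), and the resulting thresholds c/(1 + B + ... + B^(n−1)) and c/(1 + B). All function names are hypothetical.

```python
def H(f, z, n):
    """Assumed closed form of H, valid for 0 <= f < 1 and 0 <= z <= 1."""
    if z >= 1.0:
        return 1.0                           # stated boundary value H(f, 1) = 1
    y = (1.0 - f) * (1.0 - z)                # defector frequency in the population
    return (1.0 - (1.0 - y) ** (n - 1)) / ((1.0 - f) * (1.0 - z ** (n - 1)))

def interior_thresholds(n, r, c, sigma, scheme):
    """Return (f_Q, B, entrance threshold, exit threshold) for scheme 'p' or 'r'."""
    f_q = (c + sigma) / (r * c) if scheme == "p" else sigma / (r * c)
    B = f_q if scheme == "p" else 1.0 - f_q
    enter = c / sum(B ** i for i in range(n))  # Q enters through R (z = 0)
    leave = c / (1.0 + B)                      # Q exits through the edge z = 1
    return f_q, B, enter, leave

def z_of_Q(delta, n, r, c, sigma, scheme):
    """Solve H(B, z) = (c - delta)/(delta * B) for z by bisection, else None."""
    _, B, enter, leave = interior_thresholds(n, r, c, sigma, scheme)
    if not (enter < delta < leave):
        return None
    target = (c - delta) / (delta * B)
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if H(B, mid, n) > target:            # H decreases with z, so move right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the parameters of Fig. 1 (n = 5, r = 3, c = 1, σ = 0.5) and penalties, this sketch gives B = 0.5, so Q would exist only for δ between 1/1.9375 and 2/3, and z_Q rises with δ inside that window. For n = 2 the two thresholds coincide.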

Classification of Global Dynamics
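Here, we analyze in detail the global dynamics using Eqs. (11) and (12) on the unit square U in the (f, z) coordinates, whose edge z = 1 is contracted to N in Δ. The four vertices of U, namely C = (1, 0), D = (0, 0), $N_1 = (1, 1)$, and $N_0 = (0, 1)$, are hyperbolic equilibria, except when each undergoes bifurcation (as shown later). We note that the dynamics on the $N_1N_0$-edge is unidirectional to $N_0$ without incentives.

First, we examine penalties. From Eq. (11), the Jacobians at C and $N_1$ are respectively given by

$$J_C = \begin{pmatrix} c - n\delta & 0 \\ 0 & \sigma - (r-1)c \end{pmatrix} \qquad \text{and} \qquad J_{N_1} = \begin{pmatrix} c - 2\delta & 0 \\ 0 & (r-1)c - \sigma \end{pmatrix}.$$

From our assumption that (r − 1)c > σ, it follows that if δ < c/n, then $\det J_C < 0$, and thus, C is a saddle point; otherwise, $\det J_C > 0$ and $\operatorname{tr} J_C < 0$, and thus, C is a sink. Regarding $N_1$, if δ < c/2, $N_1$ is a source ($\det J_{N_1} > 0$ and $\operatorname{tr} J_{N_1} > 0$); otherwise, $N_1$ is a saddle ($\det J_{N_1} < 0$). Next, the Jacobians at D and $N_0$ are respectively given by

$$J_D = \begin{pmatrix} \delta - c & 0 \\ 0 & \sigma + \delta \end{pmatrix} \qquad \text{and} \qquad J_{N_0} = \begin{pmatrix} \delta - c & 0 \\ 0 & -(\sigma + \delta) \end{pmatrix}.$$

If δ < c, D is a saddle point ($\det J_D < 0$), and $N_0$ is a sink ($\det J_{N_0} > 0$ and $\operatorname{tr} J_{N_0} < 0$); otherwise, D is a source ($\det J_D > 0$ and $\operatorname{tr} J_D > 0$), and $N_0$ is a saddle point ($\det J_{N_0} < 0$).

We also analyze the stability of R. As δ increases from c/n to c, the boundary repellor $R = (x_R, 0)$ enters the CD-edge at C and then moves to D. The Jacobian at R is given by

$$J_R = \begin{pmatrix} \delta x_R(1 - x_R)\left(H + x_R\,\partial H/\partial f\right) & \delta x_R^2(1 - x_R)\,\partial H/\partial z \\ 0 & c + \sigma - rcx_R \end{pmatrix},$$

with H = H(f, z) and its derivatives evaluated at $(x_R, 0)$. Its upper diagonal component is positive because ∂H(f, z)/∂f ≥ 0 and H > 0, whereas the lower component vanishes at $x_R = f_{Q(p)} = (c + \sigma)/(rc)$. Therefore, if $f_{Q(p)} < x_R < 1$, R is a saddle point ($\det J_R < 0$) and stable with respect to z; otherwise, if $0 < x_R < f_{Q(p)}$, R is a source ($\det J_R > 0$ and $\operatorname{tr} J_R > 0$).

In addition, a new boundary equilibrium $S = (x_S, 1)$ can appear along the $N_1N_0$-edge. Solving $\dot{f}(x_S, 1) = 0$ in Eq. (11) yields $x_S = (c - \delta)/\delta$; thus, S is unique. S is a repellor along the edge (as is R). As δ increases, S enters the edge at $N_1$ (for δ = c/2) and exits it at $N_0$ (for δ = c). The Jacobian at S is given by

$$J_S = \begin{pmatrix} \delta x_S(1 - x_S)\left(H + x_S\,\partial H/\partial f\right) & \delta x_S^2(1 - x_S)\,\partial H/\partial z \\ 0 & rcx_S - c - \sigma \end{pmatrix},$$

evaluated at $(x_S, 1)$. Again, its upper diagonal component is positive. Using $x_S = (c - \delta)/\delta$, we find that the sign of the lower component changes once, from positive to negative, as δ increases from c/2 to c. Therefore, S is initially a source ($\det J_S > 0$ and $\operatorname{tr} J_S > 0$) but then turns into a saddle point ($\det J_S < 0$), which is stable with respect to z.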
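Let us now turn to rewards. From Eq. (12), the Jacobians at D and $N_0$ are

$$J_D = \begin{pmatrix} n\delta - c & 0 \\ 0 & \sigma \end{pmatrix} \qquad \text{and} \qquad J_{N_0} = \begin{pmatrix} 2\delta - c & 0 \\ 0 & -\sigma \end{pmatrix}.$$

If δ < c/n, D is a saddle point ($\det J_D < 0$); otherwise, D is a source ($\det J_D > 0$ and $\operatorname{tr} J_D > 0$). Regarding $N_0$, if δ < c/2, $N_0$ is a sink ($\det J_{N_0} > 0$ and $\operatorname{tr} J_{N_0} < 0$); otherwise, $N_0$ is a saddle point ($\det J_{N_0} < 0$). Meanwhile, the Jacobians at C and $N_1$ are

$$J_C = \begin{pmatrix} c - \delta & 0 \\ 0 & \sigma - (r-1)c - \delta \end{pmatrix} \qquad \text{and} \qquad J_{N_1} = \begin{pmatrix} c - \delta & 0 \\ 0 & (r-1)c + \delta - \sigma \end{pmatrix}.$$

From (r − 1)c > σ − δ, it follows that if δ < c, C is a saddle point ($\det J_C < 0$), and $N_1$ is a source ($\det J_{N_1} > 0$ and $\operatorname{tr} J_{N_1} > 0$); otherwise, C is a sink ($\det J_C > 0$ and $\operatorname{tr} J_C < 0$), and $N_1$ is a saddle point ($\det J_{N_1} < 0$).

We also analyze the stability of R. As δ increases from c/n to c, the boundary attractor R enters the CD-edge at D and then moves toward C. The Jacobian at R is given by

$$J_R = \begin{pmatrix} \delta x_R(1 - x_R)\left[-H + (1 - x_R)\,\partial H/\partial f\right] & \delta x_R(1 - x_R)^2\,\partial H/\partial z \\ 0 & \sigma - rcx_R \end{pmatrix},$$

with H = H(1 − f, z) and its derivatives evaluated at $(x_R, 0)$. Its upper diagonal component is negative because ∂H(1 − f, z)/∂f ≤ 0 and H > 0, and the lower component vanishes at $x_R = f_{Q(r)} = \sigma/(rc)$. Therefore, if $0 < x_R < f_{Q(r)}$, R is a saddle point ($\det J_R < 0$) and unstable with respect to z; otherwise, if $f_{Q(r)} < x_R < 1$, R is a sink ($\det J_R > 0$ and $\operatorname{tr} J_R < 0$).

Similarly, a boundary equilibrium S can appear along the $N_1N_0$-edge. Solving $\dot{f}(x_S, 1) = 0$ in Eq. (12) yields $x_S = 1 - (c - \delta)/\delta$, and thus, S is unique. S is an attractor along the edge (as is R). As δ increases, S enters the edge at $N_0$ (for δ = c/2) and exits at $N_1$ (for δ = c). The Jacobian at S is

$$J_S = \begin{pmatrix} \delta x_S(1 - x_S)\left[-H + (1 - x_S)\,\partial H/\partial f\right] & \delta x_S(1 - x_S)^2\,\partial H/\partial z \\ 0 & rcx_S - \sigma \end{pmatrix},$$

evaluated at $(x_S, 1)$. Again, its upper diagonal component is negative. Using $x_S = 1 - (c - \delta)/\delta$, we find that the sign of the lower component changes once, from negative to positive, as δ increases from c/2 to c. Therefore, S is initially a sink ($\det J_S > 0$ and $\operatorname{tr} J_S < 0$) and then becomes a saddle point ($\det J_S < 0$), which is unstable with respect to z.

We give a full classification of the global dynamics, as follows.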
1. For $0 \le \delta < \delta_-$ (Figs. 2a and 3a), C and D are saddle points, $N_1$ is a source, and $N_0$ is a sink. There is no other equilibrium, and $\dot{f} < 0$ holds in the interior state space. All interior orbits originate from $N_1$ and converge to $N_0$. $N_0$ is globally stable. After applying the contraction map, we find that the interior of Δ is filled with homoclinic orbits originating from and converging to N.

2. As δ crosses $\delta_-$ (Figs. 2b and 3b), under penalties, C becomes a sink, and the saddle point R enters the CD-edge at C; under rewards, D turns into a source, and R enters the same edge through D.

Penalties. There exists an orbit originating from $N_1$ and converging to R that separates the basins of attraction of C and $N_0$. All of the orbits in the basin of $N_0$ have their α-limits at $N_1$. Hence, the corresponding region in Δ is filled with homoclinic orbits and is surrounded by a heteroclinic cycle N → R → D → N. However, if the population is in the vicinity of N, small and rare random perturbations will eventually send the population into the basin of attraction of C (as is the case for c/2 < δ).

Rewards. There exists an orbit originating from R and converging to $N_0$. In contrast to the case with penalties, $N_0$ remains a global attractor. A region separated by the orbit $RN_0$ encloses orbits with $N_1$ as their α-limit. Therefore, in Δ, the corresponding region is filled with homoclinic orbits that are surrounded by a heteroclinic cycle N → C → R → N.

3. As δ crosses c/2 (Figs. 2c and 3c), under penalties, $N_1$ becomes a saddle point, and the source S enters the $N_1N_0$-edge at $N_1$; under rewards, $N_0$ becomes a saddle point, and the sink S enters the same edge at $N_0$. As δ increases, S moves toward $N_0$ (penalties) or $N_1$ (rewards).

Penalties. If $c/2 < \underline{\delta}_p$ holds, where $\underline{\delta}_p$ is the threshold at which Q enters the state space, then for $c/2 < \delta < \underline{\delta}_p$, there is still an orbit originating from S and converging to R that separates the interior of Δ into the basins of attraction of C and $N_0$. All of the orbits in the basin of $N_0$ have their α-limits at $N_1$, as before. In Δ, the separatrix NR and the NC-edge now intersect transversally at N, and the entrance of a minority of participants (including cooperators and defectors) into the greater population of non-participants may be successful.

Rewards. If $c/2 < \underline{\delta}_r$ holds, then for $c/2 < \delta < \underline{\delta}_r$, there exists an orbit originating from R and converging to S that divides the interior of Δ into two regions: one of them consists of orbits originating from $N_1$, corresponding in Δ to a region filled with homoclinic orbits; the other one consists of orbits originating from D.

4. Penalties. As δ crosses $\underline{\delta}_p$ (Fig. 2d), the saddle point Q enters the interior of Δ through R, which becomes a source. Based on the uniqueness of Q and the Poincaré-Bendixson theorem ([30], Appendix A.3), we can see that there is no homoclinic orbit originating from and converging to Q, and the unstable manifold of Q must consist of an orbit converging to C and an orbit converging to $N_0$; the stable manifold of Q must consist of an orbit originating from D and an orbit originating from S (or, in the case that $\underline{\delta}_p < c/2$, from $N_1$ for $\underline{\delta}_p < \delta < c/2$). The stable manifold separates the basins of attraction of C and $N_0$; the unstable manifold separates the basin for $N_0$ into two regions. One of these regions is filled with orbits originating from S (or from $N_1$ under the above conditions) and converging to $N_0$. For Δ, this means that the corresponding region is filled with homoclinic orbits and is surrounded by a heteroclinic cycle N → Q → N (Fig. 2d). As δ further increases, Q moves across U, from the CD-edge to the $N_1N_0$-edge along the line $f = f_{Q(p)}$.

Rewards. As δ crosses $\underline{\delta}_r$ (Fig. 3d), Q enters the interior of Δ through R, which becomes a sink. As δ continues to increase, Q similarly moves to the $N_1N_0$-edge, along the line $f = f_{Q(r)}$. There is no homoclinic loop for Q, as under penalties, and now, we find that the stable manifold of Q must consist of two orbits originating from D and $N_1$; the unstable manifold of Q must consist of an orbit converging to R and another converging to S (or, in the case that $\underline{\delta}_r < c/2$, to $N_0$ for $\underline{\delta}_r < \delta < c/2$ (Fig. 3c)). The stable manifold separates the basins of attraction of R and S (or $N_0$ under the above conditions); the unstable manifold separates the basin for S (or $N_0$) into two regions. One of these regions is filled with orbits issuing from $N_1$ and converging to S (or $N_0$). The corresponding region in Δ is filled with homoclinic orbits and is surrounded by a heteroclinic cycle N → Q → N (Figs. 3c and 3d). If the population is in the vicinity of N, small and rare random perturbations will eventually send the population into the basin of attraction of R (as is the case for $\underline{\delta}_r < \delta$).

5. As δ crosses the exit threshold $\bar{\delta}_p$ under penalties (Fig. 2e) or $\bar{\delta}_r$ under rewards (Fig. 3e), Q exits the state space through S, which then becomes a saddle point. For larger values of δ, there is no longer an interior equilibrium.

6. Finally, as δ crosses $\delta_+$ (Figs. 2f and 3f), R and S simultaneously exit U, through D and $N_0$ (penalties) or C and $N_1$ (rewards), respectively. For $\delta_+ < \delta$, $N_1$ and $N_0$ are saddle points, D is a source, and C is a sink. $\dot{f} > 0$ holds throughout the state space. All of the interior orbits originate from D and converge to C. Hence, C is globally stable.

Fig. 2 Penalties. The triangle Δ is the image of the unit square U under a contraction map, which is injective except for z = 1. The edge z = 1 is contracted to N. The vertices of U are denoted by C = (1, 0), D = (0, 0), $N_1 = (1, 1)$, and $N_0 = (0, 1)$. The stream plot is based on Eq. (11). Dotted and dashed curves in U denote where $\dot{f}$ and $\dot{z}$ vanish, respectively. (a) Without incentives, the interior of U is filled with orbits originating from $N_1$ and then converging to $N_0$, which correspond to homoclinic cycles fully covering the interior of Δ. (b) As δ increases, the equilibrium R (a saddle point) first enters the CD-edge at C, which then becomes a sink. (c) When δ crosses c/2, the equilibrium S (a source) enters the $N_1N_0$-edge at $N_1$, which then becomes a saddle point. (d) When δ crosses $\underline{\delta}_p$, the saddle point Q enters the interior of Δ through R, which then becomes a source. Q traverses U along a horizontal line. (e) When δ crosses $\bar{\delta}_p$, Q exits Δ through S, which then becomes a saddle. For larger values of δ, there is no interior orbit that originates from the $N_1N_0$-edge and converges to it, and thus, Δ has no homoclinic cycle. When δ crosses $\delta_+$, R and S exit Δ through D, which becomes a source, and $N_0$, which becomes a saddle, respectively. (f) For $\delta > \delta_+$, the interiors of U and Δ are filled with orbits originating from D and converging to C. Parameters are the same as in Fig. 1

Fig. 3 Rewards. For larger values of δ, there is no interior orbit that originates from the $N_1N_0$-edge and converges to it and, thus, Δ has no homoclinic cycle. When δ crosses $\delta_+$, R and S exit Δ through C, which becomes a sink, and $N_1$, which becomes a saddle, respectively. (f) For $\delta > \delta_+$, C is a global attractor as in Fig. 2f. The parameters are the same as in Figs. 1 and 2, except δ = 0 (a)
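The stability of the four corners of U in the classification above can be tabulated with a few lines of code. This is a sketch under stated assumptions: at each corner the Jacobian of the rescaled (f, z) system is taken to be diagonal, and the eigenvalue expressions below are reconstructed from the payoff definitions so that they reproduce the thresholds c/n, c/2, and c quoted in the text. The function names are hypothetical.

```python
def corner_eigenvalues(n, r, c, sigma, delta, scheme):
    """Assumed eigenvalue pairs (along f, along z) at the corners of U."""
    if scheme == "p":                        # institutional punishment
        return {
            "C":  (c - n * delta, sigma - (r - 1) * c),
            "D":  (delta - c, sigma + delta),
            "N1": (c - 2 * delta, (r - 1) * c - sigma),
            "N0": (delta - c, -(sigma + delta)),
        }
    return {                                 # institutional rewards
        "C":  (c - delta, sigma - (r - 1) * c - delta),
        "D":  (n * delta - c, sigma),
        "N1": (c - delta, (r - 1) * c + delta - sigma),
        "N0": (2 * delta - c, -sigma),
    }

def classify(eigs):
    """Classify a planar equilibrium from two real eigenvalues."""
    a, b = eigs
    if a * b < 0:
        return "saddle"
    if a > 0 and b > 0:
        return "source"
    if a < 0 and b < 0:
        return "sink"
    return "non-hyperbolic"                  # at least one zero eigenvalue

def corner_types(n, r, c, sigma, delta, scheme):
    """Map each corner name to 'saddle', 'source', 'sink' or 'non-hyperbolic'."""
    eigs = corner_eigenvalues(n, r, c, sigma, delta, scheme)
    return {name: classify(pair) for name, pair in eigs.items()}
```

Calling `corner_types(5, 3, 1, 0.5, 0.1, "p")` corresponds to item 1 of the list (δ below c/n), and a value such as δ = 1.2 corresponds to item 6 (δ above c). Exactly at a threshold one eigenvalue is zero, and the sketch reports the corner as non-hyperbolic.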
At this value of δ, under both types of incentives, R and S in U undergo bifurcation simultaneously, and the line $f = f_Q$ given in Eq. (14), which consists of a continuum of equilibria, connects R and S (and in Δ, R and N) (Fig. 4). When δ does not take the specific value in Eq. (25), there is no interior equilibrium, and the global dynamics is classified as in the general case when n > 2 (see items 1-3, 5, and 6 of the list in Sect. 3.3). Within pairwise interactions, therefore, the interior dynamics degenerates. This exceptional case was not described in Sasaki et al. [50].

Fig. 4 The degeneracy arises at δ = 2/3 for penalties as well as for rewards

Discussion
We considered a model for the evolution of cooperation through institutional incentives and analyzed in detail the evolutionary game dynamics. We employed public good games, which typically assume that there are at least three players. Specifically, based on a public good game with optional participation, we fully analyzed how opting out affects the game dynamics; in particular, opting out can completely overcome a coordination problem associated with punishment for a considerably broader range of parameters than in cases of compulsory participation. We start by assuming that there is a state-like institution that takes exclusive control of individual-level sanctions in the form of penalties and rewards. In our extended model, nobody is forced to enter a joint enterprise that is protected by the institutional sanctioning. However, whoever is willing to enter must pay an entrance fee. Further, if one proves unable or unwilling to pay, the sanctioning institution can ban that person from participation in the game. Indeed, joint ventures in real life are mostly protected by enforceable contracts in which members can freely participate, but are then bound by a higher authority. For example, anyone can opt not to take a wedding vow, but once it is taken, it is among the strongest enforceable contracts. As far as we know, higher authorities always demand penalties if contracts are broken.
Based on our mathematical analysis, we argue that institutional punishment, rather than institutional rewards, can become a more viable incentivization scheme for cooperation when combined with optional participation. In spite of the fact that the expected payoffs include nonlinear terms, the corresponding replicator dynamics is completely analyzed: in particular, proving that the interior equilibrium for optional participation is unique and a saddle point plays a key role in solving the global dynamics.
We show that combining optional participation with rewards can only marginally improve group welfare (to the same level as the non-participant's fixed payoff) for a small range of the per capita incentive δ, with δ − < δ < δ r (Fig. 3b). Within this interval, compulsory participation can lead to partial cooperation; however, optional participation eliminates the cooperation and thus drives a population into a state in which all players exit. Hence, freedom of participation is not a particularly effective way of boosting cooperation under a rewards scenario.
Under penalties, the situation changes considerably. Indeed, as soon as $\delta > \delta_-$ (Fig. 2b), the state in which all players cooperate abruptly turns into a global attractor for optional participation. When δ just exceeds $\delta_-$, group welfare reaches its maximum, (r − 1)c − σ. Meanwhile, for compulsory participation, the largest part of the (boundary) state space between cooperation and defection still belongs to the basin of attraction of the state in which all players defect. Because $\delta_- = c/n$, where n is the group size and c is the net contribution cost (a constant), the minimal sanctioning cost $\delta_-$ needed to establish full cooperation becomes smaller as n becomes larger.
Corroborating results for compulsory participation have recently been obtained in continuous public good games with institutional incentives by Cressman et al. [12], who considered the gradual evolution of continuously varying contribution to a public good. The authors show that rewarding and punishing with probabilities depending on the player's contribution and those of the co-players can destabilize full defection and stabilize full cooperation, respectively. This model also indicates that combining the best of both incentives would lead the population to full cooperation, irrespective of the initial condition. Looking back at our model, non-participation shares with rewards the characteristic of destabilizing full defection; thus, it would be fascinating to investigate how efficiently voluntary rewards [28,48,54], instead of voluntary participation, can establish coercion-based cooperation.
In the next two paragraphs, we consider only the penalty scenario and the corresponding coordination situation. There are various approaches to equilibrium selection in n-person coordination games for binary choices [19,29,34]. A strand of literature uses stochastic evolution models [14,33,64], in which typically, a risk-dominant equilibrium [23] that has the larger basin of attraction is selected through random fluctuation in the long run. In contrast, considering optional participation, our model typically selects the cooperation equilibrium which provides the higher group welfare, even if the cooperation equilibrium has the smaller basin of attraction when participation is compulsory than has the defection equilibrium. In the sense of favoring the efficient equilibrium, our result is similar to that found in a decentralized partner-changing model proposed by Oechssler [36], in which players may occasionally change interaction groups.
Higher-order freeloaders are problematic for decentralized peer-to-peer sanctions [11,41]. This is not the case, however, for centralized institutional sanctions. In addition, it is clear that sanctioning institutions will stipulate a lesser antisocial punishment targeted at cooperators [27], which can prevent the evolution of pro-social behaviors ( [44,46], see also [18]). Indeed, punishing cooperators essentially promotes defectors, who will reduce the number of participants willing to pay for social institutions. For self-sustainability, thus, sanctioning institutions should dismiss any antisocial schemes that may lead to a future reduction in resources for funding the institution.
Thus, we find that our model restricts the space of possible actions into a very narrow framework of alternative strategies, while increasing complexity. In practice, truly chaotic situations which offer a very long list of possibilities are unfeasible and create inconvenience, as is described by Michael Ende in "The Prison of Freedom" [1992]. Participants in economic experiments usually can make their meaningful choices only from a short and regulated list of options, as is the way in real life. Our result indicates that a third party capable of controlling incentives and membership can play a key role in selecting a cooperation equilibrium without ex ante adjustment. The question of how such a social order can emerge out of a world of chaos is left entirely open.

A.2 The Saddle Point Q
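We prove that for n > 2, Q is a saddle point. We first consider penalties using Eq. (11). Because the square brackets in Eq. (11) vanish at Q, the Jacobian at Q is given by

$$J_Q = \begin{pmatrix} \delta f(1-f)\left(H + f\,\dfrac{\partial H}{\partial f}\right) & \delta f^2(1-f)\,\dfrac{\partial H}{\partial z} \\[2ex] z(1-z)\left[-A + \delta(1-2f)H + \delta f(1-f)\,\dfrac{\partial H}{\partial f}\right] & \delta f(1-f)z(1-z)\,\dfrac{\partial H}{\partial z} \end{pmatrix}, \tag{30}$$

where H = H(f, z) and A = (r − 1)c + δ, with all terms evaluated at $Q = (f_Q, z_Q)$. Expanding the determinant, the terms containing ∂H/∂f cancel, which leaves

$$\det J_Q = \delta f^2(1-f)z(1-z)\,\frac{\partial H}{\partial z}\left(A + \delta fH\right).$$

Using ∂H(f, z)/∂z < 0, H > 0, and A > 0, this yields $\det J_Q < 0$. Therefore, Q is a saddle point.

We next consider rewards using Eq. (12). Similarly, we find that the Jacobian at Q is given by

$$J_Q = \begin{pmatrix} \delta f(1-f)\left[-H + (1-f)\,\dfrac{\partial H}{\partial f}\right] & \delta f(1-f)^2\,\dfrac{\partial H}{\partial z} \\[2ex] z(1-z)\left[-A - \delta(1-2f)H - \delta f(1-f)\,\dfrac{\partial H}{\partial f}\right] & -\delta f(1-f)z(1-z)\,\dfrac{\partial H}{\partial z} \end{pmatrix},$$

where H = H(1 − f, z) and A is as in Eq. (30). Its determinant is

$$\det J_Q = \delta f(1-f)^2z(1-z)\,\frac{\partial H}{\partial z}\left[A + \delta(1-f)H\right].$$

Using ∂H(1 − f, z)/∂z < 0, H > 0, and A > 0, it follows again that $\det J_Q < 0$. Therefore, Q is a saddle point.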
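The saddle property of Q under penalties can also be checked numerically. The sketch below is not the authors' code and rests on assumptions: a closed form for H that matches the boundary values H(f, 0) = 1 + f + ... + f^(n−2) and H(f, 1) = 1, and a rescaled penalty system in (f, z) reconstructed from the payoff definitions. It locates Q on the line f = (c + σ)/(rc), approximates the Jacobian by central differences, and returns its determinant. All function names are hypothetical.

```python
def H(f, z, n):
    """Assumed closed form of H, valid for 0 <= f < 1 and 0 <= z <= 1."""
    if z >= 1.0:
        return 1.0
    y = (1.0 - f) * (1.0 - z)
    return (1.0 - (1.0 - y) ** (n - 1)) / ((1.0 - f) * (1.0 - z ** (n - 1)))

def field_penalty(f, z, n, r, c, sigma, delta):
    """Assumed rescaled vector field (df/dt, dz/dt) under penalties."""
    h = H(f, z, n)
    df = f * (1 - f) * (-c + delta * (1 + f * h))
    dz = z * (1 - z) * (sigma - (r - 1) * c * f + delta * (1 - f) * (1 + f * h))
    return df, dz

def interior_equilibrium(n, r, c, sigma, delta):
    """Locate Q = (f_Q, z_Q) by bisection on z; assumes Q exists for this delta."""
    f_q = (c + sigma) / (r * c)
    target = (c - delta) / (delta * f_q)
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if H(f_q, mid, n) > target:          # H decreases with z
            lo = mid
        else:
            hi = mid
    return f_q, 0.5 * (lo + hi)

def jacobian_determinant(n, r, c, sigma, delta, step=1e-6):
    """Central-difference estimate of det J at Q."""
    f_q, z_q = interior_equilibrium(n, r, c, sigma, delta)
    args = (n, r, c, sigma, delta)
    f_hi = field_penalty(f_q + step, z_q, *args)
    f_lo = field_penalty(f_q - step, z_q, *args)
    z_hi = field_penalty(f_q, z_q + step, *args)
    z_lo = field_penalty(f_q, z_q - step, *args)
    a = (f_hi[0] - f_lo[0]) / (2 * step)     # d(df/dt)/df
    b = (z_hi[0] - z_lo[0]) / (2 * step)     # d(df/dt)/dz
    cc = (f_hi[1] - f_lo[1]) / (2 * step)    # d(dz/dt)/df
    d = (z_hi[1] - z_lo[1]) / (2 * step)     # d(dz/dt)/dz
    return a * d - b * cc
```

With n = 5, r = 3, c = 1, σ = 0.5, and δ = 0.6, a value at which Q exists under these assumptions, the determinant comes out negative, in line with the saddle-point claim.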

A.3 No Homoclinic Orbit of Q
First, we prove that a homoclinic loop that originates from and converges to Q does not exist. Using the Poincaré-Bendixson theorem [30] and the uniqueness of an interior equilibrium, we show that if it does exist, there must be a point p inside the loop such that both of its α- and ω-limit sets include Q. This contradicts the fact that Q is a saddle point. Indeed, there may be a section that cuts through Q such that the positive and negative orbits of p cross it infinitely often; however, it is impossible for a sequence consisting of all the crossing points to originate from and also converge to the saddle point Q. Hence, there is no homoclinic orbit of Q.
Next, we show that orbits that form the unstable manifold of Q do not converge to the same equilibrium (indeed, this is a sink). If they do, the closed region that is surrounded by the orbits must include a point q such that its ω-limit set is Q. Using the Poincaré-Bendixson theorem and the uniqueness of an interior equilibrium, the α-limit set for q must include Q; this is a contradiction. Similarly, we can prove that the orbits that form the stable manifold of Q do not issue from the same equilibrium.