1 Introduction

Much of the research suggesting that groups can self-govern comes from linear, boundary solution, public good experiments (Chaudhuri 2011). Such experiments are useful because they are easily explained to subjects and provide the starkest possible equilibrium predictions. That is, the incentives are all or nothing; self-interest suggests complete free-riding while efficiency requires contributing one’s entire endowment. However, social dilemmas observed in the real world are rarely as stark as those presented in linear public good experiments. It is thus important to understand whether peer to peer punishment is as effective in non-linear, i.e. more complex, environments.

Cason and Gangadharan (2015) investigated the effectiveness of peer to peer punishment in non-linear public good (PG) and common pool resource (CPR) experiments. In both treatments, the opportunity to punish increased cooperation. However, the positive impact was too weak to increase welfare (group earnings). Cason and Gangadharan suggest that the non-linear environment limits the effectiveness of peer punishment because the social optimum is harder to identify and, therefore, to coalesce around. These results suggest that cooperative behavior is qualitatively similar across public good and common pool resource experiments. However, the treatments investigated by Cason and Gangadharan are not payoff equivalent and thus it is difficult to compare behavior across treatments.

This paper replicates the findings of Cason and Gangadharan (2015) across payoff equivalent, strategically symmetric, PG and CPR experiments. The contribution of this replication is that the level of cooperation can be quantified and directly compared across conditions. Results suggest that peer punishment induces the equivalent amount of cooperation in both treatments. Despite the increased cooperation, punishment alone does not improve welfare. However, the equivalence of the cooperation-inducing effect across treatments suggests that the same procedures that increase the effectiveness of peer punishment in linear PG experiments may have a similar impact within more complicated social dilemmas. This suggests that self-governance in complex social dilemmas is possible.

2 The experiment

2.1 Experimental design

This paper investigates behavior across PG and CPR experiments with and without the opportunity to punish. Treatments are referred to as PGnoPun, CPRnoPun, PGPun, and CPRPun. All subjects were recruited from the University of Massachusetts Amherst. In each session, 4 groups of 4 students were randomly formed and maintained throughout. At the beginning of each period, each subject was given an endowment of 50 Experimental Dollars (EDs) to allocate between two accounts. Subjects were instructed to allocate any integer number of their EDs to account 1 (the group account); the remainder of their endowment was allocated to account 2 (their private account).

Instructions displayed individual and group payoff tables describing how individual and group period earnings varied across the range of group account allocations (Footnote 1). Individual and group payoff tables were included to ensure that each subject understood the relationship between their individual allocation, their individual earnings, and the earnings of the group. Subjects played one practice period to demonstrate how the game would proceed for the next 15 paid periods. In all treatments, subjects were required to allocate 25 EDs to the group account in the practice period. As shown below, in all treatments, a group account allocation of 25 EDs from each group member maximized and equally distributed group earnings.

After each period, subjects were shown their allocation to the group account, the total allocation to the group account, their period earnings, and their accumulated earnings. They were also shown the allocations of each group member; the order in which individual allocations were presented was randomized each period to mitigate reputation effects. Subjects were paid privately at the end of the experiment at a rate of 100 EDs = $1.

The payoff function in the PG treatments is:

$$\begin{aligned} \pi _{i}=w\left( e-x_{i}\right) +\frac{1}{n}\left[ a\sum _{j=1}^{n}x_{j}-b \left( \sum _{j=1}^{n}x_{j}\right) ^{2}\right] \end{aligned}$$
(1)

where \(w=1\) is the return from the private account, \(e=50\) is the subject’s endowment, \(a=6\), \(b=0.025\), \(n=4\) and \(x_{i}\) is the subject’s group account allocation. The second term represents each subject’s share of the return from the group account. This payoff function indicates that it is in the self-interest of each subject to allocate EDs to the group account until \(\sum x_{j}=\frac{a-nw}{2b}=40\). Thus, the symmetric Nash allocation is 10 EDs.
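As a sketch of the step behind this condition, differentiate Eq. (1) with respect to \(x_{i}\), writing \(X=\sum _{j=1}^{n}x_{j}\) as shorthand (notation introduced here for brevity, not taken from the original):

$$\begin{aligned} \frac{\partial \pi _{i}}{\partial x_{i}}=-w+\frac{1}{n}\left( a-2bX\right) =0\quad \Rightarrow \quad X=\frac{a-nw}{2b}=\frac{6-4\times 1}{2\times 0.025}=40 \end{aligned}$$

Because the condition depends only on the total \(X\), the symmetric allocation is the equal split \(40/4=10\) EDs.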

The payoff function in the CPR treatments is:

$$\begin{aligned} \pi _{i}=w\left( e-x_{i}\right) +\frac{x_{i}}{\sum\nolimits _{j=1}^{n}x_{j}}\left[ a\sum _{j=1}^{n}x_{j}-b\left( \sum _{j=1}^{n}x_{j}\right) ^{2}\right] \end{aligned}$$
(2)

where, compared to the payoff function in the PG treatments, the only difference is how the returns from the group account are distributed (Footnote 2). Again, the second term represents each subject’s share of the return from the group account, which is now proportional to their allocation. This payoff function indicates that it is in the self-interest of each subject to allocate EDs to the group account until \(\sum x_{j}=\left( \frac{n}{n+1} \right) \frac{a-w}{b}=160\). Thus, the symmetric Nash allocation is 40 EDs.
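A sketch of the corresponding derivation, again writing \(X=\sum _{j=1}^{n}x_{j}\) as shorthand (an assumption of notation, not the original's): Eq. (2) simplifies to \(\pi _{i}=w(e-x_{i})+x_{i}(a-bX)\), so the first-order condition for each subject and its sum over the \(n\) group members are

$$\begin{aligned} \frac{\partial \pi _{i}}{\partial x_{i}}&=-w+a-bX-bx_{i}=0,\\ \sum _{i=1}^{n}\frac{\partial \pi _{i}}{\partial x_{i}}&=n\left( a-w\right) -\left( n+1\right) bX=0\quad \Rightarrow \quad X=\left( \frac{n}{n+1}\right) \frac{a-w}{b}=\frac{4}{5}\times \frac{5}{0.025}=160 \end{aligned}$$

Dividing equally across the four group members gives the symmetric allocation of 40 EDs.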

Table 1 Experimental design

The social optimum is determined by treating the group as a single entity such that the group account return accrues to the group rather than being distributed across group members. From this perspective, it is in the interest of the group to equate the marginal return from the private account with the aggregate marginal return from the group account. At the group level, the unique social optimum allocation is such that \(\sum x_{j}=\frac{a-w}{2b}=100\) EDs are allocated. Thus, in both the PG and CPR treatments, the symmetric social optimum allocation is 25 EDs. In all treatments, participants earn 112.5 EDs at the symmetric social optimum each period. The symmetric Nash allocations are equally distant from this social optimum at 10 and 40 EDs in the PG and CPR treatments, respectively. In all treatments, participants earn 90 EDs at the symmetric Nash equilibrium each period. Table 1 summarizes the experimental design (Footnote 3).
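The benchmark earnings above can be checked numerically. The following is a minimal sketch that codes Eqs. (1) and (2) with the stated parameters; the function names are illustrative choices, not part of the original experiment software.

```python
# Parameters as stated in the text: w, e, a, b, n.
W, E, A, B, N = 1.0, 50.0, 6.0, 0.025, 4


def group_return(total):
    """Total return from the group account: a*X - b*X^2."""
    return A * total - B * total ** 2


def pg_payoff(x_i, total):
    """Eq. (1): each subject receives an equal 1/n share of the group return."""
    return W * (E - x_i) + group_return(total) / N


def cpr_payoff(x_i, total):
    """Eq. (2): each subject's share is proportional to own allocation."""
    share = x_i / total if total > 0 else 0.0
    return W * (E - x_i) + share * group_return(total)


def symmetric_payoff(payoff, x):
    """Earnings when all n group members allocate x."""
    return payoff(x, N * x)


def best_response(payoff, others_total):
    """Integer allocation in 0..50 that maximizes own payoff given the others' total."""
    return max(range(0, int(E) + 1), key=lambda x: payoff(x, others_total + x))


for label, payoff, nash in (("PG", pg_payoff, 10), ("CPR", cpr_payoff, 40)):
    print(label,
          "optimum earnings:", symmetric_payoff(payoff, 25),
          "Nash earnings:", symmetric_payoff(payoff, nash),
          "best response to Nash play:", best_response(payoff, (N - 1) * nash))
```

The `best_response` check confirms that, when the other three group members play the symmetric Nash allocation, no integer deviation is profitable in either treatment.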

Subjects in PGPun and CPRPun were given the same instructions as their no-punishment counterparts except that their game was augmented with a second decision stage. After each group member made his/her allocation decision, each subject continued to be shown the individual allocations of each group member in random order. Each subject now had the opportunity to pay 1 ED to punish another group member by reducing his/her period earnings by 3 EDs. They were allowed to place multiple sanctions on any number of other group members but had to pay the cost from the current period’s earnings.
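The arithmetic of the punishment stage can be sketched as follows. The 1 ED cost and 3 ED reduction come from the text; the matrix representation, the function name, and the absence of any floor on earnings are assumptions made for illustration only.

```python
PUNISH_COST = 1    # EDs paid by the punisher per sanction (from the text)
PUNISH_IMPACT = 3  # EDs deducted from the target per sanction (from the text)


def apply_punishment(stage_one_earnings, sanctions):
    """Return period earnings after the punishment stage.

    sanctions[i][j] is the number of sanctions subject i assigns to subject j.
    Assumption: earnings are not truncated at zero; the source does not say.
    """
    n = len(stage_one_earnings)
    final = []
    for i in range(n):
        paid = PUNISH_COST * sum(sanctions[i][j] for j in range(n) if j != i)
        received = PUNISH_IMPACT * sum(sanctions[k][i] for k in range(n) if k != i)
        final.append(stage_one_earnings[i] - paid - received)
    return final


# Hypothetical example: subject 0 assigns two sanctions to subject 3.
earnings = [100, 100, 100, 120]
sanctions = [
    [0, 0, 0, 2],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(apply_punishment(earnings, sanctions))  # [98, 100, 100, 114]
```

Each sanction removes 4 EDs from group earnings (1 from the punisher, 3 from the target), which is why punishment must raise cooperation substantially before it can raise welfare.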

2.2 Replication study power calculations

The current study first tests whether the findings presented in Cason and Gangadharan (2015) are replicated. These findings are significantly higher cooperation with than without punishment in both the PG and CPR games. Given our sample size of 8 independent observations per treatment, and the observed means and standard deviations in the original study, we have the following power (Footnote 4):

  (a) cooperation is significantly higher at the \(p<5\,\%\) level 14.98 % of the time with than without punishment in the PG game,

  (b) cooperation is significantly higher at the \(p<5\,\%\) level 34.75 % of the time with than without punishment in the CPR game.
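One way such power figures could be approximated is by simulation. The sketch below is not the procedure used in the paper (which is described in the footnote and not reproduced here). It assumes normally distributed group averages with a common standard deviation, a two-sided rank-sum test based on the normal approximation, and placeholder means and standard deviation that are purely hypothetical.

```python
import math
import random


def mw_rejects(a, b, z_crit=1.96):
    """Two-sided rank-sum test via the normal approximation (assumes no ties)."""
    n1, n2 = len(a), len(b)
    # U counts the pairs in which the observation from a exceeds the one from b.
    u = sum(1 for x in a for y in b if x > y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs(u - mean_u) / sd_u > z_crit


def simulated_power(mean_a, mean_b, sd, n_groups=8, n_sims=2000, seed=0):
    """Share of simulated experiments in which the test rejects equality."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(mean_a, sd) for _ in range(n_groups)]
        b = [rng.gauss(mean_b, sd) for _ in range(n_groups)]
        rejections += mw_rejects(a, b)
    return rejections / n_sims


# Hypothetical inputs only: these are NOT the means or standard deviation
# reported by Cason and Gangadharan (2015).
print(simulated_power(mean_a=30.0, mean_b=35.0, sd=10.0))
```

With only 8 independent observations per treatment, modest effect sizes relative to the between-group standard deviation yield low power under these assumptions, which is the qualitative point of the figures above.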

3 Results

A total of 128 subjects participated in the experiment, providing a total of 1920 observations (128 subjects over 15 paid periods). Table 2 displays the collected demographic variables [self-reported grade point average (GPA) and Gender (male = 1)] and the average earnings (after the costs of punishment in brackets) in each treatment. In PGnoPun and CPRnoPun, average earnings were $20.64 and $20.54, respectively. In PGPun and CPRPun, average earnings were $21.19 and $21.08 before, and $19.01 and $17.25 after, the costs of punishment, respectively.

Table 2 Demographics and average allocations, cooperation scores, and earnings

To quantify the level of cooperation on a common metric, an index is developed that captures, on a 0–100 scale, the level of cooperation in each treatment (Table 2). In the PG treatments, the index is \((x_{i}/50)\times 100\) and in the CPR treatments, it is \((1-x_{i}/50)\times 100\). In all treatments, the symmetric social optimum allocation (25 EDs) yields a cooperation score of 50. The symmetric Nash allocations (10 and 40 EDs in the PG and CPR treatments, respectively) yield a cooperation score of 20. The purpose of the index is to ensure that each score implies the same relative level of cooperation in every treatment. Table 2 displays the average group account allocations and the average cooperation scores. Mann–Whitney U tests (MW) are used to compare average cooperation scores across conditions, with the group average as the unit of observation (8 independent observations per condition). In line with the qualitative analysis of Cason and Gangadharan (2015), average cooperation scores are predicted to be statistically equivalent across the PG and CPR treatments, both with and without punishment.
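The index defined above can be written as a small function. This is a minimal sketch; the function name and the treatment labels passed as strings are illustrative choices.

```python
ENDOWMENT = 50  # EDs per period, as stated in the experimental design


def cooperation_score(allocation, treatment):
    """Map a group account allocation to the 0-100 cooperation index.

    PG:  (x / 50) * 100, so allocating more is more cooperative.
    CPR: (1 - x / 50) * 100, written here as ((50 - x) / 50) * 100,
         so allocating less is more cooperative.
    """
    if treatment == "PG":
        return allocation / ENDOWMENT * 100
    if treatment == "CPR":
        return (ENDOWMENT - allocation) / ENDOWMENT * 100
    raise ValueError("treatment must be 'PG' or 'CPR'")


for treatment, optimum, nash in (("PG", 25, 10), ("CPR", 25, 40)):
    print(treatment,
          cooperation_score(optimum, treatment),
          cooperation_score(nash, treatment))
```

In both treatments the social optimum maps to a score of 50 and the symmetric Nash allocation to a score of 20, which is what makes scores comparable across PG and CPR.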

As shown in Table 2, the average cooperation score was 34.35 and 33.61 in PGnoPun and CPRnoPun, respectively, and was not significantly different (MW \(n=16\); \(z=0.263\); \(p=0.793\)) (Footnote 5). Similarly, average earnings are not significantly different across PGnoPun and CPRnoPun at $20.64 and $20.54, respectively (MW \(n=16\); \(z=0.105\); \(p=0.916\)). Therefore, across payoff equivalent PG and CPR experiments, cooperation and welfare are statistically equivalent in the absence of punishment.
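For readers unfamiliar with the test statistic, the sketch below computes a rank-sum \(z\) and two-sided \(p\) value using the standard normal approximation with midranks and a tie-corrected variance, without a continuity correction. Whether the paper's software used exactly this variant is an assumption, and the group averages in the example are hypothetical, not the experimental data.

```python
import math


def rank_sum_z(sample_a, sample_b):
    """Return (z, two-sided p) for the Wilcoxon rank-sum / Mann-Whitney test."""
    pooled = sorted((v, g) for g, s in enumerate((sample_a, sample_b)) for v in s)
    n1, n2 = len(sample_a), len(sample_b)
    n = n1 + n2
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                   # size of this block of tied values
        midrank = (i + 1 + j) / 2   # average of the ranks i+1, ..., j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    expected = n1 * (n + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:               # every observation tied: no evidence either way
        return 0.0, 1.0
    z = (rank_sum_a - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical group-average cooperation scores, 8 groups per treatment.
no_pun = [30.1, 28.4, 36.0, 33.2, 35.5, 31.9, 38.7, 40.2]
pun = [39.0, 44.3, 41.1, 37.5, 45.8, 42.6, 34.9, 47.0]
print(rank_sum_z(no_pun, pun))
```

With 8 observations per treatment, the largest attainable \(|z|\) under this approximation is about 3.36, reached when the two samples do not overlap at all.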

The opportunity to punish significantly increases average allocations from 17.2 EDs to 20.8 EDs (cooperation scores from 34.35 to 41.54) in the PG treatments (MW \(n=16; z=2.522; p=0.012\)) and significantly decreases average allocations from 33.2 EDs to 29.5 EDs (raising cooperation scores from 33.61 to 41.01) in the CPR treatments (MW \(n=16; z=2.310; p=0.021\)). As a result, peer punishment induces statistically equivalent levels of cooperation in both treatments (MW \(n=16; z=0.525; p=0.599\)). After the costs of punishment are included, average earnings are significantly lower with punishment than without it: $19.01 versus $20.64 in the PG treatments (MW \(n=16; z=3.048; p<0.01\)) and $17.25 versus $20.54 in the CPR treatments (MW \(n=16; z=2.521; p=0.012\)). Just as in the absence of punishment, average earnings are statistically equivalent across the punishment treatments (MW \(n=16; z=0.105; p=0.916\)).

Beyond the effect of punishment on cooperation, it is of interest to investigate the use of punishment across treatments. Following Nikiforakis (2008), a hurdle model is estimated to understand both the decision to punish, as a binary choice, and the amount of punishment received. The independent variables include an indicator for the PG treatment, the amount that subject \(j\)'s cooperation score deviates above and below the group’s period average cooperation score, the period’s average cooperation score, and the period to account for a time trend.
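The deviation regressors described above can be constructed as in the sketch below. It assumes that the group's period average includes the subject's own score and that each deviation is recorded as a non-negative magnitude; the source does not spell out either detail, and the function and key names are illustrative.

```python
def deviation_regressors(scores):
    """Build per-subject deviation variables for one group in one period.

    scores: cooperation scores of the group members in that period.
    Assumption: the group average includes the subject's own score.
    """
    group_avg = sum(scores) / len(scores)
    return [
        {
            "above": max(0.0, s - group_avg),  # positive deviation, else 0
            "below": max(0.0, group_avg - s),  # negative deviation as a magnitude, else 0
            "group_avg": group_avg,
        }
        for s in scores
    ]


# Hypothetical scores for a four-person group.
for row in deviation_regressors([20, 40, 60, 80]):
    print(row)
```

Splitting the deviation into two non-negative variables lets the model estimate separate effects for cooperating more than and less than the group average.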

Column 1 (Table 3) displays the average marginal effects of the independent variables on the probability of receiving punishment. Results suggest that a below-average cooperation score increases the probability that one is punished and that the probability of being punished decreases with repetition. Importantly, the PG treatment indicator is insignificant, suggesting that there is no systematic variation in one’s probability of being punished across treatments.

Table 3 Punishment behavior

Column 2 (Table 3) displays the average marginal effects of the independent variables on the amount of punishment received. Here, cooperation scores above and below the group’s period average increase the amount of punishment received. While punishing group members who cooperate more than average is not intuitive, note that the punishment received is 3.2 times higher when one’s cooperation score is below average. For example, a one unit increase (decrease) above (below) the group’s period average cooperation score increases the amount of punishment received by 0.18 (0.58) EDs. Further, when the model is restricted to only those observations for which the sum of the groups’ cooperation scores is 2 or less, the impact of above average cooperation scores is only marginally significant (0.18 (0.11)*) (Footnote 6). Period remains negative and significant, suggesting that the level of punishment decreases with repetition.

4 Discussion and conclusion

This paper investigated the effectiveness of peer punishment across payoff equivalent PG and CPR experiments. Using an index to quantify the level of cooperation, results suggest that the cooperation-inducing effect of peer punishment is quantitatively equivalent across treatments. Despite the increased cooperation, the opportunity to punish reduces welfare in both treatments. Importantly, the similarity observed across treatments suggests that there is no fundamental difference in a group’s capacity for cooperation across PG and CPR experiments.

The linear PG literature suggests that peer punishment is more effective when (1) established groups interact over longer durations (Gachter et al. 2008), at least when counter-punishment is not possible (Engelmann and Nikiforakis 2015), and (2) group members are able to employ a cost-effective, high-impact form of punishment (Sefton et al. 2007; Nikiforakis and Normann 2008; Egas and Riedl 2008). Whether the required duration, with or without counter-punishment, and impact of punishment are positively related to complexity, however measured, is an open empirical question. These results motivate future research to determine the capacity for self-governance in complex social dilemmas.