1 Introduction

Social dilemma situations in which the collective interest is at odds with private interests are widespread. Cooperation among decision makers leads to the Pareto optimum, but free riding is a dominant strategy and results in a Pareto inferior outcome. In this paper we analyze a specific social dilemma that involves different benefits from cooperation to different types of players (benefit heterogeneity) and that introduces leadership (leading by example) either by appointment (exogenous leadership) or by self-selection through volunteering or voting (endogenous leadership).

Our results have implications for public goods provision with heterogeneous benefits. Team work is a relevant example: the team output is a public good, and team members sometimes benefit from it to different degrees, depending on the contractual situation. Leading by example seems to be important in many cases. Take, for instance, an academic project that leads to a paper. Somebody whose tenure clock is ticking has different benefits from a joint project than somebody who has just received tenure. On a more global scale, greenhouse gas emission reduction as a global public good involves both heterogeneous benefits and the necessity of leading by example. Countries clearly have different benefits from CO2 abatement investments. Similarly, the global coordination problem will require countries or specific regions to take first steps and “lead by example”, because countries will never all be aligned in their climate policies. Our results, in a nutshell, seem to indicate that additional instruments such as coercion power, sanctioning institutions, or communication are required to make leading by example work under benefit heterogeneity, regardless of whether leaders are appointed or volunteer.

Economists have analyzed social dilemmas in the context of public goods provision. In the experimental laboratory, the simultaneous linear public goods game (also known as the voluntary contribution mechanism) has been the main workhorse to study cooperation empirically (Isaac et al. 1985; Ledyard 1995; Chaudhuri 2011). A general finding from economic experiments on the simultaneous public goods game is that decision makers are willing to cooperate, i.e. willing to contribute voluntarily to the public good, but that cooperation declines over time unless there is an enforcement mechanism such as peer punishment.

Leading by example turns the simultaneous linear public goods game (partly) into a sequential game, without changing incentives or enforcement possibilities. The first-mover can set an example with her contribution, but has no other means of coercion. Many existing experimental results indicate that leading by example leads to higher levels of contributions (Dannenberg 2015; Güth et al. 2007; Moxnes and van der Heijden 2003; Pogrebna et al. 2011; Rivas and Sutter 2011). However, there are also several studies reporting weak and non-significant leadership effects (Gächter and Renner 2018; Gürerk et al. 2018; Haigner and Wakolbinger 2010; Jack and Recalde 2015; Potters et al. 2007; Sahin et al. 2015).Footnote 1 Regardless of finding an average effect on contributions or not, almost all studies show that there is a positive correlation between leaders’ and followers’ contributions.

Heterogeneity of group members—though common outside the experimental laboratory—is often not considered explicitly in experiments. The general gist of the existing results is that heterogeneity tends to lead to less cooperation. If group members have other-regarding or pro-social concerns, they have to coordinate on cooperating, respectively on the level of cooperation; heterogeneity makes coordination more difficult. Existing studies finding negative effects of heterogeneity or null effects have, for instance, looked at heterogeneity in endowments (Buckley and Croson 2006; Chan et al. 1999; Charness et al. 2014; Cherry et al. 2005; Ostrom et al. 1994; Reuben and Riedl 2013), heterogeneity in benefits through different returns from the public good (Fischbacher et al. 2014; Fisher et al. 1994; Kube et al. 2015; Reuben and Riedl 2013), heterogeneity in the source of endowment (earned endowment versus allocated endowment; Oxoby and Spraggon 2013), and heterogeneity from other observable characteristics such as religion, ethnic affiliation, nationality, or other identities (e.g., Chen et al. 2014; Habyarimana et al. 2007).

The current paper combines benefit heterogeneity and leading by example. We are particularly interested in populations with benefit heterogeneity, not only because this type of heterogeneity is ubiquitous outside the experimental laboratory, but also because the normative conflict associated with benefit heterogeneity is difficult to resolve. Previous studies have shown that, when people obtain different benefits from the public good, there is a normative conflict between contribution equality and payoff equality. Specifically, high-benefit group members consider equal contributions of all group members as the social norm, whereas low-benefit group members try to enforce the norm that all group members earn the same. So far it is unclear, however, whether leading by example helps promote cooperation under the normative conflict caused by benefit heterogeneity.

In our laboratory experiment, groups of four members can contribute to a linear public good. Two group members have a higher return rate from the public good than the other two, but it is still a dominant strategy for all group members to contribute nothing to the public good, i.e. to free-ride. Four treatments allow us to study the effects of leadership by example. We implement two treatments in which either one randomly selected low-benefit member or one randomly selected high-benefit member contributes first, and her contribution level is communicated to the other three members, who then contribute simultaneously. A baseline treatment requires all four members to contribute simultaneously. We introduce a fourth treatment that allows all members to volunteer for leadership. It changes the exogenous assignment of leadership by example to an endogenous assignment (choice) by self-selected leaders.

We are the first to implement an experiment with leadership by example by group members with benefit heterogeneity using a linear public goods setting. There is a nascent literature in experimental economics that looks at heterogeneity, more generally, in social dilemmas with leaders. For instance, heterogeneity in endowments (Levati et al. 2007; Neitzel and Sääksvuori 2013), heterogeneity from different identities (Drouvelis and Nosenzo 2013), heterogeneity in the length of group membership (Angelova et al. 2019), heterogeneity in religions (Keuschnigg and Schikora 2014), and heterogeneity in opportunity costs (Au and Chung 2007; Collins 2016; Dasgupta and Orman 2013) have been considered in different studies.Footnote 2

Levati et al. (2007) find that leading by example works effectively in heterogeneous endowment populations if all group members rotate in the leader’s role, whereas Neitzel and Sääksvuori (2013) do not find such a positive effect with a fixed group member being the leader in repeated interaction. Drouvelis and Nosenzo (2013) show that group members having the same group identity fosters the effectiveness of leading by example, whereas Keuschnigg and Schikora (2014) and Angelova et al. (2019) find that leading by example is likely to reduce cooperation in culturally diverse populations and in communities with group members of different group membership tenures, respectively. Au and Chung (2007), Collins (2016) and Dasgupta and Orman (2013) investigate heterogeneous opportunity costs of contributing. They report evidence that contributions are higher when subjects with low opportunity costs/higher efficacy contribute first compared to when subjects with high opportunity costs/lower efficacy contribute first.Footnote 3

Ananyev (2019) concurrently developed a similar setup for leading by example with heterogeneous benefits. He finds that leading by example does not promote cooperation with heterogeneous benefits, which is in line with our results. He also implements a voting treatment, in which voters can determine the benefit level, i.e. the type, of the leader, but not the leader herself, and most voters prefer the high benefit type to become leader. In contrast, our endogenous treatment is based on volunteering for leadership.

Our treatment with self-selected leaders also adds to the literature on endogenous leadership (e.g., Arbak and Villeval 2013; Bruttel and Fischbacher 2013; Cappelen et al. 2016; Dannenberg 2015; Haigner and Wakolbinger 2010; Préget et al. 2016; Rivas and Sutter 2011). In general, there is a tendency that self-selected or endogenous leadership works more effectively than assigned leadership, but details matter.Footnote 4

Our study also contributes to the experimental work studying the effect of various mechanisms on cooperation in groups with benefit heterogeneity. Punishment and reward are found to be less effective in increasing contributions in heterogeneous groups than in homogeneous groups, and have no or limited impact on group efficiency in heterogeneous groups (Gangadharan et al. 2017; Kölle 2015; Nikiforakis et al. 2012; Reuben and Riedl 2009, 2013). Among these studies, Gangadharan et al. (2017) show that on top of the reward or punishment effect, communication has a positive impact on contributions and group efficiency, but the effects are still smaller than in a homogeneous benefit environment. It seems that normative conflict has a negative impact on group efficiency and undermines the power of otherwise effective mechanisms in promoting cooperation.

The findings from our experiment suggest a limited effect of leadership by example with heterogeneous benefits. With exogenously imposed leadership, we observe a significantly slower decline in contributions relative to the baseline. However, our baseline treatment and the two treatments with exogenous leadership do not differ statistically in terms of average contribution levels, irrespective of whether a high-benefit member or a low-benefit member is the leader. When subjects are given the opportunity to self-select into the role of leader, contributions are significantly higher with self-selected leadership than without. Self-selected leadership, in particular self-selected low-benefit leadership, can also raise contributions significantly over the baseline level. However, we do not observe a high enough willingness to lead for high-benefit members, and the willingness for low-benefit members to lead is decreasing quickly over time. This trend, combined with the fact that contributions are at a low level when nobody in the group volunteers to be the leader, weakens, on average over all groups, the positive effect otherwise brought about by self-selected leadership. Consequently, there is only a slight increase in contributions, on average, in the endogenous leadership treatment over the baseline level.

Our data reveal that imperfect conditional cooperation by followers, i.e. contributing less than leaders that set good examples, combined with conflicts about the social norm regarding the “adequate” contribution level within heterogeneous groups, hampers the effectiveness of leading by example. The conflicts are reflected in the conditional cooperation pattern of followers. Followers whose benefits are different from the leader's try to reciprocate according to their perceived contribution norm: when led by high-benefit leaders, low-benefit followers reciprocate on a lower level than high-benefit followers; when led by low-benefit leaders, high-benefit followers reciprocate similarly to low-benefit followers. As a consequence, setting good examples does not yield higher profits for low-benefit leaders, but rather makes them suffer from a larger income inequality to their disadvantage; they therefore quickly decrease their contributions when in the role of leaders or refrain from becoming leaders in later periods. For high-benefit leaders, even though, on average, it pays for them to contribute more, we do not observe high enough leader contributions or a widespread willingness to become leader. Apart from the influence of social preferences, pessimistic beliefs and the fear of losing out because of insufficient follower contributions seem to constitute major reasons for low contributions and the missing willingness to volunteer for leadership.

Our design also allows us to compare self-selected leadership with imposed leadership. The findings show that self-selected leaders—in particular low-benefit leaders—indeed set better examples than imposed leaders. However, compared to imposed leaders, self-selected leaders are exploited more strongly by followers—particularly by those whose benefit type is different from the leader. In consequence, we only observe a slight increase in average contributions of early periods when low-benefit leaders go from being imposed to being self-selected. Moreover, we do not find evidence that high-benefit leadership is more effective than low-benefit leadership. If anything, contributions with self-selected low-benefit leaders are on average higher than contributions with self-selected high-benefit leaders in the short run.

The remainder of the paper is organized as follows. Section 2 describes the design and procedures of the experiment. Section 3 presents the main results, and Sect. 4 provides concluding remarks.

2 Experimental design, procedures, and theoretical expectations

2.1 Experimental design and procedures

Our basic game is a four-person linear public goods game that is repeated for ten periods in fixed groups. In each period, each of the four group members receives an endowment of 20 tokens and decides how many tokens to contribute to a group account. The tokens not contributed remain in one’s private account. Each group member’s contribution to the group account in period t, \({C}_{it}\), must satisfy 0 ≤ \({C}_{it}\) ≤ 20. The payoff function for an individual i in period t is

$$\pi_{it} = 20 - C_{it} + \beta_{i} \times \sum\nolimits_{j = 1}^{4} {C_{jt}}$$

Among the four group members, two subjects are randomly selected to be of type A (low-benefit members) and two of type B (high-benefit members).Footnote 5 The marginal per-capita return from the public account (\({\beta }_{i}\)) is set at 0.4 for members of type A and 0.8 for members of type B. That is, each token a subject keeps in her private account is worth 1 point to her, regardless of her type; in addition, she earns 0.4 points for each token all group members (including herself) contribute to the group account if she is of type A, while she earns 0.8 points for each token she or any other group member contributes to the group account if she is of type B. At the beginning of the first period, each group member is randomly assigned an ID from 1 to 4. They learn their own types and IDs (that remain the same throughout the experiment). Design details are described in the experimental instructions, and by reading them aloud at the beginning of the experiment they are made common knowledge to all participants.
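As an illustration, the payoff rule can be sketched in a few lines of Python. This is a minimal sketch based only on the parameters stated above (endowment of 20 tokens, returns of 0.4 and 0.8); the names `period_payoffs`, `ENDOWMENT`, and `MPCR` are hypothetical and are not taken from the experimental software.

```python
# Minimal sketch of the per-period payoff rule described in the text.
# All identifiers here are illustrative assumptions, not the original z-Tree code.

ENDOWMENT = 20                   # tokens per member and period
MPCR = {"A": 0.4, "B": 0.8}      # type A = low benefit, type B = high benefit


def period_payoffs(contributions, types):
    """Return the payoff of each group member for one period.

    contributions: list of tokens contributed by each member (0..20)
    types:         list of "A"/"B" labels in the same order
    """
    for c in contributions:
        if not 0 <= c <= ENDOWMENT:
            raise ValueError("contribution must lie between 0 and 20 tokens")
    total = sum(contributions)   # tokens in the group account
    return [ENDOWMENT - c + MPCR[t] * total
            for c, t in zip(contributions, types)]


group_types = ["A", "A", "B", "B"]
full_cooperation = period_payoffs([20, 20, 20, 20], group_types)
full_free_riding = period_payoffs([0, 0, 0, 0], group_types)
one_a_deviates = period_payoffs([0, 20, 20, 20], group_types)
```

Comparing `one_a_deviates[0]` with `full_cooperation[0]` shows the free-riding incentive for a type A member, while comparing `full_cooperation` with `full_free_riding` shows that every member earns more under full cooperation.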

We implement the following four treatments in a between-subject design: (1) Baseline (BASE): All 4 group members make private contribution decisions simultaneously. (2) Exogenous high-benefit leader (HBL): One high-benefit member is randomly selected in each period as the leader. The leader contributes first, and the other three members contribute simultaneously, after receiving information about the leader’s contribution. (3) Exogenous low-benefit leader (LBL): Similar to HBL, except that the leader is randomly chosen from the two low-benefit members in each period. (4) Endogenous leader (EN): In each period, all members can choose whether they want to become leader or not. If none of the four members chooses to become leader in a given period, the four group members contribute simultaneously and privately in that period, just like in BASE; if there is only one member who chooses to become leader in a period, this member makes her contribution decision before the other three group members, just as in HBL or LBL (depending on the type of the volunteer); if there are at least two members who are willing to become leader, a random draw determines the actual first mover in that period. After their choice regarding contributing first, those subjects who have volunteered learn whether they are leader for the given period. Those who have volunteered but are not chosen as leader thereby learn implicitly that they are not the only group member to volunteer.
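The leader-selection rule in treatment EN can be sketched as follows. This is an illustrative sketch only: it assumes a uniform random draw among volunteers (the text says "a random draw" without specifying the distribution), and the function names `select_leader` and `state_label` are hypothetical.

```python
import random

# Illustrative sketch of the EN leader-selection rule described in the text.
# Assumption: ties among several volunteers are broken by a uniform random draw.


def select_leader(volunteers, rng=random):
    """Return the ID of the first mover, or None if nobody volunteers.

    volunteers: dict mapping member ID -> True if the member volunteered
    rng:        object with a .choice() method (random module or random.Random)
    """
    willing = [member for member, wants in sorted(volunteers.items()) if wants]
    if not willing:
        return None               # simultaneous contributions, as in BASE
    return rng.choice(willing)    # a single volunteer is chosen with certainty


def state_label(leader, types):
    """Map the realized leader to the state names used in the results section."""
    if leader is None:
        return "EN_NL"
    return "EN_HBL" if types[leader] == "B" else "EN_LBL"


member_types = {1: "A", 2: "A", 3: "B", 4: "B"}
leader = select_leader({1: False, 2: False, 3: True, 4: False})
state = state_label(leader, member_types)
```

With member 3 as the only volunteer, the sketch assigns leadership to member 3 and labels the period as a high-benefit leader state.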

We follow Gächter and Renner (2010) in how beliefs are elicited. Beliefs are elicited in each period of the game, after subjects have made their contribution decisions. Specifically, in the baseline treatment, we ask participants to estimate the average of the other players’ contributions within their group, for each type separately. In the leadership treatments, we ask the leader about her estimate of how many tokens the two different-type followers would contribute on average, and how many tokens the other same-type follower would contribute; each follower needs to submit her estimate of the other followers’ average contribution, for each type separately, after having seen the leader’s contribution. For subjects who are requested to submit two estimates in a period, one estimate is randomly selected to count towards their earnings. If the belief is correct, the subject receives an additional 3 points; if the belief differs by 1 (2) points, the subject receives 2 (1) points; in all other cases, the subject receives nothing from the estimate.
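The incentive scheme for beliefs amounts to a simple step function. The sketch below assumes that both the estimate and the realized value are whole numbers; how fractional averages are treated is not stated in the text, so the sketch deliberately does not cover that case. The name `belief_reward` is hypothetical.

```python
# Sketch of the belief-elicitation reward described in the text.
# Assumption: estimate and realized value are integers; the handling of
# fractional averages is not specified in the source.

REWARD_BY_ERROR = {0: 3, 1: 2, 2: 1}   # absolute deviation -> points earned


def belief_reward(estimate, actual):
    """Return the points earned for one belief estimate."""
    if not (isinstance(estimate, int) and isinstance(actual, int)):
        raise TypeError("this sketch only covers integer estimates and outcomes")
    return REWARD_BY_ERROR.get(abs(estimate - actual), 0)
```

For example, an estimate that misses the realized value by two earns one point, and any larger deviation earns nothing.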

At the end of each period, subjects get feedback including each group member’s type, contribution to the group account, income (excluding earnings from estimates) and identity within the group in leadership treatments (i.e. whether one is first mover or not). They are also informed about their own income from the estimates. Every period of play counts towards final earnings.

After the ten periods, all treatments are followed by a monetarily incentivized social value orientation questionnaire, known as the ring test (Liebrand 1984; Liebrand and McClintock 1988). Subjects have to make binary choices in 24 different allocation tasks. In each task, a subject has to choose between two allocations that allocate money to herself and another anonymous recipient. All 24 decisions are paid and the pairing is fixed throughout this part. By adding up the subject’s 24 decisions, we obtain the total sum of money allocated to herself (x-amount) and to the recipient (y-amount). The subject’s social value orientation is calculated as the angle \(\uptheta\) of the vector that results from the ratio x/y. Based on the ratio x/y, one can assign each subject to one of eight categories of social orientation (individualism, altruism, cooperation, competition, martyrdom, masochism, sadomasochism, and aggression). A more accurate measure of social value orientation is the exact angle \(\uptheta\), positive in the first quadrant and negative in the fourth quadrant. Almost all subject ratios lie in these two quadrants: thus, the larger this angle, the more pro-social the subject. We will use this measure in our analysis.Footnote 6 From the 24 decisions, one can also measure a subject’s consistency in making allocation choices. The length of the vector can serve as the consistency measure: it is equal to 30 if a subject makes 24 consistent choices and 0 if the choices are perfectly random. The longer the vector, the more consistent a subject’s decisions are. When using the data from the social value orientation test, we consider only subjects with a consistency measure of at least 50%.Footnote 7 At the end of the experiment, subjects learn their total income from the main part of the experiment and from the ring test.
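The two ring-test measures can be illustrated with a short sketch. It rests on assumptions that go beyond the text: the own amount x is placed on the horizontal axis and the recipient's amount y on the vertical axis (which makes the angle positive in the first quadrant and negative in the fourth, as described), and the 50% consistency threshold is read as a share of the maximum vector length of 30. The function names are hypothetical.

```python
import math

# Sketch of the ring-test measures described in the text.
# Assumptions: x (own total) on the horizontal axis, y (recipient's total) on
# the vertical axis; consistency threshold interpreted as a share of the
# maximum vector length of 30.

MAX_LENGTH = 30.0


def svo_angle_degrees(x_total, y_total):
    """Angle of the (x, y) vector in degrees; larger means more pro-social."""
    return math.degrees(math.atan2(y_total, x_total))


def vector_length(x_total, y_total):
    """Length of the (x, y) vector, used as the consistency measure."""
    return math.hypot(x_total, y_total)


def is_consistent(x_total, y_total, threshold=0.5):
    """True if the vector length reaches the given share of the maximum."""
    return vector_length(x_total, y_total) / MAX_LENGTH >= threshold
```

Under these assumptions, a subject who allocates everything to herself has an angle of zero, and a subject who splits gains equally has an angle of 45 degrees.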

The experiment was conducted in the MELESSA laboratory at the University of Munich in May 2014 and September 2015. A total of 236 subjects were recruited via ORSEE (Greiner 2015). Subjects remained anonymous throughout the experiment, and cash payments were made privately. The experiment was programmed and conducted with the software z-Tree (Fischbacher 2007). We conducted two sessions for each of the treatments BASE, HBL and LBL, and four sessions for treatment EN, with 24 subjects in each session.Footnote 8 At the beginning of each session, subjects received the instructions for the public goods game. The instructions for the ring test were handed out to subjects after the end of the main part. However, subjects knew that there would be a second part after the ten periods of the main part and that it would be unrelated to the first part. Instructions were written in neutral language, avoiding terms such as “leader”, “follower”, etc. In order to test the understanding of the rules and the incentive structure, subjects were asked to answer control questions after the instructions had been read aloud. The experiment did not proceed until all subjects had answered all questions correctly. Sessions lasted, on average, for about 75 min, and subjects earned approximately € 13.7, on average.

2.2 Theoretical expectations

Obviously, leadership does not change the standard prediction of zero contributions based on dominant strategies in the public goods games. We know, however, that decision makers contribute voluntarily to the public good, and that leading by example has the potential of increasing contribution levels in comparison to simultaneous contribution decisions, although the observed effects depend on details of the setup.
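As a brief worked illustration of why free riding is dominant yet inefficient, using only the parameters from Sect. 2.1 (the derivative notation is added here for exposition): the private marginal return to a contributed token is

$$\frac{\partial \pi_{it}}{\partial C_{it}} = \beta_{i} - 1 < 0 \quad \text{for } \beta_{i} \in \{0.4, 0.8\},$$

while the marginal effect of the same token on the sum of all four members' payoffs is

$$\sum\nolimits_{j = 1}^{4} \beta_{j} - 1 = 2 \times 0.4 + 2 \times 0.8 - 1 = 1.4 > 0.$$

Each contributed token thus lowers the contributor's own payoff by 0.6 points (type A) or 0.2 points (type B), but raises the group's total payoff by 1.4 points. Moving first does not alter either expression.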

Our first set of research questions pertains to leadership effects. Does leadership increase overall contribution levels even in the presence of heterogeneous benefits from the public good? Does the effectiveness of leadership depend on the leader type, i.e. whether a low-benefit member becomes the leader or whether it is a high-benefit member? On the one hand, heterogeneity might make signaling by leaders and coordination among group members more difficult. On the other hand, the leadership signal might be stronger, particularly when it comes from the low-benefit member.

Our second set of research questions pertains to the process of how leadership is determined. We compare exogenously assigned leadership and self-selected leadership. Do self-selected leaders contribute more than leaders that are exogenously assigned? Are high-benefit members or low-benefit members more likely to volunteer as leaders? We know from previous research that self-selected leadership has the potential to be more effective than assigned leadership. Whether different benefit types have different inclinations to volunteer is an exploratory question. It could well be that groups realize that high-benefit leaders are more effective and that high-benefit members thus self-select more often; however, a similar argument could be made for low-benefit members, whose signaling leverage is perhaps larger.

Given that standard models with other-regarding preferences or reciprocal concerns do not yield unambiguous predictions for our setup, we think that it is best to formulate theoretical expectations based on the tendencies in related existing studies. Our main hypotheses are:

H.1: Leadership, regardless of whether by low-benefit or high-benefit members, increases contribution levels, even in the presence of benefit heterogeneity, compared to simultaneous contribution decisions.

H.2: Endogenous (self-selected) leadership increases contribution levels over exogenously assigned leadership, regardless of whether by low-benefit or high-benefit members.

H.3: High-benefit members have more resources for effective leadership and are thus more effective than low-benefit members in fostering cooperation and raising average contribution levels.

3 Experimental results

We organize the presentation of our results in the following way. Section 3.1 compares average contributions across the treatments and situations. It provides tests of the three hypotheses in Sect. 2.2. The remaining analysis is more exploratory. In Sect. 3.2, we investigate the structure and determinants of self-selected leadership. Section 3.3 studies contribution behavior of leaders and followers. Unless specified differently, the non-parametric tests are two-sided Mann–Whitney rank sum tests, with each group as a statistically strictly independent observation.
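Treating each group as one independent observation means that individual decisions are first collapsed to one number per group before any rank test is applied. The sketch below illustrates this step and the Mann–Whitney U statistic in plain Python. The data are made up for illustration, the function names are hypothetical, and the sketch stops at the U statistic; obtaining p-values would require a statistics package or exact tables.

```python
from collections import defaultdict
from itertools import product

# Sketch: collapse individual contributions to group means, then compute the
# Mann-Whitney U statistic on the group means. Data below are hypothetical.


def group_means(records):
    """records: iterable of (group_id, contribution) -> {group_id: mean}."""
    sums = defaultdict(lambda: [0.0, 0])   # group_id -> [running sum, count]
    for group_id, contribution in records:
        sums[group_id][0] += contribution
        sums[group_id][1] += 1
    return {g: total / n for g, (total, n) in sums.items()}


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a, b in product(sample_a, sample_b):
        if a > b:
            u += 1.0
        elif a == b:
            u += 0.5
    return u


# Hypothetical individual-level records for two groups per treatment.
base_records = [("b1", 5), ("b1", 7), ("b2", 10), ("b2", 10)]
hbl_records = [("h1", 12), ("h1", 8), ("h2", 6), ("h2", 6)]

base_means = list(group_means(base_records).values())
hbl_means = list(group_means(hbl_records).values())
u_hbl = mann_whitney_u(hbl_means, base_means)
```

The number of observations entering the test equals the number of groups, not the number of subjects or decisions, which is why the N values reported below count groups.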

3.1 Treatment differences

The upper panel of Table 1 and the left panel of Fig. 1 give an overview of the average contributions by treatment over time.Footnote 9 Contributions start out at about 50% of the total endowment in all treatments and decrease over time in varying degrees. We first compare contributions in the two exogenous leadership treatments with those in the baseline treatment. Subjects in BASE contribute, on average, 7.2 points, which corresponds to 36% of their endowment. Average contributions are slightly higher in HBL (9.17) and in LBL (8.21), but the differences are not significant (both p > 0.35).Footnote 10

Table 1 Average contributions by treatment and by state in EN
Fig. 1 Average contributions by treatment (left) and by state in EN (right) over time

As the right panel of Fig. 1 indicates, there are three possible states of the world in EN: (1) the actual leader is a high-benefit member (which we will refer to as EN_HBL), (2) the actual leader is a low-benefit member (EN_LBL), and (3) nobody volunteers, hence the group has no leader (EN_NL). Using a two-sided Wilcoxon signed-rank test that includes only those groups that experienced at least two of these states, we find that, relative to the situation without self-selected leaders, average contributions are significantly higher with self-selected leaders, irrespective of the leader type (4.61 in EN_NL vs. 9.24 in EN_HBL, p < 0.001, N = 21; 4.83 in EN_NL vs. 11.43 in EN_LBL, p < 0.001, N = 18).

We next compare contributions in the three states of EN (as shown in the lower panel of Table 1), to contributions in BASE. It turns out that average contributions with self-selected leadership (EN_HBL + EN_LBL) are significantly higher than those in BASE (9.73 vs. 7.20, p < 0.05, N = 36).Footnote 11 However, average contributions in EN_NL are significantly lower than those in BASE (4.61 vs. 7.20, p < 0.01, N = 33).Footnote 12 As we will analyze in greater detail in Sect. 3.2, the state of no self-selected leadership occurs 26% of the time. The negative effect on contributions without self-selected leadership countervails the positive effect brought by self-selected leadership, leading to an insignificant difference in contributions between the endogenous treatment (aggregating over the periods with and without self-selected leaders) and the baseline treatment (8.38 vs. 7.2, p = 0.5).Footnote 13
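A back-of-the-envelope check illustrates this offsetting effect. It uses only the figures reported above and assumes, as a simplification, that the EN average is roughly the frequency-weighted mean of the state averages (leadership in 74% of cases, no leadership in 26%):

$$0.74 \times 9.73 + 0.26 \times 4.61 \approx 8.40,$$

which is close to the reported EN average of 8.38. The small gap may reflect rounding and differences in how observations are aggregated across groups.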

Does the benefit type of the leader matter for contributions? Contrary to our prediction, there is no significant difference in contributions between the two exogenous leadership treatments (p = 0.71). This also differs from existing findings that groups are more cooperative when the advantageous types (e.g., more productive members) are assigned as first movers (Au and Chung 2007; Collins 2016; Dasgupta and Orman 2013). In our case, when leadership is assigned endogenously, low-benefit leadership seems to be more effective than high-benefit leadership, at least in the early periods (12.49 in EN_LBL vs. 10.22 in EN_HBL in the first five periods, p < 0.01, N = 18; 7.89 in EN_LBL vs. 7.63 in EN_HBL in the final five periods, p = 0.68, N = 10).

Does self-selected leadership yield higher contributions than imposed leadership? This does not seem to be the case under high-benefit leadership (9.12 in EN_HBL vs. 9.17 in HBL, p = 0.72, N = 35). Relative to imposed low-benefit leadership, self-selected low-benefit leadership seems to be more effective, yet the difference is significant only in the early periods (11.64 in EN_LBL vs. 9.0 in LBL in the first five periods, p = 0.05, N = 33; 9.62 in EN_LBL vs. 7.42 in LBL in the final five periods, p = 0.22, N = 26).

Table 2 reports results for a random effects regression of individual contributions across treatments (and states). Model (1) only includes the four treatment dummies as independent variables, and it confirms the non-parametric results from above. In Model (2), we further split the treatment dummy “EN” into three state dummies. Both the coefficients of “EN_HBL” and “EN_LBL” are positive and significant, indicating that contributions with self-selected leaders are significantly higher than the baseline level, regardless of the leader type. The significantly negative coefficient of “EN_NL” clearly reflects the drawback in EN: when nobody is willing to take the lead and all group members contribute simultaneously, contributions are significantly lower than in BASE.Footnote 14

Table 2 Random effects regression of individual contributions

Models (3) to (6) add controls for individual social preference using subjects’ ring test scores, which allow us to identify—at least tentatively—whether some of the significant effects found in the non-parametric analysis, i.e. those that compare exogenous with self-selected roles, are due to pure self-selection effects. The time trend is also included. In Model (3) the coefficient of the treatment dummy “EN” becomes significant at the 5%-level. There is no significant difference in contributions between “HBL” and “LBL” (p = 0.73, Wald test). We thus pool the two exogenous leadership treatments in Model (4). In line with the non-parametric results, there is a significant increase in contributions in the pooled exogenous leadership treatments as compared to the baseline treatment.Footnote 15 However, given the small magnitude of the coefficients “EXO_L (HBL/LBL)” and “EN”, both ways of implementing leading by example seem to have only a limited effect in promoting contributions. Model (5) further shows that the small effect of the pooled exogenous leadership treatments is mainly driven by a significantly slower decay in contributions.Footnote 16 Model (6), where we add control variables on top of Model (2), shows that our main results are robust. We summarize our main findings so far:

Result 1a: (RELATED TO H.1) When leaders are assigned exogenously, neither type of leadership has a significant influence on contributions, although we observe a significantly slower decay in contributions in the exogenous leadership treatments than in BASE.

Result 1b: (RELATED TO H.1) When subjects are given the choice to self-select into leadership, contributions are significantly higher with self-selected leaders than when nobody volunteers. Contributions with self-selected leaders, in particular self-selected low-benefit leaders, are also significantly higher than those in BASE; however, total contributions are only slightly higher in EN than in BASE, due to the cost of failed leadership implementation in terms of low contributions in such cases.

Result 2: (RELATED TO H.2) While contributions with self-selected and imposed high-benefit leaders are similar, contributions with self-selected low-benefit leaders are slightly higher than those with imposed low-benefit leaders, particularly in early periods.

Result 3: (RELATED TO H.3) When leadership is randomly imposed, the leader’s benefit type has no significant influence on contributions. When leadership is self-selected, low-benefit leaders contribute more than high-benefit leaders, particularly in early periods.

3.2 Structure and determinants of self-selected leadership

So far, our results have indicated the positive effect of self-selected leadership and the cost of failed leadership implementation. Under benefit heterogeneity, does self-selected leadership become more frequent over time? Are high-benefit members or low-benefit members more likely to volunteer as leaders? What determines subjects’ volunteering decisions? In this section, we analyze the structure and determinants of self-selected leadership in treatment EN.

Overall, leadership is implemented 74% of the time. Broken down, 46% of the time the group has a self-selected high-benefit leader and 28% of the time a self-selected low-benefit leader. The left panel of Fig. 2 plots the frequencies of the three possible states in EN over time.Footnote 17 We observe that each type of self-selected leadership occurs over 40% of the time in the first period. While the instances of EN_HBL remain fairly stable over time, the instances of EN_LBL decrease (Spearman’s \(\uprho\) = − 0.21, p < 0.01 for EN_LBL, and Spearman’s \(\uprho\) = − 0.02, p = 0.74 for EN_HBL).Footnote 18 Towards the end, the no-leader state EN_NL takes up about 50% of the cases. Apparently, self-selected leadership occurs less frequently in our heterogeneous groups than in homogeneous groups in previous studies (Rivas and Sutter 2011; Préget et al. 2016).Footnote 19
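The time trends above are summarized with Spearman's rank correlation between the period and the frequency of a state. The following minimal sketch shows how such a coefficient can be computed. The period-level shares are hypothetical illustration data, not the experimental data, and the unit of observation used in our tests may differ from this simplified setup.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: share of groups with a low-benefit leader per period.
periods = [1, 2, 3, 4, 5, 6, 7, 8]
share_lbl = [0.42, 0.40, 0.35, 0.36, 0.30, 0.27, 0.25, 0.20]
rho = spearman_rho(periods, share_lbl)
print(round(rho, 3))
```

A negative coefficient indicates that the state becomes less frequent in later periods. The sketch does not compute a p-value.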

Fig. 2 Evolution of the three states and volunteering in EN. “HB” and “LB” represent high- and low-benefit members in EN

The dynamics of states are consistent with subjects’ volunteering patterns. As shown in the right panel of Fig. 2, 33% of low-benefit members want to be the first mover in the first period, and this proportion decreases over time (Spearman’s \(\uprho\) = − 0.25, p < 0.001), whereas the proportion of high-benefit volunteers fluctuates around 40% over the entire experiment. High-benefit members, on average, volunteer to be the leader more often than low-benefit members (39.4% vs 22.7%, p = 0.01, two-sided Wilcoxon signed-rank test; p < 0.01, chi-square test). A random effects probit regression on subjects’ willingness to be the leader (as shown in Models (1) to (3) of Table 3) confirms that the non-parametric results hold when controlling for individual social preferences. Importantly, the regressions reveal that social preferences are an important determinant of volunteering for leadership: the more pro-social a subject, the more likely she is to be willing to be the first mover.

Table 3 Determinants of the decision to lead in EN (random effects probit)

To understand why the simultaneous move structure becomes more frequent over time, we go on to investigate whether being leader pays off, on average, for the two benefit types.Footnote 20 Low-benefit members, on average, end up better off as followers than as leaders (29.64 vs. 23.36, p < 0.0001, two-sided Wilcoxon signed-rank test). Surprisingly, for them being leader is, on average, not more profitable than being in a state without a leader (23.36 vs. 24.48, p = 0.51, two-sided Wilcoxon signed-rank test). Rather, a low-benefit leader bears an earnings disadvantage within the group: low-benefit leaders’ average earnings are 25% lower than low-benefit followers’ and 47% lower than high-benefit followers’ earnings. This can explain why low-benefit members’ willingness to volunteer decreases over time.

High-benefit leaders earn more, on average, than when there is no leader (34.78 vs. 28.43, p < 0.01, two-sided Wilcoxon signed-rank test). Furthermore, there is no statistically significant difference in earnings relative to low-benefit followers (5.22 vs. 3.85, p = 0.52, two-sided Wilcoxon signed-rank test). This may explain why high-benefit members are more often willing to become the leader than low-benefit members.
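The earnings comparisons above rely on two-sided Wilcoxon signed-rank tests. The sketch below illustrates the mechanics of such a test on paired observations, using a normal approximation without tie or continuity correction. The earnings figures are hypothetical, and pairing by matching group is an assumption made for illustration. With few pairs an exact test would be preferable.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(x, y):
    """Return (W+, z, two-sided p) for paired samples x and y.

    Zero differences are dropped. The p-value uses the large-sample
    normal approximation of W+ under the null hypothesis.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, z, p_two_sided

# Hypothetical per-group average earnings of low-benefit members by role.
earn_as_follower = [31.0, 28.5, 30.2, 27.9, 33.1, 29.4, 30.8, 26.5]
earn_as_leader = [24.0, 25.1, 22.3, 26.0, 23.7, 21.9, 27.2, 20.4]
w_plus, z, p = wilcoxon_signed_rank(earn_as_follower, earn_as_leader)
print(w_plus, round(z, 2), round(p, 3))
```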

We are interested in the reasons why high-benefit members do not self-select into leadership more often. One possible explanation is that they wait for others to volunteer because leadership is still less profitable for high-benefit members than being a follower (34.78 vs. 40.37, p = 0.02, two-sided Wilcoxon signed-rank test). Another potential reason could be disappointment with the lack of cooperativeness by other members, in particular by the same-type member. In order to address the latter aspect, we analyze how subjects’ willingness to remain the leader depends on events in the previous period, as shown in Models (4) and (5) of Table 3. Apparently, for leaders the willingness to re-volunteer strongly depends on the responsiveness of followers in the previous period: the less responsive followers are in terms of contributions, the less likely the previous leader is to volunteer again. Differences exist across benefit types. While low-benefit leaders take the responsiveness of both types of followers into account, high-benefit leaders focus strongly on the responsiveness of the high-benefit follower.Footnote 21

Result 4: Low-benefit members exhibit a decreasing trend to volunteer for leadership; high-benefit members are overall more likely to volunteer for leadership than low-benefit members, yet the proportion never exceeds 50%. Social preferences and other members’ responsiveness in terms of contributions are important determinants of the willingness to become leader.

3.3 Leader and follower contributions

The success of leadership relies on two factors: leaders setting good examples, and followers responding to the leaders’ examples. To understand treatment effects in more detail, in this section, we explore how leader and followers behave in different leadership treatments and states.Footnote 22 Figures 3 and 4 present contribution dynamics of leaders and followers under high- and low-benefit leadership, respectively: the left panel shows exogenous leadership, and the right panel self-selected leadership. For reasons of comparison, we add contribution dynamics of group members in the baseline treatment, as illustrated by the two dashed lines.

Fig. 3 Average contributions of leaders and followers under high-benefit leadership (HBL). “HL” represents high-benefit leaders. “HF”/“LF” represents high- and low-benefit followers respectively, while “HB”/“LB” represents high- and low-benefit members in BASE

Fig. 4 Average contributions of leaders and followers under low-benefit leadership (LBL). “LL” represents low-benefit leaders. “HF”/“LF” represents high- and low-benefit followers respectively, while “HB”/“LB” represents high- and low-benefit members in BASE

We start the analysis by looking at leaders’ behavior. In the presence of normative conflicts, do leaders set good examples? Do self-selected leaders contribute more than leaders that are exogenously assigned? As shown in Fig. 3, contributions of high-benefit members decline significantly over time in BASE (Spearman’s \(\uprho\) = − 0.49, p < 0.01). In contrast, high-benefit leaders’ contributions are fairly stable (Spearman’s \(\uprho\) = − 0.02, p = 0.81 in HBL; Spearman’s \(\uprho\) = − 0.04, p = 0.69 in EN_HBL). Over all periods, high-benefit leaders contribute more than their same-type counterparts in BASE, but the difference is only significant in EN_HBL (p = 0.18 for HBL; p = 0.01 for EN_HBL).Footnote 23 Leaders’ contributions are higher in EN_HBL than in HBL, though the difference is not significant (p = 0.51).

Low-benefit leaders’ behavior is shown in Fig. 4. Leaders in LBL start out at a medium contribution level that is only slightly smaller than the initial leader contribution in HBL. However, contributions drop quickly to about half of the contribution of high-benefit leaders. There is an overall decreasing trend in leaders’ contributions in LBL (Spearman’s \(\uprho\) = − 0.22, p = 0.02), while this is not the case in EN_LBL with self-selected leaders (Spearman’s \(\uprho\) = 0.07, p = 0.55).

Nonetheless, under both forms of low-benefit leadership, leaders’ contributions are significantly higher than the same-type counterparts in BASE (p = 0.04 in LBL; p < 0.01 in EN_LBL), and leaders’ contributions are significantly higher in EN_LBL than in LBL (p < 0.01).

Table 4 reports the results of a random effects regression comparing contributions of leaders and their same-type counterparts in BASE. Models (1) to (3) compare high-benefit leaders with their high-benefit counterparts, while Models (4) to (6) compare low-benefit leaders with their low-benefit counterparts. The coefficients of treatment/state dummies in both Models (1) and (4) are significantly positive, indicating that high-benefit leader and low-benefit leader contributions are significantly higher than their corresponding counterparts in BASE, regardless of how leadership is generated. The magnitude of the effect, however, is largest for EN_LBL. Wald tests further confirm that self-selected low-benefit leaders contribute significantly more than imposed low-benefit leaders (p < 0.001), whereas there is no significant difference between self-selected and imposed high-benefit leaders (p = 0.55).Footnote 24 We get qualitatively the same results when controlling for the time trend and leaders’ social preferences, as shown in Models (2) and (5), indicating that leadership behavior cannot simply be attributed to time or pure self-selection effects.Footnote 25

Table 4 Leaders’ contributions across treatments and states
Table 5 Followers’ contributions across treatments and states

In Models (3) and (6), we add the interaction terms between the time and treatment/state dummies. Model (3) shows that high-benefit leaders’ contributions have a completely different time trend than their counterparts. An F-test further confirms the stable trend of high-benefit leaders’ contributions by failing to reject the null hypothesis that the combined effect of Period and Period*HBL (or Period*EN_HBL) is equal to zero (p = 0.32 for HBL and p = 0.17 for EN_HBL). The picture looks a bit different for low-benefit leaders. We only find a stable trend of low-benefit leaders’ contributions in EN_LBL (p = 0.52 for the combined effect of Period and Period*EN_LBL; p = 0.06 for the combined effect of Period and Period*LBL). These results confirm the Spearman rank correlation analysis.

Result 5a: Over all periods, leaders, in particular self-selected leaders, contribute significantly more than their same-type counterparts in treatment BASE. The magnitude of the effect is largest for self-selected low-benefit leaders.

Result 5b: Self-selected leaders tend to contribute more than imposed leaders, yet, the difference is only significant for the low-benefit type.

Next, we turn to the behavior of followers. As shown in Fig. 3, which considers only the situation of high-benefit leadership, high-benefit followers exploit leaders by undercutting leaders’ contributions by 13% in HBL and 19% in EN_HBL (two-sided Wilcoxon signed-ranks test: p = 0.04 in HBL and p = 0.01 in EN_HBL). Low-benefit followers contribute about half of leader contributions in the first period and further decrease contributions over time; on average, they undercut leaders’ contributions by 55% in HBL and 64% in EN_HBL (two-sided Wilcoxon signed-ranks test: p < 0.005 in HBL and p < 0.0001 in EN_HBL). Using data from all periods, we do not observe a significant difference in contributions between followers and their corresponding same-type counterparts in treatment BASE (p > 0.3).

Figure 4 shows that low-benefit followers undercut leaders’ contributions by 25% in LBL and 52% in EN_LBL (two-sided Wilcoxon signed-ranks test: p < 0.005 in LBL and p = 0.0001 in EN_LBL). Surprisingly, for high-benefit followers, despite the fact that their MPCR is twice as high as that of leaders, their contributions are only marginally significantly higher than those of leaders in LBL (two-sided Wilcoxon signed-ranks test: p = 0.06), and even significantly lower than those of leaders in EN_LBL (two-sided Wilcoxon signed-ranks test: p = 0.02). This is in sharp contrast to the baseline treatment where high-benefit members contribute 146% more than low-benefit members. Still, we find no significant difference in contributions between followers and their corresponding counterparts in treatment BASE (p > 0.3).
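The percentages in the last two paragraphs are relative gaps between average contributions. The sketch below spells out one plausible reading of the arithmetic: the undercutting share is taken relative to the leader's average, and "x% more" is taken relative to the lower average. Both function names and all token amounts are hypothetical and serve only as illustration.

```python
def undercut_share(leader_avg, follower_avg):
    """Fraction by which followers' average falls short of the leader's average."""
    if leader_avg <= 0:
        raise ValueError("leader average must be positive")
    return (leader_avg - follower_avg) / leader_avg

def relative_excess(higher_avg, lower_avg):
    """Fraction by which one average exceeds another, relative to the lower one."""
    if lower_avg <= 0:
        raise ValueError("reference average must be positive")
    return (higher_avg - lower_avg) / lower_avg

# Hypothetical averages in tokens.
print(undercut_share(8.0, 6.0))  # 0.25, i.e. followers undercut by 25%
print(round(relative_excess(12.3, 5.0), 2))
```

A negative undercutting share under this reading means that followers contribute more than the leader, as high-benefit followers tend to do in LBL.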

Table 5 presents a random effects regression of followers’ contributions on treatment/state dummies. Models (1) and (2) compare high-benefit followers with their high-benefit counterparts in treatment BASE; Models (3) and (4) compare low-benefit followers with their low-benefit counterparts in BASE. All coefficients on treatment/state dummies are positive, indicating a positive effect on followers’ contributions, yet, the only significant coefficient is the one for low-benefit followers in EN_LBL. Note that this coefficient also becomes marginally significant in Model (4) with some lower consistency thresholds of the ring test score.

Taken together, leadership has little effect in promoting follower contributions. Only under self-selected low-benefit leadership do we observe a marginally significant increase in low-benefit followers’ contributions. Interestingly, self-selected leaders do not seem to have more impact than imposed leaders in our setting. In contrast, followers tend to exploit self-selected leaders to a stronger extent than imposed leaders, as the distance between leaders’ and followers’ average contributions is larger with self-selected leaders than with randomly assigned leaders (p = 0.11 for HBL vs. EN_HBL; p < 0.001 for LBL vs. EN_LBL).Footnote 26 In consequence, we do not observe a significant difference in followers’ contributions between self-selected and imposed leadership (p = 0.85 for HBL vs. EN_HBL; p = 0.33 for LBL vs. EN_LBL),Footnote 27 even though self-selected low-benefit leaders contribute significantly more than imposed low-benefit leaders. The asymmetry in the group appears to provide an excuse that leads to a reduction of a potentially positive leadership effect for both exogenous and endogenous leadership, with an even more pronounced reduction for endogenous leadership.Footnote 28

Result 6: Relative to their same-type counterparts in treatment BASE, we find marginally significantly higher contributions only of low-benefit followers in EN_LBL. Followers exploit leaders more strongly when leaders are volunteers than when leadership is imposed.

This finding raises the question of how exactly followers respond to leaders’ examples. Table 6 reports the results of a random effects regression on how followers respond to their leaders’ contributions. In addition to the leader’s contribution, we include the time trend, the follower type, the interaction term between leader’s contribution and the follower type, and individual social preferences as independent variables. Table 6 indicates that, with high-benefit leaders, it is mainly the high-benefit followers that reciprocate positively, irrespective of whether the leader has been randomly assigned or volunteered. For every additional token the high-benefit leader contributes in a given period, a low-benefit follower contributes, on average, about 0.15 additional tokens. The level of reciprocity of high-benefit followers is significantly higher, with an average of about 0.6 tokens in HBL and 0.55 tokens in EN_HBL. Hence, high-benefit leaders’ examples mainly have an impact on the same-type followers.

Table 6 Followers’ responses to leader’s example

The last two columns of Table 6 show that low-benefit leaders have a significant influence on low-benefit followers. For every additional token the low-benefit leader contributes in a given period, low-benefit followers contribute 0.28 tokens in LBL and 0.35 tokens in EN_LBL. The interaction term is positive in LBL but not significant, implying that reciprocity from high-benefit followers is not significantly stronger than that from low-benefit followers when they face a low-benefit leader.Footnote 29 It seems that, for followers that are of a different type than the leader, reciprocity towards the leader is based on a self-serving perception of contribution norms: with high-benefit leaders, low-benefit followers tend to balance their payoff with high-benefit members, and thus reciprocate on a much lower level; with low-benefit leaders, high-benefit followers try to balance the reciprocation level in contributions with low-benefit members.Footnote 30 We shall argue that imperfect conditional cooperation by followers, combined with the self-serving perception of contribution norms, is one important reason for the ineffectiveness of leadership in the presence of benefit heterogeneity.

Result 7: Low-benefit followers reciprocate less strongly than high-benefit followers under high-benefit leadership. The two types of followers reciprocate at a similar rate under low-benefit leadership. The followership pattern is in line with a self-serving perception of contribution norms.

Apparently, followers’ reciprocity is not sufficient to make setting good examples the payoff-maximizing strategy for low-benefit leaders. On average, their costs of contributing are higher than their benefits from increased contributions by followers. In addition, the more low-benefit leaders contribute, the more their earnings fall short of those of followers. In other words, setting good examples, on average, does not pay for low-benefit leaders, but instead increases their earnings disadvantage within the group. This is consistent with our finding that imposed low-benefit leaders decrease their contributions very quickly and self-selection for leadership does not pay off for low-benefit members.

High-benefit leaders’ examples have only little influence on low-benefit followers. However, since high-benefit leaders obtain a comparatively large benefit from the public good, their costs of contributing are, on average, lower than their benefits from the increased contributions by followers, i.e. setting a good example is, on average, profitable for them. This is consistent with our finding that average contributions of imposed high-benefit leaders remain stable over time, and high-benefit members are on average better off when they self-select for leadership than when nobody volunteers in the group.Footnote 31
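The cost-benefit argument of the last two paragraphs can be illustrated with a back-of-the-envelope calculation. The sketch assumes a standard linear public goods payoff in which a token kept is worth one unit and every token in the group account pays the leader her MPCR; this payoff structure is an assumption of the example. It uses the approximate response slopes from Table 6 for the exogenous treatments, treats them as if they were causal one-period responses, and assumes groups of two high-benefit and two low-benefit members. The MPCR values 0.6 and 0.3 are hypothetical and chosen only so that one is twice the other.

```python
def extra_group_tokens(follower_slopes):
    """Tokens added to the group account per extra leader token:
    the leader's own token plus each follower's estimated response."""
    return 1.0 + sum(follower_slopes)

def net_return_per_token(mpcr_leader, follower_slopes):
    """Leader's payoff change from one more token: she gives up one unit of
    private income and earns her MPCR on every token added to the group account."""
    return mpcr_leader * extra_group_tokens(follower_slopes) - 1.0

def breakeven_mpcr(follower_slopes):
    """Smallest MPCR at which one more token does not reduce the leader's payoff."""
    return 1.0 / extra_group_tokens(follower_slopes)

# Approximate slopes from the text. A high-benefit leader faces one
# high-benefit and two low-benefit followers.
hbl_followers = [0.60, 0.15, 0.15]
# Under low-benefit leadership the interaction term is not significant,
# so a common slope of 0.28 is assumed for all three followers.
lbl_followers = [0.28, 0.28, 0.28]

MPCR_HIGH = 0.6  # hypothetical value, not taken from the experiment
MPCR_LOW = 0.3   # hypothetical value, not taken from the experiment

print(round(breakeven_mpcr(hbl_followers), 3))
print(round(breakeven_mpcr(lbl_followers), 3))
print(net_return_per_token(MPCR_HIGH, hbl_followers) > 0)
print(net_return_per_token(MPCR_LOW, lbl_followers) > 0)
```

Under these assumptions, leading by example pays only if the leader's MPCR exceeds the break-even value implied by followers' reciprocity, which is in line with the asymmetry between the two leader types described above.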

4 Conclusion

In collective action problems outside the experimental laboratory, group members are likely to gain different benefits from the provision of a public good. This paper examined the effect of leading by example on cooperation when individuals have different benefits from the group account by using a linear public goods experiment.

We find that the effect of leading by example is limited in promoting cooperation. Average contributions do not differ significantly between situations with and without either type of randomly selected leadership, though we do observe a significantly slower declining trend in contributions with (imposed) leadership. In line with previous research, we see significantly higher contributions with self-selected leaders than without. Contributions with self-selected leaders, in particular self-selected low-benefit leaders, are also significantly higher than those in our baseline treatment with simultaneous contributions. However, we do not observe a high enough willingness to lead for high-benefit members, and the motivation for low-benefit members to self-select into leadership is decreasing quickly over time. This trend, combined with the fact that contributions are low in case nobody in the group volunteers to become the leader, leads to contributions that are only marginally higher, on average, in the endogenous treatment than in the baseline treatment with simultaneous contributions.

By exploring followers’ reciprocity towards leaders, we notice the usual pattern of imperfect conditional cooperation by followers. In addition, we find that followers who are of a different type than the leader appear to condition the level of reciprocity on their preferred contribution norm. Specifically, while low-benefit followers reciprocate to the high-benefit leader at a particularly low rate, high-benefit followers reciprocate to low-benefit leaders to a similar extent as low-benefit followers. Followers’ insufficient reciprocity not only impairs the effectiveness of leaders as role models, but also demotivates (potential) leaders when it comes to volunteering for leadership. In fact, even if it actually pays off for leaders to provide a good example, on average, pessimistic beliefs about followers’ reciprocity also seem to deter some high-benefit members from self-selecting into leadership.

With respect to the comparison of imposed and self-selected leadership, we find that self-selected leaders—in particular low-benefit leaders—tend to set better examples than imposed leaders. However, followers, in particular those that are of a different type than the leader, do not increase contributions enough. Our results remain qualitatively unchanged when controlling for subjects’ social preferences, which makes it unlikely that pure self-selection effects into leadership drive our main results. Moreover, in contrast to some existing evidence that assigning the advantageous type to leadership results in higher contribution levels, our findings do not support the conclusion that high-benefit leadership is more effective than low-benefit leadership.

Overall, our findings are in line with the suggestion that, when there is benefit heterogeneity in a group, the conflict between different equity and contribution norms is difficult to overcome, even with a mechanism, leading by example, that has often proven useful in homogeneous populations. They also pose some questions for future research: How can groups with heterogeneity overcome the coordination problem regarding different contribution norms, when there is a leader? Perhaps, in such a situation the leader needs more than just the good example. For instance, it would be interesting to study situations with benefit heterogeneity in which leaders have additional coercive power such as a punishment option or ostracism power. An alternative would be introducing a communication option for leaders in order to alleviate the coordination problem. Another promising route of research in view of our results is appropriate selection mechanisms for leaders. It seems that type and nature of leaders matter when it comes to the effectiveness of leadership. Relevant characteristics could be considered in appointment or selection (voting) procedures. It should also be noted that in our setting, there are two high-benefit and two low-benefit group members. Since high-benefit leaders influence high-benefit followers more strongly, the effectiveness of high-benefit leadership might depend on the distribution of the two benefit types within the group. Future research could look at groups that consist of three high-benefit members and one low-benefit member and vice versa.