1 Introduction

Accountability is an essential element of various relationships in public and private spheres, including representative politics, public administration, service production, global governance, voluntary associations, and private companies (e.g., Bäckstrand, 2008; Kuyper & Bäckstrand, 2016; Warren, 1996a). Different institutional mechanisms of accountability have been identified in previous studies, and a variety of institutional designs have been formulated to secure accountability in different contexts. This variety also reflects different understandings of accountability in the literature (Bovens, 2010; Bovens et al., 2014).

The common assumption about public accountability relations is that accountability solves problems associated with the delegation of powers and responsibility, because it brings the actions of the agents in line with the expectations of the principals (e.g., Manin et al., 1999). Different conceptualizations of public accountability refer to two fundamental accountability mechanisms involved in different institutional designs: the threat of (material) sanctions and the requirement that those who are held accountable should justify their actions. Different strands of literature tend to emphasize one of these mechanisms and consider it “the” basis of public accountability, even though the need to combine these mechanisms is often acknowledged. In general terms, rational choice theory and economics-based approaches tend to emphasize sanctions (e.g., Besley, 2006), while theories of deliberative democracy (e.g., Gutmann & Thompson, 1996) and social psychology (e.g., Lerner & Tetlock, 1999) tend to emphasize the requirement of a justification. Our aim is to bring these two accountability mechanisms together by studying both the independent and combined effects of sanctions and justifications.

We investigate the two mechanisms of accountability in a laboratory experiment where participants play the trust (investment) game (Berg et al., 1995). In this game, a sender can show trust by allocating resources to a responder who, in turn, can show trustworthiness by returning resources. Sending and returning are mutually beneficial because resources are multiplied if transferred from the sender to the responder. Instead of a standard two-player trust game, we use a modification that involves two senders and one responder. This experimental design shares a key characteristic of mechanisms involved in public accountability relations where public officials make decisions that affect a number of individuals. However, in designing the experiment, we deliberately abstracted away from specific procedures, such as elections, to be able to study the fundamental mechanisms of public accountability, rather than examining specific institutional designs.

The aim of this article is to examine whether the threat of sanctions and the requirement of a justification enhance behavioral trust. We are interested in what Warren (1996b) calls “warranted trust”, that is, trust that is shown when specific mechanisms guarantee that those who are trusted behave in a desired manner. We study the influence of each accountability mechanism alone as well as their combined effect. As we will point out below, there are reasons to expect that the requirement of a justification backed with the threat of sanctions provides the most influential form of accountability, because it incentivizes the agents to act in ways they can publicly justify. In addition, we explore what kinds of justifications are formulated, and how the possibility of facing sanctions affects the content of justifications.

Our main observation is that obliging responders to justify their decisions induces the highest levels of contributions. We also find that responders tend to justify their choices especially in terms of reciprocity, which is in line with their behavior: the more a sender contributes, the larger the amount a responder tends to return to that specific sender. Moreover, the risk of punishment appears to discourage responders from giving justifications that appeal to their self-interest. Our results have implications for the real-world design of institutions of public accountability, especially by showing the effects of justification as an accountability mechanism and the tendency toward reciprocity.

2 Conceptualizations of trust and accountability

Trust and accountability are clearly distinct phenomena. According to Warren (1996b: 4), “[w]hen one trusts, one forgoes the opportunity to influence decision-making, on the assumption that there are shared or convergent interests between truster and trustee”. In a similar vein, Berg et al. (1995) define trust in terms of “belief in reciprocity” (italics added). In contrast, accountability entails mechanisms intended to ensure that an accountable actor behaves in a manner that the principals require, even in the case of divergent or opposing interests. In this respect, mechanisms of accountability can enhance what Warren (1996b: 20) calls “warranted trust”. In other words, accountability mechanisms function as “protections and inducements” that help manage the risks involved in placing trust in an actor. For example, in representative democracies, trust in elected representatives is “warranted” because it is secured by specific accountability mechanisms that allow voters to influence or react to representatives’ actions when those actions do not align with the voters’ interests (Warren & Gastil, 2015).

Different strands of literature focus on different mechanisms of accountability. In political economy and rational choice theory, accountability is primarily understood as a mechanism based on material sanctions or rewards (Fearon, 1999: 55). Besley (2006: 37) defines accountability in terms of the opportunity of the public to punish decision-makers: “A politician is formally accountable if there is some institutional structure that allows the possibility of some action to be taken against him/her (such as being voted out of office) in the event that he/she does poor job.” Besley’s definition exemplifies a prominent feature of the formally oriented literature on democracy, namely that elected representatives are expected to act in the interests of voters because of the risk of not being re-elected (Footnote 1). In addition to material sanctions, various types of immaterial or social sanctions, such as reputational effects, may also motivate those holding public offices (Colombo, 2018; Lerner & Tetlock, 1999).

Whereas the essence of these approaches to accountability is the risk of utility losses, another approach emphasizes public justification and reasoning as the key accountability mechanism. According to Philp (2009: 29), “A is accountable with respect to M when some individual, body or institution, Y, can require A to inform and explain/justify his or her conduct with respect to M.” According to this view, what defines an accountability relationship is Y’s capacity to require A to give an account. In theories of deliberative democracy, the requirement of a public justification is the key feature of accountability. Gutmann and Thompson (1996: 128) define accountability as the requirement that citizens and decision-makers give justifications for their decisions to all those who are bound or significantly affected by them.

Real-world public accountability relations typically involve both mechanisms of accountability. Schedler (1999) argues that those who are accountable are obliged to inform the public about their decisions, as well as to explain and justify their decisions. The public in turn is empowered to monitor decision-makers’ conduct and force them to “bear the consequences” in the form of sanctions. Thus, “A is accountable to B when A is obliged to inform B about A’s (past or future) actions and decisions, to justify them, and to suffer punishment in the case of eventual misconduct” (Schedler, 1999: 17). Rehfeld (2005: 189–190) argues that a sanctioning dimension is necessary for “any reasonable account of legitimate political representation” and that the discursive mechanism only complements it. Without sanctioning, a representative could act according to his or her self-interest and even justify his or her conduct in these terms.

To sum up, various strands of research give different weights to different mechanisms of accountability and institutional designs supporting them. At present, there is relatively little experimental research on how different aspects of accountability interact with each other and, importantly, how they affect individual action and reasoning. However, we can expect that both sanctions and the requirement of justification enhance trust and thereby cooperation to some extent, but that together they constitute a strong mechanism with the clearest behavioral effects. The existing experimental evidence, to which we will now turn, provides preliminary support for this expectation.

3 Previous experimental studies on sanctions and justifications

The absence, or insufficiency, of accountability mechanisms can lead to collectively sub-optimal outcomes by leaving room for decision-makers’ self-serving choices. In experimental studies on electoral accountability, the sanctioning mechanism tends to be reduced to a dichotomous choice between voting for or against an incumbent. Experiments of this kind have produced somewhat ambiguous results. Some suggest that people tend to resort to retrospective voting and punish decision-makers for outcomes they dislike (Landa, 2010; Woon, 2012). Others show that electoral mechanisms actually decrease the amounts decision-makers distribute. Arguably, this is due to an effect whereby being elected is perceived as an entitlement to use resources in a self-serving way (Geng et al., 2011). Still others have called into question the result concerning subjective entitlements (Weiss & Wolff, 2013).

In experimental studies on electoral accountability, decision-makers’ messages to recipients are often intended to reflect campaign promises, which have been shown to increase the electorate’s payoffs (Corazzini et al., 2014; Feltovich & Giovannoni, 2015). Whereas promises are statements about future action, justifications are statements about, and reasons for, actions that are given simultaneously with an action or retrospectively. Justifications have mainly been understood in terms of blame avoidance and blame management, where the primary interest lies in the conditions under which members of the public are likely to accept explanations that representatives give for damaging, scandalous, or otherwise undesired policy outcomes (McGraw, 1991; McGraw et al., 1993).

Beyond the electoral setting, the influence of costly punishment has been studied in experimental games where individual rationality and collectively optimal action conflict. A number of studies have shown that the possibility of punishment increases contributions to public goods (e.g., Ostrom et al., 1992; Fehr & Gächter, 2000; Hamman et al., 2011; Lierl, 2016). However, evidence from the trust game suggests that the effect of sanctions is sensitive to the specific design of the experiment, and that sanctions do not always increase trust and trustworthiness (Calabuig et al., 2016; Charness et al., 2008; Fehr & List, 2004; Fehr & Rockenbach, 2003; Houser et al., 2008; Rigdon, 2009). It seems that while sanctions can enhance cooperation in certain contexts, they may also give rise to self-interested choices if players understand them as a price paid for acting selfishly, or if they dampen players’ intrinsic motivation to cooperate (Houser et al., 2008).

The effects of messages and communication on reasoning and behavior have been studied in a variety of experimental designs. There is a large number of experimental studies on the behavioral consequences of various types of communication in settings where people interact (Bó & Bó, 2014; Cason & Gangadharan, 2016; for meta-analyses, see Sally, 1995; Balliet, 2010). Although evidence on the communicative aspect of accountability in social dilemmas is limited, some studies look at the consequences of monitoring and justifications in trust and public good games. Bracht and Feltovich (2009) find that “cheap talk” in the form of messages from the responder before the sender’s decision has little effect on either player’s behavior in a two-person trust game, whereas the opportunity for senders to observe responders’ actions in the previous round significantly increases the amounts returned. Experimental evidence on the public good game suggests that an obligation to justify one’s choice to other participants increases contributions, in particular among those who have larger endowments (De Cremer & van Dijk, 2009). Other studies demonstrate that being required to justify one’s choice increases norm-abiding behavior in commons dilemmas and in games testing deviations from social norms (de Kwaadsteniet et al., 2007; Xiao, 2017).

The key idea of our experimental design is to study the behavioral effects of the requirement of justification and of material sanctions, as well as their interaction. While there are no previous studies with the same experimental design, the relationship between punishment and different types of communication (which can range from an abstract “signal” to completely free-form communication; see Brandts et al., 2019, for a review) has been widely studied in the previous literature on social dilemmas. For example, Bochet et al. (2006) found that chat room communication was almost as effective in enhancing cooperation as face-to-face communication. They also report that adding a punishment option to a chat room treatment raised contributions only moderately. Dufwenberg et al. (2021) had subjects record a single pre-play message and report that promises drive the effect of communication on beliefs, and that broken promises lead to higher rates of costly punishment. Furthermore, Ostrom et al. (1992) report that in a common pool resource experiment, communication increases yields, whereas overuse of sanctioning and sanctioning without communication reduce net yields. However, when subjects agree on a joint investment strategy for the common pool and choose their sanctioning mechanism, they achieve close to optimal results. Likewise, in their experiment on spatial common pool resources, Janssen et al. (2010) found that participants use costly punishment if given the option, but without communication this does not increase gross payoffs. When communication is allowed, performance increases significantly, but the improvement is not sustained once communication is no longer possible and punishment is available.

Social psychological studies typically focus on the effects of the requirement of justification on individual reasoning. Based on a literature review, Lerner and Tetlock (1999) discuss the conditions under which accountability might amplify biases in reasoning rather than enhance open-mindedness and critical thinking. The authors argue that “experimental work has repeatedly shown that expecting to discuss one’s views with an audience whose views are known led participants to strategically shift their attitudes toward that of the audience” (Lerner & Tetlock, 1999: 256). However, when the views of the audience are not known, accountability tends to give rise to what Tetlock (1983) has called “preemptive self-criticism”, that is, increased awareness of one’s own decision processes and anticipation of potential counter-arguments. More recently, Mercier et al. (2015) have found support for “the argumentative theory of reasoning”, according to which people tend to be lazy when evaluating their own arguments but critical of those of others. This highlights the importance of feedback mechanisms such as dialogue or, as in our experiment, material sanctions.

4 Experimental design and hypotheses

In the standard two-person trust game (Berg et al., 1995), two subjects are randomly assigned to the roles of sender and responder and given an endowment of, say, ten units. At the first stage of the game, the sender decides on an amount x (0 ≤ x ≤ 10) to send to the responder. The sender keeps 10 − x, and x is tripled by the experimenter to create benefits of cooperation, so that the responder receives 3x. At the second stage of the game, the responder passes an amount y (0 ≤ y ≤ 3x) back to the sender and keeps 3x − y. The sender’s choice is understood to model trust, that is, whether the sender believes that the responder will reciprocate, and the behavior of the responder is understood to model trustworthiness. Another interpretation of the players’ behavior is that senders take a risk by sending money. Both players can earn more if senders send and responders return money. The standard behavioral result is that trust is frequently observed, but that slight variations in the design can produce substantial changes in contributions (Johnson & Mislin, 2011).
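The payoff accounting of the two-person game can be made concrete with a small Python sketch. The function name `trust_game_payoffs` and its defaults (an endowment of 10 and a multiplier of 3, taken from the example above) are illustrative assumptions; following the paragraph, the responder's payoff counts only what he or she keeps of 3x, and any separate responder endowment is ignored.

```python
def trust_game_payoffs(x: int, y: int, endowment: int = 10, multiplier: int = 3) -> tuple:
    """Payoffs (sender, responder) of the two-person trust game described above.

    The sender sends x out of the endowment, the experimenter multiplies it,
    and the responder returns y out of the multiplied amount. Following the
    text, the responder's payoff is only what he or she keeps of 3x.
    """
    if not 0 <= x <= endowment:
        raise ValueError("the sender must send between 0 and the endowment")
    received = multiplier * x
    if not 0 <= y <= received:
        raise ValueError("the responder must return between 0 and the amount received")
    return endowment - x + y, received - y


# No trust: the sender keeps the endowment and the responder gets nothing.
print(trust_game_payoffs(0, 0))    # (10, 0)
# Full trust answered by an equal split of the tripled amount: both are better off.
print(trust_game_payoffs(10, 15))  # (15, 15)
```

The second call illustrates why sending and returning are mutually beneficial: tripling creates a surplus that both players can share.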

We use a three-player variant of the trust game with two senders and one responder, in which the senders move first and have an opportunity to send money to the responder. The responder moves second and can return money to the senders, possibly returning different amounts to each sender. In the experiment, each participant played the game for six rounds, each round consisting of three stages. Both senders and responders were endowed with 12 points (2 points = 1 euro) at the beginning of the first stage of each round (Footnote 2). The senders then decided whether and how much to send to the responder. The amount sent was tripled by the experimenter and passed on to the responder. At the second stage of each round, the responder decided whether and how much to return to each sender, that is, how to divide the tripled total amount sent, plus his or her initial endowment of 12 points, between the two senders and him- or herself. Each round ended with a third stage in which both senders and responders were given an extra 12 points. In the two treatment conditions including the opportunity to punish, the senders could use these extra points to punish the responder. Extra points were also given in the treatment conditions without the punishment opportunity to ensure that possible behavioral differences across conditions do not depend on the extra points. The senders made both allocation and punishment decisions individually and simultaneously, so they could not coordinate their choices. The game tree is displayed in Fig. 1.
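A round of the three-player variant without punishment can be sketched in the same way. The function and constant names are hypothetical; the constants (a 12-point endowment, tripling, and 12 extra points in the third stage) come from the description above, and the feasibility check encodes the constraint shown in Fig. 1. We assume, for illustration only, that unused extra points count toward a player's round earnings.

```python
ENDOWMENT = 12   # points every player receives at the start of a round
EXTRA = 12       # extra points every player receives in the third stage
MULTIPLIER = 3   # amounts sent are tripled by the experimenter


def round_payoffs(s1: int, s2: int, r1: int, r2: int) -> tuple:
    """Points (sender 1, sender 2, responder) for one round without punishment."""
    for s in (s1, s2):
        if not 0 <= s <= ENDOWMENT:
            raise ValueError("each sender sends between 0 and 12 points")
    # The responder divides his or her endowment plus the tripled total sent.
    pool = ENDOWMENT + MULTIPLIER * (s1 + s2)
    if r1 < 0 or r2 < 0 or r1 + r2 > pool:
        raise ValueError("returns must be non-negative and fit within the pool")
    # Assumption: the (unused) extra points are part of each player's earnings.
    sender1 = ENDOWMENT - s1 + r1 + EXTRA
    sender2 = ENDOWMENT - s2 + r2 + EXTRA
    responder = pool - r1 - r2 + EXTRA
    return sender1, sender2, responder


# Sender 1 sends 8, sender 2 sends 4; the responder returns in proportion (16 and 8).
points = round_payoffs(8, 4, 16, 8)
print(points)                   # (32, 28, 36)
print([p / 2 for p in points])  # euros at 2 points = 1 euro: [16.0, 14.0, 18.0]
```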

Fig. 1

The structure of the three-player trust game in Baseline. In the first stage, the amounts sent by the senders (s1 and s2, respectively) are multiplied by three by the experimenter. The responder can then return to sender 1 any amount r1 between 0 and his or her initial endowment (12) plus the total amount received in the first stage (3 × (s1 + s2)), minus the amount returned (r2) to sender 2.

Our experiment follows a 2 (opportunity to punish) × 2 (requirement of justification) factorial design. In Baseline, the three-person trust game was played without punishment possibilities or justification requirements. In Punishment, the senders could use the extra points received in the third stage to punish the responder. Punishment was costly: each point used for punishment decreased the responder’s earnings by three points for that round. As a consequence of punishment, a responder’s earnings could go down to zero. The costliness of punishment captures an essential feature of most real-world accountability relationships, where special efforts are needed to sanction the decision-maker. Further, costly punishment is a standard mechanism in experimental research because it discourages punishing at random or for irrelevant reasons.
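The punishment stage can be sketched as follows. The three-to-one fine ratio and the zero floor come from the text; exactly where the floor is applied (here, to the responder's round earnings before punishment) is our assumption, as are the function and constant names.

```python
EXTRA = 12          # extra points each sender may spend on punishment
FINE_PER_POINT = 3  # one punishment point removes three points from the responder


def punishment_stage(responder_points: int, p1: int, p2: int) -> tuple:
    """Return (sender 1 extra points left, sender 2 extra points left, responder points).

    responder_points is the responder's round earnings before punishment.
    """
    for p in (p1, p2):
        if not 0 <= p <= EXTRA:
            raise ValueError("each sender can spend between 0 and 12 extra points")
    fined = responder_points - FINE_PER_POINT * (p1 + p2)
    # Assumption: earnings are truncated at zero rather than becoming negative.
    return EXTRA - p1, EXTRA - p2, max(fined, 0)


print(punishment_stage(40, 5, 3))    # (7, 9, 16)
print(punishment_stage(20, 12, 12))  # (0, 0, 0): the fines exceed the earnings
```

Because every punishment point costs the punishing sender one point regardless of what the responder did, punishing never raises a sender's own payoff; this is the sense in which punishment is costly.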

In Justification, responders were asked to write a free-form justification for their decision in an open text field. The justification was shown to the senders along with the responder’s allocation decision. Finally, in Justification + Punishment, the responder was asked to write a justification for his or her decision, and the senders had an opportunity to punish the responder after the responder’s allocation decision and the corresponding justification were revealed (Footnote 3). The justification was thereby given before the responder knew about the senders’ reactions, which allows an analysis of punishment as a feedback mechanism.

In all conditions including punishment, justification, or both, senders were aware of these mechanisms when making their initial decisions, implying that senders could anticipate the potential effect of the accountability mechanisms on responder behavior. Table 1 presents the design of the experiment (Footnote 4).

Table 1 The design of the experiment

We used the three-player variant of the trust game to capture the asymmetries typical of public accountability relations, where a small number of decision-makers are entitled to make decisions that affect several individuals and are able to discriminate between these individuals. To be more specific, in a two-player trust game, the responder can be reciprocal toward the sender and return money in proportion to the sender’s allocation. In the three-player variant, the responder can likewise be reciprocal by returning money to each sender in proportion to their allocations, but the responder can also return money without reacting to what the senders have done, e.g., by splitting the returned money equally between the senders. It is thereby possible to distinguish between different strategies of reacting to sender behavior. We can also investigate what kinds of verbal justifications responders give for their choices, as well as whether the verbal justifications match their choices. To take a real-world example, elected representatives may make decisions that benefit either all members of the public or only a specific subgroup, and they may or may not explain their actions in a manner that corresponds to their actions.
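To illustrate how the three-player design separates reaction strategies, the following hypothetical coding rule labels a responder's pair of returns as reciprocal (proportional to the amounts sent), an equal split, or neither. The one-point tolerance, the function name, and the labels are our own illustrative choices, not the classification used in the study.

```python
def classify_returns(s1: float, s2: float, r1: float, r2: float, tol: float = 1.0) -> str:
    """Label returns (r1, r2) to senders who sent (s1, s2); tol is in points."""
    total = r1 + r2
    if total == 0:
        return "nothing returned"
    # Equal split: sender 1 gets half of everything returned.
    equal = abs(r1 - total / 2) <= tol
    if s1 + s2 == 0:
        # Nothing was sent, so proportional returns are undefined.
        return "equal split" if equal else "other"
    # Reciprocal: sender 1's share of the returns matches his or her share of sending.
    proportional = abs(r1 - total * s1 / (s1 + s2)) <= tol
    if proportional and equal:
        return "indistinguishable"  # e.g. both senders sent the same amount
    if proportional:
        return "reciprocal"
    if equal:
        return "equal split"
    return "other"


print(classify_returns(8, 4, 16, 8))   # reciprocal
print(classify_returns(8, 4, 12, 12))  # equal split
print(classify_returns(6, 6, 10, 10))  # indistinguishable
```

When both senders send the same amount, the two strategies predict the same returns, so only rounds with unequal sending can discriminate between them.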

It must be pointed out that in the three-person trust game, senders may be motivated by mutual competition if they anticipate that the responder will return more to the sender who sends more (Footnote 5). However, this possibility is reduced by the fact that senders make their choices simultaneously, which makes it impossible for a sender to know the other sender’s action at the time of making a decision. The senders are also randomly assigned to a new three-person group in each round, which rules out the development of reputational effects regarding sender or responder behavior. As in the standard two-person trust game, the individually rational strategy for senders in this setup is to send nothing.

The possibility of punishment and the requirement of justification are operationalizations of the two basic mechanisms of accountability. Punishment models a situation where a decision-maker is accountable to the public in the sense that members of the public have an opportunity to sanction the decision-maker. Analogously to the possibility of the public to impose sanctions on decision-makers in real-world accountability relationships, senders in our case have an opportunity to impose monetary fines on responders. Further, senders decide individually whether and how much they sanction, which is analogous to the variance in sanctions in real-world accountability relations. Justification is intended to capture the requirement that decision-makers provide reasons for their actions. Finally, Justification + Punishment models a situation where both accountability mechanisms are in place, that is, the decision-maker is required to justify his or her actions and can also face sanctions.

4.1 Hypotheses

In each treatment condition, the subgame-perfect equilibrium for payoff-maximizing players is for the responder to return nothing and for the senders to send nothing. In the two treatment cells involving costly punishment, not punishing is a dominant strategy for the senders, irrespective of the responder’s behavior in the second stage of the game. However, based on earlier studies on the trust game, we expect both senders and responders to make positive contributions. Furthermore, we expect that senders will send more and responders will return more in Punishment and in Justification than in Baseline.
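The subgame-perfect prediction can be checked by brute-force backward induction over integer choices. The sketch assumes purely payoff-maximizing players, whole-point amounts, and the payoff structure of the punishment treatments (a justification has no payoff consequence, and without punishment the third stage simply drops out); the function names are hypothetical.

```python
from functools import lru_cache

ENDOWMENT, EXTRA, MULTIPLIER, FINE_PER_POINT = 12, 12, 3, 3


@lru_cache(maxsize=None)
def best_punishment(s: int, r: int) -> int:
    """Stage 3: a sender's payoff-maximizing punishment given what he or she sent and got back."""
    # Punishing only costs the sender, so the payoff is strictly decreasing in p.
    return max(range(EXTRA + 1), key=lambda p: ENDOWMENT - s + r + EXTRA - p)


def best_returns(s1: int, s2: int) -> tuple:
    """Stage 2: the responder's payoff-maximizing returns, anticipating stage 3."""
    pool = ENDOWMENT + MULTIPLIER * (s1 + s2)
    best, best_value = None, None
    for r1 in range(pool + 1):
        for r2 in range(pool - r1 + 1):
            fine = FINE_PER_POINT * (best_punishment(s1, r1) + best_punishment(s2, r2))
            value = max(pool - r1 - r2 + EXTRA - fine, 0)
            if best_value is None or value > best_value:
                best, best_value = (r1, r2), value
    return best


def best_send_sender1(s2: int) -> int:
    """Stage 1: sender 1's best reply to sender 2's amount (sender 2 is symmetric)."""
    def payoff(s1: int) -> int:
        r1, _ = best_returns(s1, s2)
        return ENDOWMENT - s1 + r1 + EXTRA - best_punishment(s1, r1)
    return max(range(ENDOWMENT + 1), key=payoff)


print(best_send_sender1(0), best_returns(0, 0), best_punishment(0, 0))  # 0 (0, 0) 0
```

The check only confirms that no profitable deviation exists on the integer grid, so it illustrates rather than proves the equilibrium claim stated above.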

Punishment gives senders an opportunity to reduce responders’ earnings, which is likely to give responders a motivation to return money. Punishment will influence responder behavior if responders anticipate that violating the social norm of returning money will lead to punishment and if they care about being punished (De Cremer et al., 2001). Evidence on public good games is rather robust in showing that the opportunity to sanction free riders increases contributions (Ostrom et al., 1992; Fehr & Gächter, 2000; Hamman et al., 2011; Lierl, 2016). Evidence on the effect of punishment in trust games is somewhat more ambiguous (Calabuig et al., 2016; Charness et al., 2008; Fehr & List, 2004; Fehr & Rockenbach, 2003; Houser et al., 2008; Rigdon, 2009), suggesting that the fear of punishment does not automatically affect responder behavior.

The requirement to give a justification does not give senders an opportunity to affect responders’ material well-being directly. However, it may still influence responders. Requiring responders to justify their choices may increase the likelihood that they follow a social norm of behaving in a trustworthy or fair manner (De Cremer et al., 2001). Responders may share more resources when a justification is required because acting fairly is easier to justify and people may care about their ability to give an account of their actions (de Kwaadsteniet et al., 2007; De Cremer & van Dijk, 2009). Indeed, evidence shows that being required to justify one’s choice increases norm-abiding behavior (de Kwaadsteniet et al., 2007; Xiao, 2017). Responders’ contributions may also increase when justifications are required because people tend to care about pleasing their audience (Lerner & Tetlock, 1999), and returning money accompanied by a justification for that action is likely to please the senders. If senders anticipate that the ability to punish and the requirement of a justification increase responders’ likelihood of returning money, senders are likely to feel more confident about investing money. Explicitly stated, our hypotheses regarding sender and responder behavior posit that:

H1: Senders send more money and responders return more money in Punishment compared to Baseline.

H2: Senders send more money and responders return more money in Justification compared to Baseline.

Since there appear to be no previous studies comparing the relative effects of material sanctions and verbal justifications, we are not able to make a specific prediction about the difference between Punishment and Justification. However, we expect that the combination of punishment and justification encourages decision-makers to make decisions that benefit the public more than either accountability mechanism does alone, that is, responders are expected to return the most in Justification + Punishment. This expectation is based on the role of sanctions as a feedback mechanism between senders and responders, which allows senders to react to responders’ decisions and to the corresponding justifications. In other words, when the requirement of a justification is accompanied by the possibility of punishment, responders are likely to anticipate that a discrepancy between their action and their justification will not go unpunished. For that reason, they are likely to act in a way that can be justified in an acceptable manner, i.e., return money. There is evidence supporting this expectation: a discrepancy between communication and action has been shown to prompt sanctions (Dufwenberg et al., 2021). We therefore expect a significant interaction effect and predict that senders anticipate the largest returns from responders when both accountability mechanisms are in place. Our third hypothesis is as follows:

H3a: Justification induces senders to send more money when they can punish responders in the third stage of the game.

H3b: Justification induces responders to return more money when senders can punish responders in the third stage of the game.

In addition to testing these three hypotheses, we conduct an exploratory analysis of the contents of the justifications given by the responders. In particular, we compare justifications given when responders are merely expected to justify their choices to those justifications given when senders have an opportunity to punish. Moreover, we explore the relationship between responder behavior and the justification given for that behavior, as well as the senders’ reactions to the combination of responder action and justification.

We conducted an anonymous, computerized experiment (with z-Tree; Fischbacher, 2007). Subjects were randomly allocated to treatments. At the beginning of each experimental session, subjects were first randomly assigned to their seats in the decision-making laboratory. Written instructions were then handed out to each participant and read aloud. The experiment began after all participants had successfully completed a practice round. In each session, the game was played for six rounds, and subjects were informed about the number of rounds in the initial instructions. The outcome of a round was revealed to the subjects immediately after the round was played, i.e., subjects were aware of the outcome of the previous round before making their decisions in rounds 2–6. Each participant was randomly allocated to the role of a sender or a responder at the beginning of the experiment and retained that role throughout the six rounds. Because roles were allocated randomly, entitlement effects should not affect our results.

To avoid repeated-game effects, participants were randomly assigned to a new three-player group at the beginning of each round. Each participant thus kept the same role throughout the six rounds, but the other group members most likely changed between rounds. At the end of the experiment, each participant was paid the amount he or she had earned in one round, selected by having the participant roll a six-sided die.
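The stranger-matching protocol and the payment rule can be sketched as follows. For illustration we assume that rematching happens within a session of 18 participants, which with fixed roles means 12 senders and 6 responders (consistent with 36 senders and 18 responders per treatment over three sessions). The participant IDs, the seed, and the pseudo-random generator standing in for the physical die are hypothetical.

```python
import random

ROUNDS = 6


def rematch(senders: list, responders: list, rng: random.Random) -> list:
    """Form new (sender, sender, responder) groups for one round; roles stay fixed."""
    shuffled = senders[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return [(shuffled[2 * i], shuffled[2 * i + 1], responder)
            for i, responder in enumerate(responders)]


def paid_round(rng: random.Random) -> int:
    """Stand-in for the six-sided die roll that picks the round to be paid."""
    return rng.randint(1, ROUNDS)


rng = random.Random(2024)
senders = [f"S{i}" for i in range(1, 13)]    # 12 senders in a session of 18
responders = [f"R{i}" for i in range(1, 7)]  # 6 responders in a session of 18
schedule = {t: rematch(senders, responders, rng) for t in range(1, ROUNDS + 1)}
print(schedule[1])
print("paid round:", paid_round(rng))
```

Because the pool of participants in a session is small, the same players will sometimes meet again, which is why the other group members only "most likely" change between rounds.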

Each treatment consisted of three experimental sessions of 18 subjects, yielding 54 subjects per treatment, and a total of 216 subjects. The subjects were mostly undergraduate and graduate students of the local university (58 percent female, mean age 27.4, s.d. 6.35). Each subject participated in one experimental session only. Each session took place on a single day at the Decision-Making Laboratory (PCRClab) at University of Turku.

5 Results

We will start with an overview of the points sent and returned, followed by a closer examination of the first round to ensure independent observations. We also examine senders’ and responders’ behavior across the rounds in order to capture the time dynamics present within each treatment, as well as earnings and sanctions. Finally, we analyze the justifications responders gave in the two treatments where they were required.

5.1 Sender and responder behavior

Aggregated over all four treatments and six rounds, we observed a total of 432 individual plays of the three-player trust game and a total of 864 sender decisions (n = 144, or 36 senders per treatment, each making 6 decisions). Table 2 shows that the average amount of points sent was 5.70 in Baseline, 6.20 in Punishment, 6.89 in Justification, and 7.33 in Justification + Punishment. In accordance with a number of previous studies, senders make strictly positive contributions in our experiment.

Table 2 Average points sent and returned, and points used for punishment by treatment, all rounds

In total, our experiment consisted of four (2 × 2) treatment cells, and the single-shot trust game was repeated six times in each cell with stranger matching. We collected 432 pairs of allocation decisions (i.e., 864 individual allocation decisions) from the responders (n = 72, i.e., 18 per treatment, each responder making two decisions per round, 12 decisions in total). As reported in Table 2, the average amount returned to each sender was 8.00 points in Baseline, 9.06 in Punishment, 9.82 in Justification, and 11.73 in Justification + Punishment. The average share of the points received (after multiplication) that responders returned was 47% in Baseline, 49% in Punishment, 48% in Justification, and 53% in Justification + Punishment.
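The returned shares can be roughly cross-checked against the averages in Table 2. The sketch below assumes that points sent are tripled before reaching the responder (the multiplier used by Berg et al., 1995); the multiplier is not restated in this section, so treat it as an assumption. Because the reported shares are presumably averages of individual shares, whereas this check divides average amounts, small discrepancies would not be surprising.

```python
ASSUMED_MULTIPLIER = 3  # assumption: points sent are tripled before reaching the responder

# Average points sent per sender and returned to each sender (Table 2, all rounds)
table2 = {
    "Baseline": (5.70, 8.00),
    "Punishment": (6.20, 9.06),
    "Justification": (6.89, 9.82),
    "Justification + Punishment": (7.33, 11.73),
}

def share_returned(avg_sent, avg_returned, multiplier=ASSUMED_MULTIPLIER):
    """Ratio of average points returned to average points received after multiplication."""
    return avg_returned / (avg_sent * multiplier)

for treatment, (sent, returned) in table2.items():
    print(f"{treatment}: {share_returned(sent, returned):.0%}")
```

Under the assumed multiplier, the ratios come out at about 47%, 49%, 48%, and 53%, in line with the shares reported above.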

We first test for an interaction between the treatments, restricting the analysis to the first round to ensure independent choices. Sender behavior was analyzed with a 2 (no punishment vs. punishment) × 2 (no justification vs. justification) ANOVA; the results are reported in Table 3. The ANOVA on senders' first-round choices provides tentative support for hypothesis H2: as shown in Table 3, the main effect of the Justification treatment is marginally significant (F(1, 140), p = 0.078). However, neither the main effect of Punishment nor the interaction between the Justification and Punishment treatments was significant at the conventional 0.05 level; regarding sender behavior, we therefore find no support for H1 or H3a.
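To make the structure of this analysis concrete, a balanced 2 × 2 between-subjects ANOVA can be computed directly from cell data. The sketch below is a generic textbook implementation using only the standard library, not the software the authors used. It assumes equal cell sizes (as in the first round, with 36 senders per treatment, which gives 4 × 36 − 4 = 140 error degrees of freedom) and omits p-values, which would come from the F distribution (e.g., `scipy.stats.f.sf`).

```python
from statistics import fmean

def two_way_anova_2x2(cells):
    """Balanced 2 x 2 ANOVA.

    cells maps (punishment, justification), each 0 or 1, to a list of observations;
    every cell must contain the same number n of observations.
    Returns F statistics (each with 1 numerator df) and the error df.
    """
    n = len(cells[(0, 0)])
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch assumes a balanced design")
    cell_mean = {k: fmean(v) for k, v in cells.items()}
    grand = fmean(cell_mean.values())  # equals the overall mean when the design is balanced
    p_mean = {p: (cell_mean[(p, 0)] + cell_mean[(p, 1)]) / 2 for p in (0, 1)}
    j_mean = {j: (cell_mean[(0, j)] + cell_mean[(1, j)]) / 2 for j in (0, 1)}

    # each factor level pools two cells, i.e. 2 * n observations
    ss_p = 2 * n * sum((m - grand) ** 2 for m in p_mean.values())
    ss_j = 2 * n * sum((m - grand) ** 2 for m in j_mean.values())
    ss_pj = n * sum(
        (cell_mean[(p, j)] - p_mean[p] - j_mean[j] + grand) ** 2
        for p in (0, 1) for j in (0, 1)
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    df_err = 4 * n - 4
    ms_err = ss_err / df_err
    return {
        "F_punishment": ss_p / ms_err,
        "F_justification": ss_j / ms_err,
        "F_interaction": ss_pj / ms_err,
        "df_error": df_err,
    }

# Toy data (not the experimental data): two observations per cell
toy = {(0, 0): [1, 3], (0, 1): [3, 5], (1, 0): [2, 4], (1, 1): [4, 6]}
print(two_way_anova_2x2(toy))
```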

Table 3 Two-way ANOVA of sent allocations, observations restricted to the first round

As in the analysis of sender behavior, we restrict the observations to the first round and analyze responders' behavior with a 2 (no punishment vs. punishment) × 2 (no justification vs. justification) ANOVA. As with the senders, responders appear to return more under Justification (F(1, 140), p = 0.051), as shown in Table 4. However, if we include the amount received in the first stage of the first period as a covariate (Table 5), the main effect vanishes: the first-stage sender allocation is a highly significant explanatory variable (F(1, 67), p < 0.00001) and accounts for most of the variation in the points returned. Regarding responder behavior, we therefore find some support for H2 when points received are not taken into account, but the effect disappears when they are. Since neither Punishment nor the interaction between Punishment and Justification is significant, H1 and H3b are not supported for responder behavior.

Table 4 Two-way ANOVA of returned allocations, observations restricted to the first round
Table 5 Two-way ANCOVA of returned allocations, observations restricted to the first round, points received in the first stage as a covariate

5.2 Earnings, sanctions, and dynamics of behavior

Overall, we detect a pattern of reciprocal behavior among the responders. The senders, who had no chance to coordinate the amounts they sent, sent unequal numbers of points to the responder in 369 of the 432 plays. In return, the responders gave more points to the more generous sender 80 percent of the time (in 294 of 369 cases). When the responder had received equal allocations from both senders, he or she almost always (95 percent of the time) returned an equal amount to each sender (in 60 of 63 cases).
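The classification behind these counts can be made explicit. The sketch below is a hypothetical reconstruction of the coding rule as described in the text, not the authors' script: each play records what the two senders sent and what the responder returned to each, and a play with unequal transfers counts as reciprocal when the larger return goes to the more generous sender.

```python
from collections import Counter
from typing import NamedTuple

class Play(NamedTuple):
    sent: tuple      # (points from sender A, points from sender B)
    returned: tuple  # (points returned to A, points returned to B)

def classify(play):
    (sent_a, sent_b), (ret_a, ret_b) = play.sent, play.returned
    if sent_a == sent_b:
        return "equal sent, equal returned" if ret_a == ret_b else "equal sent, unequal returned"
    if ret_a == ret_b:
        return "unequal sent, equal returned"
    # reciprocal: the larger return goes to whoever sent more
    return "reciprocal" if (ret_a > ret_b) == (sent_a > sent_b) else "counter-reciprocal"

def summarize(plays):
    """Share of reciprocal returns among unequal-transfer plays, and of equal returns among equal ones."""
    labels = Counter(classify(p) for p in plays)
    equal = labels["equal sent, equal returned"] + labels["equal sent, unequal returned"]
    unequal = sum(labels.values()) - equal
    return (labels["reciprocal"] / unequal if unequal else None,
            labels["equal sent, equal returned"] / equal if equal else None)

# Toy plays (not the experimental data)
toy = [
    Play((8, 4), (14, 6)),   # reciprocal
    Play((3, 9), (5, 12)),   # reciprocal
    Play((6, 6), (9, 9)),    # equal sent, equal returned
    Play((10, 2), (8, 8)),   # unequal sent, equal returned
]
print(summarize(toy))  # (0.666..., 1.0)
```

Applied to the counts reported above, the same two ratios are 294/369 ≈ 0.80 and 60/63 ≈ 0.95.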

Sending a larger allocation than the other sender significantly increased the likelihood of getting a larger share in return: for the more generous sender, the odds of receiving a larger allocation than the other sender were about 78 times higher (OR 78.4, 95% CI 29.92–256.92). These results suggest that reciprocity was the dominant motivation for responder behavior (cf. Table B8 in Appendix B).

The average gross and net earnings in the different treatments are shown in Table 6. On average, senders punished responders quite moderately; consequently, gross and net earnings are close to each other even in the treatments where punishment was possible. However, since each point used for punishment reduced the responder's points by three, even this moderate amount of punishment affected responders' payoffs considerably. In the Punishment treatment, responders' earnings were reduced by 8.52 points on average, that is, 2 × 1.42 × 3, where 1.42 is the average number of points an individual sender used to punish the responder. In the Justification + Punishment treatment cell, responders' earnings were reduced by 7.36 points on average. In this sense, giving senders the opportunity to punish responders did not increase overall efficiency, although efficiency in terms of points earned increased in the later periods (see Tables 11 and 12 in Appendix B). One must keep in mind, however, that this result partly reflects the specific parameters of the experiment: earnings depend on the relative cost of punishment, i.e., the multiplier for punishment points chosen by the experimenters.
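The reduction in responders' earnings follows from the punishment rule described above: each point a sender spends on punishment removes three points from the responder, and each group has two senders. A minimal sketch of that arithmetic (the function names are ours):

```python
PUNISHMENT_MULTIPLIER = 3  # each punishment point costs the responder three points
SENDERS_PER_GROUP = 2

def responder_loss(avg_points_per_sender, senders=SENDERS_PER_GROUP,
                   multiplier=PUNISHMENT_MULTIPLIER):
    """Average reduction in a responder's earnings from both senders' punishment."""
    return senders * avg_points_per_sender * multiplier

def implied_points_per_sender(avg_loss, senders=SENDERS_PER_GROUP,
                              multiplier=PUNISHMENT_MULTIPLIER):
    """Invert responder_loss: average punishment points spent by one sender."""
    return avg_loss / (senders * multiplier)

print(round(responder_loss(1.42), 2))             # Punishment: 8.52
print(round(implied_points_per_sender(7.36), 2))  # Justification + Punishment: 1.23
```

The second figure is our back-calculation from the reported 7.36 points, not a number stated in the text.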

Table 6 Average gross and net earnings per treatment

As shown in Fig. 2, the average number of points sent grew over time in each treatment. The left panel shows the average share of the initial endowment of 12 points that senders transferred to responders. Comparing only the first and last periods, the average amount sent increased by roughly 50% or more in each treatment; the growth was most pronounced in Baseline, where the increase was about 67%. This suggests a form of indirect reciprocity: positive experiences in previous rounds induced senders to send more, which in turn induced larger returns, even though players were not in fixed groups. The right panel shows that the average share of points returned remained fairly stable in each treatment, while the absolute number of points returned grew as senders' transfers increased. Indeed, as shown in Tables 9 and 10 in Appendix B, the average share returned was always at least 40% and on some occasions over 50%, which made sending a rewarding choice and likely contributed to the growth of transfers over the six periods of our experiment.
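Why a stable return share of at least 40% makes sending worthwhile can be shown with a one-line payoff calculation. A sender who transfers x points gives up x and gets back s·m·x, where m is the multiplier applied to transfers and s is the share the responder returns; sending pays off whenever s > 1/m. The multiplier is not restated in this section, so the sketch assumes m = 3 (the tripling used by Berg et al., 1995), under which the break-even share is one third.

```python
from fractions import Fraction

ASSUMED_MULTIPLIER = 3  # assumption: transfers are tripled, as in Berg et al. (1995)

def break_even_share(multiplier=ASSUMED_MULTIPLIER):
    """Smallest return share at which sending does not lose points (integer multiplier)."""
    return Fraction(1, multiplier)

def sender_net_gain(points_sent, share_returned, multiplier=ASSUMED_MULTIPLIER):
    """Points gained (positive) or lost (negative) by the sender on the transfer alone."""
    return points_sent * multiplier * share_returned - points_sent

print(break_even_share())                  # 1/3
print(sender_net_gain(6, Fraction(2, 5)))  # sending 6 points at a 40% return share: 6/5
```

Exact fractions are used so that the break-even comparison is not blurred by floating-point rounding.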

Fig. 2 Average shares of points sent and returned per period

We ran panel regressions of points sent and returned on the treatment dummies. We did not observe significant treatment effects on either the amounts sent (Table 13 in Appendix B) or the amounts returned (Table 14 in Appendix B). However, when we add the amount received in the first stage (for responder behavior) or the amount received in the previous round (for sender behavior) as an explanatory variable, the number of points received turns out to be a highly significant explanatory variable, as expected. Furthermore, the results should be interpreted with some caution, given the rather small number of observations per treatment.Footnote 6

To summarize the main results, the average points sent and returned were highest in Justification + Punishment and lowest in Baseline, with Punishment and Justification falling between these two. When the analysis is restricted to the first round, contributions were highest in the treatments with the justification requirement. Correspondingly, responders also returned more points when justifications were required. However, controlling for the points responders received renders the effect of Justification insignificant. Neither Punishment nor the interaction between Punishment and Justification produced statistically significant effects.

Responders also reacted to senders' behavior by discriminating between the two senders and rewarding each according to his or her behavior. Furthermore, the proportional share of points returned remained roughly constant over the six rounds. However, since senders increased the amounts they sent, the amounts returned also increased in absolute terms. The same pattern was observed in every treatment. This virtuous cycle increases gross efficiency. However, once the effects of punishments are taken into account, net efficiency, measured as the total payoffs of the whole group of three, actually decreases in the treatments with the punishment opportunity.

5.3 Justifications

Since the analysis of treatment effects suggests that responders' obligation to justify their choices had some influence on sender behavior, we further analyze the content of the justifications and their connection to responders' behavior. In the Justification and Justification + Punishment treatment conditions, responders were asked to write a justification for their decision. What kinds of justifications did they give? Since our hypotheses pertained to participants' behavior, the analysis of justifications is exploratory. However, insofar as sanctions function as a feedback mechanism, we can expect the threat of being sanctioned to affect the types of justifications responders give. We coded the justifications into five classes: Reciprocity, Equality, Self-Interest, Other, and Empty. The classification reflects our interest in responders' application of different fairness norms. The criteria for these classes, as well as examples of each class, are given in Table 7.Footnote 7 Each justification was placed in one class based on its principal content; the justifications were generally short.

Table 7 Justification classes

Figure 3 reports the proportional distribution of justifications in the Justification and Justification + Punishment treatment cells. In both cells, messages in the Reciprocity class are by far the most common (56% in Justification and 52% in Justification + Punishment), whereas appeals to the equality norm are much less frequent. In Justification, Self-Interest ties with Equality, and no responder failed to give a justification. In Justification + Punishment, no appeals to self-interest are observed, but two responders failed to justify their choices seven times, and one responder failed to do so once. Although such failures could be considered an analytical nuisance, they can also be substantively meaningful: failing to give a justification even when explicitly required to do so still sends a message to the senders. Note also that the share of Empty in Justification + Punishment is almost the same as that of Self-Interest in Justification. The distribution of justification types differs between the treatments (n = 216, χ2 = 31.761, df = 4, p < 0.001). We therefore feel confident in concluding that the threat of punishment eliminates the propensity to explicitly justify decisions by appeal to self-interest.
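Pearson's chi-square test of independence, used here and again in the consistency analysis below, compares observed counts with those expected from the row and column totals. The following is a minimal standard-library sketch, not the authors' code. The upper-tail probability uses the exact closed forms of the chi-square survival function for df = 1 (via `erfc`) and df = 2 (an exponential), plus a recurrence that raises the degrees of freedom in steps of two. The per-class counts behind Fig. 3 are not reported in the text, so the example uses a toy table.

```python
import math

def chi_square_stat(table):
    """Pearson chi-square statistic and df for an r x c table of counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_totals) - 1) * (len(col_totals) - 1)

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    half = x / 2
    if df % 2:  # odd df: start from the df = 1 closed form
        sf, k = math.erfc(math.sqrt(half)), 1
    else:       # even df: start from the df = 2 closed form
        sf, k = math.exp(-half), 2
    while k < df:  # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        sf += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return sf

stat, df = chi_square_stat([[10, 20], [20, 10]])  # toy 2 x 2 table
print(round(stat, 3), df, round(chi2_sf(stat, df), 4))
```

Applied to the statistic reported above, `chi2_sf(31.761, 4)` is about 2 × 10^-6, consistent with p < 0.001, and `chi2_sf(3.599, 1)` is about 0.058, matching the p-value reported for the consistency comparison. Note that this sketch applies no continuity correction to 2 × 2 tables.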

Fig. 3 The distribution of justification classes

A question that naturally follows is whether punishments depend on the kinds of justifications responders give, or whether a failure to give a justification triggers punishment; this can be investigated in the Justification + Punishment treatment cell. To this end, we compared the total sanctions (the combined sanctions of both senders) directed at responders. Responders seem to face especially harsh punishments when they give no justification: in that case, the average number of points used for punishment was 7.46 (s.d. 6.48, n = 13). Recall that the "fine" suffered by the average responder is obtained by multiplying this number by three. In contrast, when the responder invokes the principle of reciprocity, the average sanction is only 1.05 points (s.d. 2.05, n = 56). The other justification classes fall in between these two but are closer to Reciprocity than to Empty. The Brown–Forsythe test statistic (7.382, df1 = 3, df2 = 28.163, p = 0.001) suggests that average punishments differ among the classes. Furthermore, Tamhane's T2 test shows a statistically significant difference between Reciprocity and Empty (p = 0.023) and a marginally significant difference between Reciprocity and Other (p = 0.079).Footnote 8
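The fractional denominator degrees of freedom suggest the Brown–Forsythe test for equality of means, which does not assume equal variances across groups and can be computed from group means, standard deviations, and sizes alone. The sketch below follows the standard formula; it is our illustration rather than the authors' code, and because the text reports summary statistics for only two of the four classes, it is exercised on toy values.

```python
from math import fsum

def brown_forsythe_means(groups):
    """Brown-Forsythe F* test for equal means under unequal variances.

    groups: list of (mean, sd, n) tuples, with sd the sample standard deviation.
    Returns (F*, df1, df2); df2 is generally fractional.
    """
    total_n = sum(n for _, _, n in groups)
    grand_mean = fsum(n * m for m, _, n in groups) / total_n
    numerator = fsum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    # weights (1 - n_i / N) * s_i^2 enter both the denominator and the approximate df2
    a = [(1 - n / total_n) * sd ** 2 for _, sd, n in groups]
    denominator = fsum(a)
    df1 = len(groups) - 1
    df2 = denominator ** 2 / fsum(a_i ** 2 / (n - 1) for a_i, (_, _, n) in zip(a, groups))
    return numerator / denominator, df1, df2

# Toy summary statistics (not the experimental data): samples [1, 2, 3] and [2, 3, 4]
print(brown_forsythe_means([(2.0, 1.0, 3), (3.0, 1.0, 3)]))  # (1.5, 1, 4.0)
```

With equal group sizes, F* coincides with the ordinary one-way ANOVA F, which makes a convenient sanity check. A p-value would be taken from the F distribution with the fractional df2 (e.g., `scipy.stats.f.sf` accepts non-integer degrees of freedom).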

Finally, we checked whether responders' behavior was in line with the justifications they gave. As choices cannot be classified as unequivocally as justifications, the results are indicative rather than conclusive. Of the allocations falling into the categories of reciprocity, equality, or self-interest, about 61 percent match the associated justification.Footnote 9 Justifications matched actual allocations somewhat more often in the Justification treatment cell (67.3 percent) than in the Justification + Punishment treatment cell (54.5 percent). According to Pearson's chi-square test, the difference between the cells is marginally significant (χ2 = 3.599, df = 1, p = 0.058, n = 208). In the Justification + Punishment treatment cell, responders faced larger sanctions on average when the justification did not match the allocation (9.33 points) than when the two matched (3.22 points). The Mann–Whitney U test is statistically significant (U = 894, p = 0.006, n = 101), indicating that the opportunity to punish was indeed used as a feedback mechanism to sanction responders not only for their allocations but also for inconsistencies between allocations and justifications.
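The Mann–Whitney U test compares the sanctions received in matched and mismatched cases without assuming normality, which suits skewed punishment data. The sketch below is our illustration (midranks for ties, tie-corrected normal approximation for a two-sided p-value); because software packages differ in which of the two U statistics they report, the function returns both. The underlying sanction data are not reported here, so the example uses toy values; `scipy.stats.mannwhitneyu` offers a library implementation.

```python
import math
from collections import Counter

def midranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) using the tie-corrected normal approximation.

    U_x counts pairs in which the x value exceeds the y value (ties count one half).
    Assumes not all pooled values are identical (otherwise the variance is zero).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    return u_x, u_y, math.erfc(abs(z) / math.sqrt(2))

# Toy sanctions (not the experimental data): mismatched vs. matched justifications
mismatched = [12, 9, 15, 6, 9]
matched = [0, 3, 0, 6, 2, 0]
print(mann_whitney_u(mismatched, matched))
```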

6 Conclusions and discussion

Our experiment was designed to analyze the extent to which different accountability mechanisms increase contributions in a trust game. We designed a three-person trust game to capture the asymmetries involved in public accountability relations. The game gauges trust understood as "belief in reciprocity" (Berg et al., 1995: 137), and our experimental treatments make it possible to discern the behavioral effects of two accountability mechanisms, punishment and justification, both separately and in combination. We replicate observations from previous studies of the trust game: resources are sent and returned even in the baseline treatment with no accountability mechanisms.

We also see some differences between the treatments. The average amounts sent and returned were highest in Justification + Punishment and lowest in Baseline. In the first round, the opportunity to punish responders did not increase the points sent or returned, whereas the justification requirement had a (marginally significant) impact on sender behavior. Moreover, responders reacted to sender behavior: they returned more when they were given more, and they discriminated between the two senders according to how much each sent. The obligation to justify decisions influenced responder behavior, but the effect disappears when the amount received from senders is taken into account. The analysis of the content of justifications reveals that in the Justification + Punishment treatment cell there were no justifications appealing to pure self-interest, whereas in the Justification cell such appeals were made. Moreover, both the failure to give any justification and inconsistency between one's behavior and the justification given triggered punishments.

These results suggest that punishment was not an effective way to increase contributions, an observation in line with certain other trust game studies (Houser et al., 2008; Clabuig et al., 2016). Regarding justifications, our results give further support to the view that the requirement to justify one's actions increases contributions to others. This may be because people find it easier to justify actions that follow social norms and are not self-interested. The norm of reciprocity in particular was observed both in responders' actions and in the justifications they gave for them. Our results suggest that the difficulty of justifying self-interested action may have influenced responders' behavior, because self-interest was rarely given as a justification. Furthermore, the threat of punishment eliminated appeals to self-interest entirely, suggesting that punishment gave senders a tool for giving feedback on both the decisions and the justifications they received from responders. It is worth pointing out, however, that the tendency to follow the norm of reciprocity in both behavior and justifications may also be regarded as problematic in public accountability relations, where impartiality is expected to be the norm (cf. Rothstein & Teorell, 2008).

Overall, our study offers some support for the importance of justifications as a mechanism of accountability. However, some caution should be used when generalizing from the results, because the number of experimental sessions, and consequently the number of independent observations, is somewhat low; future experiments are therefore needed to establish the robustness of our findings. Another source of uncertainty concerns sender motivations. While our design was not conducive to competition between the senders, we cannot conclusively rule out the possibility that some senders perceived the game as a contest for favors from the responder. Moreover, the frequency of reciprocity as a basis of justifications might lend support to this interpretation of sender behavior.

In our study, experimental subjects were randomly assigned to the sender and responder roles. In this respect, the design does not capture one aspect present in many public accountability relations, namely the authorization of office-holders. Communication was restricted to responders' justifications for their decisions; future research could address the effects of multilateral communication, including senders' opportunity to publicly discuss and give verbal feedback on responders' behavior and justifications. Moreover, social sanctions such as shaming were precluded because experimental subjects were anonymous, and subjects had no incentive to build reputations because groups were reshuffled in each round of the experiment. Because our study focused exclusively on verbal justifications and material sanctions, the effects of social sanctions and reputation building are left for further research.

In our experiment, responders were accountable for one decision at a time, whereas elected representatives can be accountable for a number of decisions. Responders were accountable to all those individuals affected by their decisions, and in treatments involving punishment, either or both of them could use this opportunity. Elected representatives, by contrast, are typically accountable only to a specific subgroup of the electorate, i.e., their constituents, which incentivizes them to act according to the preferences of this particular subgroup (e.g., Chambers, 2004). Future research could also address the behavioral consequences of this restriction.