1 Introduction

Contracts that link remuneration to the achievement of performance targets are widely used to align the interests of employers and workers in both the private sector (Lazear, 2000) and the public sector (Burgess & Ratto, 2003). The considerable economic literature on the effects of performance contracts acknowledges that incentives improve outcomes through workers’ increased effort, but generally falls short of unpacking which dimensions of effort change. According to psychologists, incentives increase performance through three potential pathways or changes in effort (Kanfer, 1990). First, incentives can affect the direction of effort, which is the choice individuals make to focus on one task or another. Second, incentives can change the intensity of effort, or the extent to which workers apply their cognitive resources (i.e. attention or focus exerted to minimise mistakes, increase efficiency, or both). Finally, incentives may affect the persistence of effort, which is the time workers spend on a given task. Understanding the mechanisms through which performance is achieved may be particularly important in multi-tasking settings, especially if work is constrained by time limits. If higher performance is achieved through a change in the direction or persistence of effort (i.e. individuals engage more in the incentivised activity or spend more time on it), performance in non-incentivised activities is likely to decline due to the automatic reduction in the time available (Holmstrom & Milgrom, 1991). However, if incentives change attentional processes (i.e. the intensity of effort), Kahneman’s (1973) work on attention suggests that this could increase the overall attentional resource pool available to workers, leading to positive spill-overs on a non-incentivised task (Yechiam & Hochman, 2013b).

In this paper, we explore the direct and indirect effects of incentives on performance and effort channels in a real effort task, under two types of contracts: one that rewards workers’ good performance and another that penalises them for poor performance. Prospect theory suggests that because of loss aversion, framing incentives as losses can be more motivating than equivalent rewards (Kahneman & Tversky, 1979). A number of studies have tested this prediction, with mixed results,Footnote 1 but very few have explored the mechanisms through which increased performance is achieved under both frames. We contribute to the experimental literature using real effort tasks to study the effects of financial incentives on quantity and quality of output.Footnote 2 We also build on studies from cognitive psychology that have shown how potential losses, unlike rewards, heighten the level of attention of subjects (Yechiam & Hochman, 2013c), which can result in increased performance for a non-incentivised task (Yechiam & Hochman, 2013b). We measure the effects of the two contract frames on performance and two possible effort channels (persistence and intensity of effort) in a real-effort experiment that mimics the healthcare context, where performance contracts are ubiquitous and often incomplete.

We describe our experimental design in Sect. 2. Subjects performed a real-effort medically framed task that involved two activities: a routine activity (medical data entry) and a cognitive activity (diagnosis). We randomly allocated participants to a control, gain or loss contract. All contracts included a base pay. In the gain contract, subjects could earn an additional bonus whose size depended on performance in the data entry activity. In the loss contract, the base pay would be reduced by an amount conditional on their performance in the data entry activity.

We report our results in Sect. 3. Performance in the data entry activity improves in a similar way under the gain and loss contracts, but this is achieved through different behavioural responses. Subjects facing potential losses improve their performance through increased intensity of effort (i.e. reducing the number of errors made), while subjects facing rewards increase persistence of effort (i.e. increased time spent on the incentivised activity). There is no evidence of negative or positive spill-over effects of either contract on the non-incentivised activity. We discuss these results and their implications in Sect. 4.

2 Experimental setup

2.1 The medical task

We developed a novel real-effort task to mimic the key features of a medical consultation. Before the start of a 10-min period of work,Footnote 3 participants receive 10 files of hypothetical patients. A patient file is a laboratory test report that includes 22 two- or three-digit numbers corresponding to standard blood tests.Footnote 4 Subjects are then asked to ‘manage’ as many patients as they can during the 10-min period. Managing a patient is done in three successive steps, with subjects required to validate one step to go to the next one:

  (1) Registration: entering patient identifier on the computer interface;

  (2) Data entry: entering individual blood test results into a computer mask;

  (3) Diagnosis: interpreting the haematology results by identifying the correct pathology from a list of 13 possible diagnoses.Footnote 5

Although the registration phase is necessary, it is the data entry and diagnosis activities that are the focus of the medical task.Footnote 6

The task design shapes the production process and limits strategic behaviour in several ways. First, each patient managed involves the same 3-step process, meaning that the data entry and diagnosis activities are sequenced (the diagnosis choice automatically follows the end of the data entry activity). Although participants can still choose to devote little time to either activity, they cannot completely ignore one activity if they choose to engage in the other.Footnote 7 Second, cherry-picking of easier diagnoses or easier blood test results is possible, but unlikely. Time constraints make identifying easier diagnoses inefficient. Regarding data entry, all reports were roughly similar in overall data entry difficulty (i.e. same number of digits). Moreover, while a rational individual might choose to enter only the easier results (e.g. those requiring fewer characters), skipping specific results would require attention that is probably more costly than the expected gain.Footnote 8

The welfare benefits of health services for patients are a key feature of healthcare markets, not least because they can form part of providers’ utility functions (Arrow, 1963). To incorporate this factor into the task, we followed other experiments conducted in health (Hennig-Schmidt et al., 2011; Lagarde & Blaauw, 2017) and linked subjects’ performance to donations to a healthcare charity. Both types of activities (mundane process activities and cognitive ones) are important to achieving high quality of care in the real world. Therefore, they both generate social benefits (i.e. donations to charities) in the experiment. The social incentive was set at R0.20 (USD0.02) for each correctly entered test item and R1.50 (USD0.15) for each correctly identified diagnosis.Footnote 9

2.2 Experimental design

Because doctors’ cognitive effort is difficult to observe, performance contracts in health usually focus on routine activities which contribute to better health outcomes. For example, payments are linked to process measures of quality of care, such as undertaking routine checks or monitoring.Footnote 10 Following this logic, we test the effects of contracts that reward performance in the data entry activity, while the diagnosis activity is not incentivised.

Subjects were randomly assigned to one of the three treatments: control, gain or loss treatment.Footnote 11 The gain and loss treatments were isomorphic and only differed in their framing. In the gain treatment, subjects earned a base pay of R90, plus an additional bonus worth R10, R20, R30, R40 or R50, depending on the total number of correct entries made.Footnote 12 In the loss treatment, the payment specified a base pay of R140, minus a penalty of R10, R20, R30, R40 or R50, if a minimal number of correct entries was not made.Footnote 13 In the control treatment, participants received a fixed pay of R105. We used results from a pre-testFootnote 14 to calibrate this amount, seeking to equalise the expected remuneration across treatments to control for the income effect on performance.
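The equivalence of the two frames can be illustrated with a minimal sketch. The base pay, step size and fixed control pay come from the description above; the performance thresholds are hypothetical placeholders (the actual cut-offs are given in the footnotes and are not reproduced here), and the assumption that each threshold is worth one R10 step under both frames is ours.

```python
# Illustrative sketch of the three contracts (amounts in rand).
# ASSUMPTION: the thresholds below are hypothetical placeholders, not the
# cut-offs used in the experiment.
HYPOTHETICAL_THRESHOLDS = (80, 100, 120, 140, 160)  # correct entries
STEP = 10          # each bonus/penalty step is worth R10
CONTROL_PAY = 105  # fixed pay in the control treatment


def steps_reached(correct_entries, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Number of performance thresholds met or exceeded."""
    return sum(correct_entries >= t for t in thresholds)


def gain_pay(correct_entries):
    """Gain frame: base pay of R90 plus R10 per threshold reached."""
    return 90 + STEP * steps_reached(correct_entries)


def loss_pay(correct_entries):
    """Loss frame: base pay of R140 minus R10 per threshold missed."""
    missed = len(HYPOTHETICAL_THRESHOLDS) - steps_reached(correct_entries)
    return 140 - STEP * missed


if __name__ == "__main__":
    for n in (50, 90, 125, 170):
        print(n, gain_pay(n), loss_pay(n))
```

Under these assumptions the two functions return the same amount for any level of performance (between R90 and R140), so only the wording of the contract differs between treatments.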

Participants took part in two consecutive periods of work of 10 min each. Within each group, the second period of work was identical to the first period for half of participants, while the other half was randomised to receiving a reward for each correct diagnosis to stimulate performance in the diagnosis task. Since our objective here is to explore the relative effect of the gain and loss framing of incentives, we focus on the first period, where the control group receives no incentive, and use the second one only as a robustness check.Footnote 15

The experiment was run using the software z-Tree (Fischbacher, 2007)—see online Appendix C for screenshots. Participants received an attendance fee of R30 (USD2.90) and were allocated to a workstation according to a random blocked design to obtain an equal number of participants per treatment (see online Appendix D for more details on experimental procedures). Specific instructions on the computer screen explained how they would be remunerated in the task (see online Appendix E). Before the task started, subjects were informed that patient benefits generated in the task would translate into actual donations for healthcare delivery and they could select their preferred charity from a list of five.Footnote 16 At the end of the session, each participant received their payment anonymously after completing a short questionnaire capturing basic socio-demographic information.

A total of 180 medical students participated in 11 experimental sessions. A session lasted approximately 45 minutes and on average participants earned R118.3 (USD11.4) in addition to the attendance fee, and a total of R5,223.9 (USD505) was transferred to charities. Participants were fifth-year medical studentsFootnote 17 from the Medical School at the University of the Witwatersrand in Johannesburg, South Africa. Their characteristics were similar across all treatment groups (see Table A1 in online Appendix A).Footnote 18

2.3 Testable hypotheses and data

We formulate the following five testable hypotheses:

H1: following standard economic theory, financial incentives in the loss and gain treatments will lead to higher performance in data entry.

H2: loss aversion theory (Kahneman & Tversky, 1979) predicts that performance in data entry will be higher in the loss treatment compared to the gain treatment.

H3: according to studies in psychology, the mechanism behind increased performance with incentives framed as losses is not the higher subjective weight given to losses compared to gains (Kahneman & Tversky, 1979), but the fact that losses create physiological arousal in subjects, which draws their attention to the task more than gains do (Yechiam & Hochman, 2013a, c). Hence, in our context, this “loss attention” model predicts that higher performance in data entry will be achieved through increased attentional investment (i.e. higher accuracy).

H4: the theory of incomplete contracts (Holmstrom & Milgrom, 1991) predicts a reduction in effort (time) invested in the diagnosis activity when data entry is incentivised, leading to a reduction in performance. However, in our setting, subjects (medical students) might be intrinsically motivated to perform in this task, hence limiting the negative effect of incentives on the non-incentivised activity.

H5: according to attention research (Kahneman, 1973), when individuals work on several tasks, an increase in attention in one task can occur through two different mechanisms: (i) a change in the relative allocation of attention from one task to the other or (ii) an overall increase in attentional resources, which will benefit all tasks proportionally to the initial allocation of resources. In line with the prediction of the “loss attention” model (H3), losses are expected to increase the overall attentional resources, leading to a positive spill-over effect on the diagnosis task even if it is not directly incentivised (Yechiam & Hochman, 2013b).

Performance in the data entry activity is measured by the total number of correct entries made over the period, since this is the performance target in the loss and gain treatments. Similarly, we consider the number of correct diagnoses made as the performance measure in the diagnosis activity.

To explore how incentives affect two possible channels of effort,Footnote 19 we first measure the persistence of effort as the total time spent on an activity.Footnote 20 Second, in the absence of a physiological measure of attention, we use accuracy in an activity (proportion of correct attempts out of total attempts made) as a proxy for effort intensity, assuming that increased attention reduces errors. However, if individuals seek to minimise errors (improve performance) by double-checking ex-post that their response is correct, the two channels of effort may not be independent from each other, as verifying one’s responses takes time. In online Appendix F, we show that the correlation between our measures of effort intensity and persistence is low, thus providing support to the notion that performance is increased by being more focussed and avoiding mistakes ex-ante.
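The three measures defined above can be written down as a short sketch. The record layout (`ActivityLog` and its field names) is a hypothetical illustration, not the structure of the experimental data, and the numbers in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActivityLog:
    """Hypothetical per-subject record for one activity in one period."""
    attempts: int    # total attempts made (correct or not)
    correct: int     # number of correct attempts
    seconds: float   # total time spent on the activity


def performance(log: ActivityLog) -> int:
    """Performance: total number of correct outputs."""
    return log.correct


def persistence(log: ActivityLog) -> float:
    """Effort persistence: total time spent on the activity."""
    return log.seconds


def intensity(log: ActivityLog) -> Optional[float]:
    """Effort intensity proxy: share of attempts that are correct.

    Returns None when no attempt was made, since accuracy is undefined.
    """
    if log.attempts == 0:
        return None
    return log.correct / log.attempts


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = ActivityLog(attempts=120, correct=114, seconds=450.0)
    print(performance(example), persistence(example), intensity(example))
```

Keeping accuracy separate from time spent mirrors the distinction drawn in the text: two subjects with the same number of correct entries can differ in how they got there.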

Table A2 in online Appendix A provides descriptive statistics of all performance measures for the three groups.

3 Results

3.1 Effect of framing on performance in the incentivised activity

We first explore the effects of incentives on the targeted activity (data entry). Evidence from the distribution of performance results (Fig. 1) supports hypothesis (H1) that performance in data entry is higher in the two incentive treatments. Overall, about 20% more correct entries are made under performance contracts: compared to the 96.9 entries in the control treatment, participants made 117.8 correct entries in the loss treatment (p = 0.008 two-sided Mann–Whitney U-Test, hereafter MW test), and 116.6 in the gain treatment (p = 0.023 MW test). However, there is no evidence supporting the prediction (H2) of loss aversion theory that performance under the loss contract is higher than under the gain contract (p = 0.839, MW test). This result is robust to the inclusion of subjects’ characteristics in a regression (Table 1, column 2), and we fail to reject the null hypothesis that individuals’ performance is the same under both frames (p = 0.648, test of equality of coefficients).
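As a quick check of the "about 20%" figure, the relative increases can be recomputed from the three group means reported above. This is only the arithmetic behind the statement, not a reproduction of the statistical tests.

```python
# Mean number of correct entries per treatment, as reported in the text.
CONTROL_MEAN = 96.9
LOSS_MEAN = 117.8
GAIN_MEAN = 116.6


def pct_increase(treated: float, baseline: float) -> float:
    """Relative increase of `treated` over `baseline`, in percent."""
    return 100.0 * (treated - baseline) / baseline


if __name__ == "__main__":
    print(round(pct_increase(LOSS_MEAN, CONTROL_MEAN), 1))  # 21.6
    print(round(pct_increase(GAIN_MEAN, CONTROL_MEAN), 1))  # 20.3
```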

Fig. 1 Performance in the data entry activity, by treatment

Table 1 Impact of financial incentives on performance in data entry

3.2 Effect of framing on the channels of effort

Looking at the effort channels through which higher performance was achieved under the two contracts, the raw data (see Table A2 and Figure A1 in online Appendix A) show that subjects in the gain contract achieve high performance by spending more time on data entry compared to the other two groups. On average, they spend 30 more seconds on this activity than subjects in the control group (p = 0.019), and 25 more seconds than those in the loss group (p = 0.012). By contrast, there is no difference between the control and loss groups (p = 0.995). Regression results presented in Table 2 confirm these findings, which remain robust to the inclusion of individual controls (column 2).

Table 2 Impact of financial incentives on effort persistence and intensity in data entry

Next we consider the intensity of effort, proxied by accuracy in the task. Participants in the loss group achieve near perfect accuracy: with 97.8% of entries correctly made, this is 9 percentage points (pp) higher than in the control group (p = 0.014) and 5.2 pp higher than the gain treatment (p = 0.042).Footnote 21 This level of attention and limited number of mistakes are consistent with the notion of heightened attention dedicated to the task due to the threat of losses (H3). The results are robust to controlling for additional demographic characteristics (Table 2, column 4).

3.3 Effects on the non-incentivised activity

Next, we consider the effects of the contract frames on the non-incentivised diagnosis activity. Unlike what is predicted by standard economic theory (H4), performance (number of correct diagnoses) does not decrease under either incomplete contract. Subjects in the control group make 3.2 correct diagnoses, against 3.5 correct diagnoses (p = 0.189) in the gain treatment, and 3.3 correct diagnoses in the loss treatment (p = 0.276). This null result is robust to the inclusion of individual characteristics (Table 3, column 2). Overall, this result supports the notion that individuals are intrinsically motivated to perform well in this task.

Table 3 Impact of financial incentives on effort and performance in diagnosis identification (non-incentivised activity)

Turning to the channels of effort, we fail to detect any difference in intensity of effort (accuracy) between the incentive groups and the control group (Table 3, columns 3 and 4), or across incentive frames. In the loss treatment, there is no evidence of a positive spill-over effect of the heightened attention in data entry on the diagnosis activity (H5). Consistent with the result that they spent more time on data entry, subjects under the gain contract spent nearly 23 fewer seconds on the diagnosis activity, while there is no evidence that subjects under the loss-framed incentives spent less time on diagnosis (Table 3, columns 5 and 6).

3.4 Choice of optimal effort allocation

Even though a correct diagnosis generated no private monetary gains, our data show that participants spent on average more than 20% of their time on diagnosis,Footnote 22 which is inconsistent with purely selfish financial motives. Beyond intrinsic motivation, could altruistic motives explain this behaviour? To answer this question, we consider how altruistic subjects should allocate their time between diagnosis and data entry. This choice depends on relative expected (social) earnings per unit of time spent in both activities. Given that social rewards for each activity are fixed, participants should allocate their effort between data entry and diagnosis depending on their relative abilities (a combination of their speed, i.e. average time per output produced, and their accuracy). A rational decision-maker should focus on data entry as long as each second spent on this activity yields higher returns than a second spent on diagnosis. Given the social benefits attached to the two activities, a subject should focus on data entry if the time she needs to obtain a correct diagnosis is at least 7.5 times higher than the time per correct data entry.Footnote 23 Taking the median abilities in the control group as proxiesFootnote 24 (i.e. 4.12 seconds per correct data entry and 47.95 seconds per correct diagnosis; see Table A2), the optimal strategy is to focus entirely on data entry to maximise charity donations. We find that the median subject dedicating all of her working time to data entry could raise R25.9 for charity, by making just under 130 correct entries, which is higher than the median performance observed in all treatments.Footnote 25 Note that there is one case where it would be optimal to have the reverse strategy: focus on diagnoses and spend any remaining time on data entry. This would be for participants with high abilities in diagnosis identification (speed and accuracy), but median skills in data entry (see online Appendix H for a description of this scenario).Footnote 26 However, this combination of skills is highly unusual; only one individual in the sample fits this profile.
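The allocation rule can be sketched in a few lines, using the donation rates from Sect. 2.1 (R0.20 per correct entry, R1.50 per correct diagnosis) and the control-group medians quoted above. The function names are ours, and the sketch compares donation returns per second only; it ignores registration time and any other detail of the task.

```python
# Donation generated per correct output, in rand (from Sect. 2.1).
DONATION_PER_ENTRY = 0.20
DONATION_PER_DIAGNOSIS = 1.50


def break_even_time_ratio() -> float:
    """Ratio of time per correct diagnosis to time per correct entry at
    which both activities yield the same donation per second."""
    return DONATION_PER_DIAGNOSIS / DONATION_PER_ENTRY


def prefers_data_entry(sec_per_entry: float, sec_per_diagnosis: float) -> bool:
    """True if a second of data entry raises at least as much for charity
    as a second of diagnosis."""
    entry_return = DONATION_PER_ENTRY / sec_per_entry
    diagnosis_return = DONATION_PER_DIAGNOSIS / sec_per_diagnosis
    return entry_return >= diagnosis_return


if __name__ == "__main__":
    # Median abilities in the control group, as quoted in the text.
    median_entry, median_diagnosis = 4.12, 47.95
    print(round(break_even_time_ratio(), 2))            # 7.5
    print(round(median_diagnosis / median_entry, 2))    # 11.64
    print(prefers_data_entry(median_entry, median_diagnosis))  # True
```

With the median abilities, a correct diagnosis takes about 11.6 times as long as a correct entry, which is above the break-even ratio of 7.5, so data entry yields the higher donation per second.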

3.5 Robustness check

Results from a second 10-min period, undertaken under the same conditions by half of respondents, are shown in Table A3 in online Appendix A. The results confirm the similar positive effects of incentives on performance (Table A3, columns 1–2), with no evidence of a higher performance increase with incentives framed as losses (p = 0.964). As in period 1, higher performance in the gain treatment is achieved through an increase in effort persistence (Table A3, columns 3–4), but no increase in effort intensity (Table A3, columns 5–6). Evidence also confirms the notion that incentives framed as losses (but not gains) increase attention (H3), proxied by accuracy in data entry. However, we also find that individuals in the loss treatment achieved a higher performance in data entry through an increase in effort persistence. This may not be entirely surprising in a context where 74 percent of subjects under both incentive frames spent more time on data entry in period 2 relative to period 1, possibly after realising that time spent on data entry could lead to higher returns.Footnote 27 As a result, in both treatments, there is a substitution of effort persistence away from the diagnosis activity. Yet this does not translate into a significant reduction in performance in diagnosis identification.

4 Discussion and conclusion

Using a novel medically framed real-effort experiment, we evaluate the impact of incomplete contracts, framed as gains or losses, on subject performance and effort channels. We find that both types of incentives lead to increased performance (H1), with no evidence that losses triggered a greater response than gains (H2). This null result echoes those of studies where subjects are told in advance which performance target they have to reach (de Quidt et al., 2017). Another possible explanation lies in the fact that incentives used were relatively small for participants, which has been found to reduce the likelihood of loss aversion (Mukherjee et al., 2017).Footnote 28

We find that the two framings triggered different behavioural responses. Consistent with the loss attention model (H3), subjects facing losses achieved higher levels of performance by increasing their attention in data entry. The potential implications could be far from trivial in settings where errors are costly. In healthcare, for example, greater attention can reduce medical errors and adverse patient outcomes (Yang et al., 2018). However, generalising beyond the lab is challenging, especially from a 10-min period of work. Sustaining increased attention over longer periods of time could become costly and generate other trade-offs, as seen here in the second period of work. More research will be needed to explore the context in which performance contracts framed as losses can be used to reap the benefits of increased effort intensity.

Why did performance in the diagnosis activity not fall when data entry was incentivised (H4)? Several factors may explain this result. First, performance in that task may be less sensitive to effort exerted and more to the (random) difficulty of cases seen. As mentioned in the description of the task, cases included easy and common diagnoses, needing little reflection, and more uncommon ailments, requiring more advanced knowledge, so that many medical students could have failed to identify the latter, even when spending a lot of time on them. The imperfect relationship between time and performance could partly explain why, even when less time was spent on diagnoses, performance still did not fall. Second, in both treatments, this result may be driven by the interaction of the task design (sequencing of the two activities) with the subjects’ intrinsic motivation. Since the diagnosis screen automatically appeared when subjects finished a data entry sheet, not only was it impossible to skip the diagnosis activity entirely, but it was also a reminder of the potential satisfaction derived from doing this task, which was deemed interesting by mostFootnote 29 and echoed participants’ identity (Akerlof & Kranton, 2000).Footnote 30 Theoretical models suggest that if agents are intrinsically motivated to exert effort (Besley & Ghatak, 2018), the expected adverse effects of incomplete contracts may be limited, as has been found in empirical studies in health.Footnote 31 In the loss treatment, this result is also likely driven by the fact that increased performance in data entry is obtained through increased intensity of effort (H5), which does not deplete resources available for diagnosis identification.Footnote 32 Lastly, even in the absence of monetary reward for data entry, as shown in Sect. 3.4, a prosocial subject willing to maximise the payoff of the charity should already prioritise this activity over diagnosis in the control group. A different result might emerge in a setup where the social incentives to identify correct diagnoses at baseline were higher.

The next question is why we did not observe a positive spill-over effect of heightened attention on diagnosis performance, resulting from an increase in the pool of attentional resources (H5). The answer may lie, again, in the non-linear relationship between effort and performance: unlike the simple decision task used by Yechiam and Hochman (2013b), identifying a diagnosis correctly requires not only attention but also knowledge. As several diagnoses were particularly hard to find, the scope for improved performance was limited for most participants.

Our results highlight the importance for employers of considering not simply whether incentive contracts are effective at increasing the rewarded dimension of performance, but also how these contracts improve performance. Indeed, our findings suggest that the behavioural implications of different incentive designs may be far from trivial, especially in settings where individuals undertake different types of tasks. In the setting we simulated, where remuneration was linked to the quantity of output of good quality produced, although incentives framed as gains and losses both achieved a similar result, the loss frame led to the virtual elimination of wastage in production, as participants made almost no errors. This is a key upside, which could have important implications for performance pay in settings where both quantity and quality of outputs are critical. On the other hand, incentives framed as gains reduced some of the effort that workers put into the non-rewarded activity. While this had no further consequence on performance in our experiment, for reasons discussed above, it could become a liability in a context where individuals are not intrinsically motivated. In general, our findings should encourage researchers and policy-makers to explore further the relative effects of incentives framed as losses and gains for incentivising workers, and in particular healthcare providers.