In this section, we first examine inspections and tax compliance in light of the hypotheses spelled out in Sect. 3.3. We then investigate potential spillover effects of inspections on compliance, which answers the main research question of our paper. Finally, we perform an exploratory analysis of the different audit strategies adopted by TAs.
Inspections
As a first step, we need to understand TAs’ behavior in the Human treatment, as this is the stepping stone to understanding TPs’ behavior, in which we are ultimately interested. TAs could influence the game by self-reporting the value of the die roll in each of the first 20 rounds, thus determining the occurrence of inspections. We have a total of 30 TAs, 14 in condition Contingent and 16 in condition Flat.
In Fig. 1, we show the distribution of the total number of inspections performed by each TA (bars) together with the theoretical distribution obtained from a binomial distribution with 20 random draws and a probability of success of 1/6 (solid line). The vertical dashed line marks the mean of the empirically observed distribution.
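The theoretical benchmark in Fig. 1 can be reproduced with a few lines of code. The following is a minimal sketch, assuming only what the text states (20 independent rounds, inspection probability 1/6 under truthful reporting); the function and variable names are illustrative and not taken from the paper.

```python
from math import comb

def binomial_pmf(k: int, n: int = 20, p: float = 1 / 6) -> float:
    """Probability of exactly k inspections in n rounds under truthful reporting."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Theoretical distribution of the number of inspections per TA (0 to 20).
pmf = [binomial_pmf(k) for k in range(21)]

# Expected number of inspections under truthful reporting: n * p = 20/6.
expected_inspections = sum(k * prob for k, prob in enumerate(pmf))

# Probability that a truthful TA ends up with at least 6 inspections.
prob_at_least_6 = sum(pmf[6:])

print(f"expected inspections: {expected_inspections:.2f}")
print(f"P(at least 6 inspections): {prob_at_least_6:.3f}")
```

The expected value of 20/6 (about 3.3 inspections) is the reference point against which the observed means should be read.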
The overall frequency of inspections is equal to 32.8% and 31.1% in conditions Flat and Contingent, respectively. Both frequencies are significantly different from the expected frequency of 16.7% (exact binomial test, \(p < 0.001\) for both). This means that, in line with Hypothesis 1a, TAs in Contingent tend to over-inspect TPs. However, in contrast to Hypothesis 1a, Fig. 1 highlights a strong similarity in behavior across treatments. The average number of inspections is equal to 6.2 and 6.6 in conditions Contingent and Flat, respectively. Non-parametric tests show that the two distributions do not statistically differ from each other (Wilcoxon rank sum test, \(p = 0.883\); Kolmogorov–Smirnov test, \(p = 1.000\)).
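As an illustration of the exact binomial test, the sketch below computes a two-sided p-value by summing the probabilities of all outcomes that are no more likely than the observed one. The inspection counts (105 out of 320 in Flat, 87 out of 280 in Contingent) are an assumption: they are back-computed from the rounded shares and the number of TAs reported in the text, and pooling all rounds treats them as independent draws, which may differ from the procedure actually used.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_p(k: int, n: int, p: float) -> float:
    """Two-sided exact p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    threshold = binom_pmf(k, n, p) * (1 + 1e-7)  # tolerance for float ties
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= threshold:
            total += prob
    return min(1.0, total)

# Assumed counts, back-computed from the rounded shares in the text:
# 16 TAs x 20 rounds in Flat, 14 TAs x 20 rounds in Contingent.
p_flat = exact_binomial_p(105, 320, 1 / 6)       # 105/320 = 32.8%
p_contingent = exact_binomial_p(87, 280, 1 / 6)  # 87/280 = 31.1%

print(p_flat < 0.001, p_contingent < 0.001)
```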
Result 1
Inspections are more frequent than predicted by the roll of the die, both in condition Contingent and Flat. There is no significant difference in inspection frequencies across conditions Contingent and Flat.
The questionnaire provides further insight into TAs’ self-reported motivations, which might help explain these results (see Appendix 2). The motivation that attracts the highest level of agreement is “I acted to enforce the rules”, with an average agreement score of 4.4. Most TAs disagree that they acted in their own personal interest. Finally, the interest of the group does not seem to be a strong motivation, and most TAs believe that others in the same role would have behaved as they did. Overall, TAs’ answers do not significantly differ between Contingent and Flat (Wilcoxon rank sum tests, all \(p > 0.200\)). Furthermore, answers in the questionnaire are not correlated with the number of inspections implemented (Spearman’s rank correlation \(\rho\), all \(p > 0.161\)).
Tax compliance
Figure 2 provides a representation of the distribution of taxes paid in the 30 rounds of the experiment. In addition to the conventional pieces of information provided by the boxplots, the diamond dots capture mean values in each round. The dashed horizontal line shows the average value for each of the four experimental conditions separately.
Considering inspection condition Human first, the average taxes paid are equal to 18.3 and 17.9 in conditions Contingent and Flat, respectively. To gather a measure of the heterogeneity of contributions within the groups, we computed the distribution of the standard deviations of taxes paid within each group in a given taxation round. The average values of the distribution are rather large and equal to 9.898 and 9.091 in conditions Contingent and Flat, respectively (see Footnote 17).
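The heterogeneity measure can be sketched as follows: compute the standard deviation of taxes paid within each group and round, then average these values. The data below are purely hypothetical (group labels, group size, and amounts are invented for illustration), and the use of the sample rather than the population standard deviation is an assumption.

```python
from statistics import mean, stdev

# Hypothetical taxes paid by the members of a group, keyed by (group, round).
taxes = {
    ("g1", 1): [0, 10, 20, 30],
    ("g1", 2): [20, 20, 20, 20],
    ("g2", 1): [5, 15, 25, 35],
    ("g2", 2): [0, 0, 30, 30],
}

# One standard deviation per group and round (sample SD, by assumption).
within_sd = {key: stdev(values) for key, values in taxes.items()}

# Average of the within-group standard deviations.
average_within_sd = mean(within_sd.values())

print(f"average within-group SD: {average_within_sd:.3f}")
```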
As shown by the boxplots, the central tendency of the distribution is larger than the full evasion prediction obtained under the assumption of risk neutrality and inspections faithfully determined by the outcome of the die. Furthermore, taxes paid are quite stable throughout the 20 rounds of condition Human. Spearman’s rank correlation tests do not show any significant correlation between taxes paid at the group level and round number, either in Contingent (\(\rho = -0.033\), \(p = 0.588\)) or in Flat (\(\rho = -0.085\), \(p = 0.131\)). Furthermore, taxes paid in the two incentive regimes of the Human treatment do not statistically differ (Wilcoxon rank sum test, \(p = 0.984\)). This is a straightforward consequence of what we reported in Result 1 and goes partly against Hypothesis 1b.
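For readers unfamiliar with Spearman’s rank correlation, the sketch below computes \(\rho\) as the Pearson correlation of the ranks, with tied values sharing their average rank. The round numbers and group-level taxes are hypothetical, and the p-value calculation is deliberately left out; this is not the software used for the reported tests.

```python
from statistics import mean

def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average rank of the tied block
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = midranks(x), midranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical group-level taxes paid over five rounds.
rounds = [1, 2, 3, 4, 5]
group_taxes = [18.0, 19.0, 17.0, 18.5, 17.5]
rho = spearman_rho(rounds, group_taxes)
print(f"rho = {rho:.3f}")
```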
Result 2
In condition Human, taxes collected are high, quite stable over rounds, and do not differ across conditions Contingent and Flat.
The analysis reported in Appendix 3 shows a positive correlation between the number of inspections and tax compliance. Indeed, questionnaire data (see Appendix 2) reveal that the main motivation to pay taxes in condition Human is one’s own personal interest, driven by the urge to avoid sanctions. Most TPs disagree that the TA followed her personal interest or the group’s interest. The majority of TPs moderately agree that the TA followed the rules in a transparent way. The amount of taxes paid in Human is positively and significantly correlated with the belief that the tax agent acted in the group’s interest (Spearman’s rank correlation \(\rho = 0.194\), \(p = 0.034\)). There is also a positive correlation between taxes paid and agreement with paying taxes in the group’s interest (\(\rho = 0.372\), \(p < 0.001\)) and paying taxes to avoid sanctions (\(\rho = 0.245\), \(p = 0.007\)). The same pattern also emerges in condition Machine (\(\rho = 0.234\), \(p < 0.010\), and \(\rho = 0.229\), \(p < 0.012\)). All other correlations between answers in the questionnaire and total taxes paid are not statistically significant (all \(p > 0.154\)).
Concerning taxes collected in condition Machine, we find that average compliance in conditions Contingent and Flat (18 and 15.5, respectively) stays well above the full evasion benchmark predicted in Hypothesis 2. The average standard deviation of taxes paid within groups is equal to 10.747 in Contingent and to 10.255 in Flat. These values are larger than those observed in condition Human. While there is no significant difference in taxes paid between the two incentive conditions (Wilcoxon rank sum test, \(p = 0.334\)), we register a drop in taxes paid in incentive condition Flat when moving from condition Human to condition Machine. This drop between Flat/Human and Flat/Machine is moderate (2.4 points), but statistically significant (Wilcoxon signed rank test, \(p = 0.003\)). There is no significant difference between Contingent/Human and Contingent/Machine (\(p = 0.808\)).
Result 3
In condition Machine, taxes collected are high and quite stable. No significant differences can be observed between conditions Contingent and Flat.
To provide further support for our results, we perform a regression analysis on the taxes paid by TPs. The dependent variable Taxes Paid is regressed against a set of explanatory variables: Contingent is equal to 1 in incentive condition Contingent and equal to 0 in Flat; Period captures the round in which taxes are paid and spans the range 1–20 in Human and 1–10 in Machine; Machine is equal to 1 in condition Machine and to 0 in Human. Table 2 presents the estimation outcomes of a linear mixed model with clustered random effects at the individual and group level.
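One way to write down such a model is sketched here, assuming that the fixed part contains only the regressors named above plus the Contingent:Machine interaction discussed alongside Table 2; any additional terms in the estimated specification are not reproduced. The subscripts \(i\), \(g\), and \(t\) (individual, group, round) and the symbols \(u_g\), \(v_i\), and \(\varepsilon_{igt}\) are notation introduced for illustration only.

\[
\text{TaxesPaid}_{igt} = \beta_0 + \beta_1\,\text{Contingent}_g + \beta_2\,\text{Period}_t + \beta_3\,\text{Machine}_t + \beta_4\,(\text{Contingent}_g \times \text{Machine}_t) + u_g + v_i + \varepsilon_{igt}
\]

Here \(u_g\) and \(v_i\) denote the group-level and individual-level random effects and \(\varepsilon_{igt}\) the idiosyncratic error. Under this coding, the difference between Contingent and Flat within condition Machine corresponds to \(\beta_1 + \beta_4\).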
Table 2 Tax compliance (linear mixed model)
As shown by the estimates in Table 2, taxes paid are positive and stable over rounds, though slightly declining in Human. We confirm that there is no significant difference between incentive conditions, as the coefficient of Contingent shows. The positive interaction term Contingent:Machine points to a significant difference between the Human condition in incentive condition Flat and the Machine condition in incentive condition Contingent. However, a linear hypothesis test (Contingent + Contingent:Machine = 0, Chi-square test, \(p = 0.206\)) shows no difference between Contingent and Flat in Machine. Furthermore, no significant drop in taxes paid between Human and Machine is observed, either in condition Contingent (see coefficient of Machine) or in condition Flat (Machine + Contingent:Machine = 0, Chi-square test, \(p = 0.491\)). In addition, estimated parameters show that the drop in contributions observed in Machine for incentive condition Flat is not statistically significant. These findings confirm the results on compliance reported above.
We have established that taxes paid in both the Human and the Machine condition are high and quite stable, and that high tax compliance persists even when inspection rates are lowered below the deterrence threshold. We now assess the impact of controls in Human on compliance in Machine (spillover effects). We identify a strong positive correlation (Spearman’s rank correlation \(\rho\)) between the total number of controls in a group in Human and total taxes paid in Machine, both in Contingent (\(\rho = 0.560\), \(p = 0.037\)) and in Flat (\(\rho = 0.515\), \(p = 0.041\)). The presence of positive spillover effects is also corroborated by Table 3, which reports the fit of a linear mixed model with random effects at the individual and group levels. The dependent variable is given by taxes paid by an individual in a given period. The spillover effect is captured by the explanatory variable Inspections(cum).H, which counts the number of inspections performed by the TA in condition Human. We also control for incentive schemes (Contingent vs. Flat) and for the Period of tax collection.
Table 3 Tax compliance and controls (linear mixed models)
As the estimated coefficient of Inspections(cum).H shows, more inspections performed by a TA lead to higher compliance levels in the Machine condition, in which inspections are fully random and non-deterrent. No significant difference between the two incentive schemes is registered.
Result 4
More inspections performed by the TAs in Human result in higher compliance in condition Machine.
Types of audit strategies
We complement our main results with an exploratory analysis of the inspection strategies employed by the TAs. TAs implement very different inspection strategies, which in turn lead to different compliance levels on the TPs’ side. To gain insight into this, we categorize TAs according to the total number of inspections performed in condition Human and the average compliance reached therein (see Appendix 3, where we display group compliance patterns together with audit strategies). A hierarchical cluster analysis leads to the isolation of three major groups (see Footnote 18), as displayed in Fig. 3. We label the three groups in the following way: Honests (triangles), Beaters (squares), and Educators (circles).
Beaters perform a large number of inspections, and the TPs with whom they are matched display high levels of tax compliance. Educators perform fewer inspections but achieve high tax compliance as well, employing particular audit strategies (see Appendix 3). Honests implement the lowest number of inspections and register the lowest level of compliance. Non-parametric tests show that Beaters perform significantly more inspections than Educators and Honests (Wilcoxon rank sum test, \(p = 0.008\) and \(p = 0.010\), respectively). At the same time, Educators inspect more often than Honests (Wilcoxon rank sum test, \(p < 0.001\)). The frequency of inspection of the latter does not significantly differ from the truthful frequency of 1/6 (Wilcoxon signed rank test, \(p = 0.115\)).
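To illustrate how a hierarchical cluster analysis can separate TAs on these two dimensions, the sketch below runs a simple agglomerative procedure until three clusters remain. Everything specific in it is an assumption made for illustration: the data points are hypothetical, both variables are standardized, distances are Euclidean, and complete linkage is used. The clustering settings actually adopted are not reproduced here.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev

def standardize(points):
    """Z-score each coordinate so both variables carry equal weight."""
    columns = list(zip(*points))
    centers = [mean(c) for c in columns]
    scales = [pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, centers, scales))
            for p in points]

def agglomerate(points, n_clusters):
    """Complete-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters

# Hypothetical TAs: (total inspections in Human, average compliance).
tas = [(3, 8.0), (4, 9.5), (2, 7.0),        # few inspections, low compliance
       (6, 20.0), (7, 21.5), (6, 19.0),     # moderate inspections, high compliance
       (12, 22.0), (13, 20.5), (11, 21.0)]  # many inspections, high compliance

groups = agglomerate(standardize(tas), n_clusters=3)
print(sorted(sorted(g) for g in groups))
```

In this toy data set the three recovered clusters correspond to the patterns labeled Honests, Educators, and Beaters in the text.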
Concerning the effects on tax compliance, Educators and Beaters reach significantly higher levels than Honests (Wilcoxon rank sum test, \(p < 0.001\) and \(p = 0.005\), respectively). In contrast, no significant difference in taxes collected is registered when comparing Educators and Beaters (Wilcoxon rank sum test, \(p = 0.421\)).
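The logic of the Wilcoxon rank sum test used in these comparisons can be sketched with an exact permutation version: rank the pooled observations, sum the ranks of one group, and count how many reassignments of group labels give a rank sum at least as far from its expected value. The compliance values below are hypothetical, the two-sided p-value definition is one of several in use, and full enumeration is only practical for small samples, so this is not necessarily the variant behind the reported p-values.

```python
from itertools import combinations

def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def rank_sum_exact_p(x, y):
    """Two-sided exact p-value of the rank sum test by full enumeration."""
    ranks = midranks(list(x) + list(y))
    n_x, n = len(x), len(x) + len(y)
    expected = n_x * (n + 1) / 2  # rank sum of x under the null hypothesis
    observed = abs(sum(ranks[:n_x]) - expected)
    extreme = total = 0
    for subset in combinations(range(n), n_x):
        total += 1
        if abs(sum(ranks[i] for i in subset) - expected) >= observed - 1e-9:
            extreme += 1
    return extreme / total

# Hypothetical average compliance of groups matched with two TA types.
educators = [19.5, 21.0, 20.2, 22.4, 18.9]
honests = [8.1, 9.7, 7.4, 10.2, 6.5]
p_value = rank_sum_exact_p(educators, honests)
print(f"p = {p_value:.4f}")
```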
Result 5
Three alternative styles of inspection strategies can be identified: Honests, Beaters, and Educators. Honests perform inspections in line with the preset inspection rule but obtain low compliance. The other two obtain high compliance. However, Educators perform significantly fewer inspections than Beaters.