Table 1 presents the main results from the field experiment by comparing averages across treatments for all performance indicators. As can be seen in Column (1), performance in the routine task of calling numbers is the lowest in the Control Treatment (1.43 dialed phone numbers per minute of net working time).Footnote 8 The smartphone ban increases performance considerably: on average, an employee dialed 1.55 numbers per minute in the Ban Treatment and 1.62 numbers per minute in the Ban + Trust Treatment. Figure 3 depicts the average number of call attempts per minute separately for the three treatments. The difference between C and B is statistically significant at the 5%-level (\(p=0.045\) in a two-sided t-test), while the difference between C and B+T is significant at the 1%-level (\(p=0.004\)). The difference between B+T and B is not statistically significant (\(p=0.283\)). Accordingly, a smartphone ban that is accompanied by a trust signal to counteract the potential negative distrust signal increases call attempts per minute by more than 13%. Without the trust signal, performance still increases by more than 8% in comparison to the Control Treatment. The average performance increase across the two ban treatments is 10.68%, which can be seen in Table 1 when comparing the Control Treatment to the aggregate of both ban treatments. This pooled ban treatment effect is highly significant (\(p=0.005\)).Footnote 9 We additionally run OLS regressions using control variables as a robustness check. The explanatory variables of central interest are the treatment dummies B+T and B. Control variables included in the estimations capture the timing of the interviewer slotFootnote 10, gender, age, bachelor’s degree, freshman status, the number of return calls received, and an indicator for whether the employee used his or her smartphone while waiting in front of the TV study’s headquarters prior to working. Furthermore, we add dummies for the different offices.
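The relative increases quoted above follow directly from the rounded treatment means in Column (1) of Table 1. The following minimal sketch recomputes them; the helper name `pct_increase` is ours and purely illustrative. The pooled figure of 10.68% is not recomputed here, since it depends on the group sizes and on unrounded means that are not reported in this passage.

```python
def pct_increase(treated: float, control: float) -> float:
    """Relative increase of `treated` over `control`, in percent."""
    return 100.0 * (treated - control) / control


# Rounded means of call attempts per minute, as reported in the text.
control_mean = 1.43    # Control Treatment (C)
ban_mean = 1.55        # Ban Treatment (B)
ban_trust_mean = 1.62  # Ban + Trust Treatment (B+T)

increase_b = pct_increase(ban_mean, control_mean)
increase_bt = pct_increase(ban_trust_mean, control_mean)

print(f"B   vs. C: {increase_b:.2f}%")   # "more than 8%"
print(f"B+T vs. C: {increase_bt:.2f}%")  # "more than 13%"
```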
Columns (1) and (2) of Table 2 present coefficients of OLS estimates based on a specification with all control variables listed above. Our main findings remain qualitatively the same. There is a positive and highly significant treatment effect for both ban treatments. The point estimate of the B dummy variable is somewhat larger than the point estimate of B+T, but there is still no significant difference between the two (\(p=0.522\)). This result holds once we additionally control for the Big Five personality traits (Columns (5) and (6)), taken from the online survey. Results are not sensitive to using the smaller sample of 108 online survey participants, as shown in the middle Columns (3) and (4).
Column (2) of Table 1 reveals the same findings when we turn to the total number of call attempts per employee without adjustment for interview time. This indicator’s average is the lowest in the Control Treatment (265.83) and the largest in the Ban + Trust Treatment (293.68), as can be seen in Fig. 4. The difference is statistically significant in a two-sided t-test (\(p=0.020\)). The average for the Ban Treatment is close to that from B+T (291.39; significantly different from C, \(p=0.028\)). Hence, we conclude that individual effort is significantly higher in the case of a smartphone ban, independent of its variant. Results from non-parametric tests show the same picture. The p-values of two-sided ranksum tests are \(p=0.015\) (C vs. B+T), \(p=0.029\) (C vs. B), \(p=0.745\) (B+T vs. B), and \(p=0.008\) (both ban treatments vs. C). OLS regressions in the vein of Table 2 show the same findings and are presented in Table A.2 in Appendix A.
Finally, we can use the administrative phone data to investigate the dynamic perspective of the treatment effect with respect to how many numbers an employee dialed. Since the phone bills document the exact timing of phone calls to existing households (as soon as an individual or answering machine received the call) and the exact duration of each conversation, we can separate all call attempts (according to the interviewers’ documentation) into time bins on a quarter-hourly basis. This allows us to estimate when each call attempt took place (whether it was in the first quarter-hour of the job, in the second, and so on). Figure 5 describes how the treatment effect emerges over time. The left (right) panel compares the aggregated average number of call attempts per quarter-hour in the B+T (B) treatment to that in C. Both figures start with the first 15 minutes of working time and reveal that there is already a small gap between the ban treatments and the Control Treatment at this early stage. The gap widens continuously throughout the working time until it reaches the final values discussed above.
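The binning step can be sketched as follows. This is only an illustration under stated assumptions: we assume each call attempt has already been assigned an estimated time in minutes since the start of the job, and that a shift lasts 3.5 hours (210 minutes, i.e., 14 quarter-hour bins). The function names and the example times are hypothetical and are not taken from the study's data.

```python
from itertools import accumulate

BIN_MINUTES = 15     # quarter-hour bins
SHIFT_MINUTES = 210  # assumed shift length of 3.5 hours


def attempts_per_quarter_hour(attempt_minutes):
    """Count call attempts in each 15-minute bin of the shift.

    `attempt_minutes` holds the estimated time of each attempt,
    in minutes since the start of the job.
    """
    counts = [0] * (SHIFT_MINUTES // BIN_MINUTES)
    for t in attempt_minutes:
        if 0 <= t < SHIFT_MINUTES:  # ignore attempts outside the shift
            counts[int(t // BIN_MINUTES)] += 1
    return counts


def cumulative_attempts(counts):
    """Running total of attempts up to the end of each bin."""
    return list(accumulate(counts))


# Hypothetical attempt times for one interviewer (not real data).
example_times = [0.5, 3.0, 14.9, 15.0, 29.0, 44.0, 209.9]
example_counts = attempts_per_quarter_hour(example_times)
example_cumulative = cumulative_attempts(example_counts)
print(example_counts)
print(example_cumulative)
```

Averaging such per-bin counts over all interviewers within a treatment would give one series per treatment that can be compared against the Control Treatment.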
Columns (3) and (4) of Table 1 report the average number of conversations and conducted interviews. The higher level of exerted effort when the smartphone ban was in place translates into a higher number of conversations, so that the number of conversations is on average above 85 in both B+T and B treatments, compared to below 80 in the C treatment. The average number of conducted interviews per employee amounts to 4.70 in the Ban + Trust Treatment, compared to 4.15 in the Control Treatment, and 3.83 in the Ban Treatment. Hence, only the Ban + Trust Treatment leads to a performance level that is higher than in the Control Treatment. The increase in conducted interviews amounts to more than 13%, which is the same increase caused by the Ban + Trust Treatment in comparison to the Control Treatment for call attempts per minute (see Column (1) of Table 1). Figure 6 visualizes the evidence on how the ban treatments affected the number of conducted interviews. The p-values of comparisons between ban treatments and Control Treatment are \(p=0.285\) (C vs. B+T) and \(p=0.497\) (C vs. B) in two-sided t-tests. The comparison of performance levels between the Ban Treatment and the Ban + Trust Treatment suggests a weakly significant effect (\(p=0.076\)). According to non-parametric test results, all pairwise comparisons turn out to be statistically insignificant at conventional levels. The p-values of two-sided ranksum tests are \(p=0.198\) (C vs. B+T), \(p=0.848\) (C vs. B), \(p=0.104\) (B+T vs. B), and \(p=0.532\) (both ban treatments vs. C). Table 3 presents OLS regressions in the vein of Table 2 and shows a marginally significant positive effect of the B+T treatment compared to the C treatment only when we consider the full set of controls. The coefficients of the B treatment and the B+T treatment dummy variables are significantly different at least at the 10%-level throughout the specifications. 
The number of interviews in the B treatment is in no case significantly different from the C treatment.Footnote 11
An important caveat at this point is that the sample size underlying our analyses may not be sufficiently large to detect effects in all of the performance indicators. A lack of statistical power could explain why test results provide no evidence that smartphone bans improve performance in the non-routine task of convincing individuals to take part in a survey, which contrasts with our above evidence on the routine task of dialing telephone numbers. Another possible explanation is that the perceived level of trust toward the employer could play a role in this particular performance dimension.
Further indicators of interest reflect the time interviewers spent talking to people on the phone. Interviewers may spend more or less time on conducting the interviews. Differences in the efficiency of carrying out interviews are certainly relevant from the employer’s perspective, as, for example, some interviewers may take too long for one questionnaire and thereby waste time. As can be seen in Column (5) of Table 1, the fact that interviewers conduct more interviews in B+T is reflected in the total interview time per employee, while the average time needed per conducted interview (Column (6)) is fairly constant across the treatments. Its mean value is 427.81 seconds. The largest deviation from the mean is found for B, but it is tiny (2.86 seconds). Hence, we conclude that the treatments did not change the way the interviews were carried out and that efficiency in interviewing does not seem to play a role in any of our other findings.
Discussion of channels and further results
In this section, we report suggestive evidence on reduced shirking as the potential transmission channel of a smartphone ban at the workplace. A natural measure for shirking behavior in our setting is the number of periods without calls (“breaks”). However, due to the nature of our field experiment, we cannot directly observe interviewers taking a break in their offices. The available phone bills inform us about the exact timing and duration of each call only if there was a contact with either a real person or an answering machine. Between these contacts, interviewers continued dialing numbers, for which we do not have exact timestamps. To still identify breaks, we look at all periods of more than 5 minutes without a call according to the phone bill. We define such a period as “taking a break” whenever an interviewer documented 5 or fewer contact attempts within it, which is clearly below the average attempts-per-minute ratio according to Column (1) of Table 1. While the total number of breaks is rather small, the interviewers took considerably more breaks in the C treatment than with the smartphone ban (1.45 breaks on average in C, 0.70 in B+T, and 0.81 in B).Footnote 12 Fig. 7 depicts the dynamic perspective with respect to breaks and shows that throughout the whole period of 3.5 h, interviewers almost consistently took the most breaks in the Control Treatment.
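The break definition above can be written as a simple rule. The sketch below is illustrative only: the `Gap` record, its field names, and the example values are our assumptions about how the phone-bill gaps and the documented attempts might be represented, not the data format actually used in the study.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """A period between two consecutive calls listed on the phone bill."""
    minutes: float            # length of the period without a billed call
    documented_attempts: int  # attempts the interviewer documented in it


def is_break(gap: Gap, min_gap_minutes: float = 5.0, max_attempts: int = 5) -> bool:
    """Apply the rule: more than 5 minutes without a billed call
    and 5 or fewer documented contact attempts in that period."""
    return gap.minutes > min_gap_minutes and gap.documented_attempts <= max_attempts


def count_breaks(gaps) -> int:
    """Number of periods classified as breaks for one interviewer."""
    return sum(is_break(g) for g in gaps)


# Hypothetical gaps for one interviewer (not real data).
example_gaps = [
    Gap(minutes=6.0, documented_attempts=3),   # break
    Gap(minutes=5.0, documented_attempts=0),   # not longer than 5 minutes
    Gap(minutes=12.0, documented_attempts=9),  # too many attempts
    Gap(minutes=7.5, documented_attempts=5),   # break
]
example_breaks = count_breaks(example_gaps)
print(example_breaks)
```

Under this rule, a flagged period contains fewer than one attempt per minute, compared with an average of 1.43 attempts per minute in the Control Treatment.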
The reduced number of short breaks in the ban treatments may be associated with the restricted use of smartphones. To shed some light on this intuition, we utilize a question in the online survey on actual smartphone use during working time. Results show that employees’ self-reported level of smartphone use is significantly lower in the ban treatments. The survey data reveal that 17 employees in the control group (i.e., 47.22% of the workforce for which we have information from the online survey) used their smartphone two times or more during working time.Footnote 13 In contrast, only 8 individuals used their smartphones that often in each of the ban treatments (22.22%), which suggests that actual smartphone use was much lower in B+T and B than in C (\(p=0.014\), two-sided Fisher’s exact test). We also observe a substantial number of individuals reporting that they used their smartphone once during working time in the ban treatments (10 in B+T and 15 in B), which suggests that employees may feel safe to misbehave one time (and even report about it honestly), but not more often. Finally, we take the self-reported measure of whether an interviewer used his or her personal smartphone twice or more often during the working time as the independent variable (a yes/no dummy variable) in an ordered probit regression with the number of breaks as the dependent variable. It turns out that self-reported phone use predicts the number of identified breaks (\(p=0.007\)). This relationship remains stable when we employ the full self-reported phone use information as the independent variable without recoding it as a dummy (\(p=0.030\)). We conclude that the ban treatment effects could have been induced by less shirking, which we can link to a reduction of smartphone use during the working time.
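For readers who want to see how a two-sided Fisher’s exact test on these counts works, the following is a minimal sketch. It rests on assumptions that are ours: the reported shares (17 = 47.22% and 8 = 22.22%) imply 36 survey respondents per treatment, and we assume the test compares the Control Treatment with the two ban treatments pooled (16 of 72). If the test was set up differently, the resulting p-value will differ from the reported one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Assumed table: rows = Control vs. pooled ban treatments,
# columns = used smartphone twice or more vs. less often.
p_value = fisher_exact_two_sided(17, 19, 16, 56)
print(f"two-sided p = {p_value:.3f}")
```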
We now focus on the potential negative side effects of the ban and specifically on the level of (dis)trust perceived by the employees. The feedback survey contained an item on how individuals perceived their job: “I felt that the head of the TV study put a large amount of trust in me”, measured on a Likert scale from 1 (“Completely disagree”) to 7 (“Completely agree”). The left panel of Figure A.9 in Appendix A depicts the average level of perceived trust per treatment and shows no significant difference between any of our experimental conditions (with average trust levels of 5.95 in B+T, 5.70 in B, and 6.00 in C). A similar picture emerges when we base our analysis on a similar question from the online survey using the same scale, as can be seen in the right panel of Figure A.9.Footnote 14 Overall, these results suggest that the smartphone ban did not decrease the perceived level of trust, irrespective of whether it was accompanied by an additional trust signal or not. Moreover, the average level of perceived trust was high, indicating that the interviewers did not feel distrusted. The online survey provides us with further evidence on the perception of a smartphone ban based on an item which reads: “Do you interpret a smartphone ban at the workplace as a signal of distrust?”. Possible answers ranged from 1 (“Not at all”) to 7 (“Absolutely”) on a Likert scale. One out of four tended to clearly agree (answer 6–7) and 6.48% concurred “absolutely” with that understanding. Roughly a quarter of the respondents reported a 5, which could be seen as a weak form of agreeing on the 7-point scale. Hence, when we ask directly about distrust triggered by the smartphone ban, half of the respondents seem to at least weakly agree with this idea, which contrasts with the above evidence. In consequence, we cannot rule out that smartphone bans induced distrust effects in our setting, given that the survey evidence does not reveal a clear picture.
To learn more about the role of trust and distrust at the workplace, a vignette study on work motivation in hypothetical workplace scenarios was integrated into the online survey. The scenarios differed with regard to the level of employer control, closely following an idea by Falk and Kosfeld (2006). We observe that the interviewers generally prefer trust over control and report higher work motivation when the employer abstains from controlling in otherwise identical scenarios. In some cases, however, control appears to be perceived as legitimate, so that work motivation was comparatively high. According to comments that could be entered into a text box, some individuals actually suggested considering control measures in the trust versions of the vignettes.Footnote 15 Two additional survey questions shed some light on why the ban may not have reduced trust toward the employer. First, employees were asked in the online survey whether they think that a smartphone can distract people from their work (Yes/a little/no/don’t know). The left panel of Fig. 8 shows that an overwhelming majority of 91.67% chose one of the first two categories (with 62.96% choosing “Yes”). This is a strong indicator that employees perceive smartphones as a source of distraction. Second, a question from the feedback survey asked whether the interviewers think that private use of internet or smartphone on the job is alright (on a Likert scale from 1 (“Disagree completely”) to 7 (“Agree completely”)). The right panel of Fig. 8 shows that a large majority disagrees with that statement. Less than one fifth tend to agree (with less than 6% indicating “6” or “7”). Employees hence show little support for browsing the internet and using personal smartphones in the workplace. While university regulations described in Section 2.3 could have contributed to this finding, we conclude for our setting that attempts by the employer to restrict misbehaviors are not necessarily perceived negatively, but may be seen as a legitimate measure to foster task achievement.
The literature on control and monitoring at the workplace discusses a variety of different aspects potentially relevant for employee behavior, some of which we briefly examine in this subsection. First, although there was no monitoring at the workplace, the interviewers may have had the perception of being monitored and this could have affected their performance. A question in the online survey asks interviewers if they knew that the phone bill could be used ex-post to check the correctness of their work. Possible answers were “yes”, “more or less”, and “no”. Only 8.33% of the employees were aware of this. Employees in the ban treatments had no higher awareness of the possibility to obtain data on job performance via the phone bills.Footnote 16
Second, it could be that smartphones offer benefits from the interviewers’ point of view, such as the possibility to recover from a long interview, which we should observe in their perception of the job. This would lead us to expect higher levels of satisfaction in C than in B+T and B. The data, however, reveals that this is not the case. Both job satisfaction and satisfaction with working conditions are very high and almost identical across all treatments (see Figure A.10 in Appendix A), which does not suggest any differences in the perception of the work environment or any psychological costs of the ban.
Third, another aspect that could influence the impact of a ban is the potential signal with regard to coworkers’ job performance. At first glance, one may expect the need for a ban to be higher in workplace environments where shirking is more common and average performance rather low. The results from the survey do not support this, since only a minority of individuals interpret a smartphone ban as evidence for a high prevalence of shirking. In a separate question on whether a smartphone ban is a signal for low performance expectations, only 13.89% tended to agree with that and just 6.48% agreed absolutely. Quite the contrary, it seems that the ban-induced signal on the performance of others might have been a positive one. A question in the online survey asked the employees to estimate the average number of conducted interviews. Accordingly, 44.74% of the Control Treatment (C) employees believed that the entire workforce had conducted on average four or fewer interviews, which is below the actual mean (see Column (4) of Table 1), while 32.43% in the ban treatments also estimated this performance level. Meanwhile, only 18.42% in the Control Treatment (C) expected a high performance of six or more interviews, in contrast to 29.73% in the ban treatments. An OLS regression with the estimated number of interviews as the dependent variable and a dummy for the ban treatments as the explanatory variable reveals a weakly significant effect (\(\beta =0.728\), \(p=0.083\)). While this piece of evidence does not confirm the interpretation that the ban worked as a signal for high work performance, it certainly contradicts the contrary idea that banning smartphones might go along with the establishment of a shirking norm and a signal of others’ low performance.
Finally, imposing a smartphone ban may signal a high level of job importance which then might lead to high effort. The online survey included an item that asked for the subjective importance of performance targets during the interview job. There are no statistically significant differences between the treatments. The same holds with respect to the survey item “I felt my performance was appreciated by the head of the TV study”.
Given the results presented in this subsection, we conclude that it is unlikely that the positive treatment effects on the number of dialed numbers are driven by aspects other than the reduction of shirking. Note that in supplementary analyses on how potential mechanisms discussed in this subsection relate to performance, the only robust finding is that perceived co-worker performance correlates with both call attempts per minute and conducted interviews in significant ways.
Other side effects
In this subsection, we focus on counterproductive behaviors that could be triggered by our treatments. One factor of interest in this context is the number of faked interviews, which are completed questionnaires for which there is no entry in the telephone bill. Yet, those are very rare (one case in each ban treatment and none in the Control Treatment). In addition, we observe that some employees took away pens from their office tables (two in the Control Treatment (C), three in the B+T treatment, and none in B).Footnote 17 Furthermore, when someone did not dial the area code but instead called households in the surrounding area of the university, this constitutes a form of sloppy behavior that is harmful to the goal of collecting representative data. The incidence of a false telephone number happened despite clear instructions on how to correctly use the dialing code (two times in the Control Treatment (C), six times in the B+T treatment, and zero times in the B treatment).
Table 4 summarizes the prevalence of undesirable behavior, in which we also include indicators from the online survey on whether interviewers reported having followed the instructions. We find that cases of counterproductive behavior occurred significantly more often in the B+T treatment than in the C and B treatments (\(p=0.037\) and \(p=0.017\), respectively, in two-sided Fisher’s exact tests). This suggests that it is not the ban itself that triggers undesirable behavior, as we observe the lowest degree of sloppiness in the Ban Treatment. Given the few incidences of sloppy behavior, we are cautious in interpreting these results; yet, it appears that in this regard, the trust message might have affected employee behavior in a way not intended by the employer.Footnote 18