1 Introduction

Although multitasking¹ is increasingly common in the modern work environment, its productivity effects remain underexplored. Furthermore, the stereotype that women are better at multitasking is almost universally accepted, but here too scientific evidence is missing. This paper fills these gaps with an experimental design that allows us to answer four research questions. First, how does multitasking affect productivity? Second, do people perform better when they are allowed to choose their own schedule? Third, are there indeed gender differences in the effect of multitasking on productivity? And fourth, are there gender differences in the propensity to multitask?

The first pair of questions is motivated by a practical concern: how to schedule tasks optimally. Is sequential execution advisable, or is it more productive to alternate (that is, to multitask)? Should workers be allowed to choose their own schedule, or should companies impose one? Although it seems intuitive that scheduling affects productivity, the topic has so far received little attention in economics. The vast literature on multiple tasks focuses instead on the pros and cons of bundling different tasks into a single job and on which kinds of tasks should be grouped together.² The literature on workers’ decision-making rights does not address scheduling directly either.³ The only paper we found that analyzes the impact of work schedules is Coviello et al. (2011). They examine court cases, for which a natural measure of performance is average duration. They find that judges who work on many cases in parallel take more time to complete similar portfolios of cases than judges who work sequentially. Their results confirm that work schedules are an important determinant of productivity.

The second pair of research questions is motivated by the gap between popular views and scientific evidence: best-selling books present the claim that women are better at multitasking as a scientifically established fact,⁴ while in reality no peer-reviewed paper has so far demonstrated this gender difference.⁵ Although empirical evidence is lacking, the view that women are better at multitasking draws support from the hunter-gatherer hypothesis, a theoretical argument in biological anthropology. In particular, Fisher (1999) claims that the prehistoric division of labor “built” different aptitudes into the male and female brain through natural selection. Hunting, performed by men, requires different skills than gathering, performed by women. As a consequence, Fisher argues, women think “contextually”, synthesizing many factors into a “web of factors”, while men think linearly, focusing on a single task until it is done. This argument implies that women are both better at multitasking and more inclined to do it. Our design allows us to examine both claims explicitly.

We examine these research questions empirically by conducting an experiment in which subjects are randomly allocated to different work schedules. Participants have to perform two separate tasks (a Sudoku and a Word Search puzzle) under one of three treatments: one where they perform the tasks sequentially, one where they are forced to alternate between the two tasks, and one where they can freely organize their work. The amount of time spent on each task is identical in every treatment, so performance differences between treatments measure the productivity effect of the different schedules. Relative performance in the third treatment, where subjects can freely choose the degree of multitasking, indicates whether individuals should be free to organize their own schedule. Gender differences in performance in the second treatment allow us to test whether men perform worse than women when they are forced to multitask. Finally, choices in the third treatment are used to test whether men indeed prefer a more sequential schedule than women.

Related to our paper is a literature in psychology on ‘task-switching’ (see Monsell 2003 for a review). In these experiments, a series of stimuli is presented to participants, who have to perform a short task on each stimulus. For example, pairs of numbers are shown and subjects have to either add or multiply them (see Rubinstein et al. 2001). From time to time, the required operation changes. A common finding is that changing tasks entails ‘switching costs’, i.e. responses to the stimuli are slower after a task switch. This literature cannot, however, answer our research questions. The tasks used are too simple to expect any advantages from multitasking, and subjects are not allowed to choose their schedule freely. Moreover, these experiments are usually not incentivized. In contrast, we use two complex tasks of much longer duration. These tasks are contingent, meaning that after switching back, subjects return to working on the same ongoing problem. Subjects can therefore expect an advantage from alternating: they can switch when they get stuck and later look at the same problem with a ‘fresh eye’. Indeed, our subjects do switch when they are allowed to.

Finally, none of these psychological experiments was designed to examine gender differences. Their samples are generally too small to do so and are often strongly imbalanced by gender. Our comparatively large and gender-balanced sample, on the other hand, allows us to test for gender differences both in the effects of multitasking and in the propensity to multitask.

2 Definitions

There are several possible definitions of multitasking.⁶ The variant we address in our experiment is the one most relevant in the workplace: people switching between multiple contingent tasks.⁷ This form of multitasking has also attracted the most interest in the popular press, where articles about the productivity effects of multitasking are common. In our experiment, subjects continue working on the same problem when they return from the other task, similar to an employee who switches between tasks or whose work at hand is interrupted by another, perhaps more urgent, task. Another relevant example is multitasking on a computer, switching back and forth between windows or tabs.

Note that our definition of multitasking is similar to what psychologists call task-switching, but there is an important difference between the two: contingency. When tasks are contingent, there are potential benefits to multitasking, such as seeing an old problem with a ‘fresh eye’. In contrast, in previous task-switching experiments subjects get a new stimulus to work on each time (e.g. they get a new pair of numbers to add up), so only the operation remains the same, not the problem they are working on. In this way, we aim to investigate the type of multitasking which occurs in a modern work environment where employees switch between several demanding and ongoing tasks.

In line with this aim, we chose tasks that require primarily mental effort and have virtually no physical component. Our tasks are therefore not meant to resemble household activities such as doing the dishes or taking care of children. Research focusing on such household activities can be found in Kalenkoski and Foster (2010).⁸

Note also that our definition of multitasking explicitly ignores a possible advantage of working on multiple tasks, namely the possibility of reallocating time between these tasks. We see time allocation as separate from multitasking: it is possible to reallocate time between tasks while still executing them sequentially and conversely it is possible to multitask without reallocating time between tasks. One strength of our design is that it clearly separates these two mechanisms: we keep time allocation constant so we can identify the effect of scheduling.

3 Experimental design and data

3.1 Treatments and groups

Three treatments were applied during the experiment: Treatment Single, Treatment Multi, and Treatment Choice (subjects were randomly allocated to treatments within each session and were unaware of these labels). In Treatment Single, subjects had to work on the two tasks consecutively, for 12 minutes each. In Treatment Multi, subjects were forced to switch between the two tasks approximately every four minutes,⁹ resulting in the same total time per task (12 minutes). Subjects did not know how many switches would occur, and the time intervals between switches varied, making anticipation unlikely. In Treatment Choice, subjects could alternate between the two tasks by pressing a ‘Switch’ button, subject to the same time constraint per task (12 minutes each). A timer informed subjects about the remaining time for each task. When the 12 minutes for one task expired, the screen changed automatically to the other task and the Switch button could no longer be used.

Importantly, this design ensures that the same amount of time is spent on each task in all three treatments. Had we tried to mimic simultaneity, for example by splitting the screen, we could not have determined how much time subjects spent on each task. We would then not know whether performance differences between treatments are due to differences in the time allocated to the two tasks or to differences in the schedules.

As shown in Table 1, subjects were assigned to three groups. Every subject played two rounds, the first of which was Treatment Single. In the second round, subjects in Group 1 played Treatment Single again, subjects in Group 2 played Treatment Multi, and subjects in Group 3 played Treatment Choice. The subjects knew from the start that there would be two rounds and that they would work on one Sudoku puzzle and one Word Search puzzle in each. The puzzles given in Round 2 were different from the puzzles in Round 1 (but they were the same for all subjects within rounds).

Table 1 Treatments of each group

Group     Round 1   Round 2
Group 1   Single    Single
Group 2   Single    Multi
Group 3   Single    Choice

This design allows us to answer all four research questions. Because Group 1 plays Treatment Single twice, it serves as the control group in a difference-in-differences approach, which corrects for learning effects and for performance drops due to exhaustion or boredom. To examine the effect of forced multitasking on productivity, we compare the performance difference between Round 1 and Round 2 of Group 2 to the corresponding difference of Group 1. To examine the effect of a self-chosen work schedule, we compare the performance difference of Group 3 to the performance differences of the other two groups. If subjects choose the optimal work schedule, the performance difference of Group 3 should be at least as large as those of the other two groups.¹⁰
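The Group 1 versus Group 2 comparison can be sketched in a few lines. This is a minimal illustration, assuming each group is stored as two equal-length lists of per-subject total points (Round 1 and Round 2, in the same subject order); the function names and the toy scores are hypothetical and are not the experiment's data.

```python
from statistics import mean


def mean_change(round1, round2):
    """Average within-subject change in total points from Round 1 to Round 2."""
    if len(round1) != len(round2):
        raise ValueError("each subject needs a score in both rounds")
    return mean(r2 - r1 for r1, r2 in zip(round1, round2))


def diff_in_diff(treated, control):
    """Change in a treatment group minus the change in the control group.

    `treated` and `control` are (round1_scores, round2_scores) pairs; the
    control is Group 1, which plays Treatment Single in both rounds, so its
    change picks up learning, exhaustion, and boredom effects.
    """
    return mean_change(*treated) - mean_change(*control)


# Hypothetical toy scores, for illustration only.
group1 = ([100, 120, 140], [130, 150, 170])  # Single, then Single
group2 = ([110, 120, 130], [120, 130, 140])  # Single, then Multi
print(diff_in_diff(group2, group1))  # → -20
```

A negative value means the treatment group improved less between rounds than the control group; passing Group 3 as `treated` gives the corresponding comparison for the self-chosen schedule.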

To examine gender differences in the effects of multitasking on productivity, we follow a difference-in-difference-in-differences approach. Gender differences in task proficiency cannot drive these results, since we compare performance in Round 2 to each subject’s own performance in Round 1. Moreover, Group 1 captures any gender differences in learning or exhaustion. For Group 2, any remaining gender difference in the performance change can therefore only stem from differences in the reaction to multitasking. For Group 3, both the reaction to multitasking and the self-chosen degree of multitasking determine the performance difference.
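The triple difference described above can be sketched as the women's difference-in-differences minus the men's. This is a hypothetical illustration, assuming scores are stored per (group, gender) cell as Round 1 and Round 2 lists; the labels "G1", "G2", "F", "M", the function names, and the toy numbers are assumptions, not the experiment's data.

```python
from statistics import mean


def mean_change(round1, round2):
    """Average within-subject change in total points from Round 1 to Round 2."""
    if len(round1) != len(round2):
        raise ValueError("each subject needs a score in both rounds")
    return mean(r2 - r1 for r1, r2 in zip(round1, round2))


def triple_diff(cells, treated, control="G1"):
    """Difference-in-difference-in-differences by gender.

    `cells[(group, gender)]` is a (round1_scores, round2_scores) pair.
    Returns the women's difference-in-differences minus the men's, so a
    negative value means women lose more (or gain less) than men under
    the treated schedule, relative to their own Group 1 baseline.
    """
    def did(gender):
        return (mean_change(*cells[(treated, gender)])
                - mean_change(*cells[(control, gender)]))

    return did("F") - did("M")


# Hypothetical toy cells, for illustration only.
cells = {
    ("G1", "F"): ([100, 110], [130, 140]),  # change +30
    ("G1", "M"): ([100, 120], [125, 145]),  # change +25
    ("G2", "F"): ([100, 100], [110, 110]),  # change +10
    ("G2", "M"): ([90, 110], [100, 120]),   # change +10
}
print(triple_diff(cells, treated="G2"))  # → -5
```

Because each gender is differenced against its own Group 1 cell, a gender gap in learning alone (here +30 versus +25) does not show up as a gender difference in the reaction to multitasking.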

Finally, to examine whether there is any gender difference in the propensity to multitask, we check whether there is a gender difference in the number of switches in Treatment Choice. The propensity to multitask might vary with proficiency: subjects who perform well might find switching easier or more beneficial. Alternatively, subjects who get stuck more often may want to switch more often. To avoid attributing such effects to gender differences in multitasking, we control for performance in Round 1.

3.2 Tasks

Our design requires tasks that are not gender-specific and for which multitasking is natural and possibly beneficial. For these reasons, we chose Sudoku and Word Search.¹¹ Sudoku is played on a 9×9 grid, divided into 3×3 sub-grids called “regions”. As the left panel of Fig. 1 illustrates, a Sudoku puzzle begins with some of the grid cells already filled with numbers. The objective is to fill the remaining empty cells with integers from 1 to 9 such that each number appears exactly once in each row, exactly once in each column, and exactly once in each region. The numbers given at the beginning ensure that the puzzle has a unique solution; the unique solution to the Sudoku in Fig. 1 is shown in the right panel. We measure performance in the Sudoku task by the number of correctly filled cells.
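The Sudoku rules and the performance measure can be made concrete with a short sketch. It assumes a grid is a 9×9 list of lists of integers with 0 marking an empty cell; the helper names are hypothetical, the experiment's software is not described here, and cells holding several candidate numbers are not modeled.

```python
def is_valid_solution(grid):
    """True if each of 1..9 appears exactly once in every row, every column,
    and every 3x3 region of the 9x9 grid."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    regions = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Each unit has 9 cells, so matching the set {1..9} means "exactly once".
    return all(unit == digits for unit in rows + cols + regions)


def correctly_filled(puzzle, answer, solution):
    """Number of initially empty cells (0 in `puzzle`) that the subject's
    `answer` fills with the value of the unique `solution`."""
    return sum(
        1
        for r in range(9)
        for c in range(9)
        if puzzle[r][c] == 0 and answer[r][c] == solution[r][c]
    )


# A standard pattern that yields a valid completed grid (not the puzzle
# used in the experiment).
solution = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [row[:] for row in solution]
puzzle[0][0] = puzzle[4][4] = puzzle[8][8] = 0  # blank three cells
answer = [row[:] for row in puzzle]
answer[0][0] = solution[0][0]          # filled correctly
answer[4][4] = solution[4][4] % 9 + 1  # filled with a wrong number
print(is_valid_solution(solution), correctly_filled(puzzle, answer, solution))  # → True 1
```

Only cells that were empty at the start can count, so the pre-filled clues never add to a subject's score.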

Fig. 1 Screenshots of Sudoku

When solving a Sudoku puzzle, solutions often come in waves. Multitasking can be appealing when one is stuck: one can work on the other task and hope to see the problem from a different angle when switching back.

The other task was to find as many words as possible in a Word Search puzzle. An example of a Word Search puzzle is presented in the left panel of Fig. 2, and its solution in the right panel. Participants had to look for the English names of European and American countries in a 17×17 letter grid. Words could run in any direction, including diagonally and backwards. Performance is measured by the number of correct words found.¹²
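"Any direction, including diagonally and backwards" means eight reading directions. The sketch below checks all eight from every starting cell; it is an illustration only, assuming the grid is a list of equal-length strings, and the function name and toy 5×5 grid are hypothetical (the experiment used a 17×17 grid).

```python
# The eight reading directions: horizontal, vertical, diagonal, and their
# reverses, as (row step, column step).
DIRECTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def find_word(grid, word):
    """Return (row, col, row_step, col_step) for the first occurrence of
    `word` in `grid` (a list of equal-length strings), or None."""
    n_rows, n_cols = len(grid), len(grid[0])
    last = len(word) - 1
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in DIRECTIONS:
                # Skip directions in which the word would run off the grid;
                # if both ends are inside, every letter in between is too.
                if not (0 <= r + dr * last < n_rows and 0 <= c + dc * last < n_cols):
                    continue
                if all(grid[r + dr * i][c + dc * i] == ch for i, ch in enumerate(word)):
                    return r, c, dr, dc
    return None


# Toy 5x5 grid, for illustration only.
grid = [
    "PXXXX",
    "XEXXX",
    "XXRXX",
    "XXXUX",
    "ABUCX",
]
print(find_word(grid, "PERU"))   # diagonal  → (0, 0, 1, 1)
print(find_word(grid, "CUBA"))   # backwards → (4, 3, 0, -1)
print(find_word(grid, "CHILE"))  # absent    → None
```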

Fig. 2 Screenshots of Word Search

As in the case of Sudoku, it is reasonable to expect subjects to switch when they have been unable to find new words for a while. The situation is similar to polishing a paper: after a while, rereading the same lines becomes counterproductive, and one turns to another task simply because a ‘fresh eye’ is needed to recognize the meaning behind the letters.

3.3 Procedures, payments, timeline

One pilot and ten regular sessions were run in the computer lab of CREED (Center for Research in Experimental Economics and Political Decision-Making) at the University of Amsterdam. Participants were university students from various fields of study. The application procedure ensured that both genders were represented approximately equally in every session, while leaving subjects unaware that the experiment examined gender-related issues. The experiment was conducted in English, so both international and Dutch students could participate. All instructions and tasks were computerized,¹³ and subjects were not allowed to use paper or take notes during the experiment.

The experiment started with an introduction that explained the rules of the two tasks and gave participants the opportunity to practice. Subjects learned that there would be two rounds and that they would have to play a Sudoku and a Word Search in both rounds. In each round, subjects earned 6 points for each correctly filled Sudoku cell and lost 6 points for each cell filled with a wrong number, to discourage random guessing. Subjects were not penalized for cells filled with multiple numbers.¹⁴ They received 9 points for each word found in Word Search. Because only entire words could be marked in Word Search, there was no need to penalize random clicking. Subjects’ total points for each round were the sum of their points in Sudoku and their points in Word Search, with negative totals set to 0. At the end, one of the two rounds was randomly selected for payment, at a conversion rate of 1 euro per 11 points. In addition, subjects received a fixed show-up fee of 7 euros. The point values and the conversion rate were chosen based on the results of a pilot, so that subjects could earn approximately equal amounts on the two tasks and the average payment would be around 23 euros. The sessions lasted approximately 1 hour and 45 minutes.
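The payment rule can be written out as a small calculation. This is a sketch under stated assumptions: the constant and function names are hypothetical, cells holding several numbers are assumed to be neither rewarded nor penalized, no rounding of euro amounts is applied because none is described, and the performance figures are invented for illustration.

```python
import random

POINTS_CORRECT_CELL = 6   # per correctly filled Sudoku cell
POINTS_WRONG_CELL = -6    # per cell filled with a wrong number
POINTS_WORD = 9           # per word found in Word Search
POINTS_PER_EURO = 11
SHOW_UP_FEE_EUR = 7


def round_points(correct_cells, wrong_cells, words_found):
    """Total points for one round, with negative totals set to 0.

    Assumption: cells holding several candidate numbers are neither
    rewarded nor penalized, so they are left out of both cell counts.
    """
    total = (POINTS_CORRECT_CELL * correct_cells
             + POINTS_WRONG_CELL * wrong_cells
             + POINTS_WORD * words_found)
    return max(total, 0)


def earnings_eur(paid_points):
    """Show-up fee plus the paid round's points at 1 euro per 11 points
    (no rounding rule is described, so none is applied)."""
    return SHOW_UP_FEE_EUR + paid_points / POINTS_PER_EURO


def draw_paid_points(points_by_round, rng):
    """Randomly select one round's points for payment."""
    return rng.choice(points_by_round)


# Hypothetical performance, for illustration only.
pts = round_points(correct_cells=20, wrong_cells=2, words_found=8)
print(pts)                          # → 180
print(round(earnings_eur(pts), 2))  # → 23.36
print(round_points(0, 5, 0))        # → 0 (negative total set to 0)
paid = draw_paid_points([pts, 150], random.Random(42))
print(round(earnings_eur(paid), 2))  # depends on which round is drawn
```

Passing the random generator in explicitly keeps the round draw reproducible when the calculation is checked.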

The order of the tasks within each round was randomized, and subjects were randomly assigned to the three treatments in Round 2, so that each group comprised approximately one third of the subjects in every session. The rules of each treatment were explained immediately before it started. Subjects were not aware that not everyone was playing the same treatment.

After both rounds were over, but before being informed about their payment, we elicited some background information such as gender, age, field of study, and nationality from the subjects through a questionnaire. Those who participated in Treatment Choice were also asked their reasons for (not) switching.

3.4 Data

Our sample consists of 218 subjects from the ten regular sessions.¹⁵ They are 22 years old on average, and the majority (73 percent) are Dutch. Approximately half of the sample consists of economics students (53 percent). The sample contains 11 censored observations from subjects who solved the entire Sudoku puzzle in the second round but not in the first.¹⁶ As explained in Sect. 3.1, subjects were randomly assigned to three groups. Table 2 shows the number of observations per group and gender.¹⁷ There are between 30 and 43 subjects per cell.

Table 2 Number of observations per cell

4 Results

4.1 Multitasking and performance

Performance is measured as the sum of Sudoku plus-points and Word Search points.¹⁸ Table 3 shows means per group and gender for both rounds, as well as the performance differences between rounds. Note that the difference-in-differences(-in-differences) strategy accounts for any performance differences between cells in Round 1. Results are qualitatively the same when using relative instead of absolute changes.

Table 3 Average total points per cell

Comparing Group 1 with Group 2 shows that the productivity effect of multitasking is significantly negative: the difference-in-differences is −23 points (t-test: p=0.04). Subjects who could pick their own schedule (Group 3) perform only slightly better than those forced to multitask, scoring 21 points less than Group 1 (p=0.07).

The difference-in-differences in performance between men and women in Group 2 suggests that men handle forced multitasking relatively better than women, but the difference is not significant (p=0.62). The results of Group 3, on the other hand, suggest that women are better at organizing their own schedule, but this difference is not significant either (p=0.35). Nor are there gender differences in learning: the performance improvement of Group 1 subjects does not differ significantly between men and women (p=0.84). In sum, a simple comparison of differences does not reveal any significant gender differences.¹⁹

Using regression techniques, we can check whether these results hold when we take censoring and the (non-significant) gender differences in learning into account. Table 4 shows the results of fixed-effects regressions and first-difference censored regressions, which take full advantage of the panel structure of our data.²⁰ The results of the censored regressions are very close to the fixed-effects estimates, and all previous conclusions are confirmed. The coefficients of Treatment Multi and Treatment Choice (relative to Treatment Single) are negative and significant at the 5 percent and the 10 percent level, respectively.²¹ The gender-specific estimates confirm that there is no significant gender difference in learning (the gender dummy is insignificant). The point estimates suggest that men adapted better to Treatment Multi and women adapted better to Treatment Choice, but none of these gender differences is significant.²²

Table 4 Impact of treatments on total points

4.2 Propensity to multitask

To examine gender differences in the propensity to multitask, we use the results of Group 3.²³ Table 5 describes the switching behavior of men and women in Treatment Choice. As we can see, 71 percent of subjects actually switch when they are allowed to, and the share of switchers is exactly the same for men and for women. So, contrary to the claims of Fisher (1999), men do not focus on a single task any more than women do. Moreover, at the 10 percent level we can reject the hypothesis that women switch at least as often as men (one-sided t-test; p-value=0.06).

Table 5 Number of switches in Treatment Choice

Table 6 displays the results of two OLS regressions with the number of switches as the dependent variable. In Column 1, we only control for performance in Round 1, while in Column 2 we also include session and task-order fixed effects. Contrary to our expectations, performance in Round 1 has no significant influence on switching behavior; this also implies that the effect of gender on switching is not driven by performance differences. When task-order and session fixed effects are included, the gender difference becomes significant at the 10 percent level. In sum, the results show that if there is any gender difference, it is men who switch more than women, not the other way around.
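For readers who want to see what a specification like Column 1 amounts to, the sketch below fits an OLS regression of the number of switches on a male dummy and Round 1 points by solving the normal equations in plain Python. It is a hypothetical illustration: the function name, the coding male = 1, and the toy data (constructed to lie exactly on a known line) are assumptions, not the experiment's data, and it returns point estimates only, without the standard errors needed for significance tests. Session and task-order fixed effects, as in Column 2, would enter as additional dummy columns in `X`.

```python
def ols(y, X):
    """OLS coefficients b solving the normal equations (X'X) b = X'y.

    `X` is a list of rows that already contain a leading 1 for the
    intercept. Uses Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical toy data built as switches = 1 + 1.5 * male + 0.01 * round1_points.
male = [0, 1, 0, 1, 0, 1]
round1_points = [100, 100, 200, 200, 150, 50]
switches = [1 + 1.5 * m + 0.01 * p for m, p in zip(male, round1_points)]
X = [[1, m, p] for m, p in zip(male, round1_points)]
print([round(b, 6) for b in ols(switches, X)])  # → [1.0, 1.5, 0.01]
```

In this parameterization the coefficient on the male dummy is the gender difference in switches holding Round 1 performance fixed, which is the quantity the regressions above test.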

Table 6 Regression results on propensity to switch

It is interesting to look at how the performance drop in Group 3 relates to the number of switches. In Table 7, we regress the performance difference between Round 1 and Round 2 for subjects in Group 3 on the number of switches. The performance difference is negatively, but not significantly, correlated with the number of switches. When we restrict the sample to those who switch at least once (Column 2), the coefficient becomes significant at the 10 percent level. This suggests that the performance of subjects who switch more often suffers more. We have to be careful in interpreting this coefficient, however. Although using differences takes baseline performance levels into account, the number of switches might still be endogenous with respect to learning or tiredness effects.

Table 7 Performance in Treatment Choice and the propensity to switch

So why do subjects switch although this seemingly harms their performance? Subjects had already experienced an example of each task in Round 1, which should minimize switching out of mere curiosity. Indeed, among those who switch, the average subject switches for the first time after 225 seconds. The second switch, for those who switch at least twice, occurs on average after another 237 seconds. Moreover, in the post-experimental questionnaire, many subjects explicitly stated ‘looking at the problem with a fresh eye’ as a reason for switching, while none mentioned curiosity. It therefore seems more likely that subjects switched because they (wrongly) thought it would increase their performance.

5 Discussion and conclusions

Our results demonstrate that work schedules can be an important determinant of productivity. We find that multitasking significantly lowers performance compared with sequential execution. This suggests that the costs of switching, which include recalling the rules, details, and steps executed thus far, outweigh the benefit of a ‘fresh eye’.²⁴ Subjects who could choose the amount and timing of their switches freely did only marginally better than those forced to switch at unanticipated points in time, and they performed significantly worse (at the 10 percent level) than those working under the exogenously imposed sequential schedule. Finally, we find no evidence that women are better at (or more attracted to) multitasking.

The finding that subjects are unable to organize their own work optimally is not unprecedented. For example, Ariely and Wertenbroch (2002) find that students who can set their own deadlines perform worse than those forced to adhere to equally spaced deadlines. Possibly, subjects pick a suboptimal schedule because the two tasks impose a high cognitive load, which leads to more impulsive choices, as suggested by the results of Shiv and Fedorikhin (1999).²⁵ Another possibility is that even if subjects choose the best possible schedule, planning itself requires so much effort that their performance on the tasks suffers.

The fact that, in our experiment, the number of switches is negatively correlated with the change in performance supports the interpretation that subjects choose a suboptimal schedule. On the other hand, the hypothesis that the effort of planning when to switch drives the performance drop is supported by the fact that subjects in Treatment Choice switch only 2.16 times on average, yet their performance falls almost as much as that of subjects in Treatment Multi, who were forced to switch four times and could not anticipate the timing of the switches. It is difficult to distinguish between these explanations, as the number of switches is potentially endogenous to performance. Whichever explanation is correct, the results do not favor self-chosen work schedules.

The results support the intuition that scheduling is an important input in the production function that deserves more attention in the economic literature. However, some caveats need to be taken into account when extrapolating our findings. The results were obtained in a stylized lab setting and may be specific to the chosen tasks. Note also that we compared multitasking to sequential execution while keeping time allocation between tasks constant, so our analysis does not extend to situations where time allocation varies. Future experiments could uncover whether individuals are able to optimally allocate their time between multiple tasks and whether there are gender differences in this regard. In addition, our strict time constraints and performance-dependent pay scheme may have put pressure on subjects, which may affect performance.²⁶ This does not affect the internal validity of our results, as these factors are constant across treatments, but it may mean that our results apply particularly to high-pressure work situations. Furthermore, our design eliminates some potential benefits of multitasking. Multitasking may make work more stimulating, which in turn could increase productivity. Workers might even repay a less boring work design with higher productivity as a form of gift exchange (Akerlof 1982). The 12 minutes subjects spend on a task in our experiment might be too short for boredom to set in, and there is no room for gift exchange. Further research is therefore needed to determine to what extent our results carry over to specific work environments.

If they do carry over, there are important implications for job design. Although our experiment does not provide a direct test of this, the results suggest that assigning multiple tasks to a worker may be problematic for reasons different from those suggested by the previous literature (e.g. by Holmstrom and Milgrom 1991). Namely, if workers are given several tasks at once, they may hamper their own productivity by juggling between the tasks. One way to avoid this problem is to assign the next task only after the previous one has been finished. Another way is to prescribe a sequential execution rather than letting workers choose their own schedule.

The finding that subjects perform worse under a self-chosen work schedule also adds a new aspect to the debate about the centralization of decision making. The standard argument in favor of decentralization is that workers have more information than managers and that more decision-making rights increase motivation.²⁷ Typically, loss of control is mentioned as the sole disadvantage. Our results suggest further issues: decision making may divert resources from a worker’s actual tasks, and workers may simply be unable to schedule their own work optimally. One limitation of our study in this regard, however, is that there is little room for learning in our experiment. Over time, workers may get better at choosing their own schedules or learn to avoid multitasking. Future research could uncover whether subjects are able to optimize their schedule with more experience or a longer time horizon.

As far as gender differences are concerned, we find no evidence of them in the effects of multitasking. Moreover, the share of switchers is exactly the same for men and women, and the average number of switches is higher for men. These results contradict the claims of Fisher (1999): if men think so much more linearly than women, why do they not insist more on a sequential schedule? And why do women not adapt better than men to multitasking when forced to alternate? In sum, the view that women are better at multitasking is not supported by our findings.