Performing many actions at the same time is commonly regarded as both error-prone and stressful – and for the most part, the results of experimental psychological research corroborate this intuitive notion. However, recent evidence points to some important exceptions to the apparent rule that doing two things at once always incurs costs: Huestegge and Koch (2014) were able to show that executing both a saccade and a button press in response to a peripheral visual stimulus is associated with lower error rates (i.e., better overall performance) than executing only the button press while suppressing the saccade. Specifically, difficulties with saccade inhibition resulted in a large number of false-positive eye movement responses. The authors concluded that executing two actions simultaneously can actually be beneficial when one of them (here: the eye movement) is relatively easy to initiate (or even close to automatic), reasoning that in such scenarios, inhibiting the response is more demanding than overtly executing it. In this study, we want to explicitly test this general cognitive explanation by investigating whether it is applicable to other, more commonly studied action modalities. Alternatively, the effect might be highly domain-specific, that is, restricted to highly automatic oculomotor movements (which may have a rather special status; Pashler, Carrier, & Hoffman, 1993).

Dual-action benefits (or equivalently, single-action costs) as discussed above can only occur when in some experimental trials one of at least two potential responses has to be suppressed. This is not the case in typical dual-task studies utilizing the psychological refractory period (PRP) paradigm (for a review, see Pashler, 1994), with two overt responses to two distinct, systematically delayed stimuli. As a result, inhibitory control demands do not feature prominently (if at all) in largely PRP-based models of multiple action control (e.g., Logan & Gordon, 2001; Meyer & Kieras, 1997). Conversely, while control processes related to response suppression are the focal point of attention in the selective inhibition literature (e.g., Aron & Verbruggen, 2008), the corresponding experiments rarely address either dual-action benefits or costs (for notable exceptions, see Logan & Burkell, 1986; Yamaguchi, Logan, & Bissett, 2012). Thus, in contrast to the study by Huestegge and Koch (2014), neither PRP experiments nor stop-signal or go/no-go based selective inhibition studies are usually designed to specifically reveal the cognitive mechanisms behind inhibition-based dual-action benefits.

For this reason, we followed Huestegge and Koch (2014) by implementing an experimental setup where a single stimulus requires either two equivalent responses at the same time or only one singular response. In line with Fagot and Pashler (1992), we assume that using a single stimulus attribute to specify two responses results in a single, compound response selection process. Crucially, this means that dual-action costs due to the processing of two stimuli and the selection of two independent responses cannot cancel out any potential dual-action benefits. Note, however, that dual-action costs could still arise at a post-selection level; for example, some authors have proposed a bottleneck during motor processing (Bratzke, Rolke, & Ulrich, 2009; Ulrich et al., 2006). Similar to a response selection bottleneck, such a motor bottleneck would result in prolonged reaction times for one of the response modalities in the dual-action condition. This effect should be particularly pronounced if manual responses are generally executed first (Bratzke et al., 2008), resulting in significant dual-action reaction time costs for the vocal modality.

In the present study, vocal response demands (instead of saccades as in Huestegge & Koch, 2014) were combined with manual button presses. Participants had to react to a single visual stimulus by either responding in only one of two response modalities (single-action condition) or in both at the same time (dual-action condition). Note that we did not utilize any stimulus that explicitly signaled participants to inhibit (“no-go”) or stop a response, a procedure that may explicitly encourage participants to focus on action inhibition (e.g., Miller, 2006; Yamaguchi et al., 2012). In order to induce differential levels of inhibition difficulty, we exploited the fact that visual stimuli are more readily paired with manual responses than with vocal responses (in contexts with multiple action demands). This is probably due to the fact that both visual stimuli and manual responses have a visuospatial component that vocal responses lack. Furthermore, it has been noted that we usually use vision to guide manual behavior, and that manual action usually results in changes of the visual percept (input-output modality compatibility [IOMC] effect; cf. Hazeltine, Ruthruff, & Remington, 2006; Huestegge, Pieczykolan, & Koch, 2014; Stelzel & Schubert, 2011; Stephan, Koch, Hendler, & Huestegge, 2013). This entails that the initiation of manual responses (but not vocal responses) is relatively easy when the trigger is visual in nature; conversely, using visual stimuli should make the inhibition of manual responses relatively hard. If this is indeed the case, we should observe a pattern of relatively many false-positive responses in contrast to relatively few false-negative responses in the manual modality, but not in the vocal modality.

The imperative stimulus in all trials was a centrally presented directional word; participants then had to either read the stimulus aloud (vocal modality, relatively lower IOMC/inhibition difficulty) or press the corresponding arrow key on a keyboard (manual modality, relatively higher IOMC/inhibition difficulty). We predicted that executing both a vocal response and a button press at the same time should be less difficult (resulting in lower overall error rates) than only executing the vocal response while inhibiting the easily initiated key press. Finding single-action costs in this sense would further corroborate the hypothesis that in some dual-action scenarios, inhibitory control can be more demanding than execution-related control.

Methods

Participants

Forty-four university students with normal or corrected-to-normal vision participated in the experiment. The datasets of two participants were later removed due to excessive overall error rates (> 30%). Thus, the final sample included 42 subjects (five males, mean age = 20.95 years, SD = 2.59, range = 18–29). All participants were native speakers of German.

Apparatus, stimuli, and procedure

Participants were tested using a desktop computer running Windows 7 and PsychoPy 1.83.04. Stimuli were presented on a 19-in. TFT screen (1,280 × 1,024 pixels resolution). Responses were recorded using a USB keyboard (manual modality, left and right arrow keys) and a headset connected to the 3.5-mm microphone jack (vocal modality). Subjects were instructed to use the index and ring fingers of the right hand for manual responses. Voice onset times were determined using PsychoPy’s voice-key routine. The stimuli consisted of directional words that were presented centrally in white ink on a black background.

Participants read instructions presented on the computer screen and performed a 30-trial training session. The experiment comprised 180 trials (60 per condition, 30 per condition and direction). Conditions were presented in a fully randomized fashion. Each trial started with a white central fixation cross. After 250 ms, the color of the cross changed to red, green, or blue, indicating the response condition (single manual, single vocal, dual). The color-to-condition mapping was counter-balanced across participants. The colored central fixation cross was displayed on the screen for 250 ms, followed by the imperative stimulus (the word “links”/left or “rechts”/right). In the single manual condition, subjects had to press the arrow key corresponding to the direction indicated by the imperative stimulus, but were not supposed to read the word aloud; in the single vocal condition, they had to read the word aloud, but were not supposed to press a key; in the dual condition, they had to perform both responses. Participants were instructed to respond as quickly and as accurately as possible once the imperative stimulus was presented. There were no particular instructions regarding response sequencing in the dual condition. The imperative stimulus was presented for 1,500 ms; responses that were made after this period were flagged as too late with corresponding feedback to the participants. There was a 1,000-ms inter-trial interval in which a black screen was shown.
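The trial structure described above can be sketched in a few lines of plain Python. This is only an illustrative reconstruction, not the authors' PsychoPy script: the names `build_trial_list`, `CONDITIONS`, and `TIMELINE_MS` are hypothetical, and the total trial duration assumes that the imperative stimulus always stays on screen for the full 1,500 ms, which the text does not state explicitly.

```python
import random
from collections import Counter

# Design cells as described in the procedure (labels are hypothetical).
CONDITIONS = ("single_manual", "single_vocal", "dual")
WORDS = ("links", "rechts")  # German for "left" / "right"
TRIALS_PER_CELL = 30         # 30 trials per condition and direction

# Phases of one trial in ms, taken from the procedure description.
TIMELINE_MS = (
    ("white_fixation_cross", 250),
    ("colored_cue_cross", 250),
    ("imperative_stimulus", 1500),
    ("blank_inter_trial_interval", 1000),
)


def build_trial_list(seed=None):
    """Return a fully randomized list of 180 trials (3 x 2 x 30)."""
    rng = random.Random(seed)
    trials = [
        {"condition": condition, "word": word}
        for condition in CONDITIONS
        for word in WORDS
        for _ in range(TRIALS_PER_CELL)
    ]
    rng.shuffle(trials)  # conditions are intermixed, not blocked
    return trials


trials = build_trial_list(seed=1)
cell_counts = Counter((t["condition"], t["word"]) for t in trials)
# Upper bound per trial, assuming the stimulus is shown for the full 1,500 ms.
max_trial_duration_ms = sum(duration for _, duration in TIMELINE_MS)
```

Shuffling the complete list, rather than shuffling within blocks, reflects the statement that conditions were presented in a fully randomized fashion.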

Design

Error rates were analyzed as a function of the within-subject independent variable response condition (single manual, single vocal, dual). Note that on all three levels of response condition and thus in each trial, errors in both response modalities were possible (e.g., in the single vocal condition, a button press constituted a false-positive response in the manual modality). Thus, we analyzed these two dependent variables (manual errors and vocal errors) with separate analyses of variance (ANOVAs). In a second step, we also conducted separate modality-wise analyses of error type as a (post hoc) factor (false positive, false negative, directional). Here, we defined a false positive as a response in an uncued response modality (e.g., a button press on a single vocal trial); a false negative as the omission of a response in a cued response modality (e.g., lack of a button press on a single manual trial); and a directional error as an incorrect response in a cued response modality (e.g., left instead of right button press on a dual trial). Reaction times (RTs) for each response modality were only analyzed for correct trials, reducing the factor response condition to two levels (single, dual).
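The error definitions above amount to a small decision rule that is applied once per response modality on every trial. The following sketch illustrates that rule under stated assumptions: the function and label names are hypothetical, a missing response is represented as `None`, and responses after the 1,500-ms window are not modeled separately because the text does not specify how they entered the error classification.

```python
# Which modalities are cued in each response condition (labels are hypothetical).
CUED = {
    "single_manual": {"manual": True, "vocal": False},
    "single_vocal": {"manual": False, "vocal": True},
    "dual": {"manual": True, "vocal": True},
}


def classify_modality(cued, response, target):
    """Classify the outcome in one modality on one trial.

    cued:     True if the modality was required on this trial
    response: "left", "right", or None if no response was registered
    target:   direction named by the imperative stimulus
    """
    if not cued:
        # Any response in an uncued modality is a false positive.
        return "false_positive" if response is not None else "correct"
    if response is None:
        return "false_negative"  # omission in a cued modality
    if response != target:
        return "directional"     # wrong direction in a cued modality
    return "correct"


def classify_trial(condition, target, manual, vocal):
    """Return the outcome for both modalities of a single trial."""
    cued = CUED[condition]
    return {
        "manual": classify_modality(cued["manual"], manual, target),
        "vocal": classify_modality(cued["vocal"], vocal, target),
    }
```

For instance, `classify_trial("single_vocal", "left", manual="left", vocal="left")` yields a manual false positive together with a correct vocal response, which is why every trial can contribute to the error rate of both modalities.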

Results

Error data

Error rates as a function of response condition and response modality are plotted in Fig. 1. Visual inspection immediately reveals that error rates are lowest overall in dual-response conditions (equivalent to dual-response benefits). Indeed, separate one-way ANOVAs with the factor response condition yielded significant main effects for both the manual modality, F(2,82) = 9.72, p < .001, \( {\eta}_p^2 \) = .19, and the vocal modality, F(2,82) = 6.53, p = .004, \( {\eta}_p^2 \) = .14. In both the manual and the vocal modality, Bonferroni-corrected pairwise post hoc t-tests indicated a significant difference between the dual and the single vocal condition (ps < .001) and between the dual and the single manual condition (p = .039 and p = .041, respectively), while the difference between the single manual and the single vocal condition was not significant (p = .231 and p = 1, respectively).
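As a reading aid, the reported effect sizes can be recovered from the F statistics and their degrees of freedom via the standard identity for partial eta squared, F·df_effect / (F·df_effect + df_error). The sketch below applies it to the two values reported above; the function name is hypothetical, and the calculation is a consistency check rather than a reproduction of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the two error-rate ANOVAs.
manual_eta = partial_eta_squared(9.72, 2, 82)
vocal_eta = partial_eta_squared(6.53, 2, 82)

print(f"manual: {manual_eta:.2f}, vocal: {vocal_eta:.2f}")
# prints "manual: 0.19, vocal: 0.14"
```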

Fig. 1

Error rates (%) as a function of response modality and response condition. Error bars represent SE

Errors in single-response conditions were of special interest because the relative number of false positives versus false negatives can be interpreted as an index of inhibition difficulty. Thus, we specifically analyzed error rates as a function of error type and response modality in single vocal and single manual trials (Fig. 2). Separate one-way ANOVAs with the factor error type revealed significant main effects in both the manual modality, F(2,82) = 22.7, p < .001, \( {\eta}_p^2 \) = .36, and the vocal modality, F(2,82) = 28.36, p < .001, \( {\eta}_p^2 \) = .41. Bonferroni-corrected pairwise post hoc t-tests in the manual modality indicated a significant difference between directional errors and false positives as well as between false negatives and false positives, ps < .001, but not between directional errors and false negatives, p = 1. In the vocal modality, post hoc t-tests revealed significant differences between directional errors and false negatives as well as between directional errors and false positives, ps < .001, but not between false negatives and false positives, p = .46.
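A reported value of p = 1 is what a Bonferroni adjustment produces when the corrected p-value would otherwise exceed 1. The sketch below assumes the common multiply-and-cap form of the correction (each p-value multiplied by the number of comparisons, capped at 1); the text does not state which implementation was used, and the uncorrected p-values in the example are purely hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons and cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected p-values for three pairwise comparisons.
uncorrected = [0.0001, 0.0002, 0.6]
adjusted = bonferroni(uncorrected)
# The third comparison is capped: 0.6 * 3 would exceed 1, so it becomes 1.0.
```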

Fig. 2

Error rates (%) in single-response conditions as a function of response modality and error type. Error bars represent SE

Reaction time data

RTs as a function of response condition and modality are plotted in Fig. 3. Separate paired t-tests between the dual and the single action condition revealed no effect for the manual modality, t(41) = .678, p = .501, d = .105, but a significant effect for the vocal modality, t(41) = 3.15, p = .003, d = .486.
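The reported effect sizes are numerically consistent with computing Cohen's d for paired samples as d = t / √n, with n = 42 participants. The text does not state which formula was used, so the sketch below is an inference from the reported numbers, and the function name is hypothetical.

```python
import math


def cohens_d_paired(t_value, n):
    """Cohen's d for a paired t-test, computed as t divided by sqrt(n)."""
    return t_value / math.sqrt(n)


# t values as reported; n = 42 participants (df = 41).
manual_d = cohens_d_paired(0.678, 42)
vocal_d = cohens_d_paired(3.15, 42)

print(f"manual: {manual_d:.3f}, vocal: {vocal_d:.3f}")
# prints "manual: 0.105, vocal: 0.486"
```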

Fig. 3

Reaction times (RTs; ms) as a function of response modality and response condition. Error bars represent SE

Discussion

In the present multiple action control study, we were interested in the relative costs associated with inhibitory versus execution-related processes. In particular, we investigated whether inhibition-based dual-action benefits (without utilizing explicit no-go stimuli) can be observed in response modalities other than eye movements, specifically when combining vocal and manual responses. We implemented a choice RT paradigm in which dual- and single-action trials were randomly intermixed.

The overall distribution of error types suggested that manual responses were harder to inhibit than vocal responses, resulting in significantly more false-positive than false-negative errors for the former while there was a similar number of false-positive and false-negative errors for the latter. Taken together, these observations were in accordance with our assumption that manual responses (which were also initiated faster than the vocal responses) were relatively easier to initiate than vocal responses, probably due to greater IOMC (e.g., Hazeltine et al., 2006). However, the difference was quantitative rather than qualitative – vocal responses were still sufficiently easy to initiate to require suppression in the single manual condition.

Most importantly, the results confirmed our main hypothesis: manual error rates were significantly lower when participants had to respond with both a button press and naming than when they had to respond with naming only. Note that this finding was not compromised by a speed-accuracy trade-off (i.e., manual RTs were not affected). An analogous pattern was present in the error rates for the vocal modality. The relatively high incidence of manual false-positive errors is an indication of fundamental difficulties with inhibiting unwarranted manual actions, and suggests that single-action costs (i.e., dual-action benefits) in this sense are not modality-specific effects (i.e., exclusive to saccades, cf. Huestegge & Hazeltine, 2011; Huestegge & Koch, 2014), but rather generally related to costs associated with inhibiting relatively easily initiated responses. Note that this instance of accuracy-related dual-action benefits is in contrast to the usually reported finding of (typically RT-related) dual-action costs in most multitasking studies (in which two overt actions are typically triggered by separate stimuli), and shows that under specific conditions (e.g., when there is strong code overlap based on S-R and R-R compatibility), inhibitory control can be more demanding than execution-related control.

Assuming that both single- and dual-response conditions in the present paradigm (where two responses were triggered by the same stimulus attribute) only involved a single, compound response selection process (Fagot & Pashler, 1992), response selection bottleneck-based accounts of this result (which require the presence of two selection processes in the first place) can be ruled out. Alternatively, however, single-action costs in the form of false-positive responses could also be explained as resulting from spreading activation between cognitive representations or codes (Hazeltine et al., 2006; Huestegge, 2011; Kornblum, Hasbroucq, & Osman, 1990). Based on a corresponding framework, Huestegge and Koch (2014) proposed a mechanism for explaining unwanted saccade execution, which can readily be transferred to failures of manual and vocal inhibition. For example, in a vocal-only trial of the present experiment, the imperative stimulus (e.g., “left”) triggers the activation of both a spatial and a verbal code; a correct response requires that the verbal code is bound to the vocal modality code, but that the spatial code is not bound to the manual modality code. Given that the manual response is assumed to be based on comparatively strong S-R bindings, it is reasonable to assume a high baseline activation of the associations between spatial codes and manual modality codes. As a result, the activation of a spatial code in vocal-only trials might sometimes spill over to the (strongly associated) manual modality code, eventually triggering an unwarranted (false-positive) manual response. Assuming that there is an analogous (but weaker) baseline association between verbal codes and vocal modality codes, false-positive vocal responses can be explained along the same lines. Note that such spillover is much more likely to occur in a dual-action setting with randomly intermixed trial types since in such a scenario, all possible modality codes must be kept active to some extent as potential binding targets. Based on this reasoning, we predict that when trial types are blocked, false positives should be less probable since irrelevant modality codes could be more easily suppressed, a claim that awaits dedicated empirical testing in the future.

In addition to dual-action benefits in terms of fewer incorrect dual-action responses than uncued (and thus incorrect) single-action responses (i.e., unwarranted button presses in the single vocal condition or verbalizations in the single manual condition), we also found dual-action benefits in terms of fewer incorrect dual-action responses than incorrect cued single-action responses (i.e., missed or directionally wrong button presses in the single manual condition or verbalizations in the single vocal condition). Tentatively, we suggest that at least part of this unexpected effect may be due to difficulties with selective inhibition (e.g., Aron & Verbruggen, 2008; Bissett & Logan, 2014). For example, it is possible that most of the time, participants attempted to selectively inhibit the uncued response in single-action trials. In some cases, however, they may instead have resorted to globally inhibiting all responses followed by selectively restarting only the cued response (cf. Bissett & Logan, 2014). Given the limited response time window, such a global initial inhibition followed by a “reboot” could sometimes have resulted in exceeding the temporal response threshold and therefore in a missed response (and thus an error). Further research is needed to specify the exact mechanisms that are involved here.

Some previous studies already addressed performance costs associated with response inhibition, albeit from a different theoretical perspective and/or with a focus on RT effects rather than error rates. For example, research on stopping actions has concentrated on cognitive mechanisms in situations involving a stimulus prompting an action which, after a certain time interval, is followed by a stop signal (e.g., De Jong, Coles, & Logan, 1995; De Jong, Coles, Logan, & Gratton, 1990; Logan, 1994; Logan & Cowan, 1984; Logan, Van Zandt, Verbruggen, & Wagenmakers, 2014; Verbruggen & Logan, 2008; Yamaguchi et al., 2012). However, this situation differs substantially from our present setup, which never involved any stimulus triggering an eventually unwanted response. Closer relatives to our own design are dual-task studies (with two distinct stimuli for each task) involving a go/no-go task as Task 2. For example, Miller (2006) showed that trials involving a no-go stimulus (vs. a go stimulus) in Task 2 were characterized by a Task 1 RT increase (see also Janczyk & Huestegge, 2017; Miller & Durst, 2014, 2015). It was assumed that this effect is based on an inhibitory response triggered by the no-go stimulus, which (either directly or via its transformation into a dedicated “inhibitory response” selection process) eventually prolongs Task 1 processing (see Röttger & Haider, 2016). However, there are important differences between these studies and the experimental procedure implemented here: The latter does not involve any explicit no-go stimulus in the first place, and both responses in dual-response conditions were triggered by the same stimulus feature. This may explain why we did not find any corresponding RT effects.

In future studies, it would be interesting to investigate the role of the cue-stimulus interval (CSI) in more detail. Here, we used a fixed – and relatively short – CSI of 250 ms, which may have made it difficult to actually make use of the information provided by the cue (i.e., by removing non-required modality codes or task sets from working memory; cf. Gade, Druey, Souza, & Oberauer, 2014). At longer CSIs, however, it may often be possible to complete such operations before the imperative stimulus is presented, resulting in successful inhibition in the single-action conditions and thus a reduction of relative dual-action benefits. A similar effect could probably be achieved by varying the proportion of responses required in the different modalities (e.g., by making manual responses a rare occurrence it should be possible to decrease the occurrence of manual false positives in the single vocal condition, thereby reducing the corresponding dual-action benefit effect).

In conclusion, we were able to show that accuracy-related dual-action benefits due to inhibitory control demands (that were not explicitly triggered by a dedicated no-go stimulus) are not specific to eye movements as a response modality. Thus, our results strongly indicate that when dual-action responses are made particularly easy (e.g., by introducing S-R and R-R compatibility and a common trigger stimulus), dual-action benefits depend on the (relative) ease of response initiation in general, not on a particular response modality. As of yet, cognitive costs incurred by inhibitory demands (i.e., the suppression of an overt response) do not feature prominently in multitasking models, which were usually developed based on experimental settings in which tasks require overt response execution. While our present research design clearly differs from typical dual-task setups (in which two responses are independently triggered), our results nevertheless suggest that “doing nothing” (i.e., response inhibition) can sometimes be hard work (i.e., more demanding than response execution), yielding error-prone performance in trials requiring the execution of only a single action.