Life inside and outside of psychological laboratories differs in many ways, and this is particularly true with respect to action control. Outside the lab people seem to carry out actions to achieve particular goals and to adapt the environment according to their needs. Once they enter a lab, however, they are commonly talked into responding to arbitrary stimuli by carrying out meaningless movements. The latter is assumed to increase the amount of experimental control over the variables involved in performing an action, which of course is true and utterly important for disentangling all the confounds present in everyday actions. And yet, most models of action control seem to take this highly artificial stimulus-response situation so seriously that they use it as a template for voluntary action in general. In fact, almost all introductory textbooks of cognitive psychology not only neglect most aspects of action control but also reduce action to a mere byproduct of perception and reasoning.

We could have seen this coming. In the last half of the nineteenth century there were two dominant forces that shaped psychological theorizing with regard to action control. On the one hand, there were authors who relied mainly on introspection, an approach that not surprisingly was strongly driven by our everyday concept of action as a means to achieve wanted outcomes. Authors like Lotze (1852), Harless (1861), and James (1890) were interested in the question of how the mere thought of a particular action (or its outcomes) can eventually lead to its execution or, more generally speaking, how we can voluntarily move our body in the absence of any conscious insight into motor processes (executive ignorance). Action was thus reconstructed by starting the analysis with a goal or to-be-achieved effect and then asking how motor processes are used to achieve this effect. According to this logic, action is a means to generate perceptions (of outcomes), and to the degree that these perceptions can be anticipated and systematically produced, action is considered voluntary. On the other hand, there were authors who followed Descartes’ strategy of tracing the perceptual sensations produced by external stimuli through the body, with muscle contractions being the final result. Particularly important for the further development of experimental psychology and the cognitive neurosciences was the approach of Donders (1868). He suggested analytically segmenting the processing stream from the sensory organ to the muscle into separate, sequential stages and measuring the duration of each stage by systematically manipulating task factors related to it. According to the logic underlying this approach, action is a consequence of sensory processing but not its precursor, which makes the action truly a response and the stimulus its most important predictor.

Donders’ methodological approach turned out to be far more useful for the emerging discipline of experimental psychology and related areas of the cognitive neurosciences than the nineteenth century introspective armchair reasoning. After the necessary adjustments and refinements (Sternberg, 1969), the technique of using reaction times to segment information-processing streams into stages was widely used and still dominates research in several areas (such as dual-task performance, see Pashler, 1994). Indeed, from the currently available textbooks it is easy to see that Donders’ approach has influenced our thinking most: actions are commonly referred to as responses and considered to be mainly controlled by the stimulus and the way it is processed. However, the recent interest in what is commonly called executive functions (the term that replaced the outdated “will”) has revealed the shortcomings of a purely stimulus-driven approach and led to a revival not of the methods but of the analytical perspective of the introspective theorists. In the following, I would like to focus on the probably most comprehensive approach involved in this revival, the Theory of Event Coding (or TEC: Hommel, Müsseler, Aschersleben & Prinz, 2001a, b). I will briefly review the main assumptions of what was considered a meta-theoretical framework that, among other things, integrates ideomotor theorizing with Prinz’s (1990) common-coding hypothesis (which claims that perception and action rely on shared cognitive representations) and Hommel’s (1997) action-concept model (which holds that human cognition is based on integrated sensorimotor units), and elaborate on the implications of these assumptions for action control.
Then I go on to discuss how a TEC-inspired approach changes the way actions are reconstructed and analyzed, and how this affects our understanding of how stimulus and action events are processed and cognitively represented, and how actions are selected, prepared, planned, and evaluated. Even though more research is certainly needed, accumulating evidence suggests that the TEC-driven approach is not only tenable but, even more importantly, that it is fruitful in generating novel theoretical questions and experimental strategies.

The ideomotor principle

What Lotze (1852), Harless (1861), James (1890) and related theorists intended to explain was how having the idea of an action translates into that action’s execution, which is why their approach has been coined ideo-motor (Greenwald, 1970; Prinz, 1987; Stock & Stock, 2004). The ideomotor approach has suffered from a notoriously bad press, however. Thorndike (1913) compared it to the superstitious beliefs of primitive people in the power of their thoughts to magically change things in the world, and Miller, Galanter, and Pribram (1960) concluded that all this approach has to offer for bridging the gap from knowledge to action is the hyphen between ideo and motor. Contrary to what these historically rather successful attempts to ridicule ideomotor theorizing claimed, the ideomotor approach was rather specific with regard to the basic mechanism underlying action control. Figure 1 shows the neural scenario suggested by James (1890). Consider a motor neuron M the activation of which moves a muscle, which again provides kinesthetic feedback by activating neuron K. This may represent the neural hardware a newborn is equipped with or the neural software it prenatally acquired. When exploring the world, the newborn may at some point get neuron M activated, be it through a reflex or arousal induced by sensations (represented by S), or simply by motor noise (sometimes called “motor babbling”). Whatever its cause, this activation results in a movement that produces the kinesthetic perception that is associated with the activation of K. If this happened only once, not much would follow. If, however, activating M regularly leads to the activation of K, trace conditioning creates an association between M and K, following the Hebbian principle that what fires together wires together (cf. Greenwald, 1970).
Accordingly, K would become a kind of retrieval cue for M, so that re-creating or anticipating the perceptual experience coded by K becomes a means of activating M in a now intentional fashion: the activation of M, and of the movement this invokes, has come under intentional control. Obviously, the same logic applies to any other sensory modality, so that the codes of any perceptual consequence or effect of a given movement can become integrated with the motor neurons producing this movement and thus become its cognitive representation.
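As a rough illustration, the learning scenario just described can be sketched in a few lines of Python. The sketch is not part of James' account or of TEC; the learning rate, the activation threshold, and all function names are assumptions chosen only to make the logic concrete.

```python
# Minimal sketch of the M-K scenario: repeated co-activation of a motor
# unit (M) and a kinesthetic unit (K) builds an association that later
# lets the anticipated effect (K) trigger the movement (M).
# Learning rate, threshold, and trial counts are illustrative assumptions.

def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Strengthen the association whenever both units are active together."""
    return weight + learning_rate * pre * post


def train_association(n_trials, learning_rate=0.1):
    """Simulate motor babbling: M fires and the movement activates K."""
    weight = 0.0  # strength of the M-K association
    for _ in range(n_trials):
        m_active = 1.0       # M fires (reflex, arousal, or motor noise)
        k_active = m_active  # the movement reliably produces feedback on K
        weight = hebbian_update(weight, m_active, k_active, learning_rate)
    return weight


def anticipate_effect(weight, k_activation=1.0, threshold=0.5):
    """Activating K (the 'idea' of the effect) drives M via the learned link."""
    return weight * k_activation >= threshold


weight_once = train_association(n_trials=1)
weight_often = train_association(n_trials=10)
print(anticipate_effect(weight_once))   # one pairing: K does not yet trigger M
print(anticipate_effect(weight_often))  # repeated pairings: K now triggers M
```

The single association weight is used in both directions here, which mirrors the claim that the link acquired while M leads K can later be used with K leading M.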

Fig. 1 James’ (1890) neural model of acquiring ideomotor control (see text for explanation). Taken from James (1890, p. 582)

The notion that the motor patterns underlying voluntary actions are represented by codes of their perceptual effects has received ample empirical support. Elsner and Hommel (2001) have demonstrated that introducing novel auditory action effects, such as tones of a particular pitch that sound contingent on pressing a particular key, renders these effects primes and retrieval cues of the actions they accompany. For instance, if adults have experienced that left and right key presses systematically produce high- and low-pitched sounds, presenting these sounds as stimuli later on facilitated performance if the sound–key mapping heeded the previous key–sound mapping. Moreover, if subjects were presented with a free-choice task after having experienced particular key–sound contingencies, presenting a sound as a mere trigger signal increased the frequency of choosing the action that previously had produced this sound. Comparable findings have been obtained in numerous labs and with various tasks, stimuli, actions, and effects, and with participants of various ages (for an overview, see Hommel & Elsner, 2009), which points to a rather general action-effect integration mechanism. Studies using neuroimaging techniques have shown that facing a previously learned action effect leads to the activation of a number of action-related brain areas. In particular, auditory action effects activate the right hippocampus, which presumably links the sensory effect representations to their corresponding action plans, and the supplementary motor area, which presumably houses these plans (Elsner et al., 2002; Melcher et al., 2008). 
Of particular interest, even though the observed activations were elicited by the presentation of a stimulus (an action effect), the responding cortical areas were those that are typically involved in controlling endogenously planned but not stimulus-triggered actions, which fits with the notion that the codes of action effects are used for controlling voluntary action. Finally, a number of findings have demonstrated that the compatibility between features of the action proper and features of its effects has an impact on reaction time and, thus, on response selection. For instance, keypress responses are initiated faster if they trigger visual events in spatially corresponding rather than non-corresponding locations (Kunde, 2001), suggesting that the spatial codes of both the key to be pressed and the visual event to be produced are considered when selecting a response. Comparable effects of action-effect compatibility have been reported for temporal (Kunde, 2003), semantic (Koch & Kunde, 2002), and other relations between actions and effects.

Coding and representing stimulus and action events

If actions are cognitively represented by codes of their perceptual consequences, one may ask whether representations of perceived events and of produced actions differ at all. TEC makes the strong claim that they do not. Considering typical laboratory tasks, this claim may be surprising. Subjects typically stare at a computer monitor and are briefly flashed with arbitrary symbols, which they then under enormous time pressure translate into arbitrarily assigned key presses. Accordingly, it may make sense to consider the processes between the stimulus-produced light hitting the retina and some hypothetical internal identification process as perception, and most of what follows until the key is pressed as action (or response). But our eyes are neither made for staring at particular locations on computer screens, nor are they particularly good at this task, as indicated by the difficulty and effort needed to keep one’s eye at the indicated spot. Quite on the contrary, outside the lab our eyes tend to jump around about four times a second and they do so in order to create stimuli on the retina but not to respond to them. The same logic holds for other sensory modalities: hearing often benefits from orienting one’s body or head towards stimulus sources and tactile perception would virtually be impossible without systematically moving one’s effectors across the to-be-perceived surface. This means that perception is just as much the consequence of action as it is its cause or, as Dewey (1896) suggested, perception and action may be better conceived as mutually coordinated rather than causing each other. Hence, perception is not something imposed on us by a stimulus but the experience associated with performing an action: perceiving, that is.

Once one accepts that perceiving and acting are the same thing (carrying out movements to create particular effects), it makes a bit more sense to assume that perceived events and produced actions are represented in the same way. TEC argues that the ideomotor approach provides a good basis for this consideration. To take the scenario sketched in Fig. 1, the emerging link between M and K, and any other perceptually derived code (say, V, A, O, T, and P for the visual, auditory, olfactory, tactile, and proprioceptive feedback provided by the movement resulting from activating M), can be considered as the representation of both the perceptions one can experience by carrying out the M-induced movement and the action needed to produce them. Accordingly, the representation can subserve the anticipation of upcoming perceptual experience just as well as the selection of actions according to their expected outcomes; it thus is a truly sensorimotor unit subserving the needs of perceiving and acting.

Two further assumptions of TEC are important for the theoretical reconstruction of action control. One is that cognitive representations are composites of feature codes. The primate brain does not represent events by individual neurons or local neural populations but, rather, by widely distributed feature networks. The visual cortex consists of numerous representational maps that are coding for all sorts of visual features, such as color, orientation, shape, or motion (DeYoe & Van Essen, 1988) and other sensory cortices have been reported to contain feature maps as well. The same goes for the brain areas involved in action planning, which comprise separable networks to code, for instance, the direction (Georgopoulos, 1990), force (Kalaska & Hyde, 1985), and distance (Riehle & Requin, 1989) of manual actions. The assumption that cognitive event representations are composites has two further implications. One implication is that binding operations are necessary to relate the codes referring to the same event. That is, activating or re-activating feature codes may not necessarily be sufficient to perceive or produce an event; instead the activation of these codes may need to be followed by their integration. Another implication of representing events in a feature-based fashion is that different events can be related to, compared with, or confused with each other based on the number of features they share. Hence, if one considers the number of features shared between events as their similarity, events can be more or less similar to each other and, given the sensorimotor nature of the cognitive representations, perceptions and actions can be as well.
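The notion of similarity as the number of shared features can be made concrete with a small sketch. The dictionary format, the feature dimensions, and the example events below are hypothetical and serve only to illustrate the counting logic; TEC itself does not prescribe any particular data structure.

```python
# Events (perceived or produced) as composites of feature codes.
# Similarity is taken to be the number of dimensions on which two
# events carry the same feature value (an illustrative assumption).

def shared_features(event_a, event_b):
    """Return the feature codes that both events have in common."""
    return {
        dimension: value
        for dimension, value in event_a.items()
        if event_b.get(dimension) == value
    }


def similarity(event_a, event_b):
    """Count the features two events share."""
    return len(shared_features(event_a, event_b))


# Hypothetical stimulus and action events described by feature codes
stimulus_left = {"location": "left", "color": "red", "shape": "circle"}
action_left = {"location": "left", "effector": "hand", "force": "light"}
action_right = {"location": "right", "effector": "hand", "force": "light"}

print(similarity(stimulus_left, action_left))   # overlap on location only
print(similarity(stimulus_left, action_right))  # no overlap
print(similarity(action_left, action_right))    # overlap on effector and force
```

Because stimuli and actions are described in the same format, the same function compares a stimulus with an action as readily as two actions with each other, which is the point of the common-coding assumption.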

A second important assumption underlying TEC is that cognitive representations refer to distal but not proximal aspects of the represented events (Prinz, 1992). Unfortunately, the terms distal and proximal are used in various, not always well-defined ways, often to distinguish between effectors, muscles, or actions far from versus close to the body, respectively. TEC relies on the more specific terminology of Heider (1926, 1930) and Brunswik (1944). These authors addressed the veridicality of our perception and how we can experience the attributes of objects in the world in our minds. According to their analysis, four different layers of perceptual processing are important to consider. The first layer (D, using the terms of Heider, 1930) refers to the objects, people, and events in our environment that are relevant for our life. It is this layer that Heider refers to as distal and where the things we perceive can be objectively defined. The second, proximal layer (V) refers to the physical information about these things that reaches our sense organs. One of the problems Heider and Brunswik consider is that the relationship between D and V is not invariant but depends on the medium (e.g., air, water, environmental conditions) through which the information is provided. V is thus not a valid and complete copy of D but only provides cues about the things defined at D. The next layer (V′) consists of the direct physiological correlate of V, such as the neural patterns in the receiving sense organs and closely related processing systems. In the absence of dramatic changes in or damage to the sense organs, the relationship between V and V′ is assumed to be dictated more or less by physics and biology and thus more or less direct and invariant. The fourth layer refers to the central experience of the external thing (D′), which is assumed to correspond to V′ no better than V corresponds to D.
According to Heider and Brunswik, the theoretical challenge consists in explaining why D′ can correspond so well to D despite the incomplete correspondences along the way of the perceptual process (i.e., between D and V and between V′ and D′). Importantly for present purposes, any possible to-be-perceived event necessarily has a distal and a proximal representation in the world (D and V, respectively) and internal representations that refer to these external representations (D′ and V′, respectively). Brunswik (1944) has extended this logic to action, where the distal goal object and the proximal means to achieve it are assumed to be internally represented by central representations and peripheral muscle commands, respectively. As pointed out by Prinz (1992), proximal internal representations of perceived and produced events (early sensory and motor patterns that is, irrespective of modality and content) are not closely related to their central representations and to each other, so that it is difficult to believe that feature-based interactions between perception and action occur on these levels. More plausible is the assumption that such interactions take place between the distal representations of perceived and produced events, that is, between the representations of stimulus and action features as they appear in the external world. This is why TEC focuses on distal but not proximal internal representations.

Given the distal focus of TEC, the theory does not address, and cannot explain, how the transition is made between proximal and distal representations. With regard to action control, this means that the theory does not account for all aspects involved in generating a particular motor pattern. This limitation is by no means accidental but is meant to reflect the way actions are controlled. According to early ideas of Keele (1968), actions are generated by retrieving motor programs, which back then were assumed to consist of sets of muscle instructions that make for a complete feedforward program. The idea that all aspects of an action are centrally determined in advance is not particularly realistic, however. Theoretical reasons for doubt include the enormous storage problem that maintaining all possible combinations of muscle parameters would imply and the difficulty of generalizing from existing programs to novel, never before performed movements (Schmidt, 1975). Empirical reasons relate to observations that some action parameters can be flexibly adjusted on the fly, even in the absence of any conscious knowledge of the acting person about the adjustment (Prablanc & Pélisson, 1990). This suggests that the feedforward components of action control do not completely determine an action but, rather, (a) specify only those parameters that are essential for achieving the intended action effects; (b) leave the specification of nonessential parameters to lower-level sensorimotor online channels with characteristics that Milner and Goodale (1995) ascribed to what they called the dorsal route (see Footnote 1); and (c) constrain the processing characteristics of those lower-level channels by “directing their attention” to the task-relevant stimulus features, a process that I will describe in the next section.

Apart from the available theoretical and empirical reasons for this distribution of labor between offline action control and online parameter specification, there are two implications of TEC that also favor such a dual architecture. First, TEC assumes that action planning is based on distal representations. Given the indirect relationship between internal proximal and distal representational levels (Heider’s V′ and D′), which implies a loss of concrete information in the transition from proximal to distal codes, and the need for very specific information to fill in the remaining gaps of feedforward action plans, it makes sense to assume that the filling is left to the representational level that keeps closest touch with the physical input, that is, the proximal level. Hence, proximal and distal codes may selectively target and serve to inform online and offline control, respectively. Second, feedforward action plans are assumed to be associated with codes of action effects that an agent can imagine and can actively intend in order to select and control the action that is likely to reproduce those effects. What people can and will imagine commonly refers to invariant properties of a given action but not to concrete parameters that will often change with the context, the effector being used, and the posture assumed before starting the movement (Rosenbaum, Loukopoulos, Meulenbroek, Vaughan & Engelbrecht, 1995). Indeed, the success of an action (evaluating which requires a comparison between intended and actual outcomes, see below) is commonly judged based on the action’s general, invariant properties (whether or not the cup of coffee was brought to one’s mouth, say) but not on specific parameter values (e.g., how fast this was done or along which path in space the hand traveled).
This implies that storing specific parameter values would be of little use for selecting, planning, or monitoring an action—the more so as they change frequently and would be difficult to learn, which suggests that detailed values are commonly not considered for long-term storage (so that even top sportsmen need to practice). But these parameters must come from somewhere, which implies that another, sensorimotor level must be involved.

Preparing for perceiving and acting

Dondersian experimental analyses of the processing stages intervening between stimulus presentation and response presuppose that all the interesting cognitive processes take place in between these two events. Consider, for instance, how Donders (1868) assessed the human will. By comparing reaction times between tasks with differing stimulus- and response-processing demands he managed to isolate and measure the duration of what nowadays would be called response selection, which he called the “determination of the will”. Obviously, the idea was that people would await a stimulus and only then start thinking about what to do. Even though more modern versions of Donders’ stage analysis (e.g., Pashler & Johnston, 1989; Sternberg, 1969) would not deny that stimulus-response links are somehow prepared before a stimulus is processed, they do not have the methodological means to consider these preparation processes in their analytical designs. Accordingly, preparation is commonly not an issue in stage-theoretical approaches. And yet, especially with regard to action control there are reasons to assume that some if not all of the more interesting processes take place long before the stimulus appears. Probably the first who considered this possibility was Sigmund Exner (1879), who discussed the example of a speeded manual response to the onset of a visual stimulus. He noticed that long before the stimulus comes up, he had already set himself into some kind of state that ensured that the response would be carried out efficiently and as intended. Evoking that state is a voluntary, attention-demanding act, so he argued, but once the state is created the response is actually involuntary or at least automatic in the sense that no further effort of will is needed to translate the stimulus into the action. If so, traditional reaction time analyses would tap into the more or less automatic chain of processes previously set up by will or, to use a more modern term, executive control. 
In other words, what most psychological experiments are picking up might be considered willfully prepared reflexes (Hommel, 2000; Hommel & Elsner, 2009), in addition to the impact of automatically created stimulus-response instances (Logan, 1988) or bindings (Hommel, 1998a, 2004).

One may counter this rather skeptical view by arguing that task-preparation processes are successfully addressed by the rapidly increasing number of task-switching studies (e.g., Monsell, 2003). In these studies, people switch back and forth between multiple tasks, which commonly creates separable performance costs on trials that require a switch. However, the repetitive nature of these studies raises the possibility that people prepare for and schedule task switches just like any other task-specific process, so as to automatize the act of switching. Indeed, not only is there evidence that substantial amounts of task-switching costs actually reflect proactive interference (Allport, Styles & Hsieh, 1994), stimulus-induced task-set retrieval conflicts (Waszak, Hommel & Allport, 2003), and other preparation-unrelated processes (Wylie & Allport, 2000), but even the remaining process of implementing the appropriate task set may be realized by automatic retrieval induced by the task-switching cue (see Logan, Schneider & Bundesen, 2007). If so, even true switching costs may measure nothing but the time demands of previously prepared cognitive reflexes.

A number of recent observations support the prepared reflex notion. For instance, Kunde, Kiesel, and Hoffmann (2003) found that subliminally presented irrelevant stimuli can trigger actions if they fit the apparently previously established stimulus-response rule—even if these stimuli never served as targets and were not encountered earlier in the experiment. Wenke, Gaschler, and Nattkemper (2005) demonstrated that stimulus-response rules held in mind for a later trial are automatically applied to stimuli in another, intervening task. Along the same lines, Cohen-Kdoshay and Meiran (2007) found that, in a flanker task, response-incompatible flankers interfere with responding even in the very first trial, that is, before any experience-based stimulus-response association could have been established.

TEC provides two theoretical tools to understand how preparation enables action control. First, it assumes that feature codes the activation of which overlaps in time are automatically integrated into event files (Hommel, 1998a, 2004; Hommel et al., 2001a). Integration occurs irrespective of the temporal sequence of the underlying distal events; what matters is only whether these events induce activations that fall into the same integration window. There is evidence that people can tailor the size of integration windows to the situation at hand (e.g., to the temporal density of events: Akyürek, Toffanin & Hommel, 2008) and tend to lose sequential information if two events fall into the same window (Akyürek et al., 2008; Akyürek, Riddell, Toffanin & Hommel, 2007). This integration mechanism is perfectly suited to allow for both stimulus-response learning and response-effect (i.e., response-stimulus) learning, as it does not care whether the stimulus leads or follows the action. It is also perfectly suited to generate ideomotor action. Note that for ideomotor theory to work the original sequence of processing first the action and then the effect (when experiencing an effect for the first time) needs to be reversed when reactivating the action code by activating the code of its effect. In other words, ideomotor action control presupposes that action-effect learning generalizes to effect-action retrieval, which again requires a mechanism that does not care about brief time delays. Indeed, there is ample evidence that actions and stimuli are automatically bound irrespective of whether the stimulus leads or follows the action (Dutzi & Hommel, 2009; Elsner & Hommel, 2001; Hommel, 2005). The underlying binding mechanism allows for both episodic learning when running through the trials of an experiment (comparable to instance learning as envisioned by Logan, 1988) and the preparation of task-related stimulus-response and response-effect bindings before performing a task.
Given that integration relies on code activation but not stimulus presentation or response execution, and given that ideomotor theory assumes that effect and action codes must be accessible and activatable by “thinking of” (i.e., generating the idea of) the coded events, imagining and playing through the task rules and relevant sequences of stimuli, responses, and effects is likely to create the code-activation overlap necessary for integration. If so, mentally playing through a task should result in weak but functional bindings between stimuli and responses and between responses and effects. The weakness of these preliminary bindings may well lead to errors (which are often observed during the first trials of experiments) but these will quickly be avoided by adding experience-based bindings acquired through practice.
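The order-insensitive character of the integration mechanism can be illustrated with a deliberately simple sketch. It approximates an integration window by pairwise temporal proximity of code activations; the 300 ms window, the function name, and the example events are assumptions made for illustration and are not values or procedures taken from the studies cited above.

```python
from itertools import combinations

# Codes whose activations fall close enough in time are bound into a
# common event file, regardless of which code was activated first.
# The window size is an illustrative assumption.

def bind_events(activations, window_ms=300):
    """Bind every pair of codes activated within the same window.

    activations: list of (code, onset_ms) tuples.
    Returns a set of unordered pairs (frozensets) of bound codes.
    """
    bindings = set()
    for (code_a, onset_a), (code_b, onset_b) in combinations(activations, 2):
        if abs(onset_a - onset_b) <= window_ms:
            bindings.add(frozenset((code_a, code_b)))
    return bindings


# Action followed by its effect (response-effect learning)
action_first = [("keypress", 0), ("tone", 200)]
# Stimulus followed by the action (stimulus-response learning)
effect_first = [("tone", 0), ("keypress", 200)]
# Same codes, but too far apart in time to be integrated
too_late = [("keypress", 0), ("tone", 2000)]

print(bind_events(action_first) == bind_events(effect_first))  # order is irrelevant
print(bind_events(too_late))  # nothing is bound outside the window
```

Because the bindings are stored as unordered pairs, the sketch also discards sequential information for events that fall into the same window, in line with the loss of order information mentioned above.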

The second mechanism TEC provides for understanding the impact of preparatory operations on performance is “intentional weighting” (Hommel et al., 2001a). The assumption is that preparing for a task involves the priming of task-relevant feature dimensions, such as color, shape, or perhaps higher order perceptual or semantic features. Priming a feature dimension increases the impact that features coded on this dimension have on object selection and performance. One example of how this mechanism works comes from Memelink and Hommel (2005, 2006). They interleaved a two-dimensional Simon task (which could produce both horizontal and vertical stimulus-response compatibility effects) with another task that required either horizontal or vertical stimulus coding. This other task strongly impacted performance in the Simon task by increasing compatibility effects on the dimension it made salient and decreasing compatibility effects on the other dimension. Further evidence is provided by Fagioli, Hommel, and Schubotz (2007) and Wykowska, Schubö, and Hommel (2008). They showed that preparing a manual grasping or reaching action facilitates the detection and discrimination of targets in an unrelated interleaved task if these targets are defined on action-relevant dimensions (like shape or size and color or contrast, respectively). This latter observation is particularly interesting with regard to the interaction between high-level feedforward action programming and low-level sensorimotor online adaptation. In contrast to Milner and Goodale (1995), who attribute the entire control of action to such online channels, TEC (Hommel, 1996; Hommel et al., 2001b) and related approaches (e.g., Glover, 2004) maintain that high-level processes take care of the feedforward programming of goal-relevant action characteristics, whereas low-level processes are responsible for the online adaptation of the action to current circumstances.
This latter assumption raises the question of how high-level processes can steer low-level processes towards the stimulus information that is needed to fill in the parameters left open by action programming. Intentional weighting along the lines of Fagioli et al. (2007) and Wykowska et al. (2008) provides an answer: low-level channels process any available information in principle but the top-down weighting of task-relevant stimulus dimensions makes sure that stimulus codes from these dimensions dominate the specification of open action parameters (Hommel, 2009). For instance, preparing for the grasp of an object may involve the pre-programming of invariant characteristics of the approach movement, the relation between hand aperture and object size, and so forth, as well as the top-down priming of size-related feature maps so as to facilitate the processing of size information by online channels.

Activating stimulus and action codes

Once the task intentions have been transformed into the appropriate event files, stimuli can automatically induce activation of the event representations they refer to. Event files are considered networks of codes that may relate to any event-related aspect, such as activation conditions and context, actions and action effects, or thoughts that regularly accompany the event. Their activation follows a pattern-completion logic, which means that activating one member of a network will automatically spread activation to all the other members. However, how strongly and efficiently activation spreads depends on whether and how strongly the dimension on which a given network member is defined is primed by task relevance, i.e., intentionally weighted. Hommel, Memelink, Zmigrod, and Colzato (2008) investigated under which circumstances previously created stimulus-action bindings involving color, shape, and (response-) location codes affect performance in the following trial. As has been observed previously (Hommel, 1998a), performance was worse if the present shape-response and color-response combinations partially mismatched the combinations in the previous trials, so that one feature was repeated while the other alternated. This suggested that stimulus and response codes were automatically bound and retrieved upon repetition of any feature. Importantly, however, bindings that matched the current attentional set had a stronger impact on performance, and this was true irrespective of the attentional set under which these bindings had been created. This suggests that task relevance and the corresponding attentional biases do not affect the creation of bindings between stimulus and response information but the retrieval of these bindings. More specifically, the attentional set strongly affects which ingredients of an event file are reactivated during event-file retrieval.
Thus, however rich and comprehensive an event file may be, in a given situation mainly the task-relevant (or otherwise primed) codes it includes will be reactivated (see Footnote 2).
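
A toy sketch may help to make the distinction between binding creation and weighted retrieval concrete. It rests on assumptions that go beyond the findings reported above: the multiplicative rule, the `spread` parameter, and all weights are hypothetical, and the event file is reduced to three codes.

```python
def reactivation(event_file, cue, dimension_weights, spread=0.5):
    """One step of pattern completion: a repeated feature (the cue)
    spreads activation to the other codes of a stored event file.
    Assumption: the spread is scaled by the intentional weights of both
    the cue's dimension and the receiving code's dimension."""
    if cue not in event_file:
        return {}  # nothing is retrieved if the cue was not part of the binding
    cue_weight = dimension_weights.get(cue[0], 0.0)
    return {
        code: spread * cue_weight * dimension_weights.get(code[0], 0.0)
        for code in event_file
        if code != cue
    }


# The binding created on the previous trial contains all codes,
# regardless of the attentional set (creation is not weighted).
previous_trial = {("shape", "X"), ("color", "red"), ("response", "left")}

# Hypothetical attentional set in which color is task relevant.
color_set = {"color": 1.0, "shape": 0.2, "response": 1.0}

via_color = reactivation(previous_trial, ("color", "red"), color_set)
via_shape = reactivation(previous_trial, ("shape", "X"), color_set)
```

Under these assumptions, repeating the color reactivates the previous response more strongly than repeating the shape does, although both features were bound to the response in the same way.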

The TEC approach to the representation of stimulus and action events and the way these representations are controlled explains various phenomena that otherwise are hard to understand, and it has turned out to be successful in leading to the discovery of various novel phenomena. Effects of compatibility between stimuli and responses (such as Simon or Stroop effects) are an example. Compatibility effects are commonly attributed to the similarity between stimuli and responses and the overlap of stimulus and response representations (cf., Kornblum, Hasbroucq & Osman, 1990; Prinz, 1990; Wallace, 1971). Unfortunately, however, most approaches are silent with regard to the crucial questions of (a) in which sense stimulus and response features can overlap at all and (b) how these features are cognitively represented in such a way that overlap can produce compatibility effects. For instance, the dimensional overlap model of Kornblum et al. (1990) merely assumes that a stimulus that feature-overlaps with a response automatically primes this response’s representation without explaining why and how. TEC (like other ideomotor approaches: e.g., Greenwald, 1970) provides an intuitive and mechanistically straightforward answer: if two given representations feature-overlap they are literally related to the same neural codes, that is, they physically overlap in the sense that they share ingredients (see Footnote 3). Whether these representations function to represent the stimulus or the response in a given task does not matter and does not have any implication for the way the event is represented.

TEC also explains why feature overlap can create compatibility effects even if one of the overlapping features is actually task irrelevant. For instance, the Simon effect refers to the observation that spatial responses to non-spatial stimulus features (such as color or shape) are faster if the stimulus location corresponds to the response location (Simon & Rudell, 1967). Numerous authors have wondered why stimulus location is considered at all in a task where the relevant stimulus feature is not spatially defined, and they have postulated dedicated mechanisms that function to create spatial stimulus codes even under such circumstances. For instance, Umiltà and Nicoletti (1992) and Stoffer (1991) have claimed that it is actually not the stimulus that is spatially coded but the movement of the attentional focus towards the stimulus location. Given that Simon effects can be related to all sorts of spatial reference frames, such as egocentric, allocentric, object-relative, or effector-relative stimulus location (e.g., Hommel & Lippa, 1995; Lamberts, Tavernier & d’Ydewalle, 1992), and that Simon-type effects have been reported for non-spatial feature overlap (e.g., Kornblum, 1994), attentional approaches have an extremely limited scope and fail to address all but the original version of the Simon effect. In contrast, TEC provides a straightforward account that applies to all existing versions: given that the responses in a Simon task are spatially defined, the corresponding spatial stimulus maps are intentionally weighted, so as to allow for the proper spatial coding of the responses (i.e., the discrimination and identification of spatial action effects). As stimuli and responses are coded in the same way and by using the same stimulus maps, the system is structurally unable to prime the processing of response locations without priming the processing of the locations of any other event, such as the stimuli in a Simon task.

Somewhat paradoxically, this account predicts that the spatial Simon effect should be reduced or even absent if the responses were not spatially defined, which of course is hard to test in a task that relies on spatial responses to measure the effect. But there are reasons to assume that the prediction holds nevertheless. By using ERPs, Valle-Inclán and Redondo (1998) replicated earlier observations that lateralized stimuli activate lateralized readiness potentials (LRPs) in the opposite cortical hemisphere, suggesting that stimuli can indeed activate spatially corresponding effectors. Interestingly, the relevant S-R mapping was not fixed in this study, but varied randomly from trial to trial, as did the temporal order in which the mapping and the stimulus were presented. If the mapping preceded the stimulus, the stimulus activated the spatially corresponding response (i.e., evoked a contralateral LRP) irrespective of which response was actually correct. But if the stimulus preceded the S-R mapping, this activation was no longer observed. If we consider that the response set could only be prepared if the mapping was known, we can conclude that the automatic processing of stimulus location up to the activation of responses, the hallmark of the Simon effect, presupposes that the task is properly prepared. As predicted by TEC, preparing for action involves the intentional weighting of response-related feature dimensions, and this is indeed what seems to be required for the Simon effect to occur. The Stroop effect seems to work the same way. The effect refers to the observation that naming a color is hampered by presenting it in the shape of an incongruent color word (Stroop, 1935). Interestingly, the effect is much more pronounced with vocal than with manual responses (cf., Magen & Cohen, 2007).
According to TEC, this is what one would expect as preparing for uttering color names should lead to the stronger intentional weighting of coding systems that are involved in processing the vocal action effects: color words, which happen to be the main distractors in this task.

Apart from its more ambitious scope, the advantage of TEC over alternative accounts of stimulus-response compatibility is that it does not only predict that feature overlap between stimuli and responses affects performance but it also explains why this is the case. But TEC is also able to predict novel compatibility effects that other accounts have no obvious way to handle. Note that what experimenters call stimuli and responses are considered by TEC as events that play different roles in a given experiment (one being externally triggered and the other being internally generated) but that are cognitively represented in an equivalent fashion and, most importantly, in the same coding domain. Feature overlap between stimulus and response thus means that neural codes are virtually shared by different representations and that it is this sharing that produces stimulus-response compatibility. If so, and if being a stimulus and a response is just a role a given event is arranged to play, one would expect that responses can also overlap with stimuli and thus affect stimulus processing. That is, TEC predicts response-stimulus compatibility effects. Indeed, several types of such compatibility effects have been demonstrated so far. For instance, preparing a spatially defined manual response systematically affects the detection (Müsseler & Hommel, 1997a) and identification (Müsseler & Hommel, 1997b) of masked arrowheads pointing in response-compatible or incompatible directions. The processing of masked compatible or incompatible direction words is unaffected by manual action preparation (Hommel & Müsseler, 2006). In contrast, preparing for vocally responding with a direction word interacts with the identification of masked visual direction words but not with the processing of arrowheads (Hommel & Müsseler, 2006). 
These observations rule out possible interactions at a purely semantic level (which as such would not be inconsistent with TEC) but point to interactions between feature codes.

Another interesting prediction from a TEC perspective is that what stage models commonly call stimulus-response translation, a process that has widely been claimed to be highly capacity limited and strictly serial (Pashler, 1994), should actually occur automatically and in parallel. According to TEC, stimulus and response features are integrated into event files and prepared to some degree before a task is performed, so that registering a stimulus should suffice to spread activation to the related response. As the task proceeds, the stimulus-response links are further strengthened by the continuous integration of co-varying stimulus and response features (Hommel, 2005), making the translation even smoother. Automatic stimulus-response translation was indeed demonstrated under dual-task conditions, which previous approaches considered to render automatic translation impossible. In the study of Hommel (1998b), subjects carried out two responses to two stimuli in a row, as in other dual-task studies. However, the second, vocal response was chosen to be compatible or incompatible with either the first (manual) response or the first (visual) stimulus. For instance, the second response could consist in saying “green” while the first stimulus was green (i.e., compatible) or red (incompatible). Unsurprisingly, the second response was affected by compatibility with the first stimulus. More importantly, however, responses to the first stimulus were faster in compatible than in incompatible conditions. Not only does this amount to another demonstration that activating a response can lead to the priming of a feature-overlapping stimulus, but it also shows that the second response was activated at a point in time when the stimulus-response translation related to the first response was not yet completed. 
In other words, stimulus-response translation for the two tasks must have occurred in parallel, which contradicts serial translation accounts but provides support for a TEC-inspired approach.

Selecting and planning an action

Traditional approaches to action control make a fine distinction between the selection and the programming of an action and this distinction seems so obvious and intuitive that most authors use it without much theoretical justification (e.g., Kornblum et al., 1990; Pashler, 1994; Sanders, 1983). This is understandable from an information processing approach to cognition that applies the computer metaphor to biological systems. According to that perspective, selecting an action is explicitly or implicitly conceived of as choosing an abstract symbol representing the appropriate action whereas action programming consists of translating that symbol into a program that can actually operate on the available hardware. Depending on the circumstances, this translation process may require the retrieval of an existing motor program or the construction of a novel program from scratch. Action selection and programming are commonly assumed to represent two sequential stages, with selection being particularly capacity demanding (e.g., Pashler, 1994).

It should be clear from the previous discussion that TEC does not fit with this traditional line of thought. Generating the idea of an action is considered to involve the activation of codes representing the perceivable effects of that action. These effect codes are assumed to be integrated into sensorimotor networks or the event files serving both to register and to produce the coded effects. In other words, “thinking of an action” always involves the tendency to generate that action motorically by spreading activation from effect codes to the associated motor codes (cf., Jeannerod & Decety, 1995; Keysers & Perrett, 2004). Given that this process of (not necessarily consciously) “thinking of an action” is considered to be crucial for selecting an action, this has two major implications that conflict with the traditional selection-programming distinction. First, action selection and action programming are conceptually interwoven: if selecting the feature of an action consists of activating the code(s) representing that feature and if this activation spreads to the motor codes responsible for generating it (i.e., shaping the performance of the action in such a way that the given feature is produced), selecting means programming that feature—or, in TEC terminology, selecting an action involves planning it. Second, if action selection and programming are that much interwoven it makes little sense to distinguish between these two processes and to assume that they occur in a strict temporal sequence.

A number of observations are consistent with TEC’s failure to distinguish between the selection and the programming of an action. For instance, the time it takes to initiate an action is known to increase with the action’s complexity, which is assumed to reflect the greater programming demands with more complex actions (Henry & Rogers, 1960). Along the same lines, initiation times increase with the eventual duration of the action (Klapp, 1995) and even with the duration of action effects (Kunde, 2003). Interestingly, reaction time for the same action increases with the complexity of alternative actions (e.g., Rosenbaum, Salzman & Kingman, 1984; Semjen, Garcia-Colera & Requin, 1984). This may be due to preparatory effects but it may just as well be that action selection is affected by the extent of the action plans or event files involved. TEC reasoning suggests that selecting an action is a temporally extended process of increasing the activation of feature codes up to a threshold or until the action is initiated (see below). In the beginning of the decision process, multiple action plans may be activated, which would allow their codes to engage in facilitatory or inhibitory interactions. Obviously, the number of interactions would be larger the more actions are involved (which would account for Hick’s law) and the more complex the plans are. Evidence that the complexity of plans is considered during response selection is provided by Meegan and Tipper (1999), who showed that irrelevant stimuli that signal more complex actions are less distracting than stimuli signaling less complex actions. Irrelevant stimuli that are related to action alternatives are assumed to impact response selection (e.g., Eriksen & Schultz, 1979; Lu & Proctor, 1995), which suggests that the finding of Meegan and Tipper demonstrates that response selection reflects action-planning demands.
It is difficult to see how demonstrations of this sort can be accommodated by approaches that draw a strong line between response selection and response programming.

Let us take another example. Stimulus-response compatibility phenomena like the Simon effect are commonly attributed to what traditional models call the response-selection stage (e.g., Kornblum et al., 1990). The idea is that stimuli tend to activate feature-overlapping responses, which leads to response conflict and, thus, to a delay of response selection if the activation targeted the wrong response. A straightforward prediction from this assumption is that compatibility effects should disappear if the response can be selected before the activating stimulus appears. This prediction clearly failed: Hommel (1995, 1996) validly precued the correct response in each trial of a Simon task, so that the left or right key press could be selected and even programmed long before the lateralized stimulus appeared. Nevertheless, substantial Simon effects of undiminished size were obtained. From a TEC point of view this observation is unsurprising: selecting and programming the appropriate action is assumed to consist of activating the codes that represent the features of that action, including codes representing the action’s location. As long as this action plan (i.e., the network of activated codes) is not executed it must be maintained, and while it is maintained it is of course vulnerable to changes in the activation states of the codes it consists of. Processing a stimulus that shares one or more of these codes is likely to change these activation states (by increasing their activation in compatible trials or by activating conflicting codes in incompatible trials), so that there is no reason why compatibility effects should not occur. In contrast, traditional stage-like approaches face the difficulty of explaining how stimuli can affect processes that are assumed to be completed already.

Further problems for traditional approaches come from studies that used deadline techniques, which require subjects to carry out an action when a go signal is presented irrespective of the progress of the planning process. When used in a manual reaching task, premature go signals have been found to produce actions that seem to rely on default parameters, such as the spatial average of the alternative goal locations (Ghez, Hening & Favilla, 1990; van Sonderen & Denier van der Gon, 1991). A stepwise increase of the delay of the go signal led to a continuous transition from this default parameter to the actual goal parameter. TEC provides a natural account for this observation: early in the process of accumulating evidence for one of the spatial responses the codes representing all possible end locations will be activated (an assumption that is consistent with observations from single-cell recordings in monkeys: Cisek & Kalaska, 2005), so that executing the response at this point will reflect the joint impact of these activations. As time passes, the code of the correct response will increasingly dominate and outcompete codes of the alternative responses, so that the factual end location will approach the actual target location the later in time the go signal appears. However, traditional stage approaches face a couple of problems. One problem is that it is far from obvious how selection and programming should be distinguished with aiming tasks of that sort. Another problem is that selection should take place between symbols that stand for clearly defined end locations, so that deadline-induced errors in the selection process should lead to the execution of alternative responses but not to spatial averaging.
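
The account of spatial averaging given above can be sketched in a few lines. This is a toy illustration rather than a fitted model: the exponential growth of evidence, the `time_constant_ms` parameter, and the goal locations are assumptions introduced here, and the executed end location is simply taken to be the activation-weighted mean of all candidate locations at the moment of the go signal.

```python
import math


def executed_location(goal_locations, target_index, go_delay_ms,
                      time_constant_ms=100.0):
    """Return the end location produced when the go signal arrives
    go_delay_ms after target onset.

    Assumptions (hypothetical): evidence for the target grows
    exponentially towards 1.0, the remaining activation is shared
    equally by all location codes, and the movement reflects the
    activation-weighted mean of the coded locations."""
    evidence = 1.0 - math.exp(-go_delay_ms / time_constant_ms)
    n = len(goal_locations)
    # Baseline activation shared by all candidate end locations.
    activations = [(1.0 - evidence) / n for _ in goal_locations]
    # The correct code increasingly dominates as evidence accumulates.
    activations[target_index] += evidence
    total = sum(activations)
    return sum(a * loc for a, loc in zip(activations, goal_locations)) / total


goals = [-30.0, 30.0]  # two possible end locations, in degrees (illustrative)
trajectory = [executed_location(goals, 1, delay) for delay in (0, 50, 100, 200, 1000)]
```

With a go signal at delay zero the sketch yields the spatial average of the alternatives, and the end location then shifts continuously towards the target as the delay increases. A model that selects between discrete symbols would instead produce either one goal location or the other, but no intermediate values.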

Another interesting implication of TEC is that it provides a straightforward explanation for why response selection represents a capacity-demanding processing bottleneck in many tasks. That response selection often functions as a bottleneck has been suspected for a long time. Welford (1952) was among the first to assume that most cognitive processes may be able to run in parallel in the service of multiple tasks but the selection of an action may be an exception. Indeed, systematic research has accumulated evidence that human multitasking abilities are mainly restricted by the apparent seriality of response selection (Pashler, 1994). While many authors subscribe to this view, there is hardly any evidence on and very few theoretical considerations about why that may be the case. Moreover, authors often fail to distinguish between stimulus-response translation and response selection, suggesting that the latter is achieved by the former (i.e., responses are selected by translating stimuli according to particular rules; e.g., Pashler, 1998). However, in view of the just discussed evidence that stimulus-response translation proceeds rather automatically (Hommel, 1998b), the actual bottleneck remains even more of a mystery. The few ideas that are available consider response selection proper as a global operation that considers multiple sources of information and multiple brain areas (see Hommel, 1998b; Pashler, 1993). Making sure that a given response is correct requires the agent to integrate several pieces of information, such as the given stimulus, the most activated (but not yet selected) response, and the task goal. According to TEC, all this information is widely distributed across the brain, which necessarily renders the process responsible for integrating it a global operation. 
As brain-imaging studies suggest, global operations monopolize the communication between brain areas and thus create a bottleneck by temporarily suppressing communication related to other events (Gross et al., 2004). With regard to the integration of features for perception, this has led to the assumption that features can be integrated for only one event at a time (Treisman, 1988) and, given that TEC does not distinguish between perceived and produced events, it makes sense to extend this consideration to the integration of action features. Along these lines, if and as far as response selection constitutes a processing bottleneck this is because the integration process it requires is global and thus monopolizes communication in the cognitive system.

At this point, it is not clear whether every single action requires integration (and thus creates a bottleneck) but the existence of action-related integration has been documented in several ways. As briefly mentioned earlier, Müsseler and Hommel (1997a, b) investigated the impact of action planning on perceptual processes. They had subjects prepare a left or right key-pressing action and presented a masked left- or right-pointing arrowhead briefly before or during the execution of the key press. The location of the key press interacted with the direction of the arrowhead, which confirms that action planning can affect perception. Interestingly, however, the effect pattern was opposite to what one may have expected at first sight: report of the arrowhead direction was worse if it corresponded to the key press. A positive correspondence effect would seem more obvious: planning the action should involve activating the location code that represents the relative location of the key and/or the effector operating it, and this activation should prime the identification of a feature-overlapping event, that is, the compatible arrowhead. However, note that the task involved two different events, the coding of which was made to overlap in time. If the codes relating to these two events were merely activated but not further integrated, the system would have no means to determine which code belongs to which event: the notorious binding problem. An integration process would need to make sure that a given code is part of one particular event but not of any other, which should make it difficult to use that very code for the representation of another event (Hommel et al., 2001a, b; Müsseler & Hommel, 1997b; Stoet & Hommel, 1999).
To illustrate that, consider the possibility that feature binding operates by means of synchronizing the neural codes representing these features (Fries, 2005; von der Malsburg, 1999), which would mean that all neurons that refer to features of the same event will fire in the same rhythm. Representing another binding at the same time with members firing at the same frequency is possible in principle, but only if the frequencies differ in phase (Raffone & Wolters, 2001). For a code that is related to multiple events this poses the problem of which phase it should join (given that joining both by increasing the frequency to the lowest common denominator would require unrealistically high oscillation frequencies, see Kopell, Ermentrout, Whittington & Traub, 2000). If we assume that it is more likely to join and stay with the first binding it was entrained with, the observations of Müsseler and Hommel would be easy to understand: planning the action involved the integration of the corresponding feature codes, including location codes, so that integrated codes were not, or not as easily, available for coding the direction of the arrowhead (Hommel, 2004; see Footnote 4).

According to this code-occupation account any integrated binding should impair the integration of any other feature-overlapping event. Hence, preparing an action and maintaining the plan in the presence of other, temporally overlapping events should impair not only perceptual processes but other planning operations as well. Stoet and Hommel (1999) investigated this matter by having subjects prepare a left or right key-pressing action (A1), perform a speeded left or right key press to a central stimulus (A2), and only then carry out the planned action (A1). Performance on A2 was clearly affected by the A1 plan, showing worse performance if A1 and A2 overlapped in location. This was the case even if A2 was carried out by hand and A1 by foot, which excludes an account of the observation in terms of peripheral interactions or the inhibition of the A1-related effector in order to prevent immediate execution. Moreover, reminding subjects of A1 until it was to be executed (so that advance planning was not strictly necessary) turned the effect positive, suggesting that codes were primed but not yet integrated. This confirms that code occupation does require feature codes to be integrated. Later research provided evidence that the code-occupation effect is more likely with unpracticed actions and/or effectors: Wiediger and Fournier (2008) obtained the effect for the left but not the right hand of right-handers. If we assume that planning an action, in contrast to retrieving a stored action plan, and integrating the features involved is needed for unpracticed actions only (Melcher et al., 2008), this finding fits well with the idea that planning and integration are related. However, more research is certainly needed with regard to the question of when and under what circumstances integration is necessary and takes place. 
Particularly interesting is the question whether integration presupposes a confusability of features and thus occurs only if multiple event representations overlap in time.

Monitoring an action

According to ideomotor approaches, including TEC, action control is strongly anticipatory in nature. For one, this is obvious from the way these approaches conceive of action selection, which is assumed to be guided by the previously experienced and presently expected effects of the considered actions. For another, integrating action patterns with codes of to-be-expected effects provides an ideal means to evaluate the success of the action once it is executed. Cybernetic models of action control assume that this evaluation is achieved by comparing expected action effects with actually generated action effects, the action being more successful the smaller the discrepancy between them (e.g., Adams, 1971; Schmidt, 1975; Wolpert & Ghahramani, 2000). Substantial discrepancies would thus signal a failure of action control, to be remedied in following attempts. Recent observations from electrophysiological studies support the idea that acquired action effects are involved in signaling action-related errors.

Waszak and Herwig (2007) had subjects acquire associations between left and right key presses and tones of different pitch, as in the study of Elsner and Hommel (2001), before presenting them with an auditory oddball task, where numerous standard tones and infrequent deviants (tones that differed from the standards in frequency) appeared. Auditory deviants produced a P3 component in the ERPs (Pritchard, 1981) that was more pronounced when it was preceded by the response that was associated with the standard. This demonstrates that the acquisition of action-tone associations affected tone processing in such a way that a cortical signal was generated if the currently generated tone did not match the expected tone.

Along similar lines, Band, van Steenbergen, Ridderinkhof, Falkenstein, and Hommel (2008) presented subjects with a probabilistic learning task, in which some key presses were followed by a tone of a particular pitch in 80% of the trials and by a tone of another pitch in 20% of the trials. In other words, these key presses produced one more expected and one less expected auditory action effect. Interestingly, the less expected effects generated an ERP component that is commonly seen with the presentation of negative performance feedback, the so-called feedback-related negativity (FRN: Miltner, Braun & Coles, 1997). This fits with the assumption that learned action effects are exploited for predicting upcoming events and matched against actually achieved effects. There was even some evidence that mismatches lead to adaptations in action control: reaction times in the trials following the presentation of the less expected effects were increased as compared to the trials following more expected effects.

Apart from this evaluative function, the comparison between expected and achieved effects may play another role as well. Holroyd and Coles (2002) argued that the FRN, just like the error-related negativity (Falkenstein, Hohnsbein, Hoormann, & Blanke, 1990; Gehring, Goss, Coles, Meyer & Donchin, 1993), reflects a negative reinforcement signal from the mesencephalic dopamine system to modulate reinforcement learning. According to that view, stimulus-response combinations that lead to the unexpected absence of reward lose associative strength, whereas combinations that lead to the unexpected presence of reward gain associative strength (see Schultz, 2002). If we consider that the integration of stimuli and actions is to some degree blind to the actual sequence of the to-be-integrated events (Hommel, 2005), we can extend this logic to action-effect acquisition: novel action effects are entirely unexpected, which would induce a dopaminergic boost that leads to action-effect integration. With increasing experience the effects would become more predictable, which would reduce the dopaminergic signal and reduce learning, thus producing the well-known asymptotic learning curve. For stimulus-response integration in humans, there is indeed evidence for a dopaminergic basis. Stimulus-action binding has been found to increase in the presence of pictures with a positive valence (Colzato, van Wouwe & Hommel, 2007a), which are assumed to drive dopaminergic activity to a more effective level, and to decrease under stress (Colzato, Kool & Hommel, 2008), which is assumed to drive dopaminergic activity beyond effective levels. Along the same lines, stimulus-action binding is stronger in populations that are likely to have a more effective dopaminergic supply at their disposal, such as people with high spontaneous eye-blink rates (Colzato, van Wouwe & Hommel, 2007b) and recreational cannabis users (Colzato & Hommel, 2008).
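
The way in which a shrinking prediction error produces an asymptotic learning curve can be sketched with a simple delta rule. The rule is a standard simplification used here for illustration only; it is not a mechanism specified by TEC or by the studies cited above, and the function name and the `learning_rate` value are hypothetical.

```python
def learn_action_effect(n_trials, learning_rate=0.3):
    """Simulate the strength of one action-effect association across
    trials in which the action is always followed by the effect.

    Assumption: the change in strength on each trial is proportional to
    the prediction error, i.e., the difference between the obtained
    effect (coded as 1.0) and the currently expected effect."""
    strength = 0.0  # a novel effect is entirely unexpected
    history = []
    for _ in range(n_trials):
        prediction_error = 1.0 - strength  # stands in for the dopaminergic signal
        strength += learning_rate * prediction_error
        history.append(strength)
    return history


curve = learn_action_effect(20)

# Trial-by-trial gains: large at first, then shrinking as the effect
# becomes predictable.
gains = [later - earlier for earlier, later in zip([0.0] + curve, curve)]
```

In this sketch the first pairing yields the largest gain because the effect is entirely unexpected, and the gains shrink as the association approaches its asymptote, which mirrors the reduction of the learning signal with increasing predictability.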


One of the aims of this article is to caution against the widespread tendency to take the setups of the experimental tasks we use in our laboratories too seriously and to tailor our theories too tightly to them. Presenting carefully selected stimuli and measuring arbitrary responses to them provides many advantages, but real actions are commonly not driven by stimuli, not carried out to subserve meaningless goals, and not aimed at carrying out movements for their own sake. Few theories account for this, and many still consider the stimulus the precursor and main predictor of action. One purpose of formulating TEC was to provide an alternative perspective that better accommodates intentions and the goal-directed nature of action, and that does so in a neurobiologically plausible way. TEC attempts to explain why human action is anticipatory in nature, how anticipations emerge through experience, and how the anticipation of action effects comes to regulate human behavior. In particular, we have seen that anticipations serve at least two purposes: the selection of appropriate actions and the evaluation of action outcomes in the context of a particular goal.

Another aim of this article is to show that TEC does not yet provide a full-fledged account of action but that it does provide fruitful guidelines for asking new questions, generating new data, and interpreting them in the context of a coherent theoretical framework. However, more work needs to be done. Among other things, we need a better understanding of how more complex, multistep actions are acquired and controlled, how motivational processes affect the preparation and execution of actions, and how individual learning, experience, and external constraints interact to create action goals. From a TEC perspective, this calls for connecting the basic architecture to self-related long-term structures and for getting to grips with the neuromodulators that drive the activation and integration of feature codes.