A surge of research interest over the last 2 decades has resulted in widespread agreement that anterior cingulate cortex (ACC) is partly responsible for cognitive control and decision making (e.g., Euston, Gruber, & McNaughton, 2012; Ridderinkhof, van den Wildenberg, Segalowitz, & Carter, 2004; Rolls, 2009; Rushworth & Behrens, 2008; Rushworth, Kolling, Sallet, & Mars, 2012; Shackman et al., 2011; Shenhav, Botvinick, & Cohen, 2013; Walton & Mars, 2007), but the development of a formal, comprehensive theory about its function has remained out of reach (e.g., Alexander & Brown, 2011; Botvinick, Braver, Barch, Carter, & Cohen, 2001; Brown & Braver, 2005; Holroyd & McClure, 2015; Khamassi, Lallee, Enel, Procyk, & Dominey, 2011; Silvetti, Seurinck, & Verguts, 2011; Verguts, 2017; Verguts, Vassena, & Silvetti, 2015; Yeung, Botvinick, & Cohen, 2004). Efforts in this direction have been hindered by a complex panoply of empirical findings. For example, a salient observation across neurophysiological studies is that individual cells in ACC tend to respond to multiple task events, which seems to implicate a role for ACC in everything (Ebitz & Hayden, 2016). Thus the exact computational function of ACC, and even whether it has one (Bush, 2009), remains controversial (Holroyd & Yeung, 2011).

Here we present a simple computational model of ACC that illuminates many of these disparate findings. Our approach is motivated by three considerations. First, the model is based on the long-standing idea in cognitive psychology that cognitive processes are encoded across networks of interconnected processing units (Rumelhart & McClelland, 1986). Although this perspective has recently gained currency in behavioral neuroscience, which is increasingly emphasizing the collective activity of ensembles of neurons rather than the tuning properties of individual cells (e.g., Fusi, Miller, & Rigotti, 2016; Rigotti et al., 2013; Yuste, 2015), to our knowledge it has yet to be applied to formal models of ACC. Second, the model takes into account ubiquitous findings from human neuroimaging and electrophysiological studies that ACC is sensitive to response conflict, errors, and surprising events (e.g., Botvinick et al., 2001). Third, the model is based on our previous argument, derived partly from lesion data in humans and other animals, that ACC is concerned with the execution of extended, goal-directed action sequences (Holroyd & Yeung, 2012). Here we develop that idea by proposing that the ACC predicts each successive step in the sequence. On this account, the model yields distributed patterns of activity across ACC neurons that track the progression of action sequences (in keeping with the neurophysiological data), and produces a discrepancy signal when any step of a sequence deviates from the predicted step (in keeping with the neuroimaging data).

We begin by providing a very brief overview of neurophysiological data related to ACC. We then home in on a series of studies that suggest a special role for ACC in the execution of goal-directed action sequences. Next, we describe principles for modeling action sequences based on recurrent neural networks (RNNs), and apply these principles to simulate the activity of ACC neurons of rats performing a sequential task. Consistent with empirical findings, the individual units in the model respond to multiple task events, whereas distributed patterns of activity across the units follow the progression of each sequence. We also illustrate how these same principles account for the surprise, error and conflict signals produced in ACC when humans perform speeded response time (RT) tasks, as revealed by human event-related brain potential (ERP) and functional magnetic resonance imaging (fMRI) experiments. Finally, in supplementary materials, we illustrate how these signals could be applied to regulate action execution; these simulations suggest that ACC should be especially important for maintaining contextual information that disambiguates related action sequences. These observations are discussed in terms of our proposal that the ACC is responsible for motivating the execution of extended behaviors (Holroyd & Yeung, 2011, 2012).

Background

Neurophysiological and neuroimaging observations of ACC

A major complicating factor in understanding ACC is the wide range of neurophysiological findings that seem to implicate it in most task events (Ebitz & Hayden, 2016). For example, ACC neurons are seen to respond to stimulus events (Nishijo et al., 1997), motor activity (Backus, Ye, Russo, & Crutcher, 2001; Russo, Backus, Ye, & Crutcher, 2002; Shima et al., 1991), rewards (Amiez, Joseph, & Procyk, 2005; Kennerley, Behrens, & Wallis, 2011; Luk & Wallis, 2009; Sallet et al., 2007; Seo & Lee, 2008), errors (Shen et al., 2015; Totah, Kim, Homayoun, & Moghaddam, 2009; C. Wang, Ulbert, Schomer, Marinkovic, & Halgren, 2005), prediction errors and surprise signals (Bryden, Johnson, Tobia, Kashtelyan, & Roesch, 2011; Hayden, Heilbronner, Pearson, & Platt, 2011; Klavir, Genud-Gabai, & Paz, 2013; Matsumoto, Matsumoto, Abe, & Tanaka, 2007), pain and its anticipation (Koyama, Tanaka, & Mikami, 1998) and, more controversially, in conflict (Davis et al., 2005; Ebitz & Platt, 2015; Kaping, Vinck, Hutchison, Everling, & Womelsdorf, 2011). ACC neurons are also related to cognitive processes associated with working memory (Niki & Watanabe, 1976), long-term memory formation (Weible, Rowland, Monaghan, Wolfgang, & Kentros, 2012), effortful control (Davis, Hutchinson, Lozano, Tasker, & Dostrovsky, 2000; Hillman & Bilkey, 2010, 2012) and even grid cell representations (Jacobs et al., 2013). Complicating matters further, many of these neurons reveal interdependencies across events. For example, ACC cells are said to provide a “gateway” through which decision making systems affect behavior (Cai & Padoa-Schioppa, 2012) by linking or “multiplexing” information about rewards and actions (Hayden & Platt, 2010; Shima & Tanji, 1998; Tanji, Shima, & Matsuzaka, 2002) and predictive stimuli (Takenouchi et al., 1999). 
The firing patterns of many ACC neurons are multi-determined (Shidara, Mizuhiki, & Richmond, 2005), reflecting multiple aspects of value-based decision making (Hoshi, Sawamura, & Tanji, 2005; Kennerley, Dahmubed, Lara, & Wallis, 2009; Kennerley & Wallis, 2009; Khamassi, Quilodran, Enel, Dominey, & Procyk, 2015) including high-level aspects of task performance such as task switching and behavior shifts (Johnston, Levin, Koval, & Everling, 2007; Kuwabara, Mansouri, Buckley, & Tanaka, 2014; Quilodran, Rothé, & Procyk, 2008). In humans, individual ACC neurons are recruited across multiple different tasks, especially when these tasks demand effort or attention (Davis et al., 2000; Wang et al., 2005).

Neuroimaging and scalp-recorded electrophysiological recordings in humans have also associated ACC with a wide range of phenomena (Holroyd & Yeung, 2011, 2012). Yet, a core set of highly replicable findings have implicated ACC function specifically in the processing of response conflict (Botvinick et al., 2001; Botvinick, Cohen, & Carter, 2004; Yeung, 2013; Yeung et al., 2004), errors (Falkenstein, Hohnsbein, Hoormann, & Blanke, 1990; Gehring et al., 1993; Wessel, Danielmeier, Morton, & Ullsperger, 2012) and otherwise surprising or unexpected events (Alexander & Brown, 2011; Braver, Barch, Gray, Molfese, & Snyder, 2001; Ferdinand & Ovitz, 2014; Forster & Brown, 2011; HajiHosseini & Holroyd, 2013; Holroyd, 2004; Holroyd, Pakzad-Vaezi, & Krigolson, 2008; Jessup, Busemeyer, & Brown, 2010; Metereau & Dreher, 2013; Nee, Kastner, & Brown, 2011; Oliveira, McDonald, & Goodman, 2007; O’Reilly et al., 2013; Silvetti et al., 2011; Wessel et al., 2012). For example, the error-related negativity (ERN) is an ERP component elicited by error commission in speeded response-time tasks (Falkenstein et al., 1990; Gehring et al., 1993; for review see Gehring, Liu, Orr, & Carp, 2012), and the N2 is an ERP component elicited by unexpected, task-relevant stimuli (Donkers, Nieuwenhuis, & van Boxtel, 2005; Ferdinand, Mecklinger, Kray, & Gehring, 2012; Gehring, Gratton, Coles, & Donchin, 1992; Holroyd, 2004; Kopp & Wolff, 2000; Oliveira et al., 2007; Warren & Holroyd, 2012), especially when the stimuli mismatch with a perceptual template of ongoing events (Jia et al., 2007; Sams, Alho, & Näätänen, 1983; Y. Wang, Cui, Wang, Tian, & Zhang, 2004; for review, see Folstein & Van Petten, 2008). 
The neural sources of the N2 and the ERN colocalize in ACC with an enhanced fMRI blood-oxygen-level dependent (BOLD) response to the same events (Mathalon, Whitfield, & Ford, 2003; Wessel et al., 2012), and it has been proposed that the two ERP components reflect different manifestations of a single underlying cognitive process by ACC (Cavanagh, Zambrano-Vazquez, & Allen, 2012; Wessel et al., 2012; Yeung et al., 2004).

These observations suggest that, when inspected at the cellular level, ACC function is associated with a wide range of task-related events, but when inspected at a more global level, ACC activation is strongly modulated by events related to surprise, conflict and errors.

ACC and action sequences

Although the response profiles of ACC neurons are heterogeneous, a telling set of observations have implicated ACC specifically in the production of goal-directed action sequences. Early on, Procyk, Tanaka, and Joseph (2000) and Procyk and Joseph (2001) observed that motor neurons in ACC are sensitive to the serial order of actions executed in a sequence irrespective of the actual movements performed. Shidara and Richmond (2002) also reported that ACC neurons code for the degree of reward expectancy through multi-stage tasks, and Mulder, Nordquist, Örgüt, and Pennartz (2003) observed that ACC neurons implement a “response set,” which they defined as “a cognitive-motor predisposition to organize and execute an action sequence directed towards a particular goal.” This research presaged a series of studies indicating that ACC neurons are sensitive to the sequential order of task progression across multiple task stages (Cowen & McNaughton, 2007; Fujisawa, Amarasingham, Harrison, & Buzsáki, 2008; Hayden, Pearson, & Platt, 2011; Hoshi et al., 2005; Shidara et al., 2005), as represented by distinct patterns of activity of ensembles of ACC neurons (Blanchard, Strait & Hayden, 2015; Cowen, Davis, & Nitz, 2012). These sequential activations have been said to be mediated by a working memory process intrinsic to ACC (Baeg et al., 2003), or alternatively by inputs carrying sensory and motor efferent information to ACC (Euston & McNaughton, 2006). The sequential activity is also coordinated with respect to the phase of theta oscillations of local field potentials in ACC (Remondes & Wilson, 2013), which are said to synchronize the computations of widespread brain networks (Holroyd, 2016; Verguts, 2017).

Recently, Jeremy Seamans and colleagues have systematically investigated the activity of ACC neurons during the execution of task sequences (Balaguer-Ballester, Lapish, Seamans, & Durstewitz, 2011; Caracheo, Emberly, Hadizadeh, Hyman, & Seamans, 2013; Durstewitz, Vittoz, Floresco, & Seamans, 2010; Hyman, Ma, Balaguer-Ballester, Durstewitz, & Seamans, 2012; Hyman, Whitman, Emberly, Woodward, & Seamans, 2013; Lapish, Durstewitz, Chandler, & Seamans, 2008; Ma, Hyman, Lindsay, Phillips, & Seamans, 2014; Ma, Hyman, Phillips, & Seamans, 2014). Their work focuses on distributed patterns of activity that characterize entire ensembles of neurons, rather than on the responses of the individual neurons that comprise the ensembles. Application of dimension reduction techniques reveals that network-wide ACC activity is characterized by complex trajectories through an abstract state space (Balaguer-Ballester et al., 2011), the disruption of which predicts behavioral errors (Lapish et al., 2008; see also Hyman et al., 2013; Stokes et al., 2013). On this view, ACC network activity tracks task progression through a task-dependent frame of reference, or “task space” (Lapish et al., 2008), toward the animal’s goal (Ma, Hyman, Phillips, et al., 2014). The network activity is especially sensitive to transitions through subcomponents of cognitive tasks (Balaguer-Ballester et al., 2011) and is accompanied by abrupt transitions in the state space when the animal learns new rules (Durstewitz et al., 2010) or is exposed to important environmental changes (Caracheo et al., 2013). ACC network activity also discriminates between task sequences more accurately than comparable activity in the dorsal striatum (Ma, Hyman, Lindsay, et al., 2014; Ma, Hyman, Phillips, et al., 2014), and can be distinguished from the function of the hippocampus, which famously encodes sequences related to spatial navigation and other temporally organized events (Hyman et al., 2012).

In humans, fMRI evidence indicates that ACC is responsible for learning and evaluating the execution of hierarchically organized sequences of cognitive tasks (Koechlin, Danek, Burnod, & Grafman, 2002). Further, unexpected changes in sequence production tasks elicit an increased ACC BOLD response (Berns, Cohen, & Mintun, 1997; Ursu, Clark, Aizenstein, Stenger, & Carter, 2009) and larger N2 amplitudes (Eimer, Goschke, Schlaghecken, & Stürmer, 1996; Ferdinand, Mecklinger, & Kray, 2008; Fu, Bin, Dienes, Fu, & Gao, 2013; Jongsma et al., 2013; Lang & Kotchoubey, 2000; Miyawaki, Sato, Yasuda, Kumano, & Kuboki, 2005; Rüsseler, Hennighausen, Münte, & Rösler, 2003; Rüsseler, Kuhlicke, & Münte, 2003; Rüsseler & Rösler, 2000; Schlaghecken, Stürmer, & Eimer, 2000). These observations are in line with our suggestion, based largely on lesion evidence in humans and other animals, that ACC supports the execution of extended, goal directed behaviors involving multiple actions (Holroyd & McClure, 2015; Holroyd & Yeung, 2012), and that unexpected deviations from the intended sequence elicit surprise signals from ACC.

The current approach: recurrent neural networks

Accordingly, we aimed to illustrate these principles in a computational model that simulates ACC neuron activity as animals execute goal-directed task sequences and that produces surprise signals to unexpected events during those sequences. Crucially, the widely varying response profiles of individual ACC neurons suggest that this level of analysis may not be optimal for inferring ACC function. Instead, our approach leverages a long-standing idea in cognitive psychology (Rumelhart & McClelland, 1986) and computational neuroscience (Churchland & Sejnowski, 1992) that holds that neurocognitive functions are encoded across distributed populations of units rather than by individual cells. Although fundamental to computational neuroscience, this principle has only recently become more widely recognized in behavioral neuroscience (e.g., Fusi et al., 2016; Yuste, 2015), a change in perspective driven partly by methodological advances that have allowed for the simultaneous collection and statistical analysis of data from multiple neural units (Cunningham & Yu, 2014). Thus, a basic principle of neural network theory suggests that we should simulate and analyze the ensemble activity of ACC neurons rather than the single-cell activity per se.

Here, we were inspired by the observation that recurrent connections in neural networks provide a natural means for supporting the execution of extended, goal-directed behaviors (Durstewitz, Seamans, & Sejnowski, 2000). In particular, “connectionist” style recurrent neural networks have been used extensively to study the cognitive psychology of sequence processing (e.g., Cleeremans & McClelland, 1991; Elman, 1990). RNNs provide a computational platform for exploring how distributed representations of hierarchically organized sequences unfold dynamically over time (Elman, 1990). This approach has proven especially fruitful as a proving ground for understanding the neural and cognitive representations that underpin complex action sequences with hierarchical structure. For example, previous work has illustrated how patterns of activity observed across network units can provide insight into representations shared within and across action sequences (Botvinick & Plaut, 2002, 2004). We therefore adopted RNNs for this purpose.

Connectionist networks consist of sets of interconnected, abstract units that process information passed between them (Rumelhart & McClelland, 1986). Some connectionist networks exploit a simple but powerful architecture consisting of three layers of nonlinear processing units, including an input layer, output layer, and an intermediating so-called hidden layer, connected with feed-forward excitatory projections (or weights), that can be augmented with additional layers. With the appropriate connection weights, which are determined through an iterative training process, these networks are capable of approximating any mathematical function to an arbitrary degree of accuracy (Hornik, 1991). It should be emphasized that although the connectionist framework in psychology is motivated by the massively parallel and interconnected structure of the human nervous system, the units in connectionist models are not normally understood to represent individual neurons. Rather, the networks describe in the abstract how distributed processing systems can give rise to observable behavior (Rumelhart & McClelland, 1986).

By definition, any network that contains feedback connections from downstream units to upstream units—that is, that contains connective loops—is recurrent. An important consequence of recurrence is the introduction of activity-dependent memory: Because the output of the recurrent units is subsequently passed back as input, the network retains a memory of previous events. This architecture introduces temporal structure into network processing by integrating current input with prior output. Here we adopt a particular network architecture in which the output of the hidden layer is passed, via one-to-one connections, as an exact copy to a context layer. In turn, the context layer maintains the information in a memory buffer for one discrete time step, after which it is fed back to the hidden layer as input (see Fig. 1; Elman, 1990).

Fig. 1

Elman network. Boxes indicate collections (“layers”) of processing units (not shown); arrows indicate feed-forward excitatory connections from each layer to the next. Elman networks are recurrent neural networks consisting of an input layer, a hidden layer, an output layer, and a context layer. The input units connect fully to the hidden units, the hidden units connect fully to the output units, and the context units connect fully to the hidden units; connections between the hidden layer and context layer units are one to one, such that on each time step the context layer receives an identical copy of the hidden layer activity. The networks are trained to produce each element of a sequence given the previous element as input. The number of units in each layer is task dependent

Given an initial element of a sequence as input, these so-called Elman networks can be trained to reproduce the subsequent elements of that sequence. Of interest in these simulations is how the networks disambiguate elements in a sequence by maintaining a memory of the context in which each element occurs. For example, a network may be trained to produce the sequences A->B->C and D->B->E. Recurrence enables the network to produce correctly C or E following the element that is common to both sequences (B), by retaining information about the context in which that element occurred (following an A or D). This contextual encoding can be supported across even longer and more complex sequences, such as A->B->B->B->B->C and D->B->B->B->B->E (Servan-Schreiber, Cleeremans, & McClelland, 1991).
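The ambiguity that recurrence must resolve can be made concrete with a minimal sketch. It assumes a hypothetical five-symbol inventory with one-hot ("local") coding; the names `SYMBOLS`, `one_hot`, and `prediction_pairs` are illustrative and do not come from the simulations reported here.

```python
import numpy as np

# Hypothetical symbol inventory for the two example sequences (an assumption).
SYMBOLS = ["A", "B", "C", "D", "E"]
INDEX = {s: i for i, s in enumerate(SYMBOLS)}

def one_hot(symbol):
    """Local code: one designated unit per sequence element."""
    v = np.zeros(len(SYMBOLS))
    v[INDEX[symbol]] = 1.0
    return v

def prediction_pairs(sequence):
    """Return (input, target) pairs in which each element predicts its successor."""
    return [(one_hot(a), one_hot(b)) for a, b in zip(sequence[:-1], sequence[1:])]

pairs_abc = prediction_pairs(["A", "B", "C"])
pairs_dbe = prediction_pairs(["D", "B", "E"])

# On the second step both sequences present the same input (B) ...
same_input = np.array_equal(pairs_abc[1][0], pairs_dbe[1][0])
# ... but require different targets (C versus E).
different_target = not np.array_equal(pairs_abc[1][1], pairs_dbe[1][1])
print(same_input, different_target)
```

Because the input on the second step is identical across the two sequences, a purely feed-forward mapping from current input to output cannot satisfy both targets; some memory of the preceding element (A or D) is needed, which is what the context layer supplies.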

We used an Elman network to represent the function of ACC (Elman, 1990; see Fig. 2). On this account, ACC takes as input information related to external states of the environment and actions produced by the agent, and uses that information to predict the immediately forthcoming environmental state or action. The network then detects discrepancies between what it predicts and what actually occurs, that is, it detects any unexpected deviations from the goal-directed action sequence under execution. Further, the ensemble activity of units in the hidden and context layers encodes, in an abstract state space, the progression of the action sequence as it unfolds. Note that this account holds ACC responsible only for predicting upcoming events and actions, not for producing them. In this way the model instantiates a type of forward model applied to high-level action selection; in other domains, forward models have been used to predict the sensory consequences of low-level motor actions (Shadmehr & Wise, 2005; Wolpert & Ghahramani, 2000).

Fig. 2

Schematic of the recurrent neural network model of anterior cingulate cortex (ACC). Each box corresponds to a layer of an Elman network as shown in Fig. 1; circles within boxes represent individual units. ACC takes as input information related to external states of the environment and actions produced by the actor (e.g., a red stimulus, that a lever was pressed, and so on), and uses that information to predict the immediately forthcoming environmental state or action. Note that the input and output layers can represent the environment in different ways, and so can be composed of different units. Differences between the model predictions (the output of the network, delayed by one sequence step; not shown) and what actually occurs (the input to the network) produce discrepancy signals (circle labeled “x”), as revealed in functional magnetic resonance imaging and event-related brain potential experiments. The role of ACC in sequence execution tasks can be inferred from distributed representations encoded by units in the hidden and context layers, as observed in the ensemble activity of ACC neurons in nonhuman animal studies

All of the simulations herein conform to the network architecture and set of computational principles illustrated in Fig. 1. Each model consists of four layers of abstract processing units including an input layer, context layer, hidden layer, and output layer. Each unit transforms, in discrete time steps, the input it receives into output according to the logistic function

$$ f(x) = \frac{1}{1 + e^{-(Gx + B)}} \tag{1} $$

where x is the net input to the unit, G is the gain parameter, and B is the bias parameter. The input layer receives information about external events including stimuli in the environment and actions produced by the agent (or alternatively, “efference copies” of issued motor commands; Angel, 1976). This information is represented “locally” such that individual elements of a task correspond to designated units in the input and output layers. Thus, a sequence is simulated by activating, on each time step, the input unit corresponding to the event at that time (by setting the activation value of that unit to 1). The hidden layer then computes the linear combination of the input unit and context unit activations, weighted by the size of the connecting weights, and transforms the result x through the logistic function (Equation 1). In turn, the output layer computes the linear combination of the hidden layer outputs with the connecting weights, the results of which are also nonlinearly transformed through the logistic function. Units with high activity are assumed to reflect the network's outputs; thus, output layer activation values below a threshold of .9 are set to zero. Crucially, on each time step the output of each hidden unit is copied to a corresponding unit in the context layer, which maintains that information into the following time step. This memory buffer enables the hidden layer to integrate new information received from the environment with information about past events maintained in the context layer.
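The forward computation just described can be sketched as follows. This is an illustration under stated assumptions, not the code used for the reported simulations: the gain and bias are treated as scalars with default values G = 1 and B = 0, the context layer is assumed to start at zero, and the function and variable names (`logistic`, `elman_step`, `W_in`, `W_ctx`, `W_out`) are hypothetical. The layer sizes are borrowed from the rat task described later.

```python
import numpy as np

def logistic(x, gain=1.0, bias=0.0):
    """Equation 1; gain = 1 and bias = 0 are illustrative defaults (assumption)."""
    return 1.0 / (1.0 + np.exp(-(gain * x + bias)))

def elman_step(x, context, W_in, W_ctx, W_out, threshold=0.9):
    """One discrete time step of an Elman network."""
    # Hidden layer: weighted sum of input and context activations, then logistic.
    hidden = logistic(W_in @ x + W_ctx @ context)
    # Output layer: weighted sum of hidden activations, then logistic.
    output = logistic(W_out @ hidden)
    # Output activations below the threshold are set to zero.
    output = np.where(output < threshold, 0.0, output)
    return hidden, output

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 7, 50, 8                    # sizes used for the rat task
W_in = rng.uniform(-0.01, 0.01, (n_hid, n_in))   # small random initial weights
W_ctx = rng.uniform(-0.01, 0.01, (n_hid, n_hid))
W_out = rng.uniform(-0.01, 0.01, (n_out, n_hid))

context = np.zeros(n_hid)        # assumption: empty context at sequence onset
x = np.zeros(n_in)
x[0] = 1.0                       # activate a single input unit
hidden, output = elman_step(x, context, W_in, W_ctx, W_out)
context = hidden.copy()          # one-to-one copy held for the next time step
```

With untrained weights this small, every output unit sits near 0.5 and is therefore zeroed by the threshold; only after training do individual output units exceed .9.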

The job of the RNN is to produce, on each time step, an output corresponding to the input to be received on the subsequent time step. To develop the internal representations that allow for successful task performance, the network weights are adjusted through a gradual, iterative training process. Training is achieved by computing an error term, error, that indicates the discrepancy between the actual output of the network and the desired output, that is, between the predicted event and the event that actually happened,

$$ \mathrm{error} = \mathrm{actual} - \mathrm{predicted} \tag{2} $$

where error, actual and predicted are all vectors of length equal to the number of units in the output layer. The weights are then adjusted in a way that minimizes the Euclidean length of the error vector across iterations according to the back-propagation through time learning algorithm (Williams & Zipser, 1995). On this basis the RNN is able to learn to predict which action will be produced given previous environmental states and actions, even for ambiguous states that afford multiple potential actions. Note that a single RNN can implement multiple sequences, the limit to which would depend on the number and length of the sequences, their degree of shared structure, and the representational capacity of the network.
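As a worked illustration of Equation 2, the sketch below computes the error vector and its Euclidean length for an expected and an unexpected event. The three-unit output layer and the function names are hypothetical, and treating the length of the error vector as a scalar summary of discrepancy is an assumption made here for illustration.

```python
import numpy as np

def prediction_error(actual, predicted):
    """Equation 2: element-wise difference between actual and predicted events."""
    return np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)

def error_length(actual, predicted):
    """Euclidean length of the error vector, the quantity minimized in training."""
    return float(np.linalg.norm(prediction_error(actual, predicted)))

predicted = np.array([0.0, 1.0, 0.0])         # network predicts the second event
expected_event = np.array([0.0, 1.0, 0.0])    # the predicted event occurs
surprising_event = np.array([1.0, 0.0, 0.0])  # a different event occurs instead

print(error_length(expected_event, predicted))    # 0.0
print(error_length(surprising_event, predicted))  # sqrt(2), about 1.414
```

A correctly predicted event yields a zero-length error vector, whereas a mispredicted one-hot event yields a length of the square root of 2, because one unit was predicted but did not occur and another occurred but was not predicted.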

In what follows we first examine how ACC tracks task progression. The RNN implementation of this function predicts that ACC should exhibit formal network properties that should be evident at the neural level, which we explore by comparing our results with neurophysiological data recorded from rats. Second, we compare the discrepancy signals produced by the model with ERP and fMRI indicators of ACC activation in humans. Last, in online supplementary materials, we explore how this signal could be utilized to regulate behavior. All of the following simulations conform to the RNN architecture described above. For each RNN, the number of units in each layer is specific to each problem. Each task was simulated multiple times with weights that were initialized with small random values drawn from a uniform distribution between −0.01 and 0.01. Input units were activated by setting their activation values to 1, and only one input unit was activated at a given time. The learning rate parameter was set to 0.5 across simulations, except where indicated (Williams & Zipser, 1995).

Simulations of multivariate ACC activity

We begin by simulating the role of ACC in sequence execution. For this purpose, we trained an RNN to predict each step of a sequence production task recently conducted by Ma and colleagues (Ma, Hyman, Lindsay, et al., 2014; Ma, Hyman, Phillips, et al., 2014), who trained rats to execute a series of three different lever presses according to three different sequences (see Fig. 3). The input layer of the network contained seven input units, including three “sequence” units that indicated which of the three sequences to execute, three “orientation” units that indicated which of the three levers the animal faced (as specified by textile cues, but which are color-coded as red, yellow, and blue in Fig. 3 for heuristic purposes), and a “press” unit that indicated that the animal recently pressed a lever. Eight output units coded for all possible events that could occur during the sequence, including six units representing the three stimulus locations (left lever, middle lever, right lever) and three stimulus types (red lever, green lever, blue lever), a unit representing a lever press, and a unit representing that the sequence had terminated. The hidden and context layers contained 50 units each. In order to press a lever, the rat was assumed first to orient to the lever and then to press it; correspondingly, for each press the network was trained to predict, in a two-step sequence, first the lever location and lever type simultaneously, and then a lever press.

Fig. 3

Rat sequence production task. Rats were trained to press three different levers according to three different sequences. Each lever was characterized by a unique sensory cue, color-coded here for heuristic purposes as red, yellow and blue. Sequence A consisted of the sequence right, middle, and left lever presses, followed by reward receipt. Sequence B consisted of the sequence middle, left, and right lever presses, followed by reward receipt. Sequence C consisted of the sequence left, right, and middle lever presses, followed by reward receipt. Each sequence was conducted for 10 trials. Note that the order of sensory cues was maintained across the sequences whereas their spatial location changed. Note. Republished with permission of The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, from Tracking Progress Toward a Goal in Corticostriatal Ensembles, Ma, Hyman, Phillips, et al., 34, 2014; permission conveyed through Copyright Clearance Center, Inc. (Color figure online)

Trial progression was simulated across a sequence of seven discrete time steps in which a different input unit was activated at each step (by setting the activation value for that unit to 1): first, the sequence unit representing the sequence to be executed, followed by three iterations of orient and press corresponding to the given sequence. The network was trained to predict what event would occur following each action. For example, if the animal oriented to the red lever, then the network was trained to predict a lever press on the next time step. And if the animal pressed a lever, then the network predicted the location and color of the lever to which the rat would orient on the subsequent step in the sequence. Two hundred RNNs initialized with different weights were trained on the three sequences for 6,000 trials each. All of the networks achieved 100% accuracy.
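The seven-step input stream for a single trial can be sketched as below. The ordering of the seven input units (three sequence units, then three orientation units, then the press unit) and the fixed cue order across sequences are assumptions made for illustration; the helper `trial_inputs` is hypothetical.

```python
import numpy as np

N_SEQ, N_ORIENT = 3, 3
PRESS = N_SEQ + N_ORIENT          # index of the single "press" unit (assumed layout)
N_INPUT = N_SEQ + N_ORIENT + 1    # seven input units in total

def trial_inputs(sequence_index, cue_order=(0, 1, 2)):
    """Build the 7 x 7 one-hot input matrix for one trial.

    Step 1 activates the sequence unit; each of the three lever presses then
    contributes an orientation step followed by a press step. The cue order is
    assumed to be the same for every sequence.
    """
    steps = [sequence_index]
    for cue in cue_order:
        steps.append(N_SEQ + cue)   # orient to the lever carrying this cue
        steps.append(PRESS)         # press that lever
    inputs = np.zeros((len(steps), N_INPUT))
    inputs[np.arange(len(steps)), steps] = 1.0   # one active unit per time step
    return inputs

trial_a = trial_inputs(0)   # input stream for the first sequence
print(trial_a.shape)        # (7, 7)
```

Under this layout the three sequences differ at the input only in the first step, so any sequence-specific prediction made on later steps must draw on information carried forward by the context layer.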

Although the RNNs were not intended to represent how these processes are instantiated at a biological level, we assumed that the abstract neural networks and actual biological networks obey similar computational principles and therefore display comparable patterns of activity. We thus began our analysis by inspecting the properties of individual units in the hidden layers of the RNNs (see Fig. 2; see also Footnote 1). We separately classified hidden units that were exclusively activated by input from the sequence units, orientation units, or response units (see Fig. 4, left panel; note that because the first orientation action in each sequence was taken in response to the sequence cue input, the unit activity that preceded the first orientation action was concurrent with activity elicited by the sequence cue input). Individual units in the hidden layer responded exclusively to the contextual input indicating the sequence to be performed on that trial (16% ± 4%), to orienting to any of the three levers (21% ± 4%), or to pressing any of the levers (41% ± 3%). Overall, nearly 90% of units in the simulations were task responsive; conversely, about 10% of the units were nonresponsive. In addition, because Ma, Hyman, Phillips, et al. (2014) and other groups (e.g., Blanchard et al., 2015; Shidara & Richmond, 2002) have observed ACC neurons that increase in activity throughout the execution of action sequences, we inspected the results for “ramping” units that exhibited comparable behavior. Overall, 14% ± 5% of the units exhibited ramping behavior (see Fig. 4, left panel).

Fig. 4

Overview of hidden unit activity in the sequence production task. Left panel: Proportion of all hidden layer units classified as “sequence” (S), “orientation” (O), “press” (P) and “ramping” (R) units. Error bars indicate standard error of the mean across simulations. Right panel: Sequence trajectories of the first two principal components of hidden layer unit activity (PC1 and PC2). Each point in the space is labeled according to the input cue for that element of the sequence. S = sequence cue; O1, O2, O3 = the first, second, and third orientation responses, respectively; P1, P2, and P3 = the first, second and third lever presses, respectively; S1, S2, and S3 = Sequence 1, Sequence 2, and Sequence 3, respectively. Note that the trajectories for the three sequences are indistinct because they nearly overlap. (Color figure online)

Thus, separate units in the model encoded each of several major elements of task execution. Many units were also sensitive to multiple task events. For example, 10% +/− 3% of all hidden units across all networks were selective to both press actions and orientation actions, 2% +/− 2% of all units were sensitive to both ramping and press actions, and 7.5% +/− 2.5% of all units were sensitive to both ramping and orientation actions. Conversely, around 70% +/− 4% of all the press-sensitive units were uniquely sensitive to presses, and 43% +/− 3.2% of the orientation-sensitive units were uniquely sensitive to orientation.

We expected that this single-unit activity reflected idiosyncratic aspects of the network’s overall function that is encoded across the collective activity of the entire network. To investigate the nature of these representations, we further analyzed the dynamics of the network during task execution using a dimension reduction approach motivated by previous RNN simulations of hierarchically organized action sequences in humans (Botvinick & Plaut, 2002, 2004). In particular, we applied principal components analysis (PCA) to characterize the evolving patterns of activation observed across the network hidden units (see Fig. 2), and computed the distance between the different network representations in the network state space.

PCA was applied to the hidden unit activation levels during task execution across all the 200 networks (50 units per network = 50 variables; three sequences with seven steps per sequence across 200 networks = 4,200 observations). Figure 4, right panel, illustrates the factor scores for the first two principal components, which together account for 70% of the variance for each event in each sequence. Each line (S1, S2, S3) represents the configuration of the network through a sequence of seven events, beginning with the initial sequence cue (S) followed by three cycles of orient (O1, O2, O3) and press (P1, P2, P3). Visual inspection suggests that the network travels through state space in an orderly fashion, with PC1 encoding sequence progression. These two factors encode different aspects of task performance that correspond well with the empirical data (see Fig. 5). Principal Component 1 appears to represent the sequential position of each task event, with later steps in the sequence represented with higher component scores (see Fig. 5a). In the original study by Ma, Hyman, Lindsay, et al. (2014), a similar increase in activity was seen in the firing rates of 107 of 637 ACC neurons (see Fig. 5b). By contrast, Principal Component 2 distinguishes between different events within the same sequence, namely by separating lever presses from orientation responses (see Fig. 5c), as also observed for a smaller collection of 37 ACC neurons (see Fig. 5d). The figures also suggest that the network is relatively insensitive to the specific levers pressed, coding instead for the serial position of each action within each sequence. Consistent with this observation, the Euclidean distance between representations in the 50-dimension hidden unit state space was larger for a given lever press across the three different sequences (“serial”) as compared to a control condition (“control”; see Fig. 5e). 
A comparable analysis of ACC network activity for rats performing the task yielded similar results, indicating larger differences between lever presses of a given lever across the three different sequences (“serial”) compared to a random control condition (“control”; see Fig. 5f and Footnote 2).
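The dimension reduction described above can be sketched in a few lines. The following is a minimal illustration, not the authors' analysis code: it applies PCA via singular value decomposition to a synthetic activation matrix with the same shape as the one described in the text (4,200 observations of 50 units). The synthetic data, the function name `pca_scores`, and the built-in ramp across serial positions are assumptions made purely for illustration.

```python
import numpy as np

def pca_scores(activations, n_components=2):
    """Project observations onto the leading principal components.

    activations: array of shape (n_observations, n_units).
    Returns (scores, explained_variance_ratio) for the leading components.
    """
    centered = activations - activations.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic stand-in for hidden-unit activity (placeholder values only):
# 200 networks x 3 sequences x 7 steps = 4,200 observations of 50 units.
rng = np.random.default_rng(0)
n_networks, n_sequences, n_steps, n_units = 200, 3, 7, 50
step_index = np.tile(np.arange(n_steps), n_networks * n_sequences)

# Assumption: activity drifts along one direction as the sequence progresses.
ramp = np.outer(step_index, rng.normal(size=n_units))
hidden = ramp + rng.normal(scale=0.5, size=(step_index.size, n_units))

scores, explained = pca_scores(hidden)

# Mean PC1 score at each serial position (the sign of a component is arbitrary).
mean_pc1_by_step = np.array(
    [scores[step_index == k, 0].mean() for k in range(n_steps)]
)

# Euclidean distance between two representations in the 50-unit state space.
distance = np.linalg.norm(hidden[0] - hidden[1])
```

Because the synthetic data contain a ramp by construction, the mean PC1 score changes monotonically with serial position; with real hidden-unit activations, whether such structure appears is an empirical question.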

Fig. 5

Simulated and empirical anterior cingulate cortex (ACC) ensemble activity in the sequence production task. Simulated data are presented in the left column and empirical data are presented in the right column. a Factor scores for Principal Component 1 (PC1) of hidden unit activation states as a function of serial position. b Firing rate (FR) of ACC neurons sensitive to sequence progression as a function of serial position. c Factor scores for PC2 of hidden unit activation states as a function of serial position. d FR of ACC neurons sensitive to lever presses as a function of serial position. e Euclidean distance between hidden unit network representations associated with presses on the same lever occurring at different serial positions across sequences (black, “serial”) and a control comparison (white, “control”). f The Mahalanobis distance between population vectors in the multiple single unit activity space for presses on the same lever occurring at different serial positions across sequences (black, “serial”) and a control comparison (white, “control”). ***p < .00001. S, O1, P1, O2, P2, O3, P3, S1, S2, S3 = Same as in Fig. 4; 1st, 2nd, 3rd = the first, second, and third lever presses on each sequence, respectively; Approach = the reward-approach period. Error bars indicate standard error of the mean (SEM). Units in panels a, c, e, and f are arbitrary. Note that the trajectories for the three sequences in panels a and c are indistinct because they nearly overlap. Panels b & d: Republished with permission of The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, from Tracking Progress Toward a Goal in Corticostriatal Ensembles, Ma, Hyman, Phillips, et al., 34, 2014; permission conveyed through Copyright Clearance Center, Inc. Panel f: Adapted by permission from Macmillan Publishers Ltd: Nature Neuroscience, Ma, Hyman, Lindsay, et al., (2014), copyright 2014

The model thus yielded individual units that encoded specific features of the trial and that, as a group, represented higher order aspects of the task sequence. These results suggest that the responses of individual ACC neurons to various task events reflect a deeper role for ACC in tracking the progression of the organism through a series of goal-directed actions, as revealed by the collective activity of the entire ensemble of units. Importantly, these characteristics were not explicitly coded into the model architecture but rather emerged as a natural consequence of recurrent neural network function.

Simulations of univariate ACC activation

Above, we used an RNN to illustrate the role of ACC in predicting each event in the execution of goal-directed action sequences, which revealed distributed patterns of activity across ensembles of ACC neurons that evolve dynamically throughout the sequence. Here, we show that these same principles naturally allow for the detection of unexpected events that fall outside the domain of the sequence under execution. RNNs are inherently predictive mechanisms: On the condition that the output of the network is compared to the actual state of the environment, the networks are trained based on the current state of the environment to predict the immediately subsequent state. This training is enforced with an error signal that quantifies the discrepancy between the internal model predictions and the actual outcomes. Here we compute in the testing phase the Euclidean length D of the same signal used to train the network during the training phase (Equation 2), namely

$$ D = \left\lVert \mathit{error} \right\rVert \tag{3} $$

As we will show, this discrepancy measure aligns with univariate signals recorded in ACC that reflect the collective activity of multiple neurons.
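As a concrete reading of Equation 3, the sketch below computes the Euclidean length of an error vector. It assumes that the error is the elementwise difference between the target values and the output activations; the function name and the two-unit example values are hypothetical and chosen only to show that a rare outcome yields a larger discrepancy than a frequent one.

```python
import math

def discrepancy(predicted, actual):
    """Euclidean length of the error vector (assumed to be actual - predicted)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return math.sqrt(sum((a - p) ** 2 for p, a in zip(predicted, actual)))

# Hypothetical two-unit output: the network expects event A with 0.9 activation.
prediction = [0.9, 0.1]
d_expected = discrepancy(prediction, [1.0, 0.0])    # event A occurs
d_unexpected = discrepancy(prediction, [0.0, 1.0])  # event B occurs instead
```

Under these example values, `d_unexpected` is nine times larger than `d_expected`, because each error component grows from 0.1 to 0.9.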

Electrophysiological signals in humans

Toward this end, we first simulated subject performance in a study by Wessel and colleagues (2012) that combined elements of response compatibility tasks and oddball tasks (see Fig. 6). On each trial subjects responded to an imperative stimulus that indicated which of two buttons to press. The imperative stimulus was flanked on either side by distractors that mapped either to the same response or to the other response. Button presses were followed immediately by the appearance of a “standard” stimulus on most trials, a “novel” stimulus on a small, random subset of correct trials, and a “target” stimulus on only three trials, which the subjects were instructed to count silently.

Fig. 6

Speeded response time task. On each trial subjects were required to fixate on an initial stimulus and then to respond to an imperative stimulus consisting of a central target stimulus that was mapped to either a left or right button press, flanked by two distractor stimuli on each side. A second stimulus appeared shortly after the participant’s response. On most trials the stimulus was an upward-pointing triangle (standard). Three upside-down triangles appeared at equal intervals over the duration of the experiment (target). And a picture of an object or an animal occasionally appeared following correct trials (novel). Participants were instructed to press a third button upon appearance of the target stimuli. Republished with permission of The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, from Surprise and Error: Common Neuronal Architecture for the Processing of Errors and Novelty, Wessel et al., 32, 2012; permission conveyed through Copyright Clearance Center, Inc.

We simulated the task with an RNN composed of nine input units, corresponding to the eight possible imperative stimuli in the task (four stimuli in which the responses were compatible with the flanker stimuli, and four stimuli in which the responses were incompatible with the flanker stimuli), and one unit indicating that the response had been completed; four output units, corresponding to the left and right button presses and the novel and standard stimuli; and 10 units in the hidden and context layers, each. Each trial was represented in two time steps: On the first time step the input corresponding to the stimulus for that trial was activated, and on the second time step the input unit corresponding to the selected response was activated. Networks were trained to predict on each trial the response that was executed to the imperative stimulus and the stimulus that would occur following response completion. Five hundred networks were trained 3,000 times each on data derived from the subject accuracy levels reported in Wessel et al. (2012). For example, if accuracy for a particular stimulus was 90%, then the model was trained to predict the correct response on 90% of these trials and the incorrect response on the remaining 10% of the trials, randomly interleaved.
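To make the layer sizes and the two-step trial structure concrete, here is a minimal sketch of a forward pass through a simple recurrent (Elman-style) network with nine input units, ten hidden units, ten context units, and four output units. Several details are assumptions rather than facts from the text: the logistic activation function, the absence of bias terms, the untrained random weights, the class and method names, and the particular unit indices used for the stimulus and response-completion inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ElmanNetwork:
    """Simple recurrent network whose context layer copies the previous hidden state."""

    def __init__(self, n_input, n_hidden, n_output, seed=0):
        rng = np.random.default_rng(seed)
        # Untrained random weights, for illustration only.
        self.w_in = rng.normal(scale=0.5, size=(n_hidden, n_input))
        self.w_context = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        self.w_out = rng.normal(scale=0.5, size=(n_output, n_hidden))
        self.context = np.zeros(n_hidden)

    def reset(self):
        """Clear the context layer at the start of a trial."""
        self.context = np.zeros_like(self.context)

    def step(self, x):
        """Process one time step and return the output activations."""
        hidden = sigmoid(self.w_in @ x + self.w_context @ self.context)
        self.context = hidden.copy()  # context for the next time step
        return sigmoid(self.w_out @ hidden)

net = ElmanNetwork(n_input=9, n_hidden=10, n_output=4)

# Hypothetical unit assignment: indices 0-7 are imperative stimuli,
# index 8 signals that the response has been completed.
stimulus = np.eye(9)[2]
response_done = np.eye(9)[8]

net.reset()
prediction_step1 = net.step(stimulus)       # predicts the upcoming response
prediction_step2 = net.step(response_done)  # predicts the upcoming stimulus
```

In a trained network, the four outputs at each step would be compared with the target values for that step to obtain the discrepancy signal.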

After training, we tested the response of the network to trials with correct responses, error responses, standard stimuli, and novel stimuli. The discrepancy signal at each time step was quantified as the Euclidean distance in a four-dimension state space between the output unit activation levels (indicating what the network predicted) and the output unit target values (indicating what actually occurred; see Equation 3). Response errors were evaluated at Step 1 and the novel stimuli at Step 2. The network produced relatively larger discrepancy signals to errors and to novel stimuli, which occurred less often than correct responses and standard stimuli, respectively, and were therefore relatively unexpected (see Fig. 7, left panel).

Fig. 7

Simulated and empirical human event-related brain potential (ERP) data. Left panel: Simulated ERP amplitudes. Discrepancy signals are shown for correct and error responses on Step 1 of the sequence for standard trials, and on Step 2 of the sequence of standard and novel trials. Conditions are labeled according to their associated ERP components: correct-related negativity (CRN), error-related negativity (ERN), and N2. Units are arbitrary. Error bars show standard error of the mean. Right panel: ERPs time-locked to response onset, recorded over central areas of the scalp (channel Cz). Note that the novel and standard stimuli appear 10 ms after the response. Separate ERPs are presented for error trials (solid black), novel correct trials (solid gray), and standard correct trials (dashed grey). Negative is plotted up by convention. The ERN peaks on error trials about 50 ms following the response. A smaller negative deflection occurring at the same time on correct (standard) trials is termed the CRN (Gehring et al., 2012). The N2 peaks about 300 ms following the onset of the novel stimuli. Republished with permission of The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, from Surprise and Error: Common Neuronal Architecture for the Processing of Errors and Novelty, Wessel et al., 32, 2012; permission conveyed through Copyright Clearance Center, Inc.

Notably, this simple measure conforms to the behavior of two ERP components related to surprise and error processing (see Fig. 7, right panel). As commonly observed, Wessel and colleagues (2012) found a large ERN following error commission to the imperative stimuli and a large N2 elicited by the infrequently occurring novel stimuli (see Fig. 7, right panel). Our simulations agree with these findings but suggest a more nuanced explanation for their functional significance: The N2 and ERN indicate that the task sequence is not unfolding as predicted.

fMRI BOLD signals in humans

Importantly, although model details such as the number of units in each layer necessarily vary from simulation to simulation, the same model architecture and computational principles were used to simulate both the behavior of rats in a sequence production task and of humans in a speeded reaction time task. Taken together, the results suggest that ACC tracks the execution of goal-directed action sequences as a distributed pattern of activity across multiple cells and produces discrepancy signals when ongoing events deviate from the given plan.

These ideas share much in common with an influential model of ACC function called the predicted response-outcome (PRO) model which, rather than holding ACC responsible for detecting unexpected events, holds that it detects the omission of expected events (Alexander & Brown, 2011). To compare the two models directly, we simulated the fMRI BOLD response to events in the stop-change task (Brown & Braver, 2005) that have previously been simulated using the PRO model (Alexander & Brown, 2011). These data therefore provide a useful benchmark for comparison.

On each trial of the stop-change task, subjects are required to press a button as quickly as possible to an imperative stimulus, while occasionally inhibiting that response in favor of an alternative choice when presented with a second “change” stimulus; crucially, the timing between the imperative stimulus and the change stimulus is controlled such that some trials are harder than others. Participants are first presented with a cue indicating the trial difficulty (low error vs. high error), followed by an arrow indicating which of two buttons to press (see Fig. 8). On the change trials, the arrow is followed by a second arrow that indicates to participants to cancel the initial response and press the other button; the remaining trials are called “go trials.” Go and change trials occur with 67% and 33% probability, respectively. Trial difficulty is determined for each subject individually according to a staircase procedure that adjusts the time between the imperative cue and the change signal (the “change signal delay”; CSD), enforcing a 50% error rate on difficult change trials and a 4% error rate on easy change trials.

Fig. 8

The stop-change task. See text for a description. From “Learned Predictions of Error Likelihood in the Anterior Cingulate Cortex,” by J. W. Brown and T. S. Braver, 2005, Science, 307, 1118–1121. Reprinted with permission from AAAS

Figure 9a presents the results of the PRO model simulation to key task events, which parallel the empirical BOLD signal findings (not shown). As can be seen, the model yields four essential findings. First, larger signals are produced to correct trials requiring a change in response compared to correct trials requiring no such change, yielding a contrast related to response conflict. Second, larger signals are elicited by errors than by correct responses, yielding an error signal. Third, errors occurring on less difficult trials (“low error likelihood”; LEL) produce larger signals than errors occurring on harder trials (“high error likelihood”; HEL). Finally, larger signals are elicited by correct go-responses on HEL trials compared to correct go-responses on LEL trials, called the “error-likelihood” effect. As described elsewhere, these signals all result in the PRO model from the omission of expected events (Alexander & Brown, 2011).

Fig. 9

Simulated human blood-oxygen-level dependent (BOLD) response to events in the stop-change task. a Simulated BOLD response by the predicted response-outcome model of anterior cingulate cortex (Alexander & Brown, 2011). The conflict effect is derived by comparing correct change trials with correct go trials. The error effect is derived by comparing error trials with correct trials. The error unexpectedness effect is derived by comparing errors on high error-likelihood (HEL) trials with errors on low error-likelihood (LEL) trials. The error likelihood effect is derived by comparing HEL trials with LEL trials. Adapted from “Medial Prefrontal Cortex as an Action-Outcome Predictor,” by W. H. Alexander and J. W. Brown, 2011, Nature Neuroscience, 14, 1338–1344. Copyright 2011 by Macmillan Publishers Ltd. Adapted with permission. b Simulated BOLD response by the recurrent neural network model for the same four comparisons shown in the left panel. Cont = contrast; Err = error; Cor = correct. High and Low indicate conditions with high or low conflict, error expectancy, and error likelihood, respectively, for these three comparisons. Error bars indicate standard error of the mean (SEM)

We simulated task performance with an RNN composed of five input units, corresponding to the two difficulty cues (hard, easy), the two arrow directions (pointing left, pointing right), and the change signal; five output units, corresponding to the two arrow directions, the change stimulus, and the two responses (left button press, right button press); and five units for each of the hidden and context layers. Each trial was simulated as a sequence of three time steps that indicated the difficulty cue (Step 1), the arrow direction (Step 2), and the appearance of the change stimulus (Step 3). Based on this input the network was trained to predict the direction of the subsequent arrow stimulus (Step 1), whether or not the change signal would appear (Step 2), and the forthcoming button press (Step 3). Two hundred networks starting with different initial weights were trained for 8,000 trials each, with target values derived from the empirical error rates for each condition, namely, 4% errors on easy change trials and 50% error on hard change trials, pseudorandomly interspersed. As with the actual experiment, 33% of all of the trials were change trials (Brown & Braver, 2005). In keeping with evidence that the BOLD response in this task is sensitive to multiple events within each trial (Nieuwenhuis, Schweizer, Mars, Botvinick, & Hajcak, 2007), the discrepancy signal on each trial was taken as the average discrepancy across the three-step sequence.
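The trial structure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the ordering of the five input units, the function names, and the use of an all-zero input at Step 3 on go trials (when no change stimulus appears) are hypothetical choices, not details given in the text. The averaging of the per-step discrepancies follows the description above.

```python
INPUT_UNITS = ["hard", "easy", "left", "right", "change"]  # assumed ordering

def one_hot(name):
    """Input vector with a single active unit."""
    return [1.0 if unit == name else 0.0 for unit in INPUT_UNITS]

def encode_trial(difficulty, arrow, is_change):
    """Return the three input vectors for one trial: cue, arrow, change signal."""
    if difficulty not in ("hard", "easy"):
        raise ValueError("difficulty must be 'hard' or 'easy'")
    if arrow not in ("left", "right"):
        raise ValueError("arrow must be 'left' or 'right'")
    # Assumption: no input unit is active at Step 3 on go trials.
    step3 = one_hot("change") if is_change else [0.0] * len(INPUT_UNITS)
    return [one_hot(difficulty), one_hot(arrow), step3]

def trial_discrepancy(step_discrepancies):
    """Average the discrepancy signal across the three steps of a trial."""
    if len(step_discrepancies) != 3:
        raise ValueError("expected one discrepancy value per step")
    return sum(step_discrepancies) / 3

change_trial = encode_trial("hard", "left", is_change=True)
go_trial = encode_trial("easy", "right", is_change=False)
example_average = trial_discrepancy([0.3, 0.6, 0.9])  # hypothetical step values
```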

Across three of the four contrasts, the discrepancy signals produced by the RNN qualitatively replicated the output of the PRO model (and the associated empirical observations of ACC BOLD response; see Fig. 9b). This correspondence stems from the fact that both models operate on similar principles. The activity level of each output unit of the RNN reflects the probability that a particular event will occur at that time step, as determined by the frequency of occurrence of that event at that time during the course of training. Thus, whereas the PRO model produces a stronger response on error trials to the unexpected omission of the correct response, the RNN model produces a larger discrepancy signal to the unexpected commission of the error itself. For the same reason, errors that are unexpected produce a larger discrepancy signal than errors that are expected. And because correct responses occur less frequently on change trials than on go trials, they elicit larger discrepancy signals on the former compared to the latter, producing a conflict effect.
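The reasoning in this paragraph can be checked with simple arithmetic. The sketch below is an idealization, not a run of the trained RNN: it assumes that each output activation exactly equals the frequency of the corresponding event and that each prediction involves only two mutually exclusive outcomes. The function names are hypothetical; the probabilities (33% change trials, 50% and 4% error rates) are taken from the task description.

```python
import math

def discrepancy(predicted, outcome):
    """Euclidean length of the difference between outcome and prediction."""
    return math.sqrt(sum((o - p) ** 2 for p, o in zip(predicted, outcome)))

def surprise(p_event):
    """Discrepancy when an event with probability p_event actually occurs.

    Assumes two mutually exclusive outcomes whose output units match
    their frequencies of occurrence.
    """
    return discrepancy([p_event, 1.0 - p_event], [1.0, 0.0])

P_CHANGE = 0.33                          # proportion of change trials
P_ERROR = {"HEL": 0.50, "LEL": 0.04}     # error rates on change trials

# Conflict contrast: the change signal is rarer than its absence.
conflict_high = surprise(P_CHANGE)
conflict_low = surprise(1.0 - P_CHANGE)

# Error contrast and error unexpectedness contrast on change trials.
error_hel = surprise(P_ERROR["HEL"])
error_lel = surprise(P_ERROR["LEL"])
correct_lel = surprise(1.0 - P_ERROR["LEL"])
```

Under this idealization the discrepancy equals the square root of 2 multiplied by one minus the event probability, so rarer events always yield larger signals: `conflict_high > conflict_low`, `error_lel > correct_lel`, and `error_lel > error_hel`.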

Unlike the PRO model, however, the RNN does not consistently predict a larger discrepancy signal to correct responses on go trials with high error likelihood compared to correct responses on go trials with low error likelihood. This result obtains because the go signal is equally predictive of a correct outcome on both HEL and LEL trials, yielding discrepancy signals of about equal magnitude. However, we note that the empirical result itself is controversial, which we address in the discussion below.

Taken together, our simulations suggest that ACC tracks the execution of extended behaviors and produces large discrepancy signals to unexpected events that deviate from the task domain. In online supplementary materials, we illustrate how these discrepancy signals could be used to regulate task performance, as discussed below.

Discussion

ACC research has been complicated by a plethora of challenging empirical findings such as the common observation that ACC neurons tend to respond to multiple task-related events (Ebitz & Hayden, 2016; Holroyd & Yeung, 2011). Here, we elucidate these findings with a computational approach that is motivated by the long-standing principle in cognitive psychology (Rumelhart & McClelland, 1986) and computational neuroscience (Churchland & Sejnowski, 1992) that neural functions are encoded as distributed representations across ensembles of neurons (e.g., Fusi et al., 2016; Yuste, 2015). Based in part on our argument that ACC is concerned with the execution of extended, goal-directed action sequences, which we have articulated elsewhere (Holroyd & Yeung, 2012; see also Holroyd & McClure, 2015), we trained a model to predict the behaviors of human and nonhuman animals in sequential tasks. The simulations illustrate how the collective activity of ACC neurons, encoded as a distributed representation across ensembles of units, can track the progression of an agent throughout the execution of goal-directed action sequences, in line with observations of ensemble ACC activity in rats and other animals (e.g., Baeg et al., 2003; Balaguer-Ballester et al., 2011; Blanchard et al., 2015; Cowen et al., 2012; Cowen & McNaughton, 2007; Durstewitz et al., 2010; Euston & McNaughton, 2006; Fujisawa et al., 2008; Hayden, Pearson, et al., 2011; Hyman et al., 2012, 2013; Lapish et al., 2008; Ma, Hyman, Lindsay, et al., 2014; Ma, Hyman, Phillips, et al., 2014; Remondes & Wilson, 2013). 
Further, because the model was trained to predict task events as they occurred, the simulations were sensitive to unexpected deviations from each sequence, consistent with ubiquitous evidence from fMRI and ERP studies that ACC responds to surprising or conflict-eliciting events (Alexander & Brown, 2011; Botvinick et al., 2001, 2004; Braver et al., 2001; Cavanagh et al., 2012; Donkers et al., 2005; Ferdinand et al., 2012; Ferdinand & Opitz, 2014; Folstein & Van Petten, 2008; Gehring et al., 1992; HajiHosseini & Holroyd, 2013; Holroyd, 2004; Holroyd et al., 2008; Jessup et al., 2010; Jia et al., 2007; Kopp & Wolff, 2000; Mathalon et al., 2003; Nee et al., 2011; Oliveira et al., 2007; O’Reilly et al., 2013; Sams et al., 1983; Silvetti et al., 2011; Y. Wang et al., 2004; Warren & Holroyd, 2012; Wessel et al., 2012; Yeung, 2013; Yeung et al., 2004).

Although a handful of computational models have simulated ACC function previously (e.g., Alexander & Brown, 2011; Botvinick et al., 2001; Brown & Braver, 2005; Holroyd & Coles, 2002, 2008; Holroyd & McClure, 2015; Holroyd, Yeung, Coles, & Cohen, 2005; Khamassi et al., 2011; Silvetti et al., 2011; Verguts, 2017; Verguts et al., 2015; Yeung et al., 2004), to our knowledge none of these models has examined the essential role of ACC in encoding task execution as distributed representations that evolve dynamically with time (Balaguer-Ballester et al., 2011; Caracheo et al., 2013; Durstewitz et al., 2010; Lapish et al., 2008). Our simulations are thus the first to provide a formal account of this crucial aspect of ACC function. In so doing, the simulations also yield, without any additional assumptions, the surprise, conflict, and error signals that are the mainstay of many previous ACC models (e.g., Alexander & Brown, 2011; Botvinick et al., 2001; Silvetti et al., 2011), but explain these signals as resulting from unexpected deviations in the execution of goal-directed action sequences. Consonant with recent calls for the development of theoretical frameworks that bridge studies between human and nonhuman animals (Badre, Frank, & Moore, 2015), these simulations provide a common framework for relating single-cell findings associated with task execution in nonhuman animals with ubiquitous electrophysiological and neuroimaging findings in humans.

Relationship to the PRO model

We simulated the fMRI BOLD response in the stop-change task (Brown & Braver, 2005) as a benchmark for comparison with one of the most comprehensive and successful models of ACC function, the PRO model (Alexander & Brown, 2011; see also Silvetti et al., 2011, for comparable findings). Like the PRO model, the simulations successfully accounted for what are arguably the most salient aspects of ACC activity—conflict, error, and surprise signals (see Fig. 9b). Yet in contrast to the PRO model, our simulations failed to capture the error likelihood effect previously observed in the original stop-change study (Brown & Braver, 2005). That effect is specifically associated with the contrast between correct-go, HEL trials and correct-go, LEL trials in the stop-change task (Brown & Braver, 2005; see Fig. 9a). This divergence between the PRO and RNN models appears to reflect a fundamental difference between their computational properties. The PRO model accounts for the error likelihood effect with separate units for the correct and incorrect response that are simultaneously activated on HEL trials; because subsequent production of the correct response is also associated with the omission of the expected incorrect response, a larger surprise signal is produced on HEL trials than on LEL trials, the latter of which predict mostly correct responses. By contrast, the RNN model predicts what should occur following the appearance of the initial go stimulus, which is either the change stimulus on change trials or the response on go trials. Because these two events occur about equally often for both the HEL and LEL conditions—for both conditions, the change stimulus occurs on 33% of the trials, and the correct response follows on nearly all of the go trials—the RNN produces about equally large discrepancy signals to go-correct trials across the HEL and LEL conditions.
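The argument about go-correct trials can be illustrated with a small worked example. As before, this is an idealized sketch that assumes output activations equal event frequencies; it is not the trained network. The go-trial accuracy of 0.98 is a hypothetical stand-in for "nearly all," and the names are illustrative. The point is that the only quantity that differs between the HEL and LEL conditions, the error rate on change trials, does not enter the prediction made after the go stimulus.

```python
import math

def discrepancy(predicted, outcome):
    """Euclidean length of the difference between outcome and prediction."""
    return math.sqrt(sum((o - p) ** 2 for p, o in zip(predicted, outcome)))

def go_correct_discrepancy(condition):
    """Discrepancy for a correct go response following the go stimulus.

    Predicted events after the go stimulus: change stimulus, correct
    response, erroneous response. The change-trial error rate is
    deliberately unused, since it does not affect this prediction.
    """
    p_change = condition["p_change"]
    p_go = 1.0 - p_change
    accuracy = condition["go_accuracy"]
    predicted = [p_change, p_go * accuracy, p_go * (1.0 - accuracy)]
    return discrepancy(predicted, [0.0, 1.0, 0.0])

GO_ACCURACY = 0.98  # hypothetical value for "nearly all" go trials

conditions = {
    "HEL": {"p_change": 0.33, "go_accuracy": GO_ACCURACY, "change_error_rate": 0.50},
    "LEL": {"p_change": 0.33, "go_accuracy": GO_ACCURACY, "change_error_rate": 0.04},
}

hel = go_correct_discrepancy(conditions["HEL"])
lel = go_correct_discrepancy(conditions["LEL"])
error_likelihood_effect = hel - lel
```

Because the change probability and go accuracy are the same in both conditions, the two discrepancies are identical and the error likelihood contrast is zero under this idealization.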

With respect to predicting univariate ACC signals, in our view the PRO and RNN models are more alike than different. Nevertheless, the error likelihood effect may constitute diagnostic evidence between the two models. In this regard it is worth noting that the effect has received less empirical support than the other observations of ACC function related to conflict, errors, and surprise. Although the effect is reproducible (Alexander & Brown, 2010), it appears to be relatively weak (Brown, 2009), and varies substantially across individuals in terms of their sensitivity to risk (Alexander, Fukunaga, Finn, & Brown, 2015; Brown & Braver, 2007; cf. Brown & Braver, 2008). Further, one laboratory failed to replicate the result in two fMRI experiments (Nieuwenhuis, Schweizer, Mars, Botvinick, & Hajcak, 2007) and also found ERP evidence that is inconsistent with the theory (Yeung & Nieuwenhuis, 2009; see also Hammer, Rautzenberg, Heldmann, & Münte, 2011). These considerations suggest that the error likelihood effect warrants further investigation.

By contrast, only the RNN model has yet been shown to account for the type of multivariate activity illustrated in Figs. 4 and 5. A useful exercise would entail training the PRO, RNN, and other models of ACC function on the same task and then applying representational similarity analysis to compare the network properties of each of the models against empirical data (Kriegeskorte & Kievit, 2013; Kriegeskorte, Mur, & Bandettini, 2008).

Discrepancy signal function

Although these simulations illustrate the role of ACC in tracking the execution of goal-directed action sequences and in detecting discrepancies between predicted and actual events in the sequence, they do not apply the discrepancy signals for any functional purpose. A perplexing characteristic of ACC is that despite clear evidence that such discrepancy signals are correlated with adaptive adjustments to behavior, suggesting that ACC is involved in behavioral regulation (Cavanagh & Shackman, 2015), lesions to ACC only minimally impair these adjustments, indicating that the trial-to-trial changes in performance are actually carried out by other brain areas (Holroyd & Yeung, 2012). For this reason, we have argued elsewhere that ACC is responsible for motivating control over the execution of extended action sequences rather than for adjusting behavior from one moment to the next (Holroyd & Yeung, 2012), a property that we have implemented in a previous model of ACC (Holroyd & McClure, 2015). In keeping with this perspective, additional simulations (see online Supplementary Materials) suggest that the discrepancy signals might regulate a control signal that is used by an action production system to produce actions that comport with the task objectives, for example, by providing contextual information that protects against the production of capture errors and other action slips during the execution of goal-directed action sequences. Alternatively, the discrepancy signals might promote exploration of alternative task strategies (Donoso, Collins, & Koechlin, 2014; Kuwabara et al., 2014; Schuck et al., 2015; Tervo et al., 2014), in parallel to a proposed switching mechanism regulated by tonic dopamine levels in ACC (Holroyd & McClure, 2015), or could provide an update or training signal to the predictive model mediated by ACC (e.g., O’Reilly et al., 2013). We believe that the evidence to date is insufficient to decide between these possibilities, which in any case are not mutually exclusive.

Future directions

A promising avenue for investigation would entail instantiating the model in a more biologically realistic network that incorporates finer temporal dynamics into the recurrent activity (Sussillo, 2014), in line with previous efforts to simulate the role of prefrontal cortex in response generation and decision making (Erlich & Brody, 2013; Mante, Sussillo, Shenoy, & Newsome, 2013; Moody, Wise, di Pellegrino, & Zipser, 1998; Nakahara & Doya, 1998; X.-J. Wang, 2008). These networks typically exhibit complex temporal dynamics that are amenable to investigation using an arsenal of mathematical tools from nonlinear dynamical systems analysis (e.g., Durstewitz & Deco, 2008; Sussillo, 2014; Sussillo & Barak, 2013; Wolf, Engelken, Puelma-Touzel, Weidinger, & Neef, 2014). Although such networks are notoriously difficult to train (Sussillo, 2014), recent advances (e.g., Martens, 2010; Martens & Sutskever, 2011; Song, Yang, & Wang, 2016) have been encouraging (Abbott, DePasquale, & Memmesheimer, 2016; Ardid & Wang, 2013; Song et al., 2016; Sussillo & Abbott, 2009). For example, networks of sparsely and recurrently connected spiking neurons have been trained using a biologically realistic reward signal to compute rule-specific decisions based on information maintained in working memory (Hoerzer, Legenstein, & Maass, 2014), though it remains an open question whether such principles can be used to simulate the neural mechanisms of hierarchically organized action sequences (for efforts in this direction, see Namikawa, Nishimoto, & Tani, 2011; Nishimoto & Tani, 2004; Rao & Sejnowski, 2000; Starzyk & He, 2007; Yamashita & Tani, 2008).

In other work, we have simulated the role of ACC in learning the value of tasks and in selecting among them based on principles of hierarchical reinforcement learning (HRL; Holroyd & McClure, 2015). An obvious next step would therefore be to integrate the RNN model of ACC into the HRL framework. A hybrid model of ACC would see the action policy for each option implemented in a separate RNN and would select and execute the RNNs based on their learned reward values. This approach could help resolve a debate about whether the execution of hierarchically organized action sequences is better represented with connectionist-style RNNs (Botvinick, 2005; Botvinick & Plaut, 2002, 2004) or with rule-following symbolic processes that explicitly organize action sequences according to goals and sub-goals (Cooper & Shallice, 2000, 2006). An integrated model would allow for options to be flexibly combined in novel configurations according to principles of HRL (Hengst, 2012) while retaining the strengths of the connectionist implementation, such as the ability to generalize learned structure across contextual domains (Botvinick & Plaut, 2002, 2004). These two approaches are not actually incompatible (Cooper & Shallice, 2006), and recent efforts have seen the development of hybrid models that explicitly encode goal and subgoal states in RNNs (Cooper, Ruh, & Mareschal, 2014). Related work has investigated how an RNN can implement an “actor-critic” architecture that executes action sequences according to principles of reinforcement learning (Ruh, Cooper, & Mareschal, 2005; see also Cooper & Glasspool, 2001).
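To make the proposed hybrid architecture concrete, the following minimal sketch illustrates one way that option selection and option execution could be combined. It is an illustration under stated assumptions and not an implementation of the model reported here or of the HRL model of Holroyd and McClure (2015). Each option's action policy is represented by a small, untrained Elman-style recurrent network with fixed random weights; an option is chosen by a softmax over learned option values; and the value of the chosen option is updated with a simple delta rule. All names (`TinyRNN`, `softmax_choice`, `run_hybrid`), the reward probabilities, the learning rate, and the temperature are hypothetical, and for simplicity the reward depends only on which option was selected, not on the actions it produced.

```python
import math
import random


class TinyRNN:
    """Stand-in for one option's action policy (hypothetical, untrained).

    An Elman-style recurrent network with fixed random weights that feeds
    its previous action back as input and emits one action per step.
    """

    def __init__(self, n_actions, n_hidden, rng):
        self.n_actions = n_actions
        self.n_hidden = n_hidden

        def rand_matrix(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_in = rand_matrix(n_hidden, n_actions)   # previous action -> hidden
        self.w_rec = rand_matrix(n_hidden, n_hidden)   # hidden -> hidden
        self.w_out = rand_matrix(n_actions, n_hidden)  # hidden -> action scores

    def run(self, n_steps):
        """Execute the option to completion and return its action sequence."""
        hidden = [0.0] * self.n_hidden
        action = 0  # arbitrary start token
        sequence = []
        for _ in range(n_steps):
            hidden = [
                math.tanh(self.w_in[j][action]
                          + sum(self.w_rec[j][k] * hidden[k]
                                for k in range(self.n_hidden)))
                for j in range(self.n_hidden)
            ]
            scores = [sum(self.w_out[a][j] * hidden[j]
                          for j in range(self.n_hidden))
                      for a in range(self.n_actions)]
            action = max(range(self.n_actions), key=scores.__getitem__)
            sequence.append(action)
        return sequence


def softmax_choice(values, temperature, rng):
    """Sample an option index with probability proportional to exp(value / T)."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def run_hybrid(n_episodes=500, seed=0):
    """Select options by learned value, execute them, and update the values."""
    rng = random.Random(seed)
    options = [TinyRNN(n_actions=3, n_hidden=4, rng=rng) for _ in range(3)]
    reward_prob = [0.2, 0.8, 0.5]  # assumed payoff probability per option
    values = [0.0] * len(options)
    learning_rate, temperature = 0.1, 0.2  # assumed parameters
    for _ in range(n_episodes):
        chosen = softmax_choice(values, temperature, rng)
        options[chosen].run(n_steps=4)  # the selected RNN produces the sequence
        reward = 1.0 if rng.random() < reward_prob[chosen] else 0.0
        values[chosen] += learning_rate * (reward - values[chosen])
    return values, options


if __name__ == "__main__":
    learned_values, _ = run_hybrid()
    print(learned_values)
```

In a fuller treatment the recurrent networks would be trained on their respective action sequences, and the reward would depend on the sequence that was actually executed, so that discrepancies detected during execution could bear on subsequent option selection.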

Conclusion

Decades of research on ACC have revealed a colorful but bewildering landscape of empirical findings. Our simulations show the proverbial forest for the trees, where the trees consist of observations of ACC activity to individual task events and the forest is the dynamically evolving relationship between these observations. This proposal is cast in a formal theoretical framework that accounts for existing ensemble activity in ACC of nonhuman animals as well as for conflict, surprise, and related signals commonly observed in ACC in human functional neuroimaging and electrophysiological experiments. In the context of previous work on HRL (Holroyd & McClure, 2015; Holroyd & Yeung, 2012), these efforts point toward a unified account of the role of ACC in action selection and performance monitoring.