Distributed representations of action sequences in anterior cingulate cortex: A recurrent neural network approach
Anterior cingulate cortex (ACC) has been the subject of intense debate over the past 2 decades, but its specific computational function remains controversial. Here we present a simple computational model of ACC that incorporates distributed representations across a network of interconnected processing units. Based on the proposal that ACC is concerned with the execution of extended, goal-directed action sequences, we trained a recurrent neural network to predict each successive step of several sequences associated with multiple tasks. In keeping with neurophysiological observations from nonhuman animals, the network yields distributed patterns of activity across ACC neurons that track the progression of each sequence, and in keeping with human neuroimaging data, the network produces discrepancy signals when any step of the sequence deviates from the predicted step. These simulations illustrate a novel approach for investigating ACC function.
Keywords: Anterior cingulate cortex; Recurrent neural network; Sequence learning; Surprise
A surge of research interest over the last 2 decades has resulted in widespread agreement that anterior cingulate cortex (ACC) is partly responsible for cognitive control and decision making (e.g., Euston, Gruber, & McNaughton, 2012; Ridderinkhof, van den Wildenberg, Segalowitz, & Carter, 2004; Rolls, 2009; Rushworth & Behrens, 2008; Rushworth, Kolling, Sallet, & Mars, 2012; Shackman et al., 2011; Shenhav, Botvinick, & Cohen, 2013; Walton & Mars, 2007), but the development of a formal, comprehensive theory about its function has remained out of reach (e.g., Alexander & Brown, 2011; Botvinick, Braver, Barch, Carter, & Cohen, 2001; Brown & Braver, 2005; Holroyd & McClure, 2015; Khamassi, Lallee, Enel, Procyk, & Dominey, 2011; Silvetti, Seurinck, & Verguts, 2011; Verguts, 2017; Verguts, Vassena, & Silvetti, 2015; Yeung, Botvinick, & Cohen, 2004). Efforts in this direction have been hindered by a complex panoply of empirical findings. For example, a salient observation across neurophysiological studies is that individual cells in ACC tend to respond to multiple task events, which seems to implicate a role for ACC in everything (Ebitz & Hayden, 2016). Thus the exact computational function of ACC—and even whether it has one (Bush, 2009)—remains controversial (Holroyd & Yeung, 2011).
Here we present a simple computational model of ACC that illuminates many of these disparate findings. Our approach is motivated by three considerations. First, the model is based on the long-standing idea in cognitive psychology that cognitive processes are encoded across networks of interconnected processing units (Rumelhart & McClelland, 1986). Although this perspective has recently gained currency in behavioral neuroscience, which is increasingly emphasizing the collective activity of ensembles of neurons rather than the tuning properties of individual cells (e.g., Fusi, Miller & Rigotti, 2016; Rigotti et al., 2013; Yuste, 2015), to our knowledge it has yet to be applied to formal models of ACC. Second, the model takes into account ubiquitous findings from human neuroimaging and electrophysiological studies that ACC is sensitive to response conflict, errors, and surprising events (e.g., Botvinick et al., 2001). Third, the model is based on our previous argument, derived partly from lesion data in humans and other animals, that ACC is concerned with the execution of extended, goal-directed action sequences (Holroyd & Yeung, 2012). Here we develop that idea by proposing that the ACC predicts each successive step in the sequence. On this account, the model yields distributed patterns of activity across ACC neurons that track the progression of action sequences (in keeping with the neurophysiological data), and produces a discrepancy signal when any step of a sequence deviates from the predicted step (in keeping with the neuroimaging data).
We begin by providing a very brief overview of neurophysiological data related to ACC. We then home in on a series of studies that suggest a special role for ACC in the execution of goal-directed action sequences. Next, we describe principles for modeling action sequences based on recurrent neural networks (RNNs), and apply these principles to simulate the activity of ACC neurons of rats performing a sequential task. Consistent with empirical findings, the individual units in the model respond to multiple task events, whereas distributed patterns of activity across the units follow the progression of each sequence. We also illustrate how these same principles account for the surprise, error and conflict signals produced in ACC when humans perform speeded response time (RT) tasks, as revealed by human event-related brain potential (ERP) and functional magnetic resonance imaging (fMRI) experiments. Finally, in supplementary materials, we illustrate how these signals could be applied to regulate action execution; these simulations suggest that ACC should be especially important for maintaining contextual information that disambiguates related action sequences. These observations are discussed in terms of our proposal that the ACC is responsible for motivating the execution of extended behaviors (Holroyd & Yeung, 2011, 2012).
Neurophysiological and neuroimaging observations of ACC
A major complicating factor in understanding ACC is the wide range of neurophysiological findings that seem to implicate it in most task events (Ebitz & Hayden, 2016). For example, ACC neurons are seen to respond to stimulus events (Nishijo et al., 1997), motor activity (Backus, Ye, Russo, & Crutcher, 2001; Russo, Backus, Ye, & Crutcher, 2002; Shima et al., 1991), rewards (Amiez, Joseph, & Procyk, 2005; Kennerley, Behrens, & Wallis, 2011; Luk & Wallis, 2009; Sallet et al., 2007; Seo & Lee, 2008), errors (Shen et al., 2015; Totah, Kim, Homayoun, & Moghaddam, 2009; C. Wang, Ulbert, Schomer, Marinkovic, & Halgren, 2005), prediction errors and surprise signals (Bryden, Johnson, Tobia, Kashtelyan, & Roesch, 2011; Hayden, Heilbronner, Pearson, & Platt, 2011; Klavir, Genud-Gabai, & Paz, 2013; Matsumoto, Matsumoto, Abe, & Tanaka, 2007), pain and its anticipation (Koyama, Tanaka, & Mikami, 1998) and, more controversially, in conflict (Davis et al., 2005; Ebitz & Platt, 2015; Kaping, Vinck, Hutchison, Everling, & Womelsdorf, 2011). ACC neurons are also related to cognitive processes associated with working memory (Niki & Watanabe, 1976), long-term memory formation (Weible, Rowland, Monaghan, Wolfgang, & Kentros, 2012), effortful control (Davis, Hutchinson, Lozano, Tasker, & Dostrovsky, 2000; Hillman & Bilkey, 2010, 2012) and even grid cell representations (Jacobs et al., 2013). Complicating matters further, many of these neurons reveal interdependencies across events. For example, ACC cells are said to provide a “gateway” through which decision making systems affect behavior (Cai & Padoa-Schioppa, 2012) by linking or “multiplexing” information about rewards and actions (Hayden & Platt, 2010; Shima & Tanji, 1998; Tanji, Shima, & Matsuzaka, 2002) and predictive stimuli (Takenouchi et al., 1999). 
The firing patterns of many ACC neurons are multi-determined (Shidara, Mizuhiki, & Richmond, 2005), reflecting multiple aspects of value-based decision making (Hoshi, Sawamura, & Tanji, 2005; Kennerley, Dahmubed, Lara, & Wallis, 2009; Kennerley & Wallis, 2009; Khamassi, Quilodran, Enel, Dominey, & Procyk, 2015) including high-level aspects of task performance such as task switching and behavior shifts (Johnston, Levin, Koval, & Everling, 2007; Kuwabara, Mansouri, Buckley, & Tanaka, 2014; Quilodran, Rothé, & Procyk, 2008). In humans, individual ACC neurons are recruited across multiple different tasks, especially when these tasks demand effort or attention (Davis et al., 2000; Wang et al., 2005).
Neuroimaging and scalp-recorded electrophysiological recordings in humans have also associated ACC with a wide range of phenomena (Holroyd & Yeung, 2011, 2012). Yet, a core set of highly replicable findings have implicated ACC function specifically in the processing of response conflict (Botvinick et al., 2001; Botvinick, Cohen, & Carter, 2004; Yeung, 2013; Yeung et al., 2004), errors (Falkenstein, Hohnsbein, Hoormann, & Blanke, 1990; Gehring et al., 1993; Wessel, Danielmeier, Morton, & Ullsperger, 2012) and otherwise surprising or unexpected events (Alexander & Brown, 2011; Braver, Barch, Gray, Molfese, & Snyder, 2001; Ferdinand & Opitz, 2014; Forster & Brown, 2011; HajiHosseini & Holroyd, 2013; Holroyd, 2004; Holroyd, Pakzad-Vaezi, & Krigolson, 2008; Jessup, Busemeyer, & Brown, 2010; Metereau & Dreher, 2013; Nee, Kastner, & Brown, 2011; Oliveira, McDonald, & Goodman, 2007; O’Reilly et al., 2013; Silvetti et al., 2011; Wessel et al., 2012). For example, the error-related negativity (ERN) is an ERP component elicited by error commission in speeded response-time tasks (Falkenstein et al., 1990; Gehring et al., 1993; for review see Gehring, Liu, Orr, & Carp, 2012), and the N2 is an ERP component elicited by unexpected, task-relevant stimuli (Donkers, Nieuwenhuis, & van Boxtel, 2005; Ferdinand, Mecklinger, Kray, & Gehring, 2012; Gehring, Gratton, Coles, & Donchin, 1992; Holroyd, 2004; Kopp & Wolff, 2000; Oliveira et al., 2007; Warren & Holroyd, 2012), especially when the stimuli mismatch with a perceptual template of ongoing events (Jia et al., 2007; Sams, Alho, & Näätänen, 1983; Y. Wang, Cui, Wang, Tian, & Zhang, 2004; for review, see Folstein & Van Petten, 2008).
The neural sources of the N2 and the ERN colocalize in ACC with an enhanced fMRI blood-oxygen-level dependent (BOLD) response to the same events (Mathalon, Whitfield, & Ford, 2003; Wessel et al., 2012), and it has been proposed that the two ERP components reflect different manifestations of a single underlying cognitive process by ACC (Cavanagh, Zambrano-Vazquez, & Allen, 2012; Wessel et al., 2012; Yeung et al., 2004).
These observations suggest that, when inspected at the cellular level, ACC function is associated with a wide range of task-related events, but when inspected at a more global level, ACC activation is strongly modulated by events related to surprise, conflict and errors.
ACC and action sequences
Although the response profiles of ACC neurons are heterogeneous, a telling set of observations have implicated ACC specifically in the production of goal-directed action sequences. Early on, Procyk, Tanaka, and Joseph (2000) and Procyk and Joseph (2001) observed that motor neurons in ACC are sensitive to the serial order of actions executed in a sequence irrespective of the actual movements performed. Shidara and Richmond (2002) also reported that ACC neurons code for the degree of reward expectancy through multi-stage tasks, and Mulder, Nordquist, Örgüt, and Pennartz (2003) observed that ACC neurons implement a “response set,” which they defined as “a cognitive-motor predisposition to organize and execute an action sequence directed towards a particular goal.” This research presaged a series of studies indicating that ACC neurons are sensitive to the sequential order of task progression across multiple task stages (Cowen & McNaughton, 2007; Fujisawa, Amarasingham, Harrison, & Buzsáki, 2008; Hayden, Pearson, & Platt, 2011; Hoshi et al., 2005; Shidara et al., 2005), as represented by distinct patterns of activity of ensembles of ACC neurons (Blanchard, Strait & Hayden, 2015; Cowen, Davis, & Nitz, 2012). These sequential activations have been said to be mediated by a working memory process intrinsic to ACC (Baeg et al., 2003), or alternatively by inputs carrying sensory and motor efferent information to ACC (Euston & McNaughton, 2006). The sequential activity is also coordinated with respect to the phase of theta oscillations of local field potentials in ACC (Remondes & Wilson, 2013), which are said to synchronize the computations of widespread brain networks (Holroyd, 2016; Verguts, 2017).
Recently, Jeremy Seamans and colleagues have systematically investigated the activity of ACC neurons during the execution of task sequences (Balaguer-Ballester, Lapish, Seamans, & Durstewitz, 2011; Caracheo, Emberly, Hadizadeh, Hyman, & Seamans, 2013; Durstewitz, Vittoz, Floresco, & Seamans, 2010; Hyman, Ma, Balaguer-Ballester, Durstewitz, & Seamans, 2012; Hyman, Whitman, Emberly, Woodward, & Seamans, 2013; Lapish, Durstewitz, Chandler, & Seamans, 2008; Ma, Hyman, Lindsay, Phillips, & Seamans, 2014; Ma, Hyman, Phillips, & Seamans, 2014). Their work focuses on distributed patterns of activity that characterize entire ensembles of neurons, rather than on the responses of the individual neurons that comprise the ensembles. Application of dimension reduction techniques reveals that network-wide ACC activity is characterized by complex trajectories through an abstract state space (Balaguer-Ballester et al., 2011), the disruption to which predicts behavioral errors (Lapish et al., 2008; see also Hyman et al., 2013; Stokes et al., 2013). On this view, ACC network activity tracks task progression through a task-dependent frame of reference, or “task space” (Lapish et al., 2008), toward the animal’s goal (Ma, Hyman, et al., 2014). The network activity is especially sensitive to transitions through subcomponents of cognitive tasks (Balaguer-Ballester et al., 2011) and is accompanied by abrupt transitions in the state space when the animal learns new rules (Durstewitz et al., 2010) or is exposed to important environmental changes (Caracheo et al., 2013). ACC network activity also discriminates between task sequences more accurately than comparable activity in the dorsal striatum (Ma, Hyman, Lindsay, et al., 2014; Ma, Hyman, Phillips, et al., 2014), and can be distinguished from the function of the hippocampus, which famously encodes sequences related to spatial navigation and other temporally organized events (Hyman et al., 2012).
In humans, fMRI evidence indicates that ACC is responsible for learning and evaluating the execution of hierarchically organized sequences of cognitive tasks (Koechlin, Danek, Burnod, & Grafman, 2002). Further, unexpected changes in sequence production tasks elicit an increased ACC BOLD response (Berns, Cohen, & Mintun, 1997; Ursu, Clark, Aizenstein, Stenger, & Carter, 2009) and larger N2 amplitudes (Eimer, Goschke, Schlaghecken, & Stürmer, 1996; Ferdinand, Mecklinger, & Kray, 2008; Fu, Bin, Dienes, Fu, & Gao, 2013; Jongsma et al., 2013; Lang & Kotchoubey, 2000; Miyawaki, Sato, Yasuda, Kumano, & Kuboki, 2005; Rüsseler, Hennighausen, Münte, & Rösler, 2003; Rüsseler, Kuhlicke, & Münte, 2003; Rüsseler & Rösler, 2000; Schlaghecken, Stürmer, & Eimer, 2000). These observations are in line with our suggestion, based largely on lesion evidence in humans and other animals, that ACC supports the execution of extended, goal directed behaviors involving multiple actions (Holroyd & McClure, 2015; Holroyd & Yeung, 2012), and that unexpected deviations from the intended sequence elicit surprise signals from ACC.
The current approach: recurrent neural networks
Accordingly, we aimed to illustrate these principles in a computational model that simulates ACC neuron activity as animals execute goal-directed task sequences and that produces surprise signals to unexpected events during those sequences. Crucially, the widely varying response profiles of individual ACC neurons suggest that this level of analysis may not be optimal for inferring ACC function. Instead, our approach leverages a long-standing idea in cognitive psychology (Rumelhart & McClelland, 1986) and computational neuroscience (Churchland & Sejnowski, 1992) that holds that neurocognitive functions are encoded across distributed populations of units rather than by individual cells. Although fundamental to computational neuroscience, this principle has only recently become more widely recognized in behavioral neuroscience (e.g., Fusi et al., 2016; Yuste, 2015)—a change in perspective driven partly by methodological advances that have allowed for the simultaneous collection and statistical analysis of data from multiple neural units (Cunningham & Yu, 2014). Thus, a basic principle of neural network theory suggests that we should simulate and analyze the ensemble activity of ACC neurons rather than the single-cell activity per se.
Here, we were inspired by the observation that recurrent connections in neural networks provide a natural means for supporting the execution of extended, goal-directed behaviors (Durstewitz, Seamans, & Sejnowski, 2000). In particular, “connectionist” style recurrent neural networks have been used extensively to study the cognitive psychology of sequence processing (e.g., Cleeremans & McClelland, 1991; Elman, 1990). RNNs provide a computational platform for exploring how distributed representations of hierarchically organized sequences unfold dynamically over time (Elman, 1990). This approach has proven especially fruitful as a proving ground for understanding the neural and cognitive representations that underpin complex action sequences with hierarchical structure. For example, previous work has illustrated how patterns of activity observed across network units can provide insight into representations shared within and across action sequences (Botvinick & Plaut, 2002, 2004). We therefore adopted RNNs for this purpose.
Connectionist networks consist of sets of interconnected, abstract units that process information passed between them (Rumelhart & McClelland, 1986). Some connectionist networks exploit a simple but powerful architecture consisting of three layers of nonlinear processing units, including an input layer, output layer, and an intermediating so-called hidden layer, connected with feed-forward excitatory projections (or weights), that can be augmented with additional layers. With the appropriate connection weights, which are determined through an iterative training process, these networks are capable of approximating any mathematical function to an arbitrary degree of accuracy (Hornik, 1991). It should be emphasized that although the connectionist framework in psychology is motivated by the massively parallel and interconnected structure of the human nervous system, the units in connectionist models are not normally understood to represent individual neurons. Rather, the networks describe in the abstract how distributed processing systems can give rise to observable behavior (Rumelhart & McClelland, 1986).
Recurrent networks extend this architecture with a context layer that holds a copy of the hidden-layer activations from the previous time step and feeds it back to the hidden layer (Elman, 1990). Given an initial element of a sequence as input, these so-called Elman networks can be trained to reproduce the subsequent elements of that sequence. Of interest in these simulations is how the networks disambiguate elements in a sequence by maintaining a memory of the context in which each element occurs. For example, a network may be trained to produce the sequences A->B->C and D->B->E. Recurrence enables the network to correctly produce C or E following the element that is common to both sequences (B), by retaining information about the context in which that element occurred (following an A or D). This contextual encoding can be supported across even longer and more complex sequences, such as A->B->B->B->B->C and D->B->B->B->B->E (Servan-Schreiber, Cleeremans, & McClelland, 1991).
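The context mechanism can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used for the simulations reported here: the function names, the hidden-layer size of 8, the 3,000 training passes, the squared-error loss, and the one-step training rule (the context is treated as a fixed extra input, with no unrolling through time) are all assumptions made for the example. The weight range and learning rate follow the values stated in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_elman(sequences, n_symbols, n_hidden=8, lr=0.5, epochs=3000, seed=0):
    """Train a simple recurrent network to predict the next symbol of each sequence."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.01, 0.01, (n_hidden, n_symbols))   # input -> hidden
    W_ctx = rng.uniform(-0.01, 0.01, (n_hidden, n_hidden))   # context -> hidden
    W_out = rng.uniform(-0.01, 0.01, (n_symbols, n_hidden))  # hidden -> output
    eye = np.eye(n_symbols)
    losses = []
    for _ in range(epochs):
        total = 0.0
        for seq in sequences:
            context = np.zeros(n_hidden)          # context is reset at sequence start
            for cur, nxt in zip(seq[:-1], seq[1:]):
                x, target = eye[cur], eye[nxt]    # one-hot input and target
                hidden = sigmoid(W_in @ x + W_ctx @ context)
                output = sigmoid(W_out @ hidden)
                err = target - output
                total += float(err @ err)
                # One-step backpropagation of the squared error.
                d_out = err * output * (1 - output)
                d_hid = (W_out.T @ d_out) * hidden * (1 - hidden)
                W_out += lr * np.outer(d_out, hidden)
                W_in += lr * np.outer(d_hid, x)
                W_ctx += lr * np.outer(d_hid, context)
                context = hidden                  # copy hidden state into context
        losses.append(total)
    return (W_in, W_ctx, W_out), losses

def predict(weights, prefix, n_symbols):
    """Return the output activations after presenting the symbols in `prefix`."""
    W_in, W_ctx, W_out = weights
    eye = np.eye(n_symbols)
    context = np.zeros(W_ctx.shape[0])
    for symbol in prefix:
        hidden = sigmoid(W_in @ eye[symbol] + W_ctx @ context)
        context = hidden
    return sigmoid(W_out @ hidden)

A, B, C, D, E = range(5)
weights, losses = train_elman([[A, B, C], [D, B, E]], n_symbols=5)
after_AB = predict(weights, [A, B], n_symbols=5)
after_DB = predict(weights, [D, B], n_symbols=5)
print("outputs after A->B:", after_AB.round(2))
print("outputs after D->B:", after_DB.round(2))
```

The two calls to `predict` present the same final element (B) in different contexts, so any difference between the two output vectors is carried by the context layer. Whether a given run fully separates C from E depends on the seed and training settings.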
In what follows we first examine how ACC tracks task progression. The RNN implementation of this function predicts that ACC should exhibit formal network properties that should be evident at the neural level, which we explore by comparing our results with neurophysiological data recorded from rats. Second, we compare the discrepancy signals produced by the model with ERP and fMRI indicators of ACC activation in humans. Last, in online supplementary materials, we explore how this signal could be utilized to regulate behavior. All of the following simulations conform to the RNN architecture described above. For each RNN, the number of units in each layer is specific to each problem. Each task was simulated multiple times with weights that were initialized with small random values drawn from a uniform distribution between −0.01 and 0.01. Input units were activated by setting their activation values to 1, and only one input unit was activated at a given time. The learning rate parameter was set to 0.5 across simulations, except where indicated (Williams & Zipser, 1995).
Simulations of multivariate ACC activity
Trial progression was simulated across a sequence of seven discrete time steps in which a different input unit was activated at each step (by setting the activation value for that unit to 1): first, the sequence unit representing the sequence to be executed, followed by three iterations of orient and press corresponding to the given sequence. The network was trained to predict what event would occur following each action. For example, if the animal oriented to the red lever, then the network was trained to predict a lever press on the next time step. And if the animal pressed a lever, then the network predicted the location and color of the lever to which the rat would orient on the subsequent step in the sequence. Two hundred RNNs initialized with different weights were trained on the three sequences for 6,000 trials each. All of the networks achieved 100% accuracy.
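The seven-step trial structure can be made concrete with a short sketch. The lever names, the contents of the three sequences, and the use of a single shared press unit are hypothetical placeholders chosen for illustration; only the overall layout (one sequence cue followed by three orient and press pairs, with each event used to predict the next) is taken from the description above.

```python
# Hypothetical lever orders for three sequences (illustrative only).
SEQUENCES = {
    "seq1": ["left", "center", "right"],
    "seq2": ["center", "right", "left"],
    "seq3": ["right", "left", "center"],
}

# Hypothetical event vocabulary: one unit per sequence cue, orient target, and press.
EVENTS = (
    sorted(SEQUENCES)
    + ["orient_" + lever for lever in ("left", "center", "right")]
    + ["press"]
)

def trial_events(seq_name):
    """Sequence cue followed by three orient/press pairs: seven time steps."""
    events = [seq_name]
    for lever in SEQUENCES[seq_name]:
        events += ["orient_" + lever, "press"]
    return events

def one_hot(event):
    """Activate exactly one input unit by setting its value to 1."""
    return [1 if e == event else 0 for e in EVENTS]

def training_pairs(seq_name):
    """Pair each event with the event that follows it (next-step prediction)."""
    events = trial_events(seq_name)
    return [(one_hot(cur), one_hot(nxt)) for cur, nxt in zip(events, events[1:])]

for step, event in enumerate(trial_events("seq1"), start=1):
    print(step, event)
```

Because the press event is followed by a different orient event at each point in each sequence, a network trained on these pairs can only predict the next step by retaining context, which is the property the simulations exploit.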
Thus, separate units in the model encoded each of several major elements of task execution. Many units were also sensitive to multiple task events. For example, 10% +/− 3% of all hidden units across all networks were selective to both press actions and orientation actions, 2% +/− 2% of all units were sensitive to both ramping and press actions, and 7.5% +/− 2.5% of all units were sensitive to both ramping and orientation actions. Conversely, around 70% +/− 4% of all the press-sensitive units were uniquely sensitive to presses, and 43% +/− 3.2% of the orientation-sensitive units were uniquely sensitive to orientation.
We expected that this single-unit activity reflected idiosyncratic aspects of the network’s overall function that is encoded across the collective activity of the entire network. To investigate the nature of these representations, we further analyzed the dynamics of the network during task execution using a dimension reduction approach motivated by previous RNN simulations of hierarchically organized action sequences in humans (Botvinick & Plaut, 2002, 2004). In particular, we applied principal components analysis (PCA) to characterize the evolving patterns of activation observed across the network hidden units (see Fig. 2), and computed the distance between the different network representations in the network state space.
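A minimal sketch of this kind of analysis is given below, assuming the hidden-unit activations have been stored as an array of shape (sequences, time steps, hidden units). The random placeholder data, the choice of 15 hidden units, the use of two components, and the Euclidean distance in the full hidden-unit space are assumptions made for the example rather than details of the reported analysis.

```python
import numpy as np

def pca_project(states, n_components=2):
    """Project rows of `states` (samples x units) onto the top principal components."""
    centered = states - states.mean(axis=0)
    # Rows of `components` are principal axes, ordered by decreasing variance.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = centered @ components[:n_components].T
    return scores, explained[:n_components]

def stepwise_distance(traj_a, traj_b):
    """Euclidean distance between two state trajectories at each time step."""
    return np.linalg.norm(traj_a - traj_b, axis=1)

rng = np.random.default_rng(0)
n_sequences, n_steps, n_hidden = 3, 7, 15          # 15 hidden units is hypothetical
hidden = rng.uniform(0.0, 1.0, (n_sequences, n_steps, n_hidden))  # placeholder data

# Fit PCA on all time steps of all sequences, then reshape back into trajectories.
scores, explained = pca_project(hidden.reshape(-1, n_hidden))
trajectories = scores.reshape(n_sequences, n_steps, -1)

# Distance between the first two sequences at each step, in the full state space.
dist_01 = stepwise_distance(hidden[0], hidden[1])
print("variance explained by first two components:", explained.round(3))
print("step-by-step distance, sequence 0 vs 1:", dist_01.round(3))
```

With activations from a trained network in place of the placeholder array, `trajectories` would give the low-dimensional path traced by each sequence, and `dist_01` would show at which steps the two sequences are represented most distinctly.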
The model thus yielded individual units that encoded specific features in the trial that together as a group represented higher order aspects of the task sequence. These results suggest that the responses of individual ACC neurons to various task events reflect a deeper role for ACC in tracking the progression of the organism through a series of goal-directed actions, as revealed by the collective activity of the entire ensemble of units. Importantly, these characteristics were not explicitly coded into the model architecture but rather emerged as a natural consequence of recurrent neural network function.
Simulations of univariate ACC activation
As we will show, this discrepancy measure aligns with univariate signals recorded in ACC that reflect the collective activity of multiple neurons.
Electrophysiological signals in humans
We simulated the task with an RNN composed of nine input units, corresponding to the eight possible imperative stimuli in the task (four stimuli in which the responses were compatible with the flanker stimuli, and four stimuli in which the responses were incompatible with the flanker stimuli), and one unit indicating that the response had been completed; four output units, corresponding to the left and right button presses and the novel and standard stimuli; and 10 units in each of the hidden and context layers. Each trial was represented in two time steps: On the first time step the input corresponding to the stimulus for that trial was activated, and on the second time step the input unit corresponding to the selected response was activated. Networks were trained to predict on each trial the response that was executed to the imperative stimulus and the stimulus that would occur following response completion. Five hundred networks were trained for 3,000 trials each on data derived from the subject accuracy levels reported in Wessel et al. (2012). For example, if accuracy for a particular stimulus was 90%, then the model was trained to predict the correct response on 90% of these trials and the incorrect response on the remaining 10% of the trials, randomly interleaved.
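The way accuracy levels translate into training targets can be sketched as follows. The function name, the trial count of 1,000 for a single stimulus, and the decision to fix the proportions exactly before shuffling (rather than sampling each trial independently) are assumptions made for the example; the 90% accuracy figure is the illustrative value used in the text, not a value taken from Wessel et al. (2012).

```python
import random

# Layer sizes stated in the text for this simulation.
N_INPUT, N_OUTPUT, N_HIDDEN = 9, 4, 10

def response_schedule(accuracy, n_trials, rng):
    """Return randomly interleaved 'correct'/'error' targets matching `accuracy`."""
    n_correct = round(accuracy * n_trials)
    schedule = ["correct"] * n_correct + ["error"] * (n_trials - n_correct)
    rng.shuffle(schedule)   # interleave correct and error trials at random
    return schedule

rng = random.Random(0)
schedule = response_schedule(accuracy=0.90, n_trials=1000, rng=rng)
print(schedule.count("correct"), schedule.count("error"))
print(schedule[:10])
```

Training on targets distributed this way is what lets the output activations come to reflect event frequencies, so that the rarer outcome (here, the error) is the less strongly predicted one.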
Notably, this simple measure conforms to the behavior of two ERP components related to surprise and error processing (see Fig. 7, right panel). As commonly observed, Wessel and colleagues (2012) found a large ERN following error commission to the imperative stimuli and a large N2 elicited by the infrequently occurring novel stimuli (see Fig. 7, right panel). Our simulations agree with these findings but suggest a more nuanced explanation for their functional significance: The N2 and ERN indicate that the task sequence is not unfolding as predicted.
FMRI BOLD signals in humans
Importantly, although model details such as the number of units in each layer necessarily vary from simulation to simulation, the same model architecture and computational principles were used to simulate both the behavior of rats in a sequence production task and of humans in a speeded reaction time task. Taken together, the results suggest that ACC tracks the execution of goal-directed action sequences as a distributed pattern of activity across multiple cells and produces discrepancy signals when ongoing events deviate from the given plan.
These ideas share much in common with an influential model of ACC function called the predicted response-outcome (PRO) model which, rather than holding ACC responsible for detecting unexpected events, holds that it detects the omission of expected events (Alexander & Brown, 2011). To compare the two models directly, we simulated the fMRI BOLD response to events in the stop-change task (Brown & Braver, 2005) that have previously been simulated using the PRO model (Alexander & Brown, 2011). These data therefore provide a useful benchmark for comparison.
We simulated task performance with an RNN composed of five input units, corresponding to the two difficulty cues (hard, easy), the two arrow directions (pointing left, pointing right), and the change signal; five output units, corresponding to the two arrow directions, the change stimulus, and the two responses (left button press, right button press); and five units for each of the hidden and context layers. Each trial was simulated as a sequence of three time steps that indicated the difficulty cue (Step 1), the arrow direction (Step 2), and the appearance of the change stimulus (Step 3). Based on this input the network was trained to predict the direction of the subsequent arrow stimulus (Step 1), whether or not the change signal would appear (Step 2), and the forthcoming button press (Step 3). Two hundred networks starting with different initial weights were trained for 8,000 trials each, with target values derived from the empirical error rates for each condition, namely, 4% errors on easy change trials and 50% errors on hard change trials, pseudorandomly interspersed. As with the actual experiment, 33% of all of the trials were change trials (Brown & Braver, 2005). In keeping with evidence that the BOLD response in this task is sensitive to multiple events within each trial (Nieuwenhuis, Schweizer, Mars, Botvinick, & Hajcak, 2007), the discrepancy signal on each trial was taken as the average discrepancy across the three-step sequence.
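The trial statistics and the per-trial averaging described above can be sketched as follows. The change-trial proportion and the error rates on change trials come from the text; the equal probability of the two difficulty cues and the two arrow directions, the absence of errors on go trials, and all function and field names are simplifying assumptions for illustration.

```python
import random

P_CHANGE = 0.33                                   # proportion of change trials (from the text)
P_ERROR_ON_CHANGE = {"easy": 0.04, "hard": 0.50}  # error rates on change trials (from the text)

def sample_trial(rng):
    """Sample the events of one trial under the assumptions stated above."""
    cue = rng.choice(["easy", "hard"])        # assumption: cues equally likely
    arrow = rng.choice(["left", "right"])     # assumption: directions equally likely
    change = rng.random() < P_CHANGE
    # Assumption: errors occur only on change trials.
    error = change and rng.random() < P_ERROR_ON_CHANGE[cue]
    return {"cue": cue, "arrow": arrow, "change": change, "error": error}

def trial_discrepancy(step_discrepancies):
    """Average the discrepancy over the three time steps of a trial."""
    if len(step_discrepancies) != 3:
        raise ValueError("expected one discrepancy value per time step (3)")
    return sum(step_discrepancies) / 3

rng = random.Random(0)
trials = [sample_trial(rng) for _ in range(8000)]
frac_change = sum(t["change"] for t in trials) / len(trials)
print("fraction of change trials:", round(frac_change, 3))
print("example trial discrepancy:", trial_discrepancy([0.1, 0.2, 0.3]))
```

Averaging over the three steps means that a surprising event at any point in the trial (cue, change signal, or response) contributes to the simulated signal for that trial.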
Across three of the four contrasts, the discrepancy signals produced by the RNN qualitatively replicated the output of the PRO model (and the associated empirical observations of ACC BOLD response; see Fig. 9b). This correspondence stems from the fact that both models operate on similar principles. The activity level of each output unit of the RNN reflects the probability that a particular event will occur at that time step, as determined by the frequency of occurrence of that event at that time during the course of training. Thus, whereas the PRO model produces a stronger response on error trials to the unexpected omission of the correct response, the RNN model produces a larger discrepancy signal to the unexpected commission of the error itself. For the same reason, errors that are unexpected produce a larger discrepancy signal than errors that are expected. And because correct responses occur less frequently on change trials than on go trials, they elicit larger discrepancy signals on the former compared to the latter, producing a conflict effect.
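The relationship between event frequency and discrepancy can be illustrated with a toy calculation. The discrepancy measure used here (the sum of squared differences between the output activations and a one-hot code for the event that actually occurred) is an assumption adopted for the example, as are the function name and the idealization that the output activations exactly equal the training frequencies (4% and 50% errors on easy and hard change trials, respectively).

```python
def discrepancy(predicted, actual_event):
    """Sum of squared differences between predictions and the observed event.

    `predicted` maps each possible event to an output activation; the observed
    event is coded as 1 and all other events as 0 (assumed measure).
    """
    return sum(
        (activation - (1.0 if event == actual_event else 0.0)) ** 2
        for event, activation in predicted.items()
    )

# Idealized output activations equal to the training frequencies.
easy_change = {"correct": 0.96, "error": 0.04}
hard_change = {"correct": 0.50, "error": 0.50}

unexpected_error = discrepancy(easy_change, "error")   # error was rarely seen in training
expected_error = discrepancy(hard_change, "error")     # error was seen on half the trials
easy_correct = discrepancy(easy_change, "correct")     # the usual outcome

print("unexpected error:", round(unexpected_error, 4))
print("expected error:", round(expected_error, 4))
print("correct (easy):", round(easy_correct, 4))
```

Under these assumptions the ordering matches the pattern described above: an error that the network rarely encountered produces a larger discrepancy than an error it encountered often, and both exceed the discrepancy for the usual, well-predicted outcome.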
Unlike the PRO model, however, the RNN does not consistently predict a larger discrepancy signal to correct responses on go trials with high error likelihood (HEL) compared to correct responses on go trials with low error likelihood (LEL). This result obtains because the go signal is equally predictive of a correct outcome on both HEL and LEL trials, yielding discrepancy signals of about equal magnitude. However, we note that the empirical result itself is controversial, which we address in the discussion below.
Taken together, our simulations suggest that ACC tracks the execution of extended behaviors and produces large discrepancy signals to unexpected events that deviate from the task domain. In online supplementary materials, we illustrate how these discrepancy signals could be used to regulate task performance, as discussed below.
ACC research has been complicated by a plethora of challenging empirical findings such as the common observation that ACC neurons tend to respond to multiple task-related events (Ebitz & Hayden, 2016; Holroyd & Yeung, 2011). Here, we elucidate these findings with a computational approach that is motivated by the long-standing principle in cognitive psychology (Rumelhart & McClelland, 1986) and computational neuroscience (Churchland & Sejnowski, 1992) that neural functions are encoded as distributed representations across ensembles of neurons (e.g., Fusi et al., 2016; Yuste, 2015). Based in part on our argument that ACC is concerned with the execution of extended, goal-directed action sequences, which we have articulated elsewhere (Holroyd & Yeung, 2012; see also Holroyd & McClure, 2015), we trained a model to predict the behaviors of human and nonhuman animals in sequential tasks. The simulations illustrate how the collective activity of ACC neurons, encoded as a distributed representation across ensembles of units, can track the progression of an agent throughout the execution of goal-directed action sequences, in line with observations of ensemble ACC activity in rats and other animals (e.g., Baeg et al., 2003; Balaguer-Ballester et al., 2011; Blanchard et al., 2015; Cowen et al., 2012; Cowen & McNaughton, 2007; Durstewitz et al., 2010; Euston & McNaughton, 2006; Fujisawa et al., 2008; Hayden, Pearson, et al., 2011; Hyman et al., 2012, 2013; Lapish et al., 2008; Ma, Hyman, Lindsay, et al., 2014; Ma, Hyman, Phillips, et al., 2014; Remondes & Wilson, 2013). 
Further, because the model was trained to predict task events as they occurred, the simulations were sensitive to unexpected deviations from each sequence, consistent with ubiquitous evidence from fMRI and ERP studies that ACC responds to surprising or conflict-eliciting events (Alexander & Brown, 2011; Botvinick et al., 2001, 2004; Braver et al., 2001; Cavanagh et al., 2012; Donkers et al., 2005; Ferdinand et al., 2012; Ferdinand & Opitz, 2014; Folstein & Van Petten, 2008; Gehring et al., 1992; HajiHosseini & Holroyd, 2013; Holroyd, 2004; Holroyd et al., 2008; Jessup et al., 2010; Jia et al., 2007; Kopp & Wolff, 2000; Mathalon et al., 2003; Nee et al., 2011; Oliveira et al., 2007; O’Reilly et al., 2013; Sams et al., 1983; Silvetti et al., 2011; Y. Wang et al., 2004; Warren & Holroyd, 2012; Wessel et al., 2012; Yeung, 2013; Yeung et al., 2004).
Although a handful of computational models have simulated ACC function previously (e.g., Alexander & Brown, 2011; Botvinick et al., 2001; Brown & Braver, 2005; Holroyd & Coles, 2002, 2008; Holroyd & McClure, 2015; Holroyd, Yeung, Coles, & Cohen, 2005; Khamassi et al., 2011; Silvetti et al., 2011; Verguts, 2017; Verguts et al., 2015; Yeung et al., 2004), to our knowledge none of these models have examined the essential role of ACC in encoding task execution as distributed representations that evolve dynamically with time (Balaguer-Ballester et al., 2011; Caracheo et al., 2013; Durstewitz et al., 2010; Lapish et al., 2008). Our simulations are thus the first to provide a formal account of this crucial aspect of ACC function. In so doing, the simulations also yield, without any additional assumptions, the surprise, conflict, and error signals that are the mainstay of many previous ACC models (e.g., Alexander & Brown, 2011; Botvinick et al., 2001; Silvetti et al., 2011), but they explain these signals as resulting from unexpected deviations in the execution of goal-directed action sequences. Consonant with recent calls for the development of theoretical frameworks that bridge studies between human and nonhuman animals (Badre, Frank, & Moore, 2015), these simulations provide a common framework for relating single-cell findings associated with task execution in nonhuman animals with ubiquitous electrophysiological and neuroimaging findings in humans.
Relationship to the PRO model
We simulated the fMRI BOLD response in the stop-change task (Brown & Braver, 2005) as a benchmark for comparison with one of the most comprehensive and successful models of ACC function, the PRO model (Alexander & Brown, 2011; see also Silvetti et al., 2011, for comparable findings). Like the PRO model, the simulations successfully accounted for what are arguably the most salient aspects of ACC activity—conflict, error, and surprise signals (see Fig. 9b). Yet in contrast to the PRO model, our simulations failed to capture the error likelihood effect previously observed in the original stop-change study (Brown & Braver, 2005). That effect is specifically associated with the contrast between correct-go, HEL trials and correct-go, LEL trials in the stop-change task (Brown & Braver, 2005; see Fig. 9a). This divergence between the PRO and RNN models appears to reflect a fundamental difference between their computational properties. The PRO model accounts for the error likelihood effect with separate units for the correct and incorrect response that are simultaneously activated on HEL trials; because subsequent production of the correct response is also associated with the omission of the expected incorrect response, a larger surprise signal is produced on HEL trials than on LEL trials, the latter of which predict mostly correct responses. By contrast, the RNN model predicts what should occur following the appearance of the initial go stimulus, which is either the change stimulus on change trials or the response on go trials. Because these two events occur about equally often for both the HEL and LEL conditions—for both conditions, the change stimulus occurs on 33% of the trials, and the correct response follows on nearly all of the go trials—the RNN produces about equally large discrepancy signals to go-correct trials across the HEL and LEL conditions.
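The reasoning in the preceding paragraph can be illustrated with a small numerical sketch. It assumes an idealized predictor whose output after the go stimulus simply matches the event frequencies stated in the text (the change stimulus on 33% of trials, the response otherwise), and it uses the summed squared difference between predicted and observed events as a stand-in for the discrepancy signal. Both the idealized predictor and this particular discrepancy measure are assumptions made for illustration; they are not a description of the trained network.

```python
import numpy as np

# Hypothetical event coding: index 0 = change stimulus, index 1 = response.
P_CHANGE = 0.33  # proportion of change trials, identical for HEL and LEL (from the text)

def ideal_prediction(p_change):
    """Prediction of an idealized network that has learned the event frequencies."""
    return np.array([p_change, 1.0 - p_change])

def discrepancy(predicted, observed):
    """Assumed discrepancy measure: summed squared prediction error."""
    return float(np.sum((observed - predicted) ** 2))

response_event = np.array([0.0, 1.0])  # go trial: the response follows the go stimulus
change_event = np.array([1.0, 0.0])    # change trial: the change stimulus follows

# The prediction after the go stimulus is the same in both conditions,
# so the discrepancy on correct-go trials is the same as well.
signals = {}
for condition in ("HEL", "LEL"):
    prediction = ideal_prediction(P_CHANGE)
    signals[condition] = discrepancy(prediction, response_event)

print(signals)
print(discrepancy(ideal_prediction(P_CHANGE), change_event))
```

Under these assumptions the correct-go discrepancy is identical for the two conditions, whereas the less frequent change stimulus produces a larger value in both.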
With respect to predicting univariate ACC signals, in our view the PRO and RNN models are more alike than different. Nevertheless, the error likelihood effect may constitute diagnostic evidence between the two models. In this regard it is worth noting that the effect has received less empirical support than the other observations of ACC function related to conflict, errors, and surprise. Although the effect is reproducible (Alexander & Brown, 2010), it appears to be relatively weak (Brown, 2009), and varies substantially across individuals in terms of their sensitivity to risk (Alexander, Fukunaga, Finn, & Brown, 2015; Brown & Braver, 2007; cf. Brown & Braver, 2008). Further, one laboratory failed to replicate the result in two fMRI experiments (Nieuwenhuis, Schweizer, Mars, Botvinick, & Hajcak, 2007) and also found ERP evidence that is inconsistent with the theory (Yeung & Nieuwenhuis, 2009; see also Hammer, Rautzenberg, Heldmann, & Münte, 2011). These considerations suggest that the error likelihood effect warrants further investigation.
By contrast, only the RNN model has yet been shown to account for the type of multivariate activity illustrated in Figs. 4 and 5. A useful exercise would entail training the PRO, RNN, and other models of ACC function on the same task and then applying representational similarity analysis to compare the network properties of each of the models against empirical data (Kriegeskorte & Kievit, 2013; Kriegeskorte, Mur, & Bandettini, 2008).
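As a minimal sketch of how such a comparison might be set up, the following code builds a representational dissimilarity matrix (RDM) for each set of activity patterns and compares the RDMs by rank correlation. The random arrays are placeholders standing in for recorded ensemble activity and for the hidden-layer activity of two candidate models; the function names, the array sizes, and the choice of correlation distance are assumptions made for illustration.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation between rows.

    patterns: array (n_events, n_units), one activity pattern per task event.
    """
    return 1.0 - np.corrcoef(patterns)

def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, as a flat vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def rank(values):
    """Ranks of continuous values (ties are not handled; none are expected here)."""
    return np.argsort(np.argsort(values)).astype(float)

def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the off-diagonal entries of two RDMs."""
    a = rank(upper_triangle(rdm_a))
    b = rank(upper_triangle(rdm_b))
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
n_events, n_units = 21, 50
empirical = rng.normal(size=(n_events, n_units))                  # placeholder data
model_a = empirical + 0.5 * rng.normal(size=(n_events, n_units))  # similar geometry
model_b = rng.normal(size=(n_events, n_units))                    # unrelated geometry

sim_a = rdm_similarity(rdm(empirical), rdm(model_a))
sim_b = rdm_similarity(rdm(empirical), rdm(model_b))
print(sim_a, sim_b)
```

Because the comparison operates on event-by-event dissimilarities rather than on individual units, the models and the empirical data need not have the same number of units, only the same set of task events.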
Discrepancy signal function
Although these simulations illustrate the role of ACC in tracking the execution of goal-directed action sequences and in detecting discrepancies between predicted and actual events in the sequence, they do not apply the discrepancy signals for any functional purpose. A perplexing characteristic of ACC is that despite clear evidence that such discrepancy signals are correlated with adaptive adjustments to behavior, suggesting that ACC is involved in behavioral regulation (Cavanagh & Shackman, 2015), lesions to ACC only minimally impair these adjustments, indicating that the trial-to-trial changes in performance are actually carried out by other brain areas (Holroyd & Yeung, 2012). For this reason, we have argued elsewhere that ACC is responsible for motivating control over the execution of extended action sequences rather than for adjusting behavior from one moment to the next (Holroyd & Yeung, 2012), a property that we have implemented in a previous model of ACC (Holroyd & McClure, 2015). In keeping with this perspective, additional simulations (see online Supplementary Materials) suggest that the discrepancy signals might regulate a control signal that is used by an action production system to produce actions that comport with the task objectives, for example, by providing contextual information that protects against the production of capture errors and other action slips during the execution of goal-directed action sequences. Alternatively, the discrepancy signals might promote exploration of alternative task strategies (Donoso, Collins, & Koechlin, 2014; Kuwabara et al., 2014; Schuck et al., 2015; Tervo et al., 2014), in parallel to a proposed switching mechanism regulated by tonic dopamine levels in ACC (Holroyd & McClure, 2015), or could provide an update or training signal to the predictive model mediated by ACC (e.g., O’Reilly et al., 2013).
We believe that the evidence to date is insufficient to decide between these possibilities, which in any case are not mutually exclusive.
A promising avenue for investigation would entail instantiating the model in a more biologically realistic network that incorporates finer temporal dynamics into the recurrent activity (Sussillo, 2014), in line with previous efforts to simulate the role of prefrontal cortex in response generation and decision making (Erlich & Brody, 2013; Mante, Sussillo, Shenoy, & Newsome, 2013; Moody, Wise, di Pellegrino, & Zipser, 1998; Nakahara & Doya, 1998; X.-J. Wang, 2008). These networks typically exhibit complex temporal dynamics that are amenable to investigation using an arsenal of mathematical tools from nonlinear dynamical systems analysis (e.g., Durstewitz & Deco, 2008; Sussillo, 2014; Sussillo & Barak, 2013; Wolf, Engelken, Puelma-Touzel, Weidinger, & Neef, 2014). Although such networks are notoriously difficult to train (Sussillo, 2014), recent advances (e.g., Martens, 2010; Martens & Sutskever, 2011; Song, Yang, & Wang, 2016) have been encouraging (Abbott, DePasquale, & Memmesheimer, 2016; Ardid & Wang, 2013; Song et al., 2016; Sussillo & Abbott, 2009). For example, networks of sparsely and recurrently connected spiking neurons have been trained using a biologically realistic reward signal to compute rule-specific decisions based on information maintained in working memory (Hoerzer, Legenstein, & Maass, 2014), though it remains an open question whether such principles can be utilized to simulate the neural mechanisms of hierarchically organized action sequences (for efforts in this direction, see Namikawa, Nishimoto, & Tani, 2011; Nishimoto & Tani, 2004; Rao & Sejnowski, 2000; Starzyk & He, 2007; Yamashita & Tani, 2008).
In other work, we have simulated the role of ACC in learning the value of and selecting tasks based on principles of hierarchical reinforcement learning (HRL; Holroyd & McClure, 2015). An obvious next step would therefore be to integrate the RNN model of ACC into the HRL framework. A hybrid model of ACC would see the action policy for each option implemented in a separate RNN and would select and execute the RNNs based on their learned reward values. This approach could help resolve a debate about whether the execution of hierarchically organized action sequences is better represented with connectionist-style RNNs (Botvinick, 2005; Botvinick & Plaut, 2002, 2004), or with rule-following symbolic processes that explicitly organize action sequences according to goals and sub-goals (Cooper & Shallice, 2000, 2006). An integrated model would allow for options to be flexibly combined in novel configurations according to principles of HRL (Hengst, 2012) while retaining the strengths of the connectionist implementation, such as the ability to generalize learned structure across contextual domains (Botvinick & Plaut, 2002, 2004). These two approaches are not actually incompatible (Cooper & Shallice, 2006), and recent efforts have seen the development of hybrid models that explicitly encode goal and subgoal states in RNNs (Cooper, Ruh, & Mareschal, 2014). Related work has investigated how an RNN can implement an “actor-critic” architecture that executes action sequences according to principles of reinforcement learning (Ruh, Cooper, & Mareschal, 2005; see also Cooper & Glasspool, 2001).
Decades of research on ACC have revealed a colorful but bewildering landscape of empirical findings. Our simulations show the proverbial forest for the trees, where the trees consist of observations of ACC activity to individual task events and the forest is the dynamically evolving relationship between these observations. This proposal is cast in a formal theoretical framework that accounts for existing ensemble activity in ACC of nonhuman animals as well as for conflict, surprise, and related signals commonly observed in ACC in human functional neuroimaging and electrophysiological experiments. In the context of previous work on HRL (Holroyd & McClure, 2015; Holroyd & Yeung, 2012), these efforts point toward a unified account of the role of ACC in action selection and performance monitoring.
Unit sensitivity was determined as follows. Press units were defined as units with high average activity to the nine press actions, as determined by the following permutation analysis. The distribution of average unit activities to all combinations of nine out of the 21 possible events (seven events across each of three sequences) was estimated by selecting 10,000 different permutations of nine randomly chosen events, and then averaging the activities of each unit across each set of nine events. The threshold (0.77) was defined as the mean (0.53) plus the standard deviation (0.24) of the distribution. Orientation units were identified as units with high average activity to the six orientation actions, following an analogous permutation approach (across multiple combinations of six non-specific events), which yielded a threshold of 0.80 (mean = 0.53, standard deviation = 0.27). Sequence units were defined as units with higher average activity to the first event in each sequence (i.e., to the three sequence cues) relative to the average activity to all the remaining events in the sequence. (Note that this definition of sequence unit refers to the hidden layer, which should not be confused with the sequence units in the input layer that indicate which of the three sequences to execute.) For the permutation analysis, each unit’s activity averaged across 18 randomly selected events was subtracted from the average activity of three randomly selected events, which yielded a threshold of 0.71 (M = −0.001, SD = 0.71). Ramping units were defined as units whose average activity to each of three pairs of orient (O1, O2, O3) and press (P1, P2, P3) actions, namely (O1, P1), (O2, P2), and (O3, P3), systematically increased by more than 0.1 from each pair to the next, that is, activity(O2, P2) > 0.1 + activity(O1, P1) and activity(O3, P3) > 0.1 + activity(O2, P2). Simulation results were broadly consistent across a range of hidden layer sizes (20, 50, and 100 units).
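The permutation threshold and the ramping criterion described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the simulations: the activity matrix is filled with random placeholder values, the row indices assigned to the press actions are hypothetical, and the null distribution is assumed to be pooled across units so that a single threshold results.

```python
import numpy as np

def permutation_threshold(activity, n_selected, n_permutations=10_000, seed=0):
    """Return mean + SD of unit activity averaged over random event subsets.

    activity: array (n_events, n_units), one row per task event.
    Assumption: the null distribution is pooled across all units.
    """
    rng = np.random.default_rng(seed)
    n_events, n_units = activity.shape
    subset_means = np.empty((n_permutations, n_units))
    for k in range(n_permutations):
        # Draw n_selected distinct events and average each unit over them.
        events = rng.choice(n_events, size=n_selected, replace=False)
        subset_means[k] = activity[events].mean(axis=0)
    return float(subset_means.mean() + subset_means.std())

def is_ramping(pair_means, step=0.1):
    """pair_means: array (3, n_units) of mean activity to (O1, P1), (O2, P2), (O3, P3).

    A unit ramps if its activity rises by more than `step` from each pair to the next.
    """
    return (pair_means[1] > step + pair_means[0]) & (pair_means[2] > step + pair_means[1])

# Placeholder hidden-layer activity: 21 events x 50 units, values in [0, 1).
rng = np.random.default_rng(1)
activity = rng.uniform(0.0, 1.0, size=(21, 50))
press_events = np.arange(9)  # hypothetical row indices of the nine press actions

threshold = permutation_threshold(activity, n_selected=9)
press_units = activity[press_events].mean(axis=0) > threshold
print(threshold, int(press_units.sum()))
```

The same function would give the orientation threshold with `n_selected=6`; the sequence-unit threshold would need a variant that subtracts the mean over 18 random events from the mean over three random events before pooling.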
Figure 5e indicates the Euclidean distance between representations and Fig. 5f indicates the Mahalanobis distance. The Mahalanobis distance corresponds to Euclidean distance when each axis is rescaled to have unit variance. Further, the control condition in Fig. 5f is derived from a within-subjects random shuffling of trials across conditions. Because each network (“subject”) in the simulation performed each sequence only once, the same within-subjects comparison for the simulation was not possible. Therefore, the control condition in Fig. 5e reflects the Euclidean distance between hidden unit network representations associated with different lever presses occurring at the same serial position across sequences.
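The relationship between the two distance measures can be checked numerically. The sketch below uses random placeholder patterns with unequal variances across three hypothetical units and a diagonal covariance matrix, in which case the Mahalanobis distance equals the Euclidean distance computed after dividing each axis by its standard deviation. With a full covariance matrix the Mahalanobis distance additionally takes correlations between axes into account, so the equivalence shown here is specific to the uncorrelated case.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two activity patterns."""
    return float(np.linalg.norm(x - y))

def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y for covariance matrix cov."""
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(0)
# Placeholder patterns: 200 samples x 3 units with very different variances.
samples = rng.normal(size=(200, 3)) * np.array([1.0, 5.0, 0.2])
x, y = samples[0], samples[1]

# Per-axis variances define a diagonal covariance (axes treated as uncorrelated).
variances = samples.var(axis=0, ddof=1)
d_maha = mahalanobis(x, y, np.diag(variances))

# Rescale each axis to unit variance, then take the ordinary Euclidean distance.
d_rescaled = euclidean(x / np.sqrt(variances), y / np.sqrt(variances))

print(euclidean(x, y), d_maha, d_rescaled)
```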
The order of authorship is arbitrary; both authors made equal contributions to the research and preparation of this article. This research was supported in part by funding from the Canada Research Chairs program and a Natural Sciences and Engineering Research Council of Canada Discovery Grant (312409–05) awarded to C.B.H.
- Abbott, L. F., DePasquale, B., & Memmesheimer, R.-M. (2016). Building functional networks of spiking model neurons. Nature Neuroscience, 19, 350–355. doi: 10.1038/nn.4241
- Botvinick, M. M. (2005). Modeling routine sequential action with recurrent neural nets. In J. J. Bryson, T. J. Prescott, & A. K. Seth (Eds.), Modeling natural action selection (pp. 180–187). Cambridge: Cambridge University Press.
- Bush, G. (2009). Dorsal anterior midcingulate cortex: Roles in normal cognition and disruption in attention-deficit/hyperactivity disorder. In B. A. Vogt (Ed.), Cingulate neurobiology and disease (pp. 245–274). Oxford: Oxford University Press.
- Caracheo, B. F., Emberly, E., Hadizadeh, S., Hyman, J. M., & Seamans, J. K. (2013). Abrupt changes in the patterns and complexity of anterior cingulate cortex activity when food is introduced into an environment. Frontiers in Neuroscience, 7, 74. doi: 10.3389/fnins.2013.00074
- Churchland, P. S., & Sejnowski, T. J. (1992). The computational brain. Cambridge: MIT Press.
- Davis, K. D., Taylor, K. S., Hutchison, W. D., Dostrovsky, J. O., McAndrews, M. P., Richter, E. O., & Lozano, A. M. (2005). Human anterior cingulate cortex neurons encode cognitive and emotional demands. Journal of Neuroscience, 25, 8402–8406. doi: 10.1523/jneurosci.2315-05.2005
- Falkenstein, M., Hohnsbein, J., Hoormann, J., & Blanke, L. (1990). Effects of errors in choice reaction tasks on the ERP under focused and divided attention. In C. H. M. Brunia, A. W. K. Gaillard, & A. Kok (Eds.), Psychophysiological brain research (pp. 192–195). Tilburg: Tilburg University Press.
- Hayden, B. Y., Heilbronner, S. R., Pearson, J. M., & Platt, M. L. (2011). Surprise signals in anterior cingulate cortex: Neuronal encoding of unsigned reward prediction errors driving adjustment in behavior. Journal of Neuroscience, 31, 4178–4187. doi: 10.1523/jneurosci.4652-10.2011
- Holroyd, C. B. (2004). A note on the oddball N200 and the feedback ERN. In M. Ullsperger & M. Falkenstein (Eds.), Errors, conflicts, and the brain: Current opinions on performance monitoring (pp. 211–218). Leipzig: MPI of Cognitive Neuroscience.
- Holroyd, C. B. (2016). The waste disposal problem of effortful control. In T. Braver (Ed.), Motivation and cognitive control (pp. 235–260). New York: Psychology Press.
- Holroyd, C. B., & Yeung, N. (2011). An integrative theory of anterior cingulate cortex function: Option selection in hierarchical reinforcement learning. In R. B. Mars, J. Sallet, M. F. S. Rushworth, & N. Yeung (Eds.), Neural basis of motivational and cognitive control (pp. 333–349). Cambridge: MIT Press. doi: 10.7551/mitpress/9780262016438.003.0018
- Jongsma, M. L. A., van Rijn, C. M., Gerrits, N. J. H. M., Eichele, T., Steenbergen, B., Maes, J. H. R., & Quiroga, R. Q. (2013). The learning-oddball paradigm: Data of 24 separate individuals illustrate its potential usefulness as a new clinical tool. Clinical Neurophysiology, 124, 514–521. doi: 10.1016/j.clinph.2012.09.009
- Kaping, D., Vinck, M., Hutchison, R. M., Everling, S., & Womelsdorf, T. (2011). Specific contributions of ventromedial, anterior cingulate, and lateral prefrontal cortex for attentional selection and stimulus valuation. PLoS Biology, 9, e1001224. doi: 10.1371/journal.pbio.1001224
- Khamassi, M., Lallee, S., Enel, P., Procyk, E., & Dominey, P. F. (2011). Robot cognitive control with a neurophysiologically inspired reinforcement learning model. Frontiers in Neurorobotics, 6(1), 1–14.
- Kuwabara, M., Mansouri, F. A., Buckley, M. J., & Tanaka, K. (2014). Cognitive control functions of anterior cingulate cortex in macaque monkeys performing a Wisconsin card sorting test analog. Journal of Neuroscience, 34, 7531–7547. doi: 10.1523/jneurosci.3405-13.2014
- Lapish, C. C., Durstewitz, D., Chandler, L. J., & Seamans, J. K. (2008). Successful choice behavior is associated with distinct and coherent network states in anterior cingulate cortex. Proceedings of the National Academy of Sciences, 105, 11963–11968. doi: 10.1073/pnas.0804045105
- Martens, J. (2010). Deep learning via Hessian-free optimization. Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel.
- Martens, J., & Sutskever, I. (2011). Learning recurrent neural networks with hessian-free optimization. Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA.
- O’Reilly, J. X., Schüffelgen, U., Cuell, S. F., Behrens, T. E. J., Mars, R. B., & Rushworth, M. F. S. (2013). Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proceedings of the National Academy of Sciences, 110, E3660–E3669. doi: 10.1073/pnas.1305373110
- Oliveira, F. T. P., McDonald, J. J., & Goodman, D. (2007). Performance monitoring in the anterior cingulate is not all error related: Expectancy deviation and the representation of action-outcome associations. Journal of Cognitive Neuroscience, 19, 1994–2004. doi: 10.1162/jocn.2007.19.12.1994
- Rao, R. P. N., & Sejnowski, T. J. (2000). Predictive learning of temporal sequences in recurrent neocortical circuits. In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.), Advances in neural information processing systems 12 (pp. 164–70). Cambridge: MIT Press.
- Ridderinkhof, K. R., van den Wildenberg, W. P. M., Segalowitz, S. J., & Carter, C. S. (2004). Neurocognitive mechanisms of cognitive control: The role of prefrontal cortex in action selection, response inhibition, performance monitoring, and reward-based learning. Brain and Cognition, 56, 129–140. doi: 10.1016/j.bandc.2004.09.016
- Rolls, E. T. (2009). The anterior and midcingulate cortices and reward. In B. A. Vogt (Ed.), Cingulate neurobiology and disease (pp. 191–218). Oxford: Oxford University Press.
- Ruh, N., Cooper, R. P., & Mareschal, D. (2005). Routine action: Combining familiarity and goal orientedness. In J. J. Bryson, T. J. Prescott, & A. K. Seth (Eds.), Modeling natural action selection: Proceedings of an international workshop (pp. 174–179). Edinburgh: AISB Press.
- Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition (vol. 1). Cambridge: MIT Press.
- Shadmehr, R., & Wise, S. P. (2005). The computational neurobiology of reaching and pointing. Cambridge: MIT Press.
- Shen, C., Ardid, S., Kaping, D., Westendorff, S., Everling, S., & Womelsdorf, T. (2015). Anterior cingulate cortex cells identify process-specific errors of attentional control prior to transient prefrontal-cingulate inhibition. Cerebral Cortex, 25, 2213–2228. doi: 10.1093/cercor/bhu028
- Song, H. F., Yang, G. R., & Wang, X.-J. (2016). Training excitatory-inhibitory recurrent neural networks for cognitive tasks: A simple and flexible framework. PLOS Computational Biology, 1–30. doi: 10.1371/journal.pcbi.1004792
- Tanji, J., Shima, K., & Matsuzaka, Y. (2002). Reward-based planning of motor selection in the rostral cingulate motor area. In S. C. Gandevia, U. Proske, & D. G. Stuart (Eds.), Sensorimotor control of movement and posture (pp. 417–423). New York: Springer. doi: 10.1007/978-1-4615-0713-0_47
- Wang, C., Ulbert, I., Schomer, D. L., Marinkovic, K., & Halgren, E. (2005). Responses of human anterior cingulate cortex microdomains to error detection, conflict monitoring, stimulus–response mapping, familiarity, and orienting. Journal of Neuroscience, 25, 604–613. doi: 10.1523/jneurosci.4151-04.2005
- Williams, R. J., & Zipser, D. (1995). Gradient-based learning algorithms for recurrent networks and their computational complexity. In Y. Chauvin & D. E. Rumelhart (Eds.), Back-propagation: Theory, architectures and applications (pp. 433–486). Hillsdale: Erlbaum.