Does intrinsic reward motivate cognitive control? A naturalistic-fMRI study based on the synchronization theory of flow

Cognitive control is a framework for understanding the neuropsychological processes that underlie the successful completion of everyday tasks. Only recently has research in this area investigated motivational contributions to control allocation. An important gap in our understanding is the way in which intrinsic rewards associated with a task motivate the sustained allocation of control. To address this issue, we draw on flow theory, which predicts that a balance between task difficulty and individual ability results in the highest levels of intrinsic reward. In three behavioral and one functional magnetic resonance imaging studies, we used a naturalistic and open-source video game stimulus to show that changes in the balance between task difficulty and an individual’s ability to perform the task resulted in different levels of intrinsic reward, which is associated with different brain states. Specifically, psychophysiological interaction analyses show that high levels of intrinsic reward associated with a balance between task difficulty and individual ability are associated with increased functional connectivity between key structures within cognitive control and reward networks. By comparison, a mismatch between task difficulty and individual ability is associated with lower levels of intrinsic reward and corresponds to increased activity within the default mode network. These results suggest that intrinsic reward motivates cognitive control allocation.

Planning, goal maintenance, performance monitoring, response inhibition, and reward processing are key features of cognitive control (Miller, 2000; Miller & Cohen, 2001). However, much of the work in this area has ignored motivation despite the fact that it is theorized to play a role in control allocation and task performance. Recent attempts at integrating these two constructs have largely focused on the ways in which reward expectation motivates the allocation of control (Botvinick & Braver, 2014). A key finding demonstrates that control allocation is a function of anticipated task difficulty and expected rewards, where humans strive to find an optimal balance between the two (Kool & Botvinick, 2014). Upon task completion, consummatory reward mechanisms track task-related outcomes and motivate subsequent behavior to maximize future rewards (O'Doherty et al., 2004). By comparison, the way in which task-related intrinsic rewards (Deci & Ryan, 1985) motivate the sustained allocation of cognitive control during task execution remains largely unknown.
Mounting evidence has demonstrated that increased extrinsic rewards (e.g., monetary payments) are associated with increases in sustained task performance and increased neural activity in attentional, reward, and cognitive control networks (Engelmann, Damaraju, Padmala, & Pessoa, 2009; Locke & Braver, 2008). Similarly, the intrinsically rewarding nature of self-determined choice has been shown to elicit activity in reward-network structures and corresponds with increases in task enjoyment and performance (Kang et al., 2009; Leotti & Delgado, 2011; Murayama et al., 2015). Although robust evidence shows that, under some circumstances, demanding tasks can be intrinsically rewarding (for a review, see Inzlicht, Shenhav, & Olivola, 2018), it is unknown how intrinsic rewards resulting from task demands (and not from choice) motivate cognitive control allocation. This may be due, at least in part, to the difficulty of manipulating task-based intrinsic reward in a laboratory setting.
Flow theory (Csikszentmihalyi, 1975) offers a potential solution for overcoming this challenge. Flow theory posits that the sustained execution of a task is experienced as being intrinsically rewarding when there is a balance between the task's difficulty and an individual's ability to meet the task's demands (for a modern treatment, see Inzlicht et al., 2018). By comparison, the theory predicts that a mismatch between task difficulty and individual ability leads to different psychological states. Tasks for which difficulty is greater than individual ability lead to a state of anxiety, whereas tasks for which difficulty is less than individual ability lead to boredom (Nakamura & Csikszentmihalyi, 2005).
Importantly, flow is experienced as intrinsically rewarding such that participants undertake a flow-inducing task "for its own sake, with little concern for what they will get out of it, even when it is difficult" (Csikszentmihalyi, 1990, p. 71). While flow states have been observed across a diversity of activities, including musical composition, athletics, and creative and artistic work, they also have been shown to emerge during video game use, as enjoyable video games depend on a balance between game difficulty and player ability (Sherry, 2004). Evidence using a video game stimulus demonstrates that allowing task difficulty to vary in relation to individual ability results in a curvilinear relationship where self-reported intrinsic reward is low when difficulty ≠ ability and high when difficulty ≈ ability (Keller & Bless, 2008). A recent behavioral and psychophysiological study using a racing video game also showed that the flow state (difficulty ≈ ability) resulted in the highest levels of absorption, attentional effort, and efficient gaze compared with conditions where difficulty ≠ ability (Harris, Vine, & Wilson, 2017a).
Progress also has been made towards understanding the neural basis of flow. Specifically, the synchronization theory of flow predicts that the intrinsically rewarding state of flow results from a network synchronization process between structures within cognitive control and reward networks when task difficulty ≈ individual ability (Weber, Huskey, & Craighead, 2016; Weber, Tamborini, Westcott-Baker, & Kantor, 2009). In two independent functional magnetic resonance imaging (fMRI) studies, subjects answered math problems during an fMRI scanning session (Ulrich, Keller, & Grön, 2016b; Ulrich, Keller, Hoenig, Waller, & Grön, 2014). Problems that matched subjects' ability corresponded to the highest levels of intrinsic reward compared with problems that were too easy or too difficult. This balance between difficulty and ability also was associated with increased activity in attentional and cognitive control structures, particularly the inferior frontal gyrus (IFG), anterior insula, and the superior and inferior parietal lobes (SPL, IPL). Increased activity also was observed in the dorsal striatum (both caudate nucleus and putamen), regions implicated in consummatory reward processing (O'Doherty et al., 2004; Satterthwaite et al., 2007) and performance monitoring during cognitive control (Berkman, Falk, & Lieberman, 2012). Similar experimental paradigms using video game stimuli indicate that a balance between difficulty and ability corresponds with activation in attentional (lateral prefrontal cortex, cerebellum, thalamus, SPL) and reward (caudate nucleus, nucleus accumbens, putamen) structures (Klasen, Weber, Kircher, Mathiak, & Mathiak, 2012; Yoshida et al., 2014). These results provide preliminary support for synchronization theory's structural predictions (for a recent review, see Harris, Vine, & Wilson, 2017b).
By comparison, a mismatch between difficulty and ability is associated with lower levels of intrinsic reward and increased levels of activity among default mode network structures (DMN; Ulrich, Keller, & Grön, 2016a;Ulrich et al., 2014). Similar findings have been observed in a study using a naturalistic video game stimulus (Mathiak, Klasen, Zvyagintsev, Weber, & Mathiak, 2013). Moreover, sustained performance on difficult cognitive tasks has been shown to exhaust subjects, resulting in a shift from activity in frontoparietal control networks to the DMN (Esposito, Otto, Zijlstra, & Goebel, 2014). These results suggest that intrinsic reward may motivate task engagement and be a key factor driving shifts in brain-network organization between one optimized for cognitive control and one that characterizes task disengagement. Converging evidence shows that the insula plays a key role in shifting between these networks (Chang, Yarkoni, Khaw, & Sanfey, 2013) where changes in activity within this structure predict task disengagement (Meyniel, Sergent, Rigoux, Daunizeau, & Pessiglione, 2013). These results suggest that task-related intrinsic reward modulates the allocation of cognitive control during task performance and that variation in intrinsic reward impacts networked brain connectivity patterns. Accordingly, and consistent with flow theory, we predict that self-reported intrinsic reward should be highest when task difficulty ≈ individual ability compared with conditions where task difficulty ≠ individual ability. If true, then synchronization theory predicts functional connectivity between key structures within the cognitive control and reward networks when task difficulty ≈ individual ability but not when difficulty ≠ individual ability.
To date, much of the flow literature has relied on self-report measures administered after a flow-inducing task. As a source of convergent validity, and to overcome potential limitations associated with self-reports (Nisbett & Ross, 1980), we also included an online behavioral measure for evaluating our experimental manipulation. Previous experimentation has shown a curvilinear relationship between motivation and attentional engagement (Lang, 2000). Within the context of motivated attentional engagement, such results have a straightforward interpretation. All other things being equal, subjects should allocate more attentional resources to motivationally relevant tasks compared with less motivationally relevant tasks. It follows that tasks perceived as having higher levels of intrinsic reward should be more motivationally relevant than tasks that are perceived as having lower levels of intrinsic reward. Therefore, subjects should show more attentional engagement when intrinsic reward is high compared with low. To test this, we had subjects perform a secondary task reaction time procedure (STRT; Lang, Bradley, Park, Shin, & Chung, 2006) while playing the experimental video game stimulus. We predicted that reaction times would show an inverted U-shaped pattern where attentional engagement with the video game stimulus is highest (and therefore STRTs are the longest) when task difficulty ≈ individual ability compared with conditions where task difficulty ≠ individual ability.
This manuscript details the validation of an experimental protocol for manipulating intrinsic reward and its application to an fMRI context. Our results provide self-report, behavioral, and neuropsychological evidence (using both brain activation and functional connectivity analyses) demonstrating a relationship between intrinsic reward and cognitive control. We conclude with a discussion of the implications of our findings, consider how our behavioral paradigm answers recent calls for more naturalistic experimental designs within the cognitive neuroscience literature, and outline next steps for future research in this area.

General overview
Three behavioral experiments were conducted to evaluate a novel procedure for manipulating and measuring the relationship between task difficulty, individual ability, intrinsic reward, and cognitive control. This procedure was then adapted to an fMRI context. All four experiments shared the same conceptual logic such that subjects played a video game while responding to a STRT measure (Figure 1). We detail differences in gameplay and STRT parameters below.
Subjects
Human subjects in each experiment were drawn from a pool of students at the University of California Santa Barbara (Table 1; final ns: experiment 1 = 122, experiment 2 = 110, experiment 3 = 87, fMRI = 18). Subjects in all experiments (behavioral and fMRI) were screened prior to participation and were not recruited if they had participated in any of the previous studies. Accordingly, subjects in all experiments did not have prior experience with the video game stimulus or experimental paradigm. The University's Institutional Review Board approved all experiments. Subjects in the fMRI experiment were right-handed, had normal or corrected-to-normal vision, and did not demonstrate any contraindication to fMRI scanning. Experiment 3 showed that self-reported video game ability was a significant predictor of actual video game performance. Accordingly, subjects were not recruited for the fMRI study if they reported very high or low video game ability.
Previous behavioral research evaluating engagement with video games has shown considerable variability in effect sizes (Raines, Levine, & Weber, 2018; Sherry, 2001). Accordingly, small effects were assumed in the power analysis for the first behavioral experiment, and subsequent behavioral experiments sought to maintain comparable sample sizes. The fMRI sample size corresponded to related studies reported in the literature (Desmond & Glover, 2002; Friston, 2012). One run for one subject was excluded from the fMRI experiment due to equipment malfunction; two subjects voluntarily withdrew from the study during initial structural image acquisition.
Naturalistic video game stimulus
In experiments 1 and 2, participants played Star Reaction (ABiGames), a point-and-click style video game where subjects used their cursor to collect star-shaped targets that were displayed at different locations on a screen while avoiding rings that bounced around the screen. Thirteen levels incrementally manipulated difficulty by altering the number of targets a subject needed to collect, the number of objects to be avoided, and the rate at which these objects moved around the video game window. While useful for initial testing, Star Reaction offered few options for interface customization, thereby limiting experimental control. To overcome this issue, an open-source variant called Asteroid Impact (CC BY-SA 4.0) was developed for experiment 3 and the subsequent fMRI experiment. Asteroid Impact was designed to have similar mechanics to Star Reaction while allowing for tighter experimental control (the experimental video game stimulus and its supporting documentation can be downloaded from: https://github.com/richardhuskey/asteroid_impact).

Secondary task reaction time measurement
Subjects completed a STRT measure while playing the experimental video game (Figure 1). STRTs were defined as the latency between the onset time of a stimulus (trial) and the moment when a subject responded with a key press. For experiments 1 and 2, each condition included 48 trials, each lasting 1,500 ms. Only visual trials were used in experiment 1, whereas half of the visual trials were replaced with auditory trials (sine waveform, 440.0 Hz) in experiment 2. The intertrial interval (ITI) for each trial was calculated by adding a sample of normally distributed randomly generated numbers (M = 1,969 ms, SD = 1,000 ms) to a baseline of 1,500 ms. In experiment 3 and the fMRI experiment, 24 visual trials were shown for each condition. The ITI for these trials was jittered using a truncated Gaussian distribution with a floor of 1,500 ms and a standard deviation of 2.0. In the behavioral experiment, subjects responded to STRT trials by using their nondominant hand to press the spacebar key on a computer keyboard. In the fMRI experiment, subjects used the thumb on their left hand (all subjects were right-handed) to press a button on an MRI-safe button box. Trials were shown in one of five possible locations on a second screen in the behavioral experiment and were shown in one of four possible locations on the same screen in the fMRI experiment.
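The two ITI schemes described above can be illustrated with a minimal sketch. The function names are hypothetical, and several details are assumptions rather than reported facts: the text gives no mean for the truncated Gaussian used in experiment 3 and the fMRI experiment, so the mean is left as a required argument, and truncation is implemented here by simple redrawing, which is only one of several possible approaches.

```python
import random

BASELINE_MS = 1500.0  # baseline (experiments 1 and 2) and floor (experiment 3, fMRI)


def iti_additive(rng, mean_ms=1969.0, sd_ms=1000.0, baseline_ms=BASELINE_MS):
    """Experiments 1 and 2: baseline plus one normally distributed sample.

    The text does not say how unusually small samples were handled,
    so none are clipped here.
    """
    return baseline_ms + rng.gauss(mean_ms, sd_ms)


def iti_truncated(rng, mean_ms, sd_ms, floor_ms=BASELINE_MS):
    """Experiment 3 / fMRI: Gaussian sample truncated at a floor.

    Assumption: truncation by redrawing until the sample is at or above
    the floor. mean_ms is hypothetical (not reported in the text).
    """
    while True:
        sample = rng.gauss(mean_ms, sd_ms)
        if sample >= floor_ms:
            return sample


if __name__ == "__main__":
    rng = random.Random(42)
    print([round(iti_additive(rng)) for _ in range(5)])
    # Hypothetical mean of 1,500 ms; SD of 2.0 interpreted here as seconds.
    print([round(iti_truncated(rng, 1500.0, 2000.0)) for _ in range(5)])
```

Under the additive scheme the expected ITI is 1,500 + 1,969 = 3,469 ms.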

Measuring intrinsic reward
In experiments 1 and 2, intrinsic reward was measured using a 4-item, 7-point scale (Bowman, Weber, Tamborini, & Sherry, 2013; Weber, Behr, & Bates, 2014). Experiment 3 used the Event Experience Scale, a better validated and more widely used measure of task-related intrinsic reward (Jackson & Marsh, 1996). Specifically, self-reported intrinsic reward was measured using the 4-item, 5-point autotelic experience subscale. Items on this scale included: "I really enjoyed the experience"; "I loved the feeling of performance and want to capture it again"; "The experience left me feeling great"; and "I found the experience extremely rewarding."

Measuring individual differences in intrinsic reward sensitivity
Experiment 3 measured intrinsic reward sensitivity, which is understood as a trait-level measure, using the 4-item, 5-point autotelic personality subscale of the Activity Experience Scale (Jackson & Eklund, 2004).

*Self-reported video game ability was measured using a 4-item scale in experiments 1 and 2 and with a 7-item scale in experiment 3 and the fMRI study.
Measuring video game ability
It is possible that subjects' video game ability explains differences in self-reported intrinsic reward as well as STRT performance. Accordingly, video game ability was included as an a priori defined covariate of no interest. In experiments 1 and 2, video game ability was evaluated using a 4-point single-item measure where subjects were asked to "indicate their general video game skill." In experiment 3 and the fMRI study, this was changed to a 7-point single-item measure. In addition, and based on evidence that performance on different cognitive tasks correlates with video game ability (Bowman et al., 2013; Sherry, 2004), established behavioral measures of targeting (Watson & Kimura, 1989), attentional vigilance (Robertson, Manly, Andrade, Baddeley, & Yiend, 1997), dual-tasking ability (Erickson et al., 2007), and three-dimensional mental rotation (Peters et al., 1995) were collected as independent behavioral proxies for video game ability in experiment 3 (Figures 2, 3, 4, and 5).
Three-dimensional mental rotation
The redrawn Vandenberg and Kuse mental rotations test (Peters et al., 1995) was administered in two three-minute runs. For each run, subjects were shown 12 three-dimensional reference shapes. For each reference shape, subjects were asked to identify which two (out of four) shapes matched the reference. Subjects were given a point if they correctly identified both shapes (M = 7.298, SD = 3.894, range = 0-22).
Sustained attention response test
Following Robertson et al. (1997), subjects were shown a series of numbers (1-9) in five different font sizes for 250 ms (font sizes were balanced across all values). The trial was then masked for 900 ms. Subjects were instructed to press a key as quickly as possible for all numbers (a go trial) except the number 3 (a no-go trial). A total of 225 trials were shown, 25 of which were no-go trials. Mirroring previous studies (Unsworth et al., 2015), the two dependent measures were: (1) accuracy, operationalized as the frequency count of no-go trials where a key press was withheld (M = 21.824, SD = 2.780, range = 11-25), and (2) the standard deviation of reaction times for correct go trials (M = 453.012, SD = 87.169, range = 102.07-544.40).
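The two dependent measures described above can be computed along the following lines. This is a sketch under stated assumptions: the trial representation (a digit paired with a reaction time, or `None` when no key was pressed) and the function name are hypothetical, and the sample standard deviation is used because the text does not say whether a sample or population estimate was computed.

```python
import statistics

NO_GO_DIGIT = 3  # subjects withheld responses to the number 3


def score_sart(trials):
    """Score one subject's trials.

    trials: list of (digit, rt_ms) tuples; rt_ms is None when no key was
    pressed (hypothetical data layout).
    Returns (no-go trials correctly withheld, SD of RTs on correct go trials).
    """
    withheld = sum(1 for digit, rt in trials if digit == NO_GO_DIGIT and rt is None)
    go_rts = [rt for digit, rt in trials if digit != NO_GO_DIGIT and rt is not None]
    # Assumption: sample SD; at least two go responses are needed to compute it.
    rt_sd = statistics.stdev(go_rts) if len(go_rts) > 1 else 0.0
    return withheld, rt_sd


if __name__ == "__main__":
    example = [(1, 400.0), (3, None), (3, 350.0), (5, 500.0), (7, None)]
    print(score_sart(example))
```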
Dual-task paradigm
Consistent with Erickson et al. (2007), subjects were shown two types of trials (single-mixed, dual-mixed), which lasted for 2,500 ms and were separated by a 500-ms fixation cross. In single-mixed trials, subjects were shown one of four possible stimuli: >, <, a red square, or a green square. Each stimulus was mapped to a specific key, and subjects were instructed to press the correct key as quickly as possible when a trial was shown without sacrificing accuracy. In the dual-mixed condition, two of four possible stimuli were shown, and subjects were instructed to press the two keys that corresponded to each stimulus. A total of eight combinations of single- and dual-mixed trials were possible. Each was presented a total of 20 times in a randomized order. Two dependent measures were assessed: (1) accuracy, the total number of dual-mixed trials where both keys were correctly pressed (M = 67.279, SD = 13.495, range = 5-79), and (2) [...].
[Figure: experimental schematic of the mental rotations test (Peters et al., 1995). This test was conducted as a potential measure of video game skill in experiment 3.]
[Figure: experimental schematic of the sustained attention response test (Robertson et al., 1997). This test was conducted as a potential measure of video game skill in experiment 3.]
[Figure: experimental schematic of the dual-task paradigm (Erickson et al., 2007). This test was conducted as a potential measure of video game skill in experiment 3.]
Targeting task
Subjects' targeting abilities were evaluated using a dart-throwing procedure (Watson & Kimura, 1989). A 60-cm diameter circular target with the bullseye 152 cm from the floor was fixed to a wall 3 m from where subjects stood. Subjects completed 25 overhand throws of a 25-gram dart using their dominant hand. The distance of each throw from the center was recorded in millimeters and averaged for each subject (M = 137.838, SD = 27.085, range = 70.89-207.00). Smaller values indicated greater accuracy.
Behavioral localizer tasks
The fMRI experiment used n-back and gambling tasks (Figures 6-7) to localize behaviorally the neural activations in cognitive control and reward regions of interest, respectively. These tasks were selected a priori to allow us to define seed regions of interest (ROIs) for psychophysiological interaction analyses (PPI, see below) where the ROIs were defined by two tasks that were independent of our main behavioral manipulation. This decision had two benefits. First, using independently localized ROIs prevented circularity in our analysis that might otherwise inflate our statistical results. Second, these localizer tasks also were used in the Human Connectome Project, which helps to integrate our findings within the broader literature.
The n-back task was used to behaviorally localize functional activity in cognitive control regions of interest. The n-back task was selected because it shows reliable activation patterns across subjects (Drobyshevsky, Baumann, & Schneider, 2006) and sessions (Caceres, Hall, Zelaya, Williams, & Mehta, 2009) and does not show gender differences (Schmidt et al., 2009). Across two runs, subjects were shown 320 trials where each trial was a randomly selected letter from A-Z that was shown for 1,000 ms. In the 2-back condition, subjects were required to press a key when the letter shown was the same as one shown two trials back. In the 0-back condition, subjects pressed a key when the trial showed the letter X. Each run followed a 2-back (40 trials), 0-back (40 trials), 2-back (40 trials), and 0-back (40 trials) pattern. Subjects were instructed to prioritize accuracy before speed. The 2-back and 0-back conditions were modeled in a block design with a 2-back > 0-back contrast in subsequent fMRI data analyses. A priori hypothesized seed ROIs (in MNI152 space) for the PPI analysis were generated based on peak activations resulting from this contrast and included: right DLPFC (32, 54, 10), left DLPFC (−32, 54, 10), right thalamus (16, −16, 10), and the left thalamus (−8, −10, −2). Additionally, our primary brain activation analysis (discussed below) implicated an additional interesting a posteriori region of interest, which we also were able to localize independently by using the n-back task. This was the right insula (40, 16, −6).
Figure 5. Experimental schematic of the targeting task (Watson & Kimura, 1989). This test was conducted as a potential measure of video game skill in experiment 3.
Figure 6. Experimental schematic of the N-back procedure. This task was conducted in the fMRI experiment to independently localize cognitive control ROIs.
Structures within the reward network were behaviorally localized using a gambling task that has been shown to activate the basal ganglia reliably (Delgado, Nystrom, Fissell, Noll, & Fiez, 2000; May et al., 2004; Tricomi, Delgado, & Fiez, 2004). In this task, subjects were shown a series of cards with a numeric value of 1-9. During an initial guessing period (2,500 ms), subjects were asked to indicate if they thought the value of the card was greater than or less than 5. Subjects were then shown the outcome of their guess (1,000 ms) and then a fixation cross during the post-outcome period (11,500 ms) for a cumulative trial duration of 15,000 ms. A total of 100 trials were shown across 5 runs. Subjects were rewarded 1 dollar for correct guesses, lost 50 cents for incorrect guesses, and did not win or lose any money for tie trials. The ratio of wins, losses, and ties was set at 40:40:20 (balanced across all runs). Neural activity during the post-outcome period was modeled in an event-related design with a win > loss contrast. Seed ROIs (in MNI152 space) for the PPI analysis were generated based on peak activations resulting from this contrast and included: right ventral striatum/nucleus accumbens (10, 16, −6), left ventral striatum/nucleus accumbens (−10, 16, −6), right dorsal striatum/putamen (16, 12, −6), and the left dorsal striatum/putamen (−18, 12, 6).
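As a worked illustration of the outcome schedule described above, the sketch below builds five runs of 20 trials and totals the payoff. It assumes that "balanced across all runs" means each run contains 8 wins, 8 losses, and 4 ties, and that outcome order within a run was shuffled; neither detail is stated in the text, and the function names are hypothetical.

```python
import random

PAYOFF_CENTS = {"win": 100, "loss": -50, "tie": 0}  # payoffs stated in the text
TRIAL_MS = 2500 + 1000 + 11500  # guess + outcome + post-outcome fixation


def build_run(rng, wins=8, losses=8, ties=4):
    """One run of 20 trials with a 40:40:20 ratio (assumed per-run balance)."""
    outcomes = ["win"] * wins + ["loss"] * losses + ["tie"] * ties
    rng.shuffle(outcomes)  # assumption: randomized order within each run
    return outcomes


def build_session(seed=0, n_runs=5):
    """Five runs, 100 trials in total."""
    rng = random.Random(seed)
    return [build_run(rng) for _ in range(n_runs)]


def total_cents(session):
    """Net payoff across the whole session, in cents."""
    return sum(PAYOFF_CENTS[outcome] for run in session for outcome in run)


if __name__ == "__main__":
    session = build_session(seed=1)
    print(total_cents(session) / 100, "dollars")
    print(len(session[0]) * TRIAL_MS / 1000, "seconds per run")
```

With 40 wins and 40 losses, the net payoff is 40 × $1.00 − 40 × $0.50 = $20.00, and each 20-trial run lasts 20 × 15 s = 300 s.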
Procedures
Subjects provided informed consent before each experiment was conducted. Self-reported video game ability, intrinsic reward sensitivity, and baseline reaction times were collected at the beginning of each experiment. Subjects then familiarized themselves with the video game stimulus by reading the rules and by repeatedly playing the video game's first level for a period of 2 minutes. Subjects then played three randomly ordered conditions that manipulated low-difficulty, high-difficulty, and balanced-difficulty (see Figure 1c for a conceptual schematic). Subjects were instructed to try to complete as many levels as possible during each condition. The low-difficulty condition (ability > difficulty) was operationalized as repeated play of the video game's first and least challenging level, whereas the high-difficulty condition (ability < difficulty) required repeated play of the most challenging level.
Of critical importance for flow theory is the way in which task difficulty is balanced with individual ability. In the balanced-difficulty condition (ability ≈ difficulty), video game difficulty and player ability were matched by incrementally increasing the game's difficulty after a subject completed a level. This manipulation relies on a logic common to video game design (Koster, 2005) where once an individual has developed sufficient skill to beat one level, the next level is incrementally more difficult. This simple procedure ensures that task difficulty is constantly matched with individual ability. In the present study, the balanced-difficulty condition started on the game's second level. Each level required subjects to collect a certain number of targets. Level difficulty increased once subjects had collected all targets for a given level. In experiments 1 and 2, video game difficulty was determined based on the default Star Reaction settings. Asteroid Impact allowed us to tune the video game's parameters in order to adjust difficulty. The parameters used in experiment 3 and the fMRI study are now discussed in more detail.
The low-difficulty condition required subjects to collect three targets while avoiding just one object. By comparison, the high-difficulty condition required that subjects collect 25 targets while avoiding seven objects of varying sizes that traveled at different speeds. The balanced-difficulty condition incrementally increased difficulty by modifying four parameters: (1) the number of targets to collect, (2) the number of objects to avoid, (3) the rate at which objects moved, and (4) the size of the objects to be avoided. Extensive pretesting (not reported in this manuscript, although experiment 3 reports the validation of these pretests) was conducted to determine the correct parameters for each of these settings. Such a design draws directly from flow theory by assuming that task-related intrinsic reward is not driven by actual task outcomes (e.g., performance) but instead by the perception of a balance between task difficulty and individual ability. Importantly, this assumption is corroborated by a large body of literature (Csikszentmihalyi, 1975, 1990). We also provide empirical support for the assumption that self-reported intrinsic reward is highest during the balanced-difficulty condition (see Results section) and thereby validate our experimental procedure.
Figure 7. Experimental schematic of the Gambling task procedure. This task was conducted in the fMRI experiment to independently localize reward ROIs.
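The level-progression rule behind the balanced-difficulty condition (start on the second level, advance once all targets for the current level are collected) can be sketched as follows. Only the end points (3 targets and 1 object; 25 targets and 7 objects) come from the text. The number of levels, the linear interpolation between the end points, the speed and size values, and all names are hypothetical placeholders and do not reproduce the pretested Asteroid Impact parameters.

```python
from dataclasses import dataclass


@dataclass
class Level:
    targets: int        # number of targets to collect
    hazards: int        # number of objects to avoid
    speed: float        # hypothetical relative movement rate
    hazard_size: float  # hypothetical relative object size


def make_levels(n_levels=12):
    """Interpolate from the low-difficulty to the high-difficulty settings.

    Assumption: linear spacing; the real parameters came from pretesting.
    """
    levels = []
    for i in range(n_levels):
        frac = i / (n_levels - 1)
        levels.append(Level(targets=round(3 + frac * 22),
                            hazards=round(1 + frac * 6),
                            speed=1.0 + frac,
                            hazard_size=1.0 + frac))
    return levels


class BalancedCondition:
    """Tracks the current level and advances when a level is cleared."""

    def __init__(self, levels, start_index=1):
        self.levels = levels
        self.index = start_index  # the condition started on the second level
        self.collected = 0

    def on_target_collected(self):
        """Register one collected target; return the (possibly new) level index."""
        self.collected += 1
        if self.collected >= self.levels[self.index].targets:
            self.collected = 0
            self.index = min(self.index + 1, len(self.levels) - 1)
        return self.index


if __name__ == "__main__":
    condition = BalancedCondition(make_levels())
    for _ in range(10):
        print(condition.on_target_collected(), end=" ")
```

Because difficulty rises only after a level is cleared, a subject who cannot yet clear a level stays on it, which is how the procedure keeps difficulty close to ability.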
In experiments 1 and 2, each condition lasted for a total of 4 minutes. Because experiment 3 was designed to validate an fMRI procedure that would employ a block design, and a 4-minute block is rather long and may create confounds with low-frequency scanner noise, we shortened each condition to 2 minutes in experiment 3 and the fMRI experiment. Self-reported measures of intrinsic reward were collected after each experimental condition in the behavioral experiments. Subjects completed each condition just once in experiments 1, 2, and 3, and condition order was randomized for all subjects. In the fMRI experiment, subjects completed a total of four runs. Each run included all three conditions, and conditions were separated by 57 s of rest (black screen) and 8 s of instructions. Conditions in the fMRI experiment were shown in a counterbalanced order. Researchers were not blind to the conditions.
In experiment 3, subjects then completed the three-dimensional mental rotation, attentional vigilance, dual-tasking, and targeting measures. In the fMRI experiment, subjects then completed an n-back and gambling task to localize independently the neural activity in key cognitive control and reward network regions of interest.
STRT and self-report data analysis
The STRT data analysis plan was determined a priori, and the same analytic approach was applied for all experiments. All STRT observations were capped at 1,500 ms, and the harmonic mean response time was calculated for each subject for each condition (for extended justification for this analytic decision, see Ratcliff, 1993). Repeated measures ANCOVAs were calculated to assess how intrinsic rewards and reaction times differed across experimental conditions. In each model, the variable of interest (i.e., reaction time, self-reported intrinsic reward) was included as a within-subjects factor, and condition order was included as a between-subjects factor to control for possible order effects. Self-reported video game ability and baseline reaction time covariates also were included in models evaluating reaction times. Statistics from the multivariate tests are reported as these are more robust against any violations of the assumptions of normality and sphericity.
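The capping and harmonic-mean step described above amounts to the following, shown here as a minimal sketch for one subject in one condition. The function name and example values are hypothetical.

```python
import statistics

CAP_MS = 1500.0  # STRT observations were capped at 1,500 ms


def harmonic_mean_rt(rts_ms, cap_ms=CAP_MS):
    """Cap each reaction time at cap_ms, then take the harmonic mean."""
    capped = [min(rt, cap_ms) for rt in rts_ms]
    return statistics.harmonic_mean(capped)


if __name__ == "__main__":
    # 2,000 ms is capped to 1,500 ms before averaging.
    print(round(harmonic_mean_rt([500.0, 1000.0, 2000.0]), 2))
```

For the example values, the capped times are 500, 1,000, and 1,500 ms, giving 3 / (1/500 + 1/1000 + 1/1500) = 9000/11 ≈ 818.18 ms. The harmonic mean gives less weight to slow responses than the arithmetic mean (1,000 ms here).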
fMRI acquisition, preprocessing, and analysis
Data were acquired on a 3-tesla Siemens Magnetom Prisma scanner. Following recommendations established by the Human Connectome Project (Ugurbil et al., 2013), a multiband echo planar gradient sequence measured the blood oxygenation level-dependent contrast (TR = 720.0 ms, TE = 37.0 ms, FA = 52 degrees, FOV = 208 mm, multi-band acceleration factor = 8) with each volume consisting of 72 interleaved slices with a 2-mm isotropic spatial resolution acquired parallel to the AC-PC plane. A high-resolution T1-weighted sagittal sequence of the whole brain (TR = 2500.0 ms, TE = 2.22 ms, FA = 7 degrees, FOV = 241 mm, 0.9-mm isotropic resolution) was collected before functional scanning.
We first conducted analyses to evaluate brain activation in response to our experimental manipulation. Accordingly, a series of first-level GLMs were estimated for all subjects for all runs for the Asteroid Impact experimental conditions. Each block design model included an explanatory variable (EV) for each condition (i.e., low-difficulty, balanced-difficulty, high-difficulty), fixed for the entire duration of each condition (120 s), which was convolved with a hemodynamic response function (gamma convolution, mean lag = 6 s, SD = 3 s). Temporal derivatives of each EV also were included as covariates of no interest. Following a similar analytical logic established in related studies (Ulrich et al., 2016b, 2014), planned contrasts modeled neural activations unique to each condition. These contrasts included: balanced-difficulty > low- and high-difficulty (2, −1, −1), balanced-difficulty > low-difficulty (1, −1), balanced-difficulty > high-difficulty (1, −1), low-difficulty > balanced-difficulty (1, −1), high-difficulty > balanced-difficulty (1, −1), and high-difficulty > low-difficulty (1, −1).
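To make the block-design model concrete, the sketch below builds one condition regressor by convolving a 120-s boxcar with a gamma-shaped response function and lists the planned contrasts as weight vectors. It is an illustration only and does not reproduce the FSL implementation: the gamma function is parameterized directly from a mean of 6 s and an SD of 3 s, temporal derivatives are omitted, the 65-s block onset is a hypothetical example, and the pairwise contrasts are written as three-element vectors (EV order: balanced, low, high) with a zero for the condition not involved.

```python
import math

TR = 0.72        # seconds, from the acquisition parameters
BLOCK_S = 120.0  # each condition block lasted 120 s


def gamma_hrf(t, mean_lag=6.0, sd=3.0):
    """Gamma density with the given mean and SD (shape = 4, scale = 1.5 s)."""
    if t <= 0:
        return 0.0
    shape = (mean_lag / sd) ** 2
    scale = sd ** 2 / mean_lag
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def block_regressor(onset_s, n_vols, tr=TR, block_s=BLOCK_S, kernel_s=30.0):
    """Boxcar for one block convolved with the gamma kernel, sampled every TR."""
    boxcar = [1.0 if onset_s <= i * tr < onset_s + block_s else 0.0
              for i in range(n_vols)]
    kernel = [gamma_hrf(j * tr) * tr for j in range(int(kernel_s / tr))]
    regressor = []
    for i in range(n_vols):
        acc = 0.0
        for j, h in enumerate(kernel):
            if i - j < 0:
                break
            acc += boxcar[i - j] * h
        regressor.append(acc)
    return regressor


# Contrast weights in EV order (balanced, low, high); zero padding is an assumption.
CONTRASTS = {
    "balanced > low and high": (2, -1, -1),
    "balanced > low": (1, -1, 0),
    "balanced > high": (1, 0, -1),
    "low > balanced": (-1, 1, 0),
    "high > balanced": (-1, 0, 1),
    "high > low": (0, -1, 1),
}

if __name__ == "__main__":
    reg = block_regressor(onset_s=65.0, n_vols=300)  # hypothetical onset
    print(round(max(reg), 3))
```

Each contrast vector sums to zero, so a contrast estimate reflects a difference between conditions rather than overall signal level.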
These first-level models were then carried forward into a second-level mixed effects analysis (FLAME; Woolrich, Behrens, Beckmann, Jenkinson, & Smith, 2004). No additional contrasts were constructed at the second level. In line with recommendations for cluster-based inference (Eklund, Nichols, & Knutsson, 2016; Woo, Krishnan, & Wager, 2014), we corrected for multiple comparisons using a cluster-based procedure (Worsley, 2001) with a cluster defining threshold of Z = 3.1 and a cluster extent threshold of p < 0.0001. Structures were evaluated using FSL's probabilistic atlases and were cross-referenced with the Neurosynth database (Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011).
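For readers less familiar with Z-based cluster defining thresholds, the short sketch below converts a Z value to its one-tailed voxelwise probability under a standard normal distribution. The function name is ours; the two Z values are the thresholds used in this manuscript (3.1 for the brain mapping analysis and 2.3 for the PPI analysis), which correspond to roughly p = 0.001 and p = 0.01, respectively.

```python
from statistics import NormalDist


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 1.0 - NormalDist().cdf(z)


for z in (3.1, 2.3):
    print(f"Z = {z}: one-tailed p = {one_tailed_p(z):.4f}")
```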
A series of psychophysiological interaction analyses (PPI; Friston et al., 1997; Huskey, 2016) were then modeled to evaluate task-modulated functional connectivity between structures within cognitive control and reward networks. As discussed above, seed regions of interest (ROIs) were defined independently of our primary experimental task based on functional activations in the n-back and gambling localizer tasks. A 3-mm sphere was drawn around peak voxels for each ROI (in MNI152 space), warped to each subject's native space, and used to extract the neural timeseries from filtered functional data for each subject for each run. The first-level PPI model included an indicator variable that encoded the balanced-difficulty > low-difficulty and high-difficulty contrast, a physiological EV, and an interaction term. Second-level mixed-effects models were then estimated for each seed ROI. Given that PPI analyses tend to suffer from decreased statistical power (Friston et al., 1997; O'Reilly, Woolrich, Behrens, Smith, & Johansen-Berg, 2012), FSL's default cluster-based correction for multiple comparisons was applied with a cluster defining threshold of Z = 2.3 and a cluster extent threshold of p < 0.05. PPI results are reported for the interaction term, which reflects task-modulated changes in connectivity for the balanced-difficulty condition.
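The three columns of the first-level PPI model can be illustrated with a deliberately simplified sketch. It assumes a psychological regressor coded 2 for balanced-difficulty volumes and −1 for low- and high-difficulty volumes, a mean-centered seed timeseries as the physiological EV, and their element-wise product as the interaction term. The coding scheme, function names, and toy data are hypothetical, and the sketch omits the hemodynamic convolution of the psychological regressor that a full analysis would include.

```python
# Hypothetical coding of the balanced > low and high contrast, one weight per volume.
PSYCH_WEIGHTS = {"balanced": 2.0, "low": -1.0, "high": -1.0}


def center(values):
    """Subtract the mean so the physiological EV has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ppi_design(condition_labels, seed_timeseries):
    """Return the psychological, physiological, and interaction columns."""
    if len(condition_labels) != len(seed_timeseries):
        raise ValueError("labels and seed timeseries must have the same length")
    psych = [PSYCH_WEIGHTS[label] for label in condition_labels]
    physio = center(seed_timeseries)
    interaction = [p * s for p, s in zip(psych, physio)]
    return psych, physio, interaction


# Toy data: six volumes, two per condition, with made-up seed signal values.
labels = ["low", "low", "balanced", "balanced", "high", "high"]
seed = [1.0, 3.0, 2.0, 6.0, 4.0, 2.0]
psych, physio, interaction = ppi_design(labels, seed)
print(interaction)
```

A voxel whose signal loads on the interaction column, over and above the psychological and physiological columns, covaries with the seed more strongly in one task context than in the other, which is the sense in which the interaction term reflects task-modulated connectivity.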
Experiment 3 tested whether the video game ability covariate is best evaluated using self-report or behavioral measures as well as the hypothesis that individual differences in intrinsic reward sensitivity predict task performance (Buetti & Lleras, 2016). Bivariate Pearson correlations were calculated to assess the relationship between each subject's performance on each behavioral measure of ability and the total number of targets they successfully collected (M = 230.88, SD = 24.14, range = 119.00-274.00) while using Asteroid Impact (a measure of overall video game performance; Table 2). Self-reported video game ability (r = 0.337, p = 0.002), the standard deviation of reaction times during the dual-mixed procedure (r = −0.221, p = 0.043), and three-dimensional mental rotation ability (r = 0.287, p = 0.008) were significantly correlated with Asteroid Impact performance. Asteroid Impact performance was then regressed on these three variables to further characterize the nature of this relationship. Self-reported video game ability was entered into the first block (adjusted R² = 0.094, F(1,82) = 9.628, p = 0.003) with dual-mixed standard deviation, three-dimensional mental rotation ability, and two- and three-way interaction terms entered in the second block (adjusted R² change = 0.012, F(5,77) = 2.646, p = 0.022). Self-reported video game ability was the only variable that significantly predicted Asteroid Impact performance (B = 0.324, p = 0.003). Therefore, it was again used as a covariate in subsequent reaction time analyses.
Table 2 Pearson correlations between theoretical predictors of task performance and actual Asteroid Impact video game performance. These data were collected in experiment 3.
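As a consistency check on the first regression block, the reported F statistic and degrees of freedom can be converted back to R² and adjusted R². The sketch below assumes the standard ordinary least squares identities for a model with an intercept; the function names are ours, and the inputs are the values reported for the first block, F(1,82) = 9.628.

```python
def r2_from_f(f_value, df_model, df_resid):
    """R-squared implied by an overall model F test."""
    return (f_value * df_model) / (f_value * df_model + df_resid)


def adjusted_r2(r2, df_model, df_resid):
    """Adjusted R-squared, using n - 1 = df_model + df_resid."""
    return 1.0 - (1.0 - r2) * (df_model + df_resid) / df_resid


r2_block1 = r2_from_f(9.628, 1, 82)
print(round(r2_block1, 3), round(adjusted_r2(r2_block1, 1, 82), 3))
```

Under these assumptions, the recovered adjusted R² agrees with the reported value of 0.094 to three decimal places.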
For experiment 3, the items used to assess self-reported intrinsic reward showed acceptable internal consistency (Cronbach's α = 0.751) and the overall repeated measures ANCOVA models were significant for intrinsic reward (Wilks' λ = 0.406, F(2,80) = 58.432, p < 0.001) and reaction time (Wilks' λ = 0.310, F(2,78) = 86.698, p < 0.001). Again, intrinsic reward was the greatest and response times to a distracting secondary task were longest in the balanced-difficulty condition (Tables 3 and 4). The results from these three studies demonstrate that the experimental paradigm successfully manipulated levels of intrinsic reward and task difficulty. These results also suggest that, within the context of this experimental procedure, the STRTs may serve as a behavioral correlate of intrinsic reward.
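The relationship between the reported Wilks' λ values and their F statistics can be illustrated with a short sketch. It assumes the single-degree-of-freedom hypothesis case, in which the conversion from λ to F is exact and the multivariate effect size reduces to 1 − λ. The function names are ours, and small discrepancies from the reported F values are expected because λ is reported to only three decimal places.

```python
def f_from_wilks(wilks_lambda, df1, df2):
    """F statistic implied by Wilks' lambda when the hypothesis has one df."""
    return ((1.0 - wilks_lambda) / wilks_lambda) * (df2 / df1)


def multivariate_eta_squared(wilks_lambda):
    """Proportion of variance accounted for, again assuming the one-df case."""
    return 1.0 - wilks_lambda


# Values reported for experiment 3: (label, lambda, df1, df2, reported F).
reported = [
    ("intrinsic reward", 0.406, 2, 80, 58.432),
    ("reaction time", 0.310, 2, 78, 86.698),
]
for label, lam, df1, df2, f_reported in reported:
    f_implied = f_from_wilks(lam, df1, df2)
    print(f"{label}: implied F = {f_implied:.2f}, reported F = {f_reported}, "
          f"eta squared = {multivariate_eta_squared(lam):.3f}")
```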

Brain imaging experiment (study 4)
As a manipulation check, and reconfirming the pattern observed in behavioral experiments 1, 2, and 3, STRTs measured during the fMRI experiment were the longest in the balanced-difficulty condition (Wilks' λ = 0.095, F(2,9) = 42.96, p < 0.001; Table 4). Therefore, and following the rationale presented in the Introduction, we infer that our experimental procedure successfully manipulated intrinsic reward in an fMRI context.
Brain mapping results The brain mapping analysis yielded several clusters (Tables 5, 6 and 7). Consistent with previous findings (Klasen et al., 2012; Ulrich et al., 2014, 2016b; Yoshida et al., 2014), results show that the balanced-difficulty condition elicited robust neural activity in cognitive control, attentional, and reward structures. Specifically, the balanced-difficulty > low-difficulty and high-difficulty contrast (Figure 8A) revealed broad activity in structures associated with cognitive control (dorsolateral prefrontal cortex; DLPFC), orienting attention (SPL, precentral gyrus), and attentional alerting (dorsoanterior insula). Neural activity also was observed in the putamen, a structure implicated in processing consummatory rewards during cognitive control tasks (Satterthwaite et al., 2007). Group-level parameter estimates for the DLPFC and putamen showed the characteristic inverted-U shaped pattern (Figure 9). The balanced-difficulty > low-difficulty as well as the balanced-difficulty > high-difficulty contrasts also were evaluated to aid in interpretation of these results. Activations in these contrasts are quite similar to the balanced-difficulty > low-difficulty and high-difficulty contrast. In fact, a comparison of the balanced-difficulty > low-difficulty and high-difficulty to the balanced-difficulty > low-difficulty (Table 8) activation tables shows largely identical activations. However, the balanced-difficulty > high-difficulty contrast (Table 9) elicits activation in sensorimotor areas (e.g., premotor cortex, cerebellum, anterior precuneus), which are largely absent in the balanced-difficulty > low-difficulty and high-difficulty contrast. Further still, it is possible that the high-difficulty condition required similar levels of prefrontal control and reward processing as the balanced-difficulty condition. The high-difficulty > low-difficulty contrast also was evaluated to tease out differences between these conditions (Table 10). While both the balanced-difficulty > low-difficulty and high-difficulty > low-difficulty contrasts show similar activation patterns in occipital cortex and superior and middle frontal gyri, only the balanced-difficulty > low-difficulty contrast shows activations in cognitive control, reward, and salience network structures such as the DLPFC, putamen, caudate nucleus, and dorsoanterior and posterior insula. By comparison, the low-difficulty > balanced-difficulty contrast (Figure 8B) showed activity in structures commonly implicated in the DMN, particularly the dorsal and ventral medial prefrontal cortex (PFC), ventral posteromedial cortex, temporal pole, and hippocampus. Finally, the high-difficulty > balanced-difficulty contrast (Figure 8C) revealed activity in the occipital fusiform gyrus, temporal pole, orbitofrontal cortex, and inferior temporal gyrus.
Table note: For each row, superscripted text indicates statistically significant pairwise comparisons after a Bonferroni correction for multiple comparisons at the p < 0.05 level. Note that experiments 1 and 2 used a 4-item, 7-point scale (Bowman, Weber, Tamborini, & Sherry, 2013; Weber, Behr, & Bates, 2014), whereas experiment 3 used the 4-item, 5-point autotelic experience subscale (Jackson & Marsh, 1996).
PPI results A series of PPI analyses was then conducted to characterize functional connectivity patterns between key cognitive control and reward structures in the balanced-difficulty > low- and high-difficulty contrast. Independent seed ROIs were defined a priori for anticipatory (nucleus accumbens) and consummatory (putamen) reward structures as well as key cognitive control (dorsolateral prefrontal cortex, thalamus) ROIs. An a posteriori, and therefore exploratory, seed ROI also was evaluated for the right dorsoanterior insula, a structure that was implicated in the brain mapping results.
In the balanced-difficulty > low- and high-difficulty contrast, the bilateral nucleus accumbens showed functional connections with the occipital pole, paracingulate cortex, central operculum, DLPFC, middle temporal gyrus, and temporal-occipital fusiform cortex (Table 11; Figure 10a), whereas the bilateral DLPFC seed exhibited connectivity with the orbitofrontal cortex (OFC), frontopolar cortex, STG, central precuneus, and occipital fusiform gyrus with several clusters extending into the anterior cingulate (ACC) and paracingulate (PCC) cortices (Table 12; Figure 10b). Significant results were not observed when seeding from the putamen or thalamus.
In the exploratory analysis, a seed ROI in the right dorsoanterior insula showed connectivity with somatosensory cortices, medial PFC, and temporal and occipital cortex (Table 13; Figure 10c).

Discussion
Our self-report, behavioral, and fMRI hypotheses were largely supported. These results contribute to the nascent body of literature investigating the contributions of cognitive control and motivation to sustained control allocation during cognitively demanding tasks. In our study, we experimentally manipulated the balance between task difficulty and individual ability, which resulted in different levels of intrinsic reward. Consistent with previous research (Keller & Bless, 2008; Ulrich et al., 2014, 2016b; Yoshida et al., 2014), a balance between task difficulty and individual ability resulted in the highest levels of self-reported intrinsic reward. Moreover, high levels of intrinsic reward corresponded to increased task-related attentional engagement as demonstrated by longer reaction times in the balanced-difficulty condition compared to the low- and high-difficulty conditions. This result also is reflected in the neuroimaging data. Differential levels of motivation were associated with different brain states. We now turn our focus to these key findings and their broader implications.
Table 7 Neural activity in the high-difficulty > balanced-difficulty contrast; cluster corrected for multiple comparisons with a cluster defining threshold of Z = 3.1 and a cluster extent threshold of p < 0.0001; coordinates are in MNI152 space.
Figure 9 Group-level parameter estimates for the DMPFC (34, 44, 32), VMPFC (0, 28, -14), and Putamen (-22, -2, 4). These voxels were selected based on peak activations reported in the brain activation analysis for each experimental condition.

Reward-processing and cognitive control
The behavioral and self-report measures indicate a successful experimental manipulation. Our fMRI results suggest intriguing updates to the nascent literature on cognitive control and motivation. First, our brain mapping results conform to previous findings implicating intrinsic reward processing during cognitive control tasks. Our novel contribution is in elucidating the functional connections between these structures. Of particular interest is the relationship between anticipatory and consummatory rewards during cognitive control. Our GLM-based results showed that the balanced-difficulty condition, relative to conditions of low- and high-difficulty, elicited activity in the putamen. This fits nicely with the notion that this structure is implicated in consummatory reward processing (O'Doherty et al., 2004; Satterthwaite et al., 2007) and that a balance between task difficulty and individual ability elicits strong activity in this structure (Ulrich et al., 2014, 2016b). However, a balance between difficulty and ability also has been shown to elicit activity in the ventral striatum, particularly the nucleus accumbens (Klasen et al., 2012). How do we account for these seemingly contradictory findings? One possible answer is found in our PPI results when seeding from the ventral striatum. We show that the nucleus accumbens is more strongly functionally connected with the DLPFC when task difficulty is balanced with individual ability than when there is a mismatch between difficulty and ability. This result is consistent with the view that these two structures are implicated in reward anticipation and cognitive cost calculation (Botvinick, Huffstetler, & McGuire, 2009; Kool, McGuire, Wang, & Botvinick, 2013).
With that said, we did not design our study to directly manipulate reward expectation, so it is difficult to tell if our results support the view that reward anticipation and consumption are dissociated between the dorsal and ventral striatum (O'Doherty et al., 2004) or, as some have suggested, if these structures subserve a common function related to either evaluating the cognitive costs associated with earning a particular reward (Vassena et al., 2014) or consummatory reward processing (Pauli et al., 2016). It is entirely possible that there is no single neural correlate of intrinsic reward. Indeed, one current perspective argues that intrinsic and extrinsic rewards may not be dissociable at the neuroanatomical level, but instead at the temporal level, where extrinsic rewards are temporally immediate and tangible whereas intrinsic rewards are less tangible and more temporally dispersed. Our current study provides preliminary support for this view.
Admittedly, the naturalistic paradigm used in this study sacrifices some experimental control, and this poses some interpretation difficulties. While the putamen often is associated with reward processing, it also is implicated in task-learning. Specifically, the putamen shows strong activation for novel tasks, but this activation decreases for learned tasks (Jimura, Cazalis, Stover, & Poldrack, 2014a). Our decision to make two conditions consistent in terms of video game state (i.e., repeated play of the easiest or hardest conditions) may have allowed subjects to "learn" the low- and high-difficulty conditions, whereas the balanced-difficulty condition may be understood as a series of unlearned tasks. Putamen activation also has been shown to increase during a response-inhibition task among subjects with high behavioral performance and decrease among subjects with low behavioral performance (Jimura et al., 2014b). Liberally interpreted, this suggests that putamen activation should increase in response to high behavioral performance. In our study, the low-difficulty condition yielded fast reaction times (high behavioral performance) and was easy such that subjects had high levels of video game performance. Inconsistent with the liberal interpretation that putamen activation tracks high behavioral performance presented above, we see the highest levels of putamen activation in the balanced-difficulty > low-difficulty contrast (but not also in the balanced-difficulty > high-difficulty contrast).
Table 11 Psychophysiological interaction results when seeding from the bilateral (right: 10, 16, -6; left: -10, 16, -6) nucleus accumbens in the balanced-difficulty > low-difficulty and high-difficulty contrast; cluster corrected for multiple comparisons with a cluster defining threshold of Z = 2.3 and a cluster extent threshold of p < 0.05; coordinates are in MNI152 space.
A more stringent test would be among conditions that are similarly novel and do not afford task-learning. This presents an interesting opportunity for future research. Similarly, the nucleus accumbens demonstrates sensitivity not only to extrinsic (e.g., monetary) reward anticipation but also to positive performance feedback (Daniel & Pollmann, 2010). We admit that experimentally accounting for this confound is not trivial. In our study, the balanced-difficulty condition provided positive performance feedback by increasing level difficulty (which remained invariant for the low- and high-difficulty conditions). However, positive performance feedback also was received during the low-difficulty condition when subjects successfully completed a level as they received a message indicating that they had beaten the level (this is the same message that subjects received in the balanced- and high-difficulty conditions). Accordingly, nucleus accumbens activation driven solely by level-completion feedback would be lost in the balanced-difficulty > low-difficulty contrast. It follows, then, that remaining nucleus accumbens activation should track increases in difficulty, more closely aligning with the view presented above that this structure, in conjunction with the DLPFC, tracks reward anticipation and cognitive cost calculation. Nevertheless, this remains an important and unresolved issue for flow research as immediate and clear performance feedback is understood as a causal antecedent of flow (Csikszentmihalyi, 1990). Therefore, any manipulation of task difficulty with individual ability is inherently conflated with different patterns of performance feedback.
Ultimately, the methodological limitations arising from the difficulty of manipulating intrinsic reward in a laboratory setting constrain our interpretation of the results while suggesting new avenues for future research. Even with these considerations in mind, our results show that a balance between task difficulty and individual ability modulates reward-related subcortical processing and that these structures are functionally connected with frontocontrol structures during a cognitive control task. This finding provides novel evidence that intrinsic reward is associated with the allocation of cognitive control during sustained task performance.

Low levels of intrinsic reward and contributions to DMN activity
In the present study, we show different brain activity and functional connectivity patterns in the balanced-difficulty condition compared to the low-difficulty and high-difficulty conditions. While the balanced-difficulty condition elicited activity in structures commonly implicated in cognitive control and reward processing, the low-difficulty condition showed activations in the DMN. Such a finding is consistent with previous results showing that the DMN is down-regulated when there is a balance between task difficulty and individual ability (Ulrich et al., 2016a). Further evidence shows that failures to suppress the DMN are associated with lapses in attention (Weissman, Roberts, Visscher, & Woldorff, 2006) and decreased performance during cognitive control tasks (Kelly, Uddin, Biswal, Castellanos, & Milham, 2008).
Interestingly, we also see that STRTs were generally faster during the low-difficulty condition. This result, in conjunction with the observed activations in key DMN structures, provides additional evidence that the low-difficulty condition required low levels of cognitive control. Moreover, it contextualizes the extent to which low-difficulty tasks can be performed automatically or at least with very low levels of cognitive control (Vatansever, Menon, & Stamatakis, 2017). This, combined with previous evidence showing that boring video game play (Mathiak et al., 2013) and a mismatch between difficulty and ability (Ulrich et al., 2014, 2016a, 2016b) are associated with DMN activity, provides converging evidence that different levels of intrinsic reward may be driving the shift between DMN activation during the low-difficulty condition and cognitive control network activation during the balanced-difficulty condition. Less clear is why similar DMN activation patterns were not observed in the high-difficulty condition. One possible explanation might be found in the STRT patterns observed during this condition. There is some evidence that attention to a secondary task does not necessarily increase when the primary task is difficult or even in response to increases in extrinsic rewards (Buetti & Lleras, 2016). Our high-difficulty condition had the second-longest STRTs across all three of our behavioral studies. This suggests that subjects may have allocated more cognitive resources to the video game stimulus during this condition, even though the condition was rated as being comparatively low in intrinsic reward. Further experimentation is needed to determine whether, and at what level, a mismatch between task difficulty and individual ability results in task disengagement that corresponds to DMN activation.
One intriguing possibility implicated by our exploratory PPI analyses is that the dorsoanterior insula may be involved in shifts between DMN and cognitive control networks. Foundational empirical investigations provide a network-level model for these switches (Sridharan, Levitin, & Menon, 2008), which is further supported by meta-analytic results from Neurosynth (Yarkoni et al., 2011), implicating the insula (and broader salience network) in shifts between cognitively demanding tasks and task disengagement (Chang et al., 2013). The consistency between structures identified in our study and those identified in the reward-motivated cognitive control literature hints at a network-level architecture. Follow-up work using Asteroid Impact or similar naturalistic tasks should adopt the latest methodological advances in network neuroscience (Bassett & Sporns, 2017) to interrogate the way in which shifts in motivation drive dynamic shifts between frontoparietal control and DMN as facilitated by the insula.

Motivation drives task-related attentional engagement
One critique of the emerging cognitive control and motivation literature is that the highly controlled experimental tasks employed typically rely on extrinsic and not intrinsic rewards. In this study, we sacrificed some experimental control in favor of developing a task that allowed for modulating intrinsic rewards. As a failsafe, we used STRTs as a behavioral measure of the extent to which variation in intrinsic reward entrained attentional engagement with the task. The rationale for this measure capitalizes on the insight that motivation has a curvilinear influence on task-related attentional engagement (Lang, 2000). This result is borne out in our STRT data and is consistent with previous findings (Lang et al., 2006). That our STRT data show the same inverted U-shaped pattern as our self-reported intrinsic reward measure suggests that STRTs may serve as a behavioral correlate of intrinsic reward, particularly during motivationally relevant tasks. With that said, two important constraints are worth noting. First, the absolute mean STRT differences between conditions are quite small, thereby obscuring inferences about the magnitude of intrinsic rewards. A second issue is that STRTs are only a useful index of intrinsic reward when there is a firm understanding of how the stimulus balances task difficulty and individual ability. Nevertheless, our behavioral and neuroimaging results demonstrate that intrinsic reward motivates different levels of task engagement.
The synchronization theory of flow: alternative theoretical explanations and future opportunities The results reported in this manuscript are situated within the context of reward-motivated cognitive control (Botvinick & Braver, 2014; Braver et al., 2014). Specifically, we used flow theory (Csikszentmihalyi, 1975) as a guide for manipulating intrinsic reward and the synchronization theory of flow (Weber et al., 2009) as a guide for making informed predictions about the neural basis of flow experiences. Accordingly, and consistent with the latest developments in flow theory (Harris et al., 2017b; Weber et al., 2016), we interpret our findings in terms of intrinsic-reward motivated cognitive control. Our results seem to fit nicely with both theory and previously published empirical results.
Some readers might question our decision to frame these issues in terms of cognitive control. From its earliest conceptualization, cognitive control research has focused on the processes that enable goal-directed behavior (Miller, 2000; Miller & Cohen, 2001), which modern evidence shows is motivated by reward (Botvinick & Braver, 2014; Braver et al., 2014). Such a high-level process necessarily requires multiple lower-level processes including attention, working memory, reward processing, sensorimotor coordination, etc. We consider attention (using STRTs) and reward processing (using self-report) in the present study, but most certainly do not account for these other processes. One might reasonably ask, if attention is a component of the process of interest, why not frame this manuscript in classic attentional terms (Fan, McCandliss, Fossella, Flombaum, & Posner, 2005; Posner, Inhoff, Friedrich, & Cohen, 1987; Raz & Buhle, 2006)?
Interestingly, the original formulation of synchronization theory did exactly that (see p. 406 in Weber et al., 2009), framing flow from Posner's tripartite theory of attention (Posner et al., 1987). While synchronization theory originally acknowledged executive attention as a potential component of flow, the theory primarily considered the phenomenon in terms of better specified processes (Raz & Buhle, 2006), such as alerting and orienting attention. The theory was later reformulated in terms of cognitive control to better specify the goal-directed nature of flow experiences (Weber et al., 2016). While cognitive control and executive attention both explain a considerable number of empirical findings and are often used to interpret similar processes (Long & Kuhl, 2018), there are important distinctions between the two models (Petersen & Posner, 2012). Once a sufficient body of evidence has accumulated in this area, it will be important to examine which model best accounts for the data.
Until then, and as we have taken pains to point out above, the potential for alternate explanations exists. The conditions in the present study do not systematically vary or otherwise control for a number of potential confounds including different event rates, different levels of feedback, different levels of visual complexity, etc. These differences allowed us to manipulate the balance between individual ability and task difficulty, which is central to flow theory. Along the way, we have endeavored to account for alternate explanations introduced by these confounds. Despite these limitations, we see results that are consistent with previous studies, which give us confidence in the findings.
Future research should, to the extent that is possible, seek to resolve these issues. We admit, as others have before us (Bohil, Alicea, & Biocca, 2011; Maguire, 2012; K. Mathiak & Weber, 2006; Spiers & Maguire, 2007), that designing naturalistic interventions with suitable levels of experimental control is a nontrivial task. However, and as has been forcefully argued by Marr (1982) and his contemporaries (Krakauer, Ghazanfar, Gomez-Marin, Maciver, & Poeppel, 2017), a focus on naturalistic behavior is essential if we are to advance our understanding of the mind/brain. To that end, we are pleased to offer an open-source stimulus, Asteroid Impact, so that interested researchers can adapt, replicate, and extend the paradigm in their own laboratories (Poldrack et al., 2017).

Conclusions
In their earliest writings, Miller and Cohen (Miller, 2000; Miller & Cohen, 2001) indicated that motivation may play a role in cognitive control. In the decades that have followed, most of the research in this area has treated the two as separable processes by choosing to focus on cognition rather than motivation. However, an emerging perspective argues that higher order cognitions and their resulting behaviors are not easily reducible to their lower-level constituent parts, especially when considering the relationship between cognition and motivation (Pessoa, 2008). Our results fit within this framework by showing how task-elicited differences in motivation are associated with shifts in task-related reward perceptions, attentional allocation, and control allocation.