Introduction

Working memory and attention control are two hallmarks of the human cognitive system. Working memory allows people to store, manipulate, and transform information, even when it is not immediately available to their senses. Attention control allows people to perform goal-directed mental activity in the presence of potent distraction. However, working memory is capacity-limited, and people can only manage a small set of representations or goals at one time. Attention control also is fallible. Occasionally, people experience lapses of attention, such as distraction by external or internal information (e.g., mind-wandering). Like most human traits, the capacity of working memory and the ability to control attention are normally distributed in the population (Schor et al., 2020). Both working memory capacity and attention control abilities range from weak to strong even in healthy young adults. Importantly, individual differences in working memory capacity and attention control predict a host of important outcomes, such as academic aptitude (Engle et al., 1999; Mrazek et al., 2012), reading comprehension (Daneman and Carpenter, 1980; McVay and Kane, 2012; Peng et al., 2018; Robison and Unsworth, 2015; Unsworth and McMillan, 2013), second language acquisition (Miyake and Friedman, 1998; Linck et al., 2014; Kormos and Sáfár, 2008), and emotion regulation (Groves et al., 2020; Schmeichel and Demaree, 2010; Schmeichel et al., 2008). Perhaps the most heavily researched correlate of working memory capacity is fluid intelligence: the ability to reason with novel and abstract information to solve problems (Chuderski, 2013; Conway et al., 2002; Engel de Abreu et al., 2010; Engle et al., 1999; Fry and Hale, 2000; Kyllonen and Cristal, 1990; Unsworth et al., 2014; Unsworth and McMillan, 2014). Therefore, several decades of research have been dedicated to understanding the derivation of such individual differences from cognitive, developmental, and biological perspectives. 
The current study was designed to test theories that propose that the locus coeruleus-norepinephrine (LC-NE) system underlies individual differences in cognitive abilities, specifically working memory, attention control, and fluid intelligence (Tsukahara and Engle, 2021a; Unsworth and Robison, 2017a).

Unsworth and Robison (2017a) theorize that an important source of individual differences may be the relative functioning of the locus coeruleus-norepinephrine (LC-NE) system. The LC comprises a pair of small nuclei in the brainstem that release most of the NE into the cortex. Consequently, NE amplifies the gain of target cortical networks, enhancing activity that produces goal-relevant behaviors and suppressing activity that produces goal-irrelevant behaviors (Berridge and Waterhouse, 2003; Aston-Jones and Cohen, 2005). The LC has diverse projections into brain networks that are particularly active while people are controlling attention and using working memory and thus has an important role in cognition (Arnsten and Li, 2005; Berridge and Waterhouse, 2003; Samuels and Szabadi, 2008; Sara, 2009). LC neurons demonstrate two modes of firing: tonic and phasic. Tonic activity is slow and rhythmic and has been proposed to play an arousal-regulating role. Such steady delivery of NE into the brain allows a person to maintain alertness when necessary. Phasic activity constitutes brief, bursting neuronal firing, and it occurs in response to a behaviorally salient event (Aston-Jones and Cohen, 2005; Berridge and Waterhouse, 2003).

The Adaptive Gain Theory of LC function (Aston-Jones and Cohen, 2005) argues that moderate tonic LC activity is necessary for optimal phasic responding. Extremely low tonic arousal can induce drowsiness, whereas extremely high tonic arousal can induce stress and indiscriminate responding. Therefore, when attentiveness is important, it is best for a person/organism to maintain a consistent and moderate tonic arousal level. Building on this idea, Unsworth and Robison (2017a) posit that there are at least two distinct manifestations of differential LC-NE system functioning: tonic arousal regulation and phasic responsiveness. Tonic arousal regulation keeps an individual at a moderate level of arousal, which is at least partially driven by tonic LC activity, releasing a stable amount of NE into the brain to maintain alertness. In the current study, we refer to the stability and consistency of arousal as tonic arousal regulation, or sometimes more simply as arousal regulation. The second important role of the LC is to deliver NE into task-critical cortical networks in the precise moments during which an important neural computation must be performed. This function corresponds to event-driven, phasic, bursting activity of LC neurons. We refer to this as phasic responsiveness. Unsworth and Robison (2017a) propose that NE delivery from the LC to cortex amplifies gain in target cortical networks (e.g., the fronto-parietal network), which allows it to exert control over the default mode network (DMN), to produce goal-relevant behaviors. When this happens, people stay task-focused and avoid attentional lapses.

The LC-NE theory of individual differences in working memory and attention control makes two straightforward predictions: people with more regulated tonic arousal and greater phasic responsiveness will demonstrate higher working memory capacity and stronger attention control (Unsworth and Robison, 2017a). When people have poorly regulated tonic arousal, they have more fluctuations in arousal from moment to moment. Consequently, they will slip into states of hyper- and hypoarousal more frequently, which are suboptimal for goal-directed cognition. Therefore, people with poorly regulated tonic arousal should show lower estimates of attention control, working memory capacity, and fluid intelligence. Additionally, if the phasic responding system is working suboptimally, less NE is delivered to the cortical networks that implement external, goal-directed activity (e.g., frontoparietal control network). Thus, task goals are not executed as well, resulting in poor cognitive performance.

Measuring the LC-NE system in people is a nontrivial problem. Neuroimaging of LC-NE activity is difficult because of the LC’s small size and location in the brainstem. Recent advances in neuromelanin-sensitive magnetic resonance imaging provide a promising direction (Betts et al., 2019; Clewett et al., 2016; Dahl et al., 2022; Keren et al., 2009; Shibata et al., 2006; Sasaki et al., 2006). However, applying this method at scale is costly, making it hard to test theories of individual differences. Therefore, more indirect measures of LC-NE functioning are necessary. Pupil diameter is a viable candidate. Historically, pupil dilation has been used as a measure of mental effort (Beatty, 1982a; Kahneman, 1973). As examples, the pupil dilates when people attempt to solve mathematical operations (Bradshaw, 1968; Hess and Polt, 1964; Payne, Parry, and Harasymiw, 1968), encode information into working and long-term memory (Beatty and Kahneman, 1966; Kahneman and Beatty, 1966; Kahneman and Peavler, 1969; Peavler, 1974), exert cognitive control (Laeng et al., 2011; van der Wel and van Steenbergen, 2018), perform perceptual discriminations (Beatty, 1982b; Kahneman and Beatty, 1967; Strauch et al., 2020; Urai et al., 2017), and make speeded responses to stimuli (Massar et al., 2016; Unsworth and Robison, 2016).

The pupillary effort signal has long had an indeterminate source. For example, Hess and Polt (1964) simply said it could be used as an index of “total mental activity” (p. 1191). However, recent work has demonstrated that it may be a downstream effect of phasic LC activity. For example, simultaneous recording of pupil diameter and neural firing in mice and non-human primates has demonstrated a tight temporal linkage between pupil diameter and patterns of both tonic and phasic LC neuron firing rates (Joshi et al., 2016; Joshi and Gold, 2020; Reimer et al., 2016). Furthermore, the LC BOLD response correlates with pupillary dilations, both at rest and during an oddball task (Murphy et al., 2014). In another study, the magnitude of pupillary responses during multiple-object tracking correlated with LC responsiveness during that same task (Alnæs et al., 2014). Although pupil diameter will be an imperfect proxy for LC activity, as pupillary dilations are probably affected by several different neuromodulatory systems (e.g., dopaminergic, cholinergic, serotonergic), we believe that there is enough evidence connecting the LC-NE system to pupillary dynamics that pupil diameter can be used as an indirect measure of both tonic LC activity and phasic LC responsiveness.

In support of the LC-NE theory of individual differences, several recent investigations have observed correlations among measures of working memory, attention control, tonic arousal regulation, and phasic responsivity via pupillometry. For example, Unsworth and Robison (2015) measured working memory with a change-detection task (Luck and Vogel, 1997) while measuring pupil diameter. To estimate individual differences in tonic arousal, they computed pupil diameter during a fixation period preceding each trial. Then, to operationalize arousal regulation, they computed the coefficient of variation (CoV) of pretrial pupil diameter across all trials within an individual. Participants with a high CoV of pretrial pupil diameter were considered to have poor tonic arousal regulation. To measure phasic responsiveness, Unsworth and Robison computed the magnitude of the pupillary response over the 4,000-ms working memory delay. Importantly, both arousal regulation and phasic responsiveness correlated with capacity (k) estimates. Furthermore, arousal regulation and phasic responsiveness were uncorrelated, and each explained significant variance in capacity. This same pattern of correlations has been observed with various iterations of change-detection tasks (Unsworth and Robison, 2018), a discrete whole-report working memory procedure (Robison and Unsworth, 2019), and at the latent level using change-detection working memory tasks that varied in memoranda type (Robison and Brewer, 2020). Therefore, both arousal regulation and phasic responsiveness seem to be important individual differences underlying working memory capacity.
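The two pupillary indices described above can be illustrated with a minimal sketch. It assumes each trial provides pupil-diameter samples from the pretrial fixation period and from the delay period. The function name `summarize_participant`, the dictionary keys, the use of the sample standard deviation, and the baseline-subtracted mean as the "magnitude" of the phasic response are all illustrative assumptions, not the authors' published pipeline.

```python
from statistics import mean, stdev


def summarize_participant(trials):
    """Return illustrative tonic and phasic pupil indices for one participant.

    trials: list of dicts with hypothetical keys "pretrial" and "delay",
    each holding pupil-diameter samples for that period of a trial.
    """
    # Tonic arousal on each trial: mean pupil diameter during pretrial fixation.
    baselines = [mean(t["pretrial"]) for t in trials]

    # Arousal regulation: coefficient of variation (SD / mean) of the pretrial
    # baselines across trials. A higher CoV indicates poorer regulation.
    tonic_cov = stdev(baselines) / mean(baselines)

    # Phasic responsiveness (assumed operationalization): mean delay-period
    # diameter minus that trial's baseline, averaged over trials.
    phasic = mean(mean(t["delay"]) - b for t, b in zip(trials, baselines))

    return {"tonic_cov": tonic_cov, "phasic_response": phasic}


# Made-up diameters, for illustration only.
example_trials = [
    {"pretrial": [4.0, 4.0], "delay": [4.5, 4.5]},
    {"pretrial": [5.0, 5.0], "delay": [5.5, 5.5]},
    {"pretrial": [6.0, 6.0], "delay": [6.5, 6.5]},
]
result = summarize_participant(example_trials)
print(result)
```

In this toy example the baselines vary across trials while the evoked response stays constant, which shows how the two indices can dissociate within a participant.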

In another study, Unsworth and Robison (2017b) performed a factor-level analysis of working memory capacity, attention control, tonic arousal regulation, and phasic responsiveness. When measured via pupillometry taken during two attention tasks (the psychomotor vigilance task and the color-word Stroop task), both phasic pupillary responsiveness and tonic arousal regulation significantly predicted latent factors for attention control and working memory capacity. Unsworth and Robison (2017b) also found that arousal regulation and phasic responsiveness correlated with self-reports of task-unrelated thoughts (e.g., mind-wandering and external distraction). As an additional piece of evidence in favor of an LC-NE account, two recent studies have found correlations among arousal regulation, phasic responsiveness, and long-term memory abilities. In the first, Madore et al. (2020) showed that arousal regulation during the encoding phase of a recognition memory task negatively correlated with subsequent memory performance. In that same study, poor arousal regulation also predicted more self-reported media multitasking, a measure of real-world distractibility. In the second study, Robison et al. (2022b) demonstrated correlations between arousal regulation, phasic responsiveness, and long-term memory abilities in a free recall task. Specifically, people with more regulated tonic arousal and greater phasic pupillary responsiveness during the encoding period tended to have better recall. Importantly, arousal regulation and phasic responsiveness were uncorrelated and accounted for separate sources of variance in recall abilities. Therefore, arousal regulation and phasic responsiveness seem to predict not only attention control and working memory capacity, but also related abilities such as long-term memory (Madore and Wagner, 2022).

Tsukahara and colleagues (Tsukahara et al., 2016; Tsukahara and Engle, 2021a, 2021b) also have proposed a crucial role for the LC for determining individual differences in cognitive ability, specifically fluid intelligence. Tsukahara et al. (2016, 2021) hypothesize that there are individual differences in the functional connectivity between the LC and cortical networks that implement higher-order cognitive functions, such as executive control, goal maintenance, and disengagement, in the resting-state brain. To support this hypothesis, Tsukahara et al. point to a positive correlation between a common executive-attention ability and resting pupil size. They argue that resting pupil size, given its downstream connection to the LC, can be used as a proxy for functional connectivity between the LC and cortex. While the two theories both implicate the LC-NE system as an underlying factor for individual differences in cognitive abilities, there is one critical difference between Tsukahara and Engle’s (2021a) theory and Unsworth and Robison’s (2017a) theory. Whereas Tsukahara and Engle argue that there are important individual differences in the functional organization of the resting state brain, Unsworth and Robison argue that these individual differences primarily arise during active states, especially in situations that demand controlled attention. Of course, it is possible that both resting state and active state LC connectivity are important individual differences. It also could be the case that people who have strong resting, functional connectivity between the LC and cortex also have relatively better tonic LC regulation and phasic responsiveness during active states. Therefore, the theories are not mutually exclusive. However, they do make different predictions. 
Tsukahara and Engle’s (2021a) resting-state theory of LC functional connectivity makes the prediction that individual differences in baseline pupil size, measured at rest in the absence of any goal-directed mental activity, will correlate with individual differences in executive attention and fluid intelligence. In contrast, Unsworth and Robison’s (2017a) theory predicts that pupillary measures of tonic arousal regulation and phasic responsiveness during active, goal-directed cognition will correlate with individual differences in working memory, attention control, and fluid intelligence. Testing these predictions was the current study’s central goal.

To be fair, it is important to note that several recent findings are inconsistent with both the active state and resting state theories of LC-NE function. First, Robison and Brewer (2022) tried to extend the active state LC-NE theory to fluid intelligence. As mentioned earlier, one of the reasons why working memory and attention control are deemed important is that they predict higher-order cognitive abilities, such as fluid intelligence. In a latent variable analysis, Robison and Brewer (2022) found correlations among arousal regulation, attention control, and self-reported, task-unrelated thoughts. However, the correlations between arousal regulation and working memory capacity and between arousal regulation and fluid intelligence were near-zero. This finding is inconsistent with the LC-NE theory. Second, Robison and Brewer (2020) similarly did not find a correlation between tonic arousal regulation and a factor formed by complex span measures of working memory, despite finding a correlation with the change-detection measures of working memory during which pupillometry was measured. Third, Aminihajibashi et al. (2019, 2020a) have found correlations between tonic arousal regulation and working memory, both at rest and during a cognitively demanding multiple-object tracking task. However, in-task arousal regulation did not predict fluid intelligence, and phasic responsiveness during multiple-object tracking and Posner cueing tasks did not correlate with either fluid intelligence or working memory capacity (Aminihajibashi et al., 2020b). Collectively, these findings pose an issue for the LC-NE theory of working memory capacity and attention control. If indeed tonic arousal regulation and phasic responsiveness are important sources of variation driving individual differences in goal-directed cognition, such correlations should have been observed. There is at least one potential explanation for these discrepancies.
Robison and Brewer (2020, 2022) focused on tonic arousal regulation in the two studies mentioned, and they did not simultaneously investigate phasic LC responsiveness.

Regarding Tsukahara et al.’s resting state theory of LC-NE functioning, there also is limited evidence for the correlation between pupil size and cognitive ability, even for fluid intelligence more specifically. For example, a recent meta-analysis of 26 available studies on the correlation between working memory capacity and pupil size estimated the correlation to be near zero (Unsworth, Miller, and Robison, 2021a). Furthermore, Robison and Brewer (2022), Robison et al. (2022b), and Aminihajibashi et al. (2019) have all found null correlations between resting pupil diameter and working memory capacity, attention control, and fluid intelligence. In a large sample of more than 4,500 young adults, near-zero correlations were observed between resting pupil size and measures of working memory capacity, global cognition, executive functioning, and episodic memory (Coors et al., 2022). However, this study did observe a small and significant correlation between processing speed and resting pupil size (r = 0.13). Finally, Robison, Coyne et al. (2022a) measured resting pupil size and cognitive abilities among a sample of 845 members of the U.S. Navy. The latent correlation between a general cognitive ability factor, comprising measures of fluid intelligence (Raven Advanced Progressive Matrices), attention control (antisaccade, Sustained Attention to Cue Task), working memory (digit span, change-detection, mental counters), and resting pupil size was near zero, as well (r = 0.02). Therefore, beyond the studies published by Tsukahara et al. (2016, 2021b), there is limited evidence for a correlation between resting pupil size and cognitive abilities, such as working memory capacity, attention control, and fluid intelligence.

Current study

The primary goal of the current study was to test the relative contribution of pupillary indices of tonic arousal regulation and phasic responsiveness to individual differences in working memory capacity, attention control, and fluid intelligence. We posed several questions. First, to what extent do tonic arousal regulation and phasic responsiveness similarly, or perhaps differentially, predict individual differences in attention control, working memory capacity, and fluid intelligence? Unsworth and Robison’s (2017b) theory of LC-NE functioning predicts that tonic arousal regulation and phasic responsiveness will correlate with stronger attention control, higher working memory capacity, and higher fluid intelligence. Unsworth and Robison’s theory also argues that tonic arousal regulation and phasic responsiveness are distinct individual differences. That is, they constitute separable (yet potentially related) aspects of LC-NE functioning and thus should account for unique sources of variance in cognitive ability. Third, to what extent are the relations among pupillary measures of arousal regulation and phasic responsiveness generalizable across the tasks and participant samples in which they are measured? In previous studies, pupillary measures have been embedded into both attention control and working memory tasks. The present study directly tested whether it matters into which tasks these measures are embedded.

The second goal was to test predictions by the resting state theory of LC-NE functioning (Tsukahara et al., 2016; Tsukahara and Engle, 2021b; Tsukahara, Draheim, and Engle, 2021). If there are individual differences in functional connectivity between the LC and cortex at rest, and these individual differences can be captured by baseline pupil size, larger baseline pupil diameter should be associated with higher working memory capacity, stronger attention control, and higher fluid intelligence. More specifically, Tsukahara et al. have made the argument that this correlation is particularly strong between resting pupil size and fluid intelligence. Studies 3 and 4 had a baseline/resting pupil measurement at the beginning of a session, allowing us to test this prediction.

A third goal of the current study was to rule out potential alternative explanations for the previously observed correlations among pupillary measures and cognitive abilities. For example, to what extent are arousal regulation and phasic responsiveness stable individual differences that are consistent across different days of measurement? In all previous work, tonic and phasic pupillary indices were measured on the same day of administration, thus introducing the potential for state-specific factors to be misinterpreted as stable individual differences. Therefore, Study 3 administered multiple pupillary measures on two different days to assess their trait stability.

Finally, the current study attempted to rule out a plausible alternative explanation that individual differences in tonic arousal and phasic responsiveness are due to intrinsic motivation and/or overall alertness levels. To this end, Studies 2 and 3 asked participants to self-report their current levels of motivation and alertness at various points throughout the lab sessions. In Study 4, participants were asked to rate their subjective sleepiness at the beginning of the session and about their general intrinsic motivation using a questionnaire at the end of the session.

The current study comprises four separate datasets. Study 1 is a reanalysis of a previously published dataset (Robison and Brewer, 2020) and included two types of working memory measures (complex span and visual arrays tasks), with pupillary measures taken during the visual arrays tasks. Study 2 is an unpublished dataset, including three attention tasks with pupillary measures taken during each of the three tasks. Study 3 includes measures of working memory, attention, fluid intelligence, and resting pupil size. Pupillary measures of tonic arousal regulation and phasic pupillary responsiveness were collected during the attention tasks. Study 4 comprised a set of measures that largely overlapped with those of Study 3. Studies 2, 3, and 4 have not been published, although Study 4 was reported as a dissertation by author K. J. Ralph (2019). We want to be clear that Studies 2, 3, and 4 were not designed and implemented with the intention of being published simultaneously. However, we believe there is value in including the studies in a single report for several reasons. First, the studies were all specifically designed to test the LC-NE account of individual differences in working memory capacity and attention control. Additionally, the studies measured pupillary dynamics during different sets of tasks, allowing for a test of the task- and domain-generality of tonic and phasic pupillary measures. Second, Studies 3 and 4 both had the goal of extending the range of abilities examined to include working memory, attention control, and fluid intelligence. Third, Studies 3 and 4 measured resting pupil size during a pre-experimental baseline, allowing for a test of the predictions made by the resting state theory of LC-NE functioning (Tsukahara et al., 2016, 2021). Fourth, the studies share substantial overlap in the chosen measures of respective constructs. Therefore, there was a fair amount of consistency in how the factor-level cognitive constructs were estimated.
Finally, the studies were conducted at different universities and on different age samples. This allowed us to assess the sample- and age-independence of the relations among cognitive abilities, resting pupil size, tonic arousal regulation, and phasic responsiveness. To preview the results, the pattern of correlations among these factors was consistent across studies. Thus, the results paint a rather clear picture of the relations.

Study 1

Study 1 has already been reported by Robison and Brewer (2020). However, in their analysis, they focused on individual differences in arousal regulation and did not examine phasic responsiveness. Their reasoning was that the visual arrays tasks, during which the pupillary dynamics were measured, included an intermixing of precued, retrocued, and noncued trials. The three trial types produced substantially different task-evoked pupillary responses during the working memory delays (Figure 1), leading Robison and Brewer (2020) to believe that the task-evoked responses would not be reliable across trial types and tasks. This turned out to be an incorrect assumption. Therefore, we reanalyzed their data with an additional focus on phasic responsiveness. Overall, the goal was to test whether arousal regulation and phasic responsiveness each made distinct and significant contributions to variance in working memory capacity as measured by a set of complex span and visual arrays tasks (see Table 1), as predicted by Unsworth and Robison (2017a).

Table 1. Summary of constructs and tasks in each study

Fig. 1. Task-evoked pupillary responses for the color (A), orientation (B), and letter (C) visual arrays tasks by trial type in Study 1

Method

Transparency and openness

We report all measures, how we determined our sample size, and all data exclusions, when relevant. Tasks were delivered using E-Prime software (Psychology Software Tools, Inc.). The behavioral and pupillary data were both analyzed with custom-written R scripts that used the tidyverse (Wickham et al., 2019), data.table (Dowle and Srinivasan, 2020), psych (Revelle, 2015), and lavaan (Rosseel, 2012) packages. Plots were generated using ggplot2 (Wickham, 2016) and cowplot (Wilke et al., 2019). All data files and the R scripts used to analyze and process the data are available on the Open Science Framework (https://osf.io/w8cyg/).

Participants and procedure

A sample of 213 participants completed the study (M age = 18.70, SD = 1.06, 109 females, 104 males). Our goal was to achieve a minimum sample size of 200 participants, and we used the end of the academic semester as our stopping rule for data collection. Participants completed all the tasks within a single 2-hour session. At the beginning of the session, participants completed informed consent and demographics forms. Participants then completed the tasks in the following order: color change-detection, operation span, orientation change-detection, symmetry span, letter change-detection, and reading span. At the end of the session, participants completed a computerized self-report measure of attention deficit/hyperactivity disorder symptoms. However, these data were collected as pilot data for a separate project, and they are not analyzed in the present study. All participants were treated according to the ethical principles of the American Psychological Association. During the visual arrays tasks, participants were seated in a chinrest positioned 60 cm from a Tobii T-1750 eye-tracker. During the complex span tasks, participants were seated at a different computer in the same room. Participants alternated between visual arrays and complex span tasks, which gave them a break from the chinrest.

Tasks

Operation span (Unsworth et al., 2005)

Participants solved math operations (e.g., 2 x 5 – 4 = ?) while remembering sets of letters. On each list, a math operation appeared, and a participant answered regarding its veracity. Then, a letter appeared on-screen for 1 second. This process repeated for a list of three to seven items. Then, at the end of the list, participants recalled the letters that appeared on the list in forward serial order. Participants were given credit for any letter reported in the correct serial position. There were two lists of each list-length, for a total possible score of 50.

Symmetry span (Unsworth et al., 2009)

Participants remembered spatial locations while making symmetry judgments. On each list, a 4 x 4 grid appeared, and one of the locations would flash red for 1 second. Then, a black and white pattern appeared, and the participant decided whether the pattern was symmetrical about its y-axis. This process repeated for two to five items. At the end of the list, participants reported the sequence of spatial locations in forward serial order. Responses were marked correct for every item reported in the correct serial position. There were two lists each of set sizes two through five for a maximum possible score of 28.

Reading span (Unsworth et al., 2009)

Participants read sentences (e.g., The prosecutor’s case was lost, because it was flimsy) while remembering sets of letters. On each list, a sentence appeared; the participant answered regarding its syntactic sensibility and then saw a letter for 1 second. Nonsense sentences were created by replacing one word in an otherwise sensible sentence. This process repeated for a list of three to seven items. At the end of the list, participants recalled the letters that appeared on the list in forward serial order. They were given credit for any letter reported in the correct serial position. There were two lists of each list-length, for a total possible score of 50.
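The partial-credit scoring shared by the three complex span tasks, and the maximum scores stated above, can be checked with a short sketch. The function names and the convention that a skipped position is recorded as `None` are illustrative assumptions. The text does not specify how omissions were entered.

```python
def score_list(presented, recalled):
    """Count items recalled in their correct serial position.

    presented: items in presentation order.
    recalled: the participant's response, aligned by serial position
    (a skipped position is assumed to be recorded as None).
    """
    return sum(1 for p, r in zip(presented, recalled) if p == r)


def max_score(list_lengths, lists_per_length=2):
    """Maximum possible score given the list lengths used in a task."""
    return lists_per_length * sum(list_lengths)


# Operation and reading span: two lists at each length from three to seven.
operation_span_max = max_score(range(3, 8))
# Symmetry span: two lists at each set size from two to five.
symmetry_span_max = max_score(range(2, 6))

# Made-up response: two letters swapped, so only positions 1 and 4 earn credit.
example_score = score_list(["F", "K", "P", "Q"], ["F", "P", "K", "Q"])
print(operation_span_max, symmetry_span_max, example_score)
```

Under this scoring rule, a transposition costs credit for both swapped items even though both letters were remembered, which is what distinguishes serial-position scoring from free-recall scoring.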

Color visual arrays/change-detection (Luck and Vogel, 1997)

Each trial started with a black fixation cross against a gray background for 2,000 ms. On neutral and retrocued trials, the fixation screen was followed by a 750-ms blank screen. On precued trials, the fixation screen was followed by a 250-ms blank screen. Then, a directional cue (< or >) appeared at the center of the screen for 250 ms, followed by another 250-ms blank screen. The target items appeared for 250 ms. Three target items always appeared in each of the left and right hemifields of the screen. The target items were colored squares each subtending 3° of visual angle. The colors were sampled randomly without replacement from seven preselected colors. On precued and neutral trials, target items were followed by a 4,000-ms blank delay screen. On retrocued trials, after 1,750 ms, a directional arrow cue (< or >) appeared at the center of the screen for 250 ms. Then, there was another 2,000-ms blank delay. Thus, across trial types, the total duration of trials was identical. On both precued and retrocued trials, the cues informed participants with 100% validity from which hemifield the target item would be tested. At the end of the delay screen, the items reappeared with one square circled by a black ring. The participants’ task was to decide whether this square was the same color or a different color than its initial presentation. Participants made their responses by pressing one of two keys marked “S” for same and “D” for different (the “F” and “J” keys on the keyboard). After a 1,000-ms blank delay screen, the next trial began. There were 126 trials in the task, which took about 25 minutes to complete. The dependent variable from the task was the proportion correct. Thought probes appeared after 16 randomly sampled trials.
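The claim that trial duration was identical across trial types can be verified by laying out the event sequence described above and summing the durations. The dictionary layout and event labels below are illustrative. Only the millisecond values come from the task description, and the response window and 1,000-ms intertrial blank are left out.

```python
# Event sequences (label, duration in ms) from fixation onset to delay offset.
TIMELINES = {
    "precued": [
        ("fixation", 2000), ("blank", 250), ("cue", 250), ("blank", 250),
        ("targets", 250), ("delay", 4000),
    ],
    "neutral": [
        ("fixation", 2000), ("blank", 750),
        ("targets", 250), ("delay", 4000),
    ],
    "retrocued": [
        ("fixation", 2000), ("blank", 750), ("targets", 250),
        ("delay", 1750), ("cue", 250), ("delay", 2000),
    ],
}


def total_duration(events):
    """Sum the durations of all events in a trial timeline."""
    return sum(ms for _, ms in events)


def retention_interval(events):
    """Time from target offset to the end of the timeline (the test display)."""
    labels = [label for label, _ in events]
    after_targets = events[labels.index("targets") + 1:]
    return sum(ms for _, ms in after_targets)


totals = {name: total_duration(ev) for name, ev in TIMELINES.items()}
retention = {name: retention_interval(ev) for name, ev in TIMELINES.items()}
print(totals)
print(retention)
```

Because both the pre-target interval and the retention interval are matched across trial types, pupil traces from the three conditions can be time-locked to the same events.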

Orientation visual arrays/change-detection (Luck and Vogel, 1997)

Each trial started with a black fixation cross against a gray background for 2,000 ms. On neutral and retrocued trials, the fixation screen was followed by a 750-ms blank screen. On precued trials, the fixation screen was followed by a 250-ms blank screen, then a categorical cue (the word “blue” or “red”) appeared at the center of the screen for 250 ms, followed by another 250-ms blank screen. The target items then appeared for 250 ms. The target array always included three blue bars and three red bars. The target items were colored bars each subtending 3° of visual angle. The orientations of the bars were sampled with replacement from four possible orientations: horizontal, vertical, tilted 45° to the left, or tilted 45° to the right. On precued and neutral trials, target items were followed by a 4,000-ms blank delay screen. On retrocued trials, after 1,750 ms, the categorical cue (“red” or “blue”) appeared at the center of the screen for 250 ms. Then, there was another 2,000-ms blank delay. The cues informed participants with 100% validity from which color category the target item would be tested. At the end of the delay screen, the items reappeared with one bar indicated by a white dot in its center. The participants’ task was to decide whether this bar was the same orientation or a different orientation than its initial presentation. Participants made their responses by pressing one of two keys marked “S” for same and “D” for different (the “F” and “J” keys on the keyboard). After a 1,000-ms blank delay screen, the next trial began. There were 126 trials in the task, which took about 25 minutes to complete. The dependent variable from the task was the proportion correct. Thought probes appeared after 16 randomly sampled trials.

Letter change-detection (Robison and Brewer, 2020)

Each trial started with a black fixation cross against a gray background for 2,000 ms. On neutral and retrocued trials, the fixation screen was followed by a 750-ms blank screen. On precued trials, the fixation screen was followed by a 250-ms blank screen. Then, a spatial cue outlining three of the six target locations appeared for 250 ms and then another 250-ms blank screen. The target items appeared for 250 ms. Six target letters, sampled without replacement from all consonants, appeared at six preset locations, spaced equally around an invisible circle. On precued and neutral trials, target items were followed by a 4,000-ms blank delay screen. On retrocued trials, after 1,750 ms, the spatial cue appeared for 250 ms. Then, there was another 2,000-ms blank delay. At the end of the delay screen, the letters reappeared with one letter outlined by a black box. The participants’ task was to decide whether this letter was the same or a different letter than its initial presentation. Participants made their responses by pressing two keys marked “S” for same and “D” for different (the “F” and “J” keys on the keyboard). After a 1,000-ms blank delay screen, the next trial began. There were 126 trials in the task, which took about 25 minutes to complete. The dependent variable from the task was the proportion correct. Thought probes appeared after 16 randomly sampled trials.

Thought probes

Interspersed throughout the visual arrays tasks were screens that asked participants, “What were you thinking about just prior to when this screen appeared?” There were five response options: (a) I was totally focused on the current task; (b) I was thinking about my performance on the task or how long it is taking; (c) I was distracted by sights/sounds around me or by physical sensations (e.g., hungry/thirsty); (d) I was thinking about things unrelated to the task (i.e., mind-wandering); (e) I wasn’t thinking about anything/my mind was blank. Participants were instructed to press the key that best described their preceding thoughts. During the instructions to the tasks, participants were told that it is perfectly normal to mind-wander, zone out, or get distracted from time to time on tasks like these, and that they should answer the questions honestly and accurately. For the correlational analyses, we summed reports of options (c), (d), and (e) and divided by 16 (the total number of probes) to form a lapse rate variable.

Pupillometry

A Tobii T-1750 eye-tracker continuously recorded pupil diameter and gaze position from both eyes at 60 Hz during the three visual arrays tasks. Participants sat with head position fixed in a chinrest positioned 60 cm from the screen. Participants completed the complex span tasks on a separate computer in the same room. The only light in the room came from the two computer monitors. Missing data due to blinks and off-screen fixations were removed. If a participant was missing more than 50% of trials for a task, we excluded pupil data from that task from the analyses.

Data analysis

Pupil data were processed by first removing invalid data points as tagged by the Tobii eye-tracker. Then, pupil samples <2 mm or >8 mm were removed (Mathôt et al., 2018; Mathôt and Vilotijević, 2022). The dependent variables were intraindividual variability in pretrial pupil diameter, indexed by the coefficient of variation (CoV), and the average task-evoked pupillary response (TEPR) during the 4,000-ms delay. To obtain these estimates, we computed the average pupil diameter during each trial’s fixation period. Then, for each participant, we computed the mean and the standard deviation of their pretrial pupil diameter across all trials. CoV was then computed as the standard deviation divided by the mean. Because higher values of CoV would indicate dysregulation, we then multiplied these values by −1, so a factor formed by these measures would indicate more regulated arousal instead of dysregulated arousal. TEPRs were computed on a trial-by-trial basis by taking the mean pupil diameter from the last 200 ms of the stimulus period as a baseline and subtracting this value from all subsequent samples in the trial. Data were then down-sampled by averaging samples into 100-ms-wide windows. Finally, each trial’s TEPR was computed as the mean change from baseline over the 4,000-ms delay period. The average of this value across all trials was used as the measure of phasic responsiveness. The data were analyzed using all available pairwise comparisons in the variance-covariance matrix using the missing = “ml” argument in the cfa() and sem() functions in lavaan.
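The two pupillary measures described above can be sketched in a few lines. This is a minimal illustration rather than the authors' processing script (which was written in R): the function names are hypothetical, missing samples are assumed to be coded as `None`, the sample standard deviation (n − 1) is assumed for CoV, and the default of six samples per 100-ms window assumes the 60-Hz recording rate.

```python
from statistics import mean, stdev


def arousal_regulation(pretrial_means):
    """Sign-flipped CoV of one participant's per-trial pretrial pupil means.

    Assumption: the sample SD (n - 1) is used; the text does not say which.
    """
    cov = stdev(pretrial_means) / mean(pretrial_means)
    return -cov  # higher values = more regulated arousal


def trial_tepr(baseline_samples, delay_samples, hz=60, bin_ms=100):
    """Mean baseline-corrected pupil change over the delay for one trial.

    Missing samples are assumed to be None and are skipped, not interpolated.
    """
    baseline = mean(s for s in baseline_samples if s is not None)
    per_bin = max(1, round(hz * bin_ms / 1000))  # 6 samples per 100 ms at 60 Hz
    bin_means = []
    for start in range(0, len(delay_samples), per_bin):
        window = delay_samples[start:start + per_bin]
        valid = [s - baseline for s in window if s is not None]
        if valid:  # drop windows that contain no valid samples
            bin_means.append(mean(valid))
    return mean(bin_means)
```

A participant's phasic responsiveness score would then be the mean of `trial_tepr` across their usable trials.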

Exclusions

No participants needed to be removed as outliers in the behavioral data. However, there were occasional calibration issues with the eye-trackers and tasks freezing and crashing mid-session. These data points were removed from the analysis, rather than excluding participants who encountered these issues listwise. One participant was removed listwise, because a fire drill sounded during their session. For the eye-tracking data, we separately examined trials and participants for missing data. Trials in which more than 50% of samples were missing were excluded. Then, participants with more than 50% of trials missing were excluded from the analyses. Achieved sample sizes for each individual measure are listed in Table 2.
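The two-stage screening of the eye-tracking data (trials first, then participants) can be sketched as follows. This is an illustrative reading of the rule, not the authors' code: the function names are hypothetical, missing samples are assumed to be coded as `None`, and "more than 50%" is taken literally, so a trial or participant at exactly 50% is retained.

```python
def usable_trials(trials, max_missing=0.5):
    """Keep trials whose proportion of missing samples is at most max_missing.

    `trials` is a list of trials; each trial is a list of pupil samples
    in which None marks a missing sample (an assumed coding).
    """
    kept = []
    for samples in trials:
        missing = sum(s is None for s in samples) / len(samples)
        if missing <= max_missing:
            kept.append(samples)
    return kept


def screen_participant(trials, max_missing=0.5):
    """Return the usable trials, or None if too many trials were lost."""
    kept = usable_trials(trials, max_missing)
    lost = 1 - len(kept) / len(trials)
    return kept if lost <= max_missing else None
```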

Table 2 Descriptive statistics for measures in Study 1

Results

Descriptive statistics are listed in Table 2, and zero-order correlations among all measures are listed in Table 3. The analysis was largely similar to that reported by Robison and Brewer (2020). We specified a confirmatory factor analysis with five factors: operation span, symmetry span, and reading span loaded onto a complex span factor; accuracy on the color, orientation, and letter visual arrays tasks loaded onto a visual arrays factor; task-unrelated thought (TUT) proportions from the color, orientation, and letter tasks loaded onto a TUT factor; pretrial pupil variability (CoV) from the color, orientation, and letter tasks loaded onto an arousal regulation factor; and average TEPR from the color, orientation, and letter tasks loaded onto a phasic responsiveness factor. The five factors were allowed to correlate. The model fit the data well, χ2(80) = 172.74, p < 0.001, CFI = 0.93, TLI = 0.91, RMSEA = 0.07, 90% confidence interval [CI] [0.06, 0.09], SRMR = 0.05 (Footnote 1). The factor loadings and interfactor correlations are listed in Table 4.
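The reported degrees of freedom can be reproduced by counting moments and free parameters. The sketch below assumes a simple-structure model (each indicator loads on exactly one factor), freely correlated factors, and the usual identification constraints; the function name is hypothetical, and the count is the standard textbook one rather than anything taken from the authors' scripts.

```python
def cfa_degrees_of_freedom(n_indicators, n_factors, n_residual_covariances=0):
    """Degrees of freedom for a simple-structure CFA with correlated factors.

    Counts unique variances/covariances minus free parameters: one loading
    or factor variance plus one residual variance per indicator, all factor
    covariances, and any freed residual covariances.
    """
    moments = n_indicators * (n_indicators + 1) // 2
    factor_covariances = n_factors * (n_factors - 1) // 2
    free = 2 * n_indicators + factor_covariances + n_residual_covariances
    return moments - free


# 15 indicators on 5 correlated factors: 120 moments - 40 parameters.
df_five_factor = cfa_degrees_of_freedom(15, 5)
```

With 15 indicators and five factors this gives 80, matching the reported χ2(80).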

Table 3 Zero-order correlations among measures in Study 1
Table 4 Latent factor loadings and interfactor correlations for latent variable model in Study 1

To summarize the interfactor correlations, the complex span and visual arrays factors correlated but were clearly distinguishable factors. Both factors negatively correlated with the TUT factor, meaning that people who performed better on the complex span and visual arrays tasks tended to self-report fewer TUTs. The predictions made by the LC-NE theory of individual differences (Unsworth and Robison, 2017a) were supported by five of the six latent correlations involving phasic responsiveness and arousal regulation. Specifically, participants who demonstrated greater phasic responsiveness tended to have higher complex span performance, higher visual arrays performance, and fewer reported TUTs. Arousal regulation significantly correlated with the visual arrays and TUT factors, but not with the complex span factor, as originally reported by Robison and Brewer (2020). Finally, there was no correlation between arousal regulation and phasic responsiveness. Figure 2 shows scatterplots of the interfactor correlations.

Fig. 2 Scatterplots of correlations among factors in Study 1

Discussion

Study 1 was a reanalysis of a previously published dataset (Robison and Brewer, 2020). Although correlations among complex span, visual arrays, TUTs, and arousal regulation had previously been reported, Robison and Brewer (2020) did not examine individual differences in phasic responsiveness. Based on the large differences in TEPRs between the precued, retrocued, and neutral trials in that study, and the variation in waveforms, we did not anticipate being able to extract reliable measures of phasic responsiveness. However, TEPRs from the three tasks intercorrelated, and despite the large differences across tasks and trial types in the waveforms, average TEPRs showed reasonable reliability estimates.

Collectively, the results supported the arousal regulation/phasic responsiveness theory of Unsworth and Robison (2017a). Tonic arousal regulation was correlated with better performance on the visual arrays tasks and fewer TUTs. Greater phasic responsiveness was associated with higher complex span performance, higher visual arrays accuracy, and fewer TUTs. The only prediction made by Unsworth and Robison (2017a) that was not supported in Study 1 concerned complex span and arousal regulation, which showed a near-zero correlation. Such a correlation was observed by Unsworth and Robison (2017b) but not by Robison and Brewer (2022). Studies 3 and 4 bring more data to bear on this issue.

Study 2

Study 2 examined individual differences in tonic arousal regulation, phasic responsiveness, and attention control, and whether these associations could be explained by intrinsic motivation or alertness. Given previous work that showed that motivation and alertness are strong correlates of laboratory-based measures of attention control (Robison and Unsworth, 2018; Unsworth, Miller, and Robison, 2021b), we wondered whether the association between arousal regulation and/or phasic responsiveness with attention control could be explained by these individual differences. To answer this question, we gave participants a set of three attention tasks while recording pupillary dynamics. After seeing practice trials and before completing the experimental trials of each task, we asked participants to self-report their motivation to perform well on the task and their current level of alertness. Then, we examined whether individual differences in motivation and/or alertness correlated with attention control, arousal regulation, and phasic responsiveness. Subsequently, we examined whether motivation and alertness could account for the associations between the pupillary factors and attention control.

Method

Participants and procedure

The sample included 306 participants, who were all students at the University of Texas at Arlington. Our goal was to collect data from as many participants as possible during one academic semester, with a minimally viable sample of 200. After exclusions (see below), the remaining participants had an average age of 19.15 (SD = 2.35); 19% identified as African American/black, 6% as Native American, 25% as Asian, <1% as Native Hawaiian or Pacific Islander, 52% as white, and 44% as Hispanic or Latino; 73% identified as female, 25% as male, and 2% as nonbinary. The laboratory session lasted 1 hour, for which participants were compensated with partial course credit. Participants first read and signed an informed consent form and then completed a brief demographic questionnaire. Then, they completed the psychomotor vigilance task (PVT; Dinges and Powell, 1985), the Sustained Attention to Response Task (SART; Robertson et al., 1997), and the antisaccade task (Hutchison et al., 2020). The study protocol was approved by the Institutional Review Board of the University of Texas at Arlington. All participants were treated according to the ethical standards of the American Psychological Association.

Tasks

PVT

Participants were randomly assigned to one of three conditions by the program. The only differences between conditions were the background and font color of the stimuli. In the black condition, the stimuli were presented as white text against a black background; in the gray condition, the stimuli were presented as black text against a gray background; and in the white condition, the stimuli were presented as black text against a white background. Each trial started with a 2-second fixation screen with crosses (+++++) centered on the screen. Then, a row of zeros appeared at the center of the screen (00.000). All stimuli were printed in size 32 Arial bold font. After an unpredictable amount of time ranging from 2 to 10 seconds, the zeros began counting forward like a stopwatch. Participants were instructed to press the spacebar as quickly as possible to stop the numbers from scrolling. Upon response, the timer stopped, revealing their reaction time (e.g., 00.324) for 1 second. After a 1-second blank intertrial interval, the next trial began. In all conditions, participants completed five practice trials followed by 100 experimental trials. Between the practice trials and experimental trials, participants were asked to rate how motivated they felt to perform well on the task on a 1 (not at all motivated) to 9 (extremely motivated) scale. Then, they were asked to rate how alert they felt on a 1 (extremely tired, fighting sleep) to 7 (extremely alert) scale. The dependent variable was each participant’s slowest 20% of reaction times.
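The PVT score can be illustrated with a short sketch. The text specifies the slowest 20% of reaction times as the dependent variable; summarizing that quintile by its mean is an assumption made here, and the function name is hypothetical.

```python
from statistics import mean


def slowest_quintile_mean(rts_ms):
    """Mean of the slowest 20% of one participant's reaction times (ms).

    Assumption: the slowest quintile is summarized by its mean.
    Integer division avoids floating-point rounding in the trial count.
    """
    n_slowest = max(1, len(rts_ms) // 5)
    return mean(sorted(rts_ms)[-n_slowest:])
```

With 100 experimental trials, the score is based on the 20 slowest responses.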

SART

Each trial started with a mask (#) for 250 ms. Then, a single digit (1 to 9) appeared for 250 ms, followed by a second mask (#) for 1,750 ms. The masks and stimuli were printed in size 32, black, Arial bold font against a gray background. The participants’ task was to press the spacebar whenever they saw any digit other than 3. When they saw the digit 3, they were to withhold their response. There were 27 practice trials followed by 360 experimental trials. Between the practice trials and experimental trials, participants were asked to rate their motivation and alertness in the same way as in the PVT. Participants were randomly assigned to one of two conditions. In the dominant condition, participants were required to use their dominant hand to make their responses. In the nondominant condition, participants were required to use their nondominant hand to make their responses. The other hand was placed in a device measuring heart rate and galvanic skin response from their index and middle fingers, but this was only done to keep participants blind to the purpose of the manipulation and to prevent them from switching hands. The dependent measure was the standard deviation of reaction times (RT SD) on “go” trials.

Antisaccade

Each trial started with a fixation screen (+++ in size 24, black, Arial font against a gray background) for 2,000 ms. Then, a warning cue (*** in size 28, Courier New, bold font) appeared for either 2,000 or 3,000 ms. A white flashing cue (*) then appeared on either the right or left side of the screen for 350 ms. A target (O or Q in size 28 Courier New black font) then appeared on the opposite side of the screen for 100 ms, followed by a backward mask (##) until the participant made a response. Participants used the “O” and “Q” keys on the keyboard to identify the target. There were eight slow-paced practice trials in which the target stayed on-screen for 500 ms, then six practice trials of the fast-paced version with a 100-ms target duration, and then 100 experimental trials. Between the practice trials and experimental trials, participants rated their motivation and alertness just as they did in the PVT and SART. The dependent variable was the proportion of correctly identified targets.

Pupillometry

Pupil data were collected via Gazepoint GP3HD (Gazepoint, Inc.) eye-trackers sampling binocularly at 150 Hz. Tasks were delivered and pupil data collected from the eye-trackers via PyGaze (Dalmaijer et al., 2014). The room was dark, with no external lighting except for the computer monitor. Participants completed the tasks with their heads placed in a chinrest. Data were processed offline using custom-written R scripts. Missing and invalid data points were automatically flagged by the Gazepoint API. These samples were removed from the data. Then, the data were screened by removing pupil diameters <1 mm or >9 mm (Footnote 2). Data points were not interpolated. The analysis scripts used to process and analyze the data are available on this manuscript’s OSF page.

Pretrial pupil diameter

For the PVT and antisaccade task, pretrial pupil diameter was computed as the mean pupil diameter over the 2,000-ms pretrial fixation screen for each participant for each trial. For the SART, pretrial pupil diameter was computed as the mean pupil diameter during the 250-ms pretrial fixation screen for each participant for each trial. Then, the mean and standard deviation of these values across all trials were computed for each participant for each task. Finally, CoV was computed by dividing each participant’s standard deviation by their mean.

TEPRs

TEPRs were computed for each trial, task, and participant by computing the mean change from baseline over a 2,000-ms wide window. The dependent variable was the mean TEPR across all trials.

Exclusions

To ensure multivariate normality and reduce the impact of outliers, we took several steps to exclude data points based on preset criteria. For the SART, RT SD values >500 ms and participants with hit rates <75% were removed (N = 17). For the PVT, slowest-quintile values exceeding 1,000 ms and participants who had more than 10 trials removed for falling outside the range of 200 to 3,000 ms were removed (N = 17). For eye-tracking data, any trials with more than 50% of samples missing were excluded, and any participants with more than 50% of trials missing were then excluded (PVT: N = 19, SART: N = 17, zero removed for antisaccade). Participants were only included if they were between the ages of 17 and 35 to avoid potential age-related confounds. This eliminated two participants. After these exclusions, there was one participant with an extreme value for pretrial CoV on the SART, so this value was set to missing. All other missing data points were due to computer errors or experimenter error.

Results

Descriptive statistics are listed in Table 5, and zero-order correlations among the measures are listed in Table 6. The grand-averaged TEPRs for each task are depicted in Fig. 3. As shown in Fig. 3A, the TEPRs were substantially different across the three PVT conditions, F(2, 262) = 8.42, p < 0.001. Therefore, we recomputed the TEPRs while controlling for condition. This residualized variable was used in the latent variable modeling. There were no differences in performance on the PVT across conditions (F(2, 265) = 0.39, p = 0.68), and there also were no differences in pretrial CoV across conditions (F(2, 263) = 0.61, p = 0.54). Therefore, no corrections were made to the behavioral or pretrial pupil data from the PVT. Although it was not a critical part of the study, the dominant/nondominant manipulation in the SART was included to examine whether participants are more likely to commit no-go errors when using their dominant hand. However, there were no group differences in either commission error rate (dominant: M = 0.38, SD = 0.20; nondominant: M = 0.37, SD = 0.18; t(280) = 0.04, p = 0.97) or RT SD (dominant: M = 138.01, SD = 56.20; nondominant: M = 139.50, SD = 73.14). Therefore, no corrections were made to the data from the SART.
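One way to read the residualization of PVT TEPRs on condition is as a regression on a categorical predictor, whose residuals equal each score minus its condition mean. The sketch below implements that reading; it is an assumption about the procedure, not the authors' script, and the function name and example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def residualize_by_condition(scores, conditions):
    """Remove condition means from scores.

    Equivalent to taking residuals from a regression of the score on a
    dummy-coded condition factor (an assumed reading of the text).
    """
    by_condition = defaultdict(list)
    for score, condition in zip(scores, conditions):
        by_condition[condition].append(score)
    condition_means = {c: mean(v) for c, v in by_condition.items()}
    return [s - condition_means[c] for s, c in zip(scores, conditions)]


# Made-up values for illustration only.
example = residualize_by_condition(
    [1.0, 3.0, 10.0, 14.0], ["black", "black", "white", "white"]
)
```

After this step, the residualized scores have a mean of zero within each background condition, so condition differences no longer contribute to individual differences in the measure.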

Table 5 Descriptive statistics for measures in Study 2
Table 6 Zero-order correlations among measures in Study 2
Fig. 3 Average task-evoked responses for the psychomotor vigilance task (A), Sustained Attention to Response Task (B), and antisaccade task (C) in Study 2

We specified a confirmatory factor analysis in which behavioral performance on the antisaccade, SART, and PVT loaded onto an attention control factor, average TEPRs from each task loaded onto a phasic responsiveness factor, CoV of pretrial pupil diameter from each task loaded onto an arousal regulation factor, motivation ratings from each task loaded onto a motivation factor, and alertness ratings from each task loaded onto an alertness factor. The fit of this model was not good (χ2(80) = 285.61, p < 0.001, CFI = 0.87, TLI = 0.82, RMSEA = 0.09 [0.08, 0.11], SRMR = 0.06). The poor fit seemed to be due to a large residual correlation between the motivation and alertness ratings in the PVT and between the motivation and alertness ratings in the antisaccade. These large correlations make sense given that these ratings were made on back-to-back screens. Allowing the residual variances from PVT motivation and PVT alertness to correlate and allowing the residuals from antisaccade motivation and antisaccade alertness to correlate significantly improved the fit of the model, Δχ2(2) = 117.65, p < 0.001, and led to a good model fit overall, χ2(78) = 167.96, p < 0.001, CFI = 0.94, TLI = 0.92, RMSEA = 0.06 [0.05, 0.08], SRMR = 0.06. The factor loadings and interfactor correlations are listed in Table 7.
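The nested-model comparison above can be reproduced from the two reported χ2 values. The sketch below assumes a plain (unscaled) χ2 difference test; the function names are hypothetical. For 2 degrees of freedom, the upper-tail probability of the χ2 distribution has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def chi_square_difference(chisq_restricted, df_restricted, chisq_free, df_free):
    """Return (delta chi-square, delta df) for two nested models."""
    return chisq_restricted - chisq_free, df_restricted - df_free


def p_value_two_df(delta_chisq):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    return math.exp(-delta_chisq / 2)


# Values reported in the text for the original and modified models.
delta, delta_df = chi_square_difference(285.61, 80, 167.96, 78)
p = p_value_two_df(delta)
```

The difference works out to 117.65 on 2 degrees of freedom, with a p-value far below 0.001.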

Table 7 Factor loadings and interfactor correlations for latent variable model in Study 2

To summarize the pattern of correlations, both phasic responsiveness and arousal regulation significantly correlated with attention control, as predicted by Unsworth and Robison (2017a). People with stronger attention control tended to exhibit greater phasic responsiveness and more regulated tonic arousal. Unlike Study 1, there was a significant correlation between phasic responsiveness and arousal regulation, such that people who tended to have greater phasic responsiveness tended to have more regulated tonic arousal. Attention control also significantly correlated with both motivation and alertness. People who reported being more motivated and alert tended to perform better on the tasks. Both higher motivation and higher alertness also were significantly associated with greater phasic responsiveness, and there was a modest yet significant correlation between tonic arousal regulation and motivation. Figure 4 shows scatterplots of the interfactor correlations.

Fig. 4 Scatterplots of correlations among factors in Study 2

Given the correlations among attention control, phasic responsiveness, arousal regulation, motivation, and alertness, it was worth investigating whether the self-reported motivation and alertness could account for the relations between pupillary measures and attention control. It could be that people tend to show greater phasic responsiveness, more regulated arousal, and higher attention control simply because they are feeling more alert and motivated during the tasks. To test this hypothesis, we specified a factor-level structural regression in which phasic responsiveness, tonic regulation, motivation, and alertness predicted attention control. This model is depicted in Fig. 5. Collectively, the factors accounted for 48% of the variance in attention control. But only phasic responsiveness had a significant direct path to attention control. Thus, all other factors tended to correlate with attention control because of their shared variance with phasic responsiveness. On its own, phasic responsiveness accounted for about 35% of the variance in attention control.

Fig. 5 Structural equation model predicting attention control from phasic responsiveness, tonic arousal regulation, motivation, and alertness in Study 2. Collectively, the four predictors accounted for 48% of the variance in attention control. Only phasic responsiveness had a significant direct path on its own. Solid lines indicate significant paths at p < 0.05. Dashed lines indicate non-significant paths.

Discussion

Study 2 tested the arousal regulation/phasic responsiveness theory of LC-NE (Unsworth and Robison, 2017a) in addition to one alternative hypothesis that arousal regulation and/or phasic responsiveness might be due to individual differences in task-specific states, such as motivation and alertness. Although phasic responsiveness significantly correlated with both motivation and alertness, these factors did not fully account for the relationship between phasic responsiveness and attention control. Therefore, the motivation/alertness alternative hypothesis was not supported by the data. In fact, in the simultaneous structural regression model, only phasic responsiveness had a significant direct effect on attention control. It is worth noting that the motivation ratings may have been influenced by a ceiling effect, with average ratings around 7 on a 9-point scale and a substantial proportion of people reporting the maximal rating of 9 (33% of people in PVT, 27% in SART, and 19% in antisaccade). In general, people with stronger attention control tended to exhibit greater phasic responsiveness as measured by pupillary responses and more regulated tonic arousal, supporting the theoretical predictions made by Unsworth and Robison (2017a).

Study 3

Study 3 was designed to test the LC-NE theories of Unsworth and Robison (2017a) and Tsukahara and Engle (2016, 2021a, 2021b). Like Study 2, it was designed to follow up Study 1 and Robison and Brewer (2022). Study 3 significantly expanded upon Study 1, Study 2, and Robison and Brewer (2022) in several ways. First, Study 2 measured all tasks within a single session, leaving open the question of whether arousal regulation and phasic responsiveness are truly trait-level individual differences or are better described as state-specific variables. Second, like Study 2, Study 3 measured motivation and alertness but also expanded the set of cognitive abilities to include attention control, working memory, and fluid intelligence. This allowed for an extension of Unsworth and Robison’s (2017a) theoretical predictions to higher-order cognitive abilities. Third, tasks were delivered on two separate days, allowing for an assessment of the trait stability or state-specificity of pupillary indices that presumably measure functioning of the LC-NE system. This facet of the study allowed for a test of an alternative hypothesis that phasic responsiveness and tonic arousal regulation are not trait-stable individual differences but rather state-specific aspects of cognitive functioning. Furthermore, we could examine whether individual differences in pupillary dynamics are influenced by both trait- and state-level variation. Finally, Study 3 included a measure of baseline/resting pupil diameter to test predictions made by Tsukahara et al. (2016, 2021).

Method

Participants and procedure

The experimental protocol occurred over two laboratory sessions, each lasting 1 hour. The sessions were an average of seven days apart, but always at least two and no more than 14 days apart. Participants were not required to participate on the same day of the week and time of day across sessions. This aspect of the study was specifically chosen to create a discrepancy between sessions in terms of circadian factors while reducing attrition. However, many participants (60%) did indeed choose to participate in the two sessions exactly 7 days apart, because those times worked within their weekly schedules. During the first session, participants completed an informed consent form, a demographics questionnaire, a self-report questionnaire, a resting/baseline pupil measurement and then the operation span, psychomotor vigilance task (PVT), Raven Advanced Progressive Matrices, the arrow flanker task, and the antisaccade task. During the second session, participants again completed the self-report questionnaire and resting-pupil measurement and then the symmetry span, Sustained Attention to Response Task, number series, Stroop, letter sets, and reading span tasks. Pupillometry data were collected during the PVT, flanker, SART, and Stroop tasks. Participants completed the tasks alone in a dimly lit room.

The target sample size was 200 participants, based mostly on Study 1, but we also used a recent simulation study (Kretzschmar and Gignac, 2019) to ensure our sample size was large enough for latent variable analyses. The final sample included 302 participants who completed at least one session (M age = 19.47, SD = 3.51; 153 identified as female, 152 as male, and 1 as nonbinary; 55% identified as white, 22% as Hispanic or Latino, 10% as Asian, and 7% as black) (Footnote 3). All participants were students at Arizona State University, and they were compensated with partial course credit for their participation. The experimental protocol was approved by the Institutional Review Board of Arizona State University. All participants were treated according to the ethical standards of the American Psychological Association.

The study began in September 2019 but was not completed until May 2022. Our target sample size was 250 participants. However, midway through data collection, our lab operations ceased due to COVID-19. The experiment resumed in February 2022 when it was deemed safe to do so. We used the end of the Spring 2022 semester as our stopping rule for data collection.

Demographics and self-reports

Participants self-reported their age, gender identity, native language, race, ethnicity, height, and handedness at the beginning of the first session. Then, during both sessions, participants self-reported the approximate number of hours they had slept the previous night, whether they had consumed any caffeine in the previous 4 hours, any nicotine in the previous 4 hours, any alcohol in the previous 24 hours, or were taking any medication that they believed would negatively affect their memory or speed of response. These data were collected as potential influences on resting and/or in-task pupillary dynamics. None of these factors affected resting pupil diameter, arousal regulation, or phasic responsiveness. Therefore, they are not reported in any depth in the analyses. However, few participants reported having used caffeine (session 1: 28%, session 2: 27%), nicotine (session 1: 11%, session 2: 10%), or alcohol (session 1: 3%, session 2: 4%) before coming to the lab.

Tasks

Working memory capacity

Operation span

This task was identical to that used in Study 1.

Symmetry span

This task was identical to that used in Study 1.

Reading span

This task was identical to that used in Study 1.

Attention control

Psychomotor vigilance task

This task was nearly identical to that used in Study 2, except all participants completed the task with black font against a gray background. There were five practice trials followed by 63 experimental trials. A thought probe appeared pseudo-randomly after eight trials. Thought probes were spaced to appear no fewer than six and no more than eight trials apart.
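The text gives the spacing constraints for thought probes but not how the schedule was generated. One possible scheme that satisfies those constraints is sketched below: gaps between probes are drawn uniformly from the allowed range and redrawn if the last probe would fall beyond the final trial. The function name, the rejection-sampling approach, and the treatment of the first gap as counted from the start of the task are all assumptions.

```python
import random


def probe_positions(n_trials=63, n_probes=8, min_gap=6, max_gap=8, rng=None):
    """Return the trial numbers after which probes appear (hypothetical scheme)."""
    rng = rng or random.Random()
    while True:
        gaps = [rng.randint(min_gap, max_gap) for _ in range(n_probes)]
        if sum(gaps) <= n_trials:  # redraw if the last probe would not fit
            positions, trial = [], 0
            for gap in gaps:
                trial += gap
                positions.append(trial)
            return positions


schedule = probe_positions(rng=random.Random(1))
```

The same function could describe the flanker schedule by calling it with that task's trial count, 15 probes, and gaps of 5 to 8.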

Computation of pretrial pupil diameter and baseline correction of evoked pupillary responses were identical to those in the flanker and Stroop tasks. The grand-averaged pupillary response is plotted in Fig. 6A. The pupillary response on each trial was computed by averaging the change from baseline over the window from 0 to 2,000 ms after stimulus onset (i.e., the start of the counter).

Fig. 6 Grand-averaged task-evoked pupillary responses (TEPRs) from the psychomotor vigilance task (A), arrow flanker task (B), Stroop task (C), and Sustained Attention to Response Task (D) in Study 3

Arrow flanker (Eriksen and Eriksen, 1974; Stoffels and Van der Molen, 1988)

Each trial began with a 1-second fixation screen with black crosses (+++++) centered against a gray background. Then, a row of directional arrows appeared in the same location as the fixation crosses. The participants’ task was to press a key corresponding to the direction of the center arrow. On congruent trials, the center and flanking arrows pointed in the same direction (e.g., < < < < <). On incongruent trials, the center and flanking arrows pointed in opposite directions (e.g., < < > < <). Participants pressed the M key for a right-pointing arrow and the Z key for a left-pointing arrow. After the key press, a mask of lowercase xs appeared in the same location as the stimulus (e.g., xxxxx) for 2 seconds. Then, the next trial began. The dependent variable was the difference in reaction times between incongruent and congruent trials. Thought probes appeared pseudo-randomly after 15 trials and were set to appear at least 5 and no more than 8 trials apart.

Pretrial pupil diameter was computed for each participant for each trial as the mean pupil diameter during each 1-second fixation screen. Task-evoked responses were baseline-corrected on a trial-by-trial basis. First, we standardized pupil diameter within each trial for each participant. This process reduced interindividual noise in pupillary changes based on individual differences in overall pupil size. Then, we computed the change from baseline (in standardized units) by taking mean pupil diameter during the 200-ms window immediately preceding stimulus onset and calculating the difference from this value in subsequent 50-ms wide bins. Then, for each participant, the average task-evoked response was computed for congruent and incongruent trials separately. These average responses are plotted in Fig. 6B. Finally, for each trial, the mean change from baseline was computed, and this value was then averaged across all trials to give the mean TEPR score for the task.
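The baseline-correction steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' preprocessing script (which was written in R): the function name `baseline_corrected_tepr`, the `(t_ms, diameter)` input format, and the use of the population standard deviation for within-trial standardization are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def baseline_corrected_tepr(trial, baseline_ms=200, bin_ms=50, window_ms=2000):
    """Baseline-correct one trial of pupil data.

    trial: list of (t_ms, diameter) pairs, with t_ms relative to stimulus
           onset (hypothetical input format; negative times are pre-stimulus).
    Returns (binned, trial_mean): mean change from baseline per 50-ms bin
    (keyed by bin start time) and the mean change over the whole window.
    """
    diameters = [d for _, d in trial]
    mu, sd = mean(diameters), pstdev(diameters)  # assumes sd > 0
    # Step 1: standardize pupil diameter within the trial.
    z = [(t, (d - mu) / sd) for t, d in trial]
    # Step 2: baseline = mean over the 200 ms immediately preceding onset.
    baseline = mean(v for t, v in z if -baseline_ms <= t < 0)
    # Step 3: change from baseline, grouped into consecutive 50-ms bins.
    bins = defaultdict(list)
    for t, v in z:
        if 0 <= t < window_ms:
            bins[int(t // bin_ms)].append(v - baseline)
    binned = {b * bin_ms: mean(vs) for b, vs in sorted(bins.items())}
    # Step 4: single per-trial score, later averaged across trials.
    trial_mean = mean(v - baseline for t, v in z if 0 <= t < window_ms)
    return binned, trial_mean
```

Averaging `trial_mean` across a participant's trials would then give a per-task TEPR score of the kind described in the text.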

Antisaccade (Kane et al., 2001)

This task was similar to that used in Study 2. Each trial started with a row of asterisks in cyan (RGB: 0, 255, 255) size 28 Courier bold font for 200–1,800 ms. Then, a cue (=) flashed on the right or left side of the screen for 300 ms (on for 100 ms, off for 50 ms, on for 100 ms, and off again for 50 ms). Then, a target letter (a B, P, or R) appeared on the opposite side of the screen for 100 ms, followed by a masking letter (H) for 100 ms and then the digit 8 until a participant made their response. The participants’ task was to press a key corresponding to the target letter (the number pad keys 1, 2, and 3 for B, P, and R, respectively). There were ten response-mapping trials where the target letter appeared at the center of the screen, ten prosaccade trials where the target letter appeared on the same side of the screen as the flashing cue, ten practice antisaccade trials, and 60 experimental antisaccade trials. The dependent variable was the proportion correct on the antisaccade experimental trials.

Sustained attention to response task

This task was similar to that used in Study 2. On each trial, a single digit appeared at the center of the screen in size 32 Arial font for 250 ms, followed by an X in size 32 Arial font for 2,000 ms. There were 30 practice trials followed by 210 experimental trials. The dependent variable was the standard deviation of reaction times on “go” trials.

Because the SART trials were stacked more closely together than the Stroop, flanker, and PVT trials, “pretrial” pupil diameter was computed as the mean pupil diameter during the last 250 ms of a trial (meaning this value was not computed for trial 1). Then, this value was subtracted from all subsequent windows for the baseline-corrected evoked pupillary response. Pupillary responses are plotted in Fig. 6D, separately for hits (correctly pressing the spacebar to all digits except 3), correct rejections (correctly withholding a key press for 3s), misses (incorrectly withholding a key press to digits other than 3), and false alarms (incorrectly pressing the spacebar on 3s). We used the mean pupillary response over the window from 0 to 2,000 ms after stimulus onset for all trials.

Color-word Stroop (Stroop, 1935)

Each trial started with a fixation screen, on which five crosses (+++++) appeared in black text against a gray (RGB: 122, 122, 122) background. The fixation screen lasted for a jittered duration between 1,800 and 2,200 ms. Then, one of three words appeared: red, green, or blue, in size 32 Arial bold font against the same gray background. On congruent trials, the word appeared in the color matching the word. On incongruent trials, the font color and the word differed. The participant’s task was to press a key corresponding to the font color of the word as quickly and accurately as possible. The stimulus word remained on-screen until a participant made a response. After the participant pressed a key, a mask appeared in the same color font with the same number of characters as the stimulus (e.g., #####) for 2 seconds. Then the next trial began. Only reaction times from accurate trials were used. Participants first completed 15 response-mapping trials on which a colored square appeared, then ten practice trials, followed by 102 total experimental trials (67% congruent, 33% incongruent). A thought probe appeared pseudo-randomly after 16 trials, spaced at least five trials and no more than eight trials apart. The dependent variable was the average reaction time on incongruent trials minus the average reaction time on congruent trials.
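As a concrete illustration of the dependent variable just described, the following Python sketch computes an interference effect from accurate trials only. The function name and the dictionary keys (`condition`, `rt_ms`, `correct`) are hypothetical and chosen only for this example; they are not taken from the study's scripts.

```python
from statistics import mean


def interference_effect(trials):
    """Mean RT on incongruent trials minus mean RT on congruent trials.

    trials: list of dicts with keys 'condition' ('congruent' or
            'incongruent'), 'rt_ms', and 'correct' (hypothetical names).
    Only accurate trials contribute, as stated in the task description.
    """
    def mean_rt(condition):
        return mean(
            t["rt_ms"]
            for t in trials
            if t["condition"] == condition and t["correct"]
        )

    return mean_rt("incongruent") - mean_rt("congruent")


example = [
    {"condition": "congruent", "rt_ms": 500, "correct": True},
    {"condition": "congruent", "rt_ms": 540, "correct": True},
    {"condition": "incongruent", "rt_ms": 640, "correct": True},
    {"condition": "incongruent", "rt_ms": 900, "correct": False},  # dropped
    {"condition": "incongruent", "rt_ms": 600, "correct": True},
]
effect_ms = interference_effect(example)  # 620 - 520 = 100 ms
```

Because the score is a difference between two noisy means, its reliability tends to be lower than that of either mean alone, which is the measurement issue the Results section notes for the Stroop and flanker effects.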

Pretrial pupil diameter and baseline correction for task-evoked responses were performed identically to the flanker task. The average evoked pupillary response for congruent and incongruent trials is plotted in Fig. 6C. Like the flanker task, the average task-evoked response was computed for each trial by taking the mean dilation from baseline over the window from 0–2,000 ms post stimulus-onset.

Fluid intelligence

Number series (Thurstone, 1938)

On each trial, a sequence of numbers appeared, and the participant’s task was to select from a set of five possible options the number that best continued the sequence. Participants had four and a half minutes to complete as many trials as possible, with a maximum possible score of 15.

Raven advanced progressive matrices (Raven et al., 1962)

On each trial, a 3 x 3 grid appeared with patterned shapes in each cell. The bottom-right piece of the grid was missing, and the participants’ task was to select the piece that best completed the pattern in the grid from a set of eight options. Participants had 10 minutes to complete the 18 odd-numbered items. The dependent variable was the number of correctly reported items.

Letter sets (Ekstrom and Harman, 1976)

On each trial, a set of four different four-letter sets appeared. Among the sets, three of the four sets followed an implicit rule, and one of the four sets violated this rule. The participant’s task was to select the set of letters that violated the rule. Participants had 5 minutes to complete as many trials as possible, with a maximum possible score of 20.

Motivation and alertness ratings

After the operation span, symmetry span, flanker, Stroop, PVT, and SART tasks, participants rated their motivation and alertness levels on a scale from 1 (extremely unmotivated; not alert at all) to 7 (extremely motivated; extremely alert). The motivation rating said, “Please rate how motivated you felt to perform well on the task.” The alertness rating said, “Please rate how alert you feel right now.” Participants used the keyboard to make their responses.

Thought probes

Thought probes were included in the flanker, Stroop, PVT, and SART tasks. The thought probes asked participants, “What were you thinking about in the few seconds preceding this screen?” There were five response options: (a) I was focused on the current task; (b) I was thinking about my performance on the task; (c) I was distracted by sights/sounds in my environment; (d) I was thinking about things unrelated to the task (e.g., daydreaming); and (e) My mind was blank. Consistent with prior research, responses (c), (d), and (e) were considered task-unrelated thoughts (TUTs). The proportion of probes answered with a TUT response was used as the dependent variable.
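The scoring rule for thought probes can be written out as a short Python sketch. The single-letter response coding and the function name are assumptions made for illustration.

```python
# Response options (c), (d), and (e) are scored as task-unrelated thoughts.
TUT_OPTIONS = {"c", "d", "e"}


def tut_proportion(responses):
    """Proportion of thought probes answered with a TUT option.

    responses: list of single-letter codes 'a' through 'e' (hypothetical
               coding of the five response options listed in the text).
    """
    return sum(r in TUT_OPTIONS for r in responses) / len(responses)


probe_responses = ["a", "c", "d", "b", "e", "a", "a", "a"]
proportion = tut_proportion(probe_responses)  # 3 of 8 probes
```

Note that option (b), thinking about one's performance, is not counted as a TUT under this rule.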

Pupillometry

Pupil data were collected via Gazepoint GP3HD (Gazepoint, Inc.) eye-trackers sampling binocularly at 150 Hz. Room lighting was dim; a single lamp was placed in the corner of the room. Participants completed the tasks with their heads placed in a chinrest. For the tasks not involving eye-tracking, participants were permitted to have their chins hovering above or behind the chinrest if desired. Data were preprocessed offline using custom written R scripts. The data and scripts can be found on this manuscript’s OSF page.

Pre-experimental baseline

After completing the informed consent and demographics questionnaire, participants fixated on a black cross against a gray background for 3 minutes. From this time window, the mean pupil diameter was computed for each participant. The lighting level in the run room was about 36 lux and the brightness from the monitor was about 17 lux (Footnote 4).

Exclusions

Two participants were excluded listwise from the analysis because they fell outside the target age range (18–35 years). Forty-six participants completed Session 1 but not Session 2 because of cancellations or no-shows. An additional 14 participants completed Session 1 in March 2020, but the lab shut down due to COVID-19, and the study did not resume until February 2022. These participants were retained as they still contributed data for half the experimental tasks. To ensure adherence to task instructions and that the data were multivariate normal, we used the following exclusion criteria for individual tasks. For the Stroop and flanker tasks, participants with less than 60% accuracy were excluded (one participant for Stroop, 44 for flanker). One additional participant’s Stroop score was excluded because it was well outside the group mean. For the PVT, we excluded two participants whose slowest quintile of trials was outside 1,000 ms. For the SART, we excluded 16 participants because their RT SD fell outside 500 ms. All other missing data points were because of eye-tracking malfunction/data missingness, computer errors (e.g., freezes, crashes, etc.), or experimenter errors (incorrect subject number input, accidentally skipping a task, etc.). For eye-tracking data, any trials with more than 50% of samples missing were excluded, and any participants with more than 50% of trials missing were then excluded. Achieved sample sizes for each measure are listed in Table 8. For latent variable modeling, the full variance-covariance matrix from all available data points was used (i.e., participants were not removed listwise, unless otherwise noted above).
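The two-stage eye-tracking exclusion rule (trials first, then participants) can be sketched as follows. This is an illustration only: the function names are hypothetical, missing samples are assumed to be coded as `None`, and a participant's missing trials are assumed to be those dropped in the first stage.

```python
def usable_trials(trials, max_missing=0.5):
    """Keep trials in which no more than 50% of samples are missing.

    trials: list of trials, each a list of pupil samples where None marks
            a missing sample (hypothetical coding).
    """
    kept = []
    for samples in trials:
        missing = sum(s is None for s in samples) / len(samples)
        if missing <= max_missing:  # "more than 50%" missing is excluded
            kept.append(samples)
    return kept


def keep_participant(trials, max_missing=0.5):
    """Keep a participant unless more than 50% of their trials were dropped."""
    dropped = len(trials) - len(usable_trials(trials, max_missing))
    return dropped / len(trials) <= max_missing


good = [3.1, 3.2, 3.0, 3.3]
bad = [None, None, None, 3.2]
participant_a = [good, good, bad, good]  # 1 of 4 trials dropped: retained
participant_b = [bad, bad, bad, good]    # 3 of 4 trials dropped: excluded
```

Applying the trial-level filter before the participant-level filter matters, because a participant's trial count depends on which trials survive the first stage.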

Table 8 Descriptive statistics for measures in Study 3

Results

Descriptive statistics for all measures are listed in Table 8, and zero-order correlations are listed in Table 9. The Stroop and flanker effects had low reliability estimates, which is now a well-documented issue with these tasks (Draheim et al., 2019; Enkavi et al., 2019; Feldman and Freitas, 2016; Hedge et al., 2018; Rey-Mermet et al., 2019; Rouder and Haaf, 2019; Whitehead et al., 2019, 2020). Our initial set of analyses confirmed the hypothesized latent structure of the data using confirmatory factor analysis. First, we specified a model on the behavioral data in which the operation span, symmetry span, and reading span tasks loaded onto a working memory factor; the antisaccade, psychomotor vigilance, SART, Stroop, and flanker data loaded onto an attention control factor, and the Raven, number series, and letter sets loaded onto a fluid intelligence factor. The fit of this model was acceptable (χ2(41) = 77.30, p = 0.001, CFI = 0.93, TLI = 0.90, RMSEA = 0.05 [0.04, 0.07], SRMR = 0.05). Working memory capacity, attention control, and fluid intelligence correlated in a manner consistent with previous work (Table 10).

Table 9 Zero-order correlations among measures in Study 3
Table 10 Factor loadings and interfactor correlations for latent variable model in Study 3

Next, we examined whether resting pupil diameter correlated with the latent cognitive constructs by adding mean baseline pupil diameter to the model as an exogenous variable. Baseline pupil measurements from Session 1 and Session 2 correlated highly (r(221) = 0.75 [0.69, 0.81], p < 0.001). Therefore, we averaged these two measures. Baseline pupil diameter did not correlate with the working memory (r = 0.09 [−0.05, 0.23], p = 0.21), attention control (r = 0.04 [−0.11, 0.19], p = 0.62), or fluid intelligence factors (r = 0.04 [−0.11, 0.19], p = 0.64). Therefore, we did not observe evidence consistent with a resting state account of LC-NE function and cognitive ability (Tsukahara et al., 2016; Tsukahara and Engle, 2021b; Tsukahara et al., 2021). Scatterplots of the interfactor correlations are shown in Fig. 7.

Fig. 7

Scatterplots of attention control, working memory capacity, and fluid intelligence with arousal regulation, phasic responsiveness, and baseline pupil diameter factor scores in Study 3

To estimate whether phasic responsiveness and arousal regulation are best described by trait-level factors, state-specific factors, or a combination, we first specified a model in which psychomotor vigilance TEPRs and flanker TEPRs loaded onto a factor (phasic responsiveness – Session 1), Stroop TEPRs and SART TEPRs loaded onto a factor (phasic responsiveness – Session 2), psychomotor vigilance pretrial CoV and flanker pretrial CoV loaded onto a factor (arousal regulation – Session 1), and Stroop pretrial CoV and SART pretrial CoV loaded onto a factor (arousal regulation – Session 2). The model is depicted in Fig. 8A. The model fit the data well, overall (χ2(14) = 9.94, p = 0.77, CFI = 1.00, TLI = 1.03, RMSEA = 0.00 [0.00, 0.04], SRMR = 0.03). There were strong correlations between phasic responsiveness across sessions (r = 0.93), and between arousal regulation across sessions (r = 0.72). However, measures of arousal regulation and phasic responsiveness did not correlate, either within or across sessions.

Fig. 8

Confirmatory factor analyses with session-specific factors for both phasic responsiveness and tonic arousal regulation (A), a trait-level factor for phasic responsiveness and session-specific factors for arousal regulation (B), and trait-level factors for both phasic responsiveness and arousal regulation (C) in Study 3

To estimate whether there was evidence for state-specific variance in phasic responsiveness and arousal regulation, we next estimated models that specified general phasic responsiveness and arousal regulation factors, rather than separating the factors by session. In the first model, we specified a general phasic responsiveness factor and two session-specific arousal regulation factors (Fig. 8B). This model fit the data well overall (χ2(17) = 14.15, p = 0.66, CFI = 1.00, TLI = 1.01, RMSEA = 0.00 [0.00, 0.04], SRMR = 0.04), and it did not fit any worse than a model with state-specific phasic responsiveness factors (Fig. 8A; Δχ2(3) = 4.21, p = 0.24). Therefore, phasic responsiveness was best described by a single trait-level factor. Finally, we specified a model with just two factors: a general phasic responsiveness factor and a general arousal regulation factor (Fig. 8C). This model fit the data well overall (χ2(19) = 26.94, p = 0.10, CFI = 0.97, TLI = 0.96, RMSEA = 0.04 [0.00, 0.07], SRMR = 0.04), but significantly worse than the model with two state-specific arousal regulation factors (Δχ2(2) = 12.79, p = 0.001). Therefore, there was evidence that arousal regulation had both trait-stable variance, as indicated by the high correlation between the state-specific factors, and state-specific variance, as indicated by the worse fit of the single, trait-level arousal regulation factor. However, for these types of models, our interpretation is limited by the fact that different tasks were delivered during Sessions 1 and 2. Therefore, any covariance between those tasks could be due to shared method variance, rather than shared temporal variance. This is a point we return to in the Discussion.
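The nested-model comparisons above are chi-square difference tests: the difference in χ2 between the more restricted and the less restricted model is referred to a chi-square distribution with degrees of freedom equal to the difference in model df. The Python sketch below reworks that arithmetic from the fit statistics reported in the text. It assumes the simple (unscaled) difference test, and the function names are hypothetical; small discrepancies from the reported p values may reflect rounding of the reported χ2 values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1.

    Uses the closed forms for df = 1 and df = 2 and the standard recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def chi2_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Return (delta chi-square, delta df, p) for two nested models."""
    d_chi2 = chi2_restricted - chi2_full
    d_df = df_restricted - df_full
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Fig. 8B (general phasic factor) vs. Fig. 8A (session-specific factors).
b_vs_a = chi2_difference(14.15, 17, 9.94, 14)
# Fig. 8C (two general factors) vs. Fig. 8B.
c_vs_b = chi2_difference(26.94, 19, 14.15, 17)
```

With the reported values, the first comparison gives a difference of 4.21 on 3 df with p of about 0.24, and the second gives 12.79 on 2 df with p below 0.01, matching the pattern of conclusions in the text.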

In the next set of factor-analytic models, we specified trait-level factors for tonic arousal regulation and phasic responsiveness as described above, along with working memory, attention control, and fluid intelligence factors. This model fit the data acceptably (χ2(142) = 196.98, p = 0.001, CFI = 0.93, TLI = 0.92, RMSEA = 0.04 [0.02, 0.05], SRMR = 0.06). Factor loadings and interfactor correlations are listed in Table 10. All three cognitive factors were positively and significantly correlated with phasic responsiveness. However, only attention control significantly correlated with arousal regulation. Thus, people who showed greater phasic responsiveness tended to have higher working memory capacity, stronger attention control, and higher fluid intelligence, whereas better tonic arousal regulation was associated only with stronger attention control (Footnote 5).

In our next analysis, we added average motivation ratings and average alertness ratings from the sessions to the model as exogenous variables (Footnote 6). Average alertness and motivation ratings correlated moderately across the two sessions (alertness: r = 0.40 [0.29, 0.51], p < 0.001; motivation: r = 0.42 [0.31, 0.52], p < 0.001), and overall, the six motivation ratings showed acceptable internal consistency (Cronbach’s α = 0.70), as did the six alertness ratings (α = 0.70). Motivation and alertness correlated highly (r = 0.77 [0.71, 0.81], p < 0.001). Self-reported motivation significantly and positively correlated with working memory (r = 0.29 [0.16, 0.42], p < 0.001), attention control (r = 0.34 [0.20, 0.48], p < 0.001), fluid intelligence (r = 0.18 [0.03, 0.32], p = 0.02), and phasic responsiveness (r = 0.20 [0.06, 0.35], p = 0.007). However, there was not a significant correlation between motivation and arousal regulation (r = −0.01 [−0.13, 0.16], p = 0.85). Self-reported alertness significantly correlated with working memory (r = 0.26 [0.13, 0.39], p < 0.001), attention control (r = 0.37 [0.23, 0.51], p < 0.001), fluid intelligence (r = 0.16 [0.01, 0.30], p = 0.04), and phasic responsiveness (r = 0.18 [0.04, 0.34], p = 0.02). But alertness did not significantly correlate with arousal regulation (r = 0.06 [−0.08, 0.20], p = 0.39). Given these intercorrelations, it was worth investigating whether intrinsic motivation and self-reported alertness levels account for the relations between cognitive abilities and phasic responsiveness. Therefore, we estimated three structural regression models in which motivation, alertness, and phasic responsiveness predicted each cognitive ability (i.e., working memory, attention control, and fluid intelligence). These models are depicted in Fig. 9.
In all three models, a significant direct path from phasic responsiveness to the cognitive abilities remained even after accounting for individual differences in self-reported motivation and alertness. Therefore, the relations between phasic responsiveness and cognitive abilities did not seem to be driven by motivation or alertness, replicating Study 2.

Fig. 9

Structural regression models in which phasic responsiveness, motivation, and alertness were set as predictors of Working memory capacity (A), Attention control (B), and Fluid intelligence (C) in Study 3. In all three models, phasic responsiveness accounted for a significant portion of variance in the respective cognitive ability even after controlling for self-reported motivation and alertness. Solid lines indicate significant paths at p < 0.05; dashed lines indicate nonsignificant paths

Our final analysis added a TUT factor to the model. Reports of TUTs from the SART, PVT, Stroop, and flanker tasks were specified to load onto a factor. This factor negatively correlated with motivation (r = −0.38 [−0.51, −0.24], p < 0.001) and alertness (r = −0.44 [−0.57, −0.31], p < 0.001), but did not have significant correlations with attention control (r = −0.09 [−0.28, 0.10], p = 0.35), working memory capacity (r = 0.13 [−0.04, 0.31], p = 0.14), or fluid intelligence (r = 0.17 [−0.02, 0.35], p = 0.07). TUTs had a significant negative correlation with arousal regulation (r = −0.18 [−0.35, −0.01], p = 0.04), similar to Study 1, but not a significant correlation with phasic responsiveness (r = −0.12 [−0.30, 0.06], p = 0.19).

Discussion

Collectively, several pieces of evidence were consistent with the LC-NE account of individual differences proposed by Unsworth and Robison (2017a). First, phasic responsiveness and tonic arousal regulation were task- and situation-general characteristics of individuals, as pupillary responses and intraindividual variability in pretrial pupil diameter correlated across tasks that were administered in different sessions, on average 7 days apart, and formed coherent latent factors. Second, the factors formed by these measures significantly and positively correlated with individual differences in working memory capacity, attention control, and fluid intelligence. Third, although phasic responsiveness significantly and positively correlated with self-reported motivation and alertness levels, accounting for these factors did not account for the relations between phasic responsiveness and cognitive abilities. However, one finding was inconsistent with Unsworth and Robison’s (2017a) LC-NE account. Tonic arousal regulation correlated only with attention control, not with working memory capacity or fluid intelligence. This pattern was nevertheless consistent with what was recently observed by Robison and Brewer (2022) and replicates what was found in Studies 1 and 2. We will return to this point in the General Discussion. Finally, there was no evidence for a correlation between resting pupil diameter and cognitive abilities. Therefore, we did not observe any evidence consistent with Tsukahara and Engle’s (2021a) resting state theory of LC-NE connectivity and individual differences. We will return to this point in more detail in the General Discussion.

Study 4

The original goal of Study 4, when developed, was to measure phasic responsiveness and arousal regulation from a set of tasks not entirely overlapping with an attention control factor (Unsworth and Robison, 2017b). Furthermore, the set of tasks was designed to measure factors for working memory capacity, multiple aspects of attention control (alertness and goal maintenance), pupillary dynamics, and self-report (thought probes and retrospective questionnaires). The thought probes were distributed among tasks measuring working memory (symmetry span), alertness (AX-CPT), and goal maintenance (Stroop). In the present analyses, we combined the alertness and goal maintenance measures to load onto a single attention control factor. Like Study 3, Study 4 was designed specifically to test predictions made by the LC-NE theories of individual differences proposed by Unsworth and Robison (2017a) and by Tsukahara et al. (2016, 2021a, 2021b). Importantly, the cognitive measures used in Study 4 were largely overlapping with those used in Study 3, allowing for a near-direct comparison of the correlations among tonic arousal regulation, phasic responsiveness, working memory capacity, attention control, and fluid intelligence.

Method

Participants and procedure

The sample included 241 adolescents (M age = 16.23, SD = 1.16; 24% 9th graders, 30% 10th graders, 25% 11th graders, 21% 12th graders; 45% male, 55% female). Participants were recruited from a local community high school in the area surrounding South Bend, Indiana. Recruitment involved mailing letters to parents describing the aims of the current study and providing information to contact the researchers. School administrators were provided stuffed and stamped recruitment letters to mail to parents. The researchers did not receive parent and student information unless parents responded to the recruitment letter. Approximately 3,500 letters were mailed. Participants were included if they were rising 9th, 10th, 11th, and 12th graders in the fall of 2018. Participants were required to have normal or corrected vision, no physical impairment that might prevent them from completing the experimental tasks (e.g., motor problems), and no history of traumatic brain injury. Participants could not be enrolled in any other concurrent research studies. The target sample size was based on the recommended sample size for factor-analytic and SEM techniques, which is approximately 10 participants for each measure (Floyd and Widaman, 1995). Thus, to account for possible attrition, total target recruitment was 220 participants.

Following recruitment, participants were scheduled for one 3.5-hour assessment appointment in a laboratory located in the Psychology Department at the University of Notre Dame. Parents of participants were emailed a consent form requesting their permission for including their child in the current study. Following consent, parents were emailed a questionnaire regarding their child’s cognitive functioning, tendency to experience lapses, social-emotional behavior, as well as a questionnaire regarding their own cognition, propensity to experience lapses, parenting behavior, and demographic information. Child participants were emailed an assent form following parental consent. Following assent, children were emailed a questionnaire about their cognitive functioning, tendency to experience lapses, social-emotional behavior, interactions with their parents, personality, rumination and reflection styles, anxiety, depression, sleep, and stimulant (e.g., coffee) intake. If parents and/or participants did not complete the consent, assent, or the survey prior to their cognitive testing appointment, they were required to complete these items upon arrival at their appointment. Participants received $120 USD for completion of all study measures. The Institutional Review Board at the University of Notre Dame approved the protocol for the study.

Following consent and completion of the questionnaires, participants took part in a cognitive assessment including measures of working memory, attention control, and fluid intelligence. During three of the tasks (operation span, rotation span, and antisaccade), participants’ pupil diameters were measured. During three separate tasks (symmetry span, Stroop, and AX-CPT), participants’ current thoughts were probed. At the end of the session, participants completed a survey about their motivation. There were two tasks included in the session that were not analyzed here: Attention Network Test for Interactions and Vigilance (ANTI-V: Roca et al., 2011; Fan et al., 2002) and a simple RT task that was adapted from the alertness subtest of Test for Attentional Performance (TAP; Zimmerman and Fimm, 1995). The difference scores from the ANTI-V had very low reliability estimates and the simple RT task was very similar to the PVT, so these measures were not included in the analyses. All participants completed the tasks in the same order (Draheim, Harrison, Embretson, and Engle, 2018). A researcher was present during all sessions for the entire session.

Tasks

Working memory capacity

Operation span

The task was identical to that used in Study 1 with a few exceptions. First, preceding each list there was a 2,000-ms fixation screen, which was used as the pretrial pupil measurement. Then, at the end of each list, there was a blank 3,000-ms interval to allow the pupil to return to baseline (Eckstein et al., 2017). Finally, the task was lengthened to ensure more reliable estimates of tonic and phasic pupillary measures. Thus, the task included 30 lists instead of 8. Across two blocks, there were six lists of each set size from 4 through 8, for a maximum possible score of 180.

Pretrial pupil diameter was computed as the mean pupil diameter during the 2,000-ms prelist fixation screen. Then, for each participant we computed CoV of these 30 measurements and multiplied it by −1. Because there are several different screens within a trial and within a list, each with varying physical properties and varying duration depending on the individual, the TEPR was computed as the average change in pupil diameter (in standardized units) on the letter-encoding screens, which were roughly equal in luminance regardless of the letter presented and were always presented for the same amount of time (1,500 ms). We computed the average change over these letter encoding screens within a trial, then averaged across trials to compute the subject-level dependent variable. Pupil data were unavailable for the first 40 participants for this task because of a programming error.
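The arousal regulation score described above is the coefficient of variation (CoV) of pretrial pupil diameter, sign-flipped so that higher values indicate more stable pretrial arousal. A minimal Python sketch follows; the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, because the text does not state which estimator was used.

```python
from statistics import mean, stdev


def arousal_regulation_score(pretrial_diameters):
    """Negative coefficient of variation of pretrial pupil diameter.

    pretrial_diameters: one mean pretrial diameter per list or trial
                        (e.g., the 30 prelist measurements in operation span).
    Multiplying by -1 means that less variable pretrial pupil diameter
    yields a higher score.
    """
    cov = stdev(pretrial_diameters) / mean(pretrial_diameters)
    return -1 * cov


stable = arousal_regulation_score([4.9, 5.0, 5.1])
variable = arousal_regulation_score([4.0, 5.0, 6.0])  # SD 1.0, mean 5.0
```

Dividing by the mean makes the score unitless, so participants with different overall pupil sizes can be compared on the same scale.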

Symmetry span

This task was nearly identical to that used in Study 1, except it was lengthened to include six lists of each set size from three to seven items. Additionally, thought probes were added after six randomly selected lists.

Rotation span (Harrison et al., 2013)

Before each list, baseline pupil measurements were taken for 2,000 ms by requiring participants to stare at an array of asterisks in the center of the computer screen. Participants’ pupils were measured from the start of each trial to the end of each trial but not in between trials and not during the practice. Following the baseline pupil measurements, participants viewed a letter rotated to one of eight different angles. They needed to make a judgment as to whether the letter, when positioned upright, was facing in the correct direction or was mirror-reversed. Following the rotation judgment, participants viewed a series of long and short arrows pointing in one of eight directions. Participants needed to remember the order of the long and short arrows and the directions in which they pointed. Participants recalled the arrows using their mouse from a recall screen that presented the possible arrows (size and direction). The inter-trial interval was 3,000 ms to allow the pupil to return to baseline before the next trial (Eckstein et al., 2017). Three trials of each set size (three to seven letter/arrow pairs) were randomly presented in two blocks for a total of 30 test lists. As in the operation span, participants practiced on the memory, processing, and memory plus processing portions of the task before test trials. The dependent variable was the total number of correctly recalled items in the correct serial position across all lists (maximum score = 150).

The pupil data were analyzed almost identically to how they were for the operation span task, with the exception that the TEPR was computed during the arrow encoding screens, rather than the letter encoding screens. Otherwise, pretrial CoV, average TEPR, and their reliabilities were computed identically.

Attention control

AX-continuous performance test (AX-CPT)

The AX-CPT (Rosvold et al., 1956) was used to measure both tonic and phasic alertness. During the AX-CPT, participants were required to press the 1 key if they saw the target letter X preceded by the letter A (AX trial) and to press the 3 key for any other combination of letters. AX trials were presented on 70% of trials. On 10% of nontarget trials, participants saw the letter A followed by any other letter but X (AY trials). On 10% of nontarget trials, participants saw the letter X preceded by any other letter but A (BX trials). On the final 10% of nontarget trials, participants saw BY trials with neither the accurate cue A nor the target X. Each trial began with a cue that could be any letter except X, K, or Y. Letters K and Y were excluded as cues because of their physical similarity to the target letter X. The cue was presented at the center of the computer screen for 200 ms. After an interstimulus interval of 1,000 ms, the probe appeared for 200 ms. The probe was any letter except A, K, or Y. Following the probe, a row of asterisks appeared during a 1,500-ms intertrial interval. During this time, participants needed to press the target key if they observed an AX trial or the nontarget key for any other trial. Participants were instructed only to respond when they had seen the second letter in the pair. The task included 300 trials and took about 20 minutes to complete.

Correct responses to AX trials are counted as hits. Failures to respond to AX trials as targets are counted as misses (or omissions). When participants respond to an AY, BX, or BY trial by pressing the target key, this is considered a false alarm (or commission). When participants respond to an AY, BX, or BY trial by pressing the non-target key these are counted as correct rejections. The dependent variable was the AX hit rate minus the AY false-alarm rate.
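The AX-CPT dependent variable can be illustrated with a short Python sketch. The function name and the `(trial_type, pressed_target)` representation are hypothetical, and the sketch treats a missing response on an AX trial the same as a nontarget key press (both are misses), which is an assumption about how omissions were coded.

```python
def axcpt_score(trials):
    """AX hit rate minus AY false-alarm rate.

    trials: list of (trial_type, pressed_target) tuples, where trial_type
            is 'AX', 'AY', 'BX', or 'BY' and pressed_target is True when
            the participant pressed the target key (hypothetical format).
    """
    ax = [pressed for kind, pressed in trials if kind == "AX"]
    ay = [pressed for kind, pressed in trials if kind == "AY"]
    hit_rate = sum(ax) / len(ax)          # target key on AX trials
    false_alarm_rate = sum(ay) / len(ay)  # target key on AY trials
    return hit_rate - false_alarm_rate


session = (
    [("AX", True)] * 9 + [("AX", False)]     # 9 hits, 1 miss
    + [("AY", True)] + [("AY", False)] * 3   # 1 false alarm, 3 rejections
    + [("BX", False), ("BY", False)]         # not used in this score
)
score = axcpt_score(session)  # 0.90 - 0.25
```

Only AX and AY trials enter the score, so BX and BY responses affect accuracy summaries but not this dependent variable.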

PVT

This task was nearly identical to that used in Study 1 with a few exceptions. First, the background screen was white. Second, because pupil data were not collected during this task, there was no pretrial fixation screen. Rather, each trial started with the appearance of the row of zeroes (00.000). The interstimulus intervals (i.e., wait periods) were identical to Study 1. Also, rather than being set to run for a specific number of trials, the task ran for 10 minutes. The dependent variable was the average RT for the slowest 20% of trials.
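The PVT dependent variable, the mean RT of the slowest 20% of trials, can be sketched as below. The function name is hypothetical, and rounding the number of trials in the slowest quintile down (with a minimum of one trial) is an assumption, since the text does not say how non-integer quintile sizes were handled.

```python
from statistics import mean


def slowest_quintile_mean(rts_ms):
    """Mean RT of the slowest 20% of trials.

    rts_ms: list of reaction times in ms for one participant.
    The quintile size is len(rts_ms) // 5, with at least one trial
    (assumed rounding rule).
    """
    n_slowest = max(1, len(rts_ms) // 5)
    return mean(sorted(rts_ms)[-n_slowest:])


rts = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290]
slowest = slowest_quintile_mean(rts)  # mean of 280 and 290
```

Because the task ran for a fixed 10 minutes rather than a fixed number of trials, the number of trials in the slowest quintile will differ across participants.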

Stroop

This task was nearly identical to that used in Study 3 with a few exceptions. First, the background screen color was white, rather than gray. Second, because pupil data were not collected, there was neither a pretrial fixation screen nor a backward mask after responses. Instead, the stimulus word simply disappeared after a key press, and then the next trial began after a jittered intertrial interval. Fifteen response mapping and six practice trials preceded 135 test trials (67% congruent, 33% incongruent). Thought probes were randomly presented on 21 trials. The dependent variable was the difference in RT between incongruent and congruent trials.

Antisaccade

This task was nearly identical to that used in Study 3 with a few exceptions. First, the background screen was gray instead of black. Second, a 2,000-ms pretrial fixation screen (***) appeared before each trial. Third, at the end of a trial, a 3,000-ms blank intertrial interval was included to allow time for the pupil to return to baseline before the next trial. Participants completed 10 response mapping trials followed by 15 prosaccade and 50 antisaccade test trials. The dependent measure was the proportion of correctly reported targets on antisaccade trials. Pretrial pupil diameter was computed as the mean pupil diameter during the 2,000-ms fixation screen preceding each trial. TEPRs were time locked to stimulus presentation on a trial-by-trial basis, just as in Experiment 1. The dependent variable for tonic regulation was the CoV of pretrial pupil diameter within an individual over the 50 trials, multiplied by −1. Trial-by-trial baselining and standardization followed the same method as that employed for all tasks in Study 1. The dependent variable for phasic responsiveness was the average TEPR (in standardized units) over the period from 0 to 2,000 ms after the appearance of a flashing cue.
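
The tonic regulation measure can be sketched as below. The function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the text does not say which estimator was used. The input is one mean pretrial pupil diameter per trial for a single participant.

```python
from statistics import mean, stdev
from typing import Sequence


def tonic_regulation_score(pretrial_means_mm: Sequence[float]) -> float:
    """Coefficient of variation of pretrial pupil diameter, multiplied by -1.

    Higher (less negative) scores indicate less trial-to-trial variability
    in pretrial pupil diameter. Assumption: sample SD (n - 1 denominator).
    """
    if len(pretrial_means_mm) < 2:
        raise ValueError("need at least two trials to compute variability")
    m = mean(pretrial_means_mm)
    if m <= 0:
        raise ValueError("mean pupil diameter must be positive")
    return -stdev(pretrial_means_mm) / m
```

Multiplying by −1 means that a participant with more stable pretrial pupil diameter receives a higher score, so positive correlations with cognitive ability indicate better regulation.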

Fluid intelligence

Raven advanced progressive matrices

This task was identical to that used in Study 3.

Number series

This task was identical to that used in Study 3.

Letter sets

This task was identical to that used in Study 3.

Self-report questionnaires

Before the lab session, participants completed a set of questionnaires on an online platform. Participants’ mothers also completed a set of questionnaires. Many of these measures were taken as part of separate projects and are not reported here. In addition to the two questionnaires mentioned below, the participants also completed a self-report ADHD scale (Dupaul et al., 1998), the Day-Dreaming Frequency Scale (DDFS) of the Imaginal Processes Inventory (Singer and Antrobus, 1963), the Attention-Related Cognitive Errors Scale (ARCES; Cheyne et al., 2006), the Mind-wandering: Spontaneous and Deliberate scale (MW-S; MW-D; Carriere et al., 2013), the Big Five Inventory (John et al., 2008), a Generalized Anxiety Disorder questionnaire (GAD; Spitzer et al., 2006), the Kutcher Adolescent Depression Scale (KADS; Brooks & Kutcher, 2001), and the Ruminative-Reflection Questionnaire (RRQ; Trapnell & Campbell, 1999). Following pre-experimental baseline pupil measurement and before beginning the experimental tasks, participants answered the Stanford Sleepiness Scale (SSS; Hoddes et al., 1973), which measured participants’ subjective alertness at the start of the experiment. The SSS is a one-item, 7-point scale in which participants rate their degree of alertness from 1 (feeling active, vital, alert, or wide awake) to 7 (no longer fighting sleep, sleep onset soon, having dream-like thoughts). Thus, a score of 7 indicated very low alertness and a score of 1 indicated that the participant was alert. The SSS was used as the measure of alertness. At the end of the experiment, participants completed the Interest/Enjoyment, Effort/Importance, and the Pressure/Tension subscales of the Intrinsic Motivation Inventory (IMI; Ryan and Deci, 2000). The IMI is a measure of participants’ subjective experience in relation to a target activity and has been shown to relate to intrinsic motivation and self-regulation.
For each scale included in this study, participants rated the veracity of statements such as “I enjoyed doing this activity very much” on a 7-point Likert scale ranging from 1 (not at all) to 7 (very true). The average response to the Effort/Importance subscale to this questionnaire, after reverse-coding, was used as the measure of motivation.
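
A minimal sketch of averaging a subscale after reverse-coding is shown below. The function name and arguments are hypothetical, and which items are reverse-keyed is not stated in the text, so the caller must supply those positions. On a 1 to 7 scale, a reverse-coded response r becomes 8 − r.

```python
from typing import Iterable, Sequence


def subscale_mean(
    responses: Sequence[int],
    reverse_keyed: Iterable[int] = (),
    scale_max: int = 7,
) -> float:
    """Average item responses after reverse-coding selected items.

    responses: item ratings on a 1..scale_max Likert scale.
    reverse_keyed: zero-based positions of reverse-keyed items
                   (an assumption supplied by the caller).
    """
    if not responses:
        raise ValueError("no responses supplied")
    reverse = set(reverse_keyed)
    scored = []
    for index, rating in enumerate(responses):
        if not 1 <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside 1..{scale_max}")
        # Reverse-coding maps 1 -> scale_max, 2 -> scale_max - 1, and so on.
        scored.append(scale_max + 1 - rating if index in reverse else rating)
    return sum(scored) / len(scored)
```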

Thought probes

Thought probes were randomly presented on the Stroop (Unsworth and McMillan, 2014), AX-CPT, and symmetry span tasks (Mrazek et al., 2012). Following the recommendations of Stawarczyk et al. (2011), thought probes included both task-related and stimulus-dependent parameters. During the experimental tasks the following probe appeared: “What were you just thinking about? Use the mouse to select what you were thinking about on the previous trial.” The thought choices were: (a) the external environment outside of the task; (b) my current physical or emotional state; (c) future plans or past events; (d) the current task; (e) performance on the current task; (f) technology (Hollis and Was, 2016); (g) daydreams; (h) nothing, my mind was blank; (i) personal concerns or worries; and (j) other. Choice (d) was categorized as on-task thought. Choices (a), (b), (c), (e), (f), (g), (h), and (i) were summed to form a total off-task thought score for each task. At the start of each task, participants were given instructions/examples regarding the different thought choices (Meier, Smeekens, Silvia, Kwapil, and Kane, 2018). A researcher was present to answer any questions regarding the lapse types.
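
The probe categorization described above can be sketched as follows. The function name and the returned keys are hypothetical. The text does not say how choice (j) was handled, so this sketch assumes it counts toward the number of probes but toward neither the on-task nor the off-task total.

```python
from collections import Counter
from typing import Dict, Iterable

OFF_TASK_CHOICES = frozenset("abcefghi")  # choices summed as off-task thought
ON_TASK_CHOICE = "d"                      # the current task
VALID_CHOICES = frozenset("abcdefghij")


def score_thought_probes(choices: Iterable[str]) -> Dict[str, float]:
    """Summarize one participant's probe responses for one task."""
    counts = Counter(choices)
    n_probes = sum(counts.values())
    if n_probes == 0:
        raise ValueError("no probe responses supplied")
    unknown = set(counts) - VALID_CHOICES
    if unknown:
        raise ValueError(f"unrecognized choices: {sorted(unknown)}")
    off_task = sum(counts[c] for c in OFF_TASK_CHOICES)
    return {
        "on_task_count": counts[ON_TASK_CHOICE],
        "off_task_count": off_task,
        # Assumption: choice (j) stays in the denominator only.
        "off_task_proportion": off_task / n_probes,
    }
```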

Pupillometry

During the antisaccade, operation span, and rotation span tasks, participants’ pupil diameters were measured binocularly at 120 Hz by using the Tobii Pro Spectrum. This device includes a 24-inch monitor mounted with an eye tracking device that uses infrared illuminators to create reflection patterns on participants’ corneas. A six-point calibration procedure preceded all tasks in which pupils were measured. All participants were assessed in the same room, dimly lit at the same setting, and sat approximately 60 cm from the monitor with their heads unrestrained. Importantly, the Tobii Pro Spectrum is expressly designed to be tolerant of head movement. Furthermore, because it directly images the eye, this device may be less influenced by measurement accuracy issues that arise due to eye rotations (e.g., pupil foreshortening error; Hayes and Petrov, 2016). Missing pupil measures due to blinks or offscreen eye movements were removed.

Pre-experimental baseline

At the start of the test session prior to any cognitive testing, participants’ pupils were measured while participants stared at a row of black asterisks centered on a gray background for 2 min (Tsukahara et al., 2016). The mean over this window was used as the measure of baseline pupil diameter. The lighting level in the run room was 130 lux and the brightness from the monitor was 5 lux.

Results

Descriptive statistics for all measures are listed in Table 11, and Table 12 lists the zero-order correlations. In the first model, we specified a confirmatory factor analysis with the cognitive measures. Operation span, symmetry span, and rotation span were allowed to load onto a working memory factor; antisaccade, psychomotor vigilance, Stroop, and AX-CPT onto an attention control factor; and Raven, number series, and letter sets onto a fluid intelligence factor. The model fit the data well (χ2(32) = 55.78, p = 0.006, CFI = 0.97, TLI = 0.95, RMSEA = 0.05 [0.03, 0.08], SRMR = 0.05). Overall, the pattern of correlations among the cognitive abilities was similar to Study 3. Next, we examined the correlations among the cognitive abilities and baseline pupil diameter. We did so by adding the mean baseline pupil diameter as an exogenous variable to the model. Baseline pupil diameter did not correlate with working memory (r = −0.06 [−0.21, 0.08], p = 0.38), attention control (r = −0.11 [−0.28, 0.06], p = 0.20), or fluid intelligence (r = −0.06 [−0.22, 0.11], p = 0.51). Therefore, we did not find any evidence consistent with a positive correlation between cognitive abilities and baseline pupil diameter, as predicted by the resting state account of LC-NE function.

Table 11 Descriptive statistics for measures in Study 4
Table 12 Zero-order correlations among measures in Study 4

We next specified latent factors for tonic arousal regulation and phasic responsiveness by allowing CoV of pretrial pupil diameter from the antisaccade, operation span, and rotation span tasks to load onto one factor and the average TEPR from each of those tasks to load onto another factor. The cognitive abilities were specified in the same manner as described above. Although the model converged upon a solution, there was an extreme Heywood case, with the standardized loading for rotation span TEPR onto the phasic responsiveness factor above 1 (loading = 1.18, SE = 0.27). Careful inspection of the zero-order correlation matrix revealed the cause of this issue. There was a large correlation between the operation span and rotation span TEPRs (r = 0.58), but antisaccade TEPR did not correlate with either operation span TEPR or rotation span TEPR. We took several steps to try to resolve this issue. First, we specified the loading for the rotation span TEPR onto the phasic responsiveness factor to equal 1. Doing so resolved the Heywood case, but it resulted in a rather poor-fitting model (χ2(95) = 228.62, p < 0.001, CFI = 0.87, TLI = 0.83, RMSEA = 0.07 [0.06, 0.09], SRMR = 0.07). Next, we freed the correlation between the residual variances of operation span TEPR and rotation span TEPR. Doing so led to a better-fitting model (χ2(94) = 176.15, p < 0.001, CFI = 0.92, TLI = 0.90, RMSEA = 0.06 [0.05, 0.07], SRMR = 0.06). However, none of the TEPR variables significantly loaded onto the phasic responsiveness factor. Thus, the only solution that would allow us to specify a latent phasic responsiveness factor was to drop antisaccade TEPR from the model.
In this final model, operation span and rotation span TEPR loaded onto the phasic responsiveness factor (with their loadings constrained to be equal because there were only two manifest variables), pretrial pupil CoV from antisaccade, operation span, and rotation span loaded onto an arousal regulation factor, and the cognitive factors were specified as above. This model fit the data well (χ2(81) = 158.91, p < 0.001, CFI = 0.92, TLI = 0.90, RMSEA = 0.06 [0.05, 0.08], SRMR = 0.06). The factor loadings and interfactor correlations are listed in Table 13. Scatterplots of the interfactor correlations are shown in Fig. 10.
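
To make the Heywood case concrete: in a standardized solution, an indicator's residual variance equals one minus its squared loading, so the reported rotation span TEPR loading of 1.18 implies a negative, and therefore inadmissible, residual variance. The second line is a simplified illustration that treats the two span TEPRs in isolation with equal standardized loadings; this is an assumption, and the loadings estimated in the full model need not match it. The symbols are labels chosen here for illustration.

```latex
\begin{align*}
\theta_{\text{rotation}} &= 1 - \lambda_{\text{rotation}}^{2}
  = 1 - (1.18)^{2} = 1 - 1.3924 \approx -0.39 \\
r_{\text{operation},\,\text{rotation}} &= \lambda \cdot \lambda = \lambda^{2}
  \;\Rightarrow\; \lambda = \sqrt{0.58} \approx 0.76
\end{align*}
```

The equality constraint is needed because a factor with only two indicators supplies a single correlation, which cannot identify two separate loadings.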

Table 13 Factor loadings and interfactor correlations for latent variable model in Study 4
Fig. 10 Scatterplots of attention control, working memory capacity, and fluid intelligence with arousal regulation, phasic responsiveness, and baseline pupil diameter factor scores in Study 4

The pattern of interfactor correlations was consistent with Study 3. Specifically, the phasic responsiveness from the span tasks positively and significantly correlated with working memory, attention control, and fluid intelligence, whereas the arousal regulation factor did not significantly correlate with any of the cognitive abilities. In Studies 2 and 3, there was a significant correlation between attention control and arousal regulation (lower pretrial CoV = higher attention control). However, despite being in the same direction and close in magnitude, that correlation was not significant in Study 4.

As in Studies 2 and 3, we also examined whether individual differences in self-reported motivation and/or alertness correlated with and accounted for the relations between tonic arousal regulation, phasic responsiveness, and cognitive abilities. The SSS score was entered as the measure of alertness, and the Effort/Importance subscale score from the IMI was entered as the measure of motivation, both as exogenous variables. Motivation correlated with attention control (r = 0.18 [0.01, 0.35], p = 0.04), but it did not significantly correlate with working memory, fluid intelligence, arousal regulation, or phasic responsiveness (all |r|s < 0.13, all ps > 0.07). Alertness did not significantly correlate with any of the factors (all |r|s < 0.08, all ps > 0.10). A simultaneous regression model on attention control with motivation and phasic responsiveness indicated that motivation did not account for the relation between attention control and phasic responsiveness (motivation: B = 0.15 [−0.01, 0.31], p = 0.07; phasic responsiveness: B = 0.30 [0.11, 0.50], p < 0.001; see Footnote 7).

The final analysis examined correlations among the self-report scales, cognitive factors, tonic arousal regulation, and phasic responsiveness. A TUT factor was specified by allowing TUT proportions from the Stroop, symmetry span, and AX-CPT tasks to load onto a single factor. Average responses on each scale were entered as exogenous variables. Each of the scales showed reasonable reliability estimates (see Table 15). These analyses were largely exploratory. Correlations between working memory, attention control, fluid intelligence, arousal regulation, phasic responsiveness, and each self-report variable are listed in Table 14, and correlations among the self-report scales are listed in Table 15. Only a few correlations were significant. TUTs were associated with lower working memory and lower attention control, but not lower fluid intelligence. Further, TUTs were not associated with phasic responsiveness or arousal regulation, as they were in Study 1. Higher self-reported depression on the KADS was associated with slightly lower working memory capacity and poorer attention control. Greater anxiety on the GAD scale was associated with poorer attention control. Higher neuroticism also was associated with lower working memory capacity, poorer attention control, and lower fluid intelligence. Arousal regulation was not associated with any of the self-report scales, and the only significant correlation between phasic responsiveness and the self-report scales was a small negative correlation with depression.

Table 14 Correlations among cognitive abilities, arousal regulation, phasic responsiveness, and self-report scales
Table 15 Correlations among self-report scales in Study 4

Discussion

The results of Study 4 largely replicated Study 3. First, phasic responsiveness significantly and positively correlated with working memory capacity, attention control, and fluid intelligence, whereas arousal regulation did not correlate with any of the cognitive abilities. Second, as in Study 3, we did not observe any significant correlations between measures of resting/baseline pupil diameter and cognitive abilities. The correlations between baseline pupil diameter and the cognitive abilities were all small (|r|s ≤ 0.11). Third, as in Study 3, the relations between phasic responsiveness and cognitive abilities were not explained by factors like intrinsic motivation or alertness. The only difference between Studies 3 and 4 in the pattern of results was that Study 3 found a modest, yet significant correlation between tonic arousal regulation and attention control, and Study 4 did not. Another potentially important aspect of Study 4 was the lack of a correlation between the TEPR measure from the antisaccade task and the TEPR measures from the operation span and rotation span tasks. We will discuss this fact in more detail in the General Discussion.

General discussion

The current set of studies tested predictions made by two complementary hypotheses regarding individual differences in functioning of the LC-NE system and cognitive abilities, specifically working memory, attention control, and fluid intelligence. Unsworth and Robison (2017a) have proposed a theory arguing that there are two aspects of LC-NE functioning that could affect cognitive performance: tonic arousal regulation and phasic responsiveness. Tonic arousal regulation refers to the maintenance of a consistent and moderate level of tonic arousal during goal-directed cognitive activity, which is theoretically driven by slow and rhythmic tonic LC activity. Phasic responsiveness refers to the timely, appropriate, and quick delivery of NE by the LC in moments in which a computation must be performed by the cortical areas that implement attention control, working memory, and higher-order cognitive functions. In line with previous work (Aminihajibashi et al., 2020a, 2020b; Robison and Brewer, 2020, 2022; Robison and Unsworth, 2019; Unsworth and Robison, 2015, 2017b, 2018), intraindividual variability in pretrial pupil diameter was used as the measure of tonic arousal regulation, and average TEPRs were used as the measures of phasic responsiveness.

Tsukahara and Engle (2021a) also argue for an important role for the LC-NE system underlying individual differences in cognitive abilities, most notably fluid intelligence. Their account argues that at rest, there is greater functional connectivity between the LC-NE system and cortical networks that implement higher-order cognition for individuals higher in cognitive ability. As evidence for this account, they point to positive correlations between resting pupil size and fluid intelligence that they have observed in several studies (Tsukahara et al., 2016; Tsukahara and Engle, 2021b). Thus, the arousal regulation/phasic responsiveness account and the resting connectivity account make differential predictions regarding the relation between pupillary measures of LC-NE activity and cognitive ability. Unsworth and Robison (2017a) predict that there should be a positive correlation between tonic arousal regulation (i.e., lower intraindividual variability in pretrial pupil diameter) and cognitive ability and a positive correlation between phasic responsiveness (i.e., larger TEPRs) and cognitive ability. The account by Tsukahara and Engle (2021a) predicts a positive correlation between resting/baseline pupil size and cognitive abilities.

All four studies had measures of tonic arousal regulation and phasic responsiveness embedded into measures of working memory and attention control. In Study 1, the pupillary measures were taken during change-detection/visual arrays working memory tasks, and participants also completed complex span measures of working memory, during which pupil diameter was not recorded. Additionally, participants were asked to self-report their attentional state at various points throughout the visual array tasks. This allowed us to specify a factor representing a tendency to experience task-unrelated thoughts. Overall, the findings were consistent with the predictions made by Unsworth and Robison’s (2017a) LC-NE theory of individual differences. Phasic responsiveness was correlated with higher performance on the visual arrays tasks, higher performance on the complex span tasks, and fewer self-reports of attention lapses. Tonic arousal regulation was correlated with higher visual arrays performance and fewer attentional lapses, but not with complex span working memory.

In Study 2, pupil measures were taken during three attention tasks. Additionally, participants were asked to self-report their motivation and alertness during each task. This step was taken to rule out an alternative hypothesis that individual differences in phasic responsiveness simply indicate how motivated participants felt on each task, or how alert they felt that day. Indeed, this alternative hypothesis was ruled out. Despite modest and significant correlations among phasic responsiveness, motivation, and alertness, controlling for motivation and alertness did not eliminate the relationship between phasic responsiveness and attention control. Again, phasic responsiveness and arousal regulation were both significant correlates of individual differences in attention control, consistent with Unsworth and Robison’s (2017a) theorizing. In these data, phasic responsiveness had the strongest, and indeed only uniquely significant, effect on attention control.

In Study 3, pupillary dynamics were measured both at rest (i.e., before participants started completing any cognitive tests) and within four attention tasks. Importantly, participants completed the four tasks in addition to complex span measures of working memory and three measures of fluid intelligence on 2 days of administration, separated by an average of 7 days. This procedure allowed us to test several alternative explanations. The first alternative explanation is that the relationship between pupillary dynamics and cognitive ability is state-specific. If that is the case, there should be weaker correlations between measures of tonic arousal regulation and/or phasic responsiveness across different days of administration. The logic behind that prediction is similar to latent state-trait analyses of psychological variables (Steyer et al., 1999). That is, the variance in a psychological construct can be decomposed into at least four sources: task/measure-specific variance, state-specific variance, trait-level variance, and error. As an example, on a memory task, the individual differences in performance can be due to a person’s overall memory abilities (trait), the state of the individual that day (e.g., stress level, alertness level, motivation), task-specific strategies, and random error. When multiple measures of a construct are administered multiple times, latent-variable analysis can be applied to partition variance into these four components.
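
The four-part decomposition described above can be written compactly. The symbols below are illustrative labels chosen here rather than notation taken from Steyer et al. (1999), and the additive variance form assumes that the four components are mutually uncorrelated.

```latex
\begin{align*}
Y_{ijk} &= T_{i} + S_{ik} + M_{ij} + \varepsilon_{ijk} \\
\operatorname{Var}(Y) &= \operatorname{Var}(T) + \operatorname{Var}(S)
  + \operatorname{Var}(M) + \operatorname{Var}(\varepsilon)
\end{align*}
```

Here Y is the observed score of person i on task j at occasion k, T is the trait component, S is the occasion-specific state component, M is the task/measure-specific component, and ε is random error. Separating S from M requires the same tasks to be given on more than one occasion.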

Overall, our data were most consistent with a trait-level explanation for phasic responsiveness, because our model fit best with a single, trait-level factor comprising TEPRs from all four attention tasks. Because we administered different tasks on different days, we could not fit the ideal state-trait model; instead, state-specificity had to be determined via the shared versus unique variance in TEPRs measured on different days. The data suggested that TEPRs measured on separate days correlated just as strongly as those measured on the same day. However, there was evidence for state-specificity in tonic arousal regulation. A model that separately parsed variance by day of administration fit better than a model with all four pretrial pupil CoV measures loading onto a single factor. It is worth noting that the two session-specific factors correlated highly (r = 0.70), indicating that there is a high degree of trait stability in tonic arousal regulation. Finally, self-reported motivation and alertness correlated with all three cognitive abilities and with phasic responsiveness. However, accounting for these factors did not eliminate the relation between phasic responsiveness and cognitive ability, consistent with Study 2. We should note that while no specific alternative theory has been proposed arguing that state factors are wholly responsible for relations between phasic responsiveness and cognitive ability, we believed it was a plausible alternative, given the interrelations among cognitive performance, pupillary dynamics, and effort (Kahneman, 1973). It also is worth noting that there are other potential state drivers of TEPRs, such as task difficulty, stress level, and time of day, that were not manipulated here. In the Limitations section below, we describe a potentially better way of parsing individual differences data into task-specific, state-specific, and trait-level variation.

Taking the results into account, we came to the following conclusions in Study 3: 1) phasic responsiveness is a task- and situation-general feature of individuals that can account for significant interindividual variance in working memory, attention control, and fluid intelligence; 2) these relations are not simply driven by individual differences in alertness or intrinsic motivation; 3) tonic arousal regulation is affected by both state-specific factors and trait-level factors, but is only related to attention control, and rather weakly; and 4) we found no evidence for an association between resting pupil diameter and any of the cognitive abilities measured. Thus, there was partial evidence for Unsworth and Robison’s (2017a) theory regarding LC-NE functioning and individual differences in cognition. Specifically, only phasic responsiveness showed systematic and robust relations with working memory, attention control, and fluid intelligence. In general, there was no evidence in favor of a resting-state theory of LC-NE functioning (Tsukahara and Engle, 2021a).

In Study 4, pupillary dynamics were measured at rest and during one attention task and two complex span working memory tasks. Interestingly, the same pattern of relationships emerged. Specifically, there were no relations between tonic arousal regulation and working memory capacity, attention control, or fluid intelligence. However, there were significant positive relations among phasic responsiveness, working memory capacity, attention control, and fluid intelligence. None of these relations could be accounted for by individual differences in alertness, measured once at the beginning of the lab session, or intrinsic motivation, measured at the end of the lab session. Additionally, baseline pupil diameter did not correlate with any of the cognitive abilities. Therefore, we observed partial evidence in favor of Unsworth and Robison’s (2017a) theory of LC-NE functioning in that phasic responsiveness predicted individual differences in cognitive ability. Again, there was no evidence for the resting-state theory.

Collectively, the results of the four studies were consistent with a theory that one feature of individuals that correlates with relatively higher working memory, stronger attention control, and higher fluid intelligence is the ability to exert attentional effort toward the execution of task goals, which we call phasic responsiveness. According to the Adaptive Gain Theory (Aston-Jones and Cohen, 2005), one major function of the LC-NE system is the phasic (i.e., short and bursting) release of NE into the cortical regions that implement goal-relevant behaviors. In the present context, these behaviors were the selection and maintenance of goal-relevant representations in working memory (visual arrays tasks), the maintenance of goal-relevant information in the presence of distraction (complex span tasks), the inhibition of prepotent response tendencies (antisaccade, SART, AX-CPT), speeded responses to unpredictably onsetting stimuli (psychomotor vigilance), conflict resolution (Stroop, flanker), and complex, abstract reasoning (Raven, number series, letter sets). Across four different samples, we observed construct-level associations between the magnitude of TEPRs, which we believe index phasic NE release, and these cognitive abilities. Thus, the relative functioning of the LC-NE system in this phasic mode seems to be an important correlate of individual differences in three core cognitive abilities: working memory, attention, and abstract reasoning.

The present findings both limit and extend the theoretical claims made by Unsworth and Robison (2017a). First, the fact that phasic responsiveness consistently correlated with working memory capacity, attention control, and fluid intelligence both supports and extends the theory. This finding confirms their hypothesis that phasic responsiveness is one important way by which individual differences in working memory and attention control arise. Second, the findings extend that theory to show that phasic responsiveness also correlates with higher-order cognitive abilities like fluid intelligence. As a potential bound on the theory, we did not find consistent evidence for a correlation between tonic arousal regulation and cognitive abilities, as predicted by Unsworth and Robison (2017a). In Study 1, arousal regulation correlated with a factor formed by visual arrays tasks, but not with a factor formed by complex span tasks. In Study 2, both phasic responsiveness and arousal regulation significantly correlated with attention control, but the relationship between phasic responsiveness and attention control was much stronger than the relationship between arousal regulation and attention control. In Study 3, arousal regulation correlated with attention control, but not with working memory or fluid intelligence. Finally in Study 4, arousal regulation did not correlate with any of the cognitive abilities. Therefore, there was only partial support for an association between arousal regulation and cognitive abilities.

Overall, the present findings cast doubt on a resting-state, functional connectivity account of individual differences in LC-NE functioning (Tsukahara & Engle, 2021a). There was no evidence in Studies 3 or 4 for a correlation between resting/baseline pupil diameter and any of the cognitive abilities, as predicted by the resting-state account. In our view, there are at least three interpretations for this null result. First, if indeed resting pupil size is a valid measure of the resting functional connectivity between the LC and higher-order cortical networks, one interpretation of our null result is that resting functional connectivity between the LC and cortex is not a strong correlate of cognitive ability. A second interpretation is that resting functional connectivity between the LC and cortex is indeed an important correlate of cognitive ability, but resting pupil size is not a valid measure of it. A third possibility is that we measured resting pupil size in suboptimal conditions, creating a range restriction issue. We discuss this in more detail in the Limitations section below.

We propose that task-embedded measures of pupillary dynamics carry more meaningful interindividual variance than resting measures of pupil size. Across multiple independent studies, we and others have observed correlations between phasic pupillary responsiveness and individual differences in working memory capacity and attention control (Hood et al., 2022; Hutchison et al., 2020; Unsworth and Robison, 2015, 2017b, 2018; Robison and Unsworth, 2019; Unsworth et al., 2023) and long-term memory (Robison, Trost et al., 2022b). However, there is limited evidence for a correlation between resting pupil size and working memory (Unsworth et al., 2021a). Outside Tsukahara et al.’s two published studies (2016, 2021b), no other groups have found a correlation between resting pupil size and fluid intelligence (current studies; Robison, Coyne et al., 2022a; Aminihajibashi et al., 2020a; Coors et al., 2022; Robison and Brewer, 2022).

Limitations

There were a few weaknesses and limitations in the studies that are worth discussing. In Study 3, we administered four tasks on two days of administration. However, we did not administer the same tasks on different days of administration, which perhaps would have been a more direct comparison. Our design was such that one measure of conflict resolution (Stroop, flanker) would be paired with one measure of sustained attention (psychomotor vigilance, SART), and that all participants would complete the tasks in the same order. However, in future work, it could be helpful to either 1) randomize the order of tasks across sessions, or 2) deliver precisely the same tasks on each day to assess trait-stability vs. state-specificity (i.e., a latent state-trait design; Steyer et al., 1999).

A second limitation was that the pupillary dynamics were always measured during either attention tasks (Study 2, Study 3) or a set of similar working memory tasks (Study 1). Furthermore, in Study 4, TEPRs from antisaccade did not correlate with the TEPRs from operation span and rotation span. In all our prior work, pupillary measures have been taken during either attention tasks (Robison and Brewer, 2022; Unsworth and Robison, 2017b) or during working memory tasks (Robison and Brewer, 2020; Robison and Unsworth, 2019; Unsworth and Robison, 2015, 2018), but never both attention and working memory tasks in the same participants. The fact that the antisaccade TEPRs did not correlate with operation span and rotation span TEPRs in Study 4 raises the possibility that pupillary dynamics in these two different types of tasks tap into substantially different individual differences. Indeed, the TEPRs from antisaccade correlated with antisaccade performance (r = 0.20) but little else. Two recent studies by Aminihajibashi et al. (2020a, 2020b) demonstrate some similar effects and are worth discussing here. In the first, pupillary responses were recorded during a multiple-object tracking task (MOT). The magnitude of pupillary dilations during the tracking period scaled with number of objects, and the magnitude of the pupillary responses (i.e., phasic responsiveness) correlated positively with performance on the MOT, particularly at the largest set size (5 objects). But interestingly, phasic responsiveness from the MOT did not correlate with either working memory capacity or fluid intelligence. Aminihajibashi et al. (2020a) attribute this finding to multiple resource theory, which argues that there are multiple pools of resources which are used to accomplish different types of tasks. In another study, Aminihajibashi et al. (2020b) found that orienting-related pupillary responses correlated with behavioral performance on a Posner cueing task.
However, these pupillary responses were uncorrelated with measures of working memory capacity and general cognitive ability (g). That would be consistent with our finding that phasic responsiveness correlated well among measures of a similar construct, but did not correlate well when measured across tasks of different constructs (Study 4). Our interpretation, given the similarly in latent correlations across studies between the phasic responsiveness factor with working memory, attention control, and fluid intelligence, is that phasic responsiveness is a task-general individual difference that is important for variation in cognitive ability. However, it is possible that phasic responsiveness, as measured during complex span tasks, is distinct from and perhaps uncorrelated with phasic responsiveness measured during attention tasks. This is another future direction for our ongoing work in this area.

A third limitation concerns our failure to replicate Tsukahara et al. (2016, 2021) and find evidence for a correlation between resting pupil size and fluid intelligence. In their most recent report on this relation, Tsukahara and Engle (2021b) highlighted the importance of background luminance when measuring resting pupil size. Specifically, in one study, they found that the correlation between pupil size and cognitive abilities (working memory, attention control, and fluid intelligence, measured similarly to the current studies) was much stronger when pupil diameter was measured against a gray background screen rather than a white background screen. In a second study, Tsukahara and Engle manipulated background screen color (white vs. black), monitor brightness (bright vs. dim), and room lighting (lights on vs. lights off) within participants. In general, they found that the correlations between pupil size and cognitive ability strengthened in conditions that produced maximal pupil size and maximal interindividual differences (i.e., dark screens, lights off, and dim monitors). In Studies 3 and 4, baseline pupil diameter was measured in dim lighting conditions against a gray background screen. In Study 3, the average baseline pupil diameter was 4.48 mm (SD = 0.90). This amount of interindividual variability was similar to that observed under the optimal lighting conditions in Tsukahara and Engle (2021b). However, in Study 4, average baseline pupil diameter was lower overall (M = 3.66) and had less interindividual variability (SD = 0.54). Therefore, Study 4 may have suffered from a range restriction issue. When the current studies were designed, Tsukahara and Engle (2021b) had not published that finding, so we did not carefully consider background luminance. However, in a recent study, we found a near-zero correlation between pupil size and cognitive ability, regardless of the background screen against which pupil diameter was measured (Robison, Coyne et al., 2022a). Furthermore, in a recently completed study, we attempted to replicate Tsukahara and Engle more directly, and we found null correlations between fluid intelligence and pupil size in all lighting conditions (lights on vs. off and black vs. gray vs. white background screens; Robison and Campbell, 2022).
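
To make the range restriction concern concrete, the following minimal Python sketch applies the standard formula for direct range restriction (Thorndike's Case 2) to the two baseline standard deviations reported above (0.90 and 0.54). It is an illustration under stated assumptions, not an analysis from the present studies: the larger SD is treated as if it were the unrestricted value, the correlation of 0.30 is a hypothetical placeholder, and the function name is hypothetical.

```python
import math


def rescale_correlation(r, sd_ratio):
    """Rescale a correlation for a change in predictor variability.

    Thorndike's Case 2 formula for direct range restriction.
    sd_ratio is (target SD) / (SD in the sample where r was observed).
    A ratio below 1 shows the attenuation expected in a restricted sample;
    a ratio above 1 gives the usual correction back to the wider range.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    return (r * sd_ratio) / math.sqrt(1.0 - r**2 + (r**2) * (sd_ratio**2))


# Hypothetical correlation in a sample with the wider spread (SD = 0.90).
assumed_r = 0.30
# Ratio of the narrower SD (0.54) to the wider SD (0.90), both from the text.
ratio = 0.54 / 0.90

attenuated = rescale_correlation(assumed_r, ratio)
print(round(attenuated, 3))
```

Under these assumptions, a correlation of 0.30 would shrink to roughly 0.19 in the less variable sample. Applying the same function with the reciprocal ratio recovers the original value, which is how the correction is typically used in practice.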

Another potential limitation is that we used relatively wide windows for the computation of TEPRs (e.g., up to 4,000 ms in Study 1). Although we use these TEPRs as our indices of phasic bursting of the LC, such bursts may occur over shorter time intervals (<300 ms), according to Vazey et al. (2018). Thus, our TEPRs may be capturing multiple phasic LC bursts (e.g., in the PVT and antisaccade tasks) or something different altogether. Furthermore, the effects of a phasic LC burst on the pupil diameter of trial n may occasionally leak over into the pretrial measurement of trial n+1. We separated trials with blank intertrial intervals and fixation periods to limit such carryover. Future work will need to investigate these effects further.
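
The dependence of a TEPR on its measurement window can be sketched in a few lines of Python. This is a generic illustration of baseline-corrected pupil scoring and should not be read as the exact scoring procedure of the present studies: the function name, the choice of peak and mean as summary statistics, and the toy trace are all assumptions made for the example.

```python
from statistics import mean


def baseline_corrected_tepr(trace_mm, sample_rate_hz, baseline_ms, window_ms):
    """Return (peak, mean) dilation relative to a pretrial baseline.

    trace_mm: pupil diameter samples for one trial, beginning with the
              pretrial baseline period and continuing after stimulus onset.
    baseline_ms: duration of the pretrial baseline period.
    window_ms: duration of the post-onset window used to score the TEPR.
    """
    n_base = int(round(baseline_ms * sample_rate_hz / 1000))
    n_win = int(round(window_ms * sample_rate_hz / 1000))
    if n_base <= 0 or n_win <= 0:
        raise ValueError("baseline and window must each span at least one sample")
    if len(trace_mm) < n_base + n_win:
        raise ValueError("trace is shorter than baseline plus window")

    baseline = mean(trace_mm[:n_base])
    # Subtract the pretrial baseline from every sample in the scoring window.
    corrected = [s - baseline for s in trace_mm[n_base:n_base + n_win]]
    return max(corrected), mean(corrected)


# Toy trial sampled at 10 Hz: 400 ms of flat baseline, then a brief dilation.
toy_trace = [4.0, 4.0, 4.0, 4.0, 4.0, 4.2, 4.4, 4.2, 4.0, 4.0, 4.0, 4.0]

narrow = baseline_corrected_tepr(toy_trace, 10, baseline_ms=400, window_ms=400)
wide = baseline_corrected_tepr(toy_trace, 10, baseline_ms=400, window_ms=800)
print(narrow, wide)
```

In this toy trace the peak dilation is the same for both windows, but the mean dilation is halved when the window is doubled, because the wider window averages in samples after the response has ended. A wide window can likewise pool several brief responses into a single score, which is the ambiguity described above.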

A final limitation of the current study, and a major assumption upon which Unsworth and Robison’s (2017a) theory rests, is that pupil diameter is an indirect yet valid measure of LC-NE activity in both its tonic and phasic modes. However, the pupil is a multiply determined signal. Its primary role, of course, is to adaptively regulate the amount of light that is allowed through the eye and onto the retina. Clearly the pupil is also sensitive to the outlay of cognitive effort, whether for a quick and accurate response to a stimulus or for the maintenance of memoranda over several seconds. Our proposition is that the driver of this phenomenon is a phasic release of NE into cortex, which, by its projections into the structures that control pupillary dilation, has an indirect effect on pupil diameter. The cortical networks that receive input from LC-NE neurons also receive dense inputs from dopaminergic, serotoninergic, and cholinergic neuromodulatory systems. Therefore, it is plausible that individual differences in arousal regulation and phasic responsiveness are caused by a different system, or perhaps a confluence of systems. Given their shared role in the sorts of computations required by working memory and attention tasks, the relative roles of these neuromodulatory systems are hard to parse. Whatever is driving individual differences in the magnitude of abrupt, phasic dilations of the pupil amid cognitive activity is reliable. Determining precisely what structural or functional differences in neural physiology drive individual differences in arousal regulation and phasic responsiveness will be an important area for future research. In the next section, we highlight a few promising avenues.

Future directions

Two directions will be important to pursue in future work. The first is to assess the degree to which measures of phasic responsiveness correlate across tasks that measure distinct cognitive abilities (e.g., attention control, working memory, long-term memory). Although previous work has shown that phasic responsiveness correlates with all these abilities, studies have rarely included pupillary measures embedded in tasks that measure different abilities. Furthermore, in Study 4, the TEPRs from the antisaccade task and from the complex span tasks did not correlate, suggesting there may be domain specificity to phasic responsiveness.

The second future direction will be to tether pupillary measures of LC-NE functioning to structural and/or functional differences in the physiology of the LC-NE system. For example, using neuromelanin-sensitive magnetic resonance imaging, it is possible to characterize the integrity of the LC based on the intensity of the contrast. LC signal strength measured this way has been connected to “cognitive reserve” (Clewett et al., 2016) and memory performance (Dahl et al., 2022) in older adults. Decidedly less research has used this technique as a means of characterizing individual differences in healthy young adults. A next step for this line of research would be to investigate whether measures of LC integrity correlate with arousal regulation and phasic responsiveness in healthy young adults, healthy older adults, and older adults with neurological impairments. An even further step would be to temporally connect LC responsiveness, measured via the BOLD response from LC neurons, to phasic responsiveness on a trial-to-trial or person-to-person basis (Murphy et al., 2014). These types of multimodal investigations will allow for more rigorous testing of theories connecting LC-NE structure and function to cognition.

Conclusions

Across four studies on different populations (adolescents and young adults) comprising more than 1,000 participants, we observed evidence for an association between phasic responsiveness, as measured by the magnitude of task-evoked pupillary responses, and individual differences in working memory capacity, attention control, and fluid intelligence. We interpret these findings in light of the theory that evoked pupillary responses are indicative of phasic responsiveness of the LC-NE neuromodulatory system. The LC-NE system projects into the cortical networks that implement working memory, attention, and other higher-order reasoning processes. Individuals for whom this system functions more optimally tend to have higher working memory capacity, stronger attention control, and higher fluid intelligence.