The perception of luminance (i.e., brightness) is primarily driven by the intensity of the light source (Jameson & Hurvich, 1961) and its surrounding spatiotemporal contexts (Eagleman et al., 2004). Beyond these lower-level sensory factors, various cognitive processes, such as the conceptual associations related to affect, may also exert top-down influences on brightness perception (Meier et al., 2007). For example, positive stimuli appear brighter than negative stimuli (Meier et al., 2007), and neutral objects look brighter when positive thoughts are evoked (Banerjee et al., 2012; Bhattacharya & Lindsen, 2016; Meier et al., 2007; Meier et al., 2015). Although these associations are reliable (Meier et al., 2015) and are in line with our everyday experiences (e.g., “a brighter smile”), whether these effects reflect a change in perceived brightness or a shift in response bias remains a long-standing debate (Firestone & Scholl, 2016).

This uncertainty is largely due to the conceptual difficulty in assessing behavioral top-down effects on brightness perception (see a detailed discussion in Firestone & Scholl, 2016, and a visual demonstration in Fig. S1). Consider the case in which observers are asked to match the achromatic font color of a word with gray-scale color patches (Meier et al., 2007). If the word’s meaning modulates its perceived brightness, it should also affect the perceived brightness of the gray patches used for perceptual matching. These two effects would cancel each other out (Firestone, 2013), and, as a result, the perceived brightness of the word measured by the matching gray patch should remain constant regardless of the word’s meaning. Behavioral observations of “brighter” perception of positive words are, therefore, more likely to reflect a response bias than a change in brightness discriminability (Firestone & Scholl, 2014, 2015, 2016). This assertion is consistent with the modularity principle of the mind (Fodor, 1983), which often assumes that cognition does not penetrate into early perception (Firestone & Scholl, 2016; Pylyshyn, 1999).

However, an emerging literature suggests that top-down factors can modulate brightness perception manifested as early as changes in pupillary light reflex (PLR). For example, brightness concepts presented as images (Naber & Nakayama, 2013) or words (Mathôt et al., 2017), and brightness percepts mediated by attention (Mathôt et al., 2015), illusion (Laeng & Endestad, 2012), or consciousness (Sperandio et al., 2018) can all modulate PLR. Here, the rapid phasic response of PLR is considered an outcome measure of brightness perception, which is different from steady-state pupil size measures (e.g., Chung & Pease, 1999). As PLR is considered free from voluntary control (Ebitz & Moore, 2017; Mathôt et al., 2017; Sperandio et al., 2018), these findings suggest some top-down influences on brightness perception. However, what remains unknown is whether indirect conceptual association between affect and luminance is sufficient to introduce changes in early perceptual processes, as effects related to conceptual associations (e.g., metaphors) are often assumed to operate at the postperceptual level (Lakoff, 2014; Xie & Zhang, 2014). Furthermore, it is also unclear whether these PLR effects are associated with the behavioral tendency of more “brighter” responses under positive affect—a key step forward to link perceptual changes with altered behavioral patterns (Sperandio et al., 2018).

To fill these gaps, we asked participants to evaluate the emotional valence of a word and then, after a brief blank interval, judge the luminance of a gray color square (i.e., the luminance probe) while we simultaneously recorded their pupil sizes (Fig. 1A). Although participants were told and saw in practice that there were two possible probes with a subtle difference in luminance, we presented only one probe at a constant luminance level during the experiment (Meier et al., 2007). This iso-luminant probe minimizes the impact of selective attention on pupil measures, as compared with simultaneous presentation of mixed information (Binda et al., 2013; Leong et al., 2019). This design also addresses whether the indirect conceptual association between affect and brightness has an impact on subsequent perceptual processes, which is a critical test for early perceptual modulation based on conceptual associations (Lakoff, 2014). Typically, participants would report seeing the probe as “brighter” more often after they have evaluated the valence of a positive word (Meier et al., 2007). If this effect is driven by response bias, PLR should be constant, as probe luminance remains the same throughout the experiment. However, if this effect is related to a change in perceived brightness, similar to the previous studies that link PLR with conscious perception (Hakerem & Sutton, 1966; Sperandio et al., 2018), PLR should be greater when observers report seeing the iso-luminant probe as “brighter,” especially after an observer correctly evaluates the valence of a positive concept.

Fig. 1
Task and behavioral findings. A An example trial. B Participants more frequently reported seeing an iso-luminance probe as “brighter” following correct valence evaluation of a positive word. C The behavioral effect across valence conditions in each participant. ***p < .001

Method

Participants

Forty-eight volunteers (19.90 ± 0.20 [Mean ± SEM] years old, 35 female) participated in the study for course credits. The first 28 participants were recruited for a 1-hour experiment, and the remaining 20 participants were recruited for a 2-hour experiment with increased trial counts. These data were combined and analyzed with linear mixed-effect models that factored in the different trial numbers across subjects (see Statistical Analysis). All participants reported normal or corrected-to-normal visual acuity and normal color vision. Data from 3 additional subjects were excluded because we could not obtain meaningful pupil data from them due to eye-tracking issues with contact lenses or eyeglasses. Informed consent was obtained at the beginning of the experiment. Based on G*Power (Faul et al., 2009), the final sample size (n = 48) would have sufficient statistical power (80%) to detect a medium-sized subject-level effect (e.g., Cohen’s d ≥ 0.36) at a significance level of .05. This study was not preregistered. Data are available online (https://osf.io/2vuaz/).

Apparatus and stimuli

The experiment was conducted in a moderately lit windowless room (~500 lx). Visual stimuli were presented on a 60-Hz LCD monitor (calibrated with an X-Rite I1Pro spectrophotometer) with a grey background (42 cd/m²) at a viewing distance of 80 cm. Eye movements and pupil size (linearly associated with pupil diameter in arbitrary units) were recorded with an EyeLink 1000 eye tracker (SR Research Ltd., Ontario, Canada) at a sampling rate of 500 Hz after calibration (see Xie et al., 2022, for details). Fifty words with positive meaning and 50 words with negative meaning were adopted from previous studies (Table S1; Meier et al., 2004; Meier et al., 2007). While word frequency was not controlled in the original study (Meier et al., 2007), word frequency did not interact with the effect of semantic valence on behavioral brightness judgments (also see Meier et al., 2004; Meier et al., 2007, for details). To evaluate how word frequency may influence the pupillary responses to the iso-luminant probe, we extracted word frequency data from the Google Web Trillion Word Corpus (https://norvig.com/ngrams/) and correlated word frequency ranks with PLR effects at the word level, as detailed in later sections. All word stimuli were presented in 36-point Helvetica type at the center of the screen, with a visual angle of 1.8° in height and 1.4° to 3.2° in width. A 3° × 3° grey square (49 cd/m²) was presented at the center of the screen as the luminance probe.
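As a rough illustration of the stimulus geometry, the standard visual-angle relation, size = 2 × distance × tan(angle / 2), converts the 3° probe at the 80-cm viewing distance into an on-screen size. The function name below is hypothetical and not part of the original methods.

```python
import math

def visual_angle_to_size_cm(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

# Side length of the 3-degree luminance probe viewed from 80 cm.
probe_side_cm = visual_angle_to_size_cm(3.0, 80.0)
print(f"probe side: {probe_side_cm:.2f} cm")
```

Under these values the probe would measure a little over 4 cm on each side.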

Procedure

Each trial started with a fixation point at the center of the screen for 800 ms, followed by a 400-ms presentation of a word, randomly chosen from the positive or negative word sets (Fig. 1A). Participants were asked to judge the word’s valence using two buttons on a gamepad. After a jittered blank interval between 900 ms and 1,100 ms (averaged at 1,000 ms), a luminance probe appeared at the center of the screen for 400 ms, followed by a 3,000-ms blank interval. Participants reported whether the probe was the “brighter” or the “darker” one using two gamepad buttons that were different from the buttons for the valence judgment task. While the luminance probe was held constant at 49 cd/m² during the experiment, participants were told about, and actually saw, probes with a subtle difference in luminance during practice (48 cd/m² vs. 50 cd/m²). They completed 15 or 30 practice trials and took a break of a few minutes for eye-tracking recalibration before the actual experiment. Ultimately, each word was presented once or twice over the course of the experiment in random order, yielding a total of 100 (n = 28) or 200 (n = 20) trials per subject. During the experiment, participants were encouraged to suppress eye blinks and eye movements (see Fig. S3) and to prioritize accuracy over speed. Reaction time was thus not the focus of the current experiment (see Fig. S4).

Fig. 2
PLR effects. A Participants showed greater pupil constriction when they judged the iso-luminant probe as “brighter” following correct evaluation of a positive (right) relative to a negative word (left). B We quantified PLR magnitude as the windowed normalized pupil size during the maximum PLR period from 500 to 1,000 ms following probe onset (Ebitz & Moore, 2017). C Subject-level analysis of the average PLR magnitude across trials shows that PLR effects linking brightness judgment and perception are stronger following participants’ correct valence evaluation of a positive word, as compared with a negative word. D Similarly, word-level analysis of the PLR effects averaged for the same word across subjects also suggests that words with more positive meanings are associated with greater PLR effects. Each dot represents data for one of the 100 words. The line represents the linear fit of the data. Example words are shown in the figure. Error bars or areas represent standard errors. **p < .01. (Color figure online)

Preprocessing and analysis of pupil data

Pupil data were prepared and analyzed using established methods (Granholm et al., 1996; Siegle et al., 2004). Briefly, 2,500-ms epochs time-locked to the onset of the probe were extracted from trials with correct responses for word valence judgment. Artifacts, including blinks, were identified and interpolated using an established algorithm (Granholm et al., 1996; Siegle et al., 2004). Trials with more than 50% interpolated area or with a <100-ms response time were excluded from analysis (Granholm et al., 1996; Siegle et al., 2004; Xie et al., 2022). In the end, 6.75% of trials were rejected across participants. We then applied baseline correction by dividing the pupil data by the average pupil size within 200 ms before the onset of the luminance probe (Cohen, 2014). This percentage of change normalizes individual differences in baseline pupil size and effectively captures pupil changes time-locked to the probe presentation (Ebitz & Moore, 2017), given that PLR is only weakly correlated with baseline pupil size (Ebitz et al., 2014). The resulting PLR measures are therefore not confounded by variable pupil sizes triggered by different words on the screen before probe onset, although word valence does not have a significant impact on participants’ pupil size measures (in line with Mathôt et al., 2017; also see Fig. S5 and Discussion).
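The divisive baseline correction described above can be sketched as follows. This is a minimal illustration rather than the authors’ pipeline: the function name, the toy trace, and the choice to express the result as a proportion of baseline (1.0 = baseline level) are assumptions; only the 200-ms baseline window and the 500-Hz sampling rate come from the text.

```python
from statistics import fmean

def baseline_normalize(trace, probe_index, sample_rate_hz=500, baseline_ms=200):
    """Divide a pupil trace by its mean over the pre-probe baseline window."""
    n_baseline = int(round(baseline_ms * sample_rate_hz / 1000))
    if probe_index < n_baseline:
        raise ValueError("not enough samples before probe onset")
    baseline = fmean(trace[probe_index - n_baseline:probe_index])
    return [sample / baseline for sample in trace]

# Toy trace (arbitrary units): 100 pre-probe samples, then a constriction.
toy_trace = [1000.0] * 100 + [900.0] * 50
normalized = baseline_normalize(toy_trace, probe_index=100)
print(normalized[0], normalized[-1])
```

Because the correction is divisive, a value below 1.0 after probe onset indicates constriction relative to the pre-probe baseline, independent of the participant’s absolute pupil size.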

Several measures of PLR were extracted (Ebitz & Moore, 2017). First, we measured the windowed pupil size from 500 to 1,000 ms after probe presentation, which captures the maximum PLR period across participants (Fig. 2A–B). This windowed pupil size measure is referred to as PLR magnitude (Ebitz & Moore, 2017). Second, we also measured the local minimum of the pupil size during the 500–1,000-ms time window following probe onset as PLR amplitude (Bitsios et al., 2004; Ebitz & Moore, 2017). The two measures are highly correlated and produce comparable experimental outcomes. Furthermore, the size of the measurement time window for PLR magnitude and amplitude does not significantly change the experimental outcome, as long as this window captures the PLR peak (e.g., changing from 500–1,000 ms to 600–900 ms).
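The two PLR measures can be illustrated with a short sketch. It assumes a baseline-normalized trace sampled at 500 Hz, takes PLR magnitude as the mean over the window, and takes PLR amplitude as the minimum within the same window; the function name and the toy data are hypothetical.

```python
from statistics import fmean

def plr_measures(normalized, probe_index, sample_rate_hz=500, window_ms=(500, 1000)):
    """Return (magnitude, amplitude) within a post-probe window.

    magnitude: mean normalized pupil size in the window
    amplitude: minimum normalized pupil size in the window
    """
    start = probe_index + int(round(window_ms[0] * sample_rate_hz / 1000))
    stop = probe_index + int(round(window_ms[1] * sample_rate_hz / 1000))
    window = normalized[start:stop]
    if not window:
        raise ValueError("trace does not cover the requested window")
    return fmean(window), min(window)

# Toy normalized trace with probe onset at sample 0 (2 ms per sample).
toy = [1.0] * 250 + [0.9] * 125 + [0.8] * 125 + [0.95] * 250
magnitude, amplitude = plr_measures(toy, probe_index=0)
print(round(magnitude, 3), amplitude)
```

Passing `window_ms=(600, 900)` would give the narrower window mentioned above. In this representation, smaller values correspond to stronger constriction.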

Semantic projection for word-level analysis

To investigate word-level effects, we examined how the semantic meaning of word valence is related to PLR effects between different response types across participants. First, we projected the high-dimensional semantic embeddings of each word based on GloVe (global vectors for word representation; Pennington et al., 2014) onto a more interpretable valence scale (Fig. S2A) using a previously established method (Grand et al., 2022). Specifically, we defined a feature subspace using the average vector of six distinct words that characterize each of the opposite ends of the valence space (positive: “pleasant”; “good”; “favorable”; “happy”; “great”; and “cheerful”; negative: “unpleasant”; “bad”; “unwanted”; “sad”; “awful”; and “upset”).

$$\begin{aligned}\overrightarrow{\text{Positive}} &= \frac{\overrightarrow{\text{pleasant}} + \overrightarrow{\text{good}} + \overrightarrow{\text{favorable}} + \overrightarrow{\text{happy}} + \overrightarrow{\text{great}} + \overrightarrow{\text{cheerful}}}{6} \\ \overrightarrow{\text{Negative}} &= \frac{\overrightarrow{\text{unpleasant}} + \overrightarrow{\text{bad}} + \overrightarrow{\text{unwanted}} + \overrightarrow{\text{sad}} + \overrightarrow{\text{awful}} + \overrightarrow{\text{upset}}}{6}\end{aligned}$$

According to the prior study (Grand et al., 2022), this averaging procedure ensures robust approximations of feature subspaces that are not strongly affected by the particular choice of antonyms. Next, we took the difference between the positive and negative ends of the valence scale, such that a positive value indicates a more positive meaning and a negative value indicates a more negative meaning:

$$\overrightarrow{\text{Valence}} = \overrightarrow{\text{Positive}} - \overrightarrow{\text{Negative}}$$

We then projected each word’s vector onto this valence scale via a dot product. The resulting value therefore captures the semantic valence of each word as used in everyday language:

$$\text{word valence projection} = \overrightarrow{\text{Valence}} \cdot \overrightarrow{\text{Word}}$$
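The projection steps above can be sketched in a few lines. The three-dimensional vectors and the two anchors per pole are toy values chosen for readability; the actual analysis relies on GloVe embeddings and six anchor words per pole. The function names are hypothetical, and any vector normalization that may be part of the published method is not represented here.

```python
def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def valence_projection(word_vec, positive_anchors, negative_anchors):
    """Dot product of a word vector with the (positive - negative) axis."""
    pos = mean_vector(positive_anchors)
    neg = mean_vector(negative_anchors)
    valence_axis = [p - q for p, q in zip(pos, neg)]
    return sum(a * w for a, w in zip(valence_axis, word_vec))

# Toy embeddings (assumed values, not real GloVe vectors).
positive_anchors = [[1.0, 0.0, 0.5], [0.8, 0.2, 0.5]]
negative_anchors = [[-1.0, 0.0, 0.5], [-0.8, 0.2, 0.5]]
toy_words = {"joy": [0.5, 0.3, 0.1], "grief": [-0.5, 0.3, 0.1]}

scores = {w: valence_projection(v, positive_anchors, negative_anchors)
          for w, v in toy_words.items()}
print(scores)
```

In this toy space the valence axis lies along the first dimension, so the word with a positive first component receives a positive score and its mirror image receives the negated score.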

Afterwards, we correlated word valence projection scores with the average difference scores in PLR magnitude (500–1,000 ms after probe onset) when a probe was judged as being “brighter” versus “darker” (i.e., “brighter” − “darker”) following the correct valence evaluation of each presented word across participants. On average, the PLR effect estimated for each word comes from data averaged across 31 ± 0.35 participants per word for the “brighter” response and 23 ± 0.33 participants per word for the “darker” response.
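A word-level difference score of this kind could be computed along the following lines. The sketch assumes that each record holds one participant’s PLR magnitude for one word and one response type, and that words lacking either response type are skipped; the record layout, the function name, and the toy values are hypothetical, and the sign convention of the PLR magnitude is left to the caller.

```python
from collections import defaultdict
from statistics import fmean

def word_level_plr_effect(records):
    """Per-word mean PLR for "brighter" minus mean PLR for "darker".

    records: iterable of (word, response, plr_magnitude) tuples.
    """
    by_word = defaultdict(lambda: {"brighter": [], "darker": []})
    for word, response, plr in records:
        by_word[word][response].append(plr)
    effects = {}
    for word, groups in by_word.items():
        # A difference needs at least one value for each response type.
        if groups["brighter"] and groups["darker"]:
            effects[word] = fmean(groups["brighter"]) - fmean(groups["darker"])
    return effects

toy_records = [
    ("joy", "brighter", 0.90), ("joy", "brighter", 0.88), ("joy", "darker", 0.93),
    ("grief", "brighter", 0.91), ("grief", "darker", 0.91),
    ("calm", "brighter", 0.90),  # no "darker" value, so the word is skipped
]
effects = word_level_plr_effect(toy_records)
print(effects)
```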

Statistical analysis

In addition to within-subject statistical procedures (e.g., paired-sample t test), we applied linear mixed-effect modeling to trial-level pupil data to take into account trial-level variations in subject-level inference. Wald’s Z test was used to examine the statistical significance of mixed-effect model parameters. Furthermore, to examine word-level effects (Xie et al., 2020), we applied Spearman rank-order correlation to evaluate the relationship between word valence projection and the PLR effect between different response types across subjects. Additionally, we performed a sensitivity analysis by resampling the data with different sample sizes (from 5 to 47) to examine how the number of subjects would affect this correlation estimate. This resampling analysis suggests that the current sample size allows us to obtain a reliable estimate of this correlation (see Fig. S2 for details).
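The word-level correlation and the resampling check can be sketched as follows. Spearman’s ρ is computed as the Pearson correlation of average ranks. The subsampling scheme (drawing subjects without replacement and re-averaging the per-word effects) is an assumption about how such a sensitivity analysis might be run, and all names and toy values are hypothetical.

```python
import random
from statistics import fmean

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def subsample_rho(per_subject, valence, n_subjects, rng):
    """Spearman rho after re-averaging word effects over a random subject subset.

    per_subject: list of dicts mapping word -> PLR effect for one subject.
    valence: dict mapping word -> valence projection score.
    """
    chosen = rng.sample(per_subject, n_subjects)
    words = [w for w in valence if any(w in s for s in chosen)]
    mean_effect = [fmean([s[w] for s in chosen if w in s]) for w in words]
    return spearman([valence[w] for w in words], mean_effect)

# Toy data (assumed values): three subjects, three words.
per_subject = [
    {"joy": 0.20, "grief": -0.10, "calm": 0.05},
    {"joy": 0.30, "grief": 0.00, "calm": 0.10},
    {"joy": 0.10, "grief": -0.20, "calm": 0.00},
]
valence = {"joy": 2.0, "grief": -2.0, "calm": 0.5}
rho = subsample_rho(per_subject, valence, n_subjects=2, rng=random.Random(0))
print(rho)
```

Repeating `subsample_rho` many times at each sample size and tracking the spread of ρ would show how quickly the estimate stabilizes as subjects are added.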

Results

Behavior

As accuracy was emphasized over speed, participants were highly accurate in evaluating a word’s valence across experimental conditions (positive vs. negative: 94.5% vs. 92.9%), t(47) = 1.64, p = .11, Cohen’s d = 0.24 [95% CI: −0.05, 0.52]. Following correct word evaluation, participants frequently reported seeing the probe as “brighter” after evaluating a positive word (66.0% vs. chance of 50%), t(47) = 9.49, p < .001, Cohen’s d = 1.37 [1.03, 1.70]. In contrast, they were equally likely to report seeing the probe as “brighter” or “darker” after evaluating a randomly intermixed negative word (51.4% vs. chance of 50%), t(47) = 0.70, p = .49, Cohen’s d = 0.10 [−0.14, 0.34] (Fig. 1B). There was a significant within-subject difference between these two conditions, t(47) = 6.01, p < .001, Cohen’s d = 0.87 [0.53, 1.20], with 79% of participants showing this effect (Fig. 1C), in line with previous findings (Meier et al., 2007).
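The effect sizes reported above are numerically consistent with the common within-subject convention d = t / √n, with n = 48. The short check below illustrates that arithmetic; it is an assumption about how the values relate to each other, not a statement of how the authors computed them.

```python
import math

def cohens_d_from_t(t_value, n):
    """Within-subject (one-sample or paired) effect size: d = t / sqrt(n)."""
    return t_value / math.sqrt(n)

reported_t = {
    "valence accuracy": 1.64,
    "positive vs. chance": 9.49,
    "negative vs. chance": 0.70,
    "positive vs. negative": 6.01,
}
d_values = {label: round(cohens_d_from_t(t, 48), 2) for label, t in reported_t.items()}
print(d_values)
```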

Pupillometry

To examine PLR, we measured pupillary response as the percentage of change relative to the mean pupil size in the 200 ms before the probe onset for each trial (see Fig. 2A for the average PLR under different brightness response types and word valence conditions). This procedure normalizes individual differences in pupil size with minimal impacts on PLR, considering that variability in PLR is only weakly related to baseline pupil size (Ebitz & Moore, 2017). We quantified PLR magnitude using the windowed pupil size at 500–1,000 ms following probe onset when pupil constriction peaked (Fig. 2B). The trial-by-trial measures of PLR magnitude were then subjected to linear mixed-effect models across subjects and trial types.
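One way to write such a trial-level model is shown below. The random-effects structure (a random intercept per subject) and the dummy coding are assumptions made for illustration, since they are not specified here:

$$\mathrm{PLR}_{ij} = \beta_0 + \beta_1 R_{ij} + \beta_2 V_{ij} + \beta_3 R_{ij} V_{ij} + u_j + \varepsilon_{ij}, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$$

Here, $\mathrm{PLR}_{ij}$ is the PLR magnitude on trial $i$ of subject $j$, $R_{ij}$ codes the brightness response (“brighter” = 1, “darker” = 0), and $V_{ij}$ codes word valence (positive = 1, negative = 0). Under this formulation, $\beta_3$ captures the interaction between response type and word valence, and each fixed effect can be tested with a Wald statistic $Z = \hat{\beta} / \mathrm{SE}(\hat{\beta})$.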

First, we examined whether PLR magnitude could reflect participants’ behavioral judgement of probe luminance. We found that PLR was significantly greater when participants reported seeing the luminance probe as “brighter” as compared with when they reported seeing it as “darker” (Wald test: Z = 2.86, p = .004). This main effect of response type on PLR suggests that brightness judgement is grounded in brightness perception, instead of response bias (Hakerem & Sutton, 1966; Sperandio et al., 2018). Next, we examined how the valence of a word preceding the probe would modulate the relationship between brightness judgement and perception. Although PLR magnitude did not differ significantly between word valence conditions (Z = 1.56, p = .21), there was a significant interaction effect between brightness response type and word valence (Z = 3.27, p = .001). Specifically, the difference in PLR magnitude between “brighter” and “darker” responses was significant in the positive condition, t(47) = 3.60, p < .001, Cohen’s d = 0.52 [0.22, 0.82], but not in the negative condition, t(47) = 0.56, p = .58, Cohen’s d = 0.09 [−0.24, 0.42] (Fig. 2C). These observations remained robust when we examined PLR amplitude estimated from the local minimum (Ebitz & Moore, 2017) during 500–1,000 ms after probe onset (e.g., interaction effect between response type and word valence: Z = 3.06, p = .004). These results are consistent with the behavioral data, in which participants’ responses to the luminance probe are almost random in the negative condition (51.4% “brighter” responses) but show a more systematic pattern in the positive condition (66.0% “brighter” responses). As a result, the PLR effect for the “brighter” (vs. “darker”) response was more pronounced following the correct evaluation of a positive word as compared with that of a negative word.

We then investigated whether words with more positive meanings indeed triggered larger PLR effects in a more continuous manner. To do so, we projected the high-dimensional embeddings of the vectorized representation of each word (Grand et al., 2022) onto the valence dimension and created a continuous valence measure for each word (e.g., from “unpleasant” to “pleasant”; see Fig. S2 for details). We correlated these word valence values with the subject-average PLR effects triggered by each word when participants judged the iso-luminant probe as “brighter” versus “darker.” Consistent with the subject-level effects, we found that words with more positive meanings were associated with greater PLR effects (ρ = .32 [.13, .48], p = .0013; Fig. 2D). This significant correlation between semantic valence and pupillary responses to the iso-luminant probe across words is in contrast with the lack of association between word frequency and the word-level PLR effect (Fig. S6). We observed no significant evidence that the pupillary responses to the iso-luminant probe could be modulated by word frequency (ρ = .03 [−.16, .23], p = .75). After factoring in the correlation between semantic valence and word frequency (ρ = .34), the correlation between semantic valence and pupillary responses remains significantly larger than that between word frequency and pupillary responses (ρ = .32 vs. ρ = .03, Z = 2.58, p = .01), based on a comparison of correlated correlations (Meng, Rosenthal, & Rubin, 1992). Collectively, the converging subject- and word-level findings suggest that the iso-luminant probe reliably looks brighter, manifested as larger pupil constriction, when a preceding word carries a positive meaning.
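The comparison of correlated correlations can be illustrated with the Meng, Rosenthal, and Rubin (1992) Z statistic. The sketch below plugs in the rounded coefficients reported above with N = 100 words. Because the inputs are rounded, the output should land near, but not necessarily exactly on, the reported value; the function name is hypothetical.

```python
import math

def meng_z(r1, r2, r_x, n):
    """Z test for two dependent correlations that share one variable.

    r1, r2: correlations of each predictor with the common outcome
    r_x:    correlation between the two predictors
    n:      number of observations
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher z-transform
    mean_r2 = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r_x) / (2 * (1 - mean_r2)), 1.0)  # f is capped at 1
    h = (1 - f * mean_r2) / (1 - mean_r2)
    return (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - r_x) * h))

z = meng_z(r1=0.32, r2=0.03, r_x=0.34, n=100)
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
print(f"Z = {z:.2f}, p = {p_two_tailed:.3f}")
```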

Discussion

Given the probabilistic nature of perception (Knill & Pouget, 2004), an iso-luminant visual object can be perceived as being brighter or darker across different exposures, depending on the spatiotemporal contexts (Eagleman et al., 2004) or an observer’s momentary thoughts and feelings (Banerjee et al., 2012; Meier et al., 2007). Although the former is well acknowledged, whether or not associative concepts related to one’s thoughts and feelings can genuinely impact early perception has remained a long-standing debate (Firestone & Scholl, 2016). Here, with corroborating pupillary and behavioral evidence, our findings pose a critical test against the assertion that cognition does not impact perception.

Our data suggest that the indirect association between positive concepts and luminance (e.g., “a brighter smile”) can impact brightness perception, in contrast to the conventional understanding that metaphorical associations primarily operate at the postperceptual or conceptual level (Barsalou, 2008; Lakoff, 2014). We have demonstrated these effects both at the subject level across emotion conditions and at the word level when word valence is considered a more continuous measure. Our findings are separate from the influence of luminance concepts on brightness perception (Mathôt et al., 2017), considering that the words used in the study do not directly carry luminance information (Table S1) and the effects must rely upon indirect semantic mapping. In theory, this association can be scaffolded by the overlapping neural circuitry between visual perception and high-level cognitive processes (Lakoff, 2014). For instance, microstimulation of the primate frontal eye field is sufficient to modulate PLR controlled by brain stem nuclei (Ebitz & Moore, 2017). Future research should investigate how this frontal-brain-stem circuitry may mediate top-down PLR effects in humans. Furthermore, as our findings have focused on semantic valence, it remains to be established whether these results can be generalized to the indirect impact of phonic valence (i.e., a sound that is perceived more positively or negatively) on brightness perception. Clarifying these issues will be important for understanding how multi-sensory experiences are integrated with our prior associative knowledge to form a coherent percept of the environment.

It is worth noting that various alternative interpretations could not account for our current observations. First, as participants showed no systematic eye movements before or after the onset of the iso-luminant probe, the pupil diameter measures in the present study were unlikely to be contaminated by eye movements (Fig. S3). Second, consistent with Mathôt et al. (2017), we did not obtain significant evidence that word valence alone could have a significant impact on pupil size measures before or after participants’ correct judgment of word valence. Yet, similar to Mathôt et al. (2017), we did observe that pupil sizes triggered by negative words were numerically larger than those triggered by positive words (Fig. S5), which might be accounted for by the lower word frequency of the negative words (Kuchinke et al., 2007) among the words selected by Meier et al. (2007). This raises an important issue concerning whether word frequency alone could modulate pupillary responses to the iso-luminant probe. We examined this issue by investigating the relationship between word frequency and the word-level pupillary effect across participants (Fig. S6), and further contrasted this relationship with the association between semantic valence and the word-level pupillary effect (Fig. 2D). Our data suggest that word frequency makes a minimal contribution to participants’ pupillary responses to the iso-luminant probe across response types (ρ = .03 [−.16, .23], p = .75), and that semantic valence is a better predictor of pupillary responses to the iso-luminant probe than word frequency (ρ = .32 vs. ρ = .03, Z = 2.58, p = .01). These findings, therefore, rule out word frequency as a third alternative explanation for the current findings. Fourth, although attention remains an important factor in accounting for some top-down effects observed in the literature (Firestone & Scholl, 2016), various attention-related mechanisms, such as attentional engagement towards certain emotional words or information selection during probe presentation, seem to play a minimal role in our current findings. For example, we find little evidence for a main effect of emotion on the overall pupil size measures, which casts some doubt on participants’ greater attentional engagement in one emotion condition over another. Furthermore, although selective attention towards stimuli with different levels of luminance can manifest as changes in pupil size (Binda et al., 2013), it is unlikely to account for the current findings given that we presented only one iso-luminant probe throughout the experiment.

In sum, the current study provides both subject-level and word-level evidence that conceptual associations between affect and luminance can modulate behavioral judgment on a subsequent iso-luminant probe, as well as the PLR associated with this judgment (Hakerem & Sutton, 1966; Sperandio et al., 2018). These findings could not be accounted for by lower-level stimulus features, as the probe luminance remained constant throughout the experiment. Furthermore, eye movements, pupillary responses to the word, word frequency, attentional engagement, and information selection could not account for our current PLR effects either. Collectively, our data suggest that conceptual associations based on semantic valence can modulate early visual perception, as early as the pupil level.