Introduction

The role of executive function in creative cognition is complex and multifaceted (Chrysikou, 2019). The generation of novel ideas can arise from increased availability of low-level (such as perceptual) information, which could benefit from lower levels of top-down intervention in the form of cognitive control (Chrysikou, Weber, & Thompson-Schill, 2014). Such low-level information may facilitate remote associations, which play a critical role in novel idea generation (Kenett, 2018b; Kenett & Faust, 2019; Mednick, 1962). Conversely, the creative process also involves the evaluation of the appropriateness of these novel ideas, which likely benefits from higher levels of cognitive control (Chrysikou, 2018, 2019).

According to the control-attention theory of creativity, goal-directed idea generation is governed by top-down control processes, such as inhibition, that facilitate the strategic search for task-relevant responses (Beaty & Silvia, 2012; Benedek, Franz, Heene, & Neubauer, 2012; Benedek, Jauk, Sommer, Arendasy, & Neubauer, 2014; Frith et al., 2020; Silvia, 2015). Specifically, the quality of creative responses is thought to depend on individual differences in such cognitive control processes (Benedek et al., 2014; Benedek & Fink, 2019). Finally, theories on creativity propose that optimal cognitive control in the creative process is contingent on the creative stage, that is, on either the generation or evaluation of creative ideas (Kleinmintz, Ivancovsky, & Shamay-Tsoory, 2019; Sowden, Pringle, & Gabora, 2014).

A growing number of studies applying brain stimulation techniques to inhibit neural activation in the prefrontal cortex—by targeting cognitive control mechanisms, such as inhibition (Miller, 2000)—have demonstrated how such an inhibition facilitates the generation of creative responses (Weinberger, Green, & Chrysikou, 2017). However, it is still debated whether exciting neural activation in the prefrontal cortex may inhibit the generation of creative responses. We continue this line of research and examine whether cognitive control acts as a filter mechanism in such creative generation processes via anodal and cathodal stimulation of the prefrontal cortex.

Transcranial direct current stimulation (tDCS) has been used as a complementary method to more ubiquitous brain imaging studies in order to allow direct tests of the causal roles of specific brain areas in cognitive function. It is applied via a constant, low-level electrical current to the cortex through surface electrodes positioned on the scalp to modulate the excitability of neurons within a region of interest (Nitsche et al., 2008; Stagg & Nitsche, 2011). Experiments utilizing tDCS apply anodal stimulation (generally intended to increase cortical excitability) and/or cathodal stimulation (generally intended to decrease cortical excitability) over brain regions hypothesized to play a role in some aspect of cognition or behavior. Most studies also include a “sham” condition in which electrodes are placed on the scalp but without the application of sustained electrical current, serving as a control condition.

Several studies have examined the effects of tDCS over frontotemporal areas (inferior frontal gyrus, dorsolateral prefrontal cortex, anterior temporal lobe, and middle temporal gyrus) on verbal creativity (Weinberger et al., 2017). These studies mainly investigated how cathodal tDCS stimulation inhibits cognitive control mechanisms during divergent thinking tasks (Chi & Snyder, 2011; Chrysikou et al., 2013; Mayseless & Shamay-Tsoory, 2015). In particular, they relied on open-ended problem-solving tasks that require divergent thinking and have been widely used in creativity research, in which participants generate uncommon uses for common objects (Acar & Runco, 2019; Runco & Acar, 2012). Stimulation studies in divergent thinking tasks are based on theories arguing that inhibited cognitive control facilitates novel responses via access to widespread associative networks (Kounios & Beeman, 2015). One such theory, the matching filter hypothesis (MFH; Chrysikou et al., 2014), relates creative performance to the role of the prefrontal cortex in cognitive control.

According to the MFH, cognitive control mechanisms are critical to optimize performance on convergent tasks that hinge on regulatory, top-down filtering processes (e.g., idea evaluation), whereas the same mechanisms may constrain or impede performance on open-ended tasks that rely on spontaneous, bottom-up processes (e.g., idea generation). Based on this hypothesis, Chrysikou et al. (2013) applied cathodal tDCS stimulation to modulate left inferior frontal gyrus (IFG) activity during a divergent thinking task. In accordance with the MFH, they found that cathodal tDCS over the left IFG decreased RTs and improved fluency in an uncommon divergent-thinking generation task compared with a common-uses generation task and compared with cathodal stimulation of the right IFG or sham condition (Chrysikou et al., 2013). Other studies have found that anodal tDCS over left dorsolateral prefrontal cortex facilitates performance in convergent thinking creative tasks (Cerruti & Schlaug, 2009; Goel, Eimontaite, Goel, & Schindler, 2015; Metuki, Sela, & Lavidor, 2012; Zmigrod, Colzato, & Hommel, 2015). Convergent creativity tasks require finding a unique solution to a closed-ended problem, such as in the remote association test (Mednick, 1962). In accordance with the MFH, such tasks require higher involvement of cognitive control mechanisms (Chrysikou, 2018, 2019; Chrysikou et al., 2014). Collectively, the findings reviewed here illustrate that the effects on creative output of both increases and decreases in frontally mediated cognitive control functions (induced by anodal or cathodal tDCS, respectively) depend on the precise demands of the particular creativity task.

In contrast to the picture painted above, there is still debate on whether increasing cognitive control mechanisms during idea generation can lead to inhibited creative output. Ivancovsky, Kurman, Morio, and Shamay-Tsoory (2019) applied anodal and cathodal tDCS stimulation to the left IFG in a sample of Israeli and Japanese participants while they performed a divergent thinking task. The authors found a marginally significant interaction between stimulation type (anodal/cathodal) and condition (stimulation/sham; p = 0.047) on a general creativity score (that averaged over fluency, originality, and flexibility scores of participants’ responses). Furthermore, simple effect t-test analysis found a weaker effect of the anodal stimulation (p = 0.05)—decreased creativity scores compared with sham—and no significant effect for the cathodal stimulation (p = 0.09; Ivancovsky et al., 2019). Based on these findings, the authors concluded that increasing frontally mediated cognitive control processes impairs divergent thinking. In sum, it is yet to be determined whether (and in what circumstances) anodal stimulation over the left IFG can inhibit creative performance in divergent thinking tasks.

In all of the studies reviewed above, the measures that have been subjected to scrutiny following tDCS manipulations are the number of responses, the latency of responses, or the subjective ratings of the content of the responses (e.g., novelty or appropriateness) as evaluated by independent raters. Recently, a growing body of studies have been using quantitative measures of semantic distance as an objective measure of creative output (Heinen & Johnson, 2018; Kenett, 2019). Such an objective measure circumvents the limitations of subjective ratings of creative output (Forster & Dunbar, 2009; Silvia et al., 2008) and thus may lead to more reliable results in tDCS studies of creative thinking. Such measures have been gaining popularity in creativity research due to the assumption that such measures can be used to quantify the degree of divergence of participants’ responses (Acar & Runco, 2014; Beaty & Johnson, 2020; Beketayev & Runco, 2016; Hass, 2017a, 2017b; Kenett, 2018a, 2019). A distributional model that has been used to extract semantic distance is Latent Semantic Analysis (LSA; Landauer, Foltz, & Laham, 1998). LSA quantifies the semantic similarity between words in a given multidimensional semantic space by determining the probability of a given word co-occurring in a specific context (e.g., a paragraph of a document).

Over the past few years, LSA has gained popularity in creativity research (Beaty, Silvia, Nusbaum, Jauk, & Benedek, 2014; Bourgin, Abbott, Griffiths, Smith, & Vul, 2014; Dumas & Dunbar, 2014; Forster & Dunbar, 2009; Green, 2016; Heinen & Johnson, 2018). For example, Heinen and Johnson (2018) recently showed that LSA scores relate to measures of novelty and appropriateness, which are considered common subjective measures of creative output (Runco & Acar, 2012). Furthermore, the authors show that such LSA scores were sensitive to instruction manipulation and changed when participants were required to generate creative responses (Heinen & Johnson, 2018). While objections have been raised at applying LSA to studying behavioral performance (Kenett, Levi, Anaki, & Faust, 2017), it is a useful method to measure semantic distance (Kenett, 2019).

Recent studies examined the effects of tDCS on semantic distances of participants’ responses, as measured with LSA (Brunyé et al., 2015; Green et al., 2017). Green et al. (2017) used LSA to examine the effect of anodal tDCS stimulation on the frontal pole on analogical reasoning. The authors showed that tDCS stimulation leads to analogies that are farther apart (lower LSA similarity scores) without affecting their appropriateness (Green et al., 2017). Thus, there is precedent for semantic distance, measured via LSA, to be empirically examined in relation to creativity in tDCS studies.

We examined the effects of anodal and cathodal stimulation over a region corresponding to the left inferior frontal gyrus on a task requiring participants to provide an uncommon but plausible ending to an incomplete sentence. Our task is similar to the Hayling task (Burgess & Shallice, 1997), in which participants are asked to complete open-ended sentences with either a word that fits the sentence (initiation condition) or a word that is completely unrelated to the sentence (suppression condition). Previous studies have linked the suppression condition, which requires inhibition of related responses, to the left dorsolateral prefrontal cortex (Allen et al., 2008; Nathaniel-James & Frith, 2002). Metzuyanim-Gorlick and Mashal (2016) examined the effect of anodal tDCS over left dorsolateral prefrontal cortex and cathodal tDCS over the right dorsolateral prefrontal cortex on performance of the Hayling task and found that this tDCS stimulation montage improves performance in the suppression condition, by facilitating inhibition of responses (Metzuyanim-Gorlick & Mashal, 2016). To control for confounding effects of retrieval strategies that are required in our uncommon sentence completion task and individual differences in working memory and broad retrieval abilities, we administered a phonemic verbal fluency task (Benedek et al., 2017) and a reading span task (van den Noort, Bosch, Haverkort, & Hugdahl, 2008).

In line with the MFH (Chrysikou et al., 2014), and based on the work of Metzuyanim-Gorlick and Mashal (2016), we expected that tDCS stimulation would reduce (anodal) or increase (cathodal) the “uncommonness” (i.e., the semantic distance) of participants’ responses in our uncommon sentence completion task. We examined this hypothesis by measuring how anodal and cathodal stimulation alter the latencies, semantic distances, and the subjective ratings of the creative content of participants’ responses. Such findings can provide further empirical support for the MFH (Chrysikou et al., 2014) and further highlight the costs and benefits of cognitive control in creative thinking (Chrysikou, 2018).

Materials and Methods

Study Design

The design of our study is a between-subject tDCS study, such that we pseudo-randomly assigned participants to one of three stimulation type conditions (anodal, cathodal, or sham). A within-subject tDCS design can account for individual variability (Li, Uehara, & Hanakawa, 2015). However, a within-subject design introduces different confounds that a between-subject design addresses (Thair, Holloway, Newport, & Smith, 2017). These issues include experiential unblinding to study conditions (i.e., noticing differences between sham and nonsham conditions), and additional confounds due to learning, practice, and order effects (Thair et al., 2017). In lieu of using a within-subjects design to minimize the impact of extraneous variability on our analyses, we instead opted to collect several behavioral measures from each participant to allow us to assess and statistically control for variability in individual skills and abilities that might impact the outcome measures for the study (e.g., working memory capacity and broad retrieval abilities).

Participants

We recruited 48 adults (ages 18-35 years, 24 females) for the study from the University of Pennsylvania community. Participants were placed into a stimulation condition (anodal, cathodal, sham) based on the order of their recruitment but also based on their demographic information (age, gender, etc.). In this way, potential confounders at the subject level were counterbalanced within stimulation condition. All participants were right-handed and confirmed that they did not suffer from any neurological or psychiatric issues and were not taking any neurological or psychiatric medications. One participant from each group was removed from the final analysis. Stimulation was discontinued for one participant in the anodal stimulation group by request, although no adverse effects were reported. Data from another participant in the cathodal stimulation group was lost due to a malfunction with the recording device. In order to balance stimulation conditions, a participant was removed at random from the sham group, leaving 45 participants (n = 15 per stimulation condition) for the final analyses reported below. Gender, age, and years of formal education were matched across conditions (Table 1).

Table 1 Demographic information

Materials

Behavioral Measures

Uncommon Sentence completion task

Participants were presented with sentences missing a final word and were asked to generate uncommon endings to complete them. In line with Chrysikou et al. (2013), as well as standard instructions in creativity studies, participants were explicitly required to generate novel, uncommon responses (Acar, Runco, & Park, 2020; Said-Metwaly, Fernández-Castilla, Kyndt, & Van den Noortgate, 2020). Requiring uncommon responses is a common way to measure response originality (Wilson, Guilford, Christensen, & Lewis, 1954). The sentences for the task were chosen from a published set of 493 incomplete sentences, with only the final word missing (Block & Baldwin, 2010). We selected sentences with Cloze probabilities (the probability that a general population would converge on a given word to complete the sentence) between 70% and 90%. This probability range was chosen to place a moderate to high constraint on the range of responses each sentence could evoke in the sample. This led to a subsample of 141 sentences, from which 21 were randomly chosen to be the practice stimuli. The 120 sentences used in the task varied in length from 4 to 12 words (see Appendix). Each response (along with the response latency) was recorded.
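The stimulus selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the records below are made-up placeholders rather than the Block and Baldwin (2010) norms, the function and field names are hypothetical, and treating the 70% and 90% boundaries as inclusive is our assumption.

```python
import random


def select_stimuli(sentences, low=0.70, high=0.90, n_practice=21, seed=0):
    """Keep sentences whose Cloze probability lies in [low, high] and split
    them at random into a practice set and a task set.

    The inclusive boundaries and the fixed seed are assumptions made for
    this sketch, not details taken from the study.
    """
    eligible = [s for s in sentences if low <= s["cloze"] <= high]
    rng = random.Random(seed)
    practice_idx = set(rng.sample(range(len(eligible)), n_practice))
    practice = [s for i, s in enumerate(eligible) if i in practice_idx]
    task = [s for i, s in enumerate(eligible) if i not in practice_idx]
    return practice, task


# Hypothetical placeholder records: 493 stems with evenly spaced Cloze values.
toy_sentences = [{"stem": f"sentence {i}", "cloze": i / 492} for i in range(493)]
practice_set, task_set = select_stimuli(toy_sentences)
```

With the real norms, the same filter would be applied to the published Cloze values; the split into practice and task sentences is the only random step.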

The content of these responses was analyzed in two ways. As described earlier, we computed LSA semantic distance scores for each of the responses compared to the Cloze response. Based on previous studies (Beaty et al., 2014), LSA similarity scores were computed via the Colorado LSA website (http://lsa.colorado.edu/). This score was computed between the Cloze response and all empirical responses generated for all of the sentences. We then derived a value of semantic distance by computing the inverse of the LSA similarity score (Beaty et al., 2014).
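As a rough illustration of the distance score, the sketch below assumes that LSA similarity is a cosine between word vectors and that the "inverse" of similarity means 1 minus the similarity. Both are assumptions on our part; the three-dimensional vectors are invented for the example and do not come from the Colorado LSA space.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_distance(u, v):
    """Distance taken as 1 - similarity (an assumed reading of 'inverse')."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical vectors: the Cloze response, a closely related response,
# and an unrelated response.
cloze_vec = [1.0, 0.0, 0.0]
near_vec = [0.8, 0.6, 0.0]
far_vec = [0.0, 1.0, 0.0]

near_distance = semantic_distance(cloze_vec, near_vec)
far_distance = semantic_distance(cloze_vec, far_vec)
```

Under this reading, a response that is semantically closer to the Cloze response receives a smaller distance, and an unrelated response a larger one.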

In addition to these measurements of semantic distance, we also obtained subjective evaluations of the content of participants’ responses, in line with prior studies of divergent thinking. We used Amazon Mechanical Turk (AMT; Buhrmester, Kwang, & Gosling, 2011) to recruit an independent group of raters to evaluate the set of responses that we obtained from participants in the tDCS experiment. In this survey, AMT raters were presented with the sentences and participants’ responses and were asked to rate the novelty and appropriateness of each response (20 raters per sentence, each rating both dimensions). Each AMT rater was presented with a random subset of 10 sentences. For each of these sentences, each unique response generated by any of the participants to that sentence was presented, along with the Cloze response. Thus, the AMT rater viewed the full range of responses for a given sentence. Each response was displayed next to a slider bar on a scale ranging from 1 to 7, with 1 being the lowest rating and 7 the highest. AMT raters were instructed to use the full range of the scale and rated two creativity dimensions for each of the responses for that subset of sentences: Novelty and Appropriateness (defined based on a previous study; Heinen & Johnson, 2018). Novelty was defined as “The property of originality or newness of the response in relation to completing the sentence. Furthermore, a novel response can be completely unrelated to the end of the sentence.” Appropriateness was defined as “the inherent explanatory ease of the response to complete the sentence. That is, appropriateness relates to whether the response is comprehensible, understandable, and accessible in relation to the sentence that it completes.”

To examine consistency across AMT raters, we used an intraclass correlation coefficient (ICC) employing a one-way random effects structure, appropriate for multiple AMT raters and measurements of the same construct (Koo & Li, 2016). This was achieved using the ICC function in the psych package in R (Revelle, 2014), based on the ICC2 model that assumes that a random number of raters rated each response with absolute agreement in the ratings, with the method that utilizes the mean across k raters. Both novelty (ICC = 0.89, df = 3403, 95% confidence interval [CI] [0.88, 0.90]) and appropriateness (ICC = 0.94, df = 3415, 95% CI [0.94, 0.95]) ratings displayed high to excellent reliability among AMT raters. Interpretations of the coefficients’ magnitudes were specific to ICC (Koo & Li, 2016), but also align with other, similar measures of reliability (Cicchetti et al., 2006). Finally, raters used the entire range of the scale (1-7) for both dimensions, and the rating data for novelty (M = 4.13, SD = 2.21, Skewness = −0.14) and appropriateness (M = 3.59, SD = 2.29, Skewness = 0.19) were normally distributed.
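For readers unfamiliar with the ICC2 family, the following sketch computes the single-rater and mean-of-k-raters coefficients from the two-way ANOVA mean squares. It is not the psych package implementation, and it assumes a complete responses-by-raters matrix, which is simpler than the present design, where each rater saw only a subset of sentences. The ratings matrix is a small made-up example.

```python
def icc2(ratings):
    """Two-way random-effects, absolute-agreement ICC.

    `ratings` is a list of rows (one per rated response), each holding one
    score per rater; every rater must have scored every response.
    Returns (single-rater ICC, mean-of-k-raters ICC).
    """
    n = len(ratings)      # number of rated responses
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-response mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Made-up ratings: 6 responses (rows) scored by 4 raters (columns).
toy_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_average = icc2(toy_ratings)
```

The mean-of-k coefficient is always at least as large as the single-rater coefficient when both are positive, which is why averaging over many raters yields high reliability even when individual raters disagree.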

Verbal Fluency

To account for individual differences in broad retrieval strategies, we administered the F-A-S phonemic verbal fluency test (Harrison, Buxton, Husain, & Wise, 2000) to all participants. Participants were instructed to come up with as many unique words as possible in a set amount of time, beginning with the specified letter (F, A, or S, 1 minute each), and to avoid the use of proper nouns and multiple words using the same word-stem with a varying suffix (e.g., fast, faster, fastest), because they would not count towards their total (Bechtoldt, Benton, & Fogel, 1962). Participants verbally generated their responses while the experimenter recorded them. The valid words generated in the task were counted to create a verbal fluency score.

Reading span test

We used a standard computerized version of the reading span test (van den Noort et al., 2008) in order to assess participants’ verbal working memory abilities. The test began with instructions and two practice trials, after which the experimental sentences were presented. Participants were instructed to read the sentences aloud at a “comfortable, conversational pace” as they attempted to keep the sentence-final word in mind for a given set of sentences. Set sizes ranged from two to six sentences, and the different set sizes were presented in a random order, randomly distributed across sets for each participant. At the end of a set, a visual “recall” prompt was delivered on the screen, and participants verbally recited the sentence-final words that they could remember, in the order in which they appeared. These responses were recorded by the experimenter. The total number of correctly remembered words was used as the working memory score for each participant.

Transcranial direct current stimulation

A battery-powered constant DC stimulator (NeuroConn GmbH, Ilmenau, Germany) was used to deliver the stimulation current. Thin, saline-soaked sponges were used to interface the 5-cm × 5-cm rubber electrodes with the scalp. Electrode placement locations were determined using the 10-20 system. The active-site (anode/cathode) electrode was placed on position F7, corresponding to a swath of cortex including the left inferior frontal gyrus (Chrysikou et al., 2013; Okamoto et al., 2004). The reference (cathode/anode) electrode was placed on the mastoid of the contralateral side. Stimulation ramped up to its final intensity of 1.5 mA over the course of 10 seconds. Stimulation began 180 seconds before the first experimental trial to allow stimulation effects to set in before experimental trials (Nitsche & Paulus, 2000). Stimulation continued for an additional 14 minutes (total time under stimulation = 17 minutes), and the stimulation intensity ramped down for 10 seconds until ending. In the sham stimulation condition, participants received 30 seconds of post-ramp-up stimulation before ramp-down. The anode and cathode electrode locations (F7 or mastoid) were counterbalanced in the sham condition. Stimulation lasted until roughly a minute after the task was completed; therefore, the entirety of the task was done under full-strength stimulation.
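The stimulation parameters above imply two derived quantities that are easy to check: the total time under stimulation and the nominal current density at the electrode. The sketch below is simple arithmetic on the reported values; the current density is not reported in the text and assumes that the full 5 cm × 5 cm electrode surface carries the current uniformly. The variable names are ours.

```python
# Reported stimulation parameters.
CURRENT_MA = 1.5           # final stimulation intensity, in mA
ELECTRODE_SIDE_CM = 5.0    # square rubber electrode, 5 cm per side
PRE_TASK_S = 180           # stimulation before the first experimental trial
TASK_STIMULATION_MIN = 14  # additional stimulation after the pre-task period

# Derived quantities (the uniform current density is an assumption).
electrode_area_cm2 = ELECTRODE_SIDE_CM ** 2
current_density_ma_per_cm2 = CURRENT_MA / electrode_area_cm2
total_stimulation_min = PRE_TASK_S / 60 + TASK_STIMULATION_MIN
```

The total agrees with the 17 minutes stated in the text, excluding the two 10-second ramps.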

The logic of this study, and of our tDCS manipulation in particular, hinges on the assumption that we can modulate frontally mediated cognitive control processes with this technique. We used the same electrode montage as in Chrysikou et al. (2013), with the same device, electrode sizes, current strength, and polarities. Furthermore, Chrysikou et al. (2013) simulated a model to describe the resulting current flow (Datta et al., 2009; Datta, Baker, Bikson, & Fridriksson, 2011). This simulated model was used to verify that the mode of polarization under the cathode applied to F7 led to decreased excitability in the PFC (Radman, Su, An, Parra, & Bikson, 2007).

This simulated model demonstrated the following tDCS effects of the montage: First, the tDCS effect of the electrode over F7 led to a concentrated effect of peak current under the posterior portion of the electrode (PFC), which was the most homogeneous, consistent hyperpolarizing current flow. A second current peak in the temporal cortex was more heterogeneous, thus less likely to drive consistent hyperpolarization. The use of a mastoid electrode resulted in a diffuse current flow, which was inconsistent with the electric flow over the F7 electrode (see Chrysikou et al., 2013 for more details). As such, the model confirmed that this tDCS montage produces a current over the PFC, with diffuse current flow in other regions located between the electrodes. While individual variability in brain anatomy may result in differences in the exact location of the effect across participants, the computational model simulated by Chrysikou et al. (2013) confirms the general effect of the montage that we are using in the current study. Finally, we note that a growing number of tDCS studies examine how stimulating the PFC modulates executive functions (Chase, Boudewyn, Carter, & Phillips, 2020; Sarkis, Kaur, & Camprodon, 2014); a few of these experiments have used the same montage that we are using in our current study. These studies have demonstrated how stimulating the F7 region that corresponds to the left IFG (Okamoto et al., 2004) affects frontally mediated processing, such as category learning (Lupyan, Mirman, Hamilton, & Thompson-Schill, 2012) and cognitive control tasks (e.g., the Flanker task; Nozari, Woodard, & Thompson-Schill, 2014). These findings lend support to our assumption that the montage that we are applying does indeed alter cognitive control.

Procedure

After providing informed consent, participants completed the phonemic verbal fluency and working memory tasks in a counterbalanced order within stimulation condition. Participants were then fitted with the tDCS electrodes and presented with the instructions of the uncommon sentence completion task. Participants were made aware that “common” endings for the sentences existed and were explicitly told to avoid those responses (i.e., “avoid the most common word, or the word that someone else would be likely to say”) in favor of endings that are more uncommon or novel but still contextually and grammatically appropriate. They also were instructed to avoid the use of proper nouns. Participants then completed a practice session of 21 sentences with feedback, and they were walked through the stimulation process before stimulation began. They then completed the uncommon sentence completion task entirely under stimulation (or sham), which consisted of 120 sentences.

At the beginning of each trial, a fixation cross with a consistent duration of 500 ms appeared, followed by an incomplete sentence (missing the last word), printed on a single line in 22-point solid black font in the center of a white screen. A sound was generated at stimulus onset to serve as a reference point for calculating reaction times. The intertrial interval consisted of a blank screen with a jittered length between 600-1,500 ms. Participants had 5,500 ms to respond before the sentence disappeared from the screen, in line with previous fMRI and tDCS studies using a similar task (Benedek, Jurisch, Koschutnig, Fink, & Beaty, 2020; Green et al., 2017). Participants were instructed not to respond during the intertrial interval, once the sentence was no longer displayed on the screen. Responses were recorded on paper for each trial by an experimenter seated behind the participant. In addition, an audio recording of the entire experiment was obtained via an omnidirectional room microphone connected to a solid-state recording device (Marantz, Kawasaki, Japan). The microphone was positioned near the stimulus presentation laptop computer, as well as in close proximity to the participant.

Statistical analysis

Continuous outcome variables (novelty and appropriateness ratings) were analyzed using linear mixed-effects (LME) hierarchical regression models (Baayen, 2008), as implemented in the lme4 package (Bates et al., 2015) in R v.3.4.4 (R Core Team, 2017). We chose this approach to account for potentially high variability of tDCS response between individuals. In this type of analysis, a random intercept for each participant statistically removes some of the influence of individual variability from the final fixed-effects comparison (e.g., effect of tDCS on an outcome), thereby making the final estimate a better, more generalizable estimate of the true effect. We go further to address response variability by controlling for some or all collected individual factors (age, gender, education, verbal fluency, and working memory) and sentence characteristics (sentence length, mean word length, and the Cloze probability), if they are additive, before assessing the effect of tDCS condition (anodal, cathodal, sham) on the novelty (Table 2) and appropriateness (Table 3) ratings. Thus, any such potential effect of tDCS on the novelty or appropriateness ratings is above and beyond any of these additional variables, uniquely related to these specific aspects of participants’ responses. We included random intercepts for each participant to account for interindividual variation (Baayen, 2008; Mirman, 2014).
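For concreteness, a random-intercept model of the kind described above can be written as follows. The notation is ours and is meant only as a sketch: i indexes responses, j indexes participants, and the x terms stand for whichever covariates and stimulation-condition contrasts are retained in a given model.

```latex
\[
  y_{ij} = \beta_0 + \sum_{p=1}^{P} \beta_p \, x_{p,ij} + u_j + \varepsilon_{ij},
  \qquad
  u_j \sim \mathcal{N}(0, \sigma_u^2),
  \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\]
```

Here y is the novelty or appropriateness rating, u_j is the participant-specific deviation from the overall intercept, and the residual term captures response-level variation not explained by the fixed effects or the participant intercept.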

Table 2 Chi-square difference tests for novelty model comparisons
Table 3 Chi-square difference tests for appropriateness model comparisons

ANOVA model comparisons were applied to determine the parameters that best predicted the ratings of the sentence completion responses. To determine the most predictive models for novelty and appropriateness, we began by computing the model with only the intercept term (baseline) and random effects and tested the effects of all other factors that were hypothesized a priori to influence them. Each parameter was serially added into the model in the order displayed in Tables 2 and 3. If a parameter significantly improved model fit, it remained in the model for subsequent comparisons. Because the tDCS stimulation condition was our primary parameter, it was added to the model after all other parameters. In doing so, the resulting statistic represents the amount of variance in the ratings that was explained by the stimulation condition beyond all other fixed effects in the model. Effect sizes of the fixed-effects variables are reported based on their estimates. In addition, the marginal R² (variance of only the fixed effects) and the conditional R² (variance of the fixed and random effects) of the LME models were computed based on the approach of Nakagawa and colleagues (Nakagawa, Johnson, & Schielzeth, 2017; Nakagawa & Schielzeth, 2013), using the performance package (Lüdecke, Makowski, Waggoner, & Patil, 2020) in R v.3.4.4 (R Core Team, 2017). As a more exploratory analysis, we examined interactions between tDCS and semantic distance and between tDCS and RT. If such an interaction term outperformed any previous models for the ratings, it was included in the final model. Improvements in a model’s predictive ability were determined using the log-likelihood goodness-of-fit measure, such that deviations in −2 times the change in log-likelihood are distributed as χ² with degrees of freedom equal to the number of parameters added (Mirman, 2014, p. 143). All models’ random effect structures were identical and took into consideration random variability among participants.
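The chi-square difference test used for each model comparison can be sketched as follows. This is a minimal illustration rather than the lme4 workflow: the function names are ours, the log-likelihood values are hypothetical, and the chi-square tail probability is computed from closed-form expressions that hold for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc term plus a finite sum over half-integer gamma values.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total


def likelihood_ratio_test(loglik_reduced, loglik_full, params_added):
    """Return (-2 * change in log-likelihood, p value) for nested models."""
    statistic = -2.0 * (loglik_reduced - loglik_full)
    return statistic, chi2_sf(statistic, params_added)


# Hypothetical log-likelihoods for a model before and after adding one parameter.
lrt_statistic, lrt_p = likelihood_ratio_test(-100.0, -98.0, params_added=1)
```

In a serial procedure like the one described, a parameter would be retained for subsequent comparisons only when this p value falls below the chosen significance threshold.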

Results

Data from one stimulus sentence were rendered unusable due to an error, and the analyses were performed on 119 of 120 sentences. From the full set of trials (N = 5,355), we removed trials that did not have RTs due to response omissions (N = 848) or semantic distance values (N = 99), leaving us with 4,408 trials (82.3% of responses). For these trials, we analyzed participants’ RT, semantic distance, novelty, and appropriateness ratings (Tables 2 and 3). In addition, no differences were found across the three groups in the phonemic verbal fluency or the working memory tasks (all ps ns).
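The trial counts above can be verified with simple arithmetic. The sketch assumes that the full set of trials is the product of the 45 retained participants and the 119 usable sentences, which is consistent with the reported N; the variable names are ours.

```python
participants = 45           # 15 per stimulation condition
usable_sentences = 119      # 120 minus the one unusable sentence
omitted_responses = 848     # trials without an RT
missing_distance = 99       # trials without a semantic distance value

total_trials = participants * usable_sentences
retained_trials = total_trials - omitted_responses - missing_distance
retained_percent = round(100 * retained_trials / total_trials, 1)
```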

This analysis led to similar models for the novelty and appropriateness ratings (Tables 2 and 3). Age, gender, and education did not significantly affect novelty and appropriateness ratings. Sentence length (number of words) significantly improved model fit for novelty and appropriateness, while the mean word length and Cloze word probability failed to do so. Keeping sentence length (SL) in these models, we examined the impact of verbal fluency and working memory. Including verbal fluency significantly improved both models’ fit, but working memory did not. The next parameters added to these models were semantic distance and RTs, both of which significantly improved the models’ fit. We then tested our hypothesis that tDCS stimulation condition would significantly impact novelty and appropriateness ratings. As predicted, tDCS stimulation condition significantly improved both models’ fit, indicating the significant effect of stimulation on novelty and appropriateness ratings. Furthermore, there was a significant interaction between tDCS and semantic distance but only for appropriateness ratings.

The last row in each of Tables 2 and 3 represents the final model with the greatest fit for that type of rating. All parameter estimate analyses are generated from these models. The marginal R2 (variance of only the fixed effects) for the novelty model was 0.13 and for the appropriateness model was 0.14. The conditional R2 (variance of the fixed- and random-effects) for the novelty model was 0.18 and for the appropriateness model was 0.19. LME model comparisons were also conducted with semantic distance and RTs as outcome variables, and stimulation condition failed to significantly predict either. Thus, we do not report these models.
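For reference, the two R2 measures can be written out for the simplest case. The expressions below assume a Gaussian model with a single random intercept for participants, following the general form given by Nakagawa and Schielzeth (2013); the symbols are introduced here for illustration, with \sigma^2_f the variance of the fixed-effect predictions, \sigma^2_\alpha the between-participant variance, and \sigma^2_\varepsilon the residual variance.

```latex
R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
```

Under this form, the difference between the two measures is the share of variance attributed to the participant random effect, which is 0.18 − 0.13 = 0.05 for the novelty model and 0.19 − 0.14 = 0.05 for the appropriateness model.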

After identifying our final model, we examined the coefficients and model estimations for each fixed effect on the outcome variables (novelty and appropriateness ratings). Sentence length had a significant positive relation with novelty ratings (estimate = 0.039, SE = 0.009, p < 0.001), and a nonsignificant negative relation with appropriateness ratings (estimate = −0.045, SE = 0.01, p = 0.151). Phonemic verbal fluency scores had a significant positive relation with novelty rating (estimate = 0.008, SE = 0.003, p = 0.02), and a significant negative relation with appropriateness ratings (estimate = −0.011, SE = 0.005, p = 0.024).

This pattern of parameters differentially impacting novelty and appropriateness ratings continued for semantic distance and RT. Semantic distance had a significant positive relation with novelty rating (estimate = 1.442, SE = 0.068, p < 0.001), and a significant negative relation with appropriateness ratings (estimate = −1.617, SE = 0.170, p < 0.001). RT had a significant negative relation with novelty rating (estimate = −0.105, SE = 0.018, p < 0.001), and a significant positive relation with appropriateness ratings (estimate = 0.112, SE = 0.023, p < 0.001).

Finally, we examined the effect of tDCS condition by computing the averages and standard errors for each stimulation condition based on our model estimates (Fig. 1). For this analysis, the sham condition was treated as the control condition, and both anodal and cathodal stimulation conditions were compared to it. Anodal stimulation had a significant negative relation with novelty ratings (estimate = −0.270, SE = 0.094, p = 0.004), and a significant positive relation with appropriateness ratings (estimate = 0.803, SE = 0.207, p < 0.001) compared with sham. Cathodal stimulation did not significantly affect novelty (estimate = 0.000, SE = 0.091, p = 0.998) or appropriateness (estimate = 0.221, SE = 0.202, p = 0.275).
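As a rough consistency check on the estimates above, each estimate can be divided by its standard error and referred to a standard normal distribution. This is only a sketch: the normal (Wald z) approximation is an assumption made here, and the reported p values may have been based on a t distribution with estimated degrees of freedom, so small discrepancies in the third decimal are expected. The function name is hypothetical.

```python
import math


def wald_p(estimate, se):
    """Return (z, two-sided p) under a standard normal approximation."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p


# (estimate, SE) pairs as reported in the text, each relative to sham.
reported = {
    "anodal, novelty": (-0.270, 0.094),
    "anodal, appropriateness": (0.803, 0.207),
    "cathodal, appropriateness": (0.221, 0.202),
}

for label, (est, se) in reported.items():
    z, p = wald_p(est, se)
    print(f"{label}: z = {z:.2f}, approximate p = {p:.4f}")
```

The approximate values land close to the reported ones (for example, about 0.004 for the anodal effect on novelty), which suggests the estimates, standard errors, and p values for these three effects are mutually consistent.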

Fig. 1

Fixed effects of tDCS condition on novelty (A) and appropriateness (B) ratings. Boxplots display the distributions, and the points display the model-adjusted effects. Error bars on the points show 95% confidence intervals.

These results demonstrate that for both novelty and appropriateness, the cathodal and sham stimulation conditions did not significantly differ, but anodal stimulation significantly predicted both types of ratings. These results indicate that the effects were due specifically to anodal stimulation and not to broad effects of brain stimulation. To verify this specific anodal stimulation effect, we conducted similar follow-up analyses in which either the anodal or the cathodal condition (rather than sham) served as the comparison condition. These analyses resulted in effects and estimates similar to the ones reported here.

For appropriateness ratings, there was a significant interaction between semantic distance and tDCS conditions, such that the impact of tDCS condition on ratings differed, dependent on the semantic distance of a response. Put another way, this interaction reveals differences, across conditions, in the slope describing the relationship between the appropriateness of the response and its distance from the Cloze response (Mirman, 2014, p. 29). This relationship was stronger for participants in the anodal condition compared to those in the sham condition (estimate = −0.636, SE = 0.223, p = 0.004). The results were not significant for the cathodal stimulation, although they were in the same direction (estimate = −0.350, SE = 0.22, p = 0.111). In Fig. 2, we display a model-based estimation displaying the interaction effect of tDCS condition and semantic distance on appropriateness ratings. This figure illustrates that, as a result of the differences in these slopes, the appropriateness ratings are maximally different, across conditions, for responses that are “closer” to the Cloze response.
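The interaction can be made concrete by combining the reported estimates into condition-specific slopes. The sketch below rests on several assumptions that the text does not confirm: treatment coding with sham as the reference level, semantic distance entered uncentered, and the reported semantic distance estimate (−1.617) and condition estimates (0.803, 0.221) all coming from the final model that includes the interaction. The names are illustrative, and the resulting numbers are implied values under those assumptions, not results reported by the study.

```python
# Reported appropriateness estimates; see the assumptions in the text above.
SD_SLOPE_SHAM = -1.617   # assumed to be the semantic distance slope for sham
INTERACTION = {"anodal": -0.636, "cathodal": -0.350}
CONDITION_EFFECT = {"anodal": 0.803, "cathodal": 0.221}


def implied_slope(condition):
    """Slope of appropriateness on semantic distance within a condition."""
    if condition == "sham":
        return SD_SLOPE_SHAM
    return SD_SLOPE_SHAM + INTERACTION[condition]


def difference_from_sham(condition, semantic_distance):
    """Model-implied rating difference from sham at a given semantic distance."""
    return CONDITION_EFFECT[condition] + INTERACTION[condition] * semantic_distance


for cond in ("sham", "anodal", "cathodal"):
    print(cond, round(implied_slope(cond), 3))

for d in (0.0, 0.5, 1.0):
    print(d, round(difference_from_sham("anodal", d), 3))
```

Under these assumptions, the anodal slope is steeper than the sham slope, and the anodal advantage in appropriateness shrinks as semantic distance grows, which matches the description that ratings differ most for responses close to the Cloze response.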

Fig. 2

Interaction effects of tDCS condition and semantic distance on appropriateness ratings. Responses with low semantic distance in the anodal stimulation condition were related to a significant increase of appropriateness ratings compared to sham and cathodal stimulation. When semantic distance is high, there was not a significant difference between stimulation conditions on ratings.

Discussion

Cognitive control is hypothesized to play a critical role in creative cognition (Chrysikou, 2018; Chrysikou et al., 2014). Inhibited prefrontal cortex activity, by way of cathodal tDCS, has been associated with higher performance in generation of responses (number and speed of responses) in divergent thinking tasks (Chi & Snyder, 2011; Chrysikou et al., 2013; Mayseless & Shamay-Tsoory, 2015). Here, using a more naturalistic sentence completion task, we provide evidence that supports the other direction of this effect: Excitatory stimulation over the lateral prefrontal cortex leads to more appropriate and less novel responses, in a divergent thinking uncommon sentence completion task. Our task is similar to the Hayling task, a task that contrasts common with uncommon sentence completion (Burgess & Shallice, 1997). Thus, the task is optimal to examine the effects of excitatory and inhibitory tDCS stimulation in such an open-ended task that has been used to measure participants’ abilities to inhibit common responses. Furthermore, this task is more natural and thus potentially offers a more controlled task to measure divergent thinking than the standard alternative uses task (Acar & Runco, 2019). In that sense, our task is more similar to verb generation tasks that have been applied in the past to study creativity (Heinen & Johnson, 2018; Prabhakaran, Green, & Gray, 2014).

tDCS stimulation and performance in the uncommon sentence completion task

An account that maps such a dynamic shift in PFC activation to the creative process is the matched filter hypothesis (MFH) for cognitive control (Chrysikou et al., 2014). According to the MFH, cognitive control mechanisms dynamically shift between spontaneous and controlled filtering of information, depending on task demands and individual differences (Chrysikou et al., 2014). Thus, such a dynamic shift may correspond to the two stages of the creative process (Chrysikou, 2019), namely generation and evaluation of creative ideas (Chrysikou, 2019; Kleinmintz et al., 2019; Sowden et al., 2014). During generation, inhibited cognitive control mechanisms (left inferior PFC) may facilitate novel responses via access to widespread associative networks. During evaluation, excited cognitive control mechanisms (left dorsolateral PFC) may facilitate response selection and the evaluation of the novelty and appropriateness of the generated response (Chrysikou, 2019; Weinberger et al., 2017). In reviewing recent tDCS studies on creative thinking, Weinberger et al. (2017) highlighted the tDCS effect as a function of the interactions between task demands (generation vs. evaluation), polarity (anodal vs. cathodal), and stimulation site (left inferior vs. dorsolateral PFC). While previous studies have shown how inhibitory stimulation over the left IFG increases the novelty of responses in a divergent task (Chi & Snyder, 2011; Chrysikou et al., 2013; Mayseless & Shamay-Tsoory, 2015), our findings demonstrate the other direction of this effect: Excitatory stimulation over the left IFG decreases the novelty of responses in an uncommon sentence completion task (see also Ivancovsky et al., 2019).

We found that the novelty and appropriateness ratings of participants’ responses were significantly modulated by excitatory anodal stimulation over the left prefrontal cortex. Relative to sham stimulation, anodal stimulation decreased the novelty and increased the appropriateness of participants’ responses, given the context of the sentence. This finding supports and extends the MFH for cognitive control (Chrysikou et al., 2014). However, we did not find an opposite effect, that is, cathodal stimulation decreasing appropriateness and increasing novelty ratings. This null finding seems to be at odds both with the MFH (Chrysikou et al., 2014) and with previous tDCS findings (Chrysikou et al., 2013). However, Chrysikou et al. (2013) used visual presentation of objects, which may demand different cognitive processes than those utilized in verbal divergent thinking tasks (Chrysikou, Motyka, Nigro, Yang, & Thompson-Schill, 2016). Furthermore, this null finding is in line with a previous tDCS study that did not find any cathodal prefrontal effect on divergent thinking (Mayseless & Shamay-Tsoory, 2015). However, our task differs from those used by Chrysikou et al. (2013) and Mayseless and Shamay-Tsoory (2015), which also may contribute to this null effect. Finally, Karuza et al. (2016) systematically probed the effects of current polarity and stimulation intensity on participants’ ability to perform a task of inhibitory cognitive control. The authors found that cathodal stimulation led to highly varied and weakly reliable effects (Karuza et al., 2016).

Thus, our main finding is that anodal stimulation induced the generation of responses that were generally more appropriate given the context. This suggests that excitatory anodal stimulation over the lateral prefrontal cortex may have increased participants’ adherence to typical, instead of novel, sentence completion. Our findings strengthen recent findings (Ivancovsky et al., 2019) and complement a large body of work that has shown how cathodal stimulation of the left IFG facilitates the novelty of generated responses (Chrysikou et al., 2013; Weinberger et al., 2017).

Behavioral measures & performance in the uncommon sentence completion task

We also found an effect of sentence length, broad retrieval abilities (as measured with the phonemic verbal fluency task) and RTs on the novelty and appropriateness ratings of participants’ responses. Sentence length of the stimuli had a significant negative relation with appropriateness ratings and a significant positive relation with novelty ratings. This may be related either to the increased time involved in reading longer sentences or to the number of words in the sentence that activate alternative interpretations. Both of these possibilities can lead to increased novelty and lowered appropriateness of the response. Thus, we controlled for this effect as a confound in our model. Broad retrieval ability, measured via the phonemic verbal fluency task, was found to have a significant negative relation with the appropriateness and a significant positive relation with the novelty ratings of participants’ responses. Thus, the higher a participant’s broad retrieval abilities, the more novel their responses were. This finding is in accordance with previous behavioral findings that have shown the relation between such broad retrieval abilities and creativity (Beaty et al., 2014; Benedek et al., 2017; Silvia, Beaty, & Nusbaum, 2013). Such a relation has been attributed to executive abilities facilitating effective retrieval from semantic memory (Gilhooly, Fioratou, Anthony, & Wynn, 2007). Performance on the task requires the participant to recall information stored in long-term memory with little to no cue. Therefore, it is possible that participants with increased access to words stored in long-term memory are simply quicker at producing words that satisfy the simplest part of the sentence completion task, generating a word that fits appropriately. Thus, individuals with high broad retrieval abilities had more time to accomplish the difficult part of the task, achieving higher degrees of novelty. Similar to sentence length, we controlled for this effect as a confound in our model. In regard to RT, we found that as RT increased, participants’ responses became more appropriate and less novel. This finding is in line with previous findings on the Hayling task, showing that longer RTs led to worse performance (Cervera-Crespo & González-Alvarez, 2016).

We also measured the effect of working memory on the novelty and appropriateness ratings of participants’ responses. Research has indicated a link between working memory and performance in creativity tasks (Lee & Therriault, 2013). De Dreu, Nijstad, Baas, Wolsink, and Roskes (2012) found that working memory capabilities correlated positively with performance in a range of creative tasks and on a variety of measures of performance in those tasks. Of particular relevance, they found that higher working memory scores correlated with better fluency and subjective originality scores on an insight task. However, we did not find any significant effects of working memory on the novelty and appropriateness of participants’ responses. This null effect might be related to the nature of our uncommon sentence completion task or to the specific subjective ratings that we used in our study.

tDCS stimulation, semantic distance, & performance in the uncommon sentence completion task

Finally, in our study we also examined an objective measure of the uncommonness of participants’ responses, quantified as the semantic distance of each response from the common, Cloze response of the sentence. Semantic distance had a significant negative relation with appropriateness ratings and a positive relation with novelty and creativity ratings. Thus, in accordance with research implicating the role of semantic distance in creativity (Kenett, 2018a), the higher the semantic distance of participants’ responses, the more novel and less appropriate they were judged to be. Our findings are in line with recent research relating LSA semantic distance scores with responses in a divergent thinking task (Beaty & Johnson, 2020; Hass, 2017a, 2017b) and to novelty and appropriateness scores of creative output (Heinen & Johnson, 2018).

As an exploratory analysis, we examined the interaction between tDCS and semantic distance on the novelty and appropriateness ratings. This analysis was motivated by previous findings demonstrating that tDCS may affect LSA-based measures of semantic distance (Green et al., 2017) and by studies relating semantic distance to novelty and appropriateness (Heinen & Johnson, 2018). This analysis revealed a significant interaction between semantic distance and tDCS condition, such that the relation between semantic distance and appropriateness was stronger for participants in the anodal stimulation condition than for those in the sham condition. This interaction hints at the possibility that anodal stimulation changed how these participants generated more appropriate responses, perhaps by searching for close semantic neighbors of the expected completion.

A possible interpretation of this interaction effect draws on the work of Nozari and Thompson-Schill (2013), who examined cost-benefit effects of selective attention due to anodal stimulation over the left PFC. The authors found a “focusing effect” of selective attention on participants’ performance in reciting four-word “tongue-twisters,” where one of the tongue-twisters in each trial served as a target word. After anodal stimulation, participants made fewer errors for the target word (increased benefit) but more errors for the nontarget words (increased cost). The authors interpret their findings as indicating that anodal stimulation of the left PFC boosts attentional bias of selection, leading to more focused selection of the attended item, while increasing errors in unattended items (Nozari & Thompson-Schill, 2013). Under this focusing hypothesis, anodal stimulation may have led to activation of a more “focused” semantic field around the conventional Cloze response. Such a focused semantic field could increase the salience of more conventional responses, thus leading to responses judged to be more appropriate. Although such a theory is supported by findings that tDCS stimulation can alter the semantic distance between responses (Green et al., 2017), we cannot directly examine it in the current study and future research is required to do so.

Limitations and future research

So far, we have interpreted our results in light of the matched filter hypothesis on the role of cognitive control in creative thinking. This interpretation is tempered by several important limitations of our study, some of which concern tDCS in general and some of which arise from particular methodological choices in this study, which we outline below. Despite these limitations, we believe that our findings in this study further highlight the role of mediated cognitive control in creative thinking. Nonetheless, additional follow-up studies with our task are needed to replicate, extend, and address the limitations of the current study.

First, we only found a significant effect of tDCS stimulation on the subjective ratings of participants’ responses, and not on the additional measures we collected. Chrysikou et al. (2013) found a significant improvement (decrease) in RTs on a divergent thinking task in participants undergoing cathodal stimulation. We did not find any stimulation effect on participants’ RT. However, the RTs in our study included the time spent reading the sentence that was presented on the screen; therefore, we are unable to distinguish between time spent reading and response generation. Furthermore, the task used by Chrysikou et al. (2013) required generating alternative uses for visually, not verbally, presented objects (see also Mayseless & Shamay-Tsoory, 2015). Similarly, a few tDCS studies on verbal fluency did not find behavioral effects of stimulation (Ehlis, Haeussinger, Gastel, Fallgatter, & Plewnia, 2016). We also did not find any stimulation effect on the semantic distance of participants’ responses. This indicates that stimulation does not generally alter participants’ semantic space, but rather targets task-related demands. This lends further support to a potential semantic “focusing” effect due to our stimulation (Nozari & Thompson-Schill, 2013). A follow-up study is needed to directly compare the effect of tDCS stimulation on the standard divergent thinking task with our uncommon sentence completion task on these variables (RT, semantic distances). Such a study will allow replication of our findings and a more direct link to previous relevant studies (Chrysikou et al., 2013; Mayseless & Shamay-Tsoory, 2015).

A second limitation is that our task included only an uncommon completion condition, without a common completion condition. However, the uncommon condition, which requires applying divergent thinking generation capabilities, is the focus of our research, and a common sentence completion task would only serve as a baseline. To this point, Chrysikou et al. (2013) found in their study a general task effect of uncommon versus common condition and an interaction between the uncommon generation task and tDCS condition. As such, our current study provides preliminary results demonstrating how the uncommon sentence completion task can be used to study creativity. However, a follow-up study that includes both an uncommon and a common sentence completion condition is needed to replicate and extend our findings. Such a follow-up study should include standard divergent thinking tasks that will allow a more thorough comparison of the effect of anodal and cathodal tDCS stimulation over the left PFC on these tasks.

A third limitation of our study is that we used a between-subject design, with a small sample size in each group (n = 15). Many between-subject design tDCS studies have a small sample size and thus suffer from low power (Berryhill, Peterson, Jones, & Stephens, 2014; Thair et al., 2017). However, similar sample sizes to ours have been used in previous tDCS studies of creativity (Chrysikou et al., 2013; Colombo, Bartesaghi, Simonelli, & Antonietti, 2015; Green et al., 2017; Mayseless & Shamay-Tsoory, 2015). Further research is needed to replicate our findings in a within-subject design with a larger sample size. Such a within-subject design would ensure sufficient power and allow examining how additional factors (either trait or state level factors) predict the magnitude of stimulation effect at the individual level.

A fourth limitation in our study is a lack of a neural measurement of the effect of the tDCS stimulation on our uncommon sentence completion task (Chase et al., 2020). Such a neural measurement would give us direct evidence for the effect of our tDCS stimulation on the IFG and better elucidate our results. Such a neural measurement also is crucial given that our stimulation montage is based on a simulated model (Chrysikou et al., 2013), where we assume that the F7 corresponds to the left IFG. This anatomical claim is based on an assumption derived from a model simulation, and therefore should be considered tentative until additional anatomical consideration is available. Furthermore, even if the model is correct on average, it provides no estimate of individual variability, which is a further limitation of this approach (as we discuss above). A growing number of studies have applied tDCS with EEG (Jones, Johnson, & Berryhill, 2020) and MEG (Ikeda, Takahashi, Hiraishi, Saito, & Kikuchi, 2019) methods. However, tDCS-fMRI studies are still rare and developing (Esmaeilpour et al., 2019). Regardless, several current meta-analyses have consistently argued that anodal tDCS over the PFC (focusing mostly on the dorsolateral PFC) affects performance in various cognitive control tasks by impacting goal-maintenance functions (Mancuso, Ilieva, Hamilton, & Farah, 2016; Simonsmeier, Grabner, Hein, Krenz, & Schneider, 2018). Thus, evidence that tDCS over the PFC can impact executive functions (such as cognitive control) is growing (Chase et al., 2020). Overall, while we cannot directly demonstrate that our findings are a result of altered IFG functioning, we provide further support for a growing body of literature that demonstrates similar effects in relation to creative thinking and further cognitive tasks that require cognitive control (Chase et al., 2020; Weinberger et al., 2017). A follow-up combined tDCS-fMRI study with our task would allow us to address this issue directly.

Finally, we applied tDCS stimulation concurrently with the experiment. It is possible that there are differences in performance with different timing parameters, including application of tDCS prior to testing (Stagg et al., 2011). For example, Nozari et al. (2014) examined the timing (during or after) and task during stimulation (low or high demands) of cathodal stimulation on the PFC. The authors found that a high-demand task during concurrent cathodal stimulation had a systematic effect on participants’ performance on a cognitive control task (the Flanker task). Their finding may also explain our null finding for the effect of cathodal stimulation on participants’ responses, potentially highlighting differences in task demands between the standard divergent thinking task and our uncommon sentence completion task. Further research is needed to examine the time-varying and cognitive demand effects of tDCS stimulation on creative tasks.

Conclusions

In the current study, participants performed an open-ended, uncommon-ending sentence completion task while undergoing tDCS stimulation over their left lateral prefrontal cortex. We found that anodal, but not cathodal, stimulation led to noticeable differences in the types of responses participants produced. Participants undergoing excitatory anodal stimulation produced responses that were subjectively rated by an independent group to be more appropriate and less novel, given the context of the task. These results provide further empirical support for the matched filter hypothesis framework (Chrysikou et al., 2014) and shed new light on the extent to which cognitive control mechanisms can influence the generation of novel responses (Chrysikou, 2019). Although the engagement of cognitive control systems can impede creative idea generation, their contribution is imperative for maintaining task goals in working memory and in evaluating the novelty and appropriateness of the generated output. Still, to determine the nature of this relationship, it is critical to consider individual differences and task factors, as well as specific stages of the creative processes (e.g., generation vs. evaluation). Thus, complex interactions between spontaneous and regulatory systems likely guide creative performance. Extending past literature, our findings provide additional, albeit preliminary, empirical support for the role of the prefrontal cortex as a matched filter of cognitive control, contingent on task demands (Chrysikou, 2018; Chrysikou et al., 2014).