Participants
Participants (N = 193; 88 male) ranged in age from 18 to 35 years (M = 22.1 years, SD = 3.7) and were recruited from Yale University and the surrounding community. The experimental protocol was approved by the Yale University Human Subjects Committee. We discovered after testing that 3 participants had taken part in another study that had employed some of the same measures used in the present study; to avoid practice effects, these participants were excluded from analyses involving the overlapping measures. Data from 7 additional participants were excluded because they were taking psychoactive medications, were nonnative English speakers, exceeded the age limit for participation, or had participated in pilot testing for this study.
Given the exploratory nature of our work and our use of a novel creativity task, we could not estimate in advance the sample size needed to detect the effect of interest reliably. We therefore selected a large sample to ensure sufficient power to detect a significant difference between performance in the cued and uncued conditions of the verb generation task. Moreover, samples of comparable size are common in the literature on individual differences in creativity (e.g., Nusbaum & Silvia, 2011).
Measures and procedure
In addition to the cued creativity verb generation task, participants completed assessments of creativity (divergent thinking tasks, story-writing task, Torrance figural tests, latent inhibition task, Creative Achievement Questionnaire), intelligence and executive functions (Wechsler Adult Intelligence Scale, Raven’s Advanced Progressive Matrices, three-back working memory task, task-switching paradigm), and personality (Big Five Aspect Scales and NEO-PI-R Openness to Experience scale). These measures are described below. Participants were tested individually, and each testing session lasted approximately 2 h; the duration of each individual measure is specified in the relevant section below. Participants also completed two additional measures for research questions not relevant to our aims in this study: (1) a verbal four-term proportional analogical reasoning task, in which participants sought to identify valid analogical mappings between word pairs in two groups (“stem pairs” and “completion pairs”), and (2) a self-report questionnaire assessing the frequency and intensity of synesthetic experiences.
Creativity measures
Cued creativity verb generation task
On a given trial, participants were presented with a noun on a computer screen and were asked to say aloud a verb related to the noun. For nouns presented in green, participants were instructed to think creatively when generating a verb response. Participants first performed five practice trials with the following nouns: “bowl” (uncued), “comb” (uncued), “fence” (cued), “basket” (cued), and “stage” (uncued). Of the 72 nouns used in experimental trials (see the Appendix, Table 6), half (36) were shown in green (cued condition) and the other half in purple (uncued condition). Cued and uncued trials alternated in sets of two (e.g., two cued trials were followed by two uncued trials, and so on). Because we were interested in individual differences, we chose not to counterbalance the assignment of nouns to conditions (cued, uncued). Each noun can be classified in terms of the extent to which it intrinsically constrains the verb response (see Barch et al., 2000). A high-constraint noun has a single dominant verb associate (e.g., for “scissors,” almost all participants say “cut”), whereas a low-constraint noun does not (e.g., for “house,” no single verb such as “live” dominates responses). By design, the two word lists did not differ in average constraint. We used data from an independent sample to empirically assess, for each noun, the frequency of its most commonly generated verb. Two cued and two uncued items were removed from all constraint analyses because the frequency of the most commonly generated verb response for these items was equal to the median value. Among the remaining 68 nouns, the mean frequencies of the most commonly generated verb responses did not differ significantly between the cued and uncued lists (p > .32).
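To make the constraint-matching step concrete, the following sketch (in Python) illustrates one way to compare the two noun lists on the frequency of each noun’s most common verb after dropping items tied with the median. The array names and the use of an independent-samples t test are assumptions for illustration; the text above reports only that the lists did not differ (p > .32).

```python
import numpy as np
from scipy import stats

def compare_list_constraint(cued_freq, uncued_freq):
    """Compare cued vs. uncued noun lists on constraint.

    cued_freq, uncued_freq: hypothetical arrays holding, for each noun, the
    proportion of an independent norming sample that produced that noun's
    single most common verb (higher = more constrained).
    """
    cued_freq = np.asarray(cued_freq, dtype=float)
    uncued_freq = np.asarray(uncued_freq, dtype=float)

    # Drop items whose top-verb frequency equals the median, as described above.
    median = np.median(np.concatenate([cued_freq, uncued_freq]))
    cued_kept = cued_freq[cued_freq != median]
    uncued_kept = uncued_freq[uncued_freq != median]

    # An independent-samples t test is one way to check that the lists are
    # matched on mean constraint (the specific test used is an assumption here).
    t, p = stats.ttest_ind(cued_kept, uncued_kept)
    return t, p
```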
For each noun, participants were given 8 s to respond by speaking aloud into a microphone connected to a digital voice recorder (task duration: approximately 9 min). From these responses, we derived two measures of performance. The first was RT, defined as the latency from noun onset to the start of the spoken response (the verb). RTs were obtained using PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993), which recorded the time from noun onset to the onset of a vocal response (using an external button box). Participants’ responses were later transcribed from the digital voice recordings. The second measure was an index of the semantic distance of each verb from the presented noun, derived by latent semantic analysis (LSA; Landauer, Foltz, & Laham, 1998; http://lsa.colorado.edu). LSA is a method for quantifying the similarity between words (or even whole passages) on the basis of statistical analyses of a large corpus of text. We used the topic space of “general reading up to first-year college (300 factors)” and the term-to-term comparison type. Technically, this measure of semantic similarity corresponds to the cosine of the angle between vectors representing (in our usage) a noun and a verb within a given semantic space, which is derived through analyses of all of the contexts in which each word tends to be present or absent in that topic space (Landauer et al., 1998; see also Laham, 1997; Landauer & Dumais, 1997). To provide a measure of semantic distance (i.e., the inverse of semantic similarity), LSA-derived semantic similarity values were subtracted from 1 (i.e., semantic distance = 1 − semantic similarity from LSA). Thus, the higher the semantic distance value between two words, the less similar they are in semantic space. LSA values provide a highly reliable measure of noun–verb semantic distance, one with low measurement error and reasonable construct validity. We have previously used LSA to obtain a quantifiable measure of creativity in analogical reasoning, establishing a continuum of semantic distance between within-domain (less creative) and cross-domain (more creative) analogical reasoning (Green, Fugelsang, & Dunbar, 2006; Green, Fugelsang, Kraemer, & Dunbar, 2008; Green, Kraemer, Fugelsang, Gray, & Dunbar, 2010, in press).
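For readers implementing this measure, the sketch below shows the arithmetic: semantic distance is 1 minus the cosine of the angle between two term vectors. In the study the cosines came from the lsa.colorado.edu term-to-term tool; the NumPy vectors here merely stand in for LSA term vectors and are assumptions for illustration.

```python
import numpy as np

def semantic_distance(noun_vec: np.ndarray, verb_vec: np.ndarray) -> float:
    """Semantic distance = 1 - cosine similarity between two term vectors.

    In the study, similarity values were obtained from the LSA website
    (term-to-term comparison, "general reading up to first-year college",
    300 factors); these vectors simply stand in for LSA term vectors.
    """
    cosine_similarity = np.dot(noun_vec, verb_vec) / (
        np.linalg.norm(noun_vec) * np.linalg.norm(verb_vec)
    )
    return 1.0 - cosine_similarity
```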
Semantic distance is a measure not of the unusualness of the verb per se but, rather, of the unusualness of the verb in the context of the given noun; note that the noun is the same for all participants, whereas the verbs can vary. Participants’ verb responses were screened for general appropriateness, and the following types of responses were excluded from all analyses: nonverbs (1.7 % of all responses) and verb responses that were not in the LSA corpus (1.4 % of all responses). Additionally, verb forms were standardized by adding the –ing suffix (e.g., “cut” and “cuts” were both standardized to “cutting”) to ensure that responses with the same verb stem corresponded to the same semantic distance value. Semantic distance values for each noun–verb pair were calculated and then averaged within participants, separately for the cued (“creative”) and uncued conditions.
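As one illustration of this scoring step, the sketch below averages trial-level semantic distances separately over the cued and uncued trials of each participant. The long-format table and its column names are assumptions for illustration, not the study’s actual data structure.

```python
import pandas as pd

def condition_means(responses: pd.DataFrame) -> pd.DataFrame:
    """Average semantic distance per participant and condition.

    `responses` is assumed to be a long-format table with one row per trial and
    columns: participant, condition ('cued'/'uncued'), semantic_distance, and a
    boolean `excluded` flag marking nonverbs and out-of-corpus responses.
    """
    usable = responses.loc[~responses["excluded"]]
    return (
        usable
        .groupby(["participant", "condition"])["semantic_distance"]
        .mean()
        .unstack("condition")   # one row per participant; columns: cued, uncued
    )
```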
Data from 10 participants were excluded from all analyses involving semantic distance values for the verb generation task: The vocal responses of 7 participants were not recorded due to equipment failure, and the remaining 3 participants gave appropriate responses on fewer than 50 % of trials in either the cued or the uncued condition. N = 183 participants contributed to analyses of semantic distance. For RTs, an additional 8 participants were excluded because each had voice-onset RTs recorded by the button box on fewer than 50 % of trials. Thus, N = 175 participants contributed to analyses of verb generation RT.
Divergent thinking tasks
Participants were administered three divergent thinking problems (Torrance, 1966). For the first problem, participants were asked the following: “Suppose that all humans were born with six fingers on each hand instead of five. List all the consequences or implications that you can think of.” For the second problem, participants were asked to “list as many white, edible things as you can.” For the third problem, participants were asked to “list all the uses you can think of for a brick.” Participants were given 3 min for each problem (task duration: 9 min).
Two students at Yale University served as independent raters and assessed participants’ responses for flexibility, fluency, and originality. Flexibility refers to the total number of different categories that a participant used in each problem, in addition to the number of times that a participant changed the category of his or her response. Fluency refers to the total number of responses. Originality refers to the unusualness of participants’ responses relative to the responses of the other participants in the sample. Interrater reliability, assessed with intraclass correlation coefficients (ICCs) across the two ratings, was high for each dimension across the three problems: flexibility, ICCs = .91–.95; fluency, ICCs = .97–.99; and originality, ICCs = .95–.97. The average ratings were used in subsequent analyses.
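ICCs of this kind can be computed from a two-way ANOVA decomposition of the ratings matrix. The sketch below computes an average-measures, consistency-type ICC (ICC(3,k) in Shrout and Fleiss’s terms); the specific ICC variant used in the study is not stated, so this choice is an assumption.

```python
import numpy as np

def icc_average_measures(ratings: np.ndarray) -> float:
    """Average-measures, consistency-type ICC (Shrout & Fleiss ICC(3,k)).

    `ratings` is an (n_targets, k_raters) array, e.g., one row per participant
    and one column per rater for a given dimension. The ICC variant used in the
    study is not specified; this form is shown only as a sketch.
    """
    n, k = ratings.shape
    grand_mean = ratings.mean()
    target_means = ratings.mean(axis=1)
    rater_means = ratings.mean(axis=0)

    # Two-way ANOVA sums of squares.
    ss_targets = k * np.sum((target_means - grand_mean) ** 2)
    ss_raters = n * np.sum((rater_means - grand_mean) ** 2)
    ss_total = np.sum((ratings - grand_mean) ** 2)
    ss_error = ss_total - ss_targets - ss_raters

    ms_targets = ss_targets / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_targets - ms_error) / ms_targets
```

The same computation applies to the story-writing and Torrance ratings reported below, with three raters rather than two.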
Participants’ scores on each dimension were converted to z scores within each problem. These z scores were then summed across the three problems to yield separate flexibility, fluency, and originality composite scores for each participant. The sum of these composites across the three dimensions constituted the divergent thinking total score.
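A minimal sketch of this scoring scheme, assuming a wide-format table with one column per problem-by-dimension score (the column names are hypothetical):

```python
import pandas as pd

def divergent_thinking_composites(dt: pd.DataFrame) -> pd.DataFrame:
    """Sum within-sample z scores across the three problems for each dimension.

    `dt` is assumed to have one row per participant and columns such as
    'fluency_p1', 'fluency_p2', 'fluency_p3', 'flexibility_p1', ...
    (hypothetical names).
    """
    z = (dt - dt.mean()) / dt.std(ddof=0)      # z-score each column across participants
    out = pd.DataFrame(index=dt.index)
    for dim in ("flexibility", "fluency", "originality"):
        cols = [c for c in z.columns if c.startswith(dim)]
        out[dim] = z[cols].sum(axis=1)         # composite per dimension
    out["dt_total"] = out.sum(axis=1)          # divergent thinking total score
    return out
```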
Data from 9 participants were removed from all analyses involving the divergent thinking task due to missing data, failure to follow task instructions, or taking more than the allotted time on a given problem. N = 184 participants contributed data to analyses of divergent thinking.
Story-writing task
The story-writing task is a measure of creative production. Participants were instructed to write a very short story (around four sentences long) that included the following three words, presented at the top of the computer screen: “stamp,” “send,” and “letter” (task duration: approximately 5 min). Participants were instructed to use their imagination and to be creative when writing their story. Participants typed their responses into a standard text-editor program. Four participants whose story lengths were more than three standard deviations from the mean were excluded from all analyses of story-writing task performance. Data from 7 additional participants were excluded due to missing data or uninterpretable stories. A total of 182 participants contributed data to all story-writing task analyses.
Three students at Yale University served as independent raters and assessed participants’ stories on the following five dimensions: overall creativity (the extent to which the participant told a unique story that “came alive”), descriptiveness (the extent to which the participant added additional details), semantic flexibility (the manner and number of unique ways in which the participant used the three words), humor (the extent to which the participant incorporated clever, witty, and/or amusing elements into the story), and emotiveness (the extent to which the participant used words that convey emotion and shifts of emotion). Raters assessed each of these dimensions on a 7-point scale (1–7), with 1 reflecting a low rating and 7 reflecting a high rating.
Reliability was assessed as the ICC across the three ratings for each dimension. For each dimension, ICC values indicated good reliability: overall creativity (ICC = .87), descriptiveness (ICC = .87), semantic flexibility (ICC = .90), humor (ICC = .77), and emotiveness (ICC = .77). The average rating on each dimension across raters was used in all reported analyses.
The story-writing dimensions were positively correlated (r values = .27–.88). An exploratory factor analysis using principal axis factoring and direct oblimin rotation (allowing the factors to correlate) was performed on the five story-writing dimensions listed above. Only one factor with an eigenvalue greater than 1.0 was obtained, and this factor accounted for 68.1 % of the variance. Because all of the story-writing dimensions loaded on the same factor, we calculated a story-writing total score, consisting of the sum of each of the z-scored dimension scores. In all of the correlation tables, we report correlations with the story-writing total score.
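The factor-retention check and the total-score computation can be sketched as follows. The study used principal axis factoring with direct oblimin rotation; the snippet below shows only the eigenvalue (Kaiser) criterion applied to the inter-dimension correlation matrix, plus the sum of z-scored dimensions, with a hypothetical input array.

```python
import numpy as np
from scipy import stats

def story_writing_summary(dims: np.ndarray):
    """`dims` is assumed to be an (n_participants x 5) array of the five
    story-writing dimension scores, averaged across raters."""
    # Kaiser criterion: how many eigenvalues of the correlation matrix exceed 1.0?
    # (Only the first step of the principal-axis analysis reported in the text.)
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(dims, rowvar=False))
    n_factors_retained = int(np.sum(eigenvalues > 1.0))

    # Story-writing total score: sum of the z-scored dimensions.
    total = stats.zscore(dims, axis=0).sum(axis=1)
    return n_factors_retained, total
```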
Although participants were instructed to write very short stories (around four sentences long), there was a great deal of variability in story length (M = 71.7 words, SD = 25.8; range: 19–155 words). Additionally, story word count correlated strongly with story-writing total scores, r(180) = .65. To ensure that correlations with story-writing performance did not merely reflect story length, all correlations reported in Tables 2, 3, 4, and 5 are partial correlations controlling for word count. Because the zero-order and partial correlations were very similar, and to ease interpretability, all figures use the raw scores.
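These partial correlations can be obtained by residualizing both variables on word count and correlating the residuals, as sketched below; the function and variable names are hypothetical.

```python
import numpy as np
from scipy import stats

def partial_correlation(x, y, covariate):
    """Partial correlation between x and y controlling for a single covariate,
    computed by correlating the residuals of each variable regressed on the
    covariate (here, the covariate would be story word count)."""
    x, y, covariate = (np.asarray(v, dtype=float) for v in (x, y, covariate))

    def residualize(v):
        slope, intercept, *_ = stats.linregress(covariate, v)
        return v - (intercept + slope * covariate)

    r, _ = stats.pearsonr(residualize(x), residualize(y))
    # Note: a significance test for a partial r should use n - 3 degrees of
    # freedom, one fewer than the ordinary Pearson test on the residuals.
    return r
```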
Abbreviated Torrance Test for Adults (ATTA): figural tests
Participants completed Activities 2 and 3 of the Abbreviated Torrance Test for Adults (ATTA), a shortened version of the Torrance Tests of Creative Thinking. Activities 2 and 3 are the figural tests of the ATTA and assess nonverbal creative abilities. For Activity 2, participants were asked to do the following: “Use the incomplete figures below to make some pictures. Try to make your pictures unusual. Your pictures should communicate as interesting and as complete a story as possible. Be sure to give each picture a title.” For Activity 3, participants were given nine triangles on a sheet of paper and were asked to do the following: “See how many objects or pictures you can make from the triangles below, just as you did with the incomplete figures. Remember to create titles for your pictures.” Participants were given 3 min to complete each Activity (task duration: 6 min). A total of 193 participants contributed data to all Torrance analyses.
Three students at Yale University served as independent raters and assessed participants’ designs on two sets of measures: norm-referenced and criterion-referenced creativity indicators. The norm-referenced measures consisted of the following dimensions: fluency (the total number of responses for each problem), originality (the unusualness of participants’ responses), and elaboration (the number of elaborative details that participants added to their designs). Activity 3 was also rated on the norm-referenced measure of flexibility (the number of different ways in which participants used the triangles). The criterion-referenced creativity indicators consisted of the following dimensions: “openness and resistance to premature closure,” “unusual visualization, different perspective,” “movement and/or sound,” “richness and/or colorfulness of imagery,” “abstractness of titles,” “articulateness in telling story,” “combination/synthesis of two or more figures,” “internal visual perspective,” “expressions of feelings and emotions,” and “fantasy.” Raters assessed each criterion-referenced measure on a 3-point scale (0–2), with a rating of 0 reflecting a low rating and 2 reflecting a high rating.
Reliability was assessed as the ICC across the three ratings for each norm-referenced measure and for the sum of criterion-referenced measure scores. ICC values for the average of ratings across the three raters indicated adequate to good interrater reliability for fluency (Activity 2, ICC = .92; Activity 3, ICC = .96), originality (Activity 2, ICC = .66; Activity 3, ICC = .57), elaboration (Activity 2, ICC = .91; Activity 3, ICC = .92), flexibility (Activity 3, ICC = .86), and the sum of criterion-referenced measures (ICC = .84). The average of ratings across the three raters for each dimension was used in all reported analyses.
For each Activity, participants’ scores on each norm-referenced dimension were converted to z scores. The sum of participants’ scores across all criterion-referenced measures for Activities 2 and 3 was then added to the sum of z scores across all norm-referenced measures to create a Torrance total score. In all of the correlation tables, we report correlations with the Torrance total score for ease of interpretability.
Latent inhibition task
Participants completed a latent inhibition (LI) task (task duration: 7 min). LI assesses the extent to which participants experience difficulty in learning to associate a preexposed, formerly irrelevant, stimulus with an outcome. In between-subjects versions of the task, participants who are preexposed to a stimulus tend to require more time to learn the association, as compared with participants who have not been preexposed. The task measures the difficulty participants have in this form of reversal learning. The LI task was included as a putative measure of creativity based largely on Eysenck’s theory of creative achievement (Eysenck, 1993, 1995), which points to reduced LI as a marker of the overlap between high creative achievement and schizotypal personality. Specifically, highly creative individuals are predicted to show a tendency toward attentional “overinclusiveness” of stimuli that others would ignore, just as schizotypal personality is characterized by an inability to exclude irrelevant stimuli from attention (Gray, Feldon, Rawlins, Hemsley, & Smith, 1991; Gray, Hemsley, & Gray, 1992). LI has been negatively associated with the “Big Five” (Costa & McCrae, 1992) personality trait of openness to experience (Carson, Peterson, & Higgins, 2003; Peterson & Carson, 2000; Peterson, Smith, & Carson, 2002), and openness to experience has been found to positively correlate with trait creativity (e.g., McCrae, 1987). However, the association between LI and openness to experience has been inconsistent (Wuthrich & Bates, 2001), and at least one prior report did not find a predicted negative association between LI and trait creativity (Burch, Hemsley, Pavelis, & Corr, 2006).
We used a within-participants version of the LI task (Evans, Gray, & Snowden, 2007). The LI effect was calculated as the difference in mean RTs for preexposed and nonpreexposed stimuli, with a positive difference score indicating the presence of LI. As described below, a more robust LI effect was found for the RT data than for the accuracy data; we therefore focus on the RT measures for all correlational analyses. To control for processing speed, a regression was performed in which RTs from preexposed stimuli were regressed on RTs from nonpreexposed stimuli. The residuals from this regression were used as the measure of LI.
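The sketch below illustrates both the raw difference score and the speed-controlled residual measure; the array names are hypothetical. The same residualization logic is used later for the task-switching measure, where switch-trial RTs are regressed on repeat-trial RTs.

```python
import numpy as np
from scipy import stats

def latent_inhibition_scores(rt_preexposed: np.ndarray, rt_nonpreexposed: np.ndarray):
    """rt_preexposed / rt_nonpreexposed: each participant's mean RT (hypothetical arrays).

    Returns the raw LI difference score and the residualized LI measure used in
    the correlational analyses (a sketch of the regression described in the text).
    """
    li_difference = rt_preexposed - rt_nonpreexposed   # positive values indicate LI

    # Regress preexposed RTs on nonpreexposed RTs; keep the residuals as the
    # processing-speed-controlled LI measure.
    slope, intercept, *_ = stats.linregress(rt_nonpreexposed, rt_preexposed)
    li_residual = rt_preexposed - (intercept + slope * rt_nonpreexposed)
    return li_difference, li_residual
```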
The same exclusionary criteria as in Evans et al. (2007) were employed: Participants who had more than 7 errors of omission or more than 14 errors of commission were excluded from all analyses involving the LI task. Using these criteria, data from 15 participants were excluded from all LI analyses. Two additional participants were excluded from all LI analyses due to missing data. A total of 176 participants contributed data to all LI analyses.
Creative achievement questionnaire
The Creative Achievement Questionnaire (CAQ; Carson, Peterson, & Higgins, 2005) consists of 80 questions, 8 in each of the following domains: visual arts, music, dance, architectural design, creative writing, humor, inventions, scientific discovery, theater and film, and culinary arts (task duration: approximately 5 min).
CAQ total scores (summed across all 10 domains) and individual domain scores were log-transformed to better approximate a normal distribution, and all analyses involving the CAQ employ these log-transformed scores. Data from 3 participants were excluded from all analyses involving the CAQ due to missing data or failure to comply with task instructions, leaving 190 participants.
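A minimal sketch of the transformation, assuming an n × 10 array of raw domain scores; the +1 offset before taking the log is an assumption added here so that zero scores remain defined (the text states only that scores were log-transformed).

```python
import numpy as np

def log_caq(caq_domain_scores: np.ndarray):
    """caq_domain_scores: hypothetical (n_participants x 10) array of raw CAQ domain scores.

    The +1 offset is an assumption to handle domain scores of zero; the text
    specifies only that total and domain scores were log-transformed."""
    log_domains = np.log(caq_domain_scores + 1)
    log_total = np.log(caq_domain_scores.sum(axis=1) + 1)
    return log_domains, log_total
```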
Intelligence and executive function measures
Wechsler adult intelligence scale
Participants were administered the following subtests of the Wechsler Adult Intelligence Scale (WAIS): vocabulary, similarities, block design, and matrix reasoning. Scores for each subtest were converted to scaled scores and summed to yield the following scores: total (sum across all four subtests), verbal (sum of vocabulary and similarities subtests), and performance (sum of block design and matrix reasoning subtests; task duration: approximately 50 min).
The sums of scaled scores for the WAIS total, verbal, and performance measures were then converted to Wechsler Deviation Quotients (DQs) using the conversion table provided in Tellegen and Briggs (1967). A total of 190 participants contributed data to all WAIS analyses.
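For reference, the conversion table in Tellegen and Briggs (1967) is built on a composite-score formula that rescales a sum of scaled scores (each with mean 10 and SD 3) to an IQ-style metric (mean 100, SD 15). The sketch below shows that formula; the subtest intercorrelation term must be supplied from published norms and is a placeholder here, so treat this as an illustration rather than the exact table lookup used in the study.

```python
import math

def deviation_quotient(sum_scaled: float, n_subtests: int, sum_intercorrelations: float) -> float:
    """Approximate a Wechsler Deviation Quotient from a sum of scaled scores.

    sum_scaled: sum of subtest scaled scores (each scaled score: mean 10, SD 3)
    n_subtests: number of subtests in the composite (2 or 4 here)
    sum_intercorrelations: sum of all pairwise subtest intercorrelations,
        taken from published norms (placeholder input in this sketch).
    """
    composite_mean = 10 * n_subtests
    composite_sd = 3 * math.sqrt(n_subtests + 2 * sum_intercorrelations)
    return 100 + 15 * (sum_scaled - composite_mean) / composite_sd
```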
Raven’s advanced progressive matrices
Participants completed selected questions from the Raven’s Advanced Progressive Matrices (RAPM; Raven, Raven, & Court, 1998), which serves as a measure of general fluid intelligence (gF; task duration: 15 min). Participants were administered 12 items selected from Set II, each of which required the participant to identify the answer option (out of eight provided options) that correctly completed a given pattern.
Participants’ accuracy on the RAPM served as the measure of performance for this task. Data from 2 participants were excluded from all analyses involving the RAPM due to at-chance levels of performance on this task. A total of 191 participants contributed data to all RAPM analyses.
Three-back verbal working memory task
Participants completed a three-back working memory task in which they were presented with words in a serial fashion (task duration: approximately 7 min). Participants were instructed to make a response when a presented word was the same word that had been presented three stimuli ago. Participants’ accuracy (percent correct), d′, and mean RT (only for correct trials) served as measures of performance on this task.
Data from 8 participants were removed from all analyses involving the three-back task due to performance at chance levels or below (e.g., negative d′ values). A total of 184 participants contributed data to all three-back task analyses.
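Among the performance measures, d′ places hits and false alarms on a common sensitivity scale. The sketch below applies a log-linear correction so that hit or false-alarm rates of 0 or 1 remain finite; whether and how the study corrected extreme rates is not stated, so that correction is an assumption.

```python
from scipy.stats import norm

def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """d' = z(hit rate) - z(false-alarm rate) for the three-back task.

    A log-linear correction (+0.5 to each cell) keeps rates of 0 or 1 finite;
    this correction is an assumption, since the text does not specify one."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    false_alarm_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)
```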
Task-switching paradigm
Participants completed a number–letter task (adapted from Rogers & Monsell, 1995), designed to assess participants’ ability to switch between different task sets (task duration: approximately 8 min). Both a letter and a number were presented on each trial. If the letter and number appeared in blue, participants were instructed to make a consonant/vowel judgment for the letter by pressing one of two labeled keys. If the letter and number appeared in orange, participants were instructed to make an odd/even judgment for the number by pressing one of two labeled keys.
To control for processing speed, a regression was performed in which RTs from switch trials were regressed on RTs from no-switch (repeat) trials, and the residuals from this regression were used as the measure of task-switching difficulty. Data from 3 participants were excluded from all analyses involving the task-switching paradigm because their RT difference scores for switch and no-switch trials (i.e., their RT switch costs) were more than three standard deviations from the mean RT switch cost across participants. Two additional participants were excluded from all task-switching analyses due to missing data. A total of 188 participants contributed data to all task-switching analyses.
Personality measures
Big five aspect scales
Participants completed the Big Five Aspect Scales (BFAS) personality questionnaire (DeYoung, Quilty, & Peterson, 2007; task duration: approximately 6 min). The BFAS assesses two aspects of each Big Five personality domain: neuroticism (volatility, withdrawal), agreeableness (compassion, politeness), conscientiousness (industriousness, orderliness), extraversion (enthusiasm, assertiveness), and openness/intellect (openness, intellect). The BFAS includes a total of 100 items, with 10 items per aspect. A total of 190 participants contributed data to all BFAS analyses.
NEO openness to experience scale
Participants completed the NEO Openness to Experience scale (task duration: approximately 4 min). This scale assesses the following six facets of the Revised NEO Personality Inventory (NEO-PI-R) personality trait of openness (Costa & McCrae, 1992), with eight items per facet: actions (“openness to new experiences on a practical level”), aesthetics (“appreciation of art and beauty”), fantasy (“receptivity to the inner world of imagination”), feelings (“openness to inner feelings and emotions”), ideas (“intellectual curiosity”), and values (“readiness to reexamine own values and those of authority figures”). A total of 193 participants contributed data to all analyses involving the NEO Openness to Experience scale.