Eliciting false insights with semantic priming

Grimmer, Hilary; Laukkonen, Ruben; Tangen, Jason; von Hippel, William

doi:10.3758/s13423-021-02049-x

Brief Report
Open access
Published: 02 February 2022

Volume 29, pages 954–970, (2022)

Psychonomic Bulletin & Review

Hilary Grimmer ORCID: orcid.org/0000-0001-8081-3631¹,
Ruben Laukkonen²,
Jason Tangen¹ &
…
William von Hippel¹

Abstract

The insight experience (or ‘Aha moment’) generally evokes strong feelings of certainty and confidence. An ‘Aha’ experience for a false idea could underlie many false beliefs and delusions. However, for as long as insight experiences have been studied, false insights have remained difficult to elicit experimentally. That difficulty, in turn, highlights the fact that we know little about what causes people to experience a false insight. Across two experiments (total N = 300), we developed and tested a new paradigm to elicit false insights. In Experiment 1 we used a combination of semantic priming and visual similarity to elicit feelings of insight for incorrect solutions to anagrams. These false insights were relatively common but were experienced as weaker than correct ones. In Experiment 2 we replicated the findings of Experiment 1 and found that semantic priming and visual similarity interacted to produce false insights. These studies highlight the importance of misleading semantic processing and the feasibility of the solution in the generation of false insights.

Introduction

The ‘Aha’ experience is not only exciting, it is also informative; people’s self-reported insights consistently signal the accuracy of their solutions (Danek et al., 2016; Danek & Wiley, 2017; Hedne et al., 2016; Salvi et al., 2016; Webb et al., 2016, 2018). Despite the strength and reliability of this relationship, the feeling of insight does not guarantee that a solution will be correct. Indeed, people have experienced ‘Aha’ moments for incorrect solutions (Danek et al., 2016; Danek & Wiley, 2017; Valueva et al., 2016; Webb et al., 2016). These so-called false insights are difficult to investigate because they have not been evoked experimentally. As a consequence, little is known about their causes. In this paper, we introduce a new experimental paradigm to induce false insights and explore their origins.

Insight moments are important for several reasons: they mark important achievements (Irvine, 2015; Ovington et al., 2018), they are highly memorable (Danek & Wiley, 2020), and they facilitate learning (Kizilirmak et al., 2016). Research has unraveled cognitive processes that underlie insights (Ohlsson, 1984) and more recently has developed phenomenological measures of the insight experience that enable its investigation on a case-by-case basis (Bowden & Jung-Beeman, 2007). These studies reveal that ‘Aha’ moments are accompanied by strong feelings of surprise, positive affect, and certainty (Aziz-Zadeh et al., 2009; Bowden et al., 2005; Danek et al., 2014b; Kounios & Beeman, 2009; Subramaniam et al., 2008). Perhaps most importantly, the heightened confidence associated with an insight experience gives problem-solvers the impression that they have discovered something objectively true (Danek et al., 2014a, 2020; Metcalfe & Wiebe, 1987; Topolinski & Reber, 2010).

Many famous ‘Aha’ moments took considerable time to prove they were accurate, yet problem-solvers often describe a sense of spontaneous certainty without clear evidence. For example, mathematician Yitang Zhang took months to prove his solution to the twin prime conjecture, yet described his moment of insight by saying, “I immediately realized that it would work” (Klarreich, 2013). Experimental evidence that insight moments enhance certainty can be found in research by Laukkonen et al. (2020), who showed that people were more likely to judge statements as true when the statement contained an irrelevant anagram that elicited an ‘Aha’ moment. In a similar paradigm, Dougal and Schooler (2007) also found that successfully solved anagrams were recalled more frequently on a subsequent memory task than solutions to unsolved anagrams, and that this effect diminished when a delay between anagram solving and the memory task was introduced. These findings suggest that insight phenomenology is so closely tied to our judgments of truth that we can misattribute the feelings of certainty to a temporally contiguous, yet conceptually irrelevant, stimulus and mistake feelings of solving for feelings of remembering.

There are several theoretical accounts about why ‘Aha’ moments tend to be correct (Danek & Salvi, 2020; Laukkonen et al., 2018; Salvi et al., 2016), but underlying all of them is the idea that people can feel that they have suddenly solved a problem after experiencing an impasse. As with other feelings, the feeling of insight may not always be accurate, but few studies have directly addressed false insights. One of the only studies comparing false and true insights was conducted by Danek and Wiley (2017), who asked participants to figure out a series of magic tricks. Participants rated any ‘Aha’ experiences in terms of how strong their feelings of surprise, pleasure, satisfaction, and confidence were. The authors found that false ‘Aha’ moments, although uncommon, were rated lower on surprise, pleasure, satisfaction, and confidence than true ‘Aha’ moments.

Early research on insight moments only examined correctly solved problems and distinguished between those solved with and without an ‘Aha’ experience (e.g., Danek et al., 2016; Jung-Beeman et al., 2004; Webb et al., 2016). This practice made it difficult to demonstrate how frequent false insights are, and how they differ from correct insights. By increasing the rates of false solutions, we could also increase the chances for false insights to occur, allowing us to investigate the relationship between ‘Aha’ intensity and accuracy in an experimentally valid and efficient manner, providing a window into their origins and offering information about the processes that generate them.

Although false insights have never been generated through experimental manipulation, there is an analogous – and potentially informative – line of experiments on the creation of false memories (Gallo, 2010). The most famous example is the semantic priming paradigm re-introduced by Roediger and McDermott (1995; see also Deese, 1959). In this paradigm (known as the DRM paradigm), participants are given a list of study words, all of which are related to the same semantic category (e.g., bed, rest, tired, dream), and are then tested for their memory of the study list. Critically, the memory test contains one word that was not present on the study list but is closely related to the semantic category (e.g., sleep). Roediger and McDermott (1995) found that people falsely (and confidently) remembered the related target word as having been presented.

Semantic priming has been widely used across a number of tasks and settings. For example, people are faster to solve anagrams related to semantically primed compared to unprimed categories (Schuberth et al., 1979; White, 1988). In combination, these lines of research suggest that semantic priming can lead people toward both correct and incorrect solutions. Because semantic priming makes certain words more accessible, we reasoned that priming could also make people more likely to mistakenly solve an anagram with a semantically primed associate. Thus, we predicted that solving anagrams after being primed with misleading semantic information could lead participants to have ‘Aha’ experiences for anagram solutions that are objectively incorrect (i.e., elicit false insights). The goal of the current research was to test this possibility, and thereby obtain a better understanding of the mechanisms underlying false insights.

Experiment 1

In Experiment 1 we elicited false insights by priming participants with a list of semantically related words and then presenting a series of four anagrams each relating to the study list in a different way. The anagrams were either made from words (1) chosen at random, (2) presented on the study list, (3) not presented but semantically associated with the list, or (4) visually similar (differing by one or two letters) to an unpresented but semantically associated word. We predicted that people would be lured into having more false insights when solving this final category of visually misleading anagrams that resemble a primed concept compared to the other kinds of anagrams. We also predicted that the phenomenological intensity of false insights would be lower than correct insights, regardless of the type of anagram that led to them. Finally, we expected that participants who experienced more false insights for the deceptive lure anagrams would also be more likely to falsely remember these incorrect solutions as having appeared on the study list.

Method

Open practice statement

This experiment is preregistered on the Open Science Framework. The data, materials, video instructions, experimental design, exclusion criteria, and analysis scripts are available at: https://osf.io/nu3mr/?view_only=c09eedcf8c4545b9a834be405fee90ec

Participants

One hundred and fifty undergraduate psychology students (99 females, mean age = 22.35 years) from The University of X took part in the experiment and were awarded partial course credit for their time. Based on Danek and Wiley (2017), we anticipated a moderate effect size, and established that 150 participants would provide sufficient sensitivity (power = .84) to detect an effect size of d = 0.45.

Design and materials

We generated pairs of similar-looking words (i.e., words of a similar length that share most of their letters), and we then generated lists of ten associated words for each word in the pair. For example, the word pair GARDENER and ENDANGER share most of their letters and are the same length. We then created a list of ten words that were semantically associated with the word GARDENER (e.g., FLOWERPOT, SHOVEL, SEEDLING, etc.) and ten words associated with the word ENDANGER (e.g., HAZARD, THREATEN, RISK). Through this process, we generated six pairs of similar-looking words along with ten semantically associated words for each word in the pair, resulting in six pairs of word lists. One list from each pair and its associated anagrams were put into two counterbalanced versions of the experiment. For example, half of the participants saw the words that primed gardener. This process allowed us to eliminate any effects that might be a function of the specific stimuli rather than the combination of the primes and visual similarity. We randomly allocated half of the participants to each of the two counterbalanced versions of the stimuli. Participants thus read one list from each pair (see Fig. 1a) and were then presented with four different anagrams (see Fig. 1b). These anagrams each served a different purpose in terms of our hypotheses. One anagram was a scrambled word from the priming list, which we refer to as the presented target.^{Footnote 1} Another anagram, the primed target, was not presented on the list, but was semantically associated with the words from the list. The critical anagram, which we called the primed lure, was visually similar to a word that was semantically related to the studied list of words, but was in fact an anagram for a semantically unrelated word that was not presented in the priming list. Finally, we included a random word as a control item that was neither primed nor semantically related (see Fig. 1b).
This experiment thus followed a mixed design, with counterbalancing condition as a between-subjects factor, and anagram type as a within-subjects factor.

After we made these lists for all our original word pairs, we used the word-frequency database SUBTLEX-UK (van Heuven, Mandera, Keuleers, & Brysbaert, 2014) to ensure that our words were common enough to assume that participants would be familiar with them. This database of 160,022 words was created by collecting the subtitles from nine British TV channels over a 3-year period and assigning each word a Zipf value to indicate its relative frequency. The Zipf scores can range from 1 (very low frequency) to 6 (very high frequency words). We obtained the Zipf scores for each word on the study lists, with the goal of using words with a value greater than 3 (which van Heuven et al., 2014, propose as the tipping point from low- to high-frequency words). For the control word in the anagram task, we averaged the Zipf scores of the three chosen anagram words, and using that average, we selected a random word of the same length with a Zipf score equal to that average.
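The word-frequency screening and control-word selection described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the commonly used form of the Zipf scale (log10 of frequency per million words, plus 3), the function names and the candidate words with their Zipf values are hypothetical, and the control word is taken as the same-length candidate whose Zipf value is closest to the average rather than exactly equal to it.

```python
import math


def zipf_from_per_million(freq_per_million):
    """Zipf value, assuming Zipf = log10(frequency per million words) + 3."""
    if freq_per_million <= 0:
        raise ValueError("frequency must be positive")
    return math.log10(freq_per_million) + 3.0


def pick_control_word(anagram_zipfs, candidates, length):
    """Pick a control word matched on length and average Zipf value.

    anagram_zipfs: Zipf values of the three chosen anagram words.
    candidates: dict mapping candidate word -> Zipf value (hypothetical data).
    length: required word length.
    Returns the same-length candidate closest to the average Zipf value
    (an assumption: an exact match may not exist in a real database).
    """
    target = sum(anagram_zipfs) / len(anagram_zipfs)
    same_length = [(w, z) for w, z in candidates.items() if len(w) == length]
    if not same_length:
        raise ValueError("no candidate of the requested length")
    return min(same_length, key=lambda wz: abs(wz[1] - target))[0]


# Hypothetical usage: three anagram words and a small candidate pool.
control = pick_control_word(
    [4.0, 4.5, 5.0],
    {"MOUNTAIN": 4.6, "ELEPHANT": 4.1, "CAT": 4.5},
    length=8,
)
```

Under this definition, a word occurring once per million words has a Zipf value of 3, the threshold the authors use for study-list words.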

To generate anagrams that looked optimally similar to the intended solution, we used MATLAB to generate every possible scrambled configuration of our four anagram word pairs, ranging from most similar to least similar to the intended solution, and computed the cosine similarity between the pixels of each scrambled word and those of the intended solution (see Vokey & Jamieson, 2014). The cosine value of two images indicates how close they are in multidimensional image space and thus how visually similar they are to one another (see OSF for MATLAB script). For the primed lures, cosine values closer to 1 suggest that the scrambled word is visually similar to the intended solution – not the actual solution. For the other anagram types, cosine values closer to 1 suggest that they are visually similar to their correct unscrambled solution. Thus, we created the primed lures by entering the intended solution as the target and scrambling the lure (the unrelated but visually similar word) to resemble the intended solution at the ideal level of similarity. The three control anagrams were simply created by scrambling the word itself – the actual solution – to the ideal level of visual similarity. After informal pretesting, we chose anagrams with cosine values of 0.85 to ensure the anagrams were not too dissimilar from their intended solution (and therefore unsolvable), but not so similar that participants might solve them without feelings of impasse and subsequent ‘Aha’ experiences upon resolution. The experiment was programmed using LiveCode and presented to individual participants on laptops.
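The pixel-based similarity measure can be illustrated with a short sketch. The authors' MATLAB script is on the OSF; the Python below is not that script. It assumes each rendered word image has already been flattened into a list of pixel intensities of equal length, and the function names and tiny four-pixel "images" are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length pixel vectors."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for an all-zero image")
    return dot / (norm_a * norm_b)


def closest_to_goal(target_pixels, scrambles, goal=0.85):
    """Return the name of the scramble whose similarity to the target
    image is nearest to the goal value (0.85 in the paper)."""
    return min(
        scrambles,
        key=lambda name: abs(cosine_similarity(target_pixels, scrambles[name]) - goal),
    )


# Hypothetical toy data: a 4-pixel target and three candidate scrambles.
target = [1, 1, 1, 1]
candidates = {
    "scramble_a": [1, 1, 1, 1],  # identical image, cosine = 1
    "scramble_b": [1, 1, 1, 0],  # cosine about 0.87
    "scramble_c": [1, 0, 0, 0],  # cosine = 0.5
}
best = closest_to_goal(target, candidates)
```

For a primed lure, `target_pixels` would be the image of the intended (primed) solution and the scrambles would be arrangements of the lure's letters. For the other anagram types, the target would be the image of the word's own correct solution.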

Measures and procedure

Testing took place in a room with four laptops. After obtaining verbal consent, each participant sat at a computer and played an instruction video that explained how each trial of the experiment would be conducted. The instructions stated that the task was to remember as many words from the study list as possible and recall them after performing an anagram task. Each trial began with participants studying a list of ten semantically associated words, which were presented one at a time on the screen and spoken aloud by the computer voice through the headphones. After the list was completed, participants were instructed to press the spacebar once they were ready to solve the anagrams. The four anagrams described above were then presented in a random order, and participants were told to press the spacebar once they had thought of a solution. There was no time limit for solving the anagrams, but participants were encouraged to work quickly and attempt every anagram. Participants’ reaction time was also recorded in milliseconds for each trial. The full transcript of these instructions is provided in Appendix 1. Upon pressing the spacebar, the anagram disappeared from the screen, and participants were instructed to type their solution into a box on the screen (see Fig. 1c).

After entering each anagram solution, participants were prompted to indicate whether they experienced an ‘Aha’ moment or not (Laukkonen & Tangen, 2018). If participants reported having an ‘Aha’ moment, they were asked to rate the intensity of their ‘Aha’ experience on a scale from 1 (‘very weak’) to 10 (‘very strong’). After solving all four anagrams, they were prompted to type all the words they could recall from the study list before proceeding to the next trial. The memory task was used to investigate whether participants who had false insights for the primed lures also falsely remembered these lures as appearing on the study list. False memories were thus recorded when participants included the incorrect solution primed by the lure (i.e., the primed lure) on their recall list at the end of each trial. This process was repeated six times. For additional measures and analyses not included in this paper, see the Online Supplementary Materials (OSM).

Results

The average solution time and correct solution rates for each anagram type and counterbalancing condition are presented in Table 1. We computed the proportion of all trials for each anagram type with reported false insights as the number of incorrect anagram solutions accompanied by an ‘Aha’ moment divided by the number of trials for each anagram type. The ‘raincloud’ plots in Fig. 2 depict the proportion of trials with false insights across the four anagram types, combining boxplots, raw jittered data, and a split-half violin. This figure shows that the primed lure anagrams produced the highest rates of false insights out of the four anagram types.

Table 1 Mean reaction time in seconds, proportion of correctly solved trials, and proportion of trials with false insights reported for each anagram type in each counterbalancing condition
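The false-insight proportion defined in this section (incorrect solutions accompanied by an ‘Aha’ report, divided by all trials of that anagram type) can be computed as in the sketch below. The trial record layout, field names, and example data are assumptions made for illustration and are not taken from the authors' analysis scripts.

```python
from collections import defaultdict


def false_insight_proportions(trials):
    """Proportion of trials with a false insight, per anagram type.

    Each trial is a dict with hypothetical keys:
      "anagram_type": str, "correct": bool, "aha": bool
    A false insight is an incorrect solution reported with an 'Aha' moment.
    """
    totals = defaultdict(int)
    false_insights = defaultdict(int)
    for trial in trials:
        kind = trial["anagram_type"]
        totals[kind] += 1
        if trial["aha"] and not trial["correct"]:
            false_insights[kind] += 1
    return {kind: false_insights[kind] / totals[kind] for kind in totals}


# Hypothetical data for one participant.
example_trials = [
    {"anagram_type": "primed_lure", "correct": False, "aha": True},
    {"anagram_type": "primed_lure", "correct": False, "aha": True},
    {"anagram_type": "primed_lure", "correct": True, "aha": True},
    {"anagram_type": "random", "correct": True, "aha": False},
    {"anagram_type": "random", "correct": False, "aha": False},
]
proportions = false_insight_proportions(example_trials)
```

Note that the denominator is all trials of a given type, not only incorrect trials or only trials with an ‘Aha’ report, which matches the definition given in the text.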

To test our prediction that the primed lures would elicit more false insights than the other three anagram types, we ran a mixed ANOVA on the proportion of trials with false insights for each anagram type with counterbalancing condition as a between-subjects factor.^{Footnote 2} This analysis revealed no significant difference between the two counterbalancing conditions, suggesting that the effect of anagram type was the same for both sets of stimuli, F(1,148) = 0.04, p = .943, η²_G < .001. As predicted, a significant difference between the number of false insights elicited by each anagram type emerged, F(3,444) = 171.52, p < .001, η²_G = .39. We tested our planned comparisons using post hoc Tukey t-tests. As predicted, these revealed that the primed lure anagrams (M = 0.37, SD = 0.22) elicited significantly more false insights than the presented target (M = 0.07, SD = 0.11; t(444) = 19.17, p < .001, d = -1.73, CI = 0.26, 0.34), the primed target (M = 0.09, SD = 0.15; t(444) = 17.83, p < .001, d = -1.49, CI = 0.24, 0.32), and the random anagrams (M = 0.08, SD = 0.13; t(444) = 18.47, p < .001, d = -1.61, CI = 0.25, 0.33).

To compare the intensity of true and false insights, we looked at only trials on which an insight moment was reported, and scored them as either correct or incorrect. We then selected participants who reported at least one false and one correct insight (N = 142) and computed the mean intensity ratings given to false and correct insights for each participant across all anagram types. Because we were interested in the phenomenological difference between false and correct insights, we included all false insights in this analysis, regardless of the type of anagrams on which they occurred – although the majority were for primed lure anagrams (60.35%). A paired t-test revealed that false insights were rated as significantly less intense (M = 5.81, SD = 1.93) than correct insights (M = 6.12, SD = 1.74), t(141) = 2.57, p = .011, d = -0.22, CI = 0.07, 0.54. The correlation between accuracy and insight intensity was also significant, r = .42, p < .001, CI = .28, .54. To test whether false insights predicted false memories, we examined the correlation between participants’ total false insights for primed lures and their total number of false memories for primed lures in the recall task. We considered only primed lures for this analysis to ensure the opportunities for false memory as we defined above (primed lures being reported on the recall list) matched the opportunities for false insights. This relationship was positive and significant, r = .18, p = .029, CI = .02, .33, such that participants who experienced more false insights in the primed lure condition also falsely recalled more primed lures.
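The paired comparison of intensity ratings can be sketched as below. This is an illustrative calculation, not the authors' analysis code: it assumes one mean intensity rating per participant for false insights and one for correct insights, and it assumes the effect size is the standardized mean difference of the paired scores (often written d_z), which the paper does not state explicitly. The function name and the example ratings are hypothetical.

```python
import math
import statistics


def paired_t(false_means, correct_means):
    """Paired t statistic, degrees of freedom, and d_z for two matched lists.

    Differences are taken as false minus correct, so a negative value
    means false insights were rated as less intense.
    """
    if len(false_means) != len(correct_means):
        raise ValueError("each participant needs both ratings")
    diffs = [f - c for f, c in zip(false_means, correct_means)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    d_z = mean_diff / sd_diff
    t = d_z * math.sqrt(n)  # same as mean_diff / (sd_diff / sqrt(n))
    return t, n - 1, d_z


# Hypothetical per-participant mean intensity ratings (1 to 10 scale).
t_stat, df, d_z = paired_t([5, 6, 7, 4], [6, 8, 7, 7])
```

Under this assumption t = d_z × √n, so the reported t(141) = 2.57 with N = 142 corresponds to an effect size of about 2.57 / √142 ≈ 0.22 in magnitude, which agrees with the reported d.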

Experiment 2

The goal of Experiment 2 was to replicate the findings of Experiment 1 and assess the degree to which the false insight effect in Experiment 1 was driven by either the semantic priming or the misleading visual configuration of the anagrams. We predicted that participants who saw both semantic priming and visually similar anagrams would experience the highest proportion of false insights (as in Experiment 1), followed by participants exposed to semantic priming and given randomly scrambled anagrams, followed by participants who were not semantically primed but were given visually similar anagrams. Thus, we expected that the false insight effect documented in this experiment would be driven more by semantic priming than visual similarity. Finally, we expected that participants would again report lower subjective intensity for false versus correct insights. We did not pursue the relationship between false insights and false memories in this experiment as the aim of this study was to understand the driving factors of the false insight effect.