Neuroscience is increasingly in the public eye (Illes, Kirschen, & Gabrieli, 2003; Racine, Bar-Ilan, & Illes, 2006), and the complexities of neuroimaging raise several challenges for the dissemination of results (Farah, 2009; Illes et al., 2010; Morein-Zamir & Sahakian, 2010; Racine, Bar-Ilan, & Illes, 2005; Roskies, 2008; Singh, Hallmayer, & Illes, 2007). Brain imaging generates large data sets with complex content, so how do researchers select the best presentation format, and does the format affect people’s perceptions of the results? To address these questions, we examined whether different types of brain images influence readers’ evaluations of research reports.

Recent work has suggested that people are “seduced” by neuroscientific information, even when it adds nothing of substance to the data presented. Weisberg, Keil, Goodstein, Rawson, and Gray (2008) presented people with short research articles that contained either good or poor explanations for the findings, with or without superfluous neuroscientific information. Nonexperts were more convinced by articles with poor explanations when they contained “placebic” neuroscientific information—that is, neuroscience text that was irrelevant to the quality of the explanation given (see also Trout, 2008).

McCabe and Castel (2008) presented undergraduates with short research articles and manipulated the presence and type of accompanying images. Across three experiments, they compared a brain image (either a whole brain or a cross-sectional brain slice, both showing functional activations), a bar graph, a topographic map of electrical activity distributed across the scalp, and no image. In every case, an accompanying brain image led to higher ratings of the perceived soundness of the article.

What is so convincing about images of brains? One hypothesis is that brain images are more convincing because of their perceived complexity. McCabe and Castel (2008) explored this possibility but found no supporting evidence. However, perceived complexity is a subjective judgment that may not correspond to objective image properties. In the present study, we therefore collected participants’ complexity ratings for different brain images to see whether the ratings predicted persuasiveness. Since moderately complex images are generally preferred over very complex and very simple images (Berlyne, 1971), we tested for a polynomial (inverted-U) relationship with this property.

A second hypothesis is that brain images are more persuasive because of their perceived realism. People prefer realistic images and wrongly believe that they improve performance on tasks such as information monitoring or decision-making. Smallman and St. John (2005) identified two variables that induce this misplaced faith in realistic displays, also known as naïve realism. One factor is realism of represented objects—visual similarity or resemblance between images in the display and the real objects they represent. Another factor is realism of represented space, where pictorial depth cues such as shading and perspective view give the impression of three-dimensionality. We asked people to rate different brain images on both factors to see whether the ratings would predict persuasiveness.

An additional factor is familiarity, which has been shown to enhance liking (Zajonc, 1968). We therefore also measured people’s familiarity with each image type, to establish whether more familiar brain images would produce more favorable ratings of the information associated with them.

Neuroimage formats vary on all of these dimensions. Some give the appearance of being more visually complex than others. Some resemble brains in appearance, while others are more schematic and abstract. Some include pictorial depth cues, such as shading and occlusion, to create an impression of three-dimensionality, while others look like flat line drawings. And some are encountered frequently by nonexperts in the popular media, while others are rarely found outside specialized academic journals. For the present studies, we selected five commonly used data presentation formats from the neuroimaging literature. In Experiment 1, neuroscience novices rated the five images for perceived complexity, resemblance to a real brain, and apparent three-dimensionality. In Experiment 2, a different group of novices rated passages describing scientific research, each accompanied by one of the images; these raters also separately reported their familiarity with each image type. We analyzed the text ratings using the image classifications from Experiment 1 to examine which image properties produced greater perceived plausibility for the accompanying scientific reports.

Experiment 1

Method

Participants

Thirty-one University of Dundee undergraduates (14 male; mean age = 22 years, range = 18–36) participated. All were native English speakers, had no expertise in cognitive neuroscience or medicine, and had normal or corrected-to-normal vision.

Materials

We selected five frequently used brain image formats by cropping suitable sections from published works. The whole brain image (Fig. 1a) looks like a real brain and contains pictorial depth cues that convey three-dimensionality. The inflated brain (Fig. 1b) also contains pictorial depth cues, but unlike in a real brain, the sulci are pushed outward. The axial brain slice (Fig. 1c) looks like a two-dimensional (2-D) cross-section of a real brain. The glass brain (Fig. 1d), which is not designed to look realistic, presents activations “floating” inside a transparent brain; however, because the image contains no depth cues, this is not obvious, and its outline appears flat. The topographic map (Fig. 1e), which uses color contours to indicate electrophysiological activity across the scalp, is likewise not designed to appear realistic or 3-D. Each color picture was mounted on 8 × 8 cm white paper.

Fig. 1

Mean raw ratings for each image type—(a) whole brain, (b) inflated brain, (c) brain slice, (d) glass brain, and (e) topographic map—by the two realism properties (three-dimensionality and resemblance to a real brain) from Experiment 1. The groupings (high, moderate, low) for each property are indicated by the smaller arrows

Procedure

Participants were tested individually in a within-subjects design. First they were shown the five picture cards and were told that these represented different types of brain images. They then undertook three separate trials in which they freely arranged the images to reflect their subjective judgments. The verbal instruction at the start of each trial was: “Please arrange these cards on the table in any way you like to show how [complex/three-dimensional/much like a real brain] you think they look.” No other constraints were imposed. Following each arrangement, the participants were asked to describe their reasoning and to assign a numerical rating to each image using a 1–8 scale. The experimenter recorded the card arrangements, reasoning, and numerical ratings. The three judgments (complexity, three-dimensionality, and resemblance) were performed independently in counterbalanced order.

Results and discussion

A 3 × 5 (Property × Image) ANOVA showed a significant interaction, F(5.6, 167) = 34.82, p < .001, η2 = .33, Greenhouse–Geisser adjusted (GGA), indicating that ratings for the three properties depended on image type. Correlations (Kendall’s τb, which is appropriate for a small N with tied ranks) showed that the properties were largely unrelated. Although three-dimensionality was correlated with resemblance in the glass and whole brain images (τb = .36 and .40, p = .02 and .01, respectively), there were no other significant correlations among the three properties within any of the five image types (see the supplemental materials for correlation tables). We therefore treated the three properties as perceptually independent and analyzed them separately, with α adjusted to .005.
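For illustration, analyses of this kind could be run along the following lines in Python. This is a minimal sketch, not the authors’ actual analysis code: the long-format DataFrame `df`, its column names, and the use of the pingouin package are all our assumptions.

```python
# A minimal sketch of the Experiment 1 analyses, assuming a long-format
# pandas DataFrame `df` with columns: subject, property, image, rating.
import pingouin as pg
from scipy.stats import kendalltau

# 3 x 5 (Property x Image) repeated-measures ANOVA; correction=True
# reports Greenhouse-Geisser-adjusted p-values alongside the raw ones.
aov = pg.rm_anova(data=df, dv="rating", within=["property", "image"],
                  subject="subject", correction=True)
print(aov)

# Kendall's tau-b between three-dimensionality and resemblance ratings
# within one image type (tau-b handles tied ranks in small samples).
glass = df[df["image"] == "glass brain"]
wide = glass.pivot(index="subject", columns="property", values="rating")
tau, p = kendalltau(wide["three-dimensionality"], wide["resemblance"],
                    variant="b")
```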

For resemblance, there was a large and significant effect of image type, F(4, 120) = 101.68, p < .001, η2 = .77, indicating that ratings on this property differed across images (see Fig. 1). Pairwise comparisons showed that the whole brain (M = 7.1, SD = 0.9) and the brain slice (M = 6.8, SD = 1.6) did not differ, t(30) = 0.83, p = .41, and both were rated as significantly more brain-like than all other images, ts(30) = 8.65 to 19.92, all ps < .001. The inflated brain (M = 3.6, SD = 1.4) and glass brain (M = 2.9, SD = 1.7) did not differ, t(30) = 1.46, p = .15, and were rated as significantly more brain-like than the topographic map (M = 1.6, SD = 1.2), ts(30) = 5.30 and 4.34, respectively, both ps < .001. Thus, three distinct groups emerged on this variable—high resemblance (whole brain, brain slice), moderate resemblance (inflated brain, glass brain), and low resemblance (topographic map).

For three-dimensionality, there was also a large effect of image, F(2.1, 64.2) = 73.11, p < .001, η2 = .71 (GGA), indicating that ratings differed by image type (see Fig. 1). In pairwise comparisons, the whole brain (M = 6.8, SD = 1.8) and the inflated brain (M = 6.3, SD = 1.2) did not differ, t(30) = 1.59, p = .12, and were rated as more 3-D than all other images, ts(30) = 7.03 to 11.27, all ps < .001. The brain slice (M = 3.9, SD = 1.2) was rated as more 3-D than the glass brain (M = 1.7, SD = 1.5) and the topographic map (M = 2.2, SD = 1.4), ts(30) = 8.36 and 5.26, respectively, both ps < .001. The glass brain and topographic map did not differ, t(30) = 1.6, p = .11, and were rated as less 3-D than all other images. Thus, three distinct groups emerged on this variable—high 3-D (whole brain, inflated brain), moderate 3-D (brain slice), and low 3-D (topographic map, glass brain).

For complexity, there was a significant effect of image, F(3.1, 92.3) = 11.14, p < .001, η2 = .27 (GGA), indicating that ratings differed by image type. In rank order, mean complexity ratings (and SDs) were: glass brain, 6.42 (2.26); whole brain, 5.16 (1.95); inflated brain, 4.48 (1.71); brain slice, 4.13 (1.75); topographic map, 2.94 (2.28). Surprisingly, none of these means differed significantly from the next closest mean, and pairwise comparisons revealed no distinct groups. Perceived complexity therefore could not be used to derive groups at this stage, but because this may reflect limited statistical power, we revisit it in Experiment 2.

Experiment 2

In Experiment 2, we paired the five images with scientific passages and tested whether their previously rated properties predicted the perceived credibility of the accompanying texts. Since image complexity ratings in Experiment 1 did not yield distinct groupings, we again assessed this variable, using a survey question administered after the text ratings. We also assessed image familiarity (which is determined by the viewer’s experience, rather than by intrinsic image properties) using the same method.

Method

Participants

A total of 173 University of Dundee first-year undergraduate psychology students participated as part of their curriculum. Of these, 51 students who self-reported dyslexia, nonnative English, or uncorrected visual problems (including color blindness) were excluded before data analysis, to preclude any effects of perceptual or comprehension difficulties. This left 122 participants (33 male; mean age = 19.0 years, range = 18–44).

Materials

The materials were created using the five brain images from Experiment 1 and five short texts (see the supplemental materials). Each text described a neuroscience study, including a possible explanation for the findings. One was adapted from Weisberg et al. (2008), and the others were generated by us. Each text was paired with each image once to create 25 different PowerPoint slides. Text and image size and position were approximately constant across slides, with the text on the left and the image on the right. A final slide contained only the five images in a single row, labeled A–E (a different version was created for each group so that the left-to-right image order matched the image order in the preceding five slides; see the Procedure section). Printed rating sheets were also produced.

Procedure

Participants were tested in five groups. Each group was presented with a different set of five slides, such that each group saw each image and each text exactly once, in unique pairings, and all 25 pairings were shown once in total. For example, Group 1 saw Text A paired with Image 1, Group 2 saw Text A paired with Image 2, and so on (a Latin-square design; see the sketch below). Presentation order was counterbalanced across groups so that each image and each text occupied every serial position once. Each unique text–image pairing was viewed by between 19 and 32 participants. The pairings were displayed sequentially on a large screen, and each one remained visible until all participants had finished rating. Participants rated the texts for scientific reasoning and quality of explanation (1–8 scales). Finally, after all pairings had been rated, a new instruction was given, and participants turned to a separate rating sheet. The slide containing only the five images was then displayed, and participants rated the images for familiarity and complexity (1–8 scales).
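A minimal sketch of this Latin-square pairing scheme follows; the text and image labels are illustrative assumptions, not the authors’ materials.

```python
# Generate the 5 x 5 Latin-square pairings: across the five groups,
# every text meets every image exactly once, and within each group
# every text and every image appears exactly once.
texts = ["A", "B", "C", "D", "E"]
images = ["whole brain", "inflated brain", "brain slice",
          "glass brain", "topographic map"]

for group in range(5):
    # Shift the image assignment by one position per group.
    slides = [(texts[t], images[(t + group) % 5]) for t in range(5)]
    print(f"Group {group + 1}: {slides}")
```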

Results and discussion

Each participant’s range of responses was used to normalize his or her ratings (Poulton, 1989). The ratings of scientific reasoning and quality of explanation were highly correlated, suggesting that they captured similar judgments (rs = .50 to .75, ps < .001, for all images). We entered the ten variables (five images, two ratings each) into an exploratory factor analysis (principal components analysis with varimax rotation; factors were retained if their eigenvalues exceeded 1). This produced a five-factor solution, with each factor capturing between 11.4% and 27.6% of unique variance (total cumulative variance accounted for = 84.9%). Each factor corresponded to one image; that is, the reasoning and explanation ratings for the whole brain loaded primarily onto one factor, and so on (loadings = .876 to .934). The similar loadings of reasoning and explanation within each image indicated that they tapped a common construct, so we combined them into an aggregate variable that we called credibility. To establish whether image properties affected the perceived credibility of associated texts, we compared texts paired with images previously categorized as high, moderate, and low in three-dimensionality and resemblance (analyzed separately, outliers excluded).
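For illustration, a normalization and factor analysis of this kind could look as follows. This is a minimal sketch under stated assumptions: the participants × 10 DataFrame `ratings`, the particular form of range normalization, and the third-party factor_analyzer package are our choices, not the authors’ analysis code.

```python
# A minimal sketch, assuming `ratings` is a participants x 10 pandas
# DataFrame (reasoning and explanation ratings for each of five images).
import pandas as pd
from factor_analyzer import FactorAnalyzer

# One common form of range normalization: rescale each participant's
# ratings relative to his or her own minimum and maximum
# (cf. Poulton, 1989).
row_min = ratings.min(axis=1)
row_range = ratings.max(axis=1) - row_min
normalized = ratings.sub(row_min, axis=0).div(row_range, axis=0)

# Principal-components extraction with varimax rotation; the study
# retained the five factors with eigenvalues > 1.
fa = FactorAnalyzer(n_factors=5, rotation="varimax", method="principal")
fa.fit(normalized)
loadings = pd.DataFrame(fa.loadings_, index=ratings.columns)
print(loadings.round(3))
```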

We found a significant effect of image three-dimensionality on the perceived credibility of associated texts, F(1.9, 219.2) = 3.90, p = .024, η2 = .03 (GGA). Within-subjects contrasts showed a significant linear trend in text ratings from the high to the moderate to the low 3-D grouping, F(1, 121) = 8.79, p = .004. A priori pairwise comparisons showed that texts paired with images previously judged as high 3-D were rated as significantly more credible (M = 5.78) than texts paired with moderate or low 3-D images (Ms = 5.41 and 5.36), t(118) = 2.03, p = .045, d = 0.56, and t(118) = 2.97, p = .004, d = 0.39, respectively. Moderate and low 3-D images did not differ in their effects on accompanying texts, t(121) = 0.24, p = .81. Figure 2 shows the five images rank-ordered by their three-dimensionality ratings from Experiment 1, together with the corresponding credibility ratings for texts presented with those images in Experiment 2. The correspondence between these two data sets is striking, given that they came from different groups of participants who rated different aspects of the stimuli (images vs. texts) using different criteria (three-dimensionality vs. scientific credibility). By contrast, there was no effect of resemblance on text credibility ratings, F(2, 242) = 1.02, p = .32, η2 = .008.
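A minimal sketch of such within-subjects trend contrasts follows. The DataFrame `credibility` and its column order are our assumptions; for a one-degree-of-freedom contrast of this kind, F equals the squared t value.

```python
# A minimal sketch, assuming `credibility` is a participants x 3 pandas
# DataFrame of mean text-credibility ratings for the high, moderate,
# and low 3-D image groupings (columns in that order).
import numpy as np
from scipy.stats import ttest_1samp

def trend_contrast(data, weights):
    """Score each participant on orthogonal polynomial weights and test
    the contrast scores against zero; F(1, N - 1) = t**2."""
    scores = data.to_numpy() @ np.asarray(weights, dtype=float)
    return ttest_1samp(scores, popmean=0.0)

linear = trend_contrast(credibility, [1, 0, -1])      # high > mod > low
quadratic = trend_contrast(credibility, [-1, 2, -1])  # positive for an
                                                      # inverted-U shape
```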

Fig. 2

Mean normalized three-dimensionality ratings for each image type (Experiment 1, left side) and corresponding credibility ratings from a different group of participants for short research articles when paired with each image type (Experiment 2, right side). Error bars represent 1 standard error of the mean

Image complexity ratings were broadly consistent with Experiment 1, with the glass brain rated as most complex (M = 5.83, modal response = 7), the topographic map as least complex (M = 4.02, modal response = 3), and the inflated brain, whole brain, and brain slice in between (Ms = 4.89, 4.70, and 4.42; modal responses = 6, 5, and 5, respectively). The greater statistical power of Experiment 2 revealed a clear separation of the glass brain from all other images, ts(118) = 4.94 to 9.43, ps < .001. For the remaining four images, the groupings were clear except for one pairwise comparison, and in that case an inspection of the distributions and confidence intervals justified separation. We thus derived three groups—high complexity (glass brain), moderate complexity (inflated brain, whole brain, brain slice), and low complexity (topographic map). We found no main effect of image complexity on text credibility ratings, but a within-subjects polynomial contrast showed a marginal quadratic trend, F(1, 121) = 3.67, p = .058, η2 = .03. A plot showed an inverted-U shape of credibility ratings for texts paired with high, moderate, and low complexity images (Ms = 5.36, 5.60, and 5.35, respectively), but the group differences were nonsignificant, ts(121) = 0.04 to 1.63, ps = .11 to .97.

For familiarity ratings, two distinct groups emerged: high familiarity (whole brain, brain slice; Ms = 5.44 and 5.22; modal responses = 6 and 7, respectively) and low familiarity (topographic map, inflated brain, glass brain; Ms = 2.36, 2.00, and 1.92; modal response = 1 for all three). Despite this clear separation, there was no difference in credibility ratings for texts grouped by the familiarity of the accompanying images, F(1, 121) = 0.05, p = .83, η2 = .00. Note that the complexity and familiarity ratings were gathered after participants had seen the images during the text-rating exercise, so we cannot exclude possible carry-over effects.

Finally, we examined the relationship between complexity and familiarity, since more familiar images tend to be rated as less complex (Forsythe, Mulhern, & Sawey, 2008). We found significant negative correlations between complexity and familiarity for the inflated brain, r = –.28, p = .002, the glass brain, r = –.19, p = .039, and the topographic map, r = –.18, p = .04: Participants who rated these images as more familiar perceived them as less complex, and vice versa. The correlations were not significant for the whole brain and brain slice (the high-familiarity images), although plots suggested a negative relationship for all of the images. The plots also indicated restricted variance, which may have attenuated the correlations.
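A minimal sketch of these per-image correlations follows; the DataFrames `complexity` and `familiarity` and their column names are illustrative assumptions.

```python
# A minimal sketch, assuming `complexity` and `familiarity` are
# participants x 5 pandas DataFrames with one column per image type.
from scipy.stats import pearsonr

for image in ["whole brain", "inflated brain", "brain slice",
              "glass brain", "topographic map"]:
    # Pearson correlation between the two ratings for this image.
    r, p = pearsonr(complexity[image], familiarity[image])
    print(f"{image}: r = {r:.2f}, p = {p:.3f}")
```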

General discussion

Our findings show that the inflated brain and whole brain images made accompanying texts seem more convincing than texts paired with other images. The main factor differentiating these images was their 3-D perceptual quality, and this property was linearly related to credibility ratings for texts paired with each image type. There is some indication that moderate (not extremely high or low) perceived image complexity may also have contributed to higher ratings for associated texts. Surprisingly, we found no systematic effects of image familiarity or resemblance to a real brain.

Why should the apparent three-dimensionality of a brain image cause naïve viewers to allocate higher credibility ratings to associated texts? One possible explanation relates to how we conceptualize different kinds of displays. Tversky (2001) has distinguished between depictions (direct visualizations of things that are visible) and graphics (schematics that do not have a one-to-one mapping with what they represent and often require expert schemas for interpretation, such as statistical charts; Ratwani & Trafton, 2008). This distinction may be relevant to understanding how our participants perceived the different brain images. It is possible that apparent three-dimensionality contributed to a general impression of concreteness or solidity. This may have encouraged our participants to construe the 3-D images as direct depictions, representing something tangible or visible, rather than nonfigurative graphics. Depictions are easy to interpret, since they do not involve technical conventions and the mapping between image and referent is fairly direct. Such apparent (but not actual) directness may have increased processing fluency or intellectual fluency, which is the feeling that one understands and can extract information with relative ease (Schwarz, 2004; Schwarz, Sanna, Skurnik, & Yoon, 2007; Trout, 2008). When information seems easier to process, it is more likely to be judged as true (Reber & Schwarz, 1999). Thus, a feeling of directness when viewing 3-D images may have led to increased confidence in the associated texts.

The premise that nonexperts may tacitly view brain images as direct depictions concurs with ideas from the neuroethics literature. Roskies (2008) claims that members of the general public tend to think of neuroimages as if they were “photographs” of events in the brain taken by a kind of “brain-camera.” Just as we accord a special veridical status to photographs, brain images “invite us to believe” (p. 20). This view also fits with the naïve realism hypothesis—perception of the real world seems effortless and direct, so displays with “real-world” visual properties also seem direct and intuitive (Smallman & St. John, 2005). On our interpretation, this may be especially true for images that seem to have tangible, object-like qualities, such as apparent three-dimensionality. Thus, the present study goes beyond McCabe and Castel’s (2008) intriguing findings to show that some types of brain images are more convincing than others. We propose that images with visual properties related to “objecthood” may enhance feelings of directness and fluency, making naïve viewers more likely to be convinced by what they see. This theoretical account generates testable hypotheses for future work about which image variables, and which cognitive and affective processes in nonexperts, may underlie the persuasive power of brain images.

It is possible that aesthetic preferences also played a role. Viewers tend to prefer 3-D-style displays over 2-D displays, despite often poorer performance with 3-D displays (Carswell, 1991, cited in Kosslyn, 2006; Cockburn & McKenzie, 2001; Fischer, 2000). Most people believe a 3-D version of a graph or chart is more attractive and more likely to appear in a scientific journal (Stewart, Cipolla, & Best, 2008), and presenters generate higher-dimensional graphs when aiming to impress (Tractinsky & Meyer, 1999). Thus, depth cues may enhance both the attractiveness and the scientific aura of data presentation formats. Image preference is also enhanced by feelings of perceptual fluency, which may have been greater for our 3-D images (Reber, Winkielman, & Schwarz, 1998).

Our findings suggest an inverted-U-shaped relationship between perceived image complexity and text credibility ratings (Berlyne, 1971). The effect was weak, but measuring perceived complexity is challenging—there are no reliable metrics, and more familiar images are rated as less complex (consistent with our within-image correlations), so that obtaining an “unbiased” rating of complexity is difficult (Forsythe et al., 2008). Moreover, complexity is not a unitary variable, but is composed of distinct dimensions such as element intelligibility, element heterogeneity, color variety, and asymmetry, and these dimensions affect preference judgments differently (Nadal, Besch, Kelley, Mains, & Marty, 2008).

It is surprising that text credibility was not predicted by image familiarity, given the well-established “mere exposure” effect (Zajonc, 1968). However, familiarity effects are bound up with complexity in a nonstraightforward way (repeated exposure to complex images increases liking, whereas for simple images the opposite holds; Berlyne, 1971), and familiarity and fluency are themselves related (Whittlesea, 1993). This convoluted set of relationships might have masked familiarity effects, especially within our small set of images, and further research into familiarity effects is needed.

The brain slice was rated as closely resembling a real brain, yet it did not produce very high credibility ratings. Although it was realistically rendered, other cues may have made it seem less “object-like”—it could not be a direct depiction, since we do not see inside solid objects in the real world. Indeed, the high credibility ratings for the inflated brain imply that precise iconic resemblance may be less important for feelings of fluency than a general sense of tangibility or object-likeness. Of course, this finding may depend on viewer expertise (Weisberg et al., 2008).

Finally, we acknowledge that we do not have a baseline measure of text credibility ratings with a different kind of image, such as a graph, or with no image (cf. McCabe & Castel, 2008). It is possible that our between-image differences represent subtle variations on a much larger general enhancement effect, which occurs when any brain image is added. Nevertheless, it is quite notable that we still found differences in text ratings related to image qualities, especially since the perceived credibility of the texts may have already been elevated by their neuroscientific content (Weisberg et al., 2008).

Our findings provide guidance for the public dissemination of neuroscience (Illes et al., 2010; Morein-Zamir & Sahakian, 2010) and support the need for instruction in “neuroliteracy” to ensure unbiased communication of research (Roskies, 2002). These results also imply that researchers should choose object-like visualizations to accompany their written reports if they wish to make their arguments convincing to nonexperts. If a picture is really worth a thousand words, the images we select to present our data may be saying more than we think.