Can people identify original and manipulated photos of real-world scenes?

Nightingale, Sophie J.; Wade, Kimberley A.; Watson, Derrick G.

doi:10.1186/s41235-017-0067-2


Original article
Open access
Published: 18 July 2017

Volume 2, article number 30, (2017)

Cognitive Research: Principles and Implications




Abstract

Advances in digital technology mean that the creation of visually compelling photographic fakes is growing at an incredible speed. The prevalence of manipulated photos in our everyday lives invites an important, yet largely unanswered, question: Can people detect photo forgeries? Previous research using simple computer-generated stimuli suggests people are poor at detecting geometrical inconsistencies within a scene. We do not know, however, whether such limitations also apply to real-world scenes that contain common properties that the human visual system is attuned to processing. In two experiments we asked people to detect and locate manipulations within images of real-world scenes. Subjects demonstrated a limited ability to detect original and manipulated images. Furthermore, across both experiments, even when subjects correctly detected manipulated images, they were often unable to locate the manipulation. People’s ability to detect manipulated images was positively correlated with the extent of disruption to the underlying structure of the pixels in the photo. We also explored whether manipulation type and individual differences were associated with people’s ability to identify manipulations. Taken together, our findings show, for the first time, that people have poor ability to identify whether a real-world image is original or has been manipulated. The results have implications for professionals working with digital images in legal, media, and other domains.


Significance

In the digital age, the availability of powerful, low-cost editing software means that the creation of visually compelling photographic fakes is growing at an incredible speed—we live in a world where nearly anyone can create and share a fake image. The rise of photo manipulation has consequences across almost all domains, from law enforcement and national security through to scientific publishing, politics, media, and advertising. Currently, however, scientists know very little about people’s ability to distinguish between original and fake images—the question of whether people can identify when images have been manipulated and what has been manipulated in the images of real-world scenes remains unanswered. The importance of this question becomes evident when considering that, more often than not, in today’s society we still rely on people to make judgments about image authenticity. This reliance applies to almost all digital images, from those that are used as evidence in the courtroom to those that we see every day in newspapers and magazines. Therefore, it is critical to better understand people’s ability to accurately identify fake from original images. This understanding will help to inform the development of effective guidelines and practices to address two key issues: how to better protect people from being fooled by fake images, and how to restore faith in original images.

Background

In 2015, one of the world’s most prestigious photojournalism events—The World Press Photo Contest—was shrouded in controversy following the disqualification of 22 entrants, including an overall prize winner, for manipulating their photo entries. News of the disqualifications led to a heated public debate about the role of photo manipulation in photojournalism. World Press Photo responded by issuing a new code of ethics for the forthcoming contest that stipulated entrants “must ensure their pictures provide an accurate and fair representation of the scene they witnessed so the audience is not misled” (World Press Photo). They also introduced new safeguards for detecting manipulated images, including a computerized photo-verification test for entries reaching the penultimate round of the competition. The need for such a verification process highlights the difficulties competition organizers face in trying to authenticate images. If photography experts can’t spot manipulated images, what hope is there for amateur photographers or other consumers of photographic images? This is the question we aimed to answer. That is, to what extent can lay people distinguish authentic photos from fakes?

Digital image and manipulation technology has surged in the previous decades. People are taking more photos than ever before. Estimates suggested that one trillion photos would be taken in 2015 alone (Worthington, 2014), and that, on average, more than 350 million photos per day are uploaded to Facebook—that is over 14 million photos per hour or 4000 photos per second (Smith, 2013). Coinciding with this increased popularity of photos is the increasing frequency with which they are being manipulated. Although it is difficult to estimate the prevalence of photo manipulation, a recent global survey of photojournalists found that 76% regard photo manipulation as a serious problem, 51% claim to always or often enhance in-camera or RAW (i.e., unprocessed) files, and 25% admit that they, at least sometimes, alter the content of photos (Hadland, Campbell, & Lambert, 2015). Together these findings suggest that we are regularly exposed to a mix of real and fake images.

The prevalence and popularity of manipulated images raises two important questions. First, to what extent do manipulated images alter our thinking about the past? We know that images can have a powerful influence on our memories, beliefs, and behavior (e.g., Newman, Garry, Bernstein, Kantner, & Lindsay, 2012; Wade, Garry, Read, & Lindsay, 2002; Wade, Green, & Nash, 2010). Merely viewing a doctored photo and attempting to recall the event it depicts can lead people to remember wholly false experiences, such as taking a childhood hot-air balloon ride or meeting the Warner Brothers character Bugs Bunny at Disneyland (Braun, Ellis, & Loftus, 2002; Sacchi, Agnoli, & Loftus, 2007; Strange, Sutherland, & Garry, 2006). Thus, if people cannot differentiate between real and fake details in photos, manipulations could frequently alter what we believe and remember.

Second, to what extent should photos be admissible as evidence in court? Laws governing the use of photographic evidence in legal cases, such as the Federal Rules of Evidence (1975), have not kept up with digital change (Parry, 2009). Photos were once difficult to manipulate; the process was complex, laborious, and required expertise. Yet in the digital age, even amateurs can use sophisticated image-editing software to create detailed and compelling fake images. The Federal Rules of Evidence state that the content of a photo can be proven if a witness confirms it is fair and accurate. Put another way, the person who took the photo, any person who subsequently handles it, or any person present when the photo was taken, is not required to testify about the authenticity of the photo. If people cannot distinguish between original and fake photos, then litigants might use manipulated images to intentionally deceive the court, or even testify about images, unaware they have been changed.

Unfortunately, there is no simple solution to prevent people from being fooled by manipulated photos in everyday life or in the criminal arena (Parry, 2009). But the newly emerging field of image forensics is making it possible to better protect against photo fraud (e.g., Farid, 2006). Image forensics uses digital technology to determine image authenticity, and is based on the premise that digital manipulation alters the values of the pixels that make up an image. Put simply, the act of manipulating a photo leaves behind a trace, even if only subtle and not visible to the naked eye (Farid, 2009). Given that different types of manipulations—for instance, cloning, retouching, splicing—affect the underlying pixels in unique and systematic ways, image forensic experts can develop computer methods to reveal image forgeries. Such technological developments are being implemented in several domains, including law, photojournalism, and scientific publishing (Oosterhoff, 2015). The vast majority of image authenticity judgments, however, are still made by eye, and to our knowledge only one published study has explored the extent to which people can detect inconsistencies in images.

Farid and Bravo (2010) investigated how well people can make use of three cues (shadows, reflections, and perspective distortion) that are often indicative of photo tampering. The researchers created a series of computer-generated scenes consisting of basic geometrical shapes. Some scenes, for instance, were consistent with a single light source whereas others were inconsistent with a single light source. When the inconsistencies were obvious, that is, when shadows ran in opposite directions, observers were able to identify tampering with nearly 100% accuracy. Yet when the inconsistencies were subtle, for instance, where the shadows were a combination of results from two different light positions on the same side of the room, observers performed only slightly better than chance. These preliminary findings, based on computer-generated scenes of geometric objects, suggest that the human visual system is poor at identifying inconsistencies in such images.

In the current study we examined whether people are similarly poor at detecting inconsistencies within images of real-world scenes. On the one hand, we might expect people to perform even worse if trying to detect manipulations in real-world photos. Research shows that real-world photos typically contain many multi-element objects that can obscure distortions (Bex, 2010; Hulleman & Olivers, 2015). For example, people with the visual impairment metamorphopsia often do not notice any problems with their vision in their everyday experiences, yet the impairment is quite apparent when they view simple stimuli, such as a grid of evenly spaced horizontal and vertical lines (Amsler, 1953; Bouwens & Meurs, 2003). We also know that people find it more difficult to detect certain types of distortions, such as changes to image contrast, in complex real-world scenes than in more simplistic stimuli (Bex, 2010; Bex, Solomon, & Dakin, 2009). In sum, if people find it particularly difficult to detect manipulations in complex real-world scenes, then we might expect our subjects to perform worse than Farid and Bravo’s (2010) subjects.

On the other hand, there is good reason to predict that people might do well at detecting manipulations in real-world scenes. Visual cognition research suggests that people might detect image manipulations using their knowledge of the typical appearance of real-world scenes. Real-world scenes share common properties, such as the way the luminance values of the pixels are organized and structured (Barlow, 1961; Gardner-Medwin & Barlow, 2001; Olshausen & Field, 2000). Over time, the human visual system has become attuned to such statistical regularities and has expectations about how scenes should look. When an image is manipulated, the structure of the image properties changes, which can create a mismatch between what people see and what they expect to see (Craik, 1943; Friston, 2005; Rao & Ballard, 1999; Tolman, 1948). Thus, based on this real-world scene statistics account, we might predict that people should be able to use this “mismatch” as a cue to detecting a manipulation. If so, our subjects should perform better than chance at detecting manipulations in real-world scenes.

Although there is a lack of research directly investigating the applied question of people’s ability to detect photo forgeries, people’s ability to detect change in a scene is well-studied in visual cognition. Notably, change blindness is the striking finding that, in some situations, people are surprisingly slow, or entirely unable, to detect changes made to, or find differences between, two scenes (e.g., Pashler, 1988; Simons, 1996; Simons & Levin, 1997). In some of the early studies, researchers demonstrated observers’ inability to detect changes made to a scene during an eye movement (saccade) using very simple stimuli (e.g., Wallach & Lewis, 1966), and later, in complex real-world scenes (e.g., Grimes, 1996). Researchers have also shown that change blindness occurs even when the eyes are fixated on the scene: The flicker paradigm, for instance, simulates the effects of a saccade or eye blink by inserting a blank screen between the continuous and sequential presentation of an original and changed image (Rensink, O’Regan, & Clark, 1997). It often requires a large number of alternations between the two images before the change can be identified. Furthermore, change blindness persists when the original and changed images are shown side by side (Scott-Brown, Baker, & Orbach, 2000), when change is masked by a camera cut in motion pictures (Levin & Simons, 1997), and even when change occurs in real-world situations (Simons & Levin, 1998).

Such striking failures of perception suggest that people do not automatically form a complete and detailed visual representation of a scene in memory. Therefore, to detect change, it might be necessary to draw effortful, focused attention to the changed aspect (Simons & Levin, 1998). So which aspects of a scene are most likely to gain focused attention? One suggestion is that attention is guided by salience; the more salient aspects of a scene attract attention and are represented more precisely than less salient aspects. In support of this idea, research has shown that changes to more important objects are more readily detected than changes made to less important objects (Rensink et al., 1997). Other findings, however, indicate that observers sometimes miss even large changes to central aspects of a scene (Simons & Levin, 1998). Therefore, the question of what determines scene saliency continues to be explored. Specifically, researchers disagree about whether the low-level visual salience of objects in a scene, such as brightness (e.g., Lansdale, Underwood, & Davies, 2010; Pringle, Irwin, Kramer, & Atchley, 2001; Spotorno & Faure, 2011) or the high-level semantic meaning of the scene (Stirk & Underwood, 2007) has the most influence on attentional allocation.

What other factors affect people’s susceptibility to change blindness? One robust finding in the signal detection literature is that the ability to make accurate perceptual decisions is related to the strength of the signal and the amount of noise (Green & Swets, 1966). Signal detection theory has been applied to change detection. In one study, observers judged whether two sequentially presented arrays of colored dots remained identical or if there was a change (Wilken & Ma, 2004). Crucially, the researchers manipulated the strength of the signal in the change trials by varying the number of colored dots in the display that changed, while noise (total set size) remained constant. Performance improved as a function of the number of dots in the display that changed color—put simply, greater signal resulted in greater change detection.

Given the lack of research investigating people’s ability to detect photo forgeries, change blindness offers a highly relevant area of research. A key difference between the change blindness research and our current experiments, however, is that our change detection task does not involve a comparison of two images; therefore, representing the scene in memory is not a factor in our research. That is, subjects do not compare the original and manipulated versions of an image. Instead, they make their judgment based on viewing only a single image. This image is either the original, unaltered image or an image that has been manipulated in some way.

In the current study, we explored people’s ability to identify common types of image manipulations that are frequently applied to real-world photos. We distinguished between physically implausible versus plausible manipulations. For example, a physically implausible image might depict an outdoor scene lit only by the sun with a person’s shadow running one way and a car’s shadow running the other way. Such shadows imply the impossible: two suns. Alternatively, when an unfamiliar face is retouched in an image it is quite plausible; eliminating spots and wrinkles or whitening teeth do not contradict physical constraints in the world that govern how faces ought to look. In our study, geometrical and shadow manipulations made up our implausible manipulation category, while airbrushing and addition or subtraction manipulations made up our plausible manipulation category. Our fifth manipulation type, super-additive, presented all four manipulation types in a single image and thus included both categories of manipulation.

We had a number of predictions about people’s ability to detect and locate manipulations in real-world photos. We expected the type of manipulation—implausible versus plausible—to affect people’s ability to detect and locate manipulations. In particular, people should correctly identify more of the physically implausible manipulations than the physically plausible manipulations given the availability of evidence within the photo. We also expected people to be better at correctly detecting and locating manipulations that caused more change to the pixels in the photo than manipulations that caused less change.

Experiment 1

Methods

Subjects and design

A total of 707 subjects (age: M = 25.8 years, SD = 8.8, range = 14–82; 460 male, 226 female, 21 declined to respond) completed the task online. A further 17 subjects were excluded from the analyses because they had missing response time data for at least one response on the detection or location task. There were no geographical restrictions and subjects did not receive payment for taking part, but they did receive feedback on their performance at the end of the task. Subject recruitment stopped when we reached at least 100 responses per photo. We used a within-subjects design in which each person viewed a series of ten photos, half of which had one of five manipulation types applied, and half of which were original, non-manipulated photos. We measured people’s accuracy in determining whether a photo had been manipulated or not and their ability to locate manipulations.

Stimuli

We obtained ten colored images (JPEG format), 1600 × 1200 pixels, that depicted people in real-world scenes from Google Image search (permitted for non-commercial re-use with modification). The first author (SN) used GNU Image Manipulation Program (GIMP) to apply five different, commonly used manipulation techniques: (a) airbrushing, (b) addition or subtraction, (c) geometrical inconsistency, (d) shadow inconsistency, and (e) super-additive (manipulations a to d included within a single image). For the airbrushing technique, we changed the person’s appearance by whitening their teeth, removing spots, wrinkles, or sweat, or brightening their eye color. For the addition or subtraction technique, we added or removed objects, or parts of objects. For example, we removed links between tower columns on a suspension bridge and inserted a boat into a river scene. For geometrical inconsistencies, we created physically implausible changes, such as distorting angles of buildings or shearing some trees in a different direction from others to indicate inconsistent wind direction. For shadow inconsistencies, we removed or changed the direction of a shadow to make it incompatible with the remaining shadows in the scene. For instance, flipping a person’s face around the vertical axis causes the shadow to appear on the wrong side compared with the rest of the body and scene. In the super-additive technique we presented all four previously described manipulation types in one photo. Figure 1 shows examples of the five manipulation types, and higher resolution versions of these images, as well as other stimuli examples, appear in Additional file 1.

In total, we had ten photos of different real-world scenes. The non-manipulated version of each of these ten photos was used to create our original photo set. To generate the manipulated photos, we applied each of the five manipulation types to six of the ten photos, creating six versions of each manipulation for a total of 30 manipulated photos. This gave us an overall set of 40 photos. Subjects saw each of the five manipulation types and five original images but always on a different photo.

Image-based saliency cues can determine where subjects direct their attention; thus, we checked whether our manipulations had changed the salience of the manipulated area within the image. To examine this, we ran the images through two independent saliency models: the classic Itti-Koch model (Itti & Koch, 2000; Itti, Koch, & Niebur, 1998) and the Graph-Based Visual Saliency (GBVS) model (Harel, Koch, & Perona, 2006). To summarize, we found that our manipulations did not inadvertently change the salience of the manipulated regions. See Additional file 2 for details of these analyses.

Procedure

Subjects answered questions about their demographics, attitudes towards image manipulation, and experiences of taking and manipulating photos. Subjects were then shown a practice photo and instructed to adjust their browser zoom level so that the full image was visible. Next, subjects were presented with ten photos in a random order and they had an unlimited amount of time to view and respond to each photo. We first measured subjects’ ability to detect whether each photo had been manipulated by asking “Do you think this photograph has been digitally altered?” Subjects were given three response options: (a) “Yes, and I can see exactly where the digital alteration has been made”; (b) “Yes, but I cannot see specifically what has been digitally altered”; or (c) “No.” For the manipulated photos, we considered either of the “yes” responses as correct; for original photos we considered “no” as correct. Following a “yes” response, we immediately measured subjects’ ability to locate the manipulation by presenting the same photo again with a 3 × 3 grid overlaid^{Footnote 1} (see Fig. 2 for an example). Subjects were asked to: “Please select the box that you believe contains the digitally altered area of the photograph (if you believe that more than one region contains digital alteration, please select the one you feel contains the majority of the change).” On average, manipulations spanned two regions in the grid. For the analyses we considered a response to be correct if the subject clicked on a region that contained any of the manipulated area or a nearby area that could be used as evidence that a manipulation had taken place—a relatively liberal criterion. Subjects received feedback on their performance at the end of the study.

Results and discussion

An analysis of the response time data suggested that subjects were engaged with the task and spent a reasonable amount of time determining which photos were authentic. In the detection task, the mean response time per photo was 43.8 s (SD = 73.3 s) and the median response time 30.4 s (interquartile range 21.4, 47.7 s). In the location task, the mean response time was 10.5 s (SD = 5.7 s) and the median response time 9.1 s (interquartile range 6.5, 13.1 s). Following Cumming’s (2012) recommendations, we present our findings in line with the estimation approach by calculating a precise estimate of the actual size of the effects.

Overall accuracy on the detection task and the location task

We now turn to our primary research question: To what extent can people detect and locate manipulations of real-world photos? For the detection task, we collapsed across the two “yes” response options such that if subjects responded either “Yes, and I can see exactly where the digital alteration has been made” or “Yes, but I cannot see specifically what has been digitally altered”, then we considered this to be a “yes” response. Thus, chance performance was 50%. Overall performance on the detection task was better than chance; a mean 66% of the photos were correctly classified as original or manipulated, 95% confidence interval (CI)^{Footnote 2} [65%, 67%]. Subjects’ ability to distinguish between original (72% correct) and manipulated (60% correct) photos of real-world scenes was reliably greater than zero, discrimination (d') = 0.80, 95% CI [0.74, 0.85]. Moreover, subjects showed a bias towards saying that photos were real; response bias (c) = 0.16, 95% CI [0.12, 0.19]. Although subjects’ ability to detect manipulated images was above chance, it was still far from perfect. Furthermore, even when subjects correctly indicated that a photo had been manipulated, they could not necessarily locate the manipulation. Collapsing over all manipulation types, a mean 45% of the manipulations were accurately located, 95% CI [43%, 46%]. To determine chance performance in the location task, we need to take into account that subjects were asked to select one of nine regions of the image. Therefore, subjects had less chance of being correct by guessing in the location task than the detection task. On average, the manipulations were contained within two of the nine regions. But because the chance of being correct by guessing varied for each image and each manipulation type, we ran a Monte Carlo simulation to determine the chance rate of selecting the correct region. Table 1 shows the results from one million simulated responses. Overall, chance performance was 24%; therefore, collectively, subjects performed better than chance on the location task. Overall, the results show that people have some (above chance) ability to detect and locate manipulations, although performance is far from perfect.
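As an illustration of the signal detection measures reported above, the following minimal Python sketch computes d' and c from a hit rate and a false alarm rate using the standard equal-variance formulas. It plugs in the pooled rates from the text (60% of manipulated photos correctly called manipulated, and 100% − 72% = 28% of original photos incorrectly called manipulated). The function name is hypothetical, and this is not the authors’ analysis code: pooled rates need not reproduce estimates computed in another way (for example, per subject), so the output should be expected to be close to, not identical with, the reported d' = 0.80 and c = 0.16.

```python
from statistics import NormalDist


def dprime_and_c(hit_rate, false_alarm_rate):
    """Equal-variance signal detection estimates from two rates in (0, 1).

    d' = z(H) - z(FA): sensitivity (0 means no discrimination).
    c  = -(z(H) + z(FA)) / 2: response bias (positive values mean a
    tendency to answer "not manipulated").
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


# Pooled rates taken from the text; treating them as a single
# aggregate observer is an assumption of this sketch.
hit_rate = 0.60               # manipulated photos judged manipulated
false_alarm_rate = 1 - 0.72   # original photos judged manipulated

d_prime, criterion = dprime_and_c(hit_rate, false_alarm_rate)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

A positive c here corresponds to the bias described in the text: subjects leaned towards saying that photos were original.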

Table 1 Mean number of regions (out of a possible nine) containing manipulation and results of Monte Carlo simulation to determine chance performance in location task by manipulation type and overall
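The chance rate for the location task depends on how many of the nine grid regions count as correct for each photo. A Monte Carlo estimate of that rate can be sketched in Python as follows. The region sets below are hypothetical placeholders, not the actual stimuli, and the sampling scheme (a uniformly random photo, then a uniformly random grid cell) is an assumption about how such a simulation might be set up rather than a description of the authors’ procedure.

```python
import random


def simulate_chance_rate(correct_regions_by_photo, n_responses=100_000,
                         grid_cells=9, seed=0):
    """Estimate the probability that a random click lands on a correct region.

    correct_regions_by_photo: list of sets; each set holds the indices
    (0 to grid_cells - 1) of the grid cells scored as correct for one photo.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    hits = 0
    for _ in range(n_responses):
        correct = rng.choice(correct_regions_by_photo)  # pick a photo
        guess = rng.randrange(grid_cells)               # pick a grid cell
        if guess in correct:
            hits += 1
    return hits / n_responses


# Hypothetical example: five manipulated photos whose manipulations
# span one to three cells of the 3 x 3 grid.
example_photos = [{0, 1}, {4}, {2, 5, 8}, {3, 4}, {6, 7}]
print(f"Estimated chance rate: {simulate_chance_rate(example_photos):.3f}")
```

With a fixed number of correct cells per photo, the estimate converges on that number divided by nine, which is why a simulation is only needed when the number of correct cells varies across photos and manipulation types.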


Ability to detect and locate by manipulation type

We predicted that people’s ability to detect and locate manipulations might vary according to the manipulation type. Figure 3 shows subjects’ accuracy on both the detection and the location task by manipulation type. In line with our prediction, subjects were better at detecting manipulations that included physically implausible changes (geometrical inconsistencies, shadow inconsistencies, and super-additive manipulations) than images that included physically plausible changes (airbrushing alterations and addition or subtraction of objects).

It was not the case, however, that subjects were necessarily better at locating the manipulation within the photo when the change was physically implausible. Figure 4 shows the proportion of manipulated photo trials in which subjects correctly detected a manipulation and also went on to correctly locate that manipulation, by manipulation type. Across both physically implausible and physically plausible manipulation types, subjects often correctly indicated that photos were manipulated but failed to then accurately locate the manipulation. Furthermore, although the physically implausible geometrical inconsistencies were more often correctly located, the shadow inconsistencies were located only as often as the physically plausible manipulation types (airbrushing and addition or subtraction). These findings suggest that people may find it easier to detect physically implausible, rather than plausible, manipulations, but this is not the case when it comes to locating the manipulation.

Image metrics and accuracy

To understand more about people’s ability to identify image manipulations, we examined how the amount of change in a photo affects people’s accuracy in the detection and location tasks. When an image is digitally altered, the structure of the underlying elements (the pixels) is changed. This change can be quantified in numerous ways but we chose to use Delta-E₇₆ because it is a measure based on both color and luminance (Robertson, 1977). To calculate Delta-E, we first converted the images in Matlab® to L*a*b* color space because it has a dimension for lightness as well as color. Next we calculated the difference between corresponding pixels in the original and manipulated versions of each photo. Finally, these differences were averaged to give a single Delta-E score for each manipulated photo. A higher Delta-E value indicates a greater amount of difference between the original and the manipulated photo.^{Footnote 3} We calculated Delta-E for each of the 30 manipulated photos.
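The three steps above (convert to L*a*b*, take the per-pixel difference, average) can be sketched in plain Python. This is an illustrative reimplementation, not the Matlab code used in the study. It assumes 8-bit sRGB input and a D65 white point, and it represents each image as a flat list of (R, G, B) tuples; all function names are hypothetical. Delta-E₇₆ itself is the Euclidean distance between two colors in L*a*b* space.

```python
import math

# Assumed reference white (D65) for the XYZ -> L*a*b* conversion.
D65_WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_lab(rgb):
    """Convert one 8-bit sRGB pixel (R, G, B) to CIE L*a*b*."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), D65_WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def mean_delta_e76(original, manipulated):
    """Average Delta-E76 over corresponding pixels of two equal-size images."""
    if len(original) != len(manipulated) or not original:
        raise ValueError("images must be non-empty and the same size")
    total = 0.0
    for p, q in zip(original, manipulated):
        # Delta-E76 is the Euclidean distance in L*a*b* space.
        total += math.dist(srgb_to_lab(p), srgb_to_lab(q))
    return total / len(original)


# Toy example: a four-pixel white "image" with one pixel turned black.
original = [(255, 255, 255)] * 4
manipulated = [(255, 255, 255)] * 3 + [(0, 0, 0)]
print(f"Mean Delta-E76: {mean_delta_e76(original, manipulated):.2f}")
```

Because the score is averaged over every pixel, a large change confined to a small region and a subtle change spread across the whole photo can yield similar values.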

Figure 5 shows the log Delta-E values on the x-axis, where larger values indicate more change in the color and luminance values of pixels in the manipulated photos compared with their original counterpart. The proportions of correct detection (Fig. 5a) and location (Fig. 5b) responses for each of the manipulated photos are presented on the y-axis. We found a positive relationship between the Delta-E measure and the proportion of photos that subjects correctly detected as manipulated, albeit not reaching significance: r(28) = 0.34, p = 0.07.^{Footnote 4} Furthermore, the Delta-E measure was positively correlated with the proportion of manipulations that were correctly located, r(28) = 0.41, p = 0.03. As predicted, these data suggest that people might be sensitive to the low level properties of real-world scenes when making judgments about the authenticity of photos. This finding is especially remarkable given that our subjects never saw the same scene more than once and so never saw the original version of a manipulated image. This finding fits with the proposition that disrupting the underlying pixel structure might exacerbate the difference between the manipulated photos and people’s expectations of how a scene should look. Presumably, these disruptions make it easier for people to accurately classify manipulated photos as being manipulated. We can also interpret these findings based on a signal detection account—adding greater signal (in our experiment, more change to an image, as measured by Delta-E) results in greater detection of that signal (Green & Swets, 1966; Wilken & Ma, 2004).

Next, we tested whether there was a relationship between the mean amount of change and the mean proportion of correct detection (Fig. 6a) and location (Fig. 6b) responses by the category of manipulation type. As Fig. 6 shows, there was a numerical, but non-significant, trend for a positive relationship between amount of change and the proportion of photos that subjects correctly detected as manipulated: r(3) = 0.68, p = 0.21. There was also a numerical trend for a positive relationship between amount of change and the proportion of manipulations that were correctly located: r(3) = 0.69, p = 0.19.
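To see why a correlation near 0.68 can be non-significant with only five data points while a smaller correlation across 30 photos approaches significance, it helps to convert r to a t statistic, t = r·√(df / (1 − r²)), and look up the two-tailed p value. The Python sketch below does this with a numerical integration of the Student t density. It is an illustration using the rounded r values from the text, not the authors’ analysis, so the p values it prints should be close to, but not necessarily identical with, those reported.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def correlation_p_value(r, df, steps=2000):
    """Return (t, two-tailed p) for a Pearson r with df = n - 2.

    Requires |r| < 1. The central area between -t and t is computed
    with Simpson's rule (steps must be even).
    """
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3          # integral from 0 to t
    return t, max(0.0, 1 - 2 * half_area)


# Rounded correlations and degrees of freedom as reported in the text.
for label, r, df in [("detection, per photo", 0.34, 28),
                     ("location, per photo", 0.41, 28),
                     ("detection, per type", 0.68, 3),
                     ("location, per type", 0.69, 3)]:
    t, p = correlation_p_value(r, df)
    print(f"{label}: r({df}) = {r:.2f}, t = {t:.2f}, p = {p:.3f}")
```

With three degrees of freedom the t distribution has heavy tails, so even a large r leaves considerable uncertainty about the underlying relationship.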

Individual factors in detecting and locating manipulations

To determine whether individual factors play a role in detecting and locating manipulations, we gathered subjects’ demographic data, attitudes towards image manipulation, and experiences of taking and manipulating photos. We also recorded subjects’ response times on the detection and location tasks.

To determine how each factor influenced subjects’ performance on the manipulated image trials, we conducted two generalized estimating equation (GEE) analyses—one for accuracy on the detection task and one for accuracy on the location task. Specifically, we conducted a repeated measures logistic regression with GEE because our dependent variables were binary with both random and fixed effects (Liang & Zeger, 1986). For the detection task, we ran two additional repeated measures linear regression GEE models to explore the effect of the predictor variables on signal detection estimates d' and c. The results of the GEE analyses are shown in Table 2. In the detection task, faster responses were more likely to be associated with accurate responses than slower responses. There was also a small effect of people’s general belief about the prevalence of manipulated photos in their everyday lives on accuracy in the detection task. Those who believe a greater percentage of photos are digitally manipulated were more likely to correctly identify manipulated photos than those who believe a lower percentage of photos are digitally manipulated. Further, the results of the signal detection analysis suggest that this results from a difference in ability to discriminate between original and manipulated photos, rather than a shift in response bias—those who believe a greater percentage of photos are digitally manipulated accurately identified more of the manipulated photos without an increased false alarm rate. General beliefs about the prevalence of photo manipulation did not have an effect on people’s ability to locate the manipulation. This pattern of results is somewhat surprising. It seems intuitive to think that a general belief that manipulated photos are prevalent simply makes people more likely to report that a photo is manipulated because they are generally skeptical about the veracity of photos rather than because they are better at spotting fakes. 
Although interesting, the small effect size and counterintuitive nature of the finding indicate that it is important to replicate the result prior to drawing any strong conclusions. The only variable that had an effect on accuracy in the location task was gender; males were slightly more likely than females to correctly locate the manipulation within the photo.
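
To make the distinction between discrimination (d') and response bias (c) concrete, the following is a minimal sketch using the standard equal-variance signal detection formulas, with a manipulated photo treated as the signal. The function name and the log-linear correction (adding 0.5 to each count) are assumptions made for illustration; the text above does not state how extreme hit or false alarm rates were handled.

```python
from statistics import NormalDist

def sdt_estimates(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) from the four signal detection response counts.

    hits / misses: responses to manipulated photos.
    false_alarms / correct_rejections: responses to original photos.
    """
    # Log-linear correction (an assumption in this sketch): add 0.5 to each
    # count so that rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)      # sensitivity
    c = -0.5 * (z(hit_rate) + z(fa_rate))   # criterion (response bias)
    return d_prime, c

# Hypothetical counts for one subject (not the study's data).
d_prime, c = sdt_estimates(hits=7, misses=3, false_alarms=3, correct_rejections=7)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

Under these formulas, a subject who says "manipulated" more often across the board raises both the hit rate and the false alarm rate, which shifts c but leaves d' largely unchanged. A higher hit rate without a higher false alarm rate instead shows up as a larger d'.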

Table 2 Results of the GEE binary logistic and linear regression models to determine variables that predict accuracy on the detect and locate tasks

Together these findings show that individual factors have relatively little impact on the ability to detect and locate manipulations. Although shorter response times were associated with more correct detections of manipulated photos, we did not manipulate response time so we cannot know whether response time affects people’s ability to discriminate between original and manipulated photos. In fact, our response time findings might be explained by a number of perceptual decision-making models, for example, the drift diffusion model (Ratcliff, 1978). However, determining the precise mechanism that accounts for the association between shorter response times and greater accuracy is beyond the scope of the current paper.

Experiment 1 indicates that people have some ability to distinguish between original and manipulated real-world photos. People’s ability to correctly identify manipulated photos was better than chance, although not by much. Our data also suggest that locating photo manipulations is a difficult task, even when people correctly indicate that a photo is manipulated. We should note, however, that our study could have underestimated people’s ability to locate manipulations in real-world photos. Recall that subjects were only asked to locate manipulations on photos that they thought were manipulated. It remains possible people might be able to locate manipulations even if they do not initially think that a photo has been manipulated. We were unable to check this possibility in Experiment 1, so we addressed this issue in Experiment 2 by asking subjects to complete the location task for all photos, regardless of their initial response in the detection task. If subjects did not think that the photo had been manipulated, we asked them to make a guess about which area of the image might have been changed.

We also created a new set of photographic stimuli for Experiment 2. Rather than sourcing photos online, the first author captured a unique set of photos on a Nikon D40 camera in RAW format, and prior to any digital editing, converted the files to PNGs. There are two crucial benefits to using original photos rather than downloading photos from the web. First, by using original photos we could be certain that our images had not been previously manipulated in any way. Second, when digital images are saved, the data are compressed to reduce the file size. JPEG compression is lossy in that some information is discarded to reduce file size. This information is not generally noticeable to the human eye (except at very high compression rates when compression artifacts can occur); however, the process of converting RAW files to PNGs (a lossless format) prevented any loss of data in either the original or manipulated images and, again, ensured that our photos were not manipulated in any way before we intentionally manipulated them.
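
The difference between lossless and lossy storage can be shown with a toy sketch. It is not the conversion pipeline used for the stimuli. The pixel values are hypothetical, DEFLATE (the compression method used inside PNG) is applied directly through Python's zlib module, and coarse quantization of pixel values is only a loose stand-in for JPEG, which quantizes frequency coefficients rather than raw pixels.

```python
import zlib

# Hypothetical 8-bit pixel values standing in for one row of an image.
pixels = bytes([12, 13, 15, 200, 201, 203, 90, 91, 94, 95, 180, 182])

# Lossless: DEFLATE (the compression method PNG uses) restores every byte.
restored = zlib.decompress(zlib.compress(pixels))

# Lossy stand-in: round each value down to a multiple of `step`, discarding
# fine detail. Real JPEG quantizes DCT coefficients, not raw pixel values.
step = 8
quantized = bytes((p // step) * step for p in pixels)

print("lossless round trip identical:", restored == pixels)
print("quantized copy identical:", quantized == pixels)
print("max absolute pixel error after quantization:",
      max(abs(a - b) for a, b in zip(pixels, quantized)))
```

Once values have been quantized, the discarded detail cannot be recovered from the saved file, which is why a lossless format is preferable for stimuli whose exact pixel values matter.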