Cast shadows are formed when an opaque object obstructs light and prevents it from illuminating a surface, such as the ground. Because light travels in a straight line, a point in a shadowed region, its corresponding point on the shadow-casting object, and the light source must all lie on a single straight line (Farid, 2016). As such, shadows provide information about the geometry of a 3-D scene and can be used to determine the location of the 3-D light source (Casati, 2004; Farid, 2016; Farid & Bravo, 2010). Under linear perspective, lines in 3-D are imaged as lines in 2-D; that is, the physical laws that constrain the behavior of light in the 3-D world also apply to 2-D images (Farid, 2016; Kajiya, 1986). So when we take photos, or render images from a virtual environment, the interaction of light and the 3-D objects in the scene is captured in the geometry of the 2-D image (see Footnote 1).
The constraint that connects the shadow, the shadow-casting object, and the light source provides an image-based technique for objectively verifying the authenticity of shadows (Farid, 2016; Kee et al., 2013). The scene in Fig. 1a contains shadows that are consistent with a single light source and have not been manipulated. The geometric technique has been applied to objectively demonstrate the authenticity of the shadows in the scene. To use this technique, one can locate any point on a shadow and its corresponding point on the object, then draw a line through them. Repeating this process for as many corresponding shadow and object points as possible reveals the point at which these lines intersect, which is the location of the projection of the light source. In Fig. 1b, a bus stop and its shadow have been taken from another scene where the light source is in a different position and added to the original scene from Fig. 1a. Using the same principle, the line connecting the bus stop’s shadow and the corresponding point on the object does not pass through the projection of the scene’s light source. This inconsistency indicates that the image has been manipulated and demonstrates how shadows can be helpful in detecting forgeries (Farid, 2016; Kee et al., 2013; see Footnote 2).
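The line-intersection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by the cited authors: the function name `estimate_light_projection` and the choice of a least-squares intersection (which tolerates small errors in hand-marked points) are ours, and coordinates are assumed to be 2-D image positions in pixels.

```python
import math


def estimate_light_projection(pairs):
    """Least-squares intersection of the lines through (shadow, object) pairs.

    pairs: list of ((sx, sy), (ox, oy)) image coordinates (hypothetical input).
    Returns the point (x, y) that minimizes the summed squared perpendicular
    distance to all of the lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (sx, sy), (ox, oy) in pairs:
        dx, dy = ox - sx, oy - sy
        length = math.hypot(dx, dy)
        if length == 0:
            raise ValueError("shadow point and object point must differ")
        # Unit normal of the line; every point p on the line satisfies n.p = c.
        nx, ny = -dy / length, dx / length
        c = nx * sx + ny * sy
        # Accumulate the 2x2 normal equations A p = b.
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; no unique intersection")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With exact point pairs the result is the common intersection; with noisy pairs it is the best-fitting point. A line drawn from a manipulated shadow would be expected to miss this point by a visible margin.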
We might predict that people can make use of shadow information to help identify image forgeries. Shadows convey important information about the arrangement and spatial position of objects in a scene, and numerous studies show that the human perceptual system makes use of such information to understand the scene (e.g., Allen, 1999; Dee & Santos, 2011; Khang, Koenderink, & Kappers, 2006; Tarr, Kersten, & Bülthoff, 1998). In an early study investigating the perception of inconsistent shadows, people searched for a target cube that was illuminated from a different direction to distractor cubes also present in the display (Enns & Rensink, 1990). Subjects rapidly identified the presence or absence of the target cube, suggesting that the human visual system can process complex visual properties, such as lighting direction, at a preattentive stage of processing. This remarkable ability to perceive shadow information suggests that such information might also help in the detection of image forgeries. Other research, however, suggests that the visual system discounts shadow information in early visual processing (e.g., Ehinger, Allen, & Wolfe, 2016). Essentially, to recognize objects under a wide range of lighting conditions, the visual system prioritizes extraction of the lighting invariant aspects of a scene and filters out shadow information as “noise.” In support of this suggestion, Ehinger et al. found that people were slower to detect changes to shadows than changes to objects even when the shadow changes altered the meaning of the scene. As such, it remains possible that observers will not make use of shadow information to help them to detect image forgeries.
When considering people’s potential ability to make use of shadow information in a given task, it is also important to appreciate that the visual system must determine which shadows are cast by which objects, known as the shadow correspondence problem (Dee & Santos, 2011; Mamassian, 2004). For stimuli that consist of simple geometric shapes with right-angle features and well-defined shadow regions, matching an object point with its corresponding shadow point can be relatively straightforward and, accordingly, such stimuli allow for a reasonably accurate estimation of the lighting direction. Yet it is often extremely challenging to match shadow points with corresponding object points in real-world scenes. For example, research suggests that the ability to estimate lighting direction does not generally extend to more complex real-world or computer-generated scenes, although there might be a point at which lighting inconsistencies do become noticeable (Ostrovsky, Cavanagh, & Sinha, 2005; Tan, Lalonde, Sharan, Rushmeier, & O’Sullivan, 2015). Furthermore, it is not known whether the visual system automatically picks up on discrepancies in lighting direction and generates a signal that these should be attended to (e.g., the way a single red item among green items might call attention to itself due to a local contrast difference; Lovell, Gilchrist, Tolhurst, & Troscianko, 2009; Rensink & Cavanagh, 2004).
In sum, studies have yet to determine whether people can identify consistent and inconsistent shadows in complex scenes when there are a number of well-defined points between objects and corresponding shadows that, theoretically, make it possible to determine the location of the light source. In the first series of experiments, we aimed to answer this question.
Experiment 1
Method
Subjects and design
A total of 102 subjects (M = 25.5 years, SD = 9.0, range: 14–57 years; 60 men, 39 women, three chose not to disclose their gender) completed the task online. A further four subjects were excluded from the analyses: three had missing response-time data for at least one response on the task, and one experienced technical difficulties. There were no geographical restrictions, and subjects did not receive payment for taking part, but they did receive feedback on their performance at the end of the task (this was the case for all experiments reported in this paper). The design was within subjects, with each person viewing four computer-generated images, half of which had consistent shadows, and half of which were manipulated to show inconsistent shadows. We measured people’s accuracy in determining whether an image had consistent or inconsistent shadows. A precision-for-planning analysis revealed that 81 subjects would provide a margin of error that is 0.25 of the population standard deviation with 95% assurance (Cumming, 2012, 2013; see Footnote 3); this analysis applies to all experiments reported here. All research in this paper was approved by the Psychology Department Research Ethics Committee, working under the auspices of the Humanities and Social Sciences Research Ethics Committee (HSSREC) of the University of Warwick. All participants provided informed consent.
Stimuli
To create five different outdoor city scenes, we used a 3-D cityscape model from turbosquid.com and 3-D animation software (Maya®, 2016, Autodesk, Inc.; see Footnote 4). To represent a real-world outdoor environment lit by the sun, each scene was illuminated by a single distant-point light source (see Footnote 5). Each scene included a target object (a lamppost) and its corresponding shadow. To ensure subjects could use the shadow-based analysis technique outlined in the introduction, we included other nontarget objects with corresponding shadows. Recall that when a scene is illuminated by a single source, all of the shadows must be consistent with that light; if any shadow is inconsistent with the light source, then the scene is physically impossible (Farid, 2016; Kee et al., 2013). We rendered each of the five 3-D scenes to generate TIF image files with a resolution of 960 × 720 pixels. For each scene, the light was in front of the camera, but not actually visible within the image. To ensure that the shadows in the 2-D images were physically accurate, and therefore representative of the shadows that people experience in the real world, we rendered the images with raytraced shadows. Raytracing is a type of shadow rendering that calculates the path of individual light rays from the light source to the camera; it produces physically accurate shadows that are like shadows in the real world (Autodesk, 2016). These five scenes comprised our original, consistent image set, each illuminated by a single source and thus containing only consistent shadows.
To create the inconsistent-shadow scenes, we rendered each of the five scenes two more times: once with the light moved to the left of its original position (−800 m on the horizontal axis) and once with the light moved to the right of its original position (+800 m on the horizontal axis; see Footnote 6). The scene layout remained identical across each version of the scene, yet the three different light positions (original, left, and right) meant that each version had a different shadow configuration. For each of the five scenes, we selected a single lamppost and its corresponding shadow to manipulate. The manipulation process involved three stages completed using GNU Image Manipulation Program® (GIMP, Version 2.8). First, we removed the target lamppost’s shadow in the original version of the scene. Second, we cut the shadow of that same target lamppost from the version of the scene with the light moved left of the original position. Third, we overlaid this shadow onto the original version of the scene. We then repeated stages two and three for the version of the scene with the light moved right of the original position (see Fig. 2). We exported the images as PNG files, a lossless format. We repeated this manipulation process for the other four scenes.
Overall, we had three versions of each of the five city scenes to give a total of 15 images. The original version of each scene was used to create our consistent shadow image set. The two manipulated versions of each scene were used to create our inconsistent shadow image set. The fifth city scene was used as practice (the practice trial is described shortly).
Procedure
Subjects were told to assume that “each of the scenes is illuminated by a single light source, such as the sun.” Subjects were given a practice trial before being presented with the four city scenes in a random order. Subjects saw two consistent shadow scenes and two inconsistent shadow scenes; however, they were unaware of this 50:50 ratio. For each scene, to cue subjects’ attention to the target lamppost, they were first shown an almost entirely grayed-out image with only the target lamppost fully visible and highlighted within a red ellipse. After 4 s, the full scene became visible. We also added a small yellow dot on the base of the target lamppost to ensure subjects did not forget which lamppost to consider. Subjects were asked, “Is the lamppost’s shadow consistent or inconsistent with the shadows in the rest of the scene?” They were given unlimited time to select between (a) “Consistent,” (b) “Inconsistent.” They were then asked to rate their confidence in their decision using a 100-point Likert-type scale, from 0 (not at all confident) to 100 (extremely confident).
After completing the shadow task, subjects were asked a series of questions about their demographics, interest in photography, video gaming experience, and whether they had experienced any technical difficulties while completing the experiment (see Table 6 in Appendix A for exact questions). Finally, subjects received feedback on their performance.
Results and discussion
For all experiments, we calculated the mean and median response time per image and report these in Table 7 in Appendix A. For all experiments, we followed Cumming’s (2012) recommendations and calculated a precise estimate of the actual size of the effects.
Overall accuracy
Can people identify whether scenes have consistent or inconsistent shadows? Overall, a mean 61% of the scenes were correctly classified. Given that there were only two possible response options, chance performance is 50%; thus, subjects scored a mean 11 percentage points better than chance. This difference equates to subjects’ performance being a mean 22% better than would be expected by chance alone. Subjects showed a limited ability to discriminate between consistent (75% correct) and inconsistent (46% correct) shadow scenes, discrimination (d') = 0.41, 95% CI [0.22, 0.59] (see Footnote 7). These findings offer further empirical support for the idea that people have only limited sensitivity to lighting inconsistencies (e.g., Farid & Bravo, 2010; Ostrovsky et al., 2005). Thus it appears that subjects did not use the information available within the scene to work out the answer objectively. Furthermore, they showed a bias towards accepting the shadow scenes as consistent, response bias (c) = 0.29, 95% CI [0.20, 0.38]. Presumably, our subjects had a relatively conservative criterion for judging that shadows were inconsistent with the scene light source and typically accepted them as consistent.
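For readers unfamiliar with the signal-detection measures reported above, the following minimal sketch shows the standard equal-variance formulas for d' and c. It rests on assumptions that the text does not state: we treat an "inconsistent" response to an inconsistent scene as a hit and an "inconsistent" response to a consistent scene as a false alarm, and the log-linear correction shown is one common option for extreme rates, not necessarily the one used here. Applying the formulas to the pooled percentages (hit rate 0.46, false-alarm rate 0.25) does not reproduce the reported values, presumably because those were computed per subject and then averaged.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the z-transform).
_z = NormalDist().inv_cdf


def loglinear_rate(count, trials):
    """Log-linear corrected rate; keeps rates away from exactly 0 or 1.

    Assumption: this correction is illustrative, not taken from the paper.
    """
    return (count + 0.5) / (trials + 1)


def sensitivity_and_bias(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion_c) under the equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1, otherwise inv_cdf raises.
    A positive c indicates a tendency to withhold the "inconsistent" response.
    """
    zh, zf = _z(hit_rate), _z(false_alarm_rate)
    return zh - zf, -(zh + zf) / 2
```

Under this convention, a positive c corresponds to the bias described in the text: a tendency to accept scenes as consistent.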
Image metrics and individual factors
Next, we tested whether people’s accuracy on the shadow task was related to the difference between the position of the projected light source for the scene and the projected light source for the inconsistent shadow. To achieve this, we calculated the shortest distance between the projected light position for the scene and a line connecting the target lamppost with its inconsistent shadow. In addition, we checked whether two properties of the image itself affected people’s accuracy on the task: (1) whether the light position had moved left or right of the original light position, and (2) the location of the scene light source. Furthermore, to determine whether individual factors played a role in identifying consistent and inconsistent shadows, we gathered subjects’ demographic data and details about their interest in photography and video gaming. On the shadow task, we also asked subjects to rate their confidence for each of their decisions and recorded their response time.
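The distance measure described above is the perpendicular distance from a point to a line. A minimal sketch follows, assuming 2-D image coordinates in pixels; the function and parameter names are hypothetical and chosen only for illustration.

```python
import math


def light_to_constraint_distance(light, object_point, shadow_point):
    """Shortest distance from the projected light position to the infinite
    line through the target's object point and its (inconsistent) shadow point.
    """
    lx, ly = light
    ox, oy = object_point
    sx, sy = shadow_point
    dx, dy = sx - ox, sy - oy
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("object point and shadow point must differ")
    # |cross product| / |direction| gives the perpendicular distance.
    return abs(dx * (ly - oy) - dy * (lx - ox)) / length
```

A distance of zero means the shadow constraint passes through the projected light position, that is, the shadow is consistent with the scene lighting; larger values indicate a larger geometric inconsistency.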
We conducted exploratory analyses to determine how each factor influenced subjects’ performance by running two generalized estimating equation (GEE) analyses—one for the inconsistent shadow scenes and one for the consistent shadow scenes. Specifically, we conducted a repeated-measures logistic regression with GEE because our dependent variables were binary with both random and fixed effects (Liang & Zeger, 1986). The results are shown in Table 1.
Table 1. Results of the GEE binary logistic-regression models to determine variables that predict accuracy in the shadow task
The distance between the scene light source and inconsistent shadow constraint did not predict accuracy on the task. This result suggests that people either are not aware that they can use this geometrical image-based technique for objectively verifying the authenticity of shadows or that they make errors when trying to apply this technique. For example, the shadow correspondence problem (Dee & Santos, 2011; Mamassian, 2004) might limit the extent to which subjects were able to accurately estimate the position of the scene light source. Video-game playing was the only variable in the model that had an effect on the likelihood of responding correctly. Those who play video games frequently (at least once or twice a week) were more likely to correctly identify inconsistent shadow scenes than those who do not. At first glance, this finding seems consistent with previous research showing that video gamers outperform non-video-gamers across a range of perceptual measures (for a review, see Green & Bavelier, 2012). Yet a more recent review of these studies highlights a number of methodological flaws in the research (Simons et al., 2016). These flaws, along with the exploratory nature of the analysis in the current study, limit the extent to which we can draw any firm conclusions about the effect of video gaming on visual tasks.
For the consistent shadow scenes, the distance of the projected light source from the scene had a small effect on the likelihood of responding correctly. Scenes in which the projection of the light onto the image plane was closer to the center of the image were more likely to be identified as consistent than scenes in which the light was further from the center. Subjects might have been better able to determine the accuracy of shadows in a scene when the light source was more readily available to use as a guide. Perhaps, then, our subjects were able to make use of the shadow-based analysis technique, but only when it was relatively easy to calculate the location of the projected light source.
For each of the four scenes in our experiment, the projection of the light source was beyond the image plane. Therefore, applying the geometric shadow-based analysis technique with our stimuli required people to use information outside of the image plane. It is possible that this is a difficult task to perform perceptually and that, instead, people tended to more frequently rely on in-plane image cues. We tested this suggestion by running a second GEE analysis for the inconsistent shadow scenes. In this second analysis, we examined whether a new variable measuring the rotation, in degrees, between the consistent shadow position and the inconsistent shadow position (computed on the image plane) was related to accuracy on the task. We included this angle difference measure in the second GEE analysis in place of the variable that measured the distance between the scene light source and inconsistent shadow constraint. All other variables in the model remained the same.
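One way to compute an on-image rotation of this kind is the signed angle between two vectors that share the lamppost base as their origin. The sketch below is an assumption about how such a measure could be obtained, not a description of the authors' procedure; the names are hypothetical, and the sign convention depends on whether the image y-axis points up or down.

```python
import math


def shadow_rotation_degrees(base, consistent_tip, inconsistent_tip):
    """Signed rotation, in degrees within (-180, 180], from the consistent
    shadow direction to the inconsistent one, measured about the base point.
    """
    ax, ay = consistent_tip[0] - base[0], consistent_tip[1] - base[1]
    bx, by = inconsistent_tip[0] - base[0], inconsistent_tip[1] - base[1]
    if (ax == 0 and ay == 0) or (bx == 0 and by == 0):
        raise ValueError("shadow tips must differ from the base point")
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return math.degrees(math.atan2(cross, dot))
```

Taking the absolute value of the result gives an unsigned angle difference, which is the natural form for a predictor of detection accuracy.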
As shown in Table 2, this time two variables had an effect on the likelihood of responding correctly. First, replicating the result of the first model, those who play video games frequently were more likely to correctly identify inconsistent shadow scenes than those who do not. Second, inconsistent shadows positioned further from the correct position were more likely to be associated with accurate responses than inconsistent shadows positioned closer to the correct position were. This finding suggests that there might be a discernible point at which the inconsistent shadow becomes different enough from its correct position to make the inconsistency noticeable—lending support to the notion of a perceptual threshold for detecting lighting inconsistencies (Lopez-Moreno, Sundstedt, Sangorrin, & Gutierrez, 2010; Tan et al., 2015). In other words, our subjects appeared to hold a basic understanding of where an object’s shadow must cast to be consistent with the light source, but their understanding was imprecise.
Table 2. Results of the follow-up GEE binary logistic regression model to determine variables that predict accuracy in the shadow task
Overall, subjects were slightly more likely to identify the inconsistent shadows when the angle difference from the correct shadow location was larger compared with when it was smaller. Yet the experimental design meant that there were only eight inconsistent shadow scenes and thus only eight angle differences to examine. In Experiments 2a and 2b, to more precisely estimate the perceptual threshold for identifying lighting inconsistencies, we asked subjects to rotate a target shadow to the position that they thought was consistent with the lighting of the scene.
Experiments 2a and 2b
The results of Experiments 2a and 2b largely replicate those of Experiment 1, albeit using a different experimental paradigm. Thus, for brevity, we present full details of Experiments 2a and 2b in Appendix B and summarize the findings here.
In Experiment 2a, subjects were able to change the shadow rotation 360° about the base of the target lamppost; their task was to place the shadow in the position that they believed to be consistent with the other shadows in the scene. Even with this high level of control over the shadow position, subjects were willing to rotate the shadow to a relatively wide range of positions that were inconsistent with the scene lighting—51% of the shadows were positioned between −10° and +10° of the consistent position, 95% CI [46%, 56%]. Although there were differences by scene, overall a mean 20% more shadows were positioned to the left than to the right of the correct location, Mdiff 95% CI [12%, 28%].
In Experiment 2b, subjects could both rotate the shadow and change the size of the shadow. The results were similar to those in Experiment 2a, with 46% of shadows positioned between ±10° of the consistent position, 95% CI [41%, 51%]. Replicating Experiment 2a, collapsed across the four scenes, subjects positioned 16% more of the shadows to the left of the correct position than to the right, Mdiff 95% CI [7%, 25%]. In sum, allowing subjects to adjust the size of the target shadow in Experiment 2b made virtually no difference to the pattern of results.
Overall, the results from Experiments 2a and 2b indicate that subjects frequently make imprecise judgements about where shadows must be positioned to be consistent with a single light source. It is important to note, however, that each target shadow in Experiments 2a and 2b was simply the correct one for the given scene rotated around the base of the object. That is, the manipulations were made on the image plane rather than in the 3-D environment. As such, incorrect shadows were also inconsistent with the casting object in terms of sizes and angles between the lamp and the pole parts of the object/shadow. Therefore it is possible that being able to change the scale of the target shadow did not prevent subjects using the shape of the shadow as a cue. If so, our results might still overestimate people’s ability on the task. To examine this possibility, in Experiment 3, using the 3-D environment, we generated different versions of the target shadow that were inconsistent with the scene light source in terms of both orientation and shape. Importantly though, in Experiment 3, each inconsistent shadow option was physically plausible with respect to a single light source (albeit not the scene light source) in terms of its size and angle.
Experiment 3
Method
Subjects and design
A total of 114 subjects (M = 25.2 years, SD = 8.4, range: 14–52 years; 48 women, 62 men, four chose not to disclose their gender) completed the task online. Five additional subjects were removed because they experienced technical difficulties. We used a within-subjects design.
Stimuli
We used the same city scenes as in the previous experiments. This time, however, we created 21 versions of each scene, each version with the objects in an identical position, but with 21 different light positions. In the consistent version, the target lamppost’s shadow was created by the same light source as the rest of the scene. In the other 20 inconsistent versions, we created a second light source that only cast a shadow for the target lamppost. By changing the position of the second light source only, we created 20 versions of the scene in which the shadow for the target lamppost was inconsistent with the shadow configuration for the rest of that scene—but physically consistent with being cast by the target lamppost. For 10 of the inconsistent versions of the scene, we moved the second light source in 10 equal increments of 200 m to the left of the original light position. For the other 10, we moved the second light source in 10 equal increments of 200 m to the right of the original light position. As a result, we created 21 versions of each of the five scenes: one with consistent lighting for all objects in the scene—including the target lamppost—and 20 with consistent lighting for all objects except the target lamppost. The versions of the scene were numbered from 1 to 21, with the consistent version of the scene always number 11. Versions 10 to 1 were inconsistent, with the target shadow moving incrementally further to the left of the consistent version, while versions 12 to 21 were inconsistent, with the target shadow moving incrementally further to the right of the consistent version (see Fig. 3 for examples).
We developed a program in HTML to randomly select one of the 21 versions of the scene to display. As well as this randomly selected version, subjects were able to scroll through a sequence of another 10 consecutive versions of that same scene; crucially, the sequence always included the consistent version. To illustrate, if the program randomly selected Version 1, the subject would be able to scroll through Versions 1 to 11 of the scene. Or, if the program randomly selected Version 15, the subject would be able to scroll through Versions 5 to 15 of the scene. Having generated the sequence, the program randomized which of the 11 versions to display first, thus ensuring that subjects did not always start at the extreme end of a sequence. Subjects used the left and right arrow keys on the keyboard to scroll through the 11 versions.
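The selection logic described above can be sketched as follows. This is a reconstruction from the two worked examples in the text, written in Python rather than the original program's language, and all names are hypothetical. One case is not specified in the text: when Version 11 itself is selected, the sequence could run either from 1 to 11 or from 11 to 21; the sketch arbitrarily uses 11 to 21 and marks this as an assumption.

```python
import random

N_VERSIONS = 21   # versions are numbered 1..21
CONSISTENT = 11   # the consistent version is always number 11
WINDOW = 11       # the selected version plus 10 consecutive neighbours


def build_sequence(selected):
    """Return the 11 consecutive versions that start or end at `selected`
    and always include the consistent version."""
    if not 1 <= selected <= N_VERSIONS:
        raise ValueError("selected version must be between 1 and 21")
    if selected <= CONSISTENT:
        # Assumption: Version 11 itself is treated like the left-hand case.
        start = selected
    else:
        start = selected - (WINDOW - 1)
    return list(range(start, start + WINDOW))


def start_trial(rng=random):
    """Pick a version at random, build its sequence, and choose which of
    the 11 versions is displayed first."""
    selected = rng.randint(1, N_VERSIONS)
    sequence = build_sequence(selected)
    first_shown = rng.choice(sequence)
    return sequence, first_shown
```

For example, `build_sequence(1)` yields Versions 1 to 11 and `build_sequence(15)` yields Versions 5 to 15, matching the two cases given in the text.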
Procedure
The procedure was the same as in Experiment 1, with one exception: Subjects scrolled through the 11 versions of each scene rather than deciding whether the shadows in each scene were (a) “Consistent” or (b) “Inconsistent.” We asked subjects to select the version of the scene in which the shadow of the target lamppost was consistent with the other shadows in the scene.
Results and discussion
Overall accuracy
Subjects’ performance on the shadow task can be classified in different ways. Taking a conservative approach, we defined an accurate response to be only when subjects selected the (single) consistent version of the scene. Collapsed across the four scenes, the consistent version was selected a mean 25% of the time, 95% CI [21%, 29%]. Given that there were 11 possible response options in the task, chance performance is 9%; thus, subjects scored a mean 16 percentage points better than chance, 95% CI [12%, 20%]. This difference equates to subjects’ performance being a mean 178% better than would be expected by chance alone. Taking a more lenient approach and defining an accurate response by including one version either side of the consistent shadow position (that is, when Versions 10, 11, or 12 were selected), a mean 55% of shadows were positioned correctly, 95% CI [50%, 59%]. Replicating the findings from Experiments 2a and 2b (see Appendix B), Fig. 4 shows that subjects were least accurate for Scene 3, with a mean 40% selecting Versions 10, 11, or 12, 95% CI [31%, 49%]. In contrast to the previous experiments, however, subjects were most accurate for Scene 1, with a mean 65% selecting Versions 10, 11, or 12, 95% CI [56%, 74%]. There is no immediately obvious reason as to why subjects did relatively well on Scene 1 in Experiment 3. Speculatively, it is possible that the shape of the target shadow in Experiments 2a and 2b actually made the inconsistent positions seem more plausible rather than less plausible in Scene 1.
Preference for shadows to the left or right
In contrast to the results of Experiments 2a and 2b (see Appendix B), collapsing across all four scenes, the shadows were equally likely to be positioned to the left or to the right of the correct location, Mdiff = 0%, 95% CI [−8%, 7%]. Yet as Fig. 4 shows, there was still variation by scene. In line with our previous experiments, subjects were more likely to position the target shadow left of the correct position in Scene 3. And again, in Scene 2, subjects were more likely to position the target shadow right of the correct position. This time, in both Scenes 1 and 4, a similar proportion of subjects selected a target shadow to the left of its correct location as to the right.
Overall, the pattern of results across our four shadow experiments was largely consistent. Most importantly, the experiments suggest that people have a limited ability to identify consistent and inconsistent shadows. This finding is somewhat surprising considering that subjects viewed scenes in which there was sufficient information to determine the answer objectively. Next, we consider the extent to which people make use of reflections to identify authentic and manipulated scenes.