A complete account of color constancy requires a theory of how illumination fills three-dimensional space

Studies of lightness and color constancy use the terms lightness and brightness to refer to the qualia corresponding to perceived surface reflectance and perceived luminance, respectively (Arend & Goldstein, 1987). Missing from the literature is a consideration of how differential levels and chromaticities of the illumination that fills space are inferred (Mausfeld, 2003; Smithson, 2005). In this article, I propose that humans are aware not only of colored objects, but of the empty space around them being full of one or more levels and chromaticities of illumination. The argument is that theories of color constancy fail to grasp all of the parameters necessary to develop a broader perceptual theory, including the crucial variable of how light is inferred in open space, which has been missing from theory and, in fact, virtually all visual psychophysics.

The problem with collapsing a three-dimensional world into two dimensions

Color theory has been developed to explain the perception of isolated spots of light and their appearance, as, for example, by trichromacy and opponency. When lights are surrounded by other chromatic stimuli, further effects—such as color contrast and color constancy—have been studied. In all cases, the spots and surrounds are presented on a flat surface (e.g., “object mode”) or as an opening into a flat surface (e.g., “aperture mode”). Such methods do not easily lend themselves to inferences about the volumetric qualities of space, since any such properties will be identical for both the spot and the surround. Nevertheless, many authors have presupposed that color vision can be completely described by the perceptions of spots and surrounds on flat surfaces. One can trace this attitude back to the perspective reduction discovered in the Renaissance, which seemed to imply that all volumetric information can be represented in two dimensions by a painter.

The famous painter Alberti (1435) was the first to design a two-dimensional pane of glass that could represent a scene in three-dimensional perspective. Mausfeld (2002) discussed how Alberti’s window, capitalizing on this geometrical advance, contains all of the primitives necessary for the representation of both lower- and higher-order color processes. “Lower” processes contain elementary achievements, including color matching, color discrimination, and their temporal and spatial properties, while “higher” processes designate functional perspectives and focus on achievements such as color constancy. While Mausfeld does not consider “lower” to imply retinal versus “higher” cortical processes, as in Land and McCann’s (1971) Retinex model, Mausfeld does follow the traditional assumption that decontextual colors (represented in CIE space as hue, saturation, and brightness) are the elemental components used by higher-order processes, such as those that evoke color constancy. I argue here that this view is erroneous, in part because it ignores the three-dimensional nature of lived space and the need for this to be represented in the brain. True, Mausfeld correctly points out that part of this flaw originates in the assumption that decontextual (or “isolated,” with no surround) colors map directly onto the divisible physical qualities of wavelength, purity, and intensity, but he does not go far enough; he is willing to sacrifice the fact that the physical quality of space contains illumination. That is, Alberti’s window relies on the concept that the proximal stimuli fundamentally misallocate what must be the true nature of the primitives: Collapsed to two-dimensional physical space, the illuminant is reduced to a mere component of the luminance returning to the eye from reflected surfaces. Consequently, only its effects are seen (e.g., shadows), but this negates the fact that the illuminant also fills the phenomenological volume of space.

For example, when I look across my office at a red chair, the space in front of the chair does not appear red, but clear rather than black. Thus, the illuminant filling the space has an intensity and chroma that differ from those of the object from which the luminance is received. It is this space that seems visible, not because of dust particles in the air, but because each point in the volume of space is inferred to contain a certain level and chromaticity of illumination. By assumption, complete color constancy can occur only when the illuminant is inferred to fill open space and is not simply evinced in the quality of brightness, which results from the interaction of illumination with surface reflectance.

For example, consider Gilchrist’s (1977) Science article, which describes two adjoining achromatic rooms being viewed through a pinhole (Fig. 1, left panel). The near room is dimly illuminated, while the far room is highly illuminated. Attached to the doorframe between the two rooms are two papers, arranged so that a white paper appears either adjacent to the doorframe or, with its corners removed, on the back wall of the far room. When the white paper appears coplanar to a black paper in the doorframe (i.e., under the near room’s dim illumination), it appears white (i.e., it was matched to a 9.0/ Munsell value chip), but when it appears on the back wall, coplanar to a white paper (i.e., under the far room’s high illumination), it appears dark gray (i.e., it was matched to a 3.5/ Munsell value chip). These findings support Gilchrist’s “coplanar ratio hypothesis,” which stipulates that the lightness of a surface is determined relative to coplanar surfaces, since they tend to share the same level of illumination. That is ascertained because one infers the “level of illumination” from the spatial average of surface brightnesses. Thus, by definition, the focus of the coplanar ratio hypothesis is on surfaces, while making only an implicit assumption that the volume of space needs to contain specific light levels.

Fig. 1

(Left) A photograph of Gilchrist’s (1977) far-room condition. From “Perceived Lightness Depends on Perceived Spatial Arrangement,” by A. L. Gilchrist, 1977, Science, 195, pp. 185–187. Copyright 1977 by the American Association for the Advancement of Science. Reprinted with permission. (Right) A photograph of one of a stereo pair of images presented on a CRT used in Schirillo et al.’s (1990) far-room condition. From “Perceived Lightness, but Not Brightness, of Achromatic Surfaces Depends on Perceived Depth Information,” by J. Schirillo, A. Reeves, and L. Arend, 1990, Perception & Psychophysics, 48, pp. 82–90. Copyright 1990 by the Psychonomic Society. Reprinted with permission. Figure is on p. 85

Earlier, Gelb (1929/1938) had performed a closely related experiment. In a room illuminated by a weak ceiling lamp, he focused the concealed beam of a projection lantern on a revolving black disk. No penumbra was visible on the disk or the background. Observers saw the disk as white and dimly illuminated. When a small piece of white paper was placed to intercept the lantern beam, the percept changed. The disk was now seen as black, the paper white, and both as strongly illuminated. The change in the appearance of the disk from a dimly illuminated white to a strongly illuminated black has been ascribed to the white paper’s revealing that the disk was strongly illuminated (Woodworth & Schlosberg, 1954). This is how one can infer the “level of illumination.”

The level of illumination can be discounted when one perceives the lightness of a surface. Lightness refers to the perceived reflectance of a surface. Arend and Goldstein (1987) operationalized this notion by asking observers to adjust the luminance of one surface on a CRT under one level of illumination to match that of another surface under a second level of illumination, “as if they were cut from the same piece of paper.” Such matches were proportional to reflectance and independent of the overall brightness level, over a 1:1,000 range. Brightness, on the other hand, refers to the perceived luminance of a surface region. Because luminance is the product of surface reflectance and illumination, brightness covaries with the product of reflectance and illumination, and in fact is a power law of illumination, with the power determined by the reflectance (Arend & Goldstein, 1987). Note that neither lightness nor brightness alone reveals the level of illumination.
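The relationship just described can be made concrete with a small numerical sketch. The Python below is illustrative only: the function names and all reflectance and illumination values are assumptions chosen for the example, not values from Arend and Goldstein (1987). It shows that two different pairings of surface and illumination can deliver the same luminance, and that an idealized lightness match tracks reflectance regardless of the illumination level.

```python
def luminance(reflectance, illumination):
    """Luminance as the product of surface reflectance and illumination."""
    return reflectance * illumination


# Hypothetical values: reflectance as a fraction (0-1), illumination in
# arbitrary units.
white_in_dim_light = luminance(0.90, 10.0)      # white paper, dim light
gray_in_bright_light = luminance(0.09, 100.0)   # dark gray paper, bright light

# Both surfaces send (numerically) the same luminance to the eye, so
# luminance by itself cannot reveal the level of illumination.
same_luminance = abs(white_in_dim_light - gray_in_bright_light) < 1e-9


def ideal_lightness_match(target_reflectance, match_illumination):
    """Luminance an idealized observer would set so that the match patch
    looks 'cut from the same piece of paper' as the target."""
    return luminance(target_reflectance, match_illumination)


# Assumed illumination levels spanning a 1:1,000 range.
illumination_levels = (1.0, 10.0, 1000.0)
settings = [ideal_lightness_match(0.30, i) for i in illumination_levels]

# The settings grow with illumination, but the reflectance recovered from
# each setting stays the same.
recovered = [s / i for s, i in zip(settings, illumination_levels)]
```

In this idealization the match settings differ by a factor of 1,000 while the recovered reflectance stays at 0.30, which is one way to read the statement that lightness matches were proportional to reflectance and independent of the overall level.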

To see how the two distinct qualities of lightness and brightness differ, Schirillo and colleagues (Schirillo, Reeves, & Arend, 1990) repeated Gilchrist’s experiment simulated with a special-purpose Tektronix CRT and stereoscope to generate three-dimensional space and the same physical variations in intensity as Gilchrist. Although the main interest was in the difference between lightness and brightness, the data also revealed a striking contrast with those of Gilchrist: While the effects on lightness judgments in the simulation were in the same direction as his, the magnitude was only 41 % of his; that is, the lightness range of the gray paper that appeared in either the near or the far room spanned a 2.7:1 ratio, as compared to his 6.6:1 ratio.
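As a quick check on the arithmetic, the 41 % figure corresponds to the simple ratio of the two lightness ranges reported above. The variable names are illustrative, and the calculation assumes the comparison was made on the ratios themselves rather than on their logarithms.

```python
# Lightness ranges quoted in the text, each expressed as an x:1 ratio.
simulated_range = 2.7   # Schirillo et al. (1990), stereoscopic CRT simulation
real_room_range = 6.6   # Gilchrist (1977), real rooms

# Relative magnitude of the simulated effect (assumed: ratio of the ratios).
relative_magnitude = simulated_range / real_room_range

print(f"{relative_magnitude:.0%}")  # prints 41%
```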

The authors postulated that this occurred, in part, because their stereoscopic space did not permit observers to infer that each volume of space was illuminated differently (Fig. 1, right panel). That is, if the scene in Gilchrist’s condition could be viewed from off to one side, a stream of higher-intensity light would traverse the doorway from the far room into the near room, which, of course, would be evident in cues projected onto surfaces, such as shadowed lines and highlights on the floor (Fig. 1, left).

Schirillo et al. (1990) found that observers’ judgments of the brightness of the gray paper did not vary significantly between the near and far conditions. Instead, because it was actually under a constant level of illumination in both conditions (i.e., the lower level of illumination of the near room), its brightness remained constant across conditions.

The important point to consider in Gilchrist’s experiment is that the actual three-dimensional volume of space within his scenes contains differences both in the illuminant and in the cues as to the illuminant. Schirillo et al.’s (1990) stimuli made the three-dimensional space two-dimensional on a flat CRT screen (Fig. 1, right), minimizing the phenomenal quality of a volume of illumination. For example, they did not accurately simulate the illuminated and shadowed floor. Observers certainly perceived a 3-D space, but because the gradients on the floor were not simulated, they received conflicting information about the 3-D distribution of illumination; the only way that a real floor could have no visible illumination gradients would be if it were perfectly matte, soot black. Thus, only when an observer is located within a three-dimensional volume can higher-order issues of color and lightness constancy be approximated correctly.

Maloney and colleagues realized that illumination cues have a cumulative effect on determining the exact lighting of a volume. Their series of studies (Boyaci, Doerschner, & Maloney, 2006; Boyaci, Doerschner, Snyder, & Maloney, 2006; Maloney, 2002; Snyder, Doerschner, & Maloney, 2005) demonstrated that adding cues, such as specular highlights and graded and cast shadows, improves approximations of color constancy. They also showed that the human visual system can compensate for all of the complexity in the light field that affects the appearances of Lambertian surfaces (Doerschner, Boyaci, & Maloney, 2007).

Therefore, while reducing a three-dimensional scene to two dimensions using Alberti’s window retains cues about perspective, it limits the ability to discern that a volume of space contained within the pictorial plane is full of light. Thus, observers partially extract the primitives of surfaces (i.e., they underestimate lightness differences but correctly predict brightness) within the actual volume of space, yet they can only properly extract all of the primitives (e.g., inferred illumination) within three-dimensional space. This experimental observation is the one that Mausfeld (2002) should consider.

Inferred illumination

Along these lines, Helmholtz (1866/1962) may also not have realized the significance of illumination in empty space when he postulated the notion of “unconscious inferred illumination.” The light that reaches the eye is a product of the illuminant and of the reflecting properties of the surface in question, which means that the illuminant’s contribution to the wavelength distribution of the light reaching the eye cannot be disambiguated from that of the surface. And this is true for an isolated spot with a single illuminant just covering it. But as soon as more elements are illuminated by the same illuminant, the ambiguity decreases; all that one needs is to assume “the gray world” or “brightest is white” for the entire problem to disappear. The reason for this is that the chromaticity of the light varies little over a wide region containing many differently colored surfaces. This observation is paradoxical, in that the light waves reaching the eye, say, from a red surface, are long, yet the space in front of the surface does not appear reddish, but neutral. This is a problem, one about the allocation of chroma to the surface or to the light. Helmholtz concluded that the surface reflectance is perceived at some depth, and that the level and chromaticity of the intervening illuminant can only be inferred from the cues left on surfaces (e.g., shadows, highlights, or a reddish hue). Vision science has yet to resolve the paradox that long wavelengths, for example, attach the sensation of redness to a surface and do not appear as a solid volume of red light between the front of the surface and the eye (De Weert, 2002). Thus, while Helmholtz considered inferring the illumination only to discount it, I would argue that it is not discounted, but inferred to have specific values other than zero.
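The two assumptions named above, “the gray world” and “brightest is white,” can be sketched as simple estimators of illuminant chromaticity. The code below is a minimal illustration under stated assumptions: the scene is a hypothetical set of four matte surfaces described by RGB triples, the illuminant is a hypothetical reddish light, and “brightest is white” is implemented as a per-channel maximum, which is only one common reading of that heuristic. None of the names or numbers come from the works cited here.

```python
def gray_world(pixels):
    """Estimate the illuminant as the per-channel mean of the scene,
    assuming the average surface reflectance is gray."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def brightest_is_white(pixels):
    """Estimate the illuminant as the per-channel maximum of the scene,
    assuming the brightest surface reflects all channels fully."""
    return tuple(max(p[c] for p in pixels) for c in range(3))


def chromaticity(rgb):
    """Scale an RGB triple so that its first channel equals 1, which
    removes overall intensity and leaves only the color balance."""
    return tuple(v / rgb[0] for v in rgb)


# Hypothetical reddish illuminant and surface reflectances (average is gray).
illuminant = (1.0, 0.8, 0.6)
reflectances = [
    (0.8, 0.2, 0.2),  # red surface
    (0.2, 0.8, 0.2),  # green surface
    (0.2, 0.2, 0.8),  # blue surface
    (1.0, 1.0, 1.0),  # white surface
]

# Light reaching the eye: reflectance times illuminant, channel by channel.
scene = [tuple(r[c] * illuminant[c] for c in range(3)) for r in reflectances]

gray_world_estimate = chromaticity(gray_world(scene))
brightest_estimate = chromaticity(brightest_is_white(scene))
```

With these hypothetical surfaces both estimates recover the color balance of the illuminant because the assumptions hold by construction. Note that both operate only on light returned from surfaces; neither assigns a level or chromaticity to the empty space in front of them.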

Vision science holds that color sensations attach to a surface and do not appear as a solid volume of light in front of it, because psychophysical theories (Gibson, 1966) typically assume that retinal illuminance is a proximal cause, resulting in the representation of objects in space, and part of the distal cause (i.e., luminance) contains the illumination falling on objects. While part of the phenomenal effect contains specific qualities (e.g., perceived specularity) that can serve as cues to compute the distal cause of illumination, the phenomenological properties of the volume of space that contain the objects have not been acknowledged. For example, the inferred illumination in the left panel of Fig. 1 is high not only on the surface of the floor, where the highlight is, but in the otherwise empty space above the floor.

Specularity may indicate the location and quality of an illumination source, but the spaces between the illumination source and the object and from the object to the eye have specific qualities as well. Neglecting them obfuscates the long-standing philosophical debate over whether objects themselves are colored or whether various wavelengths of light serve as neural inputs to a physiological system that transforms colorless rays into colored objects (Aldrich, 1952; Grandy, 1989; Hardin, 1984a, 1984b; Hilbert, 1987; Maund, 1995; McGinn, 1991; Pickering, 1975; Smart, 1975; Sosa, 1996). If objects appear colored, why do the space that contains the objects, which parenthetically contains both the illuminant and the luminance reaching the eye, and the light rays themselves appear transparent? That is, why does the object appear colored, but not the space in front of it, in that this space contains light of the same wavelengths as that at the surface of the object? A thin fog is palpable, yet it also permits some visibility of surfaces; thus, observers can discriminate between space that is filled (with water drops) and empty space. That is, the light rays are indeed transparent—that is why one cannot see them, except in a fog.

Apparent illumination/lightness invariance

The phenomenal fact that the volume of space surrounding objects appears filled with illumination is evident in the common phrase that “the room is light” instead of that “all of the surfaces within the room are reflecting light.” In this case, language mirrors the phenomenological impression that the space itself is full of light. However, “the room is light” might mean “the average surface within the room is reflecting light.” In this case, all surfaces need not be luminant; some might be in shadow or dark. This hypothesis implies that the observer first discounts the illuminant, then calculates all of the surfaces’ lightnesses, then averages them, and then constructs an overall impression of surface lightness, as when looking into a white room (Reeves, personal communication, 3 Jan 2007).

However, it may be that the claim that we use our mental representation of the illumination field to solve the problem of color constancy is misdirected. Instead, why not conclude that we have knowledge of surface reflectance that helps us estimate properties of the illumination field? Which comes first, the chicken or the egg? Or is this an iterative process? There is no reason for the directional bias posited above to exist. In fact, in all of the experiments reported here, surface reflection was manipulated as a means to test how it affects field illumination.

Yet this hypothesis also fails to consider what happens when a three-dimensional space may have multiple levels of illumination, and any particular level may be inferred by surfaces in some depth planes but not others. This is the case when looking down a forest path under a canopy of trees. In this more complicated, but also more ecologically viable, situation, 3-D surface geometry, surface lightness, and apparent illumination must all be determined simultaneously (see Fig. 3).

For example, neglecting the phenomenological aspect of light contained within the three-dimensional space that embeds distal objects misrepresents the total effect that the visual system accounts for via a proximal cause. In the two panels of Fig. 2, the apparent illumination on two wall-of-cubes images appears to differ due to their different reflectances (Logvinenko & Ross, 2005). Logvinenko and Ross attributed this finding to what they called “apparent illumination/lightness invariance”: When apparent illumination increases, surface lightness decreases. This can be seen in the differences in the cube tops in the panels of Fig. 2. Logvinenko and Ross further claimed that the important question is not how the visual system discounts illumination changes, but how it encodes them and takes them into account when calculating lightness.

Fig. 2

In both images, the cube tops (i.e., diamond shapes) have the same luminance, yet the left figure contains light reflecting sides, while the right figure includes dark reflecting sides, making the cube tops appear to be dark and light, respectively. From “Adelson’s Tile and Snake Illusions: A Helmholtzian Type of Simultaneous Lightness Contrast,” by A. D. Logvinenko and D. A. Ross, 2005, Spatial Vision, 18, pp. 25–72. Copyright 2005 by Brill Publications. Reprinted with permission

Thus, Logvinenko and Ross’s (2005) claim was that Helmholtz’s discounting of the illuminant is incorrect. Likewise, apparent illumination/lightness invariance can be seen in Marr’s (1982; see Fig. 3, left) version of a reversible Mach card, where the dark region appears either as a white surface in a shadowed region or, upside-down, as a gray surface under constant illumination. Notice how adding a cube to the shadowed region requires additional shadowed components to appear “correctly illuminated” (Fig. 3, right), suggesting that three-dimensional geometry, lightness, and apparent illumination require a simultaneous solution and that the level of illumination within any portion of the volume of space cannot be correctly determined without knowing the spatial geometry of all of the surfaces.

Fig. 3

(Left) Marr’s (1982) version of a reversible Mach card, in which the dark region appears either as a white surface in a shadowed region or, upside down, as a gray surface under constant illumination. (Right) A cube added within the shadowed region has the “correct” shadows cast on it

Thus, the common misconception arises, in part, because the retinal structure reduces three-dimensional space to two dimensions, requiring a perceptual reconstruction of three-dimensional properties. Traditionally, this problem has been considered within the constructs of stereo vision (Marr, 1982), referring exclusively to the alignment of object properties and not of the luminant qualities of the vacant space per se, because the void has no “corresponding points,” as surface features do. Consequently, both Helmholtz (1866/1962) and Gibson (1966) considered surfaces to be the only qualia worth describing, and not the space that contains these surfaces.

Current philosophical assumptions

However, by 1935, Katz was distinguishing the “illumination of empty space” (Erleuchtung) from “illumination of an object” (Beleuchtung). Yet among Gestaltists, Koffka (1935) best articulated how light was to be “apprehended”: He said that the intensity of the lighting of space is not really “seen,” but “felt.” I will use the term inferred for the volume of illumination, as opposed to terms like apprehended or felt. The rationale for this notion becomes apparent in the inverse optics problem that addresses the transformation of a two-dimensional image space to a three-dimensional perceptual space.

Pizlo’s (2001) treatment of this issue misses the key fact that the mapping of distal onto proximal stimuli is not many-to-one, but two-manys-to-one. That is, his Eq. 3, $(\mathrm{est})X = m_x(Y_x, P_x)$, should instead read $(\mathrm{est})X_{x+y} = m_x(Y_x, P_x)$, where $X_x$ indicates surface reflectance (i.e., lightness) and $X_y$ indicates the inferred illumination within the spatial volume. In essence, his inverse method works well for shape, but it does not help with inferred illumination. This reformulation more accurately represents how to estimate the complex phenomenological world $(\mathrm{est})X$ from an equally complex physical world using a single-value proximal stimulus (i.e., where $Y_x$ equals luminance), and it places more restrictions on the possible mappings of $Y$ onto $X$, thereby minimizing the role of the $P_x$ a priori constraints and assumptions, such as that illumination edges are blurry as compared to reflectance edges.

More recently, Lehar’s (2003a, 2003b) spatial-perception thesis tackled this issue by first clarifying the subjective side of the mind/brain dichotomy. While physiological processes transform the physics of L = R × I (i.e., luminance = reflectance × illumination), it is prudent to focus on the subjective conscious experience of color, where brightness = lightness × inferred illumination. Lehar (2003b) correctly stated that by ascertaining three local variables—brightness, lightness, and illuminance—for every point in a volume, color theory is complete. Furthermore, he was absolutely correct in asserting that depth information is volumetric, and that appropriate neurological models must therefore contend with representations that accurately “represent transparency, with multiple depth values at every single (x, y) location, as well as represent the experience of empty space between the observer and a visible object” (2003b, p. 13). He went on to explain in his Fig. 5.3 that every voxel of empty space between the observer and a visible object contains some level and chromaticity of light. What are needed are experiments that test this hypothesis. Several examples are provided in the forthcoming section on previous experiments.
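The subjective relation stated above, brightness = lightness × inferred illumination, can be rearranged to show how the three quantities constrain one another at a given location. The sketch below is illustrative only: the function name, the two depth planes, and the 5:1 illumination values are hypothetical, and it treats the relation as exact, which the experiments discussed in this article do not establish.

```python
def implied_lightness(brightness, inferred_illumination):
    """Lightness implied by brightness = lightness x inferred illumination."""
    if inferred_illumination <= 0:
        raise ValueError("inferred illumination must be positive")
    return brightness / inferred_illumination


# Hypothetical inferred illumination for two regions of a volume
# (arbitrary units).
inferred_illumination = {"far": 1.0, "near": 5.0}

# The same brightness, assigned to each region in turn.
brightness = 0.5
lightness_by_plane = {
    plane: implied_lightness(brightness, level)
    for plane, level in inferred_illumination.items()
}
```

Under these assumptions, a patch of fixed brightness implies a lower lightness in the region inferred to be more strongly illuminated, the same trade-off that Logvinenko and Ross (2005) called apparent illumination/lightness invariance.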

Lehar’s (2003b) inclusion of the Gestalt color constancy problem regarding lightness, brightness, and inferred illumination is nontrivial. Humans are subjectively aware of [empty] space being full of one or more levels and chromaticities of illumination, as well as of colored objects within that space, which explains how an observer could move objects around in Gilchrist’s (1977) room (see Fig. 1), see changes in brightness but not lightness, and not be shocked while making such movements. We already have knowledge about the levels of illumination within the volume of space. The inability to demultiplex the light within empty space from the color of surfaces places color theories on the same poor footing as current spatial-vision theories. Consequently, Lehar’s inclusion of illumination in “empty” space recognizes an essential component of subjective experience. He concluded that “volumes of empty space are perceived with the same geometrical fidelity as volumes of solid matter” (2003b, p. 61), making it plausible that each of these regions has some light level associated with it.

Unlike Lehar (2003a, 2003b), I do not assume fidelity, in that no experiments to date have been able to quantify the degree of fidelity. We certainly do not see photons of certain frequencies crisscrossing in the air. However, this would potentially be testable by placing a series of red surfaces at 90° angles from a series of green surfaces, and then exploring whether a gray surface introduced at the juxtaposition would be tinged yellow. I also do not assume that empty space is “perceived,” as if the inference rises to a certain level of consciousness, since again no direct experiments to date have tested this premise. However, the findings of Rutherford and Brainard (2002) suggest that the representation is not, at least, fully conscious. Rutherford and Brainard tested the idea that judgments of the illumination level could explain judgments of reflectance, in the sense that the relationship PR × PI = L would hold, where PR is perceived reflectance, PI is perceived illumination, and L is retinal illuminance. Thus, it is important to realize that “perceived illumination” as measured by Rutherford and Brainard may not be the same as “inferred illumination,” and that one may not have completely conscious access to “inferred illumination.”

What a Ganzfeld reveals about light in space

“In a perfectly homogeneous Ganzfeld, the surface of the sphere has no perceived depth because you don’t perceive a surface at all. You perceive a fog that extends to infinity. You see a surface only if there is a visible texture elided that defines the depth plane” (Gilchrist, personal communication, 18 April 2003). What is most interesting about this condition is that as soon as a surface is present, the surrounding field (i.e., the visible portion of the Ganzfeld) appears as a uniform background surface, without fog, at some more distant depth. That is, adding a single surface completely eradicates the volume of light (“fog”) that was present in the Ganzfeld.

Ganzfelds were extensively studied by Gibson and colleagues (Gibson, 1986; Gibson & Dibble, 1952; Gibson & Waddell, 1952), who concluded that in homogeneous ambient light, vision fails due to lack of information, even with adequate stimulation and corresponding sensations (Gibson, 1986). Gibson used this finding to consider what he felt were two contradictory assertions: that nothing can be seen but light, but that light, properly speaking, can never be seen. He concluded that at least one of these propositions must be wrong (Gibson, 1986, p. 54), and that light is “properly seen” in both cases. In the first case, the light rays from surfaces impinge upon the back of the retina to form the images of the objects seen, and in the second, the ambient light in the volume of space that contains the object (which Gibson refers to as a “medium”) can also be seen.

Gibson’s (1986) ecological conception of optics is correct that light completely fills the air, and that each point in the air is an intersection of rays coming from all directions, which implies that light is ambient at every point. However, he failed to consider that this ambient array not only provides the structure for the perception of surfaces, but is itself inferred. He was so concerned to distinguish ecological optics from physical optics that he concluded that we can only see illumination through that which is illuminated. He claimed that we do not see the light that is in the air, that all we ever see is the environment, never photons or waves or radiant energy (Gibson, 1986). He did not discuss whether the medium is inferred.

However, if he were correct, all would appear black, as in outer space. This error is nontrivial, because he allowed for the perception of shadows attached to surfaces (Gibson, 1986, p. 90, Fig. 5.9, showing the hills and valleys on the surface of the barren earth), but failed to consider how such shadows would affect the level of illumination directly in front of those surfaces. If additional surfaces were placed within the shadowed portions of the scene (as in Fig. 3), they would also appear to be in shadow, as observers would expect. Thus, within the ambient illumination is some of the stimulus information that preoccupied Gibson in other contexts—the level and chromaticity of the illuminant. He did admit that when air is illuminated and fog-free, it affords visual perception (Gibson, 1986). Consequently, he acknowledged that “air” has an affordance, which, given the other components of his theory, suggests that it must be inferred. It afforded walking through, breathing, and so forth, just not “seeing.”

Previous experiments suggest spatial knowledge of illumination

Stereo displays

In 2003, Perkins and Schirillo showed that the brightnesses (i.e., perceived luminances) of surfaces within a three-dimensional scene are contingent on both their luminances and their three-dimensional spatial arrangement. In one experiment, a CRT screen was viewed through a haploscope in which simulated achromatic surfaces were presented in three dimensions (Fig. 4).

Fig. 4

Schematic of a perceived fused CRT image. Each 0.75°-diameter circular patch has been randomly assigned a depth plane, except the 1°-square test and comparison surrounds, which remain in the far and near depth planes, respectively. The far-plane circles are under low luminance, while the near-plane circles are under five times this illumination. From “Three-Dimensional Spatial Grouping Affects Estimates of the Illuminant,” by K. R. Perkins and J. A. Schirillo, 2003, Journal of the Optical Society of America A, 20, pp. 2246–2253. Copyright 2003 by the Optical Society of America. Reprinted with permission. Figure is on p. 2251

A given surface could be in one of two possible depth planes—the near or the far depth plane, shown as light and darker circles, respectively. In the actual experiment, the luminances of the surfaces in each depth plane could vary. Using a joystick, observers had control over the lower-square test patch; in the displayed condition, they set the luminance intensity ~33 % higher than that of an upper-square comparison patch in order to match the two patches’ brightnesses. This behavior was consistent with viewing a real scene with a simple lighting interpretation from which to estimate a different level of illumination in each depth plane. It is important to realize that the CRT screen was black with no surfaces, thus giving no clue as to any level(s) of illumination. The observers’ inferences of different levels of illumination in each depth plane were created solely by the luminances of the circles contained within each depth plane.

Randomly positioning the circles so that each depth plane had some high- and some low-luminance circles minimized any simple lighting interpretation, concomitantly reducing brightness differences to ~8.5 %, although the areas immediately surrounding the test and comparison patches continued to differ by a 5:1 luminance ratio. This finding shows that lateral inhibition plays only a small part in determining the lightness and brightness of surfaces, whereas the inferred level of illumination contained within a specific location in depth plays a much more significant role.

In a related experiment, Schirillo and Shevell (1993) used stereovision to place one Mondrian composed of several simulated gray papers in a “near” depth plane under 100 % illumination retinally adjacent to a second, identical Mondrian in a “far” depth plane under 20 % illumination (Fig. 5). Observers adjusted the luminance of the “far” central comparison patch to match that of the “near” central test patch in brightness. On half of the trials, the stereo disparity of only the comparison patch shifted, so that it appeared to be in the same “near” (fully illuminated) depth plane as the test patch. In this case, observers increased its luminance by 16 % as compared to when it appeared in the “far” (dimly illuminated) depth plane.

Fig. 5

Schematic of a perceived fused CRT image. In one condition, the 1º-square test patch and comparison patch were in the far and near depth planes, respectively, while in the other condition, the stereo disparity of the test patch matched that of the near Mondrian. The far-plane Mondrian was under low luminance, while the near-plane Mondrian was under five times this illumination. From “Lightness and Brightness Judgments of Coplanar Retinally Noncontiguous Surfaces,” by J. A. Schirillo and S. K. Shevell, 1993, Journal of the Optical Society of America A, 10, pp. 2442–2452. Copyright 1993 by the Optical Society of America. Reprinted with permission. Figure is on p. 2443

This paradigm is consistent with that of Perkins and Schirillo (2003), with one exception: Schirillo and Shevell (1993) did not alter the dim luminances of the “far” Mondrian patches when they made the comparison patch coplanar with the “near” test patch. On a purely retinal account, moving the comparison surface from the “far” depth plane into the “near” depth plane should not alter its brightness, since it remained retinally adjacent to surfaces that appeared to be under low illumination. However, observers ascribed a higher level of illumination to the “near” depth plane, even though the patches beyond this space had one-fifth of the luminance of those in the “near” depth plane. The increase in the comparison patch’s luminance setting could have occurred only if observers inferred a level of illumination for the “near” depth plane that was different from that dictated by the luminances in the “far” depth plane. This suggests that observers represent a distribution of light intensities throughout three-dimensional space.

Real rooms

A growing number of studies have suggested a mental representation of the illuminant in three-dimensional space that is called recognized visual space illumination (RVSI). It is thought to be derived from the amount of “initial visual information” (IVI; Ikeda, Shinoda, & Mizokami, 1998a), which has typically been manipulated by altering the chromaticity of the walls or furnishings in a miniature room viewed under normal illumination. Several findings are particularly germane. First, by going from a simple display in which the test patch is propped a fixed distance in front of a uniform gray background to making more of the walls, ceiling, and floor a particular (nongray) chromaticity, chromatic induction increases significantly. That is, once the surround was extended to the walls, ceiling, and floor of a box, chromatic induction abruptly increased (Cunthasaksiri, Shinoda, & Ikeda, 2004). The changes were uniform: That is, there was a steady increase in color change. In essence, the well-known phenomenon of simultaneous color contrast is thought to be a weak version of this color change.

RVSI may play an important role in the well-known color contrast demonstration presented in two dimensions. Its significance lies in the fact that it is three-dimensional and is valid not only at the surfaces of the objects within the space, but also for the entire area within the space in which no objects exist (Ikeda, Shinoda, & Mizokami, 1998b). This specific property of RVSI enabled us to predict the appearance of an object’s surface in terms of lightness as well as color when the object shifted from one place to the other in the space, following the principle explored by Perkins and Schirillo (2003). Recall that they placed surfaces of different luminances at two stereoscopic distances so as to allow the observer to infer the level of illumination at each depth plane. This experiment was equivalent to providing minimal IVI and made observers successfully construct a 3-D representation of the levels of illumination contained within the stereoscopic display. Recall further that, apart from the surfaces suspended in space, the CRT was black, so that these levels of illumination were truly inferred. Yamauchi, Ikeda, and Shinoda (1999) showed that adding objects to a scene made it possible to describe an RVSI more fully. However, they also showed that the walls surrounding a space are the most important IVI for the construction of an RVSI in a specific hierarchy: The back wall was the most efficient, the floor next, and the side walls the least (Yamauchi, Ikeda, & Shinoda, 2003).

Mizokami, Ikeda, and Shinoda (1998) conducted a significant experiment testing the RVSI hypothesis. They constructed two rooms at the same level of illumination, in which the front room contained items of lower reflectance than did the far room. They found that when pulling a test square toward the front room, observers gradually set its lightness higher in order to retain constant lightness (i.e., under a smaller RVSI; their Fig. 4). They concluded that a target need not be in a plane for its lightness to be perceived in relation to the plane—as Gilchrist (1977, p. 186) had proposed in his coplanar theory—but that the volume of space that encloses the surfaces is critical (Yamauchi & Uchikawa, 2005). This finding suggests that observers have a mental representation of a gradient of illumination within a three-dimensional world.

Thus, while Yamauchi and Uchikawa (2005) were correct in stipulating that the light within the space regulates the coplanar theory, Gilchrist (1977) was correct in using the term lightness to describe the quality of the surfaces. This confusion seems to result from the lack of a term for the inferred illumination within a volume of space (see Blakeslee, Reetz, & McCourt, 2008). Hence, the new term RVSI may be an appropriate alternative.

However, RVSI may or may not represent the actual physical illumination, depending on how the visual system interprets the initial visual information (IVI). For example, by introducing many objects with reddish surfaces into a miniature room, the visual system could be persuaded to construct a reddish RVSI despite a neutral illuminant (Mizokami, Ikeda, & Shinoda, 2000). In that case, observers would perceive a reddish patch as being neutral white. However, the amount of the shift would be less than the color shift of the furniture. The experiments conducted to test the RVSI hypothesis have provided strong evidence that humans infer the level and chromaticity of the volume of space that they are within.

Physical light field representations

Koenderink, Pont, van Doorn, Kappers, and Todd (2007) provided additional evidence that observers have a mental representation of what Koenderink et al. called “the physical light field.” Notice that their term sidesteps any perceptual requirement and instead focuses on objective reality. They inserted, in the center of a stereoscopically presented three-dimensional scene, a white “gauge” sphere that observers could adjust to match (1) the direction, (2) the diffuseness, and (3) the intensity of the light in the scene (Fig. 6). By moving the sphere around in space, they found that observers were quite sensitive to these various parameters of the physical light field and generally arrived at close to veridical settings. This speaks to the question of the fidelity of the light field. For example, when the sphere was moved so that it did not fall under the lower level of illumination in Fig. 6, observers reacted to this fact by altering the shading on the sphere and making it the appropriate brightness. This strongly suggests that observers have implicit expectations concerning how objects should appear in three-dimensional scenes, and that these expectations are measurable. Yet note that these measurements are made on a surface. Thus, Koenderink and colleagues demonstrated that observers have representations of both light intensities and the direction(s) of the light source(s) throughout space. This is precisely what Lehar (2003a, 2003b) had called for.
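The three adjustable properties of such a gauge sphere (direction, diffuseness, and intensity) can be made concrete with a minimal shading sketch. The model below is an assumption introduced only for illustration, not the rendering used by Koenderink et al. (2007): a white matte sphere is lit by a directional Lambertian term blended with a uniform ambient term, and the function name and parameters are hypothetical.

```python
import math


def sphere_shading(normal, light_dir, intensity=1.0, diffuseness=0.0):
    """Relative luminance at a point on a white matte sphere.

    normal      -- unit surface normal at the point (assumed already unit length)
    light_dir   -- vector pointing toward the light (normalized here)
    intensity   -- overall light level
    diffuseness -- 0.0 for fully directional light, 1.0 for fully diffuse light
    """
    length = math.sqrt(sum(c * c for c in light_dir))
    unit_light = [c / length for c in light_dir]
    # Lambertian term: cosine of the angle between normal and light, clipped at 0.
    cos_theta = max(0.0, sum(n * l for n, l in zip(normal, unit_light)))
    return intensity * ((1.0 - diffuseness) * cos_theta + diffuseness)


if __name__ == "__main__":
    toward_light = (0.0, 0.0, 1.0)
    away_from_light = (0.0, 0.0, -1.0)
    light = (0.0, 0.0, 1.0)
    for d in (0.0, 0.5, 1.0):
        lit = sphere_shading(toward_light, light, diffuseness=d)
        shaded = sphere_shading(away_from_light, light, diffuseness=d)
        print(f"diffuseness={d}: lit side={lit:.2f}, shaded side={shaded:.2f}")
```

In this toy model, changing the light direction moves the shading gradient across the sphere, raising the diffuseness flattens that gradient, and changing the intensity scales the whole pattern. The sketch thus separates the three settings that observers were asked to match.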

Fig. 6

Koenderink et al.’s (2007) beach scene with a white “gauge” sphere in the volume shadow of the right front-most puppet. From “The Visual Light Field,” by J. J. Koenderink, S. C. Pont, A. J. van Doorn, A. M. L. Kappers, and J. Todd, 2007, Perception, 36, pp. 1595–1610. Copyright 2007 by Pion. Reprinted with permission. Figure is on p. 1603

Conclusion

Several experiments, some using stereo disparity and others using miniature rooms, have shown that observers infer differences in the levels and chromaticities of the illuminant(s) within a volume of space. The term infer is used, after Helmholtz (1866/1962), rather than perceive, because it clarifies that one is aware of both surfaces and the light in front of them, without the additional specific qualities of transparency.

It is important to realize that this volume of space is not dark, but appears to contain light (Footnote 1). It may be that the only cues that observers have to the level and chromaticity of this light come from projections onto surfaces contained within the space (e.g., shadows or highlights). However, this quality is not brightness; that term is reserved for the product of illumination and surface reflectance, and does not extend to the empty space between surfaces and the eye. The most parsimonious description of such a quality is inferred illumination, the awareness of which is phenomenologically real and measurable.