1 Introduction

When we look at an arrangement of objects in the space before our eyes, parts of some objects will be hidden from view behind parts of other objects. Nonetheless, we can often represent, in some sense visually, the shapes, colors and textures of the parts hidden behind the occluders. In vision science, this is called amodal completion. Vision science investigates the basis for our constructions of representations of the occluded parts, but it has not had much to say about the nature of the format of the representations involved and the way in which they represent. My aim in this paper is to put forward such an account of the nature of the resulting representations.

In the philosophical literature there has been a tendency to say that our representations of the hidden parts of a scene consist in visual imaginings. Call this the imagination theory of amodal completion. However, this imagination theory poses a paradox, as I will show. If we imagine the occluded parts, what spatial relation to the unoccluded parts are they imagined to stand in? We cannot, it seems, imagine them to be there, where they are, because there, where they are, they are hidden from view. Call this the problem of unimaginability. This problem, as we shall see, has a counterpart in the neuroscientific literature.

The alternative account of amodal completions that I will defend holds that amodal completions are mental models of the three-dimensional structure of the objects of perception. They are of a kind that can meaningfully be described as having certain operations, such as rotation, applied to them. These models are not mental images, but they can be used to generate mental images. For instance, they can be used to generate an image of what an object looks like from the other side of the object. This theory will avoid the problem of unimaginability, because on this theory, we represent the occluded parts of a scene by including them in our representation of the three-dimensional structure of the scene without imagining them at all.

2 The Science of Amodal Completion

A distinction is typically drawn between modal and amodal completion (Kellman and Shipley 1991). In so-called modal completion, the perceiver seems to experience illusory contours, such as those perceived in the famous Kanizsa triangle. In amodal completion, there is nothing illusory, but a representation of the parts of a scene occluded by other parts of the scene is formed that in a sense completes the perceptual representation of the scene.

In the 1980s and 1990s, various experimental paradigms were used to demonstrate the psychological reality of amodal completion. Gerbino and Salmaso (1987) showed that participants more quickly match the shape of an object behind an occluder to the occluded shape than to the irregular shape corresponding to the visible regions of the shape behind the occluder. Sekuler and Palmer (1992) showed that displays in which one object occludes another prime participants to perform other tasks in much the same way that the same pair of objects unoccluded would prime them. In recent decades studies of brain activity have been used to locate the processes of amodal completion in the visual cortex. De Wit et al. (2006) use electroencephalogram (EEG) measurements to show that displays incompatible with expected amodal completions register a late-occurring stronger response than those compatible with expected amodal completions. Weigelt et al. (2007) use a functional magnetic resonance imaging (fMRI) adaptation paradigm to argue that, while early visual cortex produces representations of features that feed into amodal completion, the completed shape partially hidden behind an occluder is first represented downstream in lateral occipital cortex.

Early studies of how occluded objects are completed often focused on the role of local features of the projected image (Kellman and Shipley 1991). For instance, if two curved edges meeting the edge of an occluder (in the retinal projection) can be smoothly joined, then they are perceived as an edge of an occluded object. However, it was increasingly argued that considerations of overall form can compete with such local considerations (van Lier et al. 1995). For instance, if an occluded object exhibits radial symmetry, then this radial symmetry may be preserved in the amodal completion. A preference for completion based on global factors can be demonstrated as well in the representation of the three-dimensional form of individual objects (van Lier and Wagemans 1999, Experiment 3).

Not only geometrical properties but also background knowledge about the properties of the kinds of things represented play a role in amodal completion. Vrins et al. (2009) argue that our knowledge of whether an object is hard (like a brick) or soft (like a block of cheese) affects amodal completions based on stimuli of short duration (150–300 milliseconds). Hazenberg et al. (2013) argue that learning the names of unfamiliar shapes can dispose participants, when other cues are ambiguous, toward amodal completions in conformity to the learned shapes. Hazenberg and van Lier (2016) show that knowledge of the typical shapes of fruits and vegetables determines amodal completion of objects of those kinds. A banana occluded in the middle is completed with a representation of a complete banana, but two apples semi-occluded at either end of an occluder are not completed with a representation of a single, long object. Hazenberg and van Lier were able to localize the pertinent EEG responses at higher visual cortex sites within a 300–400 millisecond window, which may be taken to show that the amodal completions detected were not merely effects on what the participants believed to be behind the occluders but were tightly bound up with their visual representations.

Neuroscientific studies of amodal completion (such as Weigelt et al. 2007; Hazenberg and van Lier 2016; and Thielen et al. 2019) confirm that the construction of the amodal completions occurs in stages. Some of the stages prior to the finished (though possibly incomplete) representation of three-dimensional structure may themselves count as representations of limited aspects, such as boundaries between figure and ground (Qiu and von der Heydt 2005). Beyond the finished representation of three-dimensional structure, the objects represented may be placed under lexical categories, such as coffee cup or cat. We can take for granted that amodal completions are representations of edges and surfaces, including possibly their colors, occluded by an object or an arrangement of objects present to the senses of the representing agent. Amodal completions in this sense are the target of the imagination theory and will be the target of my alternative to the imagination theory.

3 The Imagination Theory

In a paper from 2010, Bence Nanay argues that the amodal completion of a scene consists in imagining the occluded parts of the scene. His striking example is a cat behind a picket fence. If the cat’s tail is occluded behind the fence, then I may amodally complete my perception of the cat by using mental imagery to represent the cat’s tail (Nanay 2010, p. 249). Nanay stresses that the mental imagery constituting amodal completion need not be actively intended (2010, p. 249). Similarly, Amy Kind writes that “our imaginative capacities contribute to our perceptual experience by making unseen features of objects seem present” (2018, p. 176). Her example is a Diet Coke can. “The front side of the can is seen; the back side is imagined.” In a footnote (2018, p. 168), she explicitly describes the phenomenon as “amodal completion”.

What these theorists mean to be saying depends on what they mean by “mental imagery”. In his 2010 paper, Nanay explains it by means of an example: “A paradigmatic case of visual imagery would be closing one’s eyes and imagining seeing an apple ‘in the mind’s eye’” (2010, p. 249). Moreover, he explicitly compares the experience of mental imagery to the experience of perception, explaining that, on his account, “what it is like” to be aware of the occluded parts of a perceived object is similar to “what it is like” to perceive the nonoccluded parts (2010, p. 252). Kind makes clear what she means by imagination when she writes: “It enables us to have an experience of something not present as if it were present. When I visualize my kids while talking to them on the phone, they become present to me in a way they weren’t before” (2018, p. 175). So amodal completion (or “phenomenal presence” in her terminology) consists in literally forming mental images of the occluded objects.

In Nanay’s version of the theory, in amodal completion we imagistically represent the occluded parts as being in the places in space that they actually occupy relative to the parts reflecting light into our eyes. “When we represent the occluded parts of perceived objects,” he writes, “we use mental imagery ... in a way that would allow us to localize the imagined object in our egocentric space.” He continues, “When I represent the cat’s occluded tail, I represent it as having a specific spatial location in my egocentric space” (2010, p. 250). Kind does not seem to take any stand on how what we imagistically represent in amodal completion is represented as being spatially related to what we concurrently perceive.

So there are philosophers who have stated in print that when we amodally complete a perceptual representation of a scene, we do so by imagining, in a quasi-perceptual way, the occluded portions. In the introduction, I have dubbed this the imagination theory of amodal completion. Moreover, some philosophers, including at least Nanay, hold that we imagistically represent the occluded portions as occupying the very regions in space that they occupy, connected to the unoccluded portions reflecting light into our eyes in the way they are actually so connected. Call this the connectedness supplement to the imagination theory.

4 The Problem of Unimaginability

The imagination theory of amodal completion requires the connectedness supplement. In contemplating a scene one can imagine what the objects in the scene look like when they are not partially occluded by other objects. But that is just not what we are calling amodal completion. Amodal completion is supposed to be a constant of perceptual experience. When we perceive a scene in everyday life, there are almost always objects in the scene partially occluded by other objects and in many cases we amodally complete our representations of them. It is not plausible that in all these cases we imagine the absence of the occluders or imagine viewing the occluded objects from an unoccluded point of view or imagine them in a separate imaginary space apart from the scene perceived. Amodal completion does not require the perceiver to select a way of altering the scene in imagination to provide imagined perceptual access to the occluded parts. In amodally completing my perception of the cat behind the picket fence, I do not have to choose whether I will imagine walking around the fence or will imagine removing the slat occluding the tail or will imagine somehow seeing through the slat.

However, when the imagination theory is combined with the connectedness supplement, it says that we do something impossible. We are supposed to imagistically represent the occluded parts of objects and represent them as being in the same spatial relations that they actually occupy relative to the parts of the scene that we actually see and to imagistically represent them from the point of view that we actually occupy. But we cannot do all that, precisely because the occluded objects, from the point of view that we occupy, are occluded by the objects that we see. We cannot imagistically represent the tail of the cat while imagining it remaining attached to the cat and the cat remaining where it is behind the picket fence, because the tail of the cat, where it is, is occluded by one of the slats of the fence. Call this the problem of unimaginability.

When asked to imagine the cat’s tail behind a slat of the fence, many people might form a mental image representing a ghostly, semi-transparent tail superimposed over the slat. We can form such images, if we choose, but such images cannot be what amodal completion consists in. Such an image does not represent anything one could actually see. In the case of the cat, such an image does not represent the location in space of the join between the tail and the cat relative to the slat. No particular spatial relation between the tail and the slat is represented in such an image. Our perceptual experience is permeated with amodal completions of the kind that concerns us, but it is not plausible that it is permeated with images of ghostly objects superimposed over the physical occluders that we see and bearing no particular spatial relation to them.

I suspect that the problem of unimaginability is sometimes overlooked because the examples of amodal completion that are considered are not representative. We may be shown a picture that is incomplete because part of it is covered by a black mask (e.g. Nanay 2010, Fig. 2, p. 244). But in this case, we are not supposed to think of the black mask as a representation of an occluding object in the scene perceived. It is just a gap, or a hole, in the picture. In this case, there may indeed be no problem in imagining the missing parts, because we do not have to decide what relation between the occluder and the occluded we are supposed to imagine. The way in which this reduction of the phenomenon can be misleading will be illustrated in Sect. 6.

Another source of encouragement for the imagination theory might be a comparison to modal completion. Again, modal completion is the phenomenon in which we in some sense seem to see illusory contours, as when we seem to see the outline of a triangle when we look at the famous Kanizsa “triangle”. It may be observed that activity in early regions of visual cortex corresponds to both modal completion and amodal completion (Nanay 2010, p. 245; de Haas and Schwarzkopf 2018). One might be disposed to draw support from this for the imagination theory, as Nanay does (2010, p. 250). To the contrary, a common response in early visual cortex does not rule out quite different cortical responses elsewhere. The two phenomena may differ with respect to the role of lateral occipital cortex (as de Haas and Schwarzkopf 2018, p. 10, suggest), and there may be differences with respect to the involvement of the two hemispheres of the brain (as argued by Corballis et al. 1999).

Nanay emphasizes that the visual imagery by means of which we represent the occluded parts of things does not have to be completely determinate (Nanay 2010, p. 251). The visual imagery can fail to specify precisely the shape, position and color of the tail. That is true, but it does not answer the problem of unimaginability. However indeterminately I might represent the cat’s tail, I have to imagine it attached to the cat and behind the slat. Moreover, the problem of unimaginability arises also when our imagistic representation of the occluded parts would be quite determinate. If what I perceive behind the fence is a long, uniformly colored rod, then my imagistic representation of the occluded portions would be highly determinate, since I would represent the occluded parts as looking exactly like the visible parts.

5 Nanay’s Defense

As a foe of phenomenological methods (Nanay 2023, p. 14), Nanay could protest that he does not really want to identify mental imagery by asking us to perform an experiment before our mind’s eye. In later work he defined mental imagery thus: “Mental imagery is perceptual processing that is not triggered by corresponding sensory stimulation in a given sense modality” (Nanay 2018, p. 6, italics in the original; in his 2023, p. 4, Nanay substitutes “perceptual representation” for “perceptual processing”). As it stands, this definition is very broad, so broad that even the models of three-dimensional structure that I will posit could count as mental imagery.

However, I do not think Nanay wants to admit that. He assumes that mental imagery is realized in retinotopically organized visual cortex. This means that each region of visual cortex corresponds to a region of the retina in the sense that stimulations of a region of the retina have an especially strong bearing on the state of the region of cortex that corresponds to it (though there may also be cross-talk between regions). Nanay’s commitment to the retinotopic organization of mental imagery is implicit in his arguments that amodal completion is not mediated by belief. He says that amodal completions cannot be beliefs, or be the product of beliefs, because beliefs will not produce retinotopic activations of visual cortex (2018, p. 6; 2023, p. 59).

Nanay can plausibly maintain that mental imagery is realized in retinotopically organized visual cortex. But then his claim that amodal completion consists in mental imagery will land him in a neurophysiological variant of the problem of unimaginability. He will be committed to saying that both the representation of the occluder and the representation of the surfaces behind it occupy the very same region of visual cortex. In any case, this combination of views is contradicted by research. Studies reviewed in Sect. 2 indicate that amodal completion is modulated by some kind of knowledge (it need not be propositional) of the characteristics of physical objects. We should not expect that kind of knowledge to shape representations within early visual cortex within very short time spans. Again, Weigelt et al. (2007) conclude that complete amodal representations of occluded parts are found first in lateral occipital cortex. In a comprehensive survey of the neuroscience of amodal completion circa 2019, Thielen et al. (2019, p. 11) conclude that “within human neuroimaging studies, only little evidence has been found in favor of amodal completion in early visual areas”.

Studies show that amodal completion can occur in response to stimuli lasting only 100–250 milliseconds. A number of researchers, including Nanay, have been inclined to describe these results as showing that amodal completion occurs within that short time span (Sekuler and Palmer 1992, p. 110; Rauschenberger and Yantis 2001, p. 369; Nanay 2018, p. 6; 2023, p. 59). However, the 100–250 millisecond durations referred to by these authors are stimulus durations, not processing durations. If participants amodally complete in response to stimuli of 250 ms but not in response to stimuli of 100 ms (Rauschenberger et al. 2006), then, as Weigelt et al. point out (2007, p. 8), that shows that something relevant happens during the longer duration that does not happen during the shorter duration (such as assignments of border-ownership), but it does not show that amodal completion is complete within 250 ms. On the contrary, a study by Yun et al. (2018a) indicates that even stimuli as brief as 150 ms can elicit amodal completions that depend on background knowledge of a kind that presumably cannot be processed within 150 ms.

6 The Neuroscientific Counterpart to the Problem of Unimaginability

In the neuroscientific literature on amodal completion there exist lines of research that reveal a neuroscientific counterpart to the problem of unimaginability. Some researchers seem to hold that cortical responses to occluded things occur in the very same regions of cortex that ought to be responsive to the occluders. Thus they pose a variant of the problem of unimaginability inasmuch as representations of two distinct things have to be superimposed on top of one another.

Ban et al. (2013) show participants an animated display in which a colored wedge rotates around a center point, with the sharp point of the wedge pointed at the center. The upper right-hand quadrant and the lower left-hand quadrant are occupied by quarter-circle grey patches. In one condition, the grey patches are semi-transparent, so that the wedge appears to visibly pass beneath them. In another condition, the grey patches fully occlude the central section of the wedge as it appears to pass beneath them. There are two other control conditions, but there is no condition in which the grey patches are missing and the central part of the wedge just disappears at the point where the grey patches should be. Participants are fMRI scanned as they observe the display. Ban et al. find that V1 and V2 regions of cortex corresponding to the grey patches generate a periodic response when the wedge transparently passes beneath the grey patches and also exhibit the same periodic response when the wedge invisibly passes under the grey patches, but do not exhibit this response to the same degree in their control conditions. Ban et al. conclude that their study “clearly provides evidence of topographic representation of the occluded portion of an object (unseen but sensed object continuity) in human early visual cortex” (2013, p. 17004).

No doubt Ban et al. show what they claim to show, but what they investigate is not the kind of amodal completion that we above all want to account for. The grey patches in their display are not paradigmatic occluders. Participants may perceive them as merely masks and not as objects such as might occur in a three-dimensional scene. Accordingly, the V1 and V2 regions of cortex that correspond to them are not occupied with the representation of any occluding object. Those regions are free to represent the wedges passing behind the grey patches. If the display had represented actual occluding objects, rather than grey patches, and the same result had been achieved, then we would face a variant of the problem of unimaginability. Representations of occluder and occluded would have to be superimposed on one another in the same region of cortex.

Morgan et al. (2019) report results that purport to show that cortical regions V1 and V2 contain representations of occluded portions of scenes that are cortically similar to representations of line drawings. They measure the cortical activity in cortical areas corresponding to the part of the visual field corresponding to the missing portions of a photograph. They ask participants to draw on paper what they expect the missing portions of the photograph to contain, and they measure the responses to these drawings that are produced by accepted artificial models of cortical response. They find that the two sets of responses are in some ways non-accidentally similar. They conclude, “Our results have shown that early visual cortex responds to visual information hidden from view and that these responses are well described by orientation information from line drawings” (p. 9417).

Viewed uncritically, this result might seem to support the imagination theory of amodal completion, since one might take participants’ drawings to represent what they imagine in amodal completion. Morgan et al. themselves describe the representations in early visual cortex as amodal completions (p. 9420). In amodal completion, it might be said, we represent the occluded portions of a scene by mentally drawing a picture of them. The trouble is that what Morgan et al. call occlusion is not true occlusion but merely omission. Their stimuli consist of rectangular photographs from which one quadrant has been removed. They do not study at all the representation of portions of scenes hidden by other objects in the scene. If participants were asked to draw a picture of a scene containing an object occluding portions of another object, they would presumably draw the occluder and not the things that it occludes. If we tried to maintain that the representation of the occluded portions is something like a line drawing depicting both the occluder and the occluded, we would face again the problem of unimaginability.

The review of the neuroscientific literature by Thielen et al. mentioned above (Thielen et al. 2019) reveals a false dichotomy. They write: “Here we address whether the representation of the invisible parts of an occluded object involves a detailed low-level representation as it would be when the object was not occluded or merely an abstract representation.” (Thielen et al. 2019, p. 15; see also p. 4). We should not expect the occluded parts to be represented as if they were visible. The supposition that they are leads to the problem of unimaginability discussed above. Whether “abstract representation” is the only other alternative depends on what exactly this means (which Thielen et al. do not explain). If it means that the objects occluded are merely categorized into kinds and that their geometrical configurations are not represented, then, as I will show, this is not the only other alternative.

My criticisms in this section show that in the conduct of research on amodal completion, experimenters must take care that participants represent the occluders employed as spatial, occluding objects and not merely as masks over a scene or as gaps in a scene. Some of the early, paradigmatic work on amodal completion that I cited in Sect. 2 might be subject to this criticism as well. What I need for my criticism of the imagination theory, however, is just that there are enough studies showing that there are amodal completions that one cannot take to be imagistic representations without stumbling over the problem of unimaginability. There certainly are some, in particular, those that study amodal completions of three-dimensional objects, such as van Lier and Wagemans (1999).

7 Prima Facie Motivation for 3D Representations

In order to motivate my alternative account of the representations constituting amodal completion, I will identify several phenomena that we are acquainted with from everyday life that seem apt for explanation in terms of mental models of three-dimensional structure.

In daily life we are sometimes able to imagine what an object of visual perception would look like if viewed from the other side. Figure 1(a) shows what a certain object would look like from a given perspective. Viewing the object from that perspective you can imagine what the object would look like from the other side. Figure 1(b) shows what the object would look like, as you would thus imagine it. There will be many cases in which we cannot imagine what the object we are looking at would look like from another angle, because the object is of an unfamiliar kind or has an irregular geometry (Rock et al. 1989). But our imaginations are reliable in the sense that when we do perceive an object and think that we can imagine what the object would look like when viewed from a different angle, we reliably imagine correctly. Our ordinary experience confirms this reliability, but there is also experimental confirmation (Pinker and Finke 1980).

Fig. 1. A 3D transformation

A neat explanation of this ability could be formulated in terms of mental models of three-dimensional structure. When we perceive the object, such as that depicted in 1(a), we form a representation that stands in a relation of structural homomorphism to the three-dimensional structure perceived (such as I will define in Sects. 9 and 10 below). The representation includes a representation of a viewing angle. This representation is subject to a transformation abstractly describable as a rotation relative to the viewing angle. The rotation will be describable in something like the way rotations relative to a camera angle are defined in terms of matrix algebra in the computer graphics industry (Dunn and Parberry 2011). On the basis of this rotated representation, we construct our mental image of the object as it would appear from the other side.
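
The kind of rotation meant here can be illustrated with a small Python sketch. It is a toy under stated assumptions, not a claim about how the visual system computes and not the account to be developed in Sects. 9 and 10. The object is reduced to two labeled points, the viewer is assumed to look at the object from the positive z side, and the helper names `rotation_about_y`, `apply_matrix`, `facing_viewer` and `project` are hypothetical, introduced only for illustration.

```python
import math

def rotation_about_y(theta):
    """Return the 3x3 matrix for a rotation by theta radians about the vertical (y) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def apply_matrix(matrix, point):
    """Multiply a 3x3 matrix by a 3D point (x, y, z)."""
    return tuple(sum(m * p for m, p in zip(row, point)) for row in matrix)

def facing_viewer(model):
    """Names of the parts on the viewer's side (z > 0); a crude stand-in for visibility."""
    return sorted(name for name, (x, y, z) in model.items() if z > 0)

def project(point):
    """Orthographic projection onto the image plane: drop the depth coordinate."""
    x, y, _ = point
    return (x, y)

# Hypothetical object: one part on the near side, one part on the far side.
model = {"front_knob": (0.5, 0.0, 1.0), "back_handle": (-0.5, 0.2, -1.0)}

# Rotating the model half a turn is equivalent to viewing it from the other side.
half_turn = rotation_about_y(math.pi)
rotated = {name: apply_matrix(half_turn, p) for name, p in model.items()}

# The "image" of the far side is generated from the rotated model.
image = {name: project(rotated[name]) for name in facing_viewer(rotated)}
```

In this sketch the three-dimensional model is the thing that gets rotated, and the two-dimensional image is derived from it only afterwards. That mirrors the distinction drawn above between a mental model and the mental image generated from it.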

Another ability that we have is the ability to decide in imagination whether two perceived physical objects can fit together. We can use our imaginations to recognize that the objects depicted in Figs. 2(a) and (b) will fit together to form the object depicted in Fig. 2(d) and that the objects depicted in Figs. 2(a) and 2(c) will not so fit together. (Guan and Firestone 2020 and Hafri and Firestone 2021 argue that such fitting relations can be represented in perception.) Naturally, this ability is fallible, but presumably, when we confidently believe that we can solve such a fitting problem in imagination, we are reliably right. An attractive approach to explaining this ability posits mental models of three-dimensional structure. Starting with a perceptual representation of the object depicted in Fig. 2(d), we can form representations of its three-dimensional structure that incorporate a division between two parts that preserves the visible seams. Then we can perform mental rotations on the representations of parts to see whether they can be rotated in such a way that on the basis of our representations of the rotated parts we can construct mental images that match our perceptions of the objects depicted in Figs. 2(a) and (b).

Fig. 2. Interlocking and noninterlocking shapes

A third ability that we have is the ability to solve problems in visual analogy. Some problems in visual analogy seem to be solvable only by imagining a transformation of the objects of perception. (Spröte and Fleming 2016 and Schmidt et al. 2019 show that human participants can recognize similar shape transformations applied to distinct shapes.) For example, suppose we perceive the original pyramid and the stretched pyramid depicted in the first row of Fig. 3, and suppose we perceive the original spiral depicted in the second row of Fig. 3. From our representations of the three-dimensional structure of the pyramids we might be able to abstract a transformation that transforms the representation of the original pyramid into the representation of the stretched pyramid. We might then apply this stretching transformation to our representation of the three-dimensional structure of the original spiral to produce a representation of the three-dimensional structure of a stretched spiral, which we can then use to imagine the stretched spiral. Similarly, from our representations of the original pyramid and the grown pyramid in the third row we may abstract a transformation that allows us to imagine the grown spiral on the basis of our representation of the original spiral.
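
The idea of abstracting a transformation from one pair of shapes and applying it to another shape can be made concrete with a short Python sketch. It rests on strong simplifying assumptions that are not part of the text: shapes are bare lists of 3D points, the only transformations considered are axis-aligned scalings about the origin, and the transformation is recovered by comparing the extents of the two shapes along each axis. The coordinates and the function names (`axis_extents`, `abstract_scaling`, `apply_scaling`, `spiral`) are hypothetical.

```python
import math

def axis_extents(points):
    """Width of a point set along the x, y and z axes."""
    return tuple(max(p[i] for p in points) - min(p[i] for p in points)
                 for i in range(3))

def abstract_scaling(original, transformed):
    """Per-axis scale factors carrying the extents of `original` onto those of
    `transformed`. Assumes every extent of `original` is nonzero."""
    return tuple(t / o for o, t in zip(axis_extents(original),
                                       axis_extents(transformed)))

def apply_scaling(scale, points):
    """Apply per-axis scale factors to every point of a shape."""
    return [tuple(s * c for s, c in zip(scale, p)) for p in points]

def spiral(turns=3, samples_per_turn=8):
    """Points on a helix of radius 1 rising one unit along y per turn."""
    n = turns * samples_per_turn
    return [(math.cos(2 * math.pi * k / samples_per_turn),
             k / samples_per_turn,
             math.sin(2 * math.pi * k / samples_per_turn))
            for k in range(n + 1)]

# Hypothetical pyramids: square base in the x-z plane, apex on the y axis.
pyramid = [(-1, 0, -1), (1, 0, -1), (1, 0, 1), (-1, 0, 1), (0, 1, 0)]
stretched_pyramid = [(-1, 0, -1), (1, 0, -1), (1, 0, 1), (-1, 0, 1), (0, 3, 0)]
grown_pyramid = [(2 * x, 2 * y, 2 * z) for x, y, z in pyramid]

# Abstract each transformation from a pair of pyramids.
stretch = abstract_scaling(pyramid, stretched_pyramid)  # height only
growth = abstract_scaling(pyramid, grown_pyramid)       # all axes alike

# Apply the abstracted transformations to the spiral.
original_spiral = spiral()
stretched_spiral = apply_scaling(stretch, original_spiral)
grown_spiral = apply_scaling(growth, original_spiral)
```

Here the stretch changes only the height of the spiral while the growth enlarges it uniformly, which corresponds to the contrast between the second and third rows of Fig. 3. Nothing in the sketch is meant to suggest that mental transformations are restricted to scalings of this simple kind.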

Fig. 3. Analogous transformation, stretching and growth

The exercise of any of these abilities might have an effect on metacognitive awareness. For example, when we imagine what an object will look like from the other side and then confirm through perception that we have imagined correctly, we may become aware of the fact that we have done so. Or suppose that a scene A is more like a scene B than it is like a scene C in some respect, and this greater similarity is due to aspects of the scene that are occluded from view. That is, A is not more like B than like C with respect to just the visible surfaces. In that case, we might also represent A as more like B than like C by utilizing our mental models of the three-dimensional structures of A, B and C, including portions of the scenes occluded from view. The representation of this similarity relation, even if not our representations of the three-dimensional structure of the scenes themselves, may become the object of metacognitive awareness. The object of this metacognitive awareness could be a judgment to the effect that A is more like B than like C, or it could be the placement of our representations of A, B and C in some kind of psychological similarity space. The possibility of such metacognitive awareness of representation of similarity might account for the sense, reported by some authors (e.g. Lande 2023b), that they are visually aware of the occluded portions of the objects of perception.

8 3D Representations in the Mental Rotation Literature

Not only everyday experience but also empirical research encourages us to posit mental models of three-dimensional structure. In a short but revolutionary paper from 1971, Roger Shepard and Jacqueline Metzler reported the results of a study of participants’ ability to mentally “rotate” depicted objects. This paper initiated a very fruitful research program, involving experiments of many different kinds. Shepard and Feng (1972) showed that people can imagine the results of folding a piece of paper. Pinker (1980) showed that people can form images by means of a mental rotation of an arrangement of objects in three dimensions and then scan from one object to another along a two-dimensional projection onto an imaginary glass plate. Cooper (1990) showed that people, when shown an object from one perspective, can reidentify it when it is viewed from another perspective. Most theorists seem to agree that these abilities call for explanations in terms of representations of three-dimensional structure of some sort. Do any of these studies include clear and persuasive accounts of the nature of such representations?

As far as I can see, no. Neuroimaging studies have sought to identify the regions of the brain involved in mental rotation (Hiew et al. 2023), researchers have argued that the representations involved are analog representations (Zacks 2008), and some have even explored the neural coding of aspects of three-dimensional form (Yamane et al. 2008). But nothing in this work details the nature of the representation relation. I cannot survey the entire psychological and neuroscientific (and computer vision) literature to demonstrate this lacuna. But to illustrate the disappointing state of affairs, I will mention a couple of Shepard’s own attempts.

Shepard and Chipman (1970) claim that the relation between an internal representation and what it represents is a “second-order” isomorphism “between (a) the relations among alternative external objects, and (b) the relations among their corresponding internal representations” (Shepard and Chipman 1970, p. 2). They do not define second-order isomorphisms more precisely, and they do not explain how the accuracy of an individual representation can be defined in these terms. In support of their contention, they describe an experiment showing that similarity judgments based on mental images may correlate well with similarity judgments based on perceptions, which tells us nothing about the nature of the relation between accurate representations and the things they represent. Shepard (1975) argues that the representation relation cannot be defined as a relation of first-order isomorphism (which he does not define). But his argument rests on the assumption, which he does not substantiate, that apart from what he calls second-order isomorphism there would be no reason to favor one first-order isomorphism over another (1975, pp. 91–92).

Shepard and Judd (1976) claim that subjects in the experiments of Shepard and Metzler (1971) make “comparisons by carrying out a mental analog of the actual physical rotation of one object into congruence with the other.” They claim, further, “that the mental representations that are internally transformed in this way are more akin to the three-dimensional objects portrayed than to the two-dimensional retinal projections of those objects” (p. 952; see also Metzler and Shepard 1974, pp. 195–197). But then in the next paragraph they explain that what they mean in speaking of an analogy between an internal process and an external process is really that there is an analogy between an inner imaginative process and a process of perception in watching a physical rotation. So in effect, the relation between the inner representation and the outer process drops out of the account, which means that they do not explain the nature of the representation relation. In the same way, expectations are raised and then dashed on p. 105 of Shepard 1975.

9 Map-like Representations

My thesis will be that the kinds of representations that constitute amodal completions are representations of the same kind that we will posit to explain mental rotation and the abilities identified in Sect. 7. In this section I will characterize map-like representations, and in the next two sections I will explain how perceptual and imagistic representations can usefully be conceived as such map-like representations.

A literal map on paper is a representation of, let us say, a terrain. The map contains parts (such as dots) corresponding to parts of the terrain (such as towns). There are properties of the parts (such as the shapes of the dots) and there are relations between the parts (such as that a line of a certain length is drawn between them). These properties of and relations between the parts correspond to properties and relations between the parts of the terrain (such as the sizes of the towns and the fact that there is a road of a certain length from one town to the other). The map is accurate if and only if the map is homomorphic to the terrain in the following sense: A part of the map has a given property if and only if the part of the terrain that that part of the map corresponds to has the property that the given property of the part of the map corresponds to, and parts of the map stand in a given relation to one another on the map if and only if the parts of the terrain that the parts of the map correspond to stand in the relation to one another that the given relation between the parts of the map corresponds to.

For example, if spot x is mapped into town A and spot y is mapped into town B, and the relation of being connected by a 5 cm long straight line is mapped into the relation of being connected by a 5 km straight road, then the map is accurate only if: x and y are connected by a 5 cm straight line if and only if A and B are connected by a 5 km straight road. If the property of being square is mapped into the property of having a population greater than 20,000 and less than or equal to 30,000, then the map is accurate only if: x is square if and only if A has a population greater than 20,000 and less than or equal to 30,000.Footnote 2
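
The accuracy condition just stated can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: maps and terrains are encoded as finite sets of facts, symmetric relations are stored in both orders, and the labels (`"square"`, `"line_5cm"`, `"road_5km"`, `"pop_20k_30k"`) and the function name `is_accurate` are hypothetical stand-ins for the example in the text.

```python
from itertools import permutations

def is_accurate(part_map, prop_map, rel_map,
                map_props, map_rels, terrain_props, terrain_rels):
    """Check the homomorphism condition relative to the given mappings.

    part_map: map part -> terrain part
    prop_map: map property -> terrain property
    rel_map:  map relation -> terrain relation
    Facts are tuples: (property, part) or (relation, part1, part2).
    """
    # A map part has a property iff its terrain counterpart has the
    # corresponding terrain property.
    for part, target in part_map.items():
        for prop, target_prop in prop_map.items():
            if ((prop, part) in map_props) != ((target_prop, target) in terrain_props):
                return False
    # Map parts stand in a relation iff their terrain counterparts stand
    # in the corresponding terrain relation.
    for a, b in permutations(part_map, 2):
        for rel, target_rel in rel_map.items():
            on_map = (rel, a, b) in map_rels
            on_terrain = (target_rel, part_map[a], part_map[b]) in terrain_rels
            if on_map != on_terrain:
                return False
    return True

part_map = {"x": "A", "y": "B"}
prop_map = {"square": "pop_20k_30k"}
rel_map = {"line_5cm": "road_5km"}

map_props = {("square", "x")}
map_rels = {("line_5cm", "x", "y"), ("line_5cm", "y", "x")}

terrain_props = {("pop_20k_30k", "A")}
terrain_rels = {("road_5km", "A", "B"), ("road_5km", "B", "A")}

accurate = is_accurate(part_map, prop_map, rel_map,
                       map_props, map_rels, terrain_props, terrain_rels)

# If town A did not in fact have that population, the same map would be inaccurate.
inaccurate = is_accurate(part_map, prop_map, rel_map,
                         map_props, map_rels, set(), terrain_rels)
```

The sketch takes the three mappings as given inputs; it says nothing about what fixes those mappings.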

Perceptual representations in the brain can be representations that are accurate representations of objects or scenes in just this sense.Footnote 3 The representation contains parts, and the parts have properties and stand in relations to each other. The parts of the representation are mapped into parts of the scene, and the properties of and relations between the parts of the representation are mapped into properties of and relations between parts of the scene. The representation is accurate if it is homomorphic to the scene represented relative to these mappings. A mental representation of a triangle homomorphic to a triangle in the world is graphically illustrated in Fig. 4.

Fig. 4. A homomorphism between a structure in the brain (abstractly represented) and an object in the external world relative to mappings m and Π

I am not able to say what kinds of neurological entity the parts of a perceptual representation are, and I am not able to identify the relevant neurological relations between these parts. I will also not try to say what kinds of parts and what kinds of relations between them are the objects of representation. (For relevant discussion, see Lande 2020.) Although the relevant relations between parts of a map are also distances, and Fig. 4 represents the relations between the parts of the mental representation as distances, the pertinent relations between parts of the mental representation are presumably not literally distances. If the parts of the representation are certain assemblies of neurons and these assemblies are mapped into vertices in the object represented, then in principle the relation between the assemblies of neurons that is mapped into a certain distance between vertices could consist in the fact that the firing activity of the first assembly of neurons is, say, twice as great as the firing activity of the second assembly of neurons.

The mapping of parts into parts and the mapping of properties and relations into properties and relations relative to which accuracy is defined are not arbitrary, and they are not gerrymandered so as to ensure that the representation is accurate. Ultimately, we have to say what causal, functional, perhaps teleological, relation makes it the case that a given part of the representation maps into a given part of the object represented and what makes it the case that a given relation between parts of the representation is mapped into a given relation between parts of the scene or object represented. I will not try to do that here. In constructing our theory, we want to make sure that our account usually judges representations as accurate when we, on other grounds, have reason to suppose that they are accurate (e.g., they guide behavior appropriately). But our account will render some representations inaccurate, and it need not confirm all of our pre-theoretic evaluations.

Treating perceptual representations as map-like has at least two virtues over thinking of them as proposition-bearing representations like sentences. First, perceptual representations must be capable of representing properties of and relations between parts in a manner quite different from the manner in which predicates represent properties and relations. The properties and relations represented will often be gradable qualities, such as distances (between parts), angles (between lines at a vertex), and tilts (for surfaces). Predicates can represent gradable qualities only by means of units. For instance, we can say, of a line, that it is 10 centimeters long. It is not plausible that the brain uses particular units of measurement to represent gradable qualities. Maps in the brain can represent gradable qualities by having parts that have gradable properties and that stand in gradable relations to one another. Second, map-like representations can be subject to transformations, such as rotation, that are analogous to transformations of the objects that they represent. We could describe an object by means of a sentence (with the help of units of measurement) and then describe a geometrical transformation of it. But no transformation of the description will be analogous to the transformation described. By contrast, a map-like representation might be subject to transformations, such as rotation, analogous to the transformations that the object it represents is subject to.

10 Perspectival versus Deep Perceptual Representations

In order to explain our imaginative capacities and amodal completion, it will be useful to posit two distinct aspects of the requisite map-like mental representations, the perspectival aspect and the deep aspect. In this section I draw the distinction for (visual) perceptual representations. In the next section, I extend the distinction to the case of imagistic representations and put these distinctions to use in explanation.

The perspectival perceptual representation represents the parts of a scene that actually reflect light into the eyes. It may represent the distances, tilts and curvatures of visible surfaces as well as the shapes of those surfaces to the extent that they are visible. (Compare the 2.5-D sketch of Marr 1982.) The deep perceptual representation, in contrast, will represent not only the visible surfaces of the scene but also the occluded surfaces of the objects in the scene. It also determines the angle from which the arrangement is viewed. An issue in philosophy and psychology is whether the mind in any sense represents the shapes that objects project onto our retinas, for instance, the ellipticality of a coin viewed at an angle (e.g. Morales et al. 2020). This is not a topic that I address in this paper. The perspectival perceptual representation in my sense is not supposed to be a representation of how things appear. It is a partial representation of how things arranged before the senses are.

The deep perceptual representation will represent only as much of the occluded portions as the mind is prepared to try to represent on the basis of the visible cues. Even when the deep perceptual representation can be constructed, the agent may fail to construct it due to lack of need or interest. A deep perceptual representation may represent the part-whole structure of an object by containing a part that represents the whole and parts that represent parts of the whole. A deep perceptual representation may represent the structure of an object independently from its representation of the particular angles at which the limbs of the structure meet on a particular occasion (Green 2019). The relations represented may be in some respects indeterminate. A perceptual representation may represent b as being farther away from the viewer than a is without representing any particular distance.

The perspectival perceptual representation must be a projection of the deep perceptual representation from the perceiver’s point of view (but not a projection onto a plane). To define this projection relation, we can suppose that imaginary lines are drawn from points on the three-dimensional object of perception converging on the viewpoint of the observer in space. (For simplicity I assume that there is only one viewpoint, not one for each eye.) A perspectival perceptual representation is a projection of a deep perceptual representation in the relevant sense if and only if for every surface represented in the perspectival perceptual representation and every point x on that surface, no other point on a surface represented in the perspectival perceptual representation lies on one of the imaginary lines between x and the viewpoint of the observer (since otherwise, the perspectival representation would include a representation of an occluded surface). What I am thus calling a projection is a purely geometric relation, not a causal relation. The claim is not that the perspectival perceptual representation is constructed by projecting from the deep perceptual representation, and the claim is not that the deep perceptual representation is constructed by constructing a representation from which the perspectival perceptual representation projects.Footnote 4
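
The geometric condition can be illustrated with a short Python sketch. It assumes that surfaces are approximated by finite sets of sample points, that there is a single viewpoint distinct from every sampled point, and that every point of the perspectival representation also belongs to the deep one; this last requirement is an added assumption of the sketch. The function names `lies_between` and `is_projection` and all coordinates are hypothetical.

```python
def lies_between(p, x, viewpoint, tol=1e-9):
    """True if p lies strictly between x and the viewpoint on the line joining them."""
    d = [xi - vi for xi, vi in zip(x, viewpoint)]   # direction viewpoint -> x
    w = [pi - vi for pi, vi in zip(p, viewpoint)]   # direction viewpoint -> p
    # p is on the line only if the cross product of d and w vanishes.
    cross = (d[1] * w[2] - d[2] * w[1],
             d[2] * w[0] - d[0] * w[2],
             d[0] * w[1] - d[1] * w[0])
    if any(abs(c) > tol for c in cross):
        return False
    # Position of p along the segment: 0 at the viewpoint, 1 at x.
    t = sum(di * wi for di, wi in zip(d, w)) / sum(di * di for di in d)
    return tol < t < 1 - tol

def is_projection(perspectival, deep, viewpoint):
    """Check the condition in the text on sampled points.

    No point of the perspectival representation may lie on the line
    between another of its points and the viewpoint.
    """
    if not set(perspectival) <= set(deep):
        return False
    return not any(lies_between(p, x, viewpoint)
                   for x in perspectival for p in perspectival if p != x)

viewpoint = (0, 0, 0)
# (0, 0, 2) sits directly in front of (0, 0, 4) as seen from the viewpoint.
deep = [(0, 0, 2), (0, 0, 4), (1, 0, 4)]

visible_only = [(0, 0, 2), (1, 0, 4)]
ok = is_projection(visible_only, deep, viewpoint)
# Including the occluded point violates the condition.
not_ok = is_projection(deep, deep, viewpoint)
```

In keeping with the text, the check is purely geometric: it tests whether the relation holds and does not model how either representation is constructed.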

The mind can often construct a deep perceptual representation on the basis of the perspectival perceptual representation and some kind of background knowledge. However, the perspectival perceptual representation is not read directly off the stimulation of the senses; it is itself the product of a great deal of processing; and it need not be entirely complete before any piece of the deep perceptual representation is constructed. There may be feedback from a deep perceptual representation that fixes the structure of the perspectival perceptual representation. For instance, the slant of a visible surface may be determined in the perspectival perceptual representation on the basis of a representation of the three-dimensional shape of an object in the deep perceptual representation.

For example, Fig. 5(a) shows a horizontal pyramid that projects a grey square image onto a vertical plane. It also represents three dimensions, labeled x, y and z, and a dotted arrow, which is the axis along which the pyramid lies. Figure 5(b) shows a horizontal pyramid rotated 90° in the x-z plane relative to the pyramid depicted in Fig. 5(a); it projects a grey isosceles triangle onto the vertical plane. In each of Fig. 5(a) and 5(b), we can think of the flat drawing of a pyramid as a representation of a deep perceptual representation that in turn represents a three-dimensional structure, namely, the pyramid in space. And we can think of the flat drawings of a grey square and a grey triangle as representations of the perspectival perceptual representations that in turn represent that same pyramid in space. So the flat drawings on paper or on a computer screen represent perceptual representations (in the mind), both deep and perspectival. The physical pyramid in space (outside the mind) that the perceptions represent is not represented in the drawings. The alphabetic labels in the diagram are not to be thought of as literally parts of the perceptions that the diagrams represent. They are present in the diagram to allow us to refer to the parts of the perception and the relations between them. While the flat planes represented in the diagrams are supposed to stand in for perspectival perceptual representations, we should not assume that the perspectival perceptual representations are analogically flat. As I explained above, the perspectival perceptual representation may represent the distances, tilts and curvatures of visible surfaces.

Fig. 5. Two perceptual representations of a pyramid. The pyramid represented is not shown

Deep and perspectival perceptual representations, such as those depicted in Fig. 5, may represent physical scenes outside the mind, in the sense that they may be deemed accurate or inaccurate to the extent that they are homomorphic to the objects of the representation relative to appropriate mappings of parts into parts and properties and relations into properties and relations in the manner explained in the previous section. For example, we can describe a relation between the parts labeled a and b by writing “Three(a, b)”. This might be thought of as describing a neurological relation between parts a and b such that this relation is mapped into a certain distance between corresponding parts of the pyramid perceived. Thus, we may find that Three(a, b) if and only if the parts of the external, perceived pyramid that the parts a and b of the perception map into stand at the distance from one another that the relation Three maps into.

Not all perceptible properties will be represented via a homomorphism between structures, and I exclude from what I am calling the perspectival perceptual representation and the deep perceptual representation all properties that are not represented in this way. So color and shading may be excluded, and those aspects of texture that are not represented as an aspect of geometrical configuration (such as shininess) may be excluded. There may be other properties represented in perception, such as pliability (Paulun et al. 2017), that are also excluded. Accordingly, I here also do not offer an account of the representation of color and other nongeometrical properties, and I do not say how the representation of these properties is bound into the representation of geometric configuration. I acknowledge that these other properties may play a role in the generation of amodal completions (Kim et al. 2014; Yun et al. 2018b).

While perspectival perceptual representations are posited as an intermediate step in the construction of deep perceptual representations, there is another reason to posit perspectival perceptual representations, namely, that we can verbally report on what is hidden from view and what is visible. Typically, if a vase is partly occluded behind a potted plant, I can report that there is both a vase and a potted plant before me, but I can also report that the vase is partially hidden by the potted plant. A plausible hypothesis is that the latter reports are grounded in the perspectival perceptual representation.

11 Imagistic Representations and Mental Rotation

Mental images do not typically represent a scene before an agent’s eyes and so cannot be described as accurate or inaccurate in accordance with the criterion of homomorphism. Nonetheless, we can use the same mappings in terms of which homomorphism is defined for perceptual representations to characterize the content of a mental image. Inasmuch as mental images share a format with perceptions and are composed of the same sorts of parts, properties of parts and relations between parts as in perceptions, we can say that the parts, properties and relations that constitute a mental image represent the kinds of things that they would represent if the image were in fact an accurate perception.

As we distinguish between perspectival and deep perceptual representations, we can distinguish between perspectival and deep imagistic representations. The deep imagistic representation represents the visible and (as imagined) occluded three-dimensional structure of an imagined scene, and the perspectival imagistic representation represents the surfaces of the scene imagined as visible. Whereas a perspectival perceptual representation is used to generate a deep perceptual representation, and a deep perceptual representation may sometimes be used to settle some details in a perspectival perceptual representation, it can also happen that a deep imagistic representation is generated independently and is then used to generate a perspectival imagistic representation.

For example, that is what happens when we imagine what something will look like from the other side. Suppose, as before, that Fig. 5(a) represents a perception, in both its deep and perspectival aspects, but that Fig. 5(b) represents an imagistic representation, in both its deep and perspectival aspects. Given the perception of the pyramid viewed from the back, represented in Fig. 5(a), we may form an image of what the pyramid would look like as viewed from the side by means of the following procedure: We perform a rotation transformation on the deep perceptual representation in 5(a), so that the pyramid is represented as oriented sideways with respect to the point of view, and then we use the deep imagistic representation of the pyramid, as viewed from the side, to construct a perspectival imagistic representation of the pyramid as viewed from the side. The drawing of a pyramid in 5(b) represents the deep imagistic representation that results from mentally rotating the deep perceptual representation in 5(a), and the grey triangle in 5(b) represents the perspectival imagistic representation generated from it. When we have filled in this perspectival representation of the pyramid with representations of color and texture, we have imagined what the pyramid would look like from the side.
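
The procedure just described can be sketched in Python as a toy model. The sketch makes several simplifying assumptions that go beyond the text: the deep representation is a list of vertex coordinates, the observer looks along the z axis, the rotation is an exact quarter turn in the x-z plane, and the perspectival representation is reduced to the outline that the vertices project onto a vertical plane, as in the drawings of Fig. 5. The coordinates and the function names `rotate_quarter_turn_xz`, `convex_hull` and `outline` are hypothetical.

```python
def rotate_quarter_turn_xz(point):
    """Rotate a point by 90 degrees in the x-z plane (about the vertical y axis)."""
    x, y, z = point
    return (z, y, -x)

def convex_hull(points):
    """Convex hull of 2D points (Andrew's monotone chain), in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def outline(deep):
    """Outline projected onto the vertical x-y plane by an observer looking along z."""
    return convex_hull([(x, y) for x, y, _ in deep])

# Deep perceptual representation: a pyramid whose square base faces the
# observer, with its axis running along z (hypothetical coordinates).
deep_perceptual = [(-1, -1, 0), (1, -1, 0), (1, 1, 0), (-1, 1, 0), (0, 0, 3)]
perspectival_perceptual = outline(deep_perceptual)    # a square

# Mental rotation: transform the deep representation, then derive the
# perspectival imagistic representation from the result.
deep_imagistic = [rotate_quarter_turn_xz(p) for p in deep_perceptual]
perspectival_imagistic = outline(deep_imagistic)      # an isosceles triangle
```

The rotation operates on the three-dimensional vertex list; the outline is derived from the rotated list only afterwards, which mirrors the order of steps in the procedure.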

My hypothesis is that the construction of deep perceptual representations and the transformation of these into deep imagistic representations is involved in all of the abilities to solve problems by means of mental imagery surveyed in Sect. 7 above. In all cases, we start from perspectival perceptual representations, construct on that basis deep perceptual representations, perform transformations on these that generate deep imagistic representations and then use these to construct perspectival imagistic representations.

12 Amodal Completion Revisited

We must distinguish between perceptually representing the visible and occluded three-dimensional structure of a scene and imagining what the scene would look like from a certain point of view. The three-dimensional structure of a scene is represented in a deep perceptual or imagistic representation in the way I have explained. When we imagine what a scene will look like, we generate a perspectival imagistic representation, often or usually on the basis of a deep imagistic representation. We also fill it in with representations of properties such as color that are not represented in the perspectival imagistic representation.

My alternative to the imagination theory is that the representations constituting amodal completions are deep perceptual representations. (I take no stand on whether they also include representations of properties not included in deep perceptual representations.) These will represent the three-dimensional structure of a scene, including parts that are not represented in the perspectival perceptual representation. Amodal completion does not require the agent to imagine what the occluded parts would look like if they were not hidden from view. Imagining what a hidden part of the scene would look like if not hidden will be an additional step that we can take if we so choose. We take that step by performing a transformation on our deep perceptual representation, resulting in a deep imagistic representation, and then using that deep imagistic representation to generate a perspectival imagistic representation that includes representations of the parts that are occluded in the actual scene.

Returning to Nanay’s cat behind the picket fence, our representation of the tail behind the slat is a representation in our deep perceptual representation of the scene. We can represent the tail in this way without forming a perspectival representation of the cat’s tail. However, if we wish to imagine what the cat’s tail would look like, we can do so in any number of ways. We can imagine walking around the fence, by performing a suitable transformation on our deep perceptual representation of the scene, and then construct, on the basis of the resulting deep imagistic representation, a perspectival imagistic representation, including a representation of the cat’s tail. Or we can imagine removing a slat from the fence, by performing a suitable transformation on our deep perceptual representation of the scene, resulting in a deep imagistic representation of the scene without that slat, and then, on the basis of that result, forming a perspectival imagistic representation including the tail. What we cannot do is form a perspectival imagistic representation of the cat’s tail while representing the tail as attached to the cat and representing the cat where it is, with its tail occluded by the slat.