Let’s consider a perceptual stimulus in the form of a square-shaped lattice of dots, presented on a white background, composed of four rows, each containing four dots (see Fig. 1). While such stimuli are usually artificially created for the purpose of psychological experiments, let’s assume that in the considered case the perceivable dots are in fact parts of a single object whose other fragments are somehow occluded or that has some sort of camouflage that makes its other parts blend in with the background. Given these assumptions, the dot-lattice satisfies Green’s conditions for being a perceptual object. Parts of the lattice exhibit perceptible regularities regarding spatial layout, sameness of colour, and sameness of shape. In addition, these regularities are likely to be sustained by causal interactions as the perceptible dots are in fact fragments of a single, partially visible object. Furthermore, the considered lattice is not “too big”, as it does not contain any arbitrary, additional parts, and not “too small”, as it cannot be easily expanded by adding nearby regions without a significant decrease in perceptible regularities (these additional regions would be fragments of the white background).
Let’s now consider a fragment of the lattice: two middle dots in the second row from the top. According to Green’s account, these two dots do not constitute a perceptual object. This is because an object composed of them is “too small”. The object composed of these two dots can be extended by adding nearby dots without a significant loss of perceptible regularities, as additional elements would stand in the same relevant relations as the original dots: spatial proximity, equal distance between dots, sameness of colour and shape. Nevertheless, it seems that we can easily focus attention on the considered two dots and experience them as an object. Such a conclusion is justified because our experience in this case has characteristic features of object-perception. First, the whole composed of the two dots is experienced as distinguished from a ground constituted by the rest of the lattice (Vecera 2000). Second, it is perceived as possessing properties and as having a mereological structure (O’Callaghan 2016). Third, it is commonly claimed that a characteristic feature of object perception is that vision represents objects as being numerically the same despite changes in position resulting from spatiotemporally continuous movement (see Scholl 2007 for review). It seems that the two-dot whole can be experienced as persisting through such changes. For instance, after a displacement of the whole lattice to the left, we would still perceive the considered two-dot whole as the same object it was before the movement. A possible worry is that the perception of persistence may be disturbed by the fact that the two-dot object is part of a larger entity, the whole lattice. In fact, there are studies which show that visual abilities for tracking and re-identifying items are poor when one must simultaneously track parts of the same object (Scholl et al. 2001). 
However, according to the results of these studies, the significant drop in tracking abilities occurs when the tracked parts are spatially connected and may move independently despite such connection (e.g., the left edge of an elongated bar goes up while the right edge moves down). Such factors are not present in the example considered here, as the dots composing the lattice are disjoint, and when the whole lattice moves left, all of its parts also move in the same direction. In consequence, studies concerning visual tracking do not provide strong reasons for claiming that we are unable to perceive the persistence of objects such as a two-dot lattice fragment. Finally, the object composed of the two dots has a certain level of spatial coherence obtained in virtue of perceptual grouping principles like proximity and similarity (Kubovy and Wagemans 1995). Furthermore, a situation in which the two dots under consideration are perceived as an object cannot be easily classified as an example of erroneous object perception. As stated in Sect. 2, errors in object perception occur when perceptible regularities between parts of a fragment of the environment relevant for visual objecthood are not founded upon actual causal interactions. However, regularities between the two dots considered here are founded upon causal interactions in the same way as in the case of the whole lattice.
These considerations suggest that there are fragments of environments that are not perceptual objects according to Green’s definition, but which nevertheless give us strong intuitive reasons to classify them as perceptual objects. I believe that there are three ways to resolve this conflict. First, one may want to modify Green’s account by rejecting the maximality condition that excludes “too small” perceptual objects, such as the two-dot object from the example above. However, such a modification leads to a proliferation of perceptual objects. For instance, in the case of a simple figure, such as a black square on a white background, every uniformly black fragment of the square and every uniformly white fragment of the background would be a perceptual object.
Such a proliferation of perceptual objects has an important negative consequence as the category of perceptual objects becomes too broad. To illustrate this, let’s again consider a simple figure, such as a black square and a circular fragment, also uniformly black, that is part of the square. After dropping the maximality constraint, the circular fragment is a perceptual object. However, even if attentionally perceived, it does not possess the crucial characteristics of perceptual objects. In particular, though attention is focused on it, it is not experienced as a figure distinguished from the ground because it is not presented as separated by any qualitative borders from the rest of the black square. In consequence, the circular fragment is, in an important aspect, different from the whole black square, which is likely to be experienced as a figure even without being the focus of attention. Furthermore, it is also significantly different from the two-dot lattice fragment considered earlier, which is experienced as a figure because of attentional processing. Due to the lack of figure-status, even in cases of attentional processing, it is implausible to treat elements such as the circular black fragment within a square as perceptual objects. Similarly, the fragments of the white background on which a figure such as a black square is positioned are not experienced as figures distinguished from ground even if they stand in an attentional relation to a subject. Nevertheless, without the maximality constraint, all fragments of such a background are visual objects. In consequence, dropping the maximality requirement entails an unintuitive claim that there is a huge number of visual objects such that, even in perfectly good perceptual conditions involving attentional perceptual relations, they are not represented as having features typical for visual objecthood.
A second idea is to acknowledge the crucial observation that the presence of “too-small” perceptual objects, such as the object composed of two dots, is connected with focusing attention on the relevant fragment of the environment and thereby establishing a specific perceptual relation between this fragment and a subject. Relying on this idea, it may be proposed that perceptual objects do not have to be ‘maximal’ in the sense proposed by Green, but rather must be fragments on which a subject can focus attention. However, given that visual attention has an important spatial aspect (see Scholl 2001), in virtue of which it can be directed at virtually any spatially coherent region within a certain range of sizes, this solution also greatly multiplies the number of perceptual objects. In consequence, it faces problems analogous to those that arise from dropping the maximality constraint. In particular, it entails that elements such as a circular fragment of a black square are visual objects despite the fact that, even if they are perceived attentionally, they are not experienced as having characteristics crucial for visual objecthood.
Finally, a third option is to accept the constitutive role of the perceptual relation for certain perceptual objects. According to this approach, the two-dot fragment of a dot lattice is not a perceptual object unless it stands in an attentional perceptual relation to a subject. In this case, the occurrence of a perceptual relation is constitutive for a perceptual object because the presence of a perceptual relation is a necessary element of a minimal set of jointly sufficient conditions of this object’s existence. This solution does not lead to a proliferation of perceptual objects, as it allows for distinguishing three categories of perceptual entities. First, there are fragments of the environment, such as a black square on a white background, which are perceptual objects even without standing in a perceptual relation to a subject. Second, there are fragments of the environment, such as a two-dot fragment of a dot-lattice, which are perceptual objects only when standing in attentional perceptual relations. For these perceptual objects, the occurrence of a perceptual relation is constitutive. Finally, there are fragments of the environment, such as parts of a white background, which are not perceptual objects even when standing in an attentional, perceptual relation. The previous solutions were not able to draw the distinctions outlined above and wrongly treated entities belonging to the third category as visual objects.
Furthermore, the constitutive solution does not force us to abandon intuitions concerning the maximality of perceptual objects. In fact, it allows us to acknowledge additional ways of obtaining maximality that arise from attentional processing. It is well-established that attention influences how entities phenomenally look (Carrasco and Barbot 2019). For instance, attention increases the contrast between an object’s colour and its surroundings (Fuller and Carrasco 2006), the properties of attended objects are perceived in a more detailed and determinate way (Prinzmetal et al. 1998), attended elements seem to be closer than the surroundings (Green 2016), and regions at which attention is directed are presented as having greater spatial resolution, which may modify how objects are divided into parts (Carrasco and Yeshurun 2009). In general, attention seems to be able to cause, by various means, attended elements to be experienced as less similar to the unattended surroundings. Hence, attention may produce a perception of maximality by increasing the perceived loss of regularities that would appear if attended fragments of the environment were mereologically extended to include nearby elements. In consequence, a fragment of the environment that is not a perceptual object before focusing attention (due to a lack of maximality) may become a perceptual object when it is attended because it comes to be perceived as exhibiting maximality. It should be noted that accommodating the above observation requires amending Green’s account, since a fragment of the environment may be a perceptual object not only in virtue of ‘objective maximality’ concerning relations between its properties and properties of the environment, but also in virtue of gaining ‘subjective maximality’, i.e. by being perceived as maximal due to attentional processing.
The fact that a fragment of the environment may gain perceptual, subjective maximality as a result of attentional processing may suggest yet another alternative to the constitutive solution: namely, it can be proposed that perceptual objects should be characterised in a dispositional way, such that in order to be a perceptual object, a fragment of the environment must either be objectively maximal or possess a disposition to be experienced as maximal when attentionally perceived. It seems that by applying such a dispositional solution, one can attribute objecthood to paradigmatic perceptual objects such as a black square on a white background (which are objectively maximal), and to entities such as a two-dot lattice fragment, which have a disposition to be experienced as maximal when attended, but not to entities such as fragments of a uniform background, which are not experienced as maximal even when processed attentionally, and so lack the crucial dispositions. As a result, the relevant distinctions are made without postulating that there are perceptual objects constituted by the occurrence of perceptual relations. On this view, the two-dot lattice fragment considered here is a perceptual object even without an attentional perceptual relation, because it has a dispositional property to be experienced as maximal when standing in such a relation.
Nevertheless, it should be noted that, in fact, every fragment of the environment has some disposition for being experienced as maximal due to attentional operations of a certain perceptual system. For instance, even a white fragment of a uniformly white background has a disposition for being subjectively maximal when standing in a perceptual relation to a system that can phenomenally distinguish an arbitrarily selected part of a surface. In consequence, the dispositional solution also leads to a proliferation of perceptual objects, as virtually any fragment of the environment is a perceptual object due to certain dispositional properties. This problem can be avoided by postulating that it is not sufficient for a perceptual object to have just any disposition to gain subjective maximality due to the operations of some perceptual system; rather, these operations must be operations of a human visual system or, alternatively, of a particular system S of some person. However, this introduces an ambiguity into the dispositional solution, as it is not obvious which operations should be considered those of a human visual system in general or of a particular visual system. For instance, we may imagine that a fragment of the environment has a disposition to become subjectively maximal in virtue of operations of a visual system equipped with some neural implant. There is no straightforward answer as to whether such operations are still those of human vision, or whether system S is numerically the same with and without such an implant.
Similarly, alternative proposals, like characterising the relevant operations as those of a system S in standard conditions or as those of a typical visual system, face an analogous problem. It is not obvious how to characterise ‘standard’ conditions or typical human vision: for instance, whether the operations of a typical visual system are those available to 75% or 90% of the population. In consequence, for some fragments of the environment it is difficult to decide whether they are perceptual objects, as it is unclear if they have a disposition to be experienced as maximal solely due to operations of a ‘human visual system’, ‘typical human visual system’, or ‘particular system S’. I do not claim that it is impossible to solve these problems, as it is conceivable that a convincing notion of a typical system or of standard conditions may be developed. However, an advantage of the constitutive solution is that it is completely free from such difficulties, as it maintains that a fragment of the environment can be a perceptual object in virtue of being experienced as maximal due to actually standing in an appropriate perceptual relation. Hence, regardless of the dispositions a fragment of the environment has, if it is not currently objectively or subjectively maximal then it is not a perceptual object.