1 Introduction

Within contemporary philosophy of perception, there is wide consensus that perceptual experiences commonly present individual objects (e.g., Jeshion 2010; Mehta 2014; Schellenberg 2016; Soteriou 2000). Ordinary perceptual states seem to involve standing in perceptual relations to some objects, and it is in virtue of such relations that we are able to gain knowledge about objects’ properties, form singular thoughts, and conduct successful actions (e.g., Brewer 2015; Martin 2002; Nanay 2012). Furthermore, it is also commonly believed that perception not only presents individual objects but also, in ordinary circumstances, correctly presents them as being objects. For instance, objects are typically experienced as being spatiotemporally coherent, as persisting through time despite movement, and as being figures differentiated from ground (e.g., Green 2019; Kubovy and Schutz 2010; Scholl 2007; O’Callaghan 2016).

Entities that are, in ordinary perceptual situations, correctly presented as objects can be named ‘perceptual objects’. However, it is far from obvious how to characterize the category of perceptual objects more precisely. First, even within a single modality, such as vision, the class of perceptual objects seems to be significantly heterogeneous. For instance, it is plausible that tables, flat figures, swarms of insects, and clouds are all visual perceptual objects. Second, the situation becomes even more complicated if one takes into account perceptual objects related to different modalities, as it is not clear whether all human modalities present objects (e.g., it is controversial in the case of olfaction, see Millar 2019; Skrzypulec 2019) and what typical non-visual perceptual objects might be (e.g., whether auditory objects are sounds, sound-producing events, or material objects interacting with the medium; see Nudds 2010; O’Callaghan 2011). Furthermore, a full account of perceptual objects should also accommodate multimodal experiences of objects in which unimodal characteristics related to different modalities are combined into a more complex multimodal whole (e.g., a visually presented dog and an auditorily presented barking sound can be multimodally experienced as related to a single object that looks a certain way and makes a certain sound, see Kubovy and Schutz 2010; O’Callaghan 2016).

In the philosophical and psychological literature, one can find various proposals regarding the crucial features that distinguish the class of perceptual objects. For instance, it has been postulated that perceptual objects possess features that distinguish them from their surroundings (Kubovy and Schutz 2010), have mereological structure (O’Callaghan 2016), are able to persist through time (Millar 2019), are characterized by spatiotemporal coherence (Palmer and Rock 1994), or are exemplars of perceptually recognizable categories (Batty 2014). While these proposals differ in many respects, they share an important general feature: all of them characterize perceptual objects as largely subject-independent. More specifically, they do not attribute any significant constitutive role to the perceptual relation connecting a fragment of the environment with a perceiving subject. Fragments of the environment are perceptual objects no matter whether they stand in a perceptual relation to a subject, mainly by virtue of having a certain physical structure. For instance, a black square on a white background is a visual perceptual object due to surface properties that distinguish it from its surroundings no matter whether it stands in a perceptual relation to some subject. In other words, the occurrence of a perceptual relation does not have an impact on perceptual objects’ existence; it only allows them to be perceptually selected and represented as objects.

In this paper, I attempt to question this common assumption. In particular, I will argue that a proper account of perceptual objects should accommodate the constitutive role of the perceptual relation. This is because there exist fragments of the environment that are perceptual objects only when they stand in a perceptual relation to a subject.

In conducting my investigation, I will use a definition of perceptual objects from Green (2019) as a reference point. I believe that it is the most general account of perceptual objects that successfully describes a variety of such objects across modalities. Other popular characterizations of perceptual objects can be plausibly treated as specifications of Green’s proposal that describe some types of perceptual objects. I argue that Green’s definition should be amended in order to account for fragments of the environment which become perceptual objects in virtue of being perceptually attended.

I start by presenting Green’s approach to perceptual objects (Sect. 2) and subsequently (in Sect. 3) formulate my thesis about the constitutive role of perceptual relations. Relying on these explications, in Sects. 4 and 5, I present examples of perceptual objects and argue that they can be plausibly interpreted as being constituted by a perceptual relation. The presented examples concern fragments of perceptual groups and some dynamic patterns used in visual tracking experiments. The common feature of the subject-dependent perceptual objects considered here is that they are not perceptual objects unless they stand in an attentional, perceptual relation to a subject.

2 Green’s Theory of Perceptual Objects

In his paper, Green (2019) provides the following general characterization of perceptual objects:

(Green’s Definition) An individual O is an object for a perceptual system S just in case there exist dimensions D1…Dn perceptible through S such that (1) O decomposes without remainder into parts that participate in perceptible, causally sustained regularities constructed from D1…Dn, and (2) any mereological extension of O would incur a significant loss of perceptible regularities constructed from D1…Dn.

The main idea is that perceptual objects are composed of parts that can be perceived by a given perceptual system as standing in some regularities concerning dimensions such as, in the case of vision, colour, shape, or spatial layout. In addition, these perceptible regularities occur in virtue of the actual causal relationships in which the parts of an object participate. For instance, a table is usually visually perceived as composed of parts corresponding to its legs and a top, which stand in specific spatial relations. Furthermore, the spatial relations perceived in the structure of a table occur because its fragments actually participate in causal relations. In particular, the legs are attached to the top and support its weight. Similarly, a V-shaped flock of geese may be plausibly treated as a perceptual object. This is because its parts are perceived as standing in some regularities related to motion parameters, and in fact the individual geese forming the flock stand in a causal relation, as they influence each other’s behaviour. It should be noted that Green’s definition does not require that a perceptual system possess the ability to represent causal relations (see Siegel 2009 for a discussion). What is sufficient is that the perceptual system can represent regularities, for instance connected with spatial factors, hue, or motion, which are sustained by the occurrence of causal interactions.

Furthermore, Green’s theory of perceptual objects allows us to account for cases of errors in object perception. Such errors are likely to occur when a fragment of the environment has parts that can be perceived as standing in relevant regularities but where there are in fact no causal relationships corresponding to these perceptible regularities. For instance, something may look as if it is composed of spatially connected fragments, but these fragments are not actually connected, only positioned in proximity. Analogously, some elements may be seen as moving together in a regular fashion, but this common pattern of movement is just a contingent product of their independent behaviours. Such entities are not perceptual objects, but in many cases would be erroneously perceived as objects in the sense of perceptually ascribing to them properties typical for visual objecthood, such as being a figure distinguished from ground, or having a mereological part/whole structure. It should be noted that cases of erroneous visual object perception may, but do not have to, involve perceptual illusions, understood as situations in which a perceptual system presents an entity as having properties which this entity does not really possess. For instance, there may be a case in which several causally unrelated appearing and disappearing items are by coincidence arranged in a spatiotemporal pattern which is picked out by the visual system as a continuously moving, persisting object. This is a case of error in object perception, as the concerned entities do not constitute a visual object, and it involves a visual illusion, as in fact none of the relevant items are persisting through continuous movement, but are simply appearing and disappearing. On the other hand, we can imagine a situation in which several birds by coincidence fly in proximity and create a regular pattern. Here again, vision may erroneously pick out the group of birds as a visual object. However, it is less plausible to characterize this case as a visual illusion because the ascribed object-related characteristics, for instance regarding the number of parts creating the whole, or the presence of proximity relations relevant for perceptual grouping, are accurately represented.

The considered approach to perceptual objects contains two additional constraints: perceptual objects are neither “too big” nor “too small”. The first constraint is expressed in point (1) of Green’s definition. According to this definition, a perceptual object cannot contain a part that does not participate in relevant perceptible regularities with its other parts. This constraint allows us to exclude perceptual objects that are “too big”, which can be obtained by extending proper perceptual objects to include some arbitrary parts. For instance, while a flock of geese can be plausibly treated as a visual perceptual object, a flock of geese plus a tree is not a perceptual object, as a tree does not have properties that create relevant regularities with properties of the elements composing the flock of geese.

According to the second constraint, presented in point (2) of Green’s definition, a perceptual object has to be maximal in the sense that extending it by adding further parts would result in a loss of perceptible regularities. This additional requirement is postulated to exclude perceptual objects that are “too small”, which are usually not represented as objects by human perceptual systems. For instance, a black square on a white background is clearly a visual perceptual object. However, within the square’s interior there exist many uniformly black, smaller fragments (e.g., a circular fragment in the centre of a black square). Parts of these smaller fragments stand in relevant perceptible regularities, in particular, connected with spatial proximity and uniformity of colour. Nevertheless, they are not visually experienced as additional objects as they do not satisfy the maximality constraint. Their structure can be extended without a significant loss in perceptible regularities by adding proximal, black parts up to the edges of a black square.

To sum up, there are several ways in which a fragment of the environment can fail to be a perceptual object for some perceptual system. First, its parts may not exhibit any significant regularities perceptible by the given system. Second, the perceptible regularities may be present, but they are not founded upon actual causal interactions (such entities are likely to be erroneously picked out as objects). Third, a fragment of the environment may be “too big” in the sense of having some irrelevant parts, or “too small” in that its structure may be extended without the loss of relevant perceptible regularities. In contrast, genuine perceptual objects have parts exhibiting perceptible regularities, these regularities are founded on causal interactions, and the structure of an object is neither “too big” nor “too small”.

One of the main advantages of Green’s definition is that it abstracts from contingent differences related to the functioning of specific perceptual modalities. For instance, while the visual modality may combine parts into complex objects mainly due to spatial relations, audition may do the same using temporal relations (see O’Callaghan 2008). Green’s definition can accommodate both these modes of perceptual organization as each of them can be described in terms of detecting some perceptible regularities. Because of this, many alternative approaches to perceptual objects may be treated as special cases of Green’s theory. For example, characterizations of perceptual objects postulating the crucial role of having a mereological structure, being discernible from a ground, or persisting through time, seem to express some specific dimensions and perceptible regularities used by some of the perceptual modalities in some of the environmental circumstances.

Green’s account does not explicitly attribute any constitutive role to perceptual relations. It seems that, according to his approach, perceptual objects exist no matter whether they stand in a perceptual relation to a subject. A perceptual relation is only needed to allow a perceiving subject to experience a perceptual object as an object. In the next section, I analyze more closely the notion of constitutivity, which then allows me to precisely state my stronger thesis about the role of perceptual relations.

3 The Constitutivity of Perceptual Relations

The notion of constitution plays a significant role in several contemporary philosophical debates concerning, inter alia, the metaphysics of material objects (e.g., Bennett 2011; Wasserman 2004; Wilson 2007), the extended mind hypothesis (see Clark and Chalmers 1998 for a classic source), and theories of mechanistic explanation (e.g., Baumgarten and Gebharter 2015; Craver 2007; Krickel 2018). In all these fields, constitution, in contrast to diachronic causation, is usually interpreted as a synchronic relation in virtue of which some elements compose a higher-order whole (see Bennett 2011; Baumgarten and Gebharter 2015; Couch 2011). For instance, in the case of the well-known problem of material constitution, constitution is a relation between a material (e.g., iron) and a thing built out of it (e.g., a statue).

The notion of ‘constitution’ is commonly treated as closely related to the notion of sufficient conditions. For example, the presence of a properly shaped lump of iron is sufficient for the existence of a certain statue. However, the presence of a constituting element may not be necessary for the existence of a constituted whole (Wilson 2007). For instance, it is plausible that the same statue may exist while being constituted by numerically different lumps of iron (e.g., a statue may lose some iron atoms without being replaced by another distinct statue). More specifically, it is proposed (see Couch 2011; Harbecke 2010) that an element A is constitutive for a whole W if and only if a condition concerning the presence of A is an element of a minimal set of conditions whose joint satisfaction is sufficient for the existence of W. In other words, that A is constitutive of W means that there is a way of obtaining W which requires the synchronic presence of A. However, there may also be other ways of obtaining W (i.e., distinct minimal sets of jointly sufficient conditions) that do not require the presence of A.
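
The structure of this definition can be illustrated with a minimal sketch. It assumes, purely for illustration, that the sufficient sets of conditions for a whole W are given as an explicit finite family of labelled conditions, so that minimality is checked only relative to that family. The function names and the condition labels in the statue example are hypothetical and are not drawn from Couch (2011) or Harbecke (2010).

```python
from typing import FrozenSet, Iterable, List

Condition = str
ConditionSet = FrozenSet[Condition]


def minimal_sufficient_sets(sufficient: Iterable[ConditionSet]) -> List[ConditionSet]:
    """Keep only the sufficient sets that contain no other listed sufficient set
    as a proper subset (minimality relative to the listed family)."""
    candidates = set(sufficient)
    return [s for s in candidates if not any(other < s for other in candidates)]


def is_constitutive(element: Condition, sufficient: Iterable[ConditionSet]) -> bool:
    """An element is constitutive if it belongs to at least one minimal sufficient set."""
    return any(element in s for s in minimal_sufficient_sets(sufficient))


def is_necessary(element: Condition, sufficient: Iterable[ConditionSet]) -> bool:
    """An element is necessary if it belongs to every minimal sufficient set."""
    return all(element in s for s in minimal_sufficient_sets(sufficient))


# Hypothetical toy example: two numerically different lumps can each yield the statue.
statue_conditions = [
    frozenset({"lump_A_present", "statue_shape"}),
    frozenset({"lump_B_present", "statue_shape"}),
    frozenset({"lump_A_present", "statue_shape", "pedestal"}),  # sufficient but not minimal
]

print(is_constitutive("lump_A_present", statue_conditions))  # True
print(is_necessary("lump_A_present", statue_conditions))     # False
print(is_constitutive("pedestal", statue_conditions))        # False
```

On this toy model, the presence of lump A is constitutive of the statue without being necessary for it, which mirrors the point that there may be several distinct minimal sets of jointly sufficient conditions.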

In subsequent sections, I apply the above understanding of constitution in my investigations concerning the constitutive role of the perceptual relation. By a perceptual relation, I mean a relation between a subject and a fragment of the environment by virtue of which a subject perceives a fragment of the environment, is able to represent it as having some features, and as a consequence can formulate judgments about its characteristics. In particular, I focus on perceptual relations occurring due to the functioning of attentional mechanisms, for instance when one focuses attention on a fragment of the environment in order to more precisely analyze its structure.

I argue that there are perceptual objects having a minimal set of conditions jointly sufficient for their existence such that it includes a condition concerning standing in a perceptual relation to a subject. In particular, there are fragments of the environment that do not satisfy Green’s maximality constraint (they are “too small”) but are perceptual objects due to standing in an attentional, perceptual relation, because in virtue of such a relation they are experienced as being maximal.

On the other hand, I do not argue that every perceptual object is constituted by a perceptual relation. Neither do I claim that standing in a perceptual relation to a fragment of the environment is sufficient to make a perceptual object out of this fragment. It is likely that standing in a perceptual relation is not enough for something to obtain the status of a perceptual object if a given fragment of the environment does not also possess some intrinsic, structural features. Furthermore, because it is possible for a perceptual object to have several, distinct minimal sets of conditions jointly sufficient for its existence, there may be entities such that in some contexts the occurrence of a perceptual relation is constitutive of their object-status while in other contexts they become perceptual objects in a subject-independent way.

In Green’s account, the occurrence of a perceptual relation is constitutive not for the existence of a perceptual object but for its being presented as an object. If perceptual objects are defined as those entities whose parts exhibit perceptible regularities founded upon actual causal interactions, then there should be far more perceptual objects in the world than a human visual system is able to represent at a given moment. Because of this, being a perceptual object for some subject S does not entail being represented as an object by a subject S. For the representation of a perceptual object as an object, the occurrence of a perceptual relation between a perceptual object and a subject is required. In the subsequent sections, I argue for the stronger thesis that there is a variety of perceptual objects such that the occurrence of a perceptual relation is constitutive for their existence and not only for representing them as objects.

4 Fragment of a Dot-Lattice

Let’s consider a perceptual stimulus in the form of a square-shaped lattice of dots, presented on a white background, composed of four rows, each containing four dots (see Fig. 1). While such stimuli are usually artificially created for the purpose of psychological experiments, let’s assume that in the considered case the perceivable dots are in fact parts of a single object whose other fragments are somehow occluded or that has some sort of camouflage that makes its other parts blend in with the background. Given these assumptions, the dot-lattice satisfies Green’s conditions for being a perceptual object. Parts of the lattice exhibit perceptible regularities regarding spatial layout, sameness of colour, and sameness of shape. In addition, these regularities are likely to be sustained by causal interactions as the perceptible dots are in fact fragments of a single, partially visible object. Furthermore, the considered lattice is not “too big”, as it does not contain any arbitrary, additional parts, and not “too small”, as it cannot be easily expanded by adding nearby regions without a significant decrease in perceptible regularities (these additional regions would be fragments of the white background).

Fig. 1 Lattice of dots

Let’s now consider a fragment of the lattice: two middle dots in the second row from the top. According to Green’s account, these two dots do not constitute a perceptual object. This is because an object composed of them is “too small”. The object composed of these two dots can be extended by adding nearby dots without a significant loss of perceptible regularities, as additional elements would stand in the same relevant relations as the original dots: spatial proximity, equal distance between dots, sameness of colour and shape. Nevertheless, it seems that we can easily focus attention on the considered two dots and experience them as an object. Such a conclusion is justified because our experience in this case has characteristic features of object-perception. First, the whole composed of the two dots is experienced as distinguished from a ground constituted by the rest of the lattice (Vecera 2000). Second, it is perceived as possessing properties and as having a mereological structure (O’Callaghan 2016). Third, it is commonly claimed that a characteristic feature of object perception is that vision represents objects as being numerically the same despite changes in position resulting from spatiotemporally continuous movement (see Scholl 2007 for a review). It seems that the two-dot whole can be experienced as persisting through such changes. For instance, after a displacement of the whole lattice to the left, we would still perceive the considered two-dot whole as the same object it was before the movement. A possible worry is that the perception of persistence may be disturbed by the fact that the two-dot object is part of a larger entity, the whole lattice. In fact, there are studies which show that visual abilities for tracking and re-identifying items are poor when one must simultaneously track parts of the same object (Scholl et al. 2001). However, according to these studies’ results, the significant drop in tracking abilities occurs when the tracked parts are spatially connected and may move independently despite such connection (e.g., the left edge of an elongated bar goes up while the right edge moves down). Such factors are not present in the example considered here, as the dots composing the lattice are disjoint, and when the whole lattice moves left, all of its parts also move in the same direction. In consequence, studies concerning visual tracking do not provide strong reasons for claiming that we are unable to perceive the persistence of objects such as a two-dot lattice fragment. Finally, the object composed of dots has a certain level of spatial coherence obtained in virtue of perceptual grouping principles like proximity and similarity (Kubovy and Wagemans 1995). Furthermore, a situation in which the two dots under consideration are perceived as an object cannot be easily classified as an example of erroneous object perception. As stated in Sect. 2, errors in object perception occur when perceptible regularities between parts of a fragment of the environment relevant for visual objecthood are not founded upon actual causal interactions. However, regularities between the two dots considered here are founded upon causal interactions in the same way as in the case of the whole lattice.

These considerations suggest that there are fragments of the environment that are not perceptual objects according to Green’s definition, but which we nevertheless have strong intuitive reasons to classify as perceptual objects. I believe that there are three ways to resolve this conflict. First, one may want to modify Green’s account by rejecting the maximality condition excluding “too small” perceptual objects, such as the object composed of two dots from the example above. However, such a modification leads to a proliferation of perceptual objects. For instance, in the case of a simple figure, such as a black square on a white background, every uniformly black fragment of the square and every uniformly white fragment of the background would be a perceptual object.

Such a proliferation of perceptual objects has an important negative consequence as the category of perceptual objects becomes too broad. To illustrate this, let’s again consider a simple figure, such as a black square and a circular fragment, also uniformly black, that is part of the square. After dropping the maximality constraint, the circular fragment is a perceptual object. However, even if attentionally perceived, it does not possess the crucial characteristics of perceptual objects. In particular, though attention is focused on it, it is not experienced as a figure distinguished from the ground because it is not presented as separated by any qualitative borders from the rest of the black square. In consequence, the circular fragment is, in an important aspect, different from the whole black square, which is likely to be experienced as a figure even without being the focus of attention. Furthermore, it is also significantly different from the two-dot lattice fragment considered earlier, which is experienced as a figure because of attentional processing. Due to the lack of figure-status, even in cases of attentional processing, it is implausible to treat elements such as the circular black fragment within a square as perceptual objects. Similarly, the fragments of the white background on which a figure such as a black square is positioned are not experienced as figures distinguished from ground even if they stand in an attentional relation to a subject. Nevertheless, without the maximality constraint, all fragments of such a background are visual objects. In consequence, dropping the maximality requirement entails an unintuitive claim that there is a huge number of visual objects such that, even in perfectly good perceptual conditions involving attentional perceptual relations, they are not represented as having features typical for visual objecthood.

A second idea is to acknowledge the crucial observation that the presence of “too-small” perceptual objects, such as the object composed of two dots, is connected with focusing attention on the relevant fragment of the environment and thereby establishing a specific perceptual relation between this fragment and a subject. Relying on this idea, it may be proposed that perceptual objects do not have to be ‘maximal’ in the sense proposed by Green, but rather must be fragments on which a subject can focus attention. However, given that visual attention has an important spatial aspect (see Scholl 2001), in virtue of which it can be directed at virtually any spatially coherent region within a certain range of sizes, this solution also greatly multiplies the number of perceptual objects. In consequence, it faces problems analogous to those that arise from dropping the maximality constraint. In particular, it entails that elements such as a circular fragment of a black square are visual objects despite the fact that, even if they are perceived attentionally, they are not experienced as having characteristics crucial for visual objecthood.

Finally, a third option is to accept the constitutive role of the perceptual relation for certain perceptual objects. According to this approach, the two-dot fragment of a dot lattice is not a perceptual object unless it stands in an attentional perceptual relation to a subject. In this case, the occurrence of a perceptual relation is constitutive for a perceptual object because the presence of a perceptual relation is a necessary element of a minimal set of jointly sufficient conditions of this object’s existence. This solution does not lead to a proliferation of perceptual objects, as it allows for distinguishing three categories of perceptual entities. First, there are fragments of the environment, such as a black square on a white background, which are perceptual objects even without standing in a perceptual relation to a subject. Second, there are fragments of the environment, such as a two-dot fragment of a dot-lattice, which are perceptual objects only when standing in attentional perceptual relations. For these perceptual objects, the occurrence of a perceptual relation is constitutive. Finally, there are fragments of the environment, such as parts of a white background, which are not perceptual objects even when standing in an attentional, perceptual relation. The previous solutions were not able to draw the distinctions outlined above and wrongly treated entities belonging to the third category as visual objects.

Furthermore, the constitutive solution does not force us to abandon intuitions concerning the maximality of perceptual objects. In fact, it allows us to acknowledge additional ways of obtaining maximality that arise from attentional processing. It is well-established that attention influences how entities phenomenally look (Carrasco and Barbot 2019). For instance, attention increases the contrast between an object’s colour and its surroundings (Fuller and Carrasco 2006), the properties of attended objects are perceived in a more detailed and determined way (Prinzmetal et al. 1998), attended elements seem to be closer than the surroundings (Green 2016), and regions at which attention is directed are presented as having greater spatial resolution, which may modify how objects are divided into parts (Carrasco and Yeshurun 2009). In general, attention seems to be able to cause, by various means, attended elements to be experienced as less similar to the unattended surroundings. Hence, attention may produce a perception of maximality by increasing the perceived loss of regularities that would appear if attended fragments of the environment were mereologically extended to include nearby elements. In consequence, a fragment of the environment that is not a perceptual object before focusing attention (due to a lack of maximality) may become a perceptual object when it is attended because it comes to be perceived as exhibiting maximality. It should be noted that accommodating the above observation requires making an amendment to Green’s account, since a fragment of the environment may be a perceptual object, not only in virtue of ‘objective maximality’ concerning relations between its properties and properties of the environment, but also in virtue of gaining ‘subjective maximality’, i.e., by being perceived as maximal due to attentional processing.
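
The amended condition, together with the three categories distinguished by the constitutive solution, can be summarised in a minimal sketch. It is an illustration under simplifying assumptions rather than a formal restatement of the account: every feature of a fragment is reduced to a Boolean flag, and all names (`Fragment`, `is_perceptual_object`, the field names, and the truth values assigned to the examples) are hypothetical. In particular, the flag `experienced_as_maximal_under_attention` is a modelling shortcut for the outcome of attentional processing in a given perceiver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    causally_sustained_regularities: bool   # parts exhibit perceptible, causally sustained regularities
    has_arbitrary_parts: bool               # "too big": contains parts outside those regularities
    objectively_maximal: bool               # maximality in Green's original sense
    experienced_as_maximal_under_attention: bool  # assumed outcome of attentional processing


def is_perceptual_object(fragment: Fragment, attended: bool) -> bool:
    """Amended condition: objective maximality OR subjective maximality gained
    through an actually occurring attentional, perceptual relation."""
    if not fragment.causally_sustained_regularities or fragment.has_arbitrary_parts:
        return False
    subjectively_maximal = attended and fragment.experienced_as_maximal_under_attention
    return fragment.objectively_maximal or subjectively_maximal


# Illustrative flag assignments for the three categories discussed in the text.
black_square = Fragment("black square", True, False, True, True)
two_dots = Fragment("two-dot lattice fragment", True, False, False, True)
white_patch = Fragment("patch of white background", True, False, False, False)

for fragment in (black_square, two_dots, white_patch):
    print(fragment.name,
          is_perceptual_object(fragment, attended=False),
          is_perceptual_object(fragment, attended=True))
```

Only the two-dot fragment changes status with attention, which is the sense in which the perceptual relation is constitutive for it; the status of the black square and of the background patch does not depend on whether they are attended.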

The fact that a fragment of the environment may gain perceptual, subjective maximality as a result of attentional processing may suggest yet another alternative to the constitutive solution: namely, it can be proposed that perceptual objects should be characterised in a dispositional way, such that in order to be a perceptual object, a fragment of the environment must be objectively maximal, or has to possess a disposition to be experienced as maximal when attentionally perceived. It seems that by applying such a dispositional solution, one can attribute objecthood to paradigmatic perceptual objects such as a black square on a white background (which are objectively maximal), and to entities such as a two-dot lattice fragment, which have a disposition to be experienced as maximal when attended, but not to entities such as fragments of a uniform background, which are not experienced as maximal, even when processed attentionally, and so do not have crucial dispositions. As a result, the relevant distinctions are made without postulating that there are perceptual objects constituted by the occurrence of perceptual relations. The two-dot lattice fragment considered here is a perceptual object even without an attentional perceptual relation, because it has a dispositional property to be experienced as maximal when standing in such a relation.

Nevertheless, it should be noted that, in fact, every fragment of the environment has some disposition for being experienced as maximal due to attentional operations of a certain perceptual system. For instance, even a white fragment of a uniformly white background has a disposition for being subjectively maximal when standing in a perceptual relation to a system that can phenomenally distinguish an arbitrarily selected part of a surface. In consequence, the dispositional solution also leads to a proliferation of perceptual objects, as virtually any fragment of the environment is a perceptual object due to certain dispositional properties. This problem can be avoided by postulating that it is not a sufficient condition for a perceptual object to have any disposition to gain subjective maximality due to the operations of some perceptual system, but that these operations must also be operations of a human visual system or, alternatively, of a particular system S of some person. However, this introduces an ambiguity into the dispositional solution, as it is not obvious which operations should be considered those of a human visual system in general or of a particular visual system. For instance, we may imagine that a fragment of the environment has a disposition to become subjectively maximal in virtue of operations of a visual system equipped with some neural implant. There is no straightforward answer to the question of whether such operations are still those of human vision, or whether system S is numerically the same with and without such an implant.

Similarly, alternative proposals, like characterising the relevant operations as those of a system S in standard conditions or as those of a typical visual system, face an analogous problem. It is not obvious how to characterise ‘standard’ conditions or typical human vision: for instance, whether the operations of a typical visual system are those available to 75% or 90% of the population. In consequence, for some fragments of the environment it is difficult to decide whether they are perceptual objects, as it is unclear if they have a disposition to be experienced as maximal solely due to operations of a ‘human visual system’, ‘typical human visual system’, or ‘particular system S’. I do not claim that it is impossible to solve these problems, as it is conceivable that a convincing notion of a typical system or of standard conditions may be developed. However, an advantage of the constitutive solution is that it is completely free from such difficulties, as it maintains that a fragment of the environment can be a perceptual object in virtue of being experienced as maximal due to actually standing in an appropriate perceptual relation. Hence, regardless of the dispositions a fragment of the environment has, if it is not currently objectively or subjectively maximal then it is not a perceptual object.

5 Multiple Object Tracking

Examples of perceptual objects for which the occurrence of a perceptual relation is constitutive are not restricted to static stimuli. For a dynamic version, let’s consider a stimulus used in the standard version of the Multiple Object Tracking experiment (see Pylyshyn 2007). In such studies, a participant is presented with a set of objects with the same properties (e.g., black circles of the same diameter). At the beginning, several of these objects are marked as targets, for instance by blinking a few times, and the rest serve as distractors. Subsequently, all objects start to move in a random fashion and the task is to track the targets. When the movement stops, the participant points out which objects are the targets. If the whole set of dots were not artificially created, but were some actual physical entity (e.g., a small swarm of insects or circling birds which causally influence each other’s motion), it would be correctly perceived as an object. This is because elements of the set would exhibit perceptible regularities concerning moving in proximity within a restricted region that are founded on causal interactions.

One of the major interpretations of what happens during Multiple Object Tracking is that participants treat targets as vertices of a single, virtual moving figure. A major reason for this proposal is that there are results suggesting that people do not individually track the identity of each target (see Scholl 2009). For instance, when targets are assigned labels (e.g., A, B, C, D) at the beginning of an experiment, which subsequently disappear during movement, it is difficult for participants to assign the proper label to a target when the movement stops. On the other hand, people are very successful in deciding which objects were targets and which were distractors. This suggests that targets are tracked as members of some whole composed of targets and not as individual entities. Furthermore, a single-figure interpretation is supported by the fact that attention during tracking is focused on the centre of the virtual figure composed of targets (see Yantis 1992).

Given this interpretation, tracked targets are experienced as constituting an object whose parts exhibit analogous regularities, as in the case of the whole stimulus composed of both targets and distractors. However, the figure composed solely of targets does not satisfy the maximality constraint, as it could be extended by adding other circles without a significant loss of perceptible regularities. It seems that once again this is an example of a fragment of the environment, this time composed of moving elements, which is not a perceptual object unless it stands in an attentional perceptual relation. In this dynamic case there are also reasons to believe that in virtue of attention a fragment of the environment is experienced as being maximal. This is because in the case of Multiple Object Tracking experiments, a phenomenon known as ‘inhibition of distractors’ is observed (Pylyshyn 2006). It has been discovered that changes happening during tracking, like the brief appearance of a small dot, are harder to notice when they occur on distractors in comparison to analogous changes both on targets and on the background between moving objects. It seems that in virtue of attentional processing, unattended distractors are represented in a more rudimentary way and so the perceptible relations concerning regularities between them and targets are weakened.

A figure composed of targets tracked during the Multiple Object Tracking experiment offers another example of a fragment of the environment that comes to be a perceptual object when it stands in an attentional, perceptual relation to a subject. The common characteristic of such perceptual objects is that due to attentional processing they are experienced as maximal. Attention is able to increase the colour contrast of the perceived region (Fuller and Carrasco 2006), allows properties and spatial relations to be represented in a more detailed way (Prinzmetal et al. 1998), inhibits the unattended elements (Pylyshyn 2006), and may lead the attended elements to be experienced as positioned closer to the observer (Green 2016). By these different means attentional processing increases the perceived differences between attended elements and their surroundings. In consequence, an attended fragment of the environment may be experienced as maximal, since extending its structure by adding elements of the surroundings may lead to a significant decrease in the perceived strength of regularities.

Nevertheless, one may argue that object perception relying on changes introduced by attentional processing is erroneous. In the considered cases, a fragment of the environment becomes a perceptual object because attention strengthens the experienced intrinsic regularities between its parts in comparison to the external regularities occurring between fragments of an object and fragments of the surroundings. However, these changes happen without any modifications concerning the actual causal interactions between an object’s parts, as the modifications that occur are merely products of attentional processing. Furthermore, the experiential changes introduced by attentional processing are likely to involve some sort of illusory perception. For example, in the earlier case of a dot lattice and two-dot object, the two-dot object may be experienced as maximal because attended dots seem to be closer to the perceiver while, in fact, all dots are positioned in the same depth plane. In consequence, one may suppose that the visual system errs in treating such entities as objects, as it fails to perform its function of selecting fragments of the environment that are significantly distinct from the surroundings in terms of perceptible regularities and underlying causal interactions. For instance, it may be doubted whether vision performs its function correctly when picking out the two-dot fragment, as only the whole lattice is significantly different from the surroundings. On this view, vision performs in a suboptimal fashion when selecting the two-dot fragment, as the whole lattice is a better candidate for being selected as an object.

I believe that such an argument is not successful, as it neglects the fact that in different contexts, the significance of the same fragment of the environment for a perceptual system may vary independently of changes in causal patterns and perceptible regularities between parts of this fragment. The function of visual mechanisms is not simply to pick out as objects those fragments which differ from the surroundings by some objective value, but rather to select those fragments which, given certain circumstances, are of high relevance for visual perception. It is plausible to assume that, in some circumstances, certain perceptible regularities and underlying causal patterns may be sufficient to constitute a perceptual object, while in others they are not, due to their lower significance for a perceptual system. Significance is likely to be modulated by a variety of factors, such as the task the visual system is performing. For example, while in various contexts a distinction between a two-dot fragment and the whole lattice may not be perceptually significant, in others, for instance when the perceptual task is to count how many pairs constitute the lattice, the same differences are of high relevance. Thus, it cannot be simply stated that the whole lattice is a ‘better’ visual object that should be chosen by an optimally functioning visual system instead of the two-dot fragment. When in a given context its relevance is high, the two-dot fragment can be properly selected as a perceptual object, even if it is not experienced wholly accurately and is not clearly distinguished from its surroundings in terms of perceptible regularities and underlying causal patterns.

From this perspective, attention may be understood as one of the modulating mechanisms strengthening the perceptual relevance of relations between perceived elements. In particular, exogenous attention enhances the relevance of an element in virtue of its properties, like movement, and endogenous attention enhances the relevance of an element in virtue of the subject’s beliefs and expectations. Because of this, the constitution of a perceptual object through the establishment of an attentional, perceptual relation between a fragment of the environment and a subject is not necessarily an error in a visual system’s function. What actually happens is that, due to attentional modification of perceived regularities, a fragment of the environment is treated as being of high significance and, in consequence, gains the status of a visual object. In such a case, the visual system can properly perform its function of picking out as objects the fragments of the environment which, in a given context, are relevantly distinct from the surroundings.

6 Conclusions

The major contemporary characterizations of perceptual objects do not attribute a constitutive role to the occurrence of perceptual relations. According to such theories, perceptual objects exist no matter whether they stand in a perceptual relation to a subject, and such a relation is only relevant for representing a perceptual object as an object. I have argued that such a subject-independent perspective is incomplete in the case of human visual perception. This is because there are fragments of the environment that are not perceptual objects unless they stand in a perceptual relation associated with the functioning of attentional mechanisms.

My arguments do not entail that the occurrence of a perceptual relation is constitutive for all perceptual objects or that every fragment of the environment can become a perceptual object simply by being processed by attentional mechanisms. Instead, what is shown is that attentional mechanisms can modify the perception of maximality and so lead a fragment of the environment to obtain the status of a perceptual object. In particular, in virtue of standing in an attentional, perceptual relation, a nonmaximal fragment of the environment may be experienced as maximal. Such subject-dependent perceptual objects do not arise from errors in object perception. They are most likely to appear when a fragment of the environment is of some special importance for a subject, for example when significant information is likely to be present on an initially nonmaximal fragment of the environment. Given this, the existence of subject-dependent perceptual objects is a result of the proper functioning of perceptual mechanisms that organize the visual scene according to the cognitive interests of a subject.