Within contemporary philosophy of perception, it is commonly claimed that there are significant similarities between our visual experiences of objects and our auditory experiences of sounds. First, both visually experienced objects and auditorily experienced sounds, i.e. objects and sounds considered with respect to how they are perceptually experienced, seem to be subjects to which various properties are attributed, such as colour and size in the case of objects, and pitch and loudness in the case of sounds (Cohen 2010; O’Callaghan 2008). Second, in both cases, perceptually experienced entities are distinguished from the ground and differentiated from other simultaneously perceived objects and sounds (Matthen 2010; Nudds 2009). Third, objects as well as sounds are experienced as persisting entities that remain numerically the same despite certain changes and that can be re-identified at different moments (Davies 2010; O’Callaghan 2011a). Because of these similarities, it is believed that visually experienced objects and auditorily experienced sounds can be classified as belonging to a common ontological category of individuals (Nudds 2010; Scruton 2009).

However, it is also commonly claimed that, while both visually experienced objects and auditorily experienced sounds persist through time and change, their persistence is experienced differently in each case. Auditorily experienced sounds seem to be entities that are extended in time, unfold in time, and are composed of temporal parts (e.g., Nudds 2014; O’Callaghan 2011a). Visually experienced objects, on the other hand, are not characterized by such temporal extension and seem to be “wholly present” at each moment of their existence (O’Callaghan 2008).Footnote 1

These metaphors, used by philosophers of perception to express a difference between visual and auditory experiences, come from metaphysical debates about the opposition between endurance and perdurance, where the term “endurants” names entities that persist by being “wholly present” at different moments and the term “perdurants” names those entities that persist by being temporally extended (e.g., Effingham 2012). In this context, it can be asked whether it is justified to classify visually experienced objects and auditorily experienced sounds as different types of individuals, enduring individuals and perduring individuals respectively, given the apparent difference between visual and auditory experiences of persistence. The goal of this paper is to investigate whether the endurance/perdurance distinction is applicable to visually experienced objects and auditorily experienced sounds.

This question also has a broader significance, as it concerns the unity of human persistence perception. One may believe that there is a single way of presenting persistence, common to all perceptual modalities. However, it may also be the case that human perception is disunified in this respect, and that individual modalities present persistence in distinct ways. If visually experienced objects are endurants and auditorily experienced sounds are perdurants, then this result supports the disunity hypothesis. On the other hand, if visually experienced objects and auditorily experienced sounds persist in the same manner, then the hypothesis that human persistence perception is unified is corroborated.

In what follows, I argue that it is not justified to characterize visually experienced objects as endurants and auditorily experienced sounds as perdurants. In fact, quite surprisingly, both these types of entities possess many characteristics that are usually attributed to perdurants. Moreover, the initial intuition that visually experienced objects persist differently from auditorily experienced sounds can be explained without postulating that these entities belong to separate ontological categories. A proper explanation consists in pointing out that properties characteristic of perdurants are perceived far more frequently in ordinary auditory experiences of sounds than in ordinary visual experiences of objects.Footnote 2

To provide evidence for the above claims, I analyze the four major ways in which the distinction between endurants and perdurants is explicated within contemporary analytic metaphysics. I show that, regardless of which explication is adopted, visually experienced objects and auditorily experienced sounds do not differ in the way they persist.

Section 1 introduces the Strong Thesis and the Weak Thesis. According to the Strong Thesis, the intuitions about the “temporal extension” of auditorily experienced sounds and the “whole presence” of visually experienced objects have to be explained by postulating an ontological difference between these entities. According to the Weak Thesis, these intuitions can be accounted for without claiming that visually experienced objects and auditorily experienced sounds belong to separate ontological categories. In the subsequent Sects. 2–5, the truth of both the Strong and the Weak Theses is investigated from the perspective of various explications of the endurance/perdurance debate. These explications are connected with the notion of temporal parts (Sect. 2), the notion of being temporally located (Sect. 3), debates regarding the A- and B-theories of time (Sect. 4), and the idea of identity being determined by temporally local properties (Sect. 5). In the last section, I summarize the results and claim that while the Strong Thesis is false, the Weak Thesis is likely to be true.

1 Endurance, Perdurance, and Philosophy of Perception

Philosophical works concerning auditory perception quite frequently contain passages suggesting that auditorily experienced sounds (henceforth AS) persist differently from visually experienced objects (henceforth VO). First, AS are claimed to be events or processes (O’Callaghan 2007, 2011a, pp. 379–380; Scruton 2009, pp. 60–62). Second, philosophers hold that we experience sounds as “temporally extended” (Nudds 2009, p. 81). Third, sounds seem to unfold over time (Nudds 2014, p. 343; O’Callaghan 2011b, pp. 151–152). Fourth, it is claimed that AS have temporal parts (Matthen 2010, p. 80; O’Callaghan 2008, p. 823). Fifth, and in contrast, it is believed that VO have only spatial and not temporal parts, and are not “temporally extended” but “wholly present” at each moment of their existence (O’Callaghan 2008, p. 815).

These metaphors are virtually the same as those used in metaphysical discussions of enduring and perduring entities. According to a standard explication (see Lewis 1986, p. 202), “persistence” is a neutral term that simply describes the ability of an entity to exist at more than one moment. There are, however, at least two alternative ways in which an entity can persist. The first is “enduring”, which is usually described by the metaphor of being “wholly present” at each moment at which an entity exists. The second, “perduring”, is commonly characterized as being “temporally extended” or having “temporal parts”. In fact, this connection between persistence perception and the endurance/perdurance debate is made explicit by some philosophers, who suggest that our conflicting metaphysical intuitions about persistence may arise from the difference in how persistence is experienced in vision and audition (O’Callaghan 2014). Given these initial observations, we may ask how the relation between the endurance/perdurance distinction and the differences between visual and auditory experiences of persistence can be characterized more precisely.

As a starting point, I assume that philosophers of perception are right in claiming that a strong intuition leads us to believe that VO and AS persist in different ways. This intuition can be expressed as follows:

(Intuition of Persistence Difference) It seems that visual experiences of objects’ persistence differ from auditory experiences of sounds’ persistence in a way that can be plausibly described by using metaphors connected with the endurance/perdurance debate: VO seem to endure while AS seem to perdure.

The question is, what grounds this intuition? We may believe that we have the Intuition of Persistence Difference simply because VO and AS belong to separate categories along the endurants/perdurants distinction. I will call this belief the Strong Thesis:

(Strong Thesis) We have the Intuition of Persistence Difference because VO and AS belong to separate ontological categories: VO are endurants while AS are perdurants.

However, it can also be claimed that there is no need to postulate an ontological difference between VO and AS to explain the presence of the Intuition of Persistence Difference. Instead, one may propose the Weak Thesis, according to which the Intuition of Persistence Difference is grounded in the fact that not all features associated with endurants and perdurants are equally common in experiences from different modalities:

(Weak Thesis) We have the Intuition of Persistence Difference because features characteristic of perdurants are more commonly possessed by AS than by VO, or features characteristic of endurants are more commonly possessed by VO than by AS.

In the subsequent sections, I argue against the Strong Thesis and defend a variant of the Weak Thesis, according to which VO and AS are neither paradigmatic endurants nor paradigmatic perdurants, although characteristic features of perdurants are more commonly experienced in audition than in vision.

While philosophers of perception mainly use the intuitive metaphors of being “temporally extended” versus being “wholly present” when discussing the endurant/perdurant status of VO and AS, analytic metaphysicians have developed several explications of what it means to be an endurant or a perdurant. In what follows, I analyse the four most influential explications of the endurance/perdurance distinction and argue that in each case the Weak Thesis is more plausible than the Strong Thesis. Each of the metaphysical explications considered is interpreted in such a way that it can be applied in the perceptual context of investigations concerning VO and AS.

It should be noted that by analyzing the status of VO and AS, I investigate objects and sounds as they are presented by vision and audition. Furthermore, I make no assumption about whether our perceptual experiences are usually veridical. If our perception is not completely veridical with respect to representing persistence, the actual endurance/perdurance status of objects and sounds may differ from that of VO and AS. Accordingly, I do not discuss theories concerning the nature and identity conditions of physical sounds or material objects (e.g., the claim that physical sounds are events and so are likely to be perdurants, see O’Callaghan 2007), but rather the characteristics that are ascribed to sounds and objects in auditory and visual experiences. However, in the concluding section, I make some remarks on the consequences of my investigation for the question of the accuracy of perceptual experiences of persistence.

According to the first of the considered explications, what distinguishes perdurants from endurants is having proper temporal parts (Sect. 2). In the case of the second explication, the endurance/perdurance controversy is about different ways of being temporally located: a perdurant is located at a single, extended period of time while an enduring entity is multi-located at many instantaneous moments (Sect. 3). The third explication treats the endurance/perdurance debate as a discussion concerning the theories of time (Sect. 4). Finally, the distinction between endurants and perdurants can be understood as a distinction between entities whose identity is completely determined by temporally local properties and those which do not have such a characteristic (Sect. 5).

2 Temporal Parts

Probably the most popular way of explicating the endurance/perdurance distinction is by referring to the notion of temporal parts. In this case, the metaphor of being “temporally extended” is understood in terms of having proper temporal parts, which entities that persist by being “wholly present” lack. Intuitively speaking, an entity with proper temporal parts is only partially present at each moment of its existence, as it is composed of parts existing at different moments. In this sense it is “temporally extended”.

According to a common definition (see Effingham 2012; Olson 2006; Sider 1997; Simons 2014 for different variants of this idea), an entity x is a temporal part of an entity y if and only if (1) x is a part of y during some period T of y’s existence and (2) x is a maximal part, in the sense that all other parts possessed by y during the period T are parts of x. For instance, a minimal, instantaneous temporal part of a dog is its maximal part that exists at a single moment; intuitively speaking, it is a time-slice of the dog. Larger temporal parts exist not at a single moment but over a period of time, and may be composed of many instantaneous time-slices. Under the explication relying on the notion of temporal parts, perdurants are entities that have proper temporal parts, i.e. temporal parts that are not identical with the whole entity, while an endurant has only one temporal part, namely itself.

In fact, the philosophical debate concerning endurants, perdurants, and their temporal parts is a little more complicated, as four types of entities can be distinguished (Wasserman 2004):

  1. Entities such that for each moment of their existence they have an instantaneous proper temporal part existing at exactly this moment. For instance, if such an entity exists in a period from t1 to t5, then it has at least five instantaneous proper temporal parts, one for each moment from t1 to t5.

  2. Entities that do not have a proper temporal part for each moment of their existence but are identical to the sum of their proper temporal parts. For example, there may be an entity E existing from t1 to t5 that has exactly two proper temporal parts: one from t1 to t2 and the second from t3 to t5. It does not belong to category (1), as it does not have any instantaneous temporal parts. However, the sum of its two proper temporal parts is identical to E itself.

  3. Entities that have proper temporal parts but are not identical to the sum of their proper temporal parts. An example would be an entity existing from t1 to t5 that has only one proper temporal part, from t1 to t2, and does not have any proper temporal parts during the period from t3 to t5.

  4. Entities that do not have any proper temporal parts. Such an entity may persist through time, e.g. from t1 to t5, while not having any temporal parts different from itself.
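The four categories can be made concrete with a small Python sketch. It is only an illustration under assumptions that are not in the text: time is modelled as discrete moments numbered 1 to 5 (standing for t1 to t5), a temporal part is reduced to the set of moments it occupies, and "identical to the sum of its proper temporal parts" is approximated as "its proper parts jointly cover every moment of its existence". The names `classify` and `Category` are hypothetical.

```python
from enum import Enum
from typing import FrozenSet, Iterable

# Hypothetical model: time as discrete, integer-indexed moments (t1 -> 1, ..., t5 -> 5).
Moment = int


class Category(Enum):
    """Wasserman's four types, numbered as in the list of categories."""
    PARADIGMATIC_PERDURANT = 1  # an instantaneous proper temporal part at every moment
    SUM_OF_PARTS = 2            # identical to the sum of its proper temporal parts
    PARTIAL_PARTS = 3           # has proper temporal parts but is not their sum
    PARADIGMATIC_ENDURANT = 4   # no proper temporal parts at all


def classify(lifetime: FrozenSet[Moment],
             proper_parts: Iterable[FrozenSet[Moment]]) -> Category:
    """Assign an entity to one of the four categories.

    Simplifying assumptions: a temporal part is represented only by the set of
    moments it occupies, and "identical to the sum of its proper temporal parts"
    is modelled as "the proper parts jointly cover every moment of existence".
    """
    parts = [frozenset(p) for p in proper_parts]
    for p in parts:
        # A proper temporal part occupies some, but not all, moments of the entity.
        if not p or not p < lifetime:
            raise ValueError(f"{sorted(p)} is not a proper temporal part")
    if not parts:
        return Category.PARADIGMATIC_ENDURANT
    if all(frozenset({t}) in parts for t in lifetime):
        return Category.PARADIGMATIC_PERDURANT
    if frozenset().union(*parts) == lifetime:
        return Category.SUM_OF_PARTS
    return Category.PARTIAL_PARTS


life = frozenset(range(1, 6))  # an entity existing from t1 to t5
print(classify(life, [frozenset({t}) for t in life]).value)            # 1
print(classify(life, [frozenset({1, 2}), frozenset({3, 4, 5})]).value)  # 2
print(classify(life, [frozenset({1, 2})]).value)                        # 3
print(classify(life, []).value)                                         # 4
```

The four calls reproduce the examples given for each category. Reducing parts to sets of moments ignores everything about a part except when it exists, which is enough to separate the categories but says nothing about what the parts are like.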

There is no consensus on where exactly the line separating endurants from perdurants should be drawn among the above four types of entities (Miller 2009; McKinnon 2002; Wasserman 2004). However, there is agreement that entities belonging to (1) are paradigmatic perdurants, those belonging to (4) are paradigmatic endurants, and categories (2) and (3) describe intermediate options.

Under the considered explication of the endurance/perdurance debate in terms of proper temporal parts, the Strong Thesis is true when the following condition is met:

(Strong Thesis: Temporal Parts Explication) The Strong Thesis is true if and only if AS are closer to paradigmatic perdurants than VO on the endurants/perdurants spectrum defined in terms of proper temporal parts (e.g. AS are entities from category (1) while VO are entities from category (3)).

In a similar way, a condition for the truth of the Weak Thesis can be formulated:

(Weak Thesis: Temporal Parts Explication) The Weak Thesis is true if and only if (1) AS and VO belong to the same category in respect of having proper temporal parts, but (2) typical AS have more proper temporal parts or their proper temporal parts are more salient than in the case of typical VO.

To judge the Strong Thesis under the Temporal Parts Explication, we have to investigate what it means to perceptually experience entities as having, or as not having, proper temporal parts.

A good starting point is to consider the requirements for visually experiencing objects. According to psychological characterizations (see Palmer and Rock 1994; Scholl 2001), for a fragment of the environment to be perceived as an object, it has to be perceptually distinguished from its surroundings, usually in virtue of perceiving borders separating it from the rest of the environment. Indeed, it often happens that two spatially connected regions “compete” for the status of an object. The borders are assigned to the winner of the competition, while the other region is experienced as an objectless ground (Pomerantz and Kubovy 1986; Vecera 2000). From this perspective, the simplest VO are perceived in virtue of processes that detect edges between different surface features and represent regions bounded by these edges as figures separated from the ground (Craft et al. 2007; Qiu and von der Heydt 2005). Such rudimentary VO may then be combined into more complex objects by recognizing appropriate spatial relations (Elder and Goldberg 2002; Kubovy et al. 1998).

The above necessary condition leads to a broad, minimal conception of VO that encompasses not only three-dimensional things but also flat figures, shadows, edges, and even holes. Empirical studies of visual tracking show that all such VO share important features regarding their diachronic identity: objects are visually experienced as being the same as long as they change in a spatiotemporally continuous fashion (e.g., Pylyshyn 2007; Scholl 2007). Nevertheless, even such a broad notion of VO does not entail that every fragment of the environment is experienced as an object. For instance, when perceiving a red wall, there are fragments of the wall that lack any borders that would allow them to be perceptually differentiated from the rest of the wall, and so such fragments are not VO.

The above minimal notion allows the fragments of the perceived environment to be divided into two fundamental types. First, there are VO, which are fragments that are distinguished from the ground and from other VO. Second, there are objectless fragments that constitute the ground. Of course, being distinguished from the ground is not sufficient for experiencing every type of visual object. For instance, experiencing three-dimensional visual objects may require processes responsible for depth perception. However, figure/ground discrimination seems to be a necessary factor in visually experiencing objects, and referring to it is particularly useful in the present context, as it allows us to explain what the parts of VO are. In particular, even very simple VO, such as flat figures, can have proper spatial parts. For instance, a red square is perceived as having proper parts corresponding to its four edges. However, as in the case of the whole visual field, not every spatial fragment of a red square is perceived as one of its parts. In particular, the interior of a red square can be divided in many ways into fragments of various shapes; for instance, a circle or a triangle could be drawn inside the square. Nevertheless, when visually perceiving a red square we do not experience it as having these various potential parts unless some additional borders are added that allow us to distinguish them. This intuition is in accordance with the main psychological models of visual part perception, which claim that fragments of objects are recognized as parts if they are separated by edges or if the outline of a figure creates points of concavity (as in the case of an hourglass shape, Hoffman and Richards 1984; Xu and Singh 2002). In other words, relying on the minimal notion of VO, the proper spatial parts of VO are those of their spatial fragments that are themselves VO.

Of course, one can still propose a more liberal notion of perceptual parts, according to which every spatial fragment of a VO is its part. However, such an understanding of perceptual parts is less plausible in the present context. Investigations concerning the part-structure of VO should formulate the notion of perceptual parts relying on how human vision divides objects into parts, and not on how we can conceptualize the structure of objects using post-perceptual abilities. From this perspective, the liberal notion of perceptual parts is not consistent with the major psychological models of part perception, which, relying on behavioural data, claim that part-structure is determined by the represented edges (e.g., Hoffman and Richards 1984; Xu and Singh 2002). In addition, such a broader notion does not fit well with models of visual attention. If a fragment of an object is its perceptual part, then one should be able to refer to this fragment demonstratively. In the case of perceptual demonstratives, such reference usually requires focusing attention on the given fragment (see Campbell 2002). However, if a fragment is not delineated by edges, then visual attention is able to select only an approximate, blob-like region, and not a region of a specific shape (Cave and Bichot 1999). Moreover, visual attention tends to spread within edges (Richard et al. 2008). For instance, when one tries to focus on a fragment within a red square, it is likely that, owing to the lack of edges distinguishing this fragment, attention would spread up to the square’s boundaries, and as a result the whole figure, rather than the fragment in question, would be attentionally selected. Furthermore, as shown in the subsequent paragraphs, the liberal notion of perceptual parts entails that both VO and AS have a maximal number of proper temporal parts, and so it trivialises the debate about their endurance/perdurance status.

Relying on the above intuitions regarding spatial parts, we may analogously characterise the temporal parts of AS and VO as those fragments of their history that are themselves distinguished as VO or AS. By definition, every persisting entity exists at more than one moment, so we may characterise a persisting entity E by specifying the moments of its existence and the features it possesses at each of these moments. The description of all moments of E’s existence, together with its features at these moments, specifies the whole history of E. Analogously, a description of a subset of the moments of E’s existence and the corresponding features distinguishes a fragment of E’s history. The existence of a history is entailed by the mere persistence of an entity, and it is consistent both with having and with not having proper temporal parts. Given this characterisation, we can now analyse whether VO and AS vary with respect to having proper temporal parts. It should be noted that this conceptual independence of persistence from having temporal parts cannot be achieved with a liberal notion of perceptual parts. If such a notion is adopted, then every fragment of an entity’s history, down to each single moment, is one of its proper temporal parts, so persistence entails perdurance.

It is well recognized within cognitive psychology that VO and AS can persist even though they undergo various qualitative changes during their existence. VO are experienced as being the same as long as there are no topological changes (e.g. an object dividing into fragments or acquiring holes, van Marle and Scholl 2003; Zhou et al. 2010) or changes that break spatiotemporal continuity (like jumping from place to place instead of moving continuously, Scholl 2007). Similarly, AS persist through a variety of qualitative changes, but changes that include longer temporal discontinuities and significant differences in pitch are likely to break the persistence of AS (O’Callaghan 2008; Phlips 2013).

In light of the above remarks, it is clear that there are AS that have proper temporal parts. When hearing a complex, persisting sound, like a musical piece, we can distinguish that it is composed of various simpler sounds that may differ in pitch, loudness, or timbre. As already mentioned, differences in pitch seem to be of special importance, because sounds differing in pitch are unlikely to be perceptually identified with each other (see O’Callaghan 2008). These qualitative differences constitute borders that allow us to distinguish proper temporal parts within the history of a persisting sound. What is more, if one fragment of the history of an AS is its proper temporal part, then the remaining fragments, delimited by the same qualitative borders, are also its proper temporal parts. For instance, when one experiences a sound S lasting from t1 to t5 and at t3 perceives a part whose features differ from those of the earlier and later fragments of S’s history, then the fragments t1–t2 and t4–t5 are also experienced as proper temporal parts of S. Hence AS seem to be identical to the sums of their proper temporal parts.
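The role of qualitative borders in this argument can be illustrated with a hypothetical helper, `temporal_parts`, which treats a history as a sequence of feature values, one per moment, and counts any change of value as a border. This is a deliberate simplification (auditory segmentation is not a matter of exact feature identity), and the pitch labels are invented for the example.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple

Part = Tuple[int, int]  # (first moment, last moment), both inclusive


def temporal_parts(history: Sequence[Hashable], start: int = 1) -> List[Part]:
    """Return the proper temporal parts of a persisting entity's history.

    history[i] stands for the features presented at moment start + i.
    Hypothetical simplification: any change in features counts as a
    qualitative border, and each maximal border-free run is a part.
    """
    runs: List[Part] = []
    t = start
    for _, group in groupby(history):
        length = sum(1 for _ in group)
        runs.append((t, t + length - 1))
        t += length
    # A single run is the whole entity, which is not a *proper* part of itself.
    return runs if len(runs) > 1 else []


# Sound S from t1 to t5 whose features at t3 differ from the rest (pitch labels).
print(temporal_parts(["C4", "C4", "E4", "C4", "C4"]))  # [(1, 2), (3, 3), (4, 5)]
# A constant sound: no borders, hence no proper temporal parts.
print(temporal_parts(["C4"] * 5))                      # []
```

Because the runs always tile the whole history, finding one proper part automatically yields the remaining fragments as parts, and their sum is the entire sound. Nothing in the helper is specific to audition: a history such as `["red", "red", "green", "green"]` for a square that changes colour is split in the same way.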

It is less clear whether there are AS that have an instantaneous temporal part at each moment of their existence. While there may be extremely dynamic sounds that change their properties at every moment, it is doubtful whether the human auditory system is able to distinguish a new, momentary sound at each moment of the history of a persisting sound. On the other hand, there may be AS with no proper temporal parts. An example would be a constant sound that lasts for a while but at each moment has exactly the same properties. To use a visual analogy, such a sound is like the interior of a red square, as there are no qualitative borders that would allow one to distinguish some fragments of its history as additional sounds.

These considerations show that AS are not uniform with respect to having proper temporal parts. There are AS that belong to categories (2) and (4), and perhaps also AS that should be classified as paradigmatic perdurants of category (1). However, category (3) seems to contain no AS, because there are no AS that have proper temporal parts but are not identical to the sums of their parts.

Perhaps more surprisingly, exactly the same conclusion is reached when the temporal structure of VO is investigated. VO have proper temporal parts if there are visual experiences of persisting objects whose history has fragments separated by borders arising from qualitative changes. This condition seems to be satisfied, as there are VO that have proper temporal parts and are identical to the sums of such parts. For instance, we may perceive a persisting square that is red at the beginning but then changes its colour to green. In this case, due to the standard principles of visual persistence perception related to spatiotemporal continuity, the square will be experienced as a single, persisting entity. However, it will also have two temporal fragments distinguished by a discontinuity in colour. Consequently, such a VO is composed of two proper temporal parts, a red part and a green part, and their sum is identical to the whole persisting VO. Furthermore, as in the case of AS, it is not clear whether there are VO changing so dynamically that they should be classified as belonging to category (1). On the other hand, there are also VO without proper temporal parts, which persist through a period of time without undergoing any qualitative changes.

One may propose that there is still a difference between the status of VO and AS, because even in the case of constant, unchanging AS, some proper temporal parts can be distinguished, as such AS have a temporal beginning and end. Even dynamically changing objects, by contrast, are not usually visually experienced as having a beginning or an end in time. However, this is only a quantitative difference. For instance, while it does not often happen in natural situations, modern technology makes it easy to generate visual stimuli experienced as objects that come into existence and then disintegrate. Similarly, there are sounds that are not auditorily experienced as having a beginning or an end. For example, a visitor to Niagara Falls is likely to have an auditory experience of an AS that neither starts nor stops existing.

Because VO and AS have exactly the same status with respect to temporal parts, the Strong Thesis, investigated in the context of the Temporal Parts Explication, should be rejected.Footnote 3 However, there seems to be a plausible justification for the Weak Thesis, because typical AS have more proper temporal parts than typical VO. It is common to visually experience objects without experiencing any changes in them. On the other hand, constant AS without any proper temporal parts are far less common, as the sounds we usually experience change rapidly and are experienced as having a complex temporal structure. This quantitative difference is a plausible candidate for grounding the Intuition of Persistence Difference.Footnote 4

3 Ways of Being Located

Not all authors agree that the endurance/perdurance distinction should be understood primarily in terms of having or lacking proper temporal parts. Another popular explication interprets enduring and perduring as different modes of being located in regions of time (Donnelly 2011; Rychter 2011). Let’s assume that there is an entity E existing from t1 to t5. This situation can be interpreted in two ways. First, E may be “temporally extended” in virtue of being located in an extended temporal region, namely the sum of the moments from t1 to t5, and therefore be a perdurant (Balashov 2007; Miller 2009). Second, E may not be located in an extended temporal region at all but instead be multi-located, having five instantaneous locations, one at each moment from t1 to t5. If this is true, then E is an endurant in virtue of being “wholly present” at each moment of its existence (Gibson and Pooley 2006; Parsons 2000).

To properly analyze the Strong and the Weak Theses interpreted in accordance with the above explication, we need a more detailed account of what it means for an entity to be located [below, I use the framework proposed by Hudson (2005, p. 97)]. Let’s assume that being located is an intuitive, primitive notion according to which an entity is located in a region if, and only if, it encompasses this region. For instance, I am located where my leg is, where my hand is, and obviously also in the region filled by my whole body. Relying on this intuitive notion, two stronger variants of being located can be defined. First, an entity E is entirely located in a region R if and only if E is located in R and there is no region disjoint from R in which E is located. Second, an entity E is wholly located in a region R if and only if E is located in R and for every proper part of E there is a subregion of R in which this part is located. Being entirely located and being wholly located both express, from different perspectives, the intuitive idea that an entity fills a region but is also contained in it, not extending beyond its borders.

Using the above distinction, we can characterize more precisely the condition needed for the truth of the Strong Thesis under the considered explication of the endurance/perdurance distinction. The truth of the Strong Thesis requires that AS are perdurants and so are wholly or entirely located only at extended temporal regions, while VO are endurants and so are wholly or entirely located only at instantaneous moments. More precisely, these truth conditions can be formulated as follows:

(Strong Thesis: Location Explication) The Strong Thesis is true if and only if (a) if an AS persists through a period from ti to tk, then it is entirely or wholly located at the extended region ti,…,tk and it is neither wholly nor entirely located at any of the instantaneous moments from ti to tk and (b) if a VO persists through the period from ti to tk, then it is neither entirely nor wholly located at the extended region ti,…,tk and it is wholly or entirely located at each instantaneous moment from ti to tk.

In a similar way, we may also formulate a truth condition for the Weak Thesis under the Location Explication. The Weak Thesis is true when AS and VO do not differ in their ability to be temporally located, but (1) typical AS are more frequently wholly or entirely located at extended temporal regions or, alternatively, (2) typical VO are more frequently wholly or entirely located at instantaneous moments:

(Weak Thesis: Location Explication) The Weak Thesis is true if and only if (a) VO and AS have the same ability to be temporally located, but (b) typical AS are more often wholly or entirely located at an extended region corresponding to a period through which they persist than typical VO, or typical VO are more often wholly or entirely located at instantaneous moments in which they exist than typical AS.

To assess the above version of the Strong Thesis, let’s consider a persisting entity E that exists from t1 to t5. According to our intuitive notion of being located, an entity is located in a region when there is no fragment of this region that the entity does not encompass, as in the case of my being spatially located in the region where my hand is. In this sense, E is temporally located at each of the instantaneous moments from t1 to t5, as those moments do not have any temporal fragments in which E does not exist (in fact, as instantaneous moments, they do not have any temporal fragments other than themselves).

The question is whether it is possible for E to be located at each of the moments from t1 to t5 without being entirely and wholly located at an extended temporal region t1,…,t5 composed of these five instantaneous moments. It is easy to see that this can happen only if the extended region t1,…,t5 does not exist, despite the existence of the five instantaneous moments from t1 to t5. If E is located only at moments from t1 to t5 and there exists an extended region t1,…,t5, then E is entirely located at t1,…,t5, as there is no moment at which E is located that is not a fragment of t1,…,t5, and hence no region disjoint from t1,…,t5 at which E is located. Similarly, if E is located only at moments from t1 to t5 and there exists an extended region t1,…,t5, then E is wholly located at t1,…,t5, because each proper part of E is located somewhere within t1,…,t5.

In the context of perceptual experience, the statement that a series of successive moments does not form an extended temporal region composed of these moments may be interpreted in two ways. First, it may mean that we experience time merely as a series of instantaneous moments. Second, it may mean that we experience time as composed of only one, present moment (a doctrine known in metaphysics as “presentism”; Parsons 2000). However, both options are implausible in the context of our perceptual experience of time. First, we perceive time as a continuous flow of events and not as a series of separate, atomic moments (Matthews and Meck 2014). In fact, it seems very difficult to distinguish such instantaneous moments in our ordinary experiences. Second, it is unlikely that we experience time as composed of only a single, present moment, as the perceptual system has access to some past moments through memory mechanisms, so perception does not process time as composed only of what is present. In fact, such mechanisms, which allow us to compare features of objects perceived at current and previous moments, are crucial for perceiving entities as persisting individuals (Kahneman et al. 1992; O’Callaghan 2008). Hence there seem to be no plausible reasons to believe that the moments we perceive fail to form extended temporal regions. As a consequence, if a VO or an AS persists from ti to tk, then it is likely that there is an extended temporal region ti,…,tk at which the VO or AS is wholly and entirely located.

Furthermore, we may consider whether a persisting VO or AS that exists during a period from ti to tk is wholly or entirely located at each of these instantaneous moments. In the previous section, I argued that there are VO and AS which have proper temporal parts, as well as VO and AS which lack any such parts. Let’s consider a VO that exists from t1 to t5 and has two proper temporal parts: a green temporal part from t1 to t2 and a red temporal part from t3 to t5 (an analogous example can be formulated for an AS). Such a VO is neither wholly nor entirely located at any moment from t1 to t5. It is not wholly located at any of them because, for each moment, there is a proper part that is not located at this moment. For instance, the red part is not located at t1 and the green part is not located at t4. It is not entirely located at any of them because, for each instantaneous moment at which this VO is located, there is a disjoint moment at which it is also located. For example, it is located at t1 and at t2, and these moments do not overlap.

The situation is different in the case of VO and AS that do not have any proper temporal parts. Such entities are wholly located at each instantaneous moment comprising the period of their existence. Let’s again assume a VO that exists from t1 to t5 and does not have any proper temporal parts (an analogous example can also be formulated for an AS). The lack of proper temporal parts entails that this VO does not undergo any change, and so, if it has any other proper parts, such as spatial proper parts, then these parts are exactly the same at each moment of its existence. Hence each proper part of such a VO is located at each instantaneous moment at which this VO exists, so this VO is wholly located at every such moment. It is nevertheless not entirely located at any of them, since it is also located at other, disjoint moments.
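The verdicts about the two five-moment VOs can be checked mechanically in a toy model. The sketch below is only an illustration under assumptions that go beyond the text: time is a finite series of moments t1 to t5, the available regions are the single moments and the intervals built from them, “located” means existing at every moment of a region, and the stable VO’s two spatial halves are hypothetical parts that exist unchanged throughout. The names Entity, located, entirely_located and wholly_located are illustrative labels, not part of Hudson’s framework.

```python
from dataclasses import dataclass

# Toy assumptions (not from the paper): time is a finite series of moments 1..5,
# and the regions are the single moments plus every interval t_i,...,t_k.
MOMENTS = range(1, 6)
REGIONS = [frozenset(range(i, k + 1)) for i in MOMENTS for k in MOMENTS if i <= k]
WHOLE_PERIOD = frozenset(MOMENTS)


@dataclass(frozen=True)
class Entity:
    name: str
    span: frozenset           # moments at which the entity exists
    proper_parts: tuple = ()  # proper parts, themselves Entity objects


def located(e, region):
    """Primitive notion: e encompasses the region (exists at all its moments)."""
    return region <= e.span


def entirely_located(e, region):
    """Located at R, and located at no region disjoint from R."""
    return located(e, region) and not any(
        located(e, r) for r in REGIONS if not (r & region)
    )


def wholly_located(e, region):
    """Located at R, and every proper part is located at some subregion of R."""
    subregions = [r for r in REGIONS if r <= region]
    return located(e, region) and all(
        any(located(p, r) for r in subregions) for p in e.proper_parts
    )


# The VO with a green phase (t1-t2) followed by a red phase (t3-t5).
green = Entity("green part", frozenset({1, 2}))
red = Entity("red part", frozenset({3, 4, 5}))
changing_vo = Entity("green-then-red VO", WHOLE_PERIOD, (green, red))

# A temporally partless VO; its two spatial halves are hypothetical and unchanging.
left = Entity("left half", WHOLE_PERIOD)
right = Entity("right half", WHOLE_PERIOD)
stable_vo = Entity("stable VO", WHOLE_PERIOD, (left, right))

for vo in (changing_vo, stable_vo):
    wholly_at = [t for t in MOMENTS if wholly_located(vo, frozenset({t}))]
    entirely_at = [t for t in MOMENTS if entirely_located(vo, frozenset({t}))]
    extended = wholly_located(vo, WHOLE_PERIOD) and entirely_located(vo, WHOLE_PERIOD)
    print(f"{vo.name}: wholly at {wholly_at}, entirely at {entirely_at}, "
          f"wholly and entirely at t1..t5: {extended}")
# green-then-red VO: wholly at [], entirely at [], wholly and entirely at t1..t5: True
# stable VO: wholly at [1, 2, 3, 4, 5], entirely at [], wholly and entirely at t1..t5: True
```

In this model the green-then-red VO is wholly located at no single moment, the stable VO is wholly located at every moment, neither is entirely located at any single moment, and both are wholly and entirely located at t1,…,t5. The last result depends on counting the extended interval t1,…,t5 as a region, which is an assumption of the toy model.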

Relying on the above considerations, we may state that if a VO or an AS persists through a period from ti to tk, then this VO or AS is wholly and entirely located at an extended region ti,…,tk. Furthermore, if a VO or an AS has proper temporal parts, then it is neither wholly nor entirely located at any instantaneous moment from ti to tk. Such entities are paradigmatic perdurants according to the explication of endurance/perdurance distinction in terms of different ways of being temporally located. However, if a VO or an AS does not have any proper temporal parts, then it is wholly located at each instantaneous moment of its existence. The status of such entities is less clear as they combine characteristics of paradigmatic endurants and perdurants.

Because VO and AS do not differ in terms of being temporally located, the Strong Thesis under the Location Explication is false. On the other hand, there is a plausible justification for the Weak Thesis. It is less common to perceive stable, temporally partless AS than stable, temporally partless VO. As shown earlier, such stable, temporally partless VO and AS are exactly those that are wholly located at each moment of their existence. Hence the Weak Thesis is justified, as it is less frequent for AS than for VO to be wholly located at each moment of their existence. Therefore our intuitions concerning persistence are more strongly shaped by experiences of paradigmatic perdurants in the case of audition than in the case of vision.

4 Theories of Time

The previous section has already shown that considerations about time perception are relevant for deciding whether the Strong or the Weak Thesis is true. In metaphysical investigations of time, the fundamental distinction lies between so-called A-theories and B-theories. According to A-theories of time, not all moments have the same status, since some belong to the past, some are future moments, and one is the present moment. Conversely, in B-theories time is not divided into three types of moments. All moments have the same status and are ordered into a series by the relations “x is earlier/later than y”. B-theories naturally combine with eternalism, i.e. the view that all moments exist, so that what we intuitively call “past” and “future” is just as real as the present. A-theories allow for the claim that not all moments exist, and according to the strongest of these positions, presentism, only the present moment really exists.

Some authors claim that the debate between A- and B-theories of time is closely logically connected with the endurance/perdurance debate. First, it is argued that A-theories entail that objects endure (Carter and Hestevold 1994; see also Merricks 1999). Proponents of this claim interpret the main metaphor of the endurance/perdurance debate, i.e. that perdurants are “temporally extended” while endurants are not, by analogy with the way in which entities are extended in space. In the case of spatial extension, places are not divided into different types, as past, present, and future moments are in A-theories. Hence, if entities are “temporally extended”, then A-theories are not true, as they postulate a structure of time that does not allow for extension. Therefore the truth of A-theories entails that entities are not “temporally extended”, and so persisting entities should be characterized as endurants.

Second, it is argued that B-theories entail that objects perdure. Here, the main line of reasoning is connected with the classical problem of change in intrinsic properties (Lewis 1986; Olson 2006; Wasserman 2006). It seems intuitive that a persisting object O can have a property F at moment t1 and lack it at a different moment t2. However, if one accepts that being identical entails having the same properties, then O at t1 and O at t2 cannot be the same object. A proponent of perdurantism may claim that O at t1 and O at t2 are, in fact, distinct proper parts of a temporally extended object that encompasses both moments. To avoid this conclusion, one has to propose either that properties are not intrinsic but are relations to times, or that properties are instantiated in a tense-sensitive way, so that having F in the past is compatible with not having it now. The first option is criticised as mischaracterising the nature of ordinary properties, and the second requires abandoning the B-theory of time by introducing the division between past, present, and future. Because, on this argument, the only legitimate way to avoid perdurantism is to reject B-theories, perdurantism follows from B-theories.

In addition, if (a) endurantism follows from A-theories, (b) perdurantism follows from B-theories, and (c) A-theories and B-theories are contradictory while endurantism is at least the contrary of perdurantism, then the truth of A-theories is logically equivalent to the truth of endurantism, and the truth of B-theories is logically equivalent to the truth of perdurantism (Carter and Hestevold 1994).
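The step from premises (a) to (c) to the two equivalences is a short exercise in propositional logic. The sketch below writes A, B, E and P for the truth of A-theories, B-theories, endurantism and perdurantism, and splits (c) into (c1) and (c2), reading “contradictory” as “exactly one is true” and “contrary” as “not both are true”. These are the standard senses of the terms, assumed here rather than quoted from Carter and Hestevold.

```latex
% A: A-theory, B: B-theory, E: endurantism, P: perdurantism
\begin{align*}
&\text{(a) } A \to E, \quad \text{(b) } B \to P, \quad
 \text{(c$_1$) } A \leftrightarrow \neg B, \quad \text{(c$_2$) } \neg(E \wedge P)\\[4pt]
&E \to A:\ \text{suppose } E \wedge \neg A;\ \text{by (c$_1$)}, B;\ \text{by (b)}, P;\ \text{so } E \wedge P, \text{ contradicting (c$_2$)}\\
&P \to B:\ \text{suppose } P \wedge \neg B;\ \text{by (c$_1$)}, A;\ \text{by (a)}, E;\ \text{so } E \wedge P, \text{ contradicting (c$_2$)}\\[4pt]
&\text{With (a) and (b): } A \leftrightarrow E \ \text{ and } \ B \leftrightarrow P
\end{align*}
```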

My goal is not to analyse and evaluate the above arguments. What is important is that they show an influential explication of the endurance/perdurance distinction, according to which the truth of endurantism is equivalent to the truth of A-theories, and the truth of perdurantism is equivalent to the truth of B-theories. We may, therefore, consider whether the Strong Thesis or the Weak Thesis is justified if the endurance/perdurance distinction is understood in accordance with this explication.

Under the considered explication, the Strong Thesis is true when the following condition is satisfied:

(Strong Thesis: Time Explication) The Strong Thesis is true if and only if time is experienced differently in visual modality than in auditory modality: the visually experienced time is the time of A-theories and the auditorily experienced time is the time of B-theories.

In this case, visually experienced persisting entities, including VO, are endurants, and auditorily experienced persisting entities, including AS, are perdurants. The Weak Thesis, on the other hand, holds if vision and audition present time in the same way, but the differences between past, present, and future associated with A-theories are more vivid in the case of vision:

(Weak Thesis: Time Explication) The Weak Thesis is true if and only if time is experienced in the same way in both modalities, but differences between past, present, and future are more salient in visual than in auditory modality.

However, contrary to what the Strong Thesis requires, it is counterintuitive that we perceive time differently in visual experiences than in auditory experiences. In both cases the experienced time seems to be the time of A-theories. When we visually perceive an object, we see its present features, we can recall some of its previously possessed features (Cowan 2000; Kahneman et al. 1992), and we are able to make some predictions about its future changes (in particular, we can extrapolate its position from its current motion; Intriligator and Cavanagh 2001). Analogously, in audition we perceive the present features of a sound, we can compare these present features with earlier ones, and, especially when hearing musical pieces, we predict the future development of a sound (Luntley 2003; O’Callaghan 2008). So in both modalities not all moments have the same status: some are presently experienced moments, some are past moments accessible through memory, and some are future moments to which we have limited access through predictive mechanisms (Brogaard and Gatzia 2011). As a consequence, visually and auditorily experienced time fits better within the framework of A-theories than within that of B-theories, and so, under the considered explication of the endurance/perdurance debate, both VO and AS should be characterized as endurants. This result is clearly incompatible with the Strong Thesis.

Despite the above reasons, some authors believe that perceptually experienced time should be characterized in terms of B-theories, because physical time is adequately described by B-theories, and characterizing perceptual time as the time of A-theories would entail that we always misperceive time (see Kriegel 2009). However, even from this perspective there is no reason to claim that the B-theoretic characterization of perceptual time applies only to audition and not to vision. As a consequence, the Strong Thesis would be false under this interpretation as well.

While the Strong Thesis seems to be false, the Weak Thesis is compatible with the above observations and would be corroborated if, in typical visual experiences of objects, the differences between past, present, and future moments were more salient than in typical auditory experiences of sounds.Footnote 5 Typical AS are more dynamic than typical VO as they usually undergo more changes in a given period of time. For instance, the perceived auditory qualities of musical pieces, speech utterances, and various environmental sounds are constantly changing, while we commonly visually experience objects which seem to be completely stable. Therefore, to re-identify a typical AS across time, the perceptual system has to constantly compare current features with past ones and predict future qualities. Given that, one may ask whether these higher cognitive demands are somehow reflected in auditory experiences, such that differences between past, present, and future moments are less salient.

A behavioural indicator of such a lack of salience would be the presence of mistakes and uncertainty concerning the temporal order of sounds. Difficulties in recognising temporal order suggest that incoming stimuli are not rigidly divided into three separate categories of past, present, and future, as applying these categories requires the “earlier than” and “later than” relations to be identified. The psychological data consistently show that people are less certain about the temporal order of stimuli when the changes are more frequent (Block and Gruber 2013; Warren and Obusek 1972) and when the changing stimuli are more complex (van Wassenhove 2009). Furthermore, these factors similarly influence both visual and auditory perception of temporal order (Kanabus et al. 2002), so the lower certainty of judgements seems to result mainly from the characteristics of the stimuli, and not from differences between modalities. From this perspective, it is likely that cases in which it is more difficult to recognise the temporal order of stimuli are more frequent in audition than in vision. Auditory experiences usually present rapidly changing entities with complex temporal structures, while visual experiences are more likely to present objects with simpler temporal structures composed of several stable phases (e.g. a period in which a square is red, and a period in which it is green). This corroborates the hypothesis that experiences in which the division between past, present, and future moments is less pronounced occur more often in audition.

5 Determination of Identity

According to the final explication considered here, the endurance/perdurance debate should be understood as a debate concerning the determinants of diachronic identity (Hofweber and Velleman 2011). Let’s consider an entity E existing at a moment T. Some properties possessed by E at T are local properties, in the sense that E’s having these properties does not depend on properties possessed by entities at moments other than T. For instance, “being red” seems to be a local property of E at T, as E’s being red does not depend on the colours, sizes, locations, etc., of E or any other entity at moments earlier or later than T. However, not all properties are local. For example, the property of being spatiotemporally continuous with an entity existing at an earlier moment T − 1 is not local for E at T; in particular, it depends on the spatial properties of entities at T − 1.

Relying on the above notion, the difference between endurants and perdurants can be understood as a difference between entities whose diachronic numeric identity is completely determined by local properties and those whose identity also depends on nonlocal properties. The endurant status of an entity E existing at ti means that its identity with an entity X existing at some other moment tk depends solely on properties local for ti and tk. On the other hand, it is a defining feature of perdurants that their identity does not depend solely on such local properties. For instance, Hofweber and Velleman suggest that persons are perdurants because the identity of a person at ti with a person at tk depends on properties connected with the occurrence of continuity relations that are nonlocal for ti and tk (Hofweber and Velleman 2011, p. 56). Sets, by contrast, seem to be paradigmatic endurants, as the identity of a set at ti with a set at tk depends on properties local for ti and tk, namely those which determine the elements belonging to the set at ti and the elements belonging to the set at tk.

Hofweber and Velleman apply the above idea to investigations of numeric identity, which concerns whether an entity x is the same as an entity y. However, their approach can also be applied to general identity, which concerns the category that an entity exemplifies. For instance, there may be an entity existing at a moment T whose general identity depends on properties possessed by this entity at earlier moments. Such properties are obviously nonlocal for T, so such an entity can be characterized as a perdurant. On the other hand, the general identity of an entity at T may be determined solely by properties local for T, which allows us to interpret it as an endurant. In fact, both VO and AS are frequently recognized as exemplars of certain categories (Hummel 2013; Davies 2010). For instance, we may identify a VO as a dog or recognize that an AS is a performance of Tchaikovsky’s Symphony No. 4. So it is worth asking whether the general identity of VO and AS is determined by local properties.

These considerations allow us to formulate yet another explication of the Strong and Weak Theses. According to this explication, the truth of the Strong Thesis requires that the identity of VO is completely determined by local properties, while this is not the case for AS:

(Strong Thesis: Identity Explication) The Strong Thesis is true if and only if (1) the identity of VO is determined solely by local properties but (2) the identity of AS is at least partially determined by nonlocal properties.

On the other hand, the Weak Thesis is true when there are no differences in how the identity of VO and AS is determined, but local properties play a more significant role in the case of VO than in the case of AS:

(Weak Thesis: Identity Explication) The Weak Thesis is true if and only if (1) VO and AS do not differ in the way in which their identity is determined, but (2) in the case of typical VO, local properties play a more significant role in determining identity than in the case of typical AS.

In the context of numeric identity, it is easy to observe that the Identity Explication of the Strong Thesis is false. Neither the diachronic numeric identity of VO nor the diachronic numeric identity of AS is determined completely by local properties, since in both cases the perception of sameness relies on the occurrence of continuity relations. In the case of vision, objects are perceived as being diachronically the same when they are spatiotemporally continuous (Scholl 2007). The spatial continuity does not play a significant role in audition, but the temporal continuity and continuity in changes of pitch are crucial for perceiving diachronic sameness of sounds (O’Callaghan 2008). Hence the diachronic numeric identity of both VO and AS is determined to an important degree by nonlocal properties.

In the philosophy of perception, there are two main approaches to the perceptual recognition of general identity. First, it is claimed that perceptual recognition is in fact a postperceptual process, consisting in forming a judgement that relies on perceived properties (see Bayne 2009 for a discussion). According to the second option, in addition to recognition by postperceptual judgement, there is also a lower-level, genuinely perceptual form of recognition. In this case, a general identity is attributed by virtue of perceptually representing an object as having a kind-property like “being a pine tree” or “being a tiger” (e.g. Siegel 2006). Such perceptual recognition is believed to be largely independent of background knowledge. For instance, one’s perceptual system may recognize an object as a rabbit even if one knows that it is, in fact, a hologram, and so does not form the postperceptual judgement “It is a rabbit” (Reiland 2014).

It seems very intuitive that general identity is not completely determined by local properties in the case of AS, regardless of whether one interprets perceptual recognition in terms of postperceptual judgement or as a genuinely perceptual process. It is usually insufficient, in particular when musical pieces are considered, to categorise an AS by relying on properties represented at a single moment. Identification of a sound usually involves tracking its qualitative development, as only data from multiple moments taken together allow the recognition of an AS as an exemplar of some general category.

On the other hand, it is initially more plausible that the general identity of a VO is completely determined by local properties, since visual categorisation usually depends on shape properties, and all such relevant properties may be experienced at a single moment (Hummel 2013). Nevertheless, this is not universally the case, as there are situations in which the general identity of a VO depends on features perceived at previous moments. For instance, we may imagine a situation in which, at a moment T, a person perceives something that looks exactly like an ordinary apple. However, at an earlier moment T − 1, the person perceived this object from the other side and saw that its interior is filled with a mechanism composed of metal gears. In this case, the person, relying on perceptually obtained information, would not form the judgement that such an artificial, clockwork fruit is an apple.

However, the above example is not convincing if one interprets perceptual recognition as a genuinely perceptual process. From this perspective, it may be claimed that the perceived object is automatically identified as an apple as long as only its apple-like side is visible, without taking into account additional knowledge about its mechanical interior. Nevertheless, even if such a view is adopted, there are cases in which general identity is not completely determined by local properties: visual recognition is a dynamic process that often requires exploratory behaviours, such as looking at an object from multiple angles or focusing attention on its various parts to discover details of its features (see Meijer and Van der Lubbe 2011). Such actions are especially important if an object is perceived in suboptimal conditions or is a non-standard exemplar of its category, such that not all of its diagnostic features can be perceived from a single point of view (e.g., Hill et al. 1997; Tarr et al. 1997). Consequently, there are visual cases in which general identity is not determined by features represented at a single moment, but requires a period of gathering and combining visual information.

Overall, it seems that neither the diachronic numeric identity nor the general identity of VO and AS is determined solely by local properties, so the Strong Thesis is false. However, the considerations regarding general identity provide another justification for the Weak Thesis, because situations in which local properties suffice to determine general identity are far more frequent in visual than in auditory experiences. Therefore, given the considered explication of the endurance/perdurance distinction, features that are characteristic of perdurants are more salient in the case of typical AS than in the case of typical VO.

6 Experiencing Persistence

The conducted analyses show that, regardless of which explication of the endurance/perdurance debate is adopted, the Strong Thesis is false. This means that it is not justified to interpret VO as endurants and, in contrast, AS as perdurants. On the contrary, VO and AS do not differ in the way they persist, and they are neither paradigmatic endurants nor paradigmatic perdurants. For instance, both VO and AS can have temporal parts, but temporally partless VO and AS also exist. Especially interesting is the fact that many characteristics associated with perdurants also apply to VO, which are intuitively described as enduring individuals. Most importantly, VO can have temporal parts, they are located at extended temporal regions, and their identity is not entirely determined by local properties.

While the Strong Thesis is false, there is a significant body of evidence that the Weak Thesis adequately explains the initial intuition about the endurance of VO and the perdurance of AS. According to the Weak Thesis, VO seem to be endurants and AS seem to be perdurants because characteristics attributed to perdurants are more frequently encountered in auditory than in visual experiences (see also O’Callaghan 2007, p. 27, where a similar interpretation is proposed). This claim is supported by the fact that (1) typical AS have more temporal parts than typical VO, (2) it is more frequent for AS than for VO to persist through a period ti,…,tk without being wholly located at any of the instantaneous moments from ti to tk, (3) the boundaries between earlier and later moments are less salient in auditory experiences, and (4) it is less common for temporally local features to be sufficient in determining the general identity of AS than is the case with VO.

These results also have consequences for assessing the accuracy of visual and auditory experiences of persistence in the context of philosophical theories about the nature of physical sounds and objects. According to influential positions (see, in particular, O’Callaghan 2007), physical sounds are events, so they are likely to be paradigmatic perdurants. Similarly, many authors characterise visually perceivable physical objects as paradigmatic endurants. However, the obtained results show that sounds are not always auditorily experienced as perduring (e.g. there are AS without proper temporal parts), and objects are not always visually experienced as enduring (e.g. there are VO with many proper temporal parts). Consequently, if one believes that all physical sounds are perdurants and all physical objects are endurants, then one has to accept that our experiences of persistence are only approximately accurate, as not all AS perdure and not all VO endure.

The conducted analyses also allow us to formulate a more general claim about the way in which persistence is perceptually experienced. The fact that VO and AS do not differ in the way they persist corroborates the hypothesis that the human senses are unified with respect to presenting persistence. Both vision and audition present time in the same way. This perceptually experienced time can be intuitively described by a non-presentist version of the A-theory. However, the main results of this paper are also compatible with characterizing perceptual time in terms of the B-theory. Within this general temporal framework, both vision and audition are able to present entities as persisting in virtue of having proper temporal parts, as being located in extended temporal regions, and as having identity determined by nonlocal features. Further investigations may reveal whether the same way of presenting persistence also applies to the other human perceptual modalities, namely touch, taste, and olfaction.