1 Auditory Objects and the Functions of Audition

Philosophical and scientific debates on perception have been largely dominated by research on vision (see Bennett 1966, p. 30; Casati and Dokic 1994; Spence and Driver 2004; Calvert et al. 2004). However, our perceptual knowledge of physical objects, like our actions, depends on the connections between distinct perceptual modalities and a number of cognitive systems besides vision. Among the latter, audition figures most prominently. Like vision, audition is an essential source of spatiotemporal information about the world. Hearing acquaints us with spatially distant events or objects in a way that cannot be achieved through modalities such as smell, touch or taste, which require more direct physical or chemical contact with their sources. Furthermore, audition plays an essential part in language and speech, and as such, it is also of central importance in the understanding of human cognition.

The focus of the present issue of the Review of Philosophy and Psychology is on the nature of auditory perception, and on the role played by audition in our cognition of physical objects and sound sources (what we shall refer to as object-directed cognition in what follows). The study of auditory cognition raises the problem of specifying the nature of the ‘objects’ that are known through its innate or learned operations. Because its meaning is so highly underspecified, the term ‘object’ is used in the literature on auditory perception in many discrepant ways, often reflecting more substantial disagreements between theories. The present volume brings together contributions by psychologists and philosophers on the content of auditory perception, and aims to address the conceptual problems raised by these different conceptions.

Before specifying the nature of these objects, it is useful to consider the main issues concerning the nature of auditory cognition. At least five series of problems can be singled out, corresponding to distinct methodological approaches to the study of auditory perception:

i) A first set of questions lies at the intersection of biology and psychology. What are the main biological functions of audition, as opposed to those of other sensory modalities? How should one model auditory information processing and its relations to other cognitive systems in cognitive psychology and neuroscience?

ii) The role of audition in cognition raises epistemological problems. How do auditory processing and auditory experience serve animal cognition and human knowledge? How does audition contribute to the tracking of individuals? Moreover, given the importance of language in human cognition, how does audition constrain the semantic and syntactic aspects of linguistic processing? Such problems are dealt with by cognitive scientists, acousticians, phoneticians and phonologists.

iii) A third series of questions concerns the phenomenology of experience: What is the specific quality of auditory experience and of multimodal experience involving audition? Are auditory contents representations of physical objects and events, or do they consist in purely phenomenal qualities?

iv) The study of auditory cognition is also bound to cultural and aesthetical problems. How can one account for the aesthetic dimension of audition? What is musical listening and how does it relate to non-musical hearing?

v) Finally, and this will be our central subject matter in this introduction, the previous issues raise ontological problems about the targets and contents of audition. What do we hear? Does the basic material of audition consist of sounds, acoustic features, auditory streams, or sound sources? More generally, what are the ‘ontological commitments’ of audition?

Of these five series of questions, the third, fourth and fifth receive particular attention in the various papers brought together in this issue. Biological and linguistic aspects of the study of audition are not directly addressed in the present volume; phenomenological, ontological, and aesthetical aspects of the study of audition, on the other hand, are the main focus.

Ontological questions as raised in v), in particular, are relevant for psychology and epistemology as well as biology, because the main function of audition is most likely to track certain kinds of entities as opposed to other kinds of entities. In spite of caveats formulated by eminent skeptics on the inadequacy of the word ‘object’ to characterize that which is tracked by auditory perception (most notably Bregman 1990, pp. 16–20), the notion of auditory object is now widely used in studies on audition and it calls for a comparison with the domain of visual perception, where the category of ‘object’ is of more common use (Brewer 2006; Campbell 2002; Feldman 2003; Handel 1988, 2006; Kubovy and Van Valkenburg 2001; Peterson 2001; Smith 2002; Tarr and Bülthoff 1998; Van Valkenburg and Kubovy 2003).

2 Mind-Dependent vs. Mind-Independent Objects

Regardless of the exact status of its targets, auditory experience may be characterized as intentional, in the sense that it is about that which is heard, or that which determines whether our demonstrative judgements based on auditory experience are true or false (their truth conditions). In accordance with the terminology inherited from Brentano and Husserl, the contents of auditory experience can therefore be described in terms of ‘intentional objects’, in the broad sense of that which the state is about. Not only is the notion of intentional object of wide use among philosophers, but it has acquired some currency in psychology.

The notion of intentional object is helpful to avoid any prototypical association of the term ‘object’ with the concept of ‘physical object’ understood as ‘three-dimensional material substance’ or ‘spatially extended and solid individual’, which might primarily derive from the perception of material individuals through vision and touch (Matthen 2005; Quinton 1979). On the other hand, analyses based on the concept of intentional object often keep a completely neutral stance regarding the ontology of these intentional objects and the specific sensory information they convey. Short of a more fine-grained characterization of the structure of auditory contents, therefore, stating that audition provides information about auditory intentional objects is too weak a claim to cast light on the ontology of auditory contents. A substantive theory of the function of audition must engage in an analysis of the ontology of audition.

In order to refine the description of intentional objects, some additional concepts are needed to provide a characterization of the targets or contents of auditory experience. A fundamental dichotomy that has proven helpful to analyze the ontology of targets of mental states distinguishes mind-dependent and mind-independent objects. Although there is no complete agreement on the extension of these concepts, they remain valuable to clarify the status of mental states and representations.

Mind-independent objects can be defined as those physical entities whose existence is not directly caused or maintained by mental states or brain processes. They are ‘objective particulars’ (Strawson 1959) or ‘physical objects’ (Campbell 1993, 2002), such as biological organisms and agents, or natural and human-made inanimate objects such as stones, chairs and other artefacts. They correspond to those material particulars that are experienced in multimodal perception, and therefore can be seen, touched, smelt, tasted and heard without being reducible to any ephemeral phenomenal quality or sensation. Even when some conscious system may have contributed to shape their properties (as happens with artefacts, for instance), their sustained physical or chemical existence is not continuously dependent on the causation of some mental states or perceptual representations. This claim, of course, assumes a realistic ontology, which is only one among many rival metaphysical contenders. Nevertheless, this realistic metaphysics is attractive for many philosophers because it offers an intuitive causal explanation of how veridical perceptual experience is generated, namely through the causal influence of mind-independent physical objects in the perceiver’s environment (see, e.g., Grice 1961; Strawson 1974; Noë 2003).

By contrast, mind-dependent objects (or observer-dependent objects) are entities the existence of which depends in an essential way on the activities of the mind. For instance, an auditory hallucination (such as tinnitus, or a hallucinated roar or voice) can be thought of as a mind-dependent intentional object because the content cannot occur independently of the mind and brain of the person who is experiencing it. More generally, on an anti-realist metaphysics of the ontology of perceptible qualities, colors, smells, tastes, but also textures and likewise sounds would not exist independently of the existence of a perceiving mind. The claim, this time, may be viewed as a form of metaphysical idealism or phenomenalism, postulating all intentional objects to be essentially mind-dependent, and here too, alternative positions exist.

While it is difficult to deny that hallucinatory contents are mind-dependent entities, other mental contents are less easy to categorize on either side of the dichotomy. This problem is addressed in the present issue. Contributions in the present volume deal with the concepts of visual and auditory objects (Matthen, this issue; Nudds, this issue) as well as audio-visual objects (Kubovy and Schutz, this issue). In most cases, however, the objects under examination appear to be hybrid cases, namely intentional objects whose properties depend to varying degrees on the properties of mind-independent objects or sources. Figure 1 presents a schematic distribution of the notions of visual, auditory and audio-visual objects along a single dimension, ranging from hallucination and illusion up to veridical perception. In normal situations, we take the experience of a perceptual object, whether visual, auditory, or multimodal, to depend in a reliable way on a mind-independent object (such as an external and observer-independent light or sound source), as opposed to what happens in a case of hallucination. Illusions occupy an intermediary position: though clearly mind-dependent, their status remains largely debated because they usually depend on the structure or configuration of an external stimulus (Brewer 2006; Smith 2002).

Fig. 1 Perceptual objects and degrees of mind-dependence

The degree of dependence or independence of the object on the mind, moreover, hinges on the theory of representations one adopts. Consider auditory objects. Although a minority of psychologists might identify ‘auditory objects’ with sound sources qua mind-independent physical objects, most psychologists view auditory objects primarily as constructions of the brain (Griffiths and Warren 2004; Kubovy and Van Valkenburg 2001; Nelken 2004), and therefore as typical instances of mind-dependent objects.

However, the epistemologist can still submit that the function of audition is to constrain the construction of those mind-dependent auditory objects in such a way that they track, and inform one about, external sound sources located in the hearer’s environment (see Matthen, this issue, and Nudds, this issue). If so, certain properties of the auditory contents of veridical experiences must depend on the properties of mind-independent objects. On this epistemological view, the very possibility of distinguishing correct and incorrect perceptual experiences presupposes a distinction between mind-independent and mind-dependent objects (Cummins 1996; Dretske 1995).

Three main options may therefore be distinguished in the description of the intentional objects of audition: such a content is either (1) a mind-independent object, such as a sound source, (2) a mind-dependent or internal object, or (3) an object that depends both on the mind and on the properties of mind-independent objects. Very schematically, option (1) can be associated with an externalist and direct realist conception of the nature of auditory contents, as opposed to (2), which we may call an internalist and phenomenalist conception; option (3), finally, may be referred to as a form of indirect realism, leaving room for different forms of compromise between the former two views. The theoretical implications of these three conceptions for biology, psychology or epistemology are distinct.

3 Different Theories of Auditory Objects

To express these differences, it is useful to distinguish the target of a perceptual or sensory-motor system and the featural content of a perceptual state. Define the target of a perceptual or sensory-motor system as the actual object to which the system is applied. One can hold that the system tracks a target t when the mechanisms of this system are actually directed at t. For instance, John is the target of Maria’s oculomotor and auditory systems because Maria’s oculomotor and auditory systems are directed at John while she is looking at him and listening to his speech. Secondly, define the featural content of a system’s state as the set of features F that are specified, experienced or represented by the system as properties of its current target t.

This distinction is useful to explain a number of epistemological characteristics of perception and misperception. Because the featural content presents or describes the perceptual target, it offers an indication of how perceptual information can serve the formation of demonstrative thoughts about such a target. Furthermore, the distinction suggests an explanation of why incorrect perceptual contents are possible (Cummins 1996; Dretske 1995). The gist of the explanation is that incorrect intentional contents (or perceptual errors) occur whenever there happens to be a mismatch between the properties of the actual target t and the features which are specified in the state’s content as belonging to t. For instance, suppose that Maria visually and auditorily tracks John delivering a talk at a conference but perceives the hallucinatory content of a screaming chimpanzee. Although she is auditorily and visually tracking an actual individual delivering a talk (namely John), her perceptual experience is incorrect, due to the mismatch between the actual properties of the target (the production of normal speech sounds) and the properties corresponding to the featural content of her experience (namely chimpanzee-like screams).

Now consider the link between the ontology and the psychophysiology of a perceptual system. Arguably, the biological function of a perceptual system is to provide information about the perceiver’s environment and their bodily states or actions. Ecological (see Gibson 1966) and cognitive (see Broadbent 1958; Neisser 1967, 1976) approaches to perception will both agree with this truism. Where they readily disagree, however, is on the exact definition of the targets of a perceptual-motor system, and on the dependence between the featural contents and the targets of perception.

As for visual perception, many authors concur that the targets of the visual system are physical objects qua mind-independent individuals. Vision provides information on the surfaces of material individuals and allows one to identify, localize and recognize these individuals (see Davies, this issue; Matthen, this issue; Nudds, this issue; Kubovy and Schutz, this issue). Can a similar conclusion be drawn about the biological function of audition?

In comparing auditory and visual experiences, several authors point out that auditory contents exhibit puzzling characteristics, because, unlike visual contents, they do not directly present surfaces and spatial parts of physical objects (see Plomp 2002; Kubovy and Van Valkenburg 2001; Kubovy and Schutz, this issue; Matthen, this issue; Nudds, this issue). For instance, it has been suggested that auditory contents are completely devoid of spatial content, a position famously endorsed by Strawson (1959) among others (see O’Callaghan, this issue, for a discussion of spatial audition).

In psychology and neuroscience, disagreements about the nature of targets and contents of audition are no less prevalent. A number of writers prefer to avoid using the concepts of object-directed cognition in their writings about audition (see Bregman 1990; Warren 1982; McAdams and Bigand 1993; Plomp 2002). Competing views, on the other hand, adopt the concepts of object-directed cognition, and make use either of the realist concept of object qua sound source, or of the concept of a mind-dependent auditory object of attention.

Despite this lack of consensus, a point of convergence may be found in the consideration that the main function of the auditory system is to track sound sources, understood as vibrating material objects or physical media. If this claim is correct, the normal targets of audition may well be mind-independent physical individuals (objects, agents or media) that radiate acoustic vibrations. In addition, the function of audition would be to generate intentional contents that specify for the hearer a set of features of the sound source, which can be subsequently grouped or segregated by preattentional and attentional processes.

Versions of this approach are developed in different guises by Pasnau (1999), Nudds (this issue) and others. In philosophical terms, the view is compatible with the application of direct realism and externalism to auditory perception. The doctrine of direct realism applied to audition lies in the thought that the direct objects (or targets) of auditory perception are macroscopic and mind-independent physical individuals in the hearer’s environment. On this strong brand of realism, audibles are merely mind-independent physical individuals, and the function of audition is to track sound sources qua mind-independent objects.

In its strongest form, direct realism assumes that if a subject directly hears a particular entity, the latter must be a mind-independent physical object. However, this assumption can be challenged on several grounds. The main objection is that we can directly hear entities that are essentially mind-dependent objects. Such entities are, for instance, the contents of auditory hallucinations (see Fig. 1). A listener can experience a hallucination without the content of this hallucination being directly dependent on the mechanical behavior of an external material object. But even if we disregard hallucinations, auditory contents like auditory illusions, harmonies and melodies in music, or phonemes in speech perception, appear to be essentially mind-dependent auditory objects, constructed by the brain; see Sapir (1949 [1933]), Bromberger and Halle (2000), Halle and Stevens (2002 [1991]) and Matthen (this issue). More generally, as argued by Matthiessen (this issue), a purely phenomenological characterization of direct perception may not do justice to the complexity of cross-modal processes by means of which we construct an auditory or visual image.

Yet, although the strong direct realist view must be amended, there may well be good reasons to maintain some key realist and externalist insights in a reasonable ontology of auditory objecthood. An important such insight is that, in natural and ecological hearing and listening, we do obtain information about the identity and location of physical objects. For instance, we obtain knowledge about the identity and location of particular speakers by hearing their voices. Likewise we can acquire knowledge about a moving train while listening to the variety of sounds that it produces.

There are several non-exclusive ways to maintain the fundamental motivation for the realist principle without being too liberal concerning the ontology of audition. One way is to adopt the thesis that we directly hear events caused by, or activities produced by, material individuals (Casati and Dokic 1994). This doctrine is sometimes termed the Located Event Theory. It holds that the direct targets of auditory perception are sounds qua events located in resonating objects (physical objects that generate the sound). Several theories endorse or refine the Located Event Theory: see Pasnau (1999), Casati and Dokic (2005), O’Callaghan (2007), Matthen (this issue), Roden (this issue). On that account, the function of audition is to keep track of events or activities happening to material objects.

The objection levelled against strong direct realism (or strong externalism) can be reiterated for certain versions of the Located Event Theory, since one may hear audibles that are not located events. An alternative approach, put forward by Matthen (this issue), rests on the idea that audition can give us direct access to a variety of audibles or auditory objects, with various degrees of dependence on their physical source. This view renders the specification of the function of audition more complex. But it has the advantage of putting on a par the various kinds of auditory contents that we experience, including the sounds of speech and music.

4 Overview of the Issue

This issue on ‘Objects and Sound Perception’ contains seven articles on the nature of sound perception. While the papers tackle the issue in different but complementary ways, four main themes emerge. The focus of Stephen Davies’ and David Roden’s contributions is on the phenomenology of musical perception, and more specifically on the role of timbre in the identification of musical sources. In a second group of papers, Hannes Matthiessen and Mohan Matthen discuss the question of what we can hear directly, and propose different ways of clarifying the concept of direct perception in the case of audition. A third theme, common to the works by Matthew Nudds, Mohan Matthen and Michael Kubovy & Michael Schutz, concerns the phenomenon of perceptual organization and object perception: Matthen and Nudds study the status of grouping in auditory perception; Kubovy & Schutz’s emphasis is on cross-modal perception and audio-visual binding. A fourth theme, finally, discussed more specifically by Casey O’Callaghan, by Michael Kubovy and Michael Schutz, and by David Roden, concerns how sounds and sources are located in space.

At the crossroads of aesthetics and the cognitive sciences, Stephen Davies analyzes the sense in which research on object-directed cognition can improve our understanding of musical works. Davies argues that audition is comparable to vision in many respects, such as the capacity to recognize enduring material objects as the same persisting individuals. According to Davies, the visual capacity to recognize individuals that persist as the same over time and change in visual appearance is matched by an aural equivalent in the musical domain, which is the capacity to recognize melodies.

Object-directed cognition is at the heart of Davies’ ontology of musical works, which he calls timbral sonicism (Davies 2001: 64). Davies holds that timbre—which may well be an indispensable attribute for the recognition of sound sources (Handel 1995; Neuhoff 2003)—is relevant to the specification of the identity and existence conditions of a musical work. This relevance is evidenced by the intimate connection between timbre and the voice or instrument from which it is issued, and the importance of the latter for the identity of a musical work. Davies rejects the kind of ontological formalism that denies that timbre, or ‘musical colour’, is a work-characterizing element within the musical composition. Timbre is an integral constituent of each musical work and cannot be altered without threatening the work’s identity. In his view, a performance accurately instantiates a musical work if it preserves the specified timbres—namely those of the instruments indicated by the composer (in the case of written music).

Davies’ analysis deals mainly with the ontology of works that use scores and written materials to direct musical performances. This ontology cannot exhaust the field of ‘sonic arts’, if we define sonic arts as the set of arts (such as sound installations, electronic music, audio-visual films) that manipulate sounds regardless of whether they use musical scores as a norm for identifying the artwork. Clearly, many kinds of works of art use oral communication, recorded sounds or electronically generated sounds without using the mediation of scores.

Some specific challenges for the ontology of music are discussed by David Roden. Roden uses considerations about timbre to refute formalist and acousmatic accounts of electronic music and sonic arts. As exemplified in the work of the radiophonist and musician Pierre Schaeffer (1952, 1966), audio technologies make it possible to distort and disconnect otherwise familiar sounds from their causal origin. In such cases, the experience of sounds is described as ‘acousmatic’, namely as severed from the assignation of a source, thereby giving support to a phenomenalist conception of the nature of sounds. In his paper, Roden defends the view that perceiving sounds is fundamentally perceiving changes in a sounding object (following, in particular, Casati and Dokic (1994), and in broad agreement with the papers by Nudds and O’Callaghan in this issue). The problem discussed by Roden is that electronic sounds in particular are often perceived as having no clear location or source, in line with the Schaefferian view, and in conflict with the physicalist conception of auditory objects as distally located events. Roden examines several responses to this problem, and argues that spatial indeterminacy is only a transitory phase in the process of sound perception, irrespective of the familiarity or strangeness of the sound generation mechanism.

In ‘Audio-Visual Objects’, Michael Kubovy and Michael Schutz discuss the problem of audio-visual binding, namely the way in which the visual system and the auditory system bind the information received through different perceptual channels. In an earlier paper, Kubovy and Van Valkenburg (2001) pointed out that audition, unlike vision, is concerned more with sources than with surfaces: in their view, pitch and timbre matter more to the identity of a sound than its spatial origin, or the surfaces on which it is reverberated. In vision as well as in audition, moreover, a perceptual object is defined by Kubovy and Van Valkenburg as ‘that which is susceptible to figure-ground segregation’. In the cross-modal case, audio-visual illusions provide evidence that the content of auditory perception can be altered by visual perception (as in ventriloquism and the McGurk effect). In their paper, Kubovy and Schutz present further evidence for this fact, based on experiments by Schutz and Lipscomb (2007), revealing how visual information can modify the perception of the duration of sounds in specific musical performances.

In his paper on the diversity of auditory objects, Mohan Matthen discusses the problems of atomism and direct objects in auditory perception. Matthen adduces evidence against perceptual atomism, namely the idea that a whole is perceived by synthesis of its different parts. In the perception of phonemes or the experience of melodies and harmonies, he argues that perception is more clearly holistic than atomistic, and that the analysis of parts is a top-down process that occurs only secondarily. While Matthen adopts an externalist conception of sounds as object-located events, his main claim is that we hear directly not only sounds, but also complexes of sounds. For Matthen, the scope of direct auditory perception is delimited by subpersonal learning processes, and to that extent, Matthen’s view of auditory perception sides more clearly with a constructivist conception than with a strictly externalist one.

Hannes Matthiessen’s paper offers a critical discussion of a conception of direct perception originally proposed by Paul Snowdon, and investigates the extent to which the concept can be used in the same way in the visual and in the auditory domain. Snowdon’s criterion for direct perception is the possibility of demonstratively referring to the perceived object and thereby forming a true judgment. Matthiessen questions whether we can directly hear sounds, sound sources, events and also physical objects in that sense. The focus of Matthiessen’s paper is on a phenomenon of cross-modal perception called ‘facial vision’, also known as echolocation, which designates the ability to create a visual image of a material object based on the way it reflects sounds. Phenomenologically, agents with ‘facial vision’ seem to be able to make true demonstrative judgments. Psychophysically, however, the process of echolocation seems emblematic of a phenomenon of indirect perception. Matthiessen uses this case to cast doubt on a purely phenomenological definition of direct perception.

In his paper, Matthew Nudds submits that auditory experience cannot be adequately understood by theories that merely refer to internal auditory principles of binding. According to Nudds, audition does more than represent sounds: it represents sound sources, which are mind-independent individuals. Nudds’ account is clearly externalist in this respect. Thus, for Nudds, ‘we can only explain why the auditory system groups frequency components as it does in terms of a process that functions to tell us about the sources of those sounds and auditory objects.’

This object-directed account raises several issues. A first question is the counterpart for audition of a traditional question about vision, originally put forward by Dretske (1981) and Campbell (2002): How can auditory grouping reliably reflect the mind-independent, environmental or ‘objective’ distribution of sound sources? Could grouping fail to adequately reflect the identity and location of sound sources? Nudds argues that the answer is to be found in the biological function of the auditory system. On his account, ‘the auditory system groups sets of frequency components because they have the same spectral composition,’ and, in the normal case, because they originate from the same source. Nudds gives a careful account of auditory processing and auditory grouping in support of his claim that the function of audition is not primarily to attend to the qualities of a sound, but rather to objects that produce it. On that basis, he draws the conclusion that musical hearing is unparadigmatic of auditory perception, precisely because attention is drawn to the grouping of sounds rather than to the sources.

In an approach that parallels Nudds’ critical analysis of traditional psychoacoustics, Casey O’Callaghan criticizes the view according to which the function of audition is to primarily inform us about the qualities of sounds, namely about their pitch, timbre, duration and loudness. On O’Callaghan’s account of the phenomenology of audition, audition has the function of representing space, and in particular of representing the location of objects and events that produce sounds. In this, O’Callaghan opposes the account defended by Strawson (1959), according to which sounds are intrinsically non-spatial, and bring spatial information only derivatively, in relation to other sensory modalities. O’Callaghan presents several arguments against Strawson and those (like Nudds 2001; O’Shaughnessy 2000) who consider that the location of sounds is inferred, rather than directly represented in the experience of sounds. A consequence O’Callaghan draws from his account is that vision and audition share a dimension of spatial content, even if sounds do not provide the same kind of spatial information about objects as vision does. Incidentally, O’Callaghan’s view seems compatible with the account of Kubovy and Schutz (this issue), who describe the function of the auditory ‘where’ subsystem as that of providing information to the visual system ‘to direct the gaze toward the source of the sound’.