
1 Introduction

Sound design was elevated to the status of a sound art through the rise of “musique concrète” (concrete music), introduced by Pierre Schaeffer around 1948. This composition technique uses recorded sounds as raw material; with the later addition of electronic sound production, the resulting practice has become generally known under the term electroacoustic music. Today, electroacoustic music encompasses many styles—from purely concrete or electronic to hybrid forms that include instrumental performances—in both “academic” and “popular” contexts. In a project to establish a corpus of historical electronic music [1], the authors found it difficult to draw a categorical boundary between art music and popular music. The present study does not focus on a particular type of music but rather on the sound qualities of the music, whatever it is, leaving aside the aspects (mainly related to pitch and tonality) already addressed by traditional musicology. The focus will be on electroacoustic music, but with an eye to possible applications of the methodologies to other types of music.

This chapter is concerned with the analysis of the sound qualities of electroacoustic music, which has remained a marginal academic activity compared to the analysis of more traditional types of music, “classical” music in particular. For instance, the reference articles about music analysis, cf. [2,3,4], do not even mention electroacoustic music [5]. Furthermore, many electroacoustic music composers do not consider analysis a necessary activity. Some even consider it potentially hazardous [5].

Music analysis encompasses many approaches, from understanding the creative composition process on one side to the variability of listeners’ reception, understanding, and appreciation on the other. The latter, the esthesic perspective, can be considered in various ways, from surveys of the broad feelings experienced by listeners to more systematic investigations of the aspects of the music that could impact the listener’s experience [6]. This chapter focuses on the second of these approaches. Guided by previous systematic musicological studies on the topic and the state of the art in computational sound and music analysis, we investigate whether computational tools can offer new ways to go beyond the current limitations of musicological analyses. We will see a gap between, on one side, the overarching analytical methodologies and ideals developed by musicologists and, on the other side, the modest contributions of today’s computational systems. The complexity and infinite richness of the electroacoustic sound universe make it challenging to design computational analytical approaches and, for the musicologists, even to formalise and systematise their modus operandi. Once we provide the machine with the capability to analyse electroacoustic music, the resulting tool could metamorphose the paradigms framed by musicology.

Interestingly, musique concrète was still in its infancy when an extensive and seminal theorisation of its compositional process was published in Pierre Schaeffer’s Traité des objets musicaux [7, 8]. Despite its deep influence on later musicological works, the treatise was not aimed at analysis. Rather, Schaeffer described it as “first and foremost a treatise on listening” ([8], p. 539). It was oriented towards a particular music aesthetics based on clearly separated sound objects, reflecting the limited music technology of the time.

Since the Traité, a few important analytical frameworks have been developed, as will be discussed in later sections. This overview of musicological methodologies for electroacoustic music analysis enables us to highlight the most important points of the Traité and augment it with a large array of descriptors that can be structured into various categories. We can then use this categorisation as a reference grid to compile an overview (presented in Sect. 4) of the state of the art in computational music analysis suitable for electroacoustic music. We will see that what today’s technologies can offer is of great interest for the analytical investigation, although much progress remains to be made. Building on this deep and synthetic understanding of musicology’s needs, Sect. 5 sketches a proposed answer to those needs, with the longer-term objective of establishing a Toolbox des objets sonores.

2 Pierre Schaeffer’s Traité des objets musicaux

The Traité des objets musicaux has played a monumental role in the establishment of a theoretical framework for electroacoustic music, and musique concrète in particular, for music analysis as well as composition. Pierre Schaeffer should not be considered only an artistic trailblazer for his vision of a new musique concrète and the foundation of the French school (around the INA-GRM in particular), but also the proponent of a highly multidisciplinary scientific endeavour to theorise the musical activity of sound listening. Schaeffer saw the limitations of psychoacoustics and music psychology, which at the time were restricted to the study of individual parameters. The activity of listening to sound, which he proposed to study within a domain he called acoulogy, would require attention to a multiplicity of highly interdependent dimensions.

From the start, the Traité was conceived by Schaeffer as a first step towards a more complete treatise of musical organisation. This first step is mainly based on the articulation of two parts: first, a typology, to identify sound objects based on criteria of articulation (related to discontinuities in the sound) and prolongation, a dichotomy taken from the opposition between consonants and vowels in linguistics ([8], Chaps. 24–26); second, a morphology, to qualify the sound objects within their contexture (Chaps. 28–34).

2.1 Sound Objects and Reductive Listening

The core notion in the Traité is, as indicated in its title, musical objects, or sound objects. These sound objects are supposed to be apprehended through a phenomenological approach called reductive listening, meaning that the focus should be on the sound material itself, without reference to the origin (production) or signification (context) of the sounds (Chap. 15). For a given listener, the same sound object, when listened to repeatedly, appears fluctuating and unstable due to the variability of the listener’s intentions, each listening focusing on particular aspects of the sound. This is not considered subjective but rather a bundle of complementary perspectives on the same object, leading to a set of unified traits.

2.2 Typology

The first step in Schaeffer’s approach, the typology, aims at identifying the sound objects through segmentation and classification. The typology is decomposed into two dimensions (Chap. 24):

  • Facture addresses the overall shape of sound objects under three different characteristics. The main one relates to how the energy of the sound production of a given sound object is maintained over time: either sustained (continuous energy over some duration), iterative (discontinuous energy production, leading to a succession of sound and silence) or impulsive (significantly short duration). For sustained and iterative objects, there is also a distinction between moderate and immoderate duration, with respect to a duration threshold of around seven seconds. Finally, in the case of immoderate duration, there is a further distinction between homogeneous and unpredictable facture, depending on whether the dynamic evolution is stable or predictable, or on the contrary variable and unstable.

  • Mass relates to the sound’s inner material, pitch, and spectral distribution. Mass is considered an elementary morphological characterisation, further developed in the morphology step. The first distinction is whether the mass is fixed or varies over the temporal duration of the object. Fixed mass is further distinguished according to whether it presents a clear pitch or an inharmonic sound. Varying mass is distinguished by whether it evolves simply and predictably or, on the contrary, in a somewhat random fashion.

A third underlying dimension considered in the typology is related to the capacity of objects to be integrated into music structures, distinguishing between balanced, redundant and eccentric objects.

2.3 Morphology

The morphology describes the internal features of the sound objects, i.e., textural, spectral, timbral, and pitch-related. The treatise presents seven distinct morphological criteria (Chap. 34):

  • Mass is related to sound spectrum characterisation and is decomposed into seven classes: pure sound (one single fundamental), tonic sound (harmonic sound), tonic group (a chord), dystonic sound (“son cannelé”, where pitch becomes ambiguous due to inharmonic partials), nodal sound (noise occupying a specific spectral range), nodal group (several ranges), and white or coloured noise (occupying the complete spectrum).

  • Harmonic timbre qualifies the spectral envelope of the series of partials based on sub-dimensions such as full/hollow/narrow, rich/poor, and bright/matt.

  • Dynamics takes into consideration seven nuances of intensity and eight dynamic profiles, as well as eight attack classes. It also formalises the decomposition of sound objects into three phases: attack, body and decay.

  • Grain (or granularity) relates to the sound’s rugosity due, for instance, to very fast oscillations of dynamics, or very rapid iteration of short sounds. This is studied along three main parameters: oscillation amplitude (the dynamic range of the oscillation), rate (how fast the oscillation is) and type (resonance, friction, and iteration).

  • Gait (“allure” in French) relates to the slower fluctuations in harmonic content, pitch, loudness, etc. It can be either a continuous oscillation (for instance, a regular and continuous oscillation of the pitch curve, a regularly recurring continuous variability of the spectrum, or a slight and continuous oscillation in dynamics) or a more discontinuous succession of clearly distinguishable sub-objects. Two parameters associated with gait are agent (mechanical, living, and natural) and form (order, fluctuation, and disorder).

  • Melodic profile describes a non-periodic (or else very slow) variation. It relates to the profile in pitch height of the pitch component(s). Four melodic profiles are distinguished: podatus, torculus, clivis and porrectus.

  • Mass profile relates to the evolution of the temporal harmonic or inharmonic spectrum, with four classes of profiles: swelled, delta, thinned and hollow.

For both melodic and mass profiles and for gait and dynamics, each object is characterised using a combination of two dimensions (amplitude and variation rate), with three classes in each dimension, leading to a \(3 \times 3\) table with nine different categories.

2.4 Beyond the Traité

As acknowledged by Pierre Schaeffer himself, the ambitious programme of music research heralded by the Traité was cut short and restricted to its first part. It mainly focused on the typology and morphology of individual sound objects, as presented above, without the initially planned study of the construction of elementary objects into larger structures. However, the conceptual and methodological framework developed in the Traité remains valuable for analysing electroacoustic music.

Schaeffer conceptualised and wrote the Traité at a time when musique concrète, in its early stages, was characterised by structural simplicity due to technological limitations. For that reason, objects are identified in the typology through articulation and stress (“appui”) (Chap. 21), which is suitable only for objects with apparent attack and sustain phases. The framework handles neither the “polyphony” of superimposed sound objects nor more complex music productions featuring continuously evolving elements. This is also problematic in the morphology, for instance for the characterisation of dynamics, due to the limitations of dividing all sound objects into three phases: attack, body and decay [9].

The typology is aimed at identifying sound objects through segmentation but also at placing these objects within a classification, to organise the collection of objects. The objective seems to be to select objects of sufficient quality for integration into the composition. Thus, there seems to be an underlying poietic perspective motivating its conception. As discussed below, this aesthetically normative connotation of the Traité has been criticised.

The proposed morphology is very rich, conceptually and methodologically speaking, and had a substantial impact on later research. There is, however, a belief in the Traité in the possibility of a very systematic and highly articulated method, which never effectively materialised. Nonetheless, it is inspiring, from a scientific point of view, for the design of a systematic and comprehensive framework, as discussed in the rest of this chapter.

3 More Recent Analytical Approaches

Most, if not all, musicological research on electroacoustic music has been highly influenced by Schaeffer’s theoretical and analytical accomplishments, while opening new perspectives. This section presents a chronology of the major developments.

3.1 R. Murray Schafer’s Soundscape Analysis

As part of his analytical study of soundscapes, R. Murray Schafer classified sound objects into two main branches related to physical characteristics and referential aspects [10]. Concerning the physical characteristics, he built on Schaeffer’s criteria but separated the analysis of each sound object into its three successive phases: attack, body and decay. For each phase, the following characteristics are evaluated:

  • Relative duration: for the attack: sudden, moderate, slow, multiple; for the body: non-existent, brief, moderate, long, continuous; for the decay: rapid, moderate, slow, multiple

  • Frequency and mass: five degrees from very low to very high

  • Fluctuation and grain: steady state, transient, multiple transients, rapid warble, medium pulsation, slow throb

  • Dynamics: five degrees from very soft to very loud, plus the transitions from loud to soft and from soft to loud

The referential categorisation is divided into natural sounds (the four elements, animals, seasons), human sounds (voice, body, clothing), society-related sounds (rural, urban, maritime, domestic, activities), mechanical sounds, silence and indicators (alarms, etc.).

3.2 Denis Smalley’s Spectro-Morphology

The composer Denis Smalley observed that, even at the end of the 20th century, there was still a lack of shared terminology for describing sound materials and their relationships in electroacoustic music. He proposed an approach founded on a spectral typology, a morphology, and a study of motions, structuring processes and space [11, 12].

Articulating a typology with a morphology seems reminiscent of Schaeffer’s typo-morphology. However, the two scholars disagree about what belongs to the typology and what to the morphology. Indeed, Schaeffer considered the dynamic shape of sounds (the facture) as one core element of the typology, while its other element, the mass, was integrated into the typology in a compact form and into the morphology in a more extended form. For Smalley, it is quite the contrary: the typology is founded on Schaeffer’s idea of mass—here developed through a stimulating reflection on the possible states along the note-to-noise continuum—while the morphology studies how “spectral types are formed into basic temporal shapes” ([11], p. 65).

The morphology discerns three morphological archetypes at the source of traditional instrumental sounds (Fig. 1). The first describes impulsive attacks. The second includes attacks with a decay, either closed (with a quick decay that is strongly attack-determined) or open (including an intermediary continuous sound). The third covers graduated continuants, modelled on sustained sounds, with a graduated onset and a graduated termination. The three archetypes further contain one or several temporal phases, among the trilogy onset/continuant/termination, itself echoing the trilogy attack/body/decay.

Departing from these traditional reference points, Smalley extends the archetypes into a broader listing of morphological models by manipulating the duration and spectral energy of the three phases [12]. Morphologies can be linked or merged to “create hybrids” ([11], p. 71), in the form of morphological stringing, where correspondences can be merged within open continuants through cross-fading, or as a consequence of reversed onset-terminations.

Fig. 1. Morphological archetypes: 1. attack-impulse, 2. closed attack-decay and open attack-decay, 3. graduated continuant. Based on [11, p. 69].

Smalley also developed a refined motion typology, related to real and imagined motions created by spectro-morphological design. Here a motion category can be defined as “the external contouring of a gesture, or the internal behaviour of a texture” ([11], p. 73). He develops an additional typology related to the internal motion style of spectral texture, with four modes (streaming, flocking, convolution and turbulence), either continuous or discontinuous, and with an additional axis (iterative/granular/sustained) and three additional characteristics: periodicity, accelerating vs. decelerating and grouping patterns.

Smalley also discusses the variable scales of significant units in electroacoustic music. He argues that a unit “is often difficult or impossible to perceive, particularly in continuous musical contexts which thrive on closely interlocked morphologies and motions” ([11], p. 80). This exposes the limitations of the Traité, which focuses on isolated sound objects. In Smalley’s theoretical framework, there can be a multi-levelled structure, with possibly permanent or temporally fractured hierarchies of various temporal dimensions. Finally, a detailed spatiomorphology is presented [12, 13].

3.3 Stéphane Roy’s Hierarchical and Functional Analysis

Stéphane Roy demonstrates an impressive ability to carry out detailed analyses of electroacoustic music. His approach is based on producing visual representations (which he calls “transcriptions”) of the pieces, depicting sound objects using a large palette of graphical styles [5]. He then develops a hierarchical analysis based on “units” on multiple hierarchical levels. The graphical representation is completed with a detailed textual description.

Can such eloquent analyses be systematised into a reproducible methodology? The approach is developed within the music semiology of Jean-Jacques Nattiez [14], who supervised Roy’s doctoral thesis, and is based on the “neutral-level analysis.” The idea is that the analysis could be conducted by systematically applying a limited list of objective rules to the music, following an approach initially developed by Nicolas Ruwet [15]. In my view, a close study of Ruwet’s argumentation proves the scientific invalidity of the approach [16]. The whole analysis is founded on subjective decisions, contrary to what is claimed. However, despite the epistemological failure, the discussion about the possible mechanisms underlying music analysis (including Gestalt rules and auditory scene analysis through stream segmentation) is of high interest and will be discussed later.

The originality of Roy’s approach is the functional taxonomy that can be associated with units based on their inter-relationships. It is based on the inner characteristics of each unit, the relationships of these characteristics among units, and the overall context of the development of those units throughout the piece. The functions are structured into four categories:

  • Orientation: introduction, trigger, interruption, conclusion, suspension, appoggiatura, generation, extension, prolongation, transition

  • Stratification: figure, support, foreground, accompaniment, tonic and complex polarising axis, movement, background

  • Process: accumulation vs. dispersion, acceleration vs. deceleration, intensification vs. attenuation, spatial progression

  • Rhetorical:

    • Relational: call and response, announcement and reminder, theme and variation, anticipation, affirmation, reiteration, imitation, simultaneous and successive antagonism

    • Rupture: deviation, parenthesis, indication, articulation, retention, rupture, spatialisation

This functional typology is also translated into a set of graphical symbols that are added to the visual analyses.

Roy also experimented with adapting other analytical methods for notated music to his “transcriptions” of electroacoustic music: Nicolas Ruwet’s paradigmatic analysis, as mentioned above, as well as Lerdahl and Jackendoff’s Generative Theory of Tonal Music [17] and Leonard Meyer’s implicative analysis [18].

3.4 Lasse Thoresen’s Graphical Formalisation

In addition to developing a phenomenological perspective on Schaeffer’s framework, Lasse Thoresen has adapted Schaeffer’s typomorphology and augmented it with a graphical formalisation [19, 20]. The typomorphology is simplified by removing the normative concepts of object suitability, originality and redundancy, the distinction between “facture” and “entretien,” as well as the duration threshold (although very long notes are formalised in the form of ambient notes). This leads to a simpler typology, where long sustained notes with unpredictable dynamics are called vacillating, while long iterative notes with unpredictable iteration are called accumulated. In between those extremes lie the concepts of stratified and composite objects.

The core contribution of Thoresen is the design of a graphical formalisation of Schaeffer’s theory, representing each sound object in the time/frequency space with its typological characteristics, as illustrated in Fig. 2. It enables us to go into more detail, localising over time the spectral particularities (position and characteristic of each spectral subgroup) and indicating their individual facture. Additional global graphical characterisations of objects are available, such as the distinction between flutter and ripple notes, with an indication of their inner pulse regularity as well as of possible accelerando and ritardando. While Schaeffer kept a single structural level for the successive objects, therefore lacking an actual structural analysis, Thoresen takes advantage of this representation to show how objects are made of sub-objects, which can be characterised as well: for instance, the individual components of an accumulation, the construction of a sound web (“trame”), a large note, an ostinato, a cell, an incident or accident (special cases of, respectively, composite and stratified objects), or a chord.

This formalisation enables us to address Schaeffer’s morphology, representing the mass of each object, its evolution over time (expanding, bulging, receding, concave, etc.), and its dynamic profile. A few graphical conventions have been added to indicate particular aspects of the morphology:

  • Mass: saturated spectrum and white noise

  • Dynamic profile: categorisation of onset (brusque, sharp, marked, flat, swelled, gradual, inexistent) and ending (abrupt, sharp, marked, flat, soft, resonating, interrupted)

  • Pitch, dynamic and spectral gait: characterising both deviation and pulse velocity

  • Granularity: characterising the coarseness and the velocity of the grains, as well as sound spectrum location, weight (or importance) and spectral placement of the grains.

It is also possible to indicate the brightness level of each sound, as well as its gradual change.

Lasse Thoresen also identifies “time-fields,” describing the segmentation of form sections, and “dynamic forms,” tracing the perceived directions of energy flow [20, 24]. He also pursued the application of two other central terms in Schaeffer’s analytical work [7, 8], namely “caractère” (character) and “valeur” (value). Whereas “sound-character” refers only to a timbral constant that supports pertinent values, a form-building entity, termed integral sound-character, consists of a union of a sound-character and its temporal behaviour [20, 23].

3.5 Ecrins Audio-Content Description

The Ecrins project, a collaboration between IRCAM and INA-GRM, was aimed at offering tools for the classification of online sound samples, based in particular on Schaeffer’s typomorphology [25]. It also contributed to the theoretical establishment of a taxonomy of analytical descriptors, introducing audio content features such as duration, dynamic profile (flat, increasing, decreasing), melodic profile (flat, up or down), attack (long, medium, sharp), pitch (note pitch or area), spectral distribution (dark, medium, strident), space (position and movement), and texture (vibrato, tremolo, grain).

Fig. 2. Graphical analysis by Lasse Thoresen of the beginning of Åke Parmerud’s Les Objets Obscurs. Screenshot of an animated version [21] of [20, Figure 11.4], from the companion website [22]. The orange rectangles highlight the section being heard, as indicated by the orange vertical playhead. (Color figure online)

3.6 Structural and Functional Analyses

Some aspects of structural and functional analyses have been mentioned above, but there also exists a large range of works related to the establishment of units or sections—possibly along multiple hierarchical levels, and not necessarily following a strict hierarchy—and to the assignment of various functions or categories to those units or sections [19, 26]. This research question is not specific to electroacoustic music, so it can also be investigated for traditional instrumental music, on scores or on audio recordings of performances. The main question of interest for a computational implementation of approaches of this type is whether they are systematic and can be formalised with explicit discovery methods. This question exceeds the scope of the present study.

3.7 Pierre Couprie’s Morphology

Pierre Couprie proposes a methodology for morphological analysis of electroacoustic music offering a comprehensive synthesis of previous approaches [9]. The internal morphology focuses on what is inherent to the sound and does not depend on any external factor:

  • Spectrum: type (related to Schaeffer’s facture), density (compact, normal, transparent), movement type (stationary, linear, breaking, oscillating), movement cycle, amplitude and rate, acceleration or deceleration (related to Schaeffer’s gait)

  • Dynamics: attack profile (related to Schaeffer’s categorisation, but slightly simplified), movement type (same as for spectrum), movement cycle, amplitude and rate, acceleration or deceleration (also related to Schaeffer’s gait)

  • Grain: number of sounds, spectral positions, characterisation (with respect to type and amplitude), speed

  • Internal space: considered in two dimensions as a \(3 \times 3\) grid.

The referential morphology links the considered object to other elements in the work (indicating whether the citation is exact, transformed, or an evocation) or to elements external to the work itself (from another work, or from a general concept). The references are analysed according to these categories:

  • Causality, based on Schafer’s categorisation

  • Voice description: type, text, rhythm, speed, pitch variation, colour, cadenza, silence, density, alliteration

  • Effects: temporal modification, internal spectrum, dynamic envelope, external element

  • Emotions and sentiments.

The structural morphology is based on analytical tools to reveal the structures of the work along all levels.

4 Computational Electroacoustic Music Analysis

Analysing electroacoustic music can be challenging. One reason is that there are fewer formalised rules than in many other genres. Another reason is the absence of a written music representation provided by the composer. On the other hand, the composer might provide detailed sketches describing a piece, and related computer code for processing and synthesis may sometimes be made available for analysis. The present study does not consider such poietic information, instead focusing on analysing a piece simply from available audio.

Can some of the analytical frameworks introduced above be formalised, systematised and automated with the help of computer implementations? First, we need to clarify and formalise the analytical principles. In the following, we will discuss, first, the detection of basic objects in the music, and second, how to address various musical dimensions.

4.1 Sound Object Detection

Much research has been dedicated to the automated transcription of music performance recordings, particularly detecting individual notes and characterising their temporal positions, pitch, instrumentation, playing style, etc. [27]. This is a complicated problem that has been tackled using two different methods. The first is purely based on signal processing, computing representations based on mathematical equations to decide the location of note events and their characteristics. These equations rely on general principles of music acoustics and psychoacoustics, particularly those related to auditory scene analysis and Gestalt rules.

An alternative approach, which has largely superseded the first over the last decade, is based on machine learning and especially deep learning [27]. Here, an artificial neural network is trained on large audio collections, often annotated with the locations and characteristics of the notes. The main difficulty of this approach is the need for such an extensive training dataset. This is particularly problematic for electroacoustic music, for which few detailed analyses exist. More problematic still is the lack of consensus about such analyses: even a single analyst might struggle to decide what constitutes a sound object and what does not. And even if an extensive dataset could be created, the variety of electroacoustic music styles is such that the machine could not generalise well to styles or, for instance, synthesis techniques not included in the dataset.

A possible solution might be to rely on unsupervised learning, where the model is not trained on examples given beforehand but through an automated search for regularities. To my knowledge, these approaches have been used to broadly segment audio recordings into distinct parts, but not yet for more detailed detection of individual sound events. All in all, the problem of automated detection of sound objects in electroacoustic music remains unsolved.

4.2 Dynamics

Once a sound object has been segmented, with a set of partials and/or wider energy bands evolving within a specific time and frequency region, the characterisation of its dynamics might at first sight look straightforward: measuring the amplitude of the signal on a relatively slow temporal scale. However, perceived dynamics are not directly correlated with the linear amplitude of the sound, nor even related to it by a simple logarithmic law. Estimating them requires taking into account more subtle properties of the auditory system, for instance the variable impact of the different frequency regions, the effects of critical bands, and the presence of masking effects.

Even more complicated is the fact that listeners’ assessment of the dynamics of a given sound event does not depend solely on the properties of the sound itself, but also on their experience of how, in live sound production (such as an instrumental music performance, but not only), the spectral quality of the sound changes with its actual loudness. For example, if the spectral quality of a recording corresponds to the production of a loud sound but it is played back at low loudness, the sound dynamics would generally still be perceived as loud. As Smalley put it:

During execution of a note, energy input is translated into changes in spectral richness or complexity. When listening to the note we reverse this cause and effect by deducing energy phenomena from the changes in spectral richness. ([11], p. 68)

For these reasons, predicting the perceived dynamics of each sound object is challenging; it has been addressed, for instance, using machine learning approaches [28].

In contrast to the problematic concept of perceived dynamics, it can be valuable to simply estimate the evolution of loudness throughout the sound object. As mentioned in the literature review above, a sound object can be decomposed into three main phases: attack, body and decay. Another typical pattern is the attack-decay-sustain-release (ADSR) envelope used in many synthesisers to generate natural-sounding sounds. But real-life sounds—and even more so complex, artificial electroacoustic sounds—may have dynamic curves that do not easily fit the attack-body-decay or ADSR patterns. Attack and (final) decay phases can be detected by computing temporal derivatives of the dynamic curve and detecting when they cross particular thresholds. This works on simple examples, but more refined heuristics may be needed for more complex temporal envelopes. Characterisations such as attack time, attack slope, etc., play an essential role in timbre characterisation; as we will see later, these can be directly measured from the extracted attack and decay phases.
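
As a minimal sketch of such a threshold heuristic, the function below measures attack and decay times from a smoothed dynamic curve using low/high amplitude crossings; the 10%/90% fractions are a common rule of thumb and, like the envelope representation itself, are assumptions rather than a reference implementation:

```python
import numpy as np

def attack_decay_times(env, sr_env, lo=0.1, hi=0.9):
    """Rough attack and decay times from a smoothed dynamic curve.

    env    : 1-D non-negative amplitude envelope of one sound object
    sr_env : envelope sampling rate (frames per second)
    lo, hi : amplitude fractions delimiting the phases (assumed 10%/90%)
    """
    env = np.asarray(env, dtype=float)
    peak_idx = int(np.argmax(env))
    peak = env[peak_idx]

    # attack: first crossings of lo*peak and hi*peak before the maximum
    rise = env[:peak_idx + 1]
    t_lo = int(np.argmax(rise >= lo * peak))
    t_hi = int(np.argmax(rise >= hi * peak))
    attack_time = (t_hi - t_lo) / sr_env

    # decay: first crossings of hi*peak and lo*peak after the maximum
    fall = env[peak_idx:]
    below_hi = np.nonzero(fall <= hi * peak)[0]
    below_lo = np.nonzero(fall <= lo * peak)[0]
    d_hi = below_hi[0] if below_hi.size else 0
    d_lo = below_lo[0] if below_lo.size else len(fall) - 1
    decay_time = (d_lo - d_hi) / sr_env

    return attack_time, decay_time
```

The attack slope can then be approximated as the amplitude span covered during the attack divided by the attack time.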

Dynamics can be assessed not only for individual sound objects but also for the resulting mix, yielding a single dynamic curve indicating the overall profile. The traditional method for dynamic curve estimation consists of discarding the fast-evolving part of the signal to focus solely on the slowly evolving part, using signal processing methods such as low-pass filtering or windowed analysis. I developed a new method that adequately represents a sudden increase of dynamics while discarding micro-silences (shorter than one second), and that simultaneously attempts to model saturation effects taking place within separate frequency registers [29].

Figure 3 shows an example of this dynamic curve, computed here for the fourth of Pierre Schaeffer’s Five Studies of Noise (Cinq études de bruits), initially called “Composée, ou étude au piano,” composed from piano sounds recorded for Schaeffer by Pierre Boulez. The dynamic curves are compared with a simple RMS computation. We can notice in particular that some parts of the piece—for instance, 100 s after the start—have rather low RMS values but a larger value in the dynamics curve. In other places—such as between 170 and 180 s—RMS values oscillate rapidly, while the dynamics curve indicates a more progressive evolution. The dynamics curve is obtained through a decomposition of the energy into Mel bands, a filtering of each band separately via an original filtering model, and a final summation across bands.
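
The per-band filtering model of [29] is not reproduced here. As a rough sketch of the overall pipeline (Mel-band decomposition, per-band filtering, summation across bands), the code below substitutes a simple asymmetric one-pole smoother with instantaneous rise and a roughly one-second release, which preserves sudden increases while bridging micro-silences; all parameter values are illustrative assumptions:

```python
import numpy as np
import librosa

def dynamics_curve(y, sr, n_mels=35, hop=512, release_time=1.0):
    """Sketch of a dynamics curve: Mel-band energies, per-band
    asymmetric smoothing, then summation across bands."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop)  # (bands, frames)
    fps = sr / hop
    release = np.exp(-1.0 / (release_time * fps))  # one-pole release coefficient

    smoothed = np.empty_like(mel)
    state = np.zeros(mel.shape[0])
    for t in range(mel.shape[1]):
        # rise instantaneously, fall with a ~1 s time constant
        state = np.maximum(mel[:, t], release * state)
        smoothed[:, t] = state

    return smoothed.sum(axis=0), smoothed  # global curve + band decomposition
```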

Fig. 3. Analysis of dynamics in Pierre Schaeffer’s fourth Étude de bruits. Top-down: 1. root mean square (RMS) computed on 0.1 s frames with half-overlap, 2. RMS on 0.5 s frames, 3. the proposed dynamics curve and 4. decomposition of that dynamics curve along 35 Mel bands (higher amplitude shown with brighter colour) with subsequent filtering within each band. (Color figure online)

4.3 Facture

Schaeffer’s notion of facture—which, as described above, corresponds to the characterisation of sound objects as impulsive, sustained or iterative—can be approximated using relatively simple signal processing approaches. Once the dynamic curve has been extracted, as discussed in the previous section, we can qualitatively differentiate between sound objects that are clearly impulsive and those that are clearly sustained by observing the durations of the attack, body and decay phases. But what are the actual thresholds governing the limits between those categories, and what is the impact of the three successive phases on the appreciation of a sustained sound? There does not seem to be any published study on the matter.

Whether a sound is iterative can be detected, for instance, from the envelope curve or from the spectral flux over time. But iterativity is not only a matter of dynamic oscillation; there should also be some invariance of what is repeated at each successive iteration.
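
A minimal sketch of the envelope-based test is given below, measuring the salience of a dominant iteration rate via the autocorrelation of the dynamic curve; the admissible rate range and the salience threshold are assumptions, and a fuller account would also verify that the successive sub-objects resemble one another:

```python
import numpy as np

def iteration_rate(env, sr_env, min_rate=2.0, max_rate=20.0, min_salience=0.3):
    """Return (rate in Hz, salience) if a dominant iteration rate is
    found in the dynamic curve, otherwise None."""
    x = np.asarray(env, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    if ac[0] <= 0:
        return None
    ac = ac / ac[0]                                # normalise by the 0-lag value
    lo = max(1, int(sr_env / max_rate))            # shortest admissible period
    hi = min(int(sr_env / min_rate), len(ac) - 1)  # longest admissible period
    if lo >= hi:
        return None
    lag = lo + int(np.argmax(ac[lo:hi]))
    salience = float(ac[lag])
    return (sr_env / lag, salience) if salience >= min_salience else None
```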

4.4 Mass, Harmonicity and Pitch

Schaeffer’s concept of mass is related to physical characterisations commonly studied in signal processing. First, a time-frequency image of the sound object is computed through, for instance, a spectrogram, showing the evolution of the spectral distribution of the sound over successive short instants (or window frames). Noisy parts of the sound can be detected as regions of high energy with relatively large frequency widths. Partials are characterised by regions with narrow widths and can form the harmonic series of one or several pitches. In other words, each pitch comprises a series of partials around multiples of the fundamental frequency. The deviation of the partials from the ideal series indicates the inharmonicity of the sound, often found in complex percussion instruments such as bells.
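
As a simple illustration, inharmonicity can be quantified as the mean relative deviation of the detected partials from the ideal harmonic series; this particular formulation is an assumption, one among several possible measures:

```python
import numpy as np

def inharmonicity(partial_freqs, f0):
    """Mean relative deviation of detected partial frequencies (Hz)
    from the ideal harmonic series k * f0."""
    freqs = np.asarray(partial_freqs, dtype=float)
    k = np.maximum(np.rint(freqs / f0), 1)  # nearest harmonic number (>= 1)
    ideal = k * f0
    return float(np.mean(np.abs(freqs - ideal) / ideal))
```

For instance, partials measured at 440, 884 and 1330 Hz against a 440 Hz fundamental yield a small but non-zero value, whereas bell-like spectra yield substantially larger ones.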

“Pitch salience” indicates the relative prominence of a series of partials corresponding to one or several pitches. It can be estimated by first computing the autocorrelation function to detect pitch-related periodicities; pitchness is then estimated as the ratio of the magnitude of the highest autocorrelation peak to the magnitude of the 0-lag peak [30, 31].
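
The sketch below implements this ratio directly on a signal frame, in the spirit of [30, 31]; the lag search range is an assumed parameter:

```python
import numpy as np

def pitch_salience(frame, sr, fmin=50.0, fmax=2000.0):
    """Pitchness as the ratio of the highest autocorrelation peak to
    the 0-lag value, with a candidate fundamental frequency."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(ac) - 1)
    if ac[0] <= 0 or lo >= hi:
        return 0.0, None
    lag = lo + int(np.argmax(ac[lo:hi]))
    return float(ac[lag] / ac[0]), sr / lag  # (salience, candidate f0 in Hz)
```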

Similarly, the qualification of harmonic timbre (full/hollow/narrow, rich/poor, and bright/matt) can be based on a statistical description of the partials’ distribution. Hollowness is related to the ratio of amplitudes of even and odd harmonics [32], while fullness and narrowness denote the width of the spectral distribution. Brightness, in the context of harmonic timbre, could correspond to the ratio of high-frequency partials or to the frequency centroid of the partials.
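
A hedged sketch of such a statistical description, operating on the amplitudes of the detected harmonics, is given below; each formulation (even/odd amplitude ratio for hollowness, amplitude-weighted mean harmonic rank for brightness, partial count above a -40 dB floor for richness) is an illustrative assumption:

```python
import numpy as np

def harmonic_timbre(amps):
    """Simple harmonic-timbre descriptors; amps[k-1] is the amplitude
    of harmonic k."""
    amps = np.asarray(amps, dtype=float)
    k = np.arange(1, len(amps) + 1)
    odd, even = amps[0::2].sum(), amps[1::2].sum()  # harmonics 1,3,5,... / 2,4,6,...
    hollowness = even / odd if odd > 0 else np.inf  # low values suggest a hollow sound
    brightness = float((k * amps).sum() / amps.sum())
    richness = int(np.sum(amps > amps.max() * 10 ** (-40 / 20)))
    return hollowness, brightness, richness
```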

4.5 Temporal Motions

By estimating dynamics, pitch, harmonicity, and spectrum on successive time frames of a sound object, we obtain the temporal evolution of those different characteristics. One particular interest of the Ecrins project (cf. Sect. 3.5) is its detailed study of the categorisation of temporal profiles, derived from Schaeffer’s classification (dynamic, melodic and mass profiles, and gait). The dynamic profile is estimated through an envelope extraction, followed by low-pass filtering, B-spline approximation, thresholding and peak picking [33], as illustrated in Fig. 4. This allows us to estimate the temporal ratio of the ascending and descending phases as well as their slopes. A simpler estimation and classification of the dynamic profile is proposed in [31], where a series of features computed from an estimation of dynamic curves (flatness coefficient, number of onsets, maximum amplitude time, derivative before and after the maximum, and temporal centroid) were used as predictors for a machine learning classification into five classes: ascending, descending, ascending/descending, stable and impulsive.
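
A sketch of such predictors, approximating the feature list of [31], is given below; the exact formulations and the peak-prominence threshold are assumptions:

```python
import numpy as np
from scipy.signal import find_peaks

def dynamic_profile_features(env, sr_env):
    """Simple predictors for dynamic-profile classification."""
    env = np.asarray(env, dtype=float)
    t = np.arange(len(env)) / sr_env
    peak = int(np.argmax(env))
    d = np.gradient(env) * sr_env                  # derivative in amplitude/s
    onsets, _ = find_peaks(env, prominence=0.1 * env.max())
    return {
        'flatness': float(env.min() / env.max()) if env.max() > 0 else 0.0,
        'n_onsets': int(len(onsets)),
        'max_time': float(t[peak]),
        'slope_before_max': float(d[:peak + 1].mean()),
        'slope_after_max': float(d[peak:].mean()),
        'temporal_centroid': float((t * env).sum() / env.sum()),
    }
```

Fed to any standard classifier, such features can separate, for example, ascending profiles (positive slope before a late maximum) from impulsive ones (early maximum, low temporal centroid).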

Concerning periodic motions, grain and gait are considered in the Ecrins project under a single concept called “grain/iteration.” Dynamic periodicity is estimated through autocorrelation, while timbre and pitch periodicities are estimated using a similar method based on the similarity matrix [33]. Then, the amount of repetition and the cycle period are measured, and the repeated element is characterised. There was also an intention to classify melodic profiles, but this was not implemented due to the complexity of the task; some simple classification strategies are nonetheless proposed in [31].

Estimating the two parameters associated with Schaeffer’s gait—agent (mechanical, living, natural) and form (order, fluctuation, disorder)—as well as the categorical classes associated with the profiles remains to be studied. The characterisation of spectromorphological design into imagined motions, based, for instance, on the taxonomy proposed by Smalley, is an even more challenging topic, and its computational systematisation has not been addressed either.

4.6 Spatial Analysis

Very few studies in the computational analysis of audio recordings address the spatiality of the sound production [34]. Two features have been designed for electroacoustic music analysis, focusing mainly on the stereo mix [1] (a sketch of both follows the list):

  • Stereo spatial ebb is a measure of spectral movement comparing left and right channels

  • Two channel loudness difference is the absolute difference in perceptual loudness between the left and right channels
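
The exact formulations used in [1] are not reproduced here; the sketch below computes two illustrative stand-ins, using per-frame RMS in decibels as a crude proxy for perceptual loudness and a frame-wise spectral divergence between the channels as a proxy for the spatial ebb:

```python
import numpy as np
import librosa

def stereo_features(left, right, sr, hop=512):
    """Rough per-frame stereo descriptors for a two-channel mix."""
    eps = 1e-10

    # loudness difference: absolute gap between per-frame RMS levels in dB
    rms_l = librosa.feature.rms(y=left, hop_length=hop)[0]
    rms_r = librosa.feature.rms(y=right, hop_length=hop)[0]
    loud_diff = np.abs(20 * np.log10(rms_l + eps) - 20 * np.log10(rms_r + eps))

    # spatial "ebb": normalised spectral divergence between the channels
    S_l = np.abs(librosa.stft(left, hop_length=hop))
    S_r = np.abs(librosa.stft(right, hop_length=hop))
    ebb = (np.linalg.norm(S_l - S_r, axis=0)
           / (np.linalg.norm(S_l + S_r, axis=0) + eps))

    return loud_diff, ebb
```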

Relatively little attention has been paid to analysing more advanced spatialisation techniques from audio recordings. Spatialisation can be represented rather straightforwardly if one starts from multichannel audio with a specification of the spatial localisation of each track. However, there is a lack of analytical approaches and related tools for performing spatial analysis.

4.7 Other Timbral Aspects

In Schaeffer’s theory, timbre is studied through mass, harmonic timbre, attack characterisation and granularity. What about other aspects of timbre? Some timbral aspects might be implicitly indicated in Smalley’s typology of the internal motion styles of spectral textures. In music psychology research, timbre has been conceptualised as a three-dimensional space, with spectral centroid (the distribution of energy along frequency), spectral flux (related to contrast in the temporal evolution of the spectrum), and attack characterisation as dimensions [32]. Harmonic brightness, as part of Schaeffer’s harmonic timbre, could be related to the spectral centroid in the case of harmonic sounds. More generally, a simpler estimation of the brightness of the whole spectral distribution can be carried out by estimating the energy ratio above a given frequency threshold [35] or by computing the spectral centroid. It has also been suggested to measure the distribution of energy along frequency bands [1, 29]. The second dimension of the timbre space, spectral flux, can be related to the study of fluctuation, or granularity. The third dimension, attack characterisation, was discussed in Sect. 4.2.
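
Both brightness measures are simple to sketch on a magnitude spectrum; the 1500 Hz cutoff below is an assumed threshold value:

```python
import numpy as np

def brightness(spectrum, freqs, cutoff=1500.0):
    """Energy ratio above a cutoff frequency and spectral centroid,
    computed on a magnitude spectrum with its frequency axis (Hz)."""
    spectrum = np.asarray(spectrum, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    energy = spectrum ** 2
    total = energy.sum()
    if total == 0:
        return 0.0, 0.0
    ratio = float(energy[freqs >= cutoff].sum() / total)
    centroid = float((freqs * energy).sum() / total)
    return ratio, centroid
```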

Sensory dissonance [1, 36], spectral entropy and flatness [1] are other relevant descriptors. Since they do not require a given harmonic series, they can also be computed on a general mix of sound objects. Another timbral descriptor is transientness [1]. In Music Information Retrieval (MIR) research, timbre has very often been described in the form of Mel Frequency Cepstral Coefficients (MFCCs), a technical representation of the spectral shape of the sound. It is of particular interest for structural analysis, as discussed later.

One aspect of timbre that is central to everyday listening, and also to traditional music listening, is the recognition of sound categories based on the type of sound production, and the association with typical families of sound production classes and the underlying contexts (especially for non-musical sounds). This identification of sound classes is the opposite of what Pierre Schaeffer aimed to achieve with the concept of reductive listening, but at the same time a critical aspect of more modern sound analysis methods such as Schafer’s referential categorisation. Current machine learning technologies enable the classification of each successive instant of an audio recording according to the detected sound categories, with a taxonomy that closely resembles Schafer’s referential categorisation. For instance, the Sound Analysis framework released by Apple can recognise over 300 sound classes in four categories: Sounds of things (train, car horn, ...), Animals (cow moo, duck quack, ...), Human sounds (singing, laughter, ...) and Music (along various instrument classes). However, since individual sound events are not yet reliably detected in complex pieces, the referential categorisation of these individual sound events remains an open challenge.

Fig. 4. Estimation of dynamic profile parameters: a) loudness (blue) and smoothed loudness over time (red), b) 10% threshold applied to the smoothed loudness, c) smoothed loudness in log scale, d) maximum value (vertical red bar) and B-spline modelling. From [33]. (Color figure online)

4.8 Rhythm

Rhythm is considered in Schaeffer’s morphology solely in terms of the possible internal iterativity and gait within one sound object. No other aspect of rhythm is represented in more recent musicological approaches, except that Thoresen’s graphical formalisation enables the representation of the cyclic repetition of sequences of short events and conceptualises the pulse velocity and its possible change over time.

The near absence of rhythmical representation is due to the aesthetics of musique concrète and electroacoustic music, especially in the early decades. However, although not explicitly discussed, musique concrète and electroacoustic music feature interesting rhythmic elements. The “curated corpus of historical electronic music” [1] introduces rhythmical features for the computational analysis of electronic music. A first set of features is based on statistics related to the temporal positions (or “onsets”) of sound objects; another set is related to statistics concerning beats. Computational methods also exist to describe rhythmical pulsation from audio without detecting an actual beat sequence: the autocorrelation function of the dynamic curve can be used to detect pulsations and their hierarchies and to estimate metrical clarity and centroid [37].
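
A minimal sketch of this autocorrelation-based analysis is given below, computing a periodicity profile of the dynamic curve over lags of up to two seconds; the windowing over successive time frames that produces a time/period image such as Fig. 5 is omitted for brevity:

```python
import numpy as np

def pulsation_profile(dyn, fps, max_period=2.0):
    """Normalised autocorrelation of a dynamic curve over increasing
    lags, plus the most salient pulsation period in seconds."""
    x = np.asarray(dyn, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    if ac[0] > 0:
        ac = ac / ac[0]
    lags = min(int(max_period * fps), len(ac))
    periods = np.arange(1, lags) / fps      # lag durations in seconds
    profile = ac[1:lags]
    best = float(periods[int(np.argmax(profile))])
    return periods, profile, best
```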

Figure 5 shows this type of rhythmical analysis for Pierre Schaeffer’s fourth Étude de bruits. We notice a prominent and regular periodicity with a period of 0.75 s, because those early studies by Pierre Schaeffer relied heavily on special phonograph discs with a “sillon fermé” (closed groove), hence with a fixed period. But other periodicities can be seen, such as a period of 1 s at the beginning of the piece, or very fast repetitions here and there. A little before 100 s, we see a 0.75 s loop divided into six regular sub-beats. Between 160 and 180 s, the subdivision of the loop is a bit more complex, with a seeming decomposition into eight sub-beats, but also containing other internal patterns.

Fig. 5. Rhythmic periodicities (shown in white) throughout Pierre Schaeffer’s fourth Étude de bruits, with time from left to right and periods (i.e., durations between successive beats, indicated in seconds on the left) in ascending order from bottom to top.

4.9 Structural and Semantic Analysis

There exists a large range of research on the computational formalisation and automation of the structural analysis of recorded music. One common technique is based on computing a similarity matrix from audio or musical features computed on successive window frames of a given audio recording [38]. Figure 6 shows an example of a similarity matrix for Pierre Schaeffer’s fourth Étude de bruits, here focusing on simple timbral aspects related to MFCCs. From that matrix, sharp transitions between successive segments can be detected (the succession of squares of various sizes along the diagonal), as well as repetitions of sequential patterns (the small white lines parallel to the diagonal).
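
A sketch broadly matching the setting of Fig. 6 is given below; the MFCC parameters, the 0.1 s hop, and the mapping from Euclidean distance to similarity are assumptions:

```python
import numpy as np
import librosa
from scipy.spatial.distance import cdist

def mfcc_similarity_matrix(y, sr, n_mfcc=13, hop_seconds=0.1):
    """Self-similarity matrix over MFCC frames of an audio signal."""
    hop = int(hop_seconds * sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                hop_length=hop).T  # (frames, coeffs)
    dist = cdist(mfcc, mfcc)        # pairwise Euclidean distances
    return 1.0 / (1.0 + dist)       # map distance to similarity in (0, 1]
```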

Fig. 6. Similarity matrix of the MFCCs computed on 0.1 s overlapping frames throughout Pierre Schaeffer’s fourth Étude de bruits, where frames are compared using the Euclidean distance. High similarity is shown in white.

An overview of computational approaches to the structural and semantic analysis of recorded music [39] is beyond the scope of this chapter. But concerning electroacoustic music in particular, one system has been developed that allows detecting the repetition of samples in a given piece of music, even in the case of “polyphonic” superposition of samples [40]. The system was designed to be partially automatic, requiring interaction with the user. It remained a prototype, demonstrated on artificial musical examples made of concatenations and juxtapositions of pre-selected samples.

4.10 Software

A large panoply of software can be of interest for analysing electroacoustic music. Basic representations of the sound, such as waveform or spectrogram, can be computed using free or commercial software. Audio and music features can be computed using software such as Sonic Visualiser with Vamp plug-ins, PRAAT, MIRtoolbox or AudioSculpt.

Fig. 7. Screenshot of the EAnalysis software.

One common way to manually analyse electroacoustic music is to annotate the spectrogram by adding forms related to particular sound objects. The most common software for visual annotation of electroacoustic music are the following:

  • iAnalyse, developed by Pierre Couprie since 2006, is aimed at displaying music representations in pedagogical settings for musicians, teachers and musicologists [41]. The music timeline is decomposed into successive pages, to which graphic annotations can be added, such as annotations based on Lasse Thoresen’s conceptual framework (cf. Sect. 3.4). This enables the user to illustrate music analyses, produce listening guides from annotated scores and help musicologists in their analyses. A playhead can be synced to the visualisation, the graphical annotations can be animated, and audio descriptors computed with other software can be integrated into the display.

  • EAnalysis, also developed by Pierre Couprie, this time in the context of the project “New multimedia tools for electroacoustic music analysis” hosted at the Music, Technology and Innovation - Institute for Sonic Creativity at De Montfort University in Leicester (UK) and funded between 2010 and 2013. EAnalysis allows the integration of various types of representations (acoustical, mathematical, musical) for music analysis purposes, as illustrated in Fig. 7.

  • The Acousmographe, developed by INA-GRM, supports the annotation of general audio representations such as waveforms and spectrograms with graphical and textual elements. Its Aural Sonology Plug-In is inspired by the compositional procedures and theoretical reflection of Lasse Thoresen (cf. Sect. 3.4) and helps the listener conceptualise and write down the sound objects heard. The plug-in is equipped with a library for spectromorphological and form analysis, which includes time-fields (the temporal segmentation of the musical discourse), layers (the synchronous segmentation of the musical discourse), dynamic form (time directions and energetic shape), thematic form (recurrence, variation, and contrast) and form-building transformations (simple and complex gestalts and transformations between them, e.g., proliferation/collection, fission/fusion, liquidation/crystallisation).

  • The Acousmoscribe is another annotation tool developed by the SCRIME team at the University of Bordeaux, this time based on the theoretical framework developed by Jean-Louis Di Santo [42].

  • TIAALS (Tools for Interactive Aural Analysis), a toolbox developed as part of the Interactive Research in Music as Sound (IRiMaS) project at the University of Huddersfield, for musicologists conducting and presenting research in which audio and video are fully integrated into the research process and its dissemination. TIAALS focuses on sound material analysis and the realisation of typological, paradigmatic or other analytical charts.

  • Other annotation software built on top of audio feature extraction tools includes the CLAM Annotator (on top of the CLAM framework) and ASAnnotation (on top of AudioSculpt).

  • The EASY (Electro-Acoustic muSic analYsis) Toolbox provides a 3D visualisation environment for sonic exploration and interaction [43] (cf. Fig. 8). The temporal evolution of timbre is represented as a curve in a 3D timbre space, 26 signal processing features can be computed, and automated segmentation of audio recordings is carried out, mainly based on k-means clustering.

Fig. 8. A screenshot of the EASY software, from [43].

The software packages above offer various ways to display basic visual representations of the music and to annotate them manually with more advanced analytical representations.

Interviews with three musicologists [40] revealed that they wanted automated sound object segmentation to correct and enrich manual annotations. They also wanted the possibility of detecting all repetitions of the same sample, so as to retrieve isolated voices from a mix. There have also been suggestions of automated high-level structural analysis, for instance with the possibility of detecting the repetition of sequential patterns of sound objects.

5 Towards a Toolbox des Objets Sonores

There remains a large gap between, on one side, the overarching analytical methodologies and ideals developed by musicologists and, on the other side, the relatively modest contribution of what computational automation can offer today. The complexity and infinite richness of the electroacoustic sound universe make it challenging to design computational analytical approaches and for musicologists to even formalise and systematise their modus operandi. Once we provide the machine with the capability to analyse electroacoustic music, the resulting tool could metamorphose the paradigms framed by musicology.

Based on the panorama outlined above, I would like to emphasise the following capabilities:

  1. to detect and precisely describe and characterise the components constituting the piece of music, from basic objects to groups of objects to structural segments

  2. to reveal intra- and intertextuality, concerning the repetitions (with possible transformation) of those components

  3. to reveal this rich information in the form of visualisations

  4. to allow the analysts to modify those analyses

One overarching aim of my work is to develop technologies automating the analysis of music of all kinds, with a high level of richness and on many different musical dimensions. These technologies are aimed at being made available in the form of toolboxes for analysts (such as MIRtoolbox) as well as interactive music visualisations. In this context, one ambition here, in collaboration with Rolf Inge Godøy, is to develop technologies in line with Schaeffer’s “programme de recherche musicale,” hence a “Toolbox des objets sonores.”

The main difficulty concerns detecting the more or less “elementary” components of the piece of music. This corresponds to Schaeffer’s sound objects, but as discussed above, this notion is somewhat limited and should also include the possibility of “polyphonic” superposition of objects, horizontally and vertically. Here computational formalisation can help the theoretical development: whereas manual analyses require the theory to simplify the general organisation, defining one level of organisation or else a rather strict structural hierarchy, computational formalisation can be based on less stringent rules, allowing the emergence of a richer variety of structures. Hence, various object candidates can be suggested in parallel for a given piece of music, and there is no need to make decisions at that stage. The computer tool can work in dialogue with the musicologist, who can correct the computational predictions, and the computer might also learn from those mistakes or from the musicologist’s preferences. The detection of such components is based on auditory scene analysis and inspired by Gestalt rules and cognitive morphodynamics [44, 45]. Allowing components to contain smaller subcomponents implicitly enables the detection, tracking and formalisation of iterative and granular objects. Iteration can sometimes appear only in some parts of the super-object. Besides, the successive sub-objects do not need to be iterations of the same pattern. In this way, many descriptions can be related to the succession of sub-objects: the similarity between successive sub-objects, the contrast between them, etc.

A large range of descriptors (such as mass) can be computed, both on the overall mix and on each isolated component. The available list of signal processing features (such as harmonicity) needs to be more closely articulated with the dimensions and the corresponding categorisations proposed by the musicological works since Schaeffer’s Traité. But here also, the simplicity of strict classifications in those works can be replaced with multidimensional parametric spaces, in which particular regions define the theoretical concepts, somewhat like phase diagrams. The closer a given position in the diagram is to the paradigmatic centre of a region, rather than to its border, the more clearly the corresponding concept is associated with the sound object. For instance, the concepts of impulsive and sustained can be considered as two phases defined in a multidimensional parametric space including dimensions of attack, sustain and decay times.
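
As a toy illustration of this phase-diagram idea, the sketch below assigns graded memberships to “impulsive” and “sustained” regions of a three-dimensional space of attack, sustain and decay times; the region centres and the scale are purely hypothetical:

```python
import numpy as np

def phase_membership(attack_t, sustain_t, decay_t):
    """Graded membership (0 to 1) of a sound object in two assumed
    regions of a log-scaled (attack, sustain, decay) parameter space."""
    x = np.log10(np.array([attack_t, sustain_t, decay_t]) + 1e-3)
    centres = {'impulsive': np.log10([0.01, 0.05, 0.1]),  # hypothetical centres (s)
               'sustained': np.log10([0.2, 3.0, 0.5])}
    scale = 1.0  # decades; controls how quickly membership decays with distance
    return {name: float(np.exp(-np.sum((x - c) ** 2) / (2 * scale ** 2)))
            for name, c in centres.items()}
```

An object close to the paradigmatic centre of a region receives a membership near one; an object near a border receives comparable, intermediate values for both regions.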

The intratextual connections between components in a piece of music can be drawn by detecting similarities along particular parameters, or even by detecting the repetition of the same or similar samples or synthesis types. Iterativity can be considered a particular type of succession of sub-components featuring such similarity. Sequential patterns of sounds can also be detected, as well as the iteration of such sequential patterns, as formalised in Thoresen’s theory.

The richness of this analysis needs to be made accessible to both musicologists and the public, in particular through the design of visualisation strategies. One visualisation follows the traditional unfolding of time from left to right, like scores, spectrograms and acousmographs, and shows the various constituents (or sound objects) with a depiction of their particularities through forms and colours. Interactivity allows one to browse through the various types of information to be displayed and to highlight the intratextual connections. The overall structure and form of the piece can be shown as well.

Such a “rolling” representation can be compared to another representation in which the elements currently being played are visible anywhere on the screen, and then simply disappear. “This method of presentation is much more natural and makes the display experiential rather than simply informative” [46]. For this second, “experiential” type of representation, the mapping strategy between music and visuals has so far been based on the display of specific simple forms or colours related to elementary musical aspects. The objective here is to make a more immersive visualisation, depicting the music as it unfolds in time with more richness.

Another application is to show a whole corpus of music in the form of a 2D or 3D interactive space where each piece is represented by one point. Intertextual analysis shows the relationships between pieces of music based on similar configurations. The pieces of music are distributed according to their features and can be clustered based on similarities and commonalities.

6 Conclusions

As I have tried to show through this overview, the dream of establishing a systematic, formalised and computerised analysis of electroacoustic music is on its way to becoming a reality. Considerable challenges remain, in particular, related to detecting sound objects and other basic constituents of the pieces of music to be analysed. Fortunately, much progress has been made concerning descriptions of the overall sound along various sound and music dimensions. Gathering a range of the state of the art in computational music analysis within a toolbox would make all the separate research accessible to a larger community. Offering the possibility to perform some approximate segmentation at the more basic levels, and to carry out all those analyses on the different individual objects, would interest musicologists.

This technological progress could, in the longer term, enable the automation of analyses along the lines of Schafer’s physical morphology and Smalley’s spectro-morphology, and could also allow the automation of graphical representations such as those proposed by Thoresen. On the other hand, any attempt at automating Smalley’s motion typology or his functional or spatial approach, or any higher-level structural or functional analysis, would require much more work.

Through developing the “Toolbox des objets sonores,” accompanied by interactive interfaces for visualising and browsing music pieces and music catalogues, we hope to stimulate musicological interest in electroacoustic music. We have experienced that the visualisation of such music offers the general public new ways to enjoy the richness of this art. In addition, it would enable further scientific research on this topic, in the domains of music psychology and music cognition in particular.