Using the Audio Respiration Signal for Multimodal Discrimination of Expressive Movement Qualities

  • Vincenzo Lussu
  • Radoslaw Niewiadomski
  • Gualtiero Volpe
  • Antonio Camurri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9997)


In this paper we propose a multimodal approach to distinguish between movements displaying three different expressive qualities: fluid, fragmented, and impulsive. Our approach is based on the Event Synchronization algorithm, which is applied to compute the amount of synchronization between two low-level features extracted from multimodal data. In more detail, we use the energy of the audio respiration signal captured by a standard microphone placed near the mouth, and the whole-body kinetic energy estimated from motion capture data. The method was evaluated on 90 movement segments performed by 5 dancers. Results show that fragmented movements display higher average synchronization than fluid and impulsive movements.


Keywords: Movement analysis · Expressive qualities · Respiration · Synchronization

1 Introduction

Expressive qualities of movement refer to how a movement is performed. The same movement can be performed with different qualities, e.g., in a fluid, fragmented, hesitant, impulsive, or contracted way. Expressive qualities are a highly relevant aspect of dance, where, for example, they convey emotion to external observers. They also play an important role in rehabilitation, sport, and entertainment (e.g., in video games). Several computational models and analysis techniques for assessing and measuring expressive movement qualities have been proposed (see e.g., [14] for a recent review), as well as algorithms to automatically detect and compute expressive qualities of a movement (e.g., [2]). In this paper, we propose a multimodal approach to the analysis of expressive qualities of movement, integrating respiration and movement data. Whilst motion capture systems, often used to analyze human behavior, provide precise and accurate data on human motion, they are very invasive and cannot be used in several scenarios, e.g., in artistic performances. In the long term, the multimodal technique discussed here for distinguishing between expressive qualities may make motion capture systems dispensable.

Respiration is of paramount importance for body movement and is strongly related to any physical activity. The interaction between body movement and respiration is bidirectional. The respiration pattern may provoke certain visible body movements, e.g., in the case of laughter [15]. It can also be influenced by body movements, e.g., huddling up corresponds to the expiration phase. The rhythm of respiration synchronizes with repetitive motor activities such as running [8]. Several physical activities such as yoga or tai-chi explicitly connect physical movement to respiration patterns. In this work dance is taken as a use case. During a dance performance, dancers typically display a huge variety of expressive qualities, and they dedicate a lot of effort and time to practicing their expressive vocabulary. Thus, one can expect that various performances by the same dancer, conveying different expressive qualities, can provide a solid ground to base our study upon.

In this paper, we hypothesize that different multimodal synchronization patterns can be observed for movements performed with different expressive qualities. Movements displaying different qualities such as fluid, fragmented, or impulsive movements engage different parts of the body to different extents. For example, whilst fluid movements are propagated along the kinematic chains of the body, impulsive and fragmented movements usually engage most of the body parts at once. Consequently, respiration patterns may be influenced by the expressive quality of movement. To test this hypothesis, we study intrapersonal synchronization between two features, one extracted from the audio signal of respiration and one from motion capture data.

The paper is organized as follows: in Sect. 2, we present existing works on analysis of human movement and of respiration signals; in Sect. 3, we describe the expressive qualities we study in this paper; Sect. 4 presents our dataset; Sect. 5 describes the techniques we developed and tested in the experiment presented in Sect. 6; we conclude the paper in Sect. 7.

2 State of the Art

Several works analyzed respiration in sport activities such as walking and running [4, 8], and rowing [3]. Respiration data was also used to detect emotions [11]. Bernasconi and Kohl [4] studied how synchronization between respiration rhythm and leg movement rhythm affects efficiency in physical activities such as running or cycling. They measured synchronization as the percentage of coincidence between the beginning of a respiration phase and the beginning of a step (or a pedaling cycle). According to their results, the higher the synchronization, the higher the efficiency and the lower the oxygen consumption.

Bateman and colleagues [3] measured synchronization between the start of a respiration phase and the phase of a stroke in rowing by expert and non-expert rowers. Respiration phases were detected with a nostril thermistor, whereas the stroke phase (1 out of 4) was detected from the spinal kinematics and the force applied to the rowing machine. For expert rowers, higher synchronization was observed at higher stroke rates. Additionally, the most frequently observed pattern consisted of two breath cycles per stroke.

Schmid and colleagues [22] analyzed synchronization between postural sway and respiration patterns captured with a respiratory belt at chest level. A difference was observed in respiration frequency and amplitude between sitting and standing positions.

In most of the works that consider respiration, data is captured with respiration sensors such as belt-like strips placed on the chest, or other highly specialized devices. An example of such a device is the CO2100C module by Biopac, which measures the quantity of \(CO_2\) in the exhaled air. This sensor is able to detect even very short changes of carbon dioxide concentration levels. Unfortunately, this method is very invasive, thus several alternative solutions were proposed (see [7, 20] for recent reviews). Folke and colleagues [7] proposed three major categories of measurements for the respiration signal:
  • movement, volume, and tissue composition measurements, e.g., transthoracic impedance measured with skin electrodes placed on chest;

  • air flow measurements, e.g., nasal thermistors;

  • blood gas concentration measurements, e.g., the pulse-oximetry method that measures oxygen saturation in blood.

Several works focused on tracheal signals. Huq and colleagues [9] distinguished the respiration phases using the average power and log-variance of the band-pass filtered tracheal breath. In particular, the strongest differences between the two respiratory phases were found in the 300–450 Hz and 800–1000 Hz bands for average power and log-variance respectively. Jin and colleagues [10] segmented breath using tracheal signals through genetic algorithms.

Another popular approach is to use Inertial Measurement Units (IMUs). In [13], a single IMU sensor was placed on the person’s abdomen and used to extract the respiration pattern. The raw signal captured with the IMU device was filtered with an adaptive filter based on energy expenditure (EE) to remove frequencies that are not related to respiration. Three classes of activities were considered: Low EE (e.g., sitting), Medium EE (e.g., walking), and High EE (e.g., running).

Some works used the audio signal of respiration captured with a microphone placed near the mouth. In [1], the audio of respiration was used to detect the respiration phases. For this purpose, the authors first isolated the respiration segments using a Voice Activity Detection (VAD) algorithm based on short-time energy (STE). Next, they computed Mel-frequency cepstrum coefficients (MFCC) of the respiration segments, and they used MFCC and a linear thresholding to distinguish between the two respiration phases. Yahya and colleagues [23] also classified respiration phases from audio data. Again, a VAD algorithm was applied to the audio signal to extract the respiration segments. Next, several low-level audio features extracted from the segments were used by a Support Vector Machine (SVM) classifier to separate the expiration segments from the inspiration ones.

Ruinskiy and colleagues [21] aimed to separate respiration segments from voice segments in audio recordings. First, they created a respiration template using a mean cepstrogram matrix for each participant. Next, they computed a similarity measurement between the template and an input segment in order to classify the input segment as breathy or not breathy.

Compared to the state of the art, this work makes the following contributions:
  • to the best of the authors’ knowledge, this is the first work that uses information extracted from respiration data to distinguish between different expressive qualities of movement;

  • contrary to most previous works, we use a standard microphone placed near the mouth to capture respiration data. This approach is appropriate for capturing, e.g., dancers’ respiration patterns, because dancers do not speak during a performance, but they move a lot and cannot wear invasive devices. Our approach is less invasive than approaches based on other respiration sensors.

3 Definitions of Expressive Qualities

Recently, Camurri and colleagues [5] proposed a conceptual framework conceived for analysis of expressive content conveyed by whole body movement and gesture. The framework consists of four layers: the first one is responsible for capturing and preprocessing data from sensor systems, including video, motion capture, and audio. The second one computes low-level motion features such as energy or smoothness at a small time scale (i.e., frame by frame or over short time windows e.g., 100 ms–150 ms long) from such data. The third layer computes mid-level features such as fluidity, impulsivity, and so on, i.e., complex higher-level qualities that are usually extracted on groups of joints or on the whole body, and require significantly longer temporal intervals to be detected (i.e., 0.5 s–3 s). Finally, the fourth layer corresponds to even higher-level communicative expressive qualities, such as the user’s emotional states and social attitudes. Following this framework, in this paper we focus on three expressive movement qualities belonging to the third layer, i.e., fluid, fragmented, and impulsive movements. These three qualities at layer 3 are modeled in terms of features at layers 1 and 2. Below, we recall the definitions of these qualities.

Fluid Movement. A fluid movement is characterized by the following properties [18]:
  • the movement of each involved body joint is smooth;

  • the energy of movement (energy of muscles) is free to propagate along the kinematic chains of (parts of) the body according to a coordinated wave-like propagation.

Fluidity is a major expressive quality in classical dance and ballet. Outside the dance context, examples of fluid movements are the body movements of the butterfly swimming technique, or moving like a fish in water.

Fragmented Movement. A fragmented movement is characterized by:
  • non-coordinated propagation of energy between adjacent body joints, i.e., only “bursts” of propagation are observed;

  • autonomous movements of different parts of the body, i.e., autonomous and non-correlated (or weakly correlated) sequences of “free” followed by “bound” movements (in Laban’s Effort terms [12]): typically, a joint alternates a free movement (e.g., for a short time interval) with a “bound” movement;

  • joint movements are neither synchronized nor coordinated among themselves: an observer perceives the whole body movement as composed of parts of the body obeying separate and independent motor planning strategies, with no unified, coherent, and harmonic global movement.

Such movements are typical, for example, in contemporary dance.

Impulsive Movement. An impulsive movement is characterized by [16]:
  • a sudden and unpredictable change of velocity;

  • no preparation phase.

Examples of impulsive movements are avoidance movements (e.g., when hearing a sudden and unexpected noise) or a movement to recover from a loss of balance. It is important to notice that impulsivity is different from high kinetic energy. Quick but repetitive movements are not impulsive.

4 Experimental Setup

We collected a set of short performances of dancers asked to perform whole body movements with a requested expressive quality. Five female dancers were invited to participate in the recordings. They performed short performances focusing on one of the three selected expressive qualities. Each trial had a duration of 1.5 to 2 min. At the beginning of each session, dancers were given definitions of the expressive quality by means of textual images (more details on the recording procedure are available in [17]). The dancers were asked to perform: (i) an improvised choreography containing movements that, in their opinion, express the quality convincingly, as well as (ii) several repetitions of predefined sequences of movements by focusing on the given expressive quality.

A custom procedure was defined to obtain and record several impulsive movements: the blindfolded dancer was induced to express this quality by an external event (e.g., an unexpected touch). When she perceived a touch on her body, she had to imagine that she was touched by a hot stick that she had to avoid. Thus, for impulsive trials, the dancer was, by default, performing fluid movements, and impulsive movements appeared only when she was touched (for more details see [17]). Each quality was performed by two different dancers.

We recorded multimodal data captured with (i) a Qualisys motion capture system, tracking 6 single markers and 11 rigid-body plates (10 on the body and 1 on the head) at 100 frames per second; the resulting data consists of the 3D positions of 60 markers; (ii) one wireless microphone (mono, 48 kHz) placed close to the dancer’s mouth, recording the sound of respiration; (iii) 2 video cameras (\(1280\times 720\), at 50 fps).

The freely available EyesWeb XMI platform, developed at the University of Genoa, was used for synchronized recording and analysis of the multimodal streams of data. Motion capture data was cleaned, and missing data was filled in using linear and polynomial interpolation.

5 Analysis Techniques

Our aim is to check whether the overall amount of synchronization between low-level features from movement and from the audio signal of respiration enables us to distinguish between the three selected expressive qualities. In more detail, we consider one audio feature, the energy of the audio signal, and one movement feature, the kinetic energy of the whole-body movement. These features were chosen as they can be easily computed in real time. In the future, we plan to estimate kinetic energy with sensors that are less invasive than a motion capture system, such as IMUs. We define events to be extracted from the time-series of the low-level features and then we apply the Event Synchronization algorithm [19] to compute the amount of synchronization between the events detected in the two energy time-series. Figure 1 shows the details of our approach.

Fig. 1. Block diagram of the analysis procedure. Event Synchronization takes as input events detected in the time-series of energy of the audio signal of respiration and in the time-series of kinetic energy from motion capture data.

5.1 Feature Extraction

The audio signal was segmented into frames of 1920 samples, which at 48 kHz yields 25 frames per second. To synchronize the motion capture data with the audio signal, the former was downsampled to 25 fps. Next, body and audio features were computed separately at this sampling rate.

Motion Capture. Motion capture data was used to compute one movement feature: kinetic energy. This feature was computed in two stages. First, 17 markers from the initial set of 60 were used to compute the instantaneous kinetic energy frame by frame. The velocities of single body markers contribute to the instantaneous kinetic energy according to the relative weight of the corresponding body parts as retrieved from anthropometric tables [6]. Second, the envelope of the instantaneous kinetic energy was extracted using an 8-frame buffer.
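
The first stage can be illustrated with a minimal sketch, assuming marker positions in meters and a fixed frame rate. The function name `kinetic_energy` and the per-marker masses are hypothetical placeholders; they are not the weights taken from [6], and the finite-difference velocity estimate is an assumption.

```python
import math

def kinetic_energy(positions, masses, fps):
    """Instantaneous kinetic energy for each pair of consecutive frames.

    positions: list of frames; each frame is a list of (x, y, z) tuples, one per marker.
    masses: per-marker masses (hypothetical stand-ins for anthropometric weights).
    fps: frame rate, used to turn displacements into velocities.
    """
    dt = 1.0 / fps
    energies = []
    for prev, curr in zip(positions, positions[1:]):
        e = 0.0
        for m, p0, p1 in zip(masses, prev, curr):
            speed = math.dist(p0, p1) / dt  # finite-difference speed estimate
            e += 0.5 * m * speed ** 2
        energies.append(e)
    return energies
```

For N frames the function returns N - 1 values, since each value is computed from two consecutive frames.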

Respiration. The instantaneous energy of the audio signal was computed as the Root Mean Square (RMS) of each frame, which yields one value per input frame. Next, we extracted the envelope of the instantaneous audio energy using an 8-frame buffer.
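
The audio feature can be sketched in a few lines. The frame size of 1920 samples comes from the text; the envelope is not specified beyond the buffer length, so the running maximum used below is an assumption, and both function names are hypothetical.

```python
import math

FRAME_SIZE = 1920  # samples per frame; 48000 Hz / 1920 = 25 frames per second

def rms_per_frame(samples, frame_size=FRAME_SIZE):
    """Return one RMS energy value per complete frame of audio samples."""
    values = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        values.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return values

def envelope(values, buffer_size=8):
    """Assumed envelope: running maximum over the last `buffer_size` values."""
    return [max(values[max(0, i - buffer_size + 1):i + 1])
            for i in range(len(values))]
```

Trailing samples that do not fill a complete frame are dropped. The same `envelope` function could be applied to the kinetic energy series, since the text uses an 8-frame buffer for both features.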

5.2 Synchronization

The Event Synchronization (ES) algorithm, proposed by Quian Quiroga and colleagues [19], is used to measure synchronization between two time series in which some events are identified. Let us consider two time-series of features: \(x_{1}\) and \(x_{2}\). For each time-series \(x_{i}\) let us define \(t^{x_i}\) as the time occurrences of events in \(x_{i}\). Thus, \(t^{x_i}_j\) is the time of the j-th event in time-series \(x_{i}\). Let \(m_{x_i}\) be the number of events in \(x_{i}\). Then, the amount of synchronization \(Q^\tau \) is computed as:
$$\begin{aligned} Q^{\tau } = \frac{c^{\tau }(x_{1}|x_{2}) + c^{\tau }(x_{2}|x_{1})}{\sqrt{m_{x_1} m_{x_2}}} \end{aligned}$$
$$\begin{aligned} c^{\tau }(x_{1}|x_{2}) = \sum _{i=1}^{m_{x_1}} \sum _{j=1}^{m_{x_2}} J^{\tau }_{ij} \end{aligned}$$
$$\begin{aligned} J^{\tau }_{ij} = {\left\{ \begin{array}{ll} 1 & \text {if } 0< t^{x_1}_i - t^{x_2}_j < \tau \\ 1/2 & \text {if } t^{x_1}_i = t^{x_2}_j \\ 0 & \text {otherwise} \end{array}\right. } \end{aligned}$$
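The term \(c^{\tau }(x_{2}|x_{1})\) is defined analogously, with the roles of \(x_{1}\) and \(x_{2}\) exchanged. \(\tau \) defines the length of the synchronization window. Thus, a pair of events contributes to the overall amount of synchronization only if the two events occur within a \(\tau \)-long window.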
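
The definition above translates directly into code. The following is a minimal sketch, not the EyesWeb implementation; the function names are hypothetical, and returning 0 when one series has no events is an assumed convention, since \(Q^\tau \) is undefined in that case.

```python
import math

def c_tau(t_a, t_b, tau):
    """Count events in t_a that follow an event in t_b by less than tau.

    Simultaneous events contribute 1/2, as in the definition of J.
    """
    total = 0.0
    for ta in t_a:
        for tb in t_b:
            lag = ta - tb
            if 0 < lag < tau:
                total += 1.0
            elif lag == 0:
                total += 0.5
    return total

def event_sync(t1, t2, tau):
    """Q^tau for two lists of event times (same unit as tau)."""
    if not t1 or not t2:
        return 0.0  # assumed convention: no events, no synchronization
    return (c_tau(t1, t2, tau) + c_tau(t2, t1, tau)) / math.sqrt(len(t1) * len(t2))
```

For example, `event_sync([0, 10], [2, 13], 4)` evaluates to 1.0, because each event in the second series follows an event in the first by less than 4 time units, whereas with `tau=3` only one pair qualifies and the result is 0.5.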

In order to apply the ES algorithm to our data, two steps were needed: (i) defining and retrieving events in our two time-series, and (ii) tuning the parameters of the ES algorithm.

Events Definition. We defined as events the peaks (local maxima) of kinetic and audio energy. To extract peaks, we applied a peak detection algorithm that computes the position of peaks in an N-size buffer, given a threshold \(\alpha \) defining the minimal relative “altitude” of a peak. That is, at time p, the local maximum \(x_p\) is considered a peak if the preceding and the following local maxima \(x_i\) and \(x_j\) are such that \( x_i + \alpha < x_p\) and \(x_j + \alpha < x_p\), \(i< p < j\), and there is no other local maximum \(x_k\), such that \(i< k < j\). We empirically chose the buffer size to be 10 frames (corresponding to 400 ms) and \(\alpha \) = 0.4465. Figure 2 shows three excerpts of the two time-series, representing an example of fluid, fragmented, and impulsive movement respectively, and the events the peak detector extracted.

Fig. 2. Three excerpts of the two time-series of energy (audio energy and kinetic energy), representing an example of fluid, fragmented, and impulsive movement respectively (lower panel), and the events extracted from the two time-series and provided as input to the ES algorithm (upper panel).
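
One literal reading of the peak definition can be sketched as follows. This is an illustration under stated assumptions, not the detector used in the experiment: the N-size buffer is not modelled, the function names are hypothetical, and a missing neighbouring local maximum (at the start or end of the series) is assumed to impose no constraint.

```python
def local_maxima(x):
    """Indices p of interior samples with x[p-1] < x[p] >= x[p+1]."""
    return [p for p in range(1, len(x) - 1) if x[p - 1] < x[p] >= x[p + 1]]

def detect_peaks(x, alpha):
    """Keep local maxima exceeding both neighbouring local maxima by more than alpha.

    Assumption: the first and last local maxima are only checked against
    the single neighbour they have.
    """
    maxima = local_maxima(x)
    peaks = []
    for k, p in enumerate(maxima):
        left_ok = k == 0 or x[maxima[k - 1]] + alpha < x[p]
        right_ok = k == len(maxima) - 1 or x[maxima[k + 1]] + alpha < x[p]
        if left_ok and right_ok:
            peaks.append(p)
    return peaks
```

The returned indices are frame positions, so they can be passed as event times to an Event Synchronization routine with \(\tau \) expressed in frames.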

Algorithm Tuning. Next, the ES algorithm was applied to the events identified in the previous step. At each execution, the ES algorithm works on a sliding window of the data and computes one value, the amount of synchronization \(Q^\tau \). In our case, the value of ES is reset at every sliding window. Thus, the past values of ES do not affect the current output. The algorithm has two parameters: the size of the sliding window \(dim_{sw}\) and \(\tau \). The size of the sliding window was set to 20 samples (corresponding to 800 ms at 25 fps). This value was chosen as the breath frequency of a moving human is between 35 and 45 cycles per minute. Thus, 800 ms corresponds to roughly half of one breath cycle. We analyzed multimodal synchronization with all \(\tau \) in the interval \([4, 0.5 \cdot dim_{sw}]\) (i.e., not higher than half of the size of the sliding window \(dim_{sw}\)).
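
The choice of window size can be checked with a short calculation. The symbols \(T\) (duration of one breath cycle) and \(f\) (breath frequency) are introduced here only for illustration, and the listed \(\tau \) values are those reported in Sect. 6.2:

$$\begin{aligned} T&= \frac{60\ \text {s}}{f}, \quad f \in [35, 45]\ \text {cycles/min} \;\Rightarrow \; T \in [1.33\ \text {s}, 1.71\ \text {s}], \quad T/2 \in [0.67\ \text {s}, 0.86\ \text {s}] \\ dim_{sw}&= \frac{20\ \text {samples}}{25\ \text {samples/s}} = 0.8\ \text {s} \\ \tau&\in \{4, 6, 8, 10\}\ \text {samples} = \{160, 240, 320, 400\}\ \text {ms} \end{aligned}$$

The 800 ms window therefore falls inside the range of half-cycle durations, and the largest \(\tau \) equals half of the window.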

6 Data Analysis and Results

To check whether our approach can distinguish between the three selected expressive qualities, we analyzed the data described in Sect. 4. Our hypothesis is: there are significant differences in the synchronization between the peaks of energy in respiration audio and body movement among the three expressive qualities. At the same time, we expect that there is no significant difference in the synchronization between the peaks of energy in respiration audio and body movement among the different dancers within one single expressive quality. We describe the details below.

6.1 Dataset

Two experts in the domain of expressive movement analysis segmented the data. They selected segments where only one out of the three expressive qualities was clearly observable. Thus, segmentation was based not only on the dancer’s expressive intention, but also on the observer’s perception of the expressive quality the dancer displayed. Additionally, it was checked whether the audio segments contained only respiration sounds (segments should not contain any noise that may occasionally occur during the recordings, e.g., caused by unintentionally touching the microphone). The resulting dataset is composed of 90 segments of multimodal data by five dancers. Total duration is 9 min and 20 s. Segments are grouped into three sets, according to the expressive quality they display:
  • Fluid Movements Set (FluidMS) consisting of 15 segments by Dancer 1 and 15 segments by Dancer 2 (average segment duration 7.615 s, \(\mathrm{sd} = 3.438\) s);

  • Fragmented Movements Set (FragMS) consisting of 15 segments by Dancer 3 and 15 segments by Dancer 4 (average segment duration 5.703 s, \(\mathrm{sd} = 2.519\) s);

  • Impulsive Movements Set (ImplMS) consisting of 15 segments by Dancer 1 and 15 segments by Dancer 5 (average segment duration 5.360 s, \(\mathrm{sd} = 1.741\) s).

Due to the complexity of the recording procedure, at the moment we do not have data of one single dancer performing movements displaying all three expressive qualities. To limit the effect of a particular dancer’s personal style, we use, for each quality, segments performed by two different dancers.

6.2 Results

For each segment and each considered value of \(\tau \), we computed the average value (\(AvgQ^\tau \)) of the amount of synchronization \(Q^\tau \) on the whole segment. Next, we computed the mean and standard deviation of \(AvgQ^\tau \) separately for all fluid, fragmented, and impulsive segments (see the 4th column of Tables 1, 2 and 3).

To check for differences between the amount of synchronization in the segments in FluidMS, FragMS, and ImplMS, we applied ANOVA with one independent variable, Quality, and one dependent variable, \(AvgQ^{\tau }\). All post hoc comparisons were carried out by using the LSD test with Bonferroni correction. Similar results were obtained for all the tested \(\tau \). A significant main effect of Quality for \(\tau = 4\) was observed, \(F(2, 87) = 10.973, p < .001\). Post hoc comparisons indicated that multimodal synchronization in fragmented movements was significantly higher compared to the impulsive (\(p < .01\)), and fluid ones (\(p < .001\)). A significant main effect of Quality for \(\tau = 6\) was also observed, \(F(2, 87) = 9.650, p < .001\). Post hoc comparisons showed that multimodal synchronization in fragmented movements was significantly higher than in impulsive (\(p < .01\)), and fluid ones (\(p < .001\)). A significant main effect of Quality for \(\tau = 8\) was also observed, \(F(2, 87) = 6.903, p < .01\). Post hoc comparisons indicated that multimodal synchronization in fragmented movements was significantly higher compared to the impulsive (\(p < .05\)), and fluid ones (\(p < .01\)). A significant main effect of Quality for \(\tau = 10\) was also observed, \(F(2, 87) = 6.929, p < .01\). Post hoc comparisons indicated again that multimodal synchronization in fragmented movements was significantly higher compared to the impulsive (\(p < .05\)), and fluid ones (\(p < .01\)).
Table 1. Mean and standard deviation (in parentheses) of \(AvgQ^\tau \) for fluid movements.

| \(\tau \) | Dancer 1 | Dancer 2 | All segments |
|---|---|---|---|
| \(\tau =4\) | 0.185 (0.090) | 0.188 (0.135) | 0.187 (0.113) |
| \(\tau =6\) | 0.284 (0.121) | 0.298 (0.165) | 0.291 (0.143) |
| \(\tau =8\) | 0.352 (0.124) | 0.378 (0.153) | 0.365 (0.138) |
| \(\tau =10\) | 0.392 (0.116) | 0.419 (0.160) | 0.406 (0.137) |

Table 2. Mean and standard deviation (in parentheses) of \(AvgQ^\tau \) for fragmented movements.

| \(\tau \) | Dancer 3 | Dancer 4 | All segments |
|---|---|---|---|
| \(\tau =4\) | 0.415 (0.186) | 0.292 (0.123) | 0.354 (0.167) |
| \(\tau =6\) | 0.512 (0.191) | 0.395 (0.134) | 0.454 (0.173) |
| \(\tau =8\) | 0.552 (0.170) | 0.438 (0.111) | 0.495 (0.153) |
| \(\tau =10\) | 0.590 (0.167) | 0.473 (0.105) | 0.532 (0.149) |

Table 3. Mean and standard deviation (in parentheses) of \(AvgQ^\tau \) for impulsive movements.

| \(\tau \) | Dancer 1 | Dancer 5 | All segments |
|---|---|---|---|
| \(\tau =4\) | 0.241 (0.191) | 0.208 (0.093) | 0.225 (0.149) |
| \(\tau =6\) | 0.353 (0.171) | 0.292 (0.095) | 0.323 (0.139) |
| \(\tau =8\) | 0.379 (0.180) | 0.394 (0.106) | 0.387 (0.145) |
| \(\tau =10\) | 0.425 (0.147) | 0.437 (0.111) | 0.431 (0.128) |
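
As a consistency check on the ANOVA reported for \(\tau = 4\), the F statistic can be recomputed from summary statistics alone. The sketch below assumes that the last value column of Tables 1, 2 and 3 gives the mean and sample standard deviation over all 30 segments of each quality; the function name is hypothetical and the inputs are the rounded table values, so a small deviation from the reported \(F(2, 87) = 10.973\) is expected.

```python
def anova_f_from_summary(means, sds, n):
    """One-way ANOVA F for k equal-sized groups from per-group mean, sample SD and size n."""
    k = len(means)
    grand = sum(means) / k  # valid because all groups have the same size
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(s ** 2 for s in sds) / k  # pooled variance for equal n
    return ms_between / ms_within, (k - 1, k * (n - 1))

# tau = 4: fluid, fragmented, impulsive (rounded values from the tables)
f_stat, dof = anova_f_from_summary([0.187, 0.354, 0.225], [0.113, 0.167, 0.149], 30)
print(round(f_stat, 2), dof)  # 10.97 (2, 87)
```

The recomputed value agrees with the reported statistic to two decimal places, and the degrees of freedom match the 3 groups of 30 segments.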

Next, for each Quality and each \(\tau \), we checked whether there were significant differences between the dancers using independent-samples t-tests. Similar results were found for most of the \(\tau \) values (see Tables 1, 2 and 3). For \(\tau = 4\), there was no significant difference for the FluidMS segments (two tailed, \(t = -.083\), \(df = 28\), \(p = .934\)) or for the ImplMS segments (two tailed, \(t = .600\), \(df = 28\), \(p = .553\)). A significant difference between the two dancers was only observed for the FragMS segments (two tailed, \(t = 2.151\), \(df = 24.26\), \(p < .05\), with degrees of freedom corrected because Levene’s test was significant). For \(\tau = 6\), no significant differences were observed (FluidMS: two tailed, \(t = -.267\), \(df = 28\), \(p = .791\); ImplMS: two tailed, \(t = 1.200\), \(df = 28\), \(p = .233\); FragMS: two tailed, \(t = 1.950\), \(df = 28\), \(p = .061\)). For \(\tau = 8\), there was no significant difference between dancers for the FluidMS segments (two tailed, \(t = -.498\), \(df = 28\), \(p = .623\)) or for the ImplMS segments (two tailed, \(t =.288\), \(df = 28\), \(p = .775\)). A significant difference between the two dancers was, however, observed for the FragMS segments (two tailed, \(t = 2.193\), \(df = 28\), \(p < .05\)). For \(\tau = 10\), there was no significant difference either for the FluidMS segments (two tailed, \(t = -.546\), \(df = 28\), \(p = .589\)) or for the ImplMS ones (two tailed, \(t = -.247\), \(df = 28\), \(p = .807\)). A significant difference between the two dancers was observed for the FragMS segments (two tailed, \(t = 2.303\), \(df = 28\), \(p < .05\)).

Fig. 3. Box plots of the amount of synchronization for fluid, fragmented, and impulsive movements, respectively.
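
The non-integer degrees of freedom reported for the FragMS segments at \(\tau = 4\) are what an unequal-variance (Welch) t-test produces. The sketch below recomputes the statistic from the per-dancer means and standard deviations of fragmented movements, assuming 15 segments per dancer as stated in Sect. 6.1; the function name is hypothetical, and because the inputs are rounded table values the result only approximates the reported \(t = 2.151\), \(df = 24.26\).

```python
import math

def welch_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of the two means
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# FragMS, tau = 4: Dancer 3 vs Dancer 4, 15 segments each (rounded table values)
t_stat, df = welch_t_from_summary(0.415, 0.186, 15, 0.292, 0.123, 15)
```

With these inputs the statistic comes out near 2.14 with about 24.3 degrees of freedom, which is consistent with the corrected test reported in the text.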

6.3 Discussion

According to the results, our hypothesis was confirmed, as multimodal synchronization between the energy of the audio signal of respiration and the kinetic energy of whole-body movement allowed us to distinguish between the selected expressive qualities. In particular, audio respiration and kinetic energy were found to be more synchronized in fragmented movements than in impulsive and fluid movements. There was no significant difference between impulsive and fluid movements. This might be due to the type of exercise we asked the dancers to perform. In most impulsive segments, dancers were asked to perform one impulsive movement (e.g., when they were touched) while they were moving in a fluid way. Thus, even if the impulsive segments were rather short, it cannot be excluded that the average amount of synchronization in impulsive movements was also affected by the dancer’s fluid movements performed before and after the impulsive reaction to the external stimulus.

Significant differences between dancers were found only for fragmented movements. It should be noted, however, that for fragmented movements the average amount of synchronization of either of the two dancers was higher than the average synchronization values for any other quality and dancer (see Tables 1, 2 and 3).

7 Conclusion and Future Work

In this paper, we proposed a novel approach for distinguishing between expressive qualities of movement from multimodal data, consisting of features from the audio signal of respiration and from body movement. According to the results, our technique – based on the Event Synchronization algorithm – was successful in distinguishing between fragmented and other movements.

This is ongoing work: our long-term aim is to detect different expressive qualities without using a motion capture system. For this purpose, we plan to use data from IMU sensors placed on the dancer’s limbs, and to estimate her kinetic energy (and possibly further features) using these input devices. This would allow us to eliminate the need for motion capture systems. At the same time, we want to extract more precise information about the respiration phase from the audio. We also plan to study further audio features, e.g., MFCC, which proved successful in the detection of respiration phases [1].

The results of this work will be exploited in the framework of the EU-H2020 ICT Project DANCE, which aims at investigating how sound and music can express, represent, and analyze the affective and relational qualities of body movement. To transfer vision into sound, however, models and techniques are needed to understand what we see when we observe the expressive qualities of a movement. The work presented here is a step toward multimodal analysis of expressive qualities of movement and is propaedeutic to their multi- and cross-sensorial translation.




This research has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 645553 (DANCE). DANCE investigates how affective and relational qualities of body movement can be expressed, represented, and analyzed by the auditory channel.

We thank our colleagues at Casa Paganini - InfoMus, Paolo Alborno, Corrado Canepa, Paolo Coletta, Nicola Ferrari, Simone Ghisio, Maurizio Mancini, Alberto Massari, Ksenia Kolykhalova, Stefano Piana, and Roberto Sagoleo, for the fruitful discussions and for their invaluable contributions to the design of the multimodal recordings, and the dancers Roberta Messa, Federica Loredan, and Valeria Puppo for their kind availability to participate in the recordings of our repository of movement qualities.


  1. Abushakra, A., Faezipour, M.: Acoustic signal classification of breathing movements to virtually aid breath regulation. IEEE J. Biomed. Health Inf. 17(2), 493–500 (2013)
  2. Alborno, P., Piana, S., Mancini, M., Niewiadomski, R., Volpe, G., Camurri, A.: Analysis of intrapersonal synchronization in full-body movements displaying different expressive qualities. In: Proceedings of the International Working Conference on Advanced Visual Interfaces, AVI 2016, New York, pp. 136–143 (2016)
  3. Bateman, A., McGregor, A., Bull, A., Cashman, P., Schroter, R.: Assessment of the timing of respiration during rowing and its relationship to spinal kinematics. Biol. Sport 23, 353–365 (2006)
  4. Bernasconi, P., Kohl, J.: Analysis of co-ordination between breathing and exercise rhythms in man. J. Physiol. 471, 693–706 (1993)
  5. Camurri, A., Volpe, G., Piana, S., Mancini, M., Niewiadomski, R., Ferrari, N., Canepa, C.: The dancer in the eye: towards a multi-layered computational framework of qualities in movement. In: 3rd International Symposium on Movement and Computing, MOCO 2016 (2016)
  6. Dempster, W.T., Gaughran, G.R.L.: Properties of body segments based on size and weight. Am. J. Anat. 120(1), 33–54 (1967)
  7. Folke, M., Cernerud, L., Ekström, M., Hök, B.: Critical review of non-invasive respiratory monitoring in medical care. Med. Biol. Eng. Comput. 41(4), 377–383 (2003)
  8. Hoffmann, C.P., Torregrosa, G., Bardy, B.G.: Sound stabilizes locomotor-respiratory coupling and reduces energy cost. PLoS ONE 7(9), e45206 (2012)
  9. Huq, S., Yadollahi, A., Moussavi, Z.: Breath analysis of respiratory flow using tracheal sounds. In: 2007 IEEE International Symposium on Signal Processing and Information Technology, pp. 414–418 (2007)
  10. Jin, F., Sattar, F., Goh, D., Louis, I.M.: An enhanced respiratory rate monitoring method for real tracheal sound recordings. In: 2009 17th European Signal Processing Conference, pp. 642–645 (2009)
  11. Kim, J., Andre, E.: Emotion recognition based on physiological changes in music listening. IEEE Trans. Pattern Anal. Mach. Intell. 30(12), 2067–2083 (2008)
  12. Laban, R., Lawrence, F.C.: Effort. Macdonald & Evans, London (1947)
  13. Liu, G., Guo, Y., Zhu, Q., Huang, B., Wang, L.: Estimation of respiration rate from three-dimensional acceleration data based on body sensor network. Telemed. J. e-Health 17(9), 705–711 (2011)
  14. Niewiadomski, R., Mancini, M., Piana, S.: Human and virtual agent expressive gesture quality analysis and synthesis. In: Rojc, M., Campbell, N. (eds.) Coverbal Synchrony in Human-Machine Interaction, pp. 269–292. CRC Press (2013)
  15. Niewiadomski, R., Mancini, M., Ding, Y., Pelachaud, C., Volpe, G.: Rhythmic body movements of laughter. In: Proceedings of the 16th International Conference on Multimodal Interaction, ICMI 2014, New York, pp. 299–306 (2014)
  16. Niewiadomski, R., Mancini, M., Volpe, G., Camurri, A.: Automated detection of impulsive movements in HCI. In: Proceedings of the 11th Biannual Conference on Italian SIGCHI Chapter, CHItaly 2015, New York, pp. 166–169 (2015)
  17. Piana, S., Coletta, P., Ghisio, S., Niewiadomski, R., Mancini, M., Sagoleo, R., Volpe, G., Camurri, A.: Towards a multimodal repository of expressive movement qualities in dance. In: 3rd International Symposium on Movement and Computing, MOCO 2016 (2016)
  18. Piana, S., Alborno, P., Niewiadomski, R., Mancini, M., Volpe, G., Camurri, A.: Movement fluidity analysis based on performance and perception. In: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA 2016, New York, pp. 1629–1636 (2016)
  19. Quian Quiroga, R., Kreuz, T., Grassberger, P.: Event synchronization: a simple and fast method to measure synchronicity and time delay patterns. Phys. Rev. E 66, 041904
  20. Rao, K.M., Sudarshan, B.: A review on different technical specifications of respiratory rate monitors. IJRET: Int. J. Res. Eng. Technol. 4(4), 424–429 (2015)
  21. Ruinskiy, D., Lavner, Y.: An effective algorithm for automatic detection and exact demarcation of breath sounds in speech and song signals. IEEE Trans. Audio Speech Lang. Process. 15(3), 838–850 (2007)
  22. Schmid, M., Conforto, S., Bibbo, D., D’Alessio, T.: Respiration and postural sway: detection of phase synchronizations and interactions. Hum. Mov. Sci. 23(2), 105–119 (2004)
  23. Yahya, O., Faezipour, M.: Automatic detection and classification of acoustic breathing cycles. In: 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1), pp. 1–5, April 2014

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Vincenzo Lussu (1)
  • Radoslaw Niewiadomski (1)
  • Gualtiero Volpe (1)
  • Antonio Camurri (1)

  1. Casa Paganini - InfoMus, DIBRIS - University of Genoa, Genoa, Italy