Influence of Expressive Speech on ASR Performances: Application to Elderly Assistance in Smart Home

  • Frédéric Aman
  • Véronique Aubergé
  • Michel Vacher
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9924)


Smart homes are discussed as a win-win solution for keeping the elderly at home, a better alternative to care homes for dependent elderly people. Such smart homes are characterized by rich domestic commands devoted to elderly safety and comfort. Voice command has been identified as an efficient, well-accepted mode of interaction; it can be addressed directly to the “habitat” or through a robotic interface. In daily use, the challenges for voice command recognition are the noisy environment and, even more so, the reformulation and expressive variation of the strictly authorized commands. This paper (1) shows, on the basis of an elicited corpus, that expressive speech, in particular distress speech, strongly affects generic state-of-the-art ASR systems (20 to 30 %), and (2) shows how ASR adaptation can mitigate this degradation (15 %). We conclude that ASR systems must be adapted to expressive speech when they are designed for personal assistance.


Expressive speech · Distress call · Ambient assisted living · Home automation



This study was supported by the French funding agencies ANR and CNSA through the project CIRDO - Industrial Research (ANR-2010-TECS-012). The authors would like to thank the people who agreed to participate in the recordings.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Frédéric Aman (1, 2)
  • Véronique Aubergé (1, 2)
  • Michel Vacher (1, 2)
  1. LIG, Université Grenoble Alpes, Grenoble, France
  2. LIG, CNRS, Grenoble, France
