Smart Health, pp. 161–188

On Distant Speech Recognition for Home Automation

  • Michel Vacher
  • Benjamin Lecouteux
  • François Portet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8700)


In the framework of Ambient Assisted Living, home automation may help elderly people who live alone at home. This study is part of the Sweet-Home project, which aims to develop a new home automation system based on voice commands to improve the support and well-being of people losing their autonomy. The goal of the study is voice command recognition, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a four-room flat equipped with microphones set in the ceiling. This French distant speech corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called the Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and the other approaches. This solution, which uses the two channels with the best SNR and a priori knowledge (voice commands and distress sentences), increased the recognition rate without introducing false alarms. Finally, a short overview outlines the research challenges that speech technologies must take up for Ambient Assisted Living and Augmentative and Alternative Communication, and the current research avenues in this domain.


Distant speech recognition; Keyword detection; Triggered language models; Home automation; Smart home; Application of speech processing for assistive technologies; Ambient assisted living



This work is part of the Sweet-Home project supported by the French National Research Agency (Agence Nationale de la Recherche / ANR-09-VERS-011).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Michel Vacher (1)
  • Benjamin Lecouteux (2)
  • François Portet (2)
  1. LIG, CNRS, Grenoble, France
  2. LIG, University Grenoble Alpes, Grenoble, France
