Giving Voices to Multimodal Applications

  • Nuno Almeida
  • António Teixeira
  • Ana Filipa Rosa
  • Daniela Braga
  • João Freitas
  • Miguel Sales Dias
  • Samuel Silva
  • Jairo Avelar
  • Cristiano Chesi
  • Nuno Saldanha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9170)

Abstract

Speech interaction is important and useful in a wide range of applications: it is a natural form of interaction and easy for most people to use. Developing speech-enabled applications is a considerable challenge, and one that grows when several languages must be supported, a common scenario in Europe, for example. Tackling this challenge requires methods and tools that foster easier deployment of speech features, providing developers with versatile means to include speech interaction in their applications. In addition, only a limited variety of voices is available (sometimes only one per language), which makes it hard to satisfy user preferences and hinders deeper exploration of how well particular voices suit specific applications and users.

In this article, we present some of our contributions to these issues: (a) a generic modality that encapsulates the technical details of using speech synthesis; (b) the process followed to create four new voices, two young-adult and two elderly; and (c) initial results exploring user preferences regarding the created voices.

The preliminary studies targeted groups of both young and older adults and addressed: (a) evaluation of the intrinsic properties of each voice; (b) observation of users interacting with speech-enabled interfaces, eliciting qualitative impressions of the chosen voice and of the impact of speech interaction on user satisfaction; and (c) ranking of the voices by preference.

The results, albeit preliminary, provide some evidence of the positive impact of speech interaction on users at several levels. They also reveal interesting differences in voice preference between the two age groups and between genders.
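
To give a concrete flavour of the encapsulation the generic modality provides, below is a minimal, hypothetical sketch in Python (not the implementation described in this paper): a small facade hides engine-specific text-to-speech calls behind a simple speak interface, with the pyttsx3 library serving purely as a stand-in engine; the class and method names are our own.

    # Hypothetical sketch, not the paper's modality: hide engine-specific
    # TTS calls so applications request speech output without knowing
    # which synthesizer or voice produces it.
    import pyttsx3

    class SpeechOutputModality:
        """Facade: applications ask for speech, not for a TTS API."""

        def __init__(self, voice_hint: str = "") -> None:
            self._engine = pyttsx3.init()
            if voice_hint:
                # Pick the first installed voice whose id or name matches
                # the hint, e.g. "pt" to prefer a Portuguese voice.
                for voice in self._engine.getProperty("voices"):
                    if voice_hint.lower() in f"{voice.id} {voice.name}".lower():
                        self._engine.setProperty("voice", voice.id)
                        break

        def speak(self, text: str) -> None:
            # Synchronous for brevity; a modality component in a multimodal
            # architecture would instead report status events to the app.
            self._engine.say(text)
            self._engine.runAndWait()

    if __name__ == "__main__":
        SpeechOutputModality(voice_hint="pt").speak("Bom dia.")

With such a wrapper, swapping synthesizers or adding newly created voices touches only the modality, not the applications that use it, which is the kind of decoupling described above.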

Keywords

Synthetic voices · Speech output · Multimodal interaction · Age effects

Acknowledgments

Research partially funded by IEETA Research Unit funding (PEst-OE/EEI/UI0127/2014), project Cloud Thinking (funded by the QREN Mais Centro program, ref. CENTRO-07-ST24-FEDER-002031), Marie Curie Actions IRIS (ref. 610986, FP7-PEOPLE-2013-IAPP), project Smart Phones for Seniors (S4S), a QREN project (QREN 21541), co-funded by COMPETE and FEDER, project PaeLife (AAL-08-1-2001-0001) and project AAL4ALL (AAL/0015/2009).

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Nuno Almeida (1, 2)
  • António Teixeira (1, 2)
  • Ana Filipa Rosa (1)
  • Daniela Braga (3)
  • João Freitas (4)
  • Miguel Sales Dias (4, 5)
  • Samuel Silva (1, 2)
  • Jairo Avelar (4)
  • Cristiano Chesi (4)
  • Nuno Saldanha (4)

  1. Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal
  2. Department of Electronics, Telecommunications and Informatics Engineering, University of Aveiro, Aveiro, Portugal
  3. Voicebox Technologies, Bellevue, USA
  4. Microsoft Language Development Center, Lisbon, Portugal
  5. Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR-IUL, Lisbon, Portugal