Giving Voices to Multimodal Applications
Speech interaction is important and useful in a wide range of applications: it is a natural form of interaction and is easy for most people to use. Developing speech-enabled applications is a considerable challenge, and one that grows when several languages must be supported, a common scenario in Europe, for example. Tackling this challenge requires methods and tools that foster easier deployment of speech features, providing developers with versatile means to include speech interaction in their applications. Furthermore, only a limited variety of voices is available (sometimes only one per language), which makes it harder to satisfy user preferences and hinders deeper exploration of how well particular voices suit specific applications and users.
In this article, we present some of our contributions to these issues: (a) a generic modality that encapsulates the technical details of using speech synthesis; (b) the process followed to create four new voices, two young-adult and two elderly; and (c) initial results exploring user preferences regarding the created voices.
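The generic modality mentioned in (a) can be pictured as a thin layer that hides the synthesis engine behind a single interface, so applications request speech without committing to a particular engine or voice. A minimal sketch, in which all class and method names are hypothetical and the engine is a stand-in for a real synthesizer:

```python
class DummySynthesizer:
    """Stand-in for a real TTS engine (e.g. an HMM-based synthesizer).

    A real engine would return audio samples; this one returns a string
    describing what would be synthesized.
    """

    def synthesize(self, text, voice):
        return f"[audio:{voice}] {text}"


class SpeechOutputModality:
    """Generic speech-output modality: encapsulates which engine and
    which voices are used, falling back to a default voice when the
    requested one is unavailable."""

    def __init__(self, engine, default_voice="pt-young-female"):
        self.engine = engine
        self.default_voice = default_voice
        self.voices = {default_voice}

    def register_voice(self, voice):
        # New voices (e.g. the elderly voices described below) can be
        # added without changing application code.
        self.voices.add(voice)

    def speak(self, text, voice=None):
        chosen = voice if voice in self.voices else self.default_voice
        return self.engine.synthesize(text, chosen)


modality = SpeechOutputModality(DummySynthesizer())
modality.register_voice("pt-elderly-male")
print(modality.speak("Bom dia", voice="pt-elderly-male"))
```

Because applications talk only to `speak`, swapping the engine or adding voices for another language requires no change on the application side, which is the kind of decoupling a generic modality aims for.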
The preliminary studies carried out targeted groups of both young and older adults and addressed: (a) evaluation of the intrinsic properties of each voice; (b) observation of users interacting with speech-enabled interfaces, eliciting qualitative impressions of the chosen voice and of the impact of speech interaction on user satisfaction; and (c) ranking of the voices by preference.
The collected results, albeit preliminary, provide some evidence of the positive impact of speech interaction on users at different levels. Additionally, they show interesting differences in voice preferences between the two age groups and between genders.
Keywords: Synthetic voices · Speech output · Multimodal interaction · Age effects
Research partially funded by IEETA Research Unit funding (PEst-OE/EEI/UI0127/2014), project Cloud Thinking (funded by the QREN Mais Centro program, ref. CENTRO-07-ST24-FEDER-002031), Marie Curie Actions IRIS (ref. 610986, FP7-PEOPLE-2013-IAPP), project Smart Phones for Seniors (S4S), a QREN project (QREN 21541), co-funded by COMPETE and FEDER, project PaeLife (AAL-08-1-2001-0001) and project AAL4ALL (AAL/0015/2009).