Abstract
Recent technological advancements are bringing virtual agents, avatars, and social robotic characters into our daily lives. These characters must acquire the ability to express (simulated) emotions vocally and gesturally. In the vocal channel, natural language interaction technologies still have limitations when used in real-world environments, and the expressivity models of current text-to-speech synthesis engines are not yet mature. To address these limitations, this paper introduces an alternative form of vocal communication: gibberish speech. Gibberish speech consists of vocalizations of meaningless strings of speech sounds and thus carries no semantic meaning. It is occasionally used by performing artists and in cartoon animations and games to express intended emotions (e.g., the Teletubbies and The Sims). This paper describes our approach to constructing expressive gibberish speech and reports experimental evaluations with its intended robotic agents. The results show that the generated gibberish speech can contribute significantly to studies of emotion expression for robotic agents and can be further utilized in affective human-robot interaction studies.
Notes
In this context, a sample is considered natural when it sounds like an unrecognized real language rather than an unnatural or random combination of sounds.
In statistical hypothesis testing (significance testing), the p value represents the probability of obtaining the observed effect, or a larger one, under the null hypothesis [1]. A significant effect can be claimed if the p value is smaller than a conventional significance level, typically 0.05; a code sketch illustrating this criterion follows these notes.
ETRO audio-visual lab, http://www.etro.vub.ac.be/Research/Nosey_Elephant_Studios/.
Annosoft Lipsync Tool 4.1 can be downloaded from: http://www.annosoft.com/lipsync-tool.
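To make the significance criterion in the second note concrete, the following short Python sketch (our own illustration, not material from this study; the listener scores are hypothetical) tests whether a set of recognition scores differs from chance level using a one-sample t-test from SciPy:

    # Illustration of the p-value criterion: one-sample t-test against chance level.
    from scipy import stats

    # Hypothetical listening-test recognition scores, one per listener.
    scores = [0.62, 0.71, 0.55, 0.68, 0.74, 0.59, 0.66, 0.70]
    chance_level = 0.5  # null hypothesis: the mean score equals chance

    # Test whether the sample mean differs significantly from the chance level.
    t_stat, p_value = stats.ttest_1samp(scores, popmean=chance_level)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

    # Claim a significant effect only if p falls below the conventional 0.05 level.
    print("significant" if p_value < 0.05 else "not significant")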
References
Argyrous G (2005) Statistics for Research. Sage Publications Ltd, London
Ayesh A (2009) Emotionally expressive music based interaction language for social robots. ICGST Int J Autom Robot Auton Syst 9(1):1–10
Bamidis PD, Luneski A, Vivas A, Papadelis C, Maglaveras N (2007) Multi-channel physiological sensing of human emotion: insights into emotion-aware computing using affective protocols, avatars and emotion specifications. In: Medinfo 2007: proceedings of the 12th world congress on health (medical) informatics: building sustainable health systems. IOS Press
Breazeal C (2000) Sociable machines: expressive social exchange between humans and robots. Doctoral thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
Breazeal C (2001) Emotive qualities in robot speech. In: Proceedings of the 2001 IEEE/RSJ international conference on intelligent robots and systems. pp 1388–1394
Burleson W (2006) Affective learning companions: strategies for empathetic agents with real-time multimodal affective sensing to foster meta-cognitive and meta-affective approaches to learning, motivation. PhD thesis, Massachusetts Institute of Technology
Busso C, Narayanan S (2008) Recording audio-visual emotional databases from actors: a closer look. In: 2nd international workshop on emotion: corpora for research on emotion and affect, international conference on language resources and evaluation (LREC 2008), pp 17–22
Carlson R, Granström B, Nord L (1991) Segmental evaluation using the Esprit/SAM test procedures and monosyllabic words. In: The ESCA workshop on speech synthesis
Chomsky N (1956) Three models for the description of language. IRE Trans Inf Theory 2(3):113–124
Corveleyn S, Coose B, Verhelst W (2002) Voice modification and conversion using PLAR parameters. In: IEEE Benelux workshop on model based processing and coding of audio (MPCA)
Goodrich MA, Schultz AC (2007) Human-robot interaction: a survey. Found Trends Human-Comput Interact 1(3):203–275
Gouaillier D, Hugel V, Blazevic P, Kilner C, Monceaux J, Lafourcade P, Marnier B, Serre J, Maisonnier B (2008) The NAO humanoid: a combination of performance and affordability. CoRR abs/0807.3223
Hart M (1971) Project Gutenberg. http://www.gutenberg.org. Accessed March 2014
Jee ES, Jeong YJ, Kim CH, Kobayashi H (2010) Sound design for emotion and intention expression of socially interactive robots. Intell Serv Robot 3:199–206
Juslin PN, Laukka P (2003) Communication of emotions in vocal expression and music performance: different channels, same code? Psychol Bull 129(5):770–814
Latacz L, Kong Y, Mattheyses W, Verhelst W (2008) An overview of the VUB entry for the 2008 Blizzard Challenge. In: Proceedings of the Interspeech Blizzard Challenge
Libin AV, Libin EV (2004) Person-robot interactions from the robopsychologists’ point of view: the robotic psychology and robotherapy approach. Proc IEEE 92(11):1789–1803
Lisetti C, Nasoz F, LeRouge C, Ozyer O, Alvarez K (2003) Developing multimodal intelligent affective interfaces for tele-home health care. Int J Human-Comput Stud 59(1):245–255
Luneski A, Konstantinidis E, Bamidis P (2010) Affective medicine: a review of affective computing efforts in medical informatics. Methods Inf Med 49(3):207–218
Mubin O, Bartneck C, Feijs L (2009) What you say is not what you get: arguing for artificial languages instead of natural languages in human robot speech interaction. In: The spoken dialogue and human-robot interaction workshop at IEEE RoMan 2009. IEEE, Japan
Nijholt A, Tan D (2007) Playing with your brain: brain-computer interfaces and games. In: Proceedings of the international conference on advances in computer entertainment technology. ACM, pp 305–306
Olive J, Buchsbaum A (1987) Changing voice characteristics in text to speech synthesis. AT&T Bell-Labs, Technical Memorandum
Oudeyer PY (2003) The production and recognition of emotions in speech: features and algorithms. Int J Human-Comput Stud 59(1):157–183
Prendinger H, Ishizuka M (2004) What affective computing and life-like character technology can do for tele-home health care. In: Proceedings of the workshop on HCI and homecare
Read R, Belpaeme T (2012) How to use non-linguistic utterances to convey emotion in child-robot interaction. In: Proceedings of the 7th annual ACM/IEEE international conference on human-robot interaction. ACM, Boston, MA, pp 219–220
Riek LD (2012) Wizard of Oz studies in HRI: a systematic review and new reporting guidelines. J Human-Robot Interact 1(1):119–136
Saldien J, Goris K, Yilmazyildiz S, Verhelst W, Lefeber D (2008) On the design of the huggable robot Probo. J Phys Agents 2(2):3–12
Saldien J, Goris K, Vanderborght B, Vanderfaeillie J, Lefeber D (2010) Expressing emotions with the social robot Probo. Int J Soc Robot 2(4):377–389
Schröder M (2003) Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis. PhD thesis, University of Saarland
Schröder M (2009) Expressive speech synthesis: past, present, and possible futures. In: Tao J, Tan T (eds) Affective information processing. Springer, London, pp 111–126
Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6(4):365–377
Schröder M, Cowie R, Douglas-Cowie E, Westerdijk M, Gielen SC (2001) Acoustic correlates of emotion dimensions in view of speech synthesis. In: INTERSPEECH, pp 87–90
Smith RN, Frawley WJ (1999) Affective computing: medical applications. In: Proceedings of HCI international (the 8th international conference on human-computer interaction) on human-computer interaction: ergonomics and user interfaces, vol I. L. Erlbaum Associates Inc., pp 843–847
Verhelst W, Roelands M (1993) An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP), vol 2. IEEE, pp 554–557
Wang W, Athanasopoulos G, Yilmazyildiz S, Patsis G, Enescu V, Sahli H, Verhelst W, Hiolle A, Lewis M, Canamero L (2014) Natural emotion elicitation for emotion modeling in child-robot interactions, (accepted)
Winters RM, Wanderley MM (2013) Sonification of emotion: strategies for continuous display of arousal and valence. In: Luck G, Brabant O (eds) Proceedings of the 3rd international conference on music & emotion (ICME3). University of Jyväskylä, Department of Music, Jyväskylä
Yang PF, Stylianou Y (1998) Real-time voice alteration based on linear prediction. In: Proceedings of ICSLP, Sydney, Australia, pp 1667–1670
Yilmazyildiz S, Mattheyses W, Patsis Y, Verhelst W (2006) Expressive speech recognition and synthesis as enabling technologies for affective robot-child communication. In: Zhuang Y, Yang SQ, Rui Y, He Q (eds) Advances in multimedia information processing - PCM 2006, lecture notes in computer science, vol 4261. Springer, Berlin Heidelberg, pp 1–8
Yilmazyildiz S, Latacz L, Mattheyses W, Verhelst W (2010) Expressive gibberish speech synthesis for affective human-computer interaction. In: Sojka P, Horák A, Kopeček I, Pala K (eds) Text, speech and dialogue, lecture notes in computer science, vol 6231. Springer, Berlin Heidelberg, pp 584–590
Yilmazyildiz S, Henderickx D, Vanderborght B, Verhelst W, Soetens E, Lefeber D (2011) EMOGIB: emotional gibberish speech database for affective human-robot interaction. In: D'Mello S, Graesser A, Schuller B, Martin JC (eds) Affective computing and intelligent interaction, lecture notes in computer science, vol 6975. Springer, Berlin Heidelberg, pp 163–172
Yilmazyildiz S, Athanasopoulos G, Patsis G, Wang W, Oveneke MC, Latacz L, Verhelst W, Sahli H, Henderickx D, Vanderborght B, Soetens E, Lefeber D (2013) Voice modification for Wizard-of-Oz experiments in robot-child interaction. In: Workshop on affective social speech signals (WASSS 2013)
Yilmazyildiz S, Henderickx D, Vanderborght B, Verhelst W, Soetens E, Lefeber D (2013) Multi-modal emotion expression for affective human-robot interaction. In: Workshop on affective social speech signals (WASSS 2013)
Acknowledgements
The research reported in this paper was supported in part by the Research Council of the Vrije Universiteit Brussel through the horizontal research action HOA16, and by the European Commission (EU FP7 project ALIZ-E, ICT-248116).
Cite this article
Yilmazyildiz, S., Verhelst, W. & Sahli, H. Gibberish speech as a tool for the study of affective expressiveness for robotic agents. Multimed Tools Appl 74, 9959–9982 (2015). https://doi.org/10.1007/s11042-014-2165-1
Keywords
- Vocal emotion expression
- Speech without semantic information
- Human-robot interaction
- Expressive speech
- Expressive gibberish speech
- Voice modification