Abstract
Interactions with speech interfaces are growing, helped by the advent of intelligent personal assistants like Amazon Alexa and Google Assistant. This software is used in hardware such as smart home devices (e.g. Amazon Echo and Google Home), smartphones and vehicles. Given the unprecedented level of spoken interactions with machines, it is important that we understand what is considered appropriate, desirable and attractive computer speech. Previous research has suggested that the overuse of humanlike voices in limited-communication devices can induce uncanny valley effects: a perceptual tension arising from mismatched stimuli causing incongruence between users’ expectations of a system and its actual capabilities. This chapter explores the possibility of verbal uncanny valley effects in computer speech by utilising the interpersonal linguistic strategies of politeness, relational work and vague language. This work highlights that using these strategies can create perceptual tension and negative experiences due to the conflicting stimuli of computer speech and ‘humanlike’ language. This tension can be somewhat moderated by using more humanlike rather than robotic voices, though not alleviated completely. Considerations for the design of computer speech and subsequent future research directions are discussed.
Notes
- 1. Discourse markers may also be referred to, amongst other terms, as discourse particles, pragmatic particles and pragmatic expressions. Their purposes can include switching topics, marking boundaries between segments of talk, helping to conduct linguistic repair and being used as hedging devices (Jucker & Ziv, 1998).
- 2. These were adaptors, e.g. more or less, somewhat (reduce assertiveness, minimise imposition); discourse markers, e.g. so, now (structure talk, mitigate assertive impact of utterance); minimisers, e.g. just, basically (structure talk, reduce perceived difficulty, mitigate utterance impact) and vague nouns, e.g. thing, bit (improve language efficiency) (Clark et al., 2016).
References
Abercrombie, D. (1967). Elements of general phonetics (Vol. 203). Edinburgh: Edinburgh University Press.
Aylett, M. P., Cowan, B. R., & Clark, L. (2019). Siri, Echo and performance: You have to suffer darling. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM.
Bickmore, T. W., Trinh, H., Olafsson, S., O’Leary, T. K., Asadi, R., Rickles, N. M., & Cruz, R. (2018). Patient and consumer safety risks when using conversational assistants for medical information: An observational study of Siri, Alexa, and Google Assistant. Journal of Medical Internet Research, 20(9). https://doi.org/10.2196/11510.
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge University Press.
Cameron, D. (2001). Working with spoken discourse. SAGE.
Carr, E. W., Hofree, G., Sheldon, K., Saygin, A. P., & Winkielman, P. (2017). Is that a human? Categorization (dis)fluency drives evaluations of agents ambiguous on human-likeness. Journal of Experimental Psychology: Human Perception and Performance, 43(4), 651–666. https://doi.org/10.1037/xhp0000304.
Channell, J. (1994). Vague language. Oxford University Press.
Clark, L. (2018). Social boundaries of appropriate speech in HCI: A politeness perspective. In Proceedings of British HCI.
Clark, L., Cabral, J., & Cowan, B. R. (2018). The CogSIS project: Examining the cognitive effects of speech interface synthesis. In Proceedings of British HCI.
Clark, L., Doyle, P., Garaialde, D., Gilmartin, E., Schlögl, S., Edlund, J., ... & Cowan, B. R. (2019a). The state of speech in HCI: Trends, themes and challenges. Interacting with Computers, 31(4), 349–371. https://doi.org/10.1093/iwc/iwz016.
Clark, L., Pantidi, N., Cooney, O., Doyle, P., Garaialde, D., Edwards, J., ... & Cowan, B. R. (2019b, May). What makes a good conversation? Challenges in designing truly conversational agents. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–12). https://doi.org/10.1145/3290605.3300705.
Clark, L. M. H., Bachour, K., Ofemile, A., Adolphs, S., & Rodden, T. (2014). Potential of imprecision: Exploring vague language in agent instructors (pp. 339–344). ACM Press. https://doi.org/10.1145/2658861.2658895
Clark, L., Ofemile, A., Adolphs, S., & Rodden, T. (2016). A multimodal approach to assessing user experiences with agent helpers. ACM Transactions on Interactive Intelligent Systems, 6(4), 29:1–29:31. https://doi.org/10.1145/2983926.
Coulthard, M. (2013). Advances in spoken discourse analysis. Routledge.
Cowan, B. R., Branigan, H. P., Obregón, M., Bugis, E., & Beale, R. (2015). Voice anthropomorphism, interlocutor modelling and alignment effects on syntactic choices in human-computer dialogue. International Journal of Human-Computer Studies, 83, 27–42. https://doi.org/10.1016/j.ijhcs.2015.05.008.
Cowan, B. R., Pantidi, N., Coyle, D., Morrissey, K., Clarke, P., Al-Shehri, S., ... Bandeira, N. (2017). ‘What can I help you with?’: Infrequent users’ experiences of intelligent personal assistants (pp. 1–12). ACM Press. https://doi.org/10.1145/3098279.3098539.
Gilmartin, E., Cowan, B. R., Vogel, C., & Campbell, N. (2017). Exploring multiparty casual talk for social human-machine dialogue. In International Conference on Speech and Computer (pp. 370–378). Springer.
Goffman, E. (1955). On face-work. Psychiatry, 18(3), 213–231. https://doi.org/10.1080/00332747.1955.11023008.
Goffman, E. (2005). Interaction ritual: Essays in face to face behavior. AldineTransaction.
Grimshaw, M. (2009). The audio Uncanny Valley: Sound, fear and the horror game. Audio Mostly, 21–26.
Hone, K. S., & Graham, R. (2000). Towards a tool for the subjective assessment of speech system interfaces (SASSI). Natural Language Engineering, 6(3–4), 287–303.
Jucker, A. H., & Ziv, Y. (1998). Discourse markers: Descriptions and theory. John Benjamins Publishing.
Kätsyri, J., Förger, K., Mäkäräinen, M., & Takala, T. (2015). A review of empirical evidence on different uncanny valley hypotheses: Support for perceptual mismatch as one road to the valley of eeriness. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.00390.
Large, D. R., Clark, L., Quandt, A., Burnett, G., & Skrypchuk, L. (2017). Steering the conversation: A linguistic exploration of natural language interactions with a digital assistant during simulated driving. Applied Ergonomics, 63, 53–61. https://doi.org/10.1016/j.apergo.2017.04.003.
Laver, J. (1980). The phonetic description of voice quality (Cambridge Studies in Linguistics). Cambridge: Cambridge University Press.
Locher, M. A. (2004). Power and politeness in action: Disagreements in oral communication. Walter de Gruyter.
Locher, M. A. (2006). Polite behavior within relational work: The discursive approach to politeness. Walter de Gruyter.
Locher, M. A., & Watts, R. J. (2005). Politeness theory and relational work. Journal of Politeness Research. Language, Behaviour, Culture, 1(1). https://doi.org/10.1515/jplr.2005.1.1.9
Locher, M. A., & Watts, R. J. (2008). Relational work and impoliteness: Negotiating norms of linguistic behaviour. Mouton de Gruyter.
Luger, E., & Sellen, A. (2016). ‘Like having a really bad PA’: The gulf between user expectation and experience of conversational agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 5286–5297). New York, NY, USA: ACM. https://doi.org/10.1145/2858036.2858288.
McCarthy, M., & Carter, R. (2006). This that and the other: Multi-word clusters in spoken English as visible patterns of interaction. Explorations in Corpus Linguistics, 7.
Meah, L. F. S., & Moore, R. K. (2014). The uncanny valley: A focus on misaligned cues. In M. Beetz, B. Johnston, & M.-A. Williams (Eds.), Social robotics (pp. 256–265). Springer International Publishing.
Mitchell, W. J., Szerszen, K. A., Lu, A. S., Schermerhorn, P. W., Scheutz, M., & MacDorman, K. F. (2011). A mismatch in the human realism of face and voice produces an Uncanny Valley. I-Perception, 2(1), 10–12. https://doi.org/10.1068/i0415.
Moore, R. K. (2012). A Bayesian explanation of the ‘Uncanny Valley’ effect and related psychological phenomena. Scientific Reports, 2(1). https://doi.org/10.1038/srep00864.
Moore, R. K. (2015). From talking and listening robots to intelligent communicative machines. In Robots that talk and listen. De Gruyter.
Moore, R. K. (2017a). Appropriate voices for artefacts: Some key insights. In 1st International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots.
Moore, R. K. (2017b). Is spoken language all-or-nothing? Implications for future speech-based human-machine interaction. In Dialogues with Social Robots (pp. 281–291). Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_22.
Mori, M. (1970). The uncanny valley. Energy, 7(4), 33–35.
Mori, M., MacDorman, K. F., & Kageki, N. (2012). The uncanny valley [from the field]. IEEE Robotics and Automation Magazine, 19(2), 98–100.
Porcheron, M., Fischer, J. E., Reeves, S., & Sharples, S. (2018). Voice interfaces in everyday life. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (p. 640). ACM.
Porcheron, M., Fischer, J. E., & Sharples, S. (2017). ‘Do animals have accents?’: Talking with agents in multi-party conversation (pp. 207–219). ACM Press. https://doi.org/10.1145/2998181.2998298.
Strait, M., Canning, C., & Scheutz, M. (2014). Let me tell you! Investigating the effects of robot communication strategies in advice-giving situations based on robot appearance, interaction modality and distance (pp. 479–486). ACM Press. https://doi.org/10.1145/2559636.2559670.
Torrey, C., Fussell, S. R., & Kiesler, S. (2013). How a robot should give advice (pp. 275–282). IEEE. https://doi.org/10.1109/HRI.2013.6483599
Trappes-Lomax, H. (2007). Vague language as a means of self-protective avoidance: Tension management in conference talks. In Vague language explored (pp. 117–137). Springer.
Wang, N., Johnson, W. L., Mayer, R. E., Rizzo, P., Shaw, E., & Collins, H. (2008). The politeness effect: Pedagogical agents and learning outcomes. International Journal of Human-Computer Studies, 66(2), 98–112. https://doi.org/10.1016/j.ijhcs.2007.09.003.
Watts, R. J. (2003). Politeness. Cambridge University Press.
Zuckerman, M., & Driver, R. E. (1988). What sounds beautiful is good: The vocal attractiveness stereotype. Journal of Nonverbal Behavior, 13(2), 67–82. https://doi.org/10.1007/BF00990791.
Acknowledgments
This research was funded by a New Horizons grant from the Irish Research Council entitled “The COG-SIS Project: Cognitive effects of Speech Interface Synthesis” (Grant R17339).
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Clark, L., Ofemile, A., Cowan, B.R. (2021). Exploring Verbal Uncanny Valley Effects with Vague Language in Computer Speech. In: Weiss, B., Trouvain, J., Barkat-Defradas, M., Ohala, J.J. (eds) Voice Attractiveness. Prosody, Phonology and Phonetics. Springer, Singapore. https://doi.org/10.1007/978-981-15-6627-1_17
DOI: https://doi.org/10.1007/978-981-15-6627-1_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6626-4
Online ISBN: 978-981-15-6627-1
eBook Packages: Social Sciences (R0)