Is Spoken Language All-or-Nothing? Implications for Future Speech-Based Human-Machine Interaction

Moore, Roger K.

doi:10.1007/978-981-10-2585-3_22

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 427)

Abstract

Recent years have seen significant market penetration for voice-based personal assistants such as Apple’s Siri. However, despite this success, user take-up is frustratingly low. This article argues that there is a habitability gap caused by the inevitable mismatch between the capabilities and expectations of human users and the features and benefits provided by contemporary technology. Suggestions are made as to how such problems might be mitigated, but a more worrisome question emerges: “is spoken language all-or-nothing”? The answer, based on contemporary views on the special nature of (spoken) language, is that there may indeed be a fundamental limit to the interaction that can take place between mismatched interlocutors (such as humans and machines). However, it is concluded that interactions between native and non-native speakers, or between adults and children, or even between humans and dogs, might provide critical inspiration for the design of future speech-based human-machine interaction.

Notes

1. See [1] for a comprehensive review of the history of speech technology R&D up to, and including, the release of Siri.
2. It is often argued that such an approach is unimportant as users will habituate. However, habituation only occurs after sustained exposure, and a key issue here is how to increase the effectiveness of first encounters (since that has a direct impact on the likelihood of further usage).
3. Interestingly, these ideas do appear to be having some impact on the design of contemporary autonomous social agents such as Jibo (which has a childlike and mildly robotic voice) [28].
4. Members of the same species.
5. Interestingly, Nass and Brave [8] noted that people speak to poor automatic speech recognition systems as if they were non-native listeners.
6. Unfortunately, this term has already been coined to refer to a robot’s natural language abilities in robot-robot and robot-human communication [54].

References

1. Pieraccini, R.: The Voice in the Machine. MIT Press, Cambridge (2012)
2. Liao, S.-H.: Awareness and Usage of Speech Technology. Masters thesis, Dept. Computer Science, University of Sheffield (2015)
3. Deng, L., Huang, X.: Challenges in adopting speech recognition. Commun. ACM 47(1), 69–75 (2004)
4. Minker, W., Pittermann, J., Pittermann, A., Strauß, P.-M., Bühler, D.: Challenges in speech-based human-computer interfaces. Int. J. Speech Technol. 10(2–3), 109–119 (2007)
5. Gales, M., Young, S.J.: The application of hidden Markov models in speech recognition. Found. Trends Signal Process. 1(3), 195–304 (2007)
6. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. (2012)
7. Moore, R.K.: Modelling data entry rates for ASR and alternative input methods. In: Proceedings of the INTERSPEECH-ICSLP, Jeju, Korea (2004)
8. Nass, C., Brave, S.: Wired for Speech: How Voice Activates and Advances the Human-computer Relationship. MIT Press, Cambridge (2005)
9. Moore, R.K.: From talking and listening robots to intelligent communicative machines. In: Markowitz, J. (ed.) Robots That Talk and Listen, pp. 317–335. De Gruyter, Boston (2015)
10. Bernsen, N.O., Dybkjaer, H., Dybkjaer, L.: Designing Interactive Speech Systems: From First Ideas to User Testing. Springer, London (1998)
11. McTear, M.F.: Spoken Dialogue Technology: Towards the Conversational User Interface. Springer, London (2004)
12. Lopez Cozar Delgado, R.: Spoken, Multilingual and Multimodal Dialogue Systems: Development and Assessment. Wiley (2005)
13. Philips, M.: Applications of spoken language technology and systems. In: Gilbert, M., Ney, H. (eds.) IEEE/ACL Workshop on Spoken Language Technology (SLT) (2006)
14. Tomko, S., Harris, T.K., Toth, A., Sanders, J., Rudnicky, A., Rosenfeld, R.: Towards efficient human machine speech communication. ACM Trans. Speech Lang. Process. 2(1), 1–27 (2005)
15. Tomko, S.L.: Improving User Interaction with Spoken Dialog Systems via Shaping. Ph.D. Thesis, Carnegie Mellon University (2006)
16. Komatani, K., Fukubayashi, Y., Ogata, T., Okuno, H.G.: Introducing utterance verification in spoken dialogue system to improve dynamic Help generation for novice users. In: Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, pp. 202–205 (2007)
17. Schlangen, D., Skantze, G.: A general, abstract model of incremental dialogue processing. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-09), Athens, Greece (2009)
18. Hastie, H., Lemon, O., Dethlefs, N.: Incremental spoken dialogue systems: tools and data. In: Proceedings of NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community, pp. 15–16, Montreal, Canada (2012)
19. Williams, J.D., Young, S.J.: Partially observable Markov decision processes for spoken dialog systems. Comput. Speech Lang. 21(2), 231–422 (2007)
20. Gašić, M., Breslin, C., Henderson, M., Kim, D., Szummer, M., Thomson, B., Tsiakoulis, P., Young, S.J.: POMDP-based dialogue manager adaptation to extended domains. In: Proceedings of 14th SIGdial Meeting on Discourse and Dialogue, pp. 214–222, Metz, France (2013)
21. Mori, M.: Bukimi no tani (the uncanny valley). Energy 7, 33–35 (1970)
22. Moore, R.K.: A Bayesian explanation of the “Uncanny Valley” effect and related psychological phenomena. Nat. Sci. Rep. 2(864) (2012)
23. Moore, R.K., Maier, V.: Visual, vocal and behavioural affordances: some effects of consistency. In: Proceedings of the 5th International Conference on Cognitive Systems (CogSys 2012), Vienna (2012)
24. Gibson, J.J.: The theory of affordances. In: Shaw, R., Bransford, J. (eds.) Perceiving, Acting, and Knowing: Toward an Ecological Psychology, pp. 67–82. Lawrence Erlbaum, Hillsdale (1977)
25. Worgan, S., Moore, R.K.: Speech as the perception of affordances. Ecolog. Psychol. 22(4), 327–343 (2010)
26. Balentine, B.: It’s Better to Be a Good Machine Than a Bad Person: Speech Recognition and Other Exotic User Interfaces at the Twilight of the Jetsonian Age. ICMI Press, Annapolis (2007)
27. Moore, R.K., Morris, A.: Experiences collecting genuine spoken enquiries using WOZ techniques. In: Proceedings of the 5th DARPA Workshop on Speech and Natural Language, New York (1992)
28. Jibo: The World’s First Social Robot for the Home. https://www.jibo.com
29. Jokinen, K., Hurtig, T.: User expectations and real experience on a multimodal interactive system. In: Proceedings of the INTERSPEECH-ICSLP Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, USA (2006)
30. Gardiner, A.H.: The Theory of Speech and Language. Oxford University Press, Oxford (1932)
31. Bickerton, D.: Language and Human Behavior. University of Washington Press, Seattle (1995)
32. Hauser, M.D.: The Evolution of Communication. The MIT Press (1997)
33. Hauser, M.D., Chomsky, N., Fitch, W.T.: The faculty of language: what is it, who has it, and how did it evolve? Science 298, 1569–1579 (2002)
34. Everett, D.: Language: The Cultural Tool. Profile Books, London (2012)
35. Moore, R.K.: Spoken language processing: piecing together the puzzle. Speech Commun. 49(5), 418–435 (2007)
36. Maturana, H.R., Varela, F.J.: The Tree of Knowledge: The Biological Roots of Human Understanding. New Science Library/Shambhala Publications, Boston (1987)
37. Cummins, F.: Voice, (inter-)subjectivity, and real time recurrent interaction. Front. Psychol. 5, 760 (2014)
38. Bickhard, M.H.: Language as an interaction system. New Ideas Psychol. 25(2), 171–187 (2007)
39. Cowley, S.J. (ed.): Distributed Language. John Benjamins Publishing Company (2011)
40. Fusaroli, R., Raczaszek-Leonardi, J., Tylén, K.: Dialog as interpersonal synergy. New Ideas Psychol. 32, 147–157 (2014)
41. Scott-Phillips, T.: Speaking Our Minds: Why Human Communication Is Different, and How Language Evolved to Make It Special. Palgrave MacMillan (2015)
42. Baron-Cohen, S.: Evolution of a theory of mind? In: Corballis, M., Lea, S. (eds.) The Descent of Mind: Psychological Perspectives on Hominid Evolution. Oxford University Press (1999)
43. Malle, B.F.: The relation between language and theory of mind in development and evolution. In: Givón, T., Malle, B.F. (eds.) The Evolution of Language out of Pre-Language, pp. 265–284. Benjamins, Amsterdam (2002)
44. Lakoff, G., Johnson, M.: Metaphors We Live By. University of Chicago Press, Chicago (1980)
45. Feldman, J.A.: From Molecules to Metaphor: A Neural Theory of Language. Bradford Books (2008)
46. Levinson, S.C.: Pragmatics. Cambridge University Press, Cambridge (1983)
47. Friston, K., Kiebel, S.: Predictive coding under the free-energy principle. Phil. Trans. R. Soc. B 364(1521), 1211–1221 (2009)
48. Rizzolatti, G., Craighero, L.: The mirror-neuron system. Annu. Rev. Neurosci. 27, 169–192 (2004)
49. Wilson, M., Knoblich, G.: The case for motor involvement in perceiving conspecifics. Psychol. Bull. 131(3), 460–473 (2005)
50. Pickering, M.J., Garrod, S.: Do people use language production to make predictions during comprehension? Trends Cogn. Sci. 11(3), 105–110 (2007)
51. Garrod, S., Gambi, C., Pickering, M.J.: Prediction at all levels: forward model predictions can enhance comprehension. Lang. Cogn. Neurosci. 29(1), 46–48 (2013)
52. Moore, R.K.: Introducing a pictographic language for envisioning a rich variety of enactive systems with different degrees of complexity. Int. J. Adv. Robot. Syst. 13(74) (2016)
53. Fernald, A.: Four-month-old infants prefer to listen to Motherese. Infant Behav. Dev. 8, 181–195 (1985)
54. Matson, E.T., Taylor, J., Raskin, V., Min, B.-C., Wilson, E.C.: A natural language exchange model for enabling human, agent, robot and machine interaction. In: Proceedings of the 5th International Conference on Automation, Robotics and Applications, pp. 340–345. IEEE (2011)
55. Serpell, J.: The Domestic Dog: Its Evolution, Behaviour and Interactions with People. Cambridge University Press (1995)

Acknowledgements

This work was supported by the European Commission [grant numbers EU-FP6-507422, EU-FP6-034434, EU-FP7-231868 and EU-FP7-611971], and the UK Engineering and Physical Sciences Research Council [grant number EP/I013512/1].

Author information

Authors and Affiliations

Speech and Hearing Research Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
Roger K. Moore

Corresponding author

Correspondence to Roger K. Moore.

Editor information

Editors and Affiliations

Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
Kristiina Jokinen
University of Helsinki, Helsinki, Finland
Graham Wilcock

About this chapter

Cite this chapter

Moore, R.K. (2017). Is Spoken Language All-or-Nothing? Implications for Future Speech-Based Human-Machine Interaction. In: Jokinen, K., Wilcock, G. (eds) Dialogues with Social Robots. Lecture Notes in Electrical Engineering, vol 427. Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_22

DOI: https://doi.org/10.1007/978-981-10-2585-3_22
Published: 25 December 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2584-6
Online ISBN: 978-981-10-2585-3
eBook Packages: Engineering, Engineering (R0)
