Dialogues with Social Robots pp 281-291
Is Spoken Language All-or-Nothing? Implications for Future Speech-Based Human-Machine Interaction
Abstract
Recent years have seen significant market penetration for voice-based personal assistants such as Apple’s Siri. However, despite this success, user take-up is frustratingly low. This article argues that there is a habitability gap caused by the inevitable mismatch between the capabilities and expectations of human users and the features and benefits provided by contemporary technology. Suggestions are made as to how such problems might be mitigated, but a more worrisome question emerges: “is spoken language all-or-nothing”? The answer, based on contemporary views on the special nature of (spoken) language, is that there may indeed be a fundamental limit to the interaction that can take place between mismatched interlocutors (such as humans and machines). However, it is concluded that interactions between native and non-native speakers, or between adults and children, or even between humans and dogs, might provide critical inspiration for the design of future speech-based human-machine interaction.
Keywords
Spoken language, Habitability gap, Human-machine interaction
Notes
Acknowledgements
This work was supported by the European Commission [grant numbers EU-FP6-507422, EU-FP6-034434, EU-FP7-231868 and EU-FP7-611971], and the UK Engineering and Physical Sciences Research Council [grant number EP/I013512/1].