Is Spoken Language All-or-Nothing? Implications for Future Speech-Based Human-Machine Interaction

  • Roger K. Moore
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 427)


Recent years have seen significant market penetration for voice-based personal assistants such as Apple’s Siri. However, despite this success, user take-up is frustratingly low. This article argues that there is a habitability gap caused by the inevitable mismatch between the capabilities and expectations of human users and the features and benefits provided by contemporary technology. Suggestions are made as to how such problems might be mitigated, but a more worrisome question emerges: “is spoken language all-or-nothing”? The answer, based on contemporary views on the special nature of (spoken) language, is that there may indeed be a fundamental limit to the interaction that can take place between mismatched interlocutors (such as humans and machines). However, it is concluded that interactions between native and non-native speakers, or between adults and children, or even between humans and dogs, might provide critical inspiration for the design of future speech-based human-machine interaction.


Keywords: Spoken language; Habitability gap; Human-machine interaction



This work was supported by the European Commission [grant numbers EU-FP6-507422, EU-FP6-034434, EU-FP7-231868 and EU-FP7-611971], and the UK Engineering and Physical Sciences Research Council [grant number EP/I013512/1].



Copyright information

© Springer Science+Business Media Singapore 2017

Authors and Affiliations

  1. Speech and Hearing Research Group, Department of Computer Science, University of Sheffield, Sheffield, UK
