Spoken Language Understanding for Natural Interaction: The Siri Experience

  • Jerome R. Bellegarda
Conference paper


Recent advances in software integration, along with efforts toward greater personalization and context awareness, have brought the long-standing vision of the ubiquitous intelligent personal assistant closer to reality. This is particularly salient for smartphones and electronic tablets, where natural language interaction has the potential to considerably enhance the mobile experience. Far beyond merely offering more user interface options, this trend may well usher in a genuine paradigm shift in man-machine communication. This contribution reviews the two major semantic interpretation frameworks underpinning natural language interaction, along with their respective advantages and drawbacks. It then discusses the choices made in Siri, Apple’s personal assistant on the iOS platform, and speculates on how the current implementation might evolve in the near future to best mitigate any remaining drawbacks.


Keywords: Belief State; Semantic Interpretation; Partially Observable Markov Decision Process; Natural Language Understanding; Dialog Management
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Apple Inc., One Infinite Loop, Cupertino, USA
