Improving command and control speech recognition on mobile devices: using predictive user models for language modeling

Original Paper

Abstract

Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse more than other contacts during that time. In this paper, we describe and assess statistical models learned from a large population of users for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate their performance in terms of task completion. The best performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization significantly increased relative reduction in error rate by an additional 5%.
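The adaptation idea in the abstract, weighting grammar phrases by how often the user has chosen them in the current context, can be sketched as a simple conditional estimate. The code below is a minimal illustrative sketch, not the paper's actual model: the function name, the flat per-hour conditioning, and the add-one smoothing are all assumptions, and the resulting probabilities stand in for per-phrase weights in a C&C grammar.

```python
from collections import Counter

def contact_priors(call_log, hour, smoothing=1.0):
    """Estimate P(contact | hour) from past calls with add-one smoothing.

    call_log: list of (hour, contact) pairs, hour in 0..23.
    Returns a dict mapping each known contact to a probability that
    could be used as a per-phrase weight in a C&C grammar.
    """
    contacts = {c for _, c in call_log}
    # Count only the calls made in the given hour of day.
    counts = Counter(c for h, c in call_log if h == hour)
    total = sum(counts.values()) + smoothing * len(contacts)
    return {c: (counts[c] + smoothing) / total for c in contacts}

# A user who usually calls a spouse at the end of the workday:
log = [(17, "spouse"), (17, "spouse"), (17, "boss"), (9, "boss")]
priors = contact_priors(log, hour=17)
# At hour 17, "spouse" receives a higher weight than "boss".
```

Smoothing keeps every contact reachable even when the history is sparse, which matters in a fixed-grammar recognizer: a zero weight would make a valid command unrecognizable.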

Keywords

Command and control · Language modeling · Speech recognition · Predictive user models



Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  1. Microsoft Research, Redmond, USA
