Machine Learning, Volume 60, Issues 1–3, pp. 97–115

Online Multiclass Learning with k-Way Limited Feedback and an Application to Utterance Classification



This paper introduces a setting for multiclass online learning with limited feedback and applies it to utterance classification. In this setting, a parameter k limits the number of choices presented for selection by the environment (e.g. by the user of an interactive spoken system) on each trial of the online learning sequence. We present new versions of the standard additive and multiplicative weight update algorithms for online learning that are better suited to the limited feedback setting while retaining the efficiency advantages of the standard ones. The algorithms are evaluated on an utterance classification task in two domains. In this task, no in-domain training material is available before online learning begins, for either the speech recognizer or the classifier. We present experiments on how varying k and the choice of weight update algorithm affect the learning curve for online utterance classification. In these experiments, the new online learning algorithms achieve higher classification accuracy than the standard ones. The methods presented are directly relevant to applications such as building call routing systems that adapt from feedback rather than being trained in batch mode.
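To make the k-way feedback protocol concrete, here is a minimal Python sketch under stated assumptions. The learner keeps one weight per (class, feature) pair, shows the k top-scoring classes, and then receives either the class the user selected or "none of these". The class `KWayLearner`, the helpers `simulated_user` and `run_online`, the parameters `rate` and `alpha`, and both promote/demote rules (perceptron-style additive and Winnow-style multiplicative) are hypothetical illustrations. They are not the update algorithms introduced in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


class KWayLearner:
    """Hypothetical linear multiclass learner trained only from k-way feedback.

    The promote/demote rules are illustrative perceptron/Winnow-style heuristics,
    not the update algorithms proposed in the paper.
    """

    def __init__(self, classes: List[str], k: int, multiplicative: bool = False,
                 rate: float = 1.0, alpha: float = 2.0) -> None:
        self.classes = list(classes)
        self.k = k
        self.multiplicative = multiplicative
        self.rate = rate    # additive step size (assumed value)
        self.alpha = alpha  # multiplicative promotion factor (assumed value)
        init = 1.0 if multiplicative else 0.0
        # weights[c][f]: weight of feature f for class c; unseen features start at init.
        self.weights: Dict[str, Dict[str, float]] = {
            c: defaultdict(lambda: init) for c in self.classes
        }

    def score(self, c: str, feats: Set[str]) -> float:
        return sum(self.weights[c][f] for f in feats)

    def present(self, feats: Set[str]) -> List[str]:
        """Return the k highest-scoring classes (sort is stable, so ties keep class order)."""
        return sorted(self.classes, key=lambda c: -self.score(c, feats))[: self.k]

    def _adjust(self, c: str, feats: Set[str], promote: bool) -> None:
        for f in feats:
            if self.multiplicative:
                self.weights[c][f] *= self.alpha if promote else 1.0 / self.alpha
            else:
                self.weights[c][f] += self.rate if promote else -self.rate

    def update(self, feats: Set[str], shown: List[str], chosen: Optional[str]) -> None:
        if chosen is None:
            # "None of these": the true class stays unknown, but every shown
            # class is known to be wrong, so demote all of them.
            for c in shown:
                self._adjust(c, feats, promote=False)
        elif chosen != shown[0]:
            # The correct class was shown but not ranked first:
            # promote it and demote the top-ranked class.
            self._adjust(chosen, feats, promote=True)
            self._adjust(shown[0], feats, promote=False)
        # chosen == shown[0]: top-1 prediction was right; no update (mistake-driven).


def simulated_user(shown: List[str], true_label: str) -> Optional[str]:
    """Stand-in for the environment: select the true label if shown, else None."""
    return true_label if true_label in shown else None


def run_online(learner: KWayLearner,
               stream: Iterable[Tuple[Set[str], str]]) -> int:
    """One online pass over (features, label) pairs; returns the number of top-1 mistakes."""
    errors = 0
    for feats, label in stream:
        shown = learner.present(feats)
        if shown[0] != label:
            errors += 1
        learner.update(feats, shown, simulated_user(shown, label))
    return errors
```

For example, `run_online(KWayLearner(["billing", "repairs", "sales"], k=2), stream)` trains on a stream of (word set, label) pairs, with `multiplicative=True` switching to the Winnow-style rule. In this sketch a larger k makes it more likely that the correct class is among the choices, so the learner more often gets a positive label instead of only "none of these", at the cost of a longer list for the user to read.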


Keywords: online learning, limited feedback, utterance classification, call routing



© Springer Science + Business Media, Inc. 2005

Author affiliation: Google, Inc., New York, USA
