Application and Evaluation of a Conditioned Hidden Markov Model for Estimating Interaction Quality of Spoken Dialogue Systems

  • Stefan Ultes
  • Robert ElChab
  • Wolfgang Minker
Conference paper


The interaction quality (IQ) metric has recently been introduced for measuring the quality of spoken dialogue systems (SDSs) on the exchange level. While previous work relied on support vector machines (SVMs), we evaluate a conditioned hidden Markov model (CHMM), which accounts for the sequential character of the data and, in contrast to a regular hidden Markov model (HMM), provides class probabilities. The CHMM achieves an unweighted average recall (UAR) of 0.39 and is outperformed by a regular HMM with a UAR of 0.44 and by an SVM with a UAR of 0.49, both trained and evaluated under the same conditions.
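The unweighted average recall used above is the plain mean of the per-class recalls, so every IQ class counts equally no matter how often it occurs in the data. The following is a minimal Python sketch, assuming reference and predicted labels are given as equal-length lists. The function name `unweighted_average_recall` and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true.

    Illustrative sketch: each class contributes equally, independent of its
    frequency, which is what distinguishes UAR from plain accuracy.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled exchange is required")

    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of class c = correct[c] / total[c]; UAR averages these unweighted.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced labels; the predictor always says 1.
    y_true = [1, 1, 1, 1, 2, 3]
    y_pred = [1, 1, 1, 1, 1, 1]
    print(unweighted_average_recall(y_true, y_pred))
```

On this toy data the majority-class predictor reaches an accuracy of 4/6 but a UAR of only 1/3, because classes 2 and 3 each have a recall of 0. This sensitivity to minority classes is the usual reason for preferring UAR when class frequencies are skewed.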





Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Institute of Communications Technology, Ulm, Germany
