Abstract
We describe a novel dialogue strategy that enables robust interaction in noisy environments, where automatic speech recognition (ASR) results are not necessarily reliable. We have developed a method that exploits utterance timing together with ASR results to interpret user intention, that is, to identify which of the items enumerated by the system the user wants to indicate. The timing of utterances containing referential expressions is approximated by a Gamma distribution, which is integrated with ASR results by expressing both as probabilities. In this paper, we extend the method to improve identification accuracy. First, we enable the interpretation of utterances that include ordinal numbers, which appear several times in the data we collected from users. We then use appropriate acoustic models and parameters. These changes improve identification accuracy by 4.0% in total. We also show that Latent Semantic Mapping (LSM) enables our framework to handle a wider range of expressions.
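The core idea of combining barge-in timing with ASR evidence as probabilities can be illustrated with a short sketch. This is not the authors' implementation: the Gamma shape and scale values, the per-item ASR probabilities, and the rule of multiplying the two likelihoods and renormalizing are all hypothetical choices made for illustration, since the abstract does not give the actual parameters or integration formula. The helpers `gamma_pdf` and `identify_item` are likewise hypothetical names.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution; 0 for x <= 0."""
    if x <= 0:
        return 0.0
    log_p = ((shape - 1.0) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_p)


def identify_item(barge_in_time, item_onsets, asr_probs, shape=2.0, scale=0.5):
    """Pick the enumerated item a user most likely refers to (illustrative only).

    barge_in_time: when the user's utterance started (seconds).
    item_onsets:   when the system started reading each item (seconds).
    asr_probs:     hypothetical per-item probability derived from the ASR result.
    shape, scale:  hypothetical Gamma parameters for the delay between an
                   item's onset and the user's utterance.
    """
    if len(item_onsets) != len(asr_probs):
        raise ValueError("need one ASR probability per item")
    # Timing likelihood: how plausible is this delay if the user meant item i?
    timing = [gamma_pdf(barge_in_time - onset, shape, scale) for onset in item_onsets]
    # Assumed combination: treat timing and ASR evidence as independent and multiply.
    joint = [t * a for t, a in zip(timing, asr_probs)]
    total = sum(joint)
    if total == 0.0:
        # No item is plausible from timing alone; fall back to ASR evidence.
        joint, total = list(asr_probs), sum(asr_probs)
        if total <= 0.0:
            raise ValueError("ASR probabilities must not all be zero")
    posterior = [j / total for j in joint]
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior


# The user says "that one" (no item name, so the ASR evidence is uniform)
# 0.6 s after the system starts reading the second item.
best, posterior = identify_item(2.6, [0.0, 2.0, 4.0], [1 / 3, 1 / 3, 1 / 3])
print(best, [round(p, 3) for p in posterior])
```

With uniform ASR evidence, as for a referential expression that names no item, the timing term alone decides: the second item (index 1) wins because a 0.6 s delay lies near the mode of the assumed Gamma(2, 0.5) distribution, while the third item has not started yet and receives zero probability. When the user does name an item, a peaked `asr_probs` vector can outweigh ambiguous timing.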
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Matsuyama, K., Komatani, K., Takahashi, T., Ogata, T., Okuno, H.G. (2010). Improving Identification Accuracy by Extending Acceptable Utterances in Spoken Dialogue System Using Barge-in Timing. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13025-0_60
DOI: https://doi.org/10.1007/978-3-642-13025-0_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13024-3
Online ISBN: 978-3-642-13025-0
eBook Packages: Computer Science, Computer Science (R0)