Improving Identification Accuracy by Extending Acceptable Utterances in Spoken Dialogue System Using Barge-in Timing

Matsuyama, Kyoko; Komatani, Kazunori; Takahashi, Toru; Ogata, Tetsuya; Okuno, Hiroshi G.

doi:10.1007/978-3-642-13025-0_60

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6097)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

We describe a novel dialogue strategy that enables robust interaction in noisy environments, where automatic speech recognition (ASR) results are not necessarily reliable. We have developed a method that exploits utterance timing together with ASR results to interpret user intention, that is, to identify the one item that a user wants to indicate from a list enumerated by the system. The timing of utterances containing referential expressions is approximated by a Gamma distribution, which is integrated with ASR results by expressing both as probabilities. In this paper, we improve the identification accuracy by extending the method. First, we enable interpretation of utterances that include ordinal numbers, which appear several times in the data we collected from users. Then we use proper acoustic models and parameters, improving the identification accuracy by 4.0% in total. We also show that Latent Semantic Mapping (LSM) enables more expressions to be handled in our framework.
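
The integration of utterance timing with ASR results described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the delay between the start of an enumerated item and the user's barge-in follows a Gamma distribution, and that the timing likelihood and a per-item ASR probability are combined by multiplication and then normalized. The function names, the combination rule, and the shape and scale values are all hypothetical; the abstract does not give them.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def identify_item(item_start_times, utterance_time, asr_scores,
                  shape=2.0, scale=1.0):
    """Return (best_index, posteriors) over the enumerated items.

    item_start_times: time in seconds at which the system began each item.
    utterance_time:   time in seconds at which the user's barge-in started.
    asr_scores:       per-item probability derived from the ASR result.
    shape, scale:     hypothetical Gamma parameters, not values from the paper.
    """
    joint = []
    for start, asr in zip(item_start_times, asr_scores):
        delay = utterance_time - start  # negative if the item was not yet read
        joint.append(gamma_pdf(delay, shape, scale) * asr)

    total = sum(joint)
    if total == 0:
        # No item is plausible under this model: fall back to a uniform guess.
        posteriors = [1.0 / len(joint)] * len(joint)
    else:
        posteriors = [j / total for j in joint]

    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best, posteriors


if __name__ == "__main__":
    # Three items read out at 0 s, 3 s and 6 s; the user barges in at 4 s.
    best, post = identify_item([0.0, 3.0, 6.0], 4.0, [1 / 3, 1 / 3, 1 / 3])
    print(best, [round(p, 3) for p in post])
```

With uninformative ASR scores, timing alone decides and the item most recently started before the barge-in is preferred. A sufficiently confident ASR score for another item can override the timing evidence, which is the point of expressing both sources as probabilities.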

Author information

Authors and Affiliations

Graduate School of Informatics, Kyoto University, Kyoto, Japan
Kyoko Matsuyama, Kazunori Komatani, Toru Takahashi, Tetsuya Ogata & Hiroshi G. Okuno

Editor information

Editors and Affiliations

Dept. of Computing and Numerical Analysis, University of Cordoba, Campus Universitario de Rabanales, Einstein Building, 3rd floor, 14071, Cordoba, Spain
Nicolás García-Pedrajas
Dept. of Computer Science and Artificial Intelligence, ETS de Ingenierias Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
Francisco Herrera
School of Computing, University of the West of Scotland, PA1 2BE, Paisley, UK
Colin Fyfe
Dept. Computer Science and Artificial Intelligence, ETS de Ingenierias Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
José Manuel Benítez
Department of Computer Science, Texas State University-San Marcos, 601 University Drive, TX 78666-4616, San Marcos, USA
Moonis Ali

About this paper

Cite this paper

Matsuyama, K., Komatani, K., Takahashi, T., Ogata, T., Okuno, H.G. (2010). Improving Identification Accuracy by Extending Acceptable Utterances in Spoken Dialogue System Using Barge-in Timing. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13025-0_60

DOI: https://doi.org/10.1007/978-3-642-13025-0_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13024-3
Online ISBN: 978-3-642-13025-0
eBook Packages: Computer Science, Computer Science (R0)
