Efficient Language Model Construction for Spoken Dialog Systems by Inducting Language Resources of Different Languages

  • Conference paper

Abstract

Since the quality of the language model directly affects the performance of a spoken dialog system (SDS), we should use a statistical language model (LM) trained with a large amount of data that is matched to the task domain. When porting an SDS to another language, however, it is costly to re-collect a large amount of user utterances in the target language. We thus use the language resources in a source language by utilizing statistical machine translation. The main challenge in this work is to induct automatic speech recognition results collected using a speech-input system that differs from the target SDS in both the task and the language. To select appropriate sentences to be included in the training data for the LM, we induct a spoken language understanding module of the dialog system in the source language. Experimental construction using over three million user utterances showed that it is vital to conduct a selection from the translation results.
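The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword-based scorer `toy_slu_confidence`, the keyword set, the threshold of 0.2, and the placeholder target tokens (`t1`, `t2`, ...) are all assumptions made for illustration. The sketch keeps only those translations whose source-language utterance the (stand-in) SLU module judges to be in-domain, then collects bigram counts of the kind an n-gram LM would be estimated from.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical stand-in for a source-language SLU module: returns a score in
# [0, 1] for how strongly an utterance appears to belong to the task domain.
DOMAIN_KEYWORDS = {"temple", "station", "restaurant", "bus"}


def toy_slu_confidence(utterance: str) -> float:
    words = utterance.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in DOMAIN_KEYWORDS)
    return hits / len(words)


def select_translations(
    pairs: List[Tuple[str, str]],
    slu: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Keep the translation of every source utterance scored at or above threshold."""
    return [target for source, target in pairs if slu(source) >= threshold]


def bigram_counts(sentences: List[str]) -> Counter:
    """Count bigrams, with sentence-boundary markers, over the selected sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


# (source utterance, machine-translated utterance); target tokens are placeholders.
pairs = [
    ("where is the nearest bus station", "t1 t2 t3"),
    ("what is the weather like today", "t4 t5"),
    ("is there a restaurant near the temple", "t6 t7 t8 t9"),
]

selected = select_translations(pairs, toy_slu_confidence, threshold=0.2)
counts = bigram_counts(selected)
```

Here the out-of-domain weather query is dropped before any counts are collected, which is the point the abstract makes about selecting from the translation results rather than training on all of them.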

Notes

  1. http://translate.google.com/.

  2. Although the speech-to-speech translation system is oriented to a travel domain, the log data contains many irrelevant inputs.

  3. http://mastar.jp/translation/voicetra-en.html.

  4. http://crfpp.sourceforge.net.

  5. http://www.csie.ntu.edu.tw/~cjlin/liblinear/.

  6. This translation system has been used in our speech-to-speech translation application called “VoiceTra,” http://mastar.jp/translation/index-en.html.

  7. The lexicon, the acoustic model, and language model WFSTs are composed to generate a large WFST for recognition.

Author information

Corresponding author

Correspondence to Teruhisa Misu.

Copyright information

© 2014 Springer Science+Business Media New York

About this paper

Cite this paper

Misu, T., Matsuda, S., Mizukami, E., Kashioka, H., Li, H. (2014). Efficient Language Model Construction for Spoken Dialog Systems by Inducting Language Resources of Different Languages. In: Mariani, J., Rosset, S., Garnier-Rizet, M., Devillers, L. (eds) Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8280-2_10

  • DOI: https://doi.org/10.1007/978-1-4614-8280-2_10

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-8279-6

  • Online ISBN: 978-1-4614-8280-2

  • eBook Packages: Engineering, Engineering (R0)
