Automatic induction of language model data for a spoken dialogue system

  • Original Paper
  • Language Resources and Evaluation

Abstract

In this paper, we address the issue of generating in-domain language model training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. In the second stage, two sampling methods are explored to filter the synthetic corpus to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions from the raw data by modelling after a small set of dialogues produced by the developers during the course of system refinement. Evaluation is conducted on recognition performance in a restaurant information domain. We show that a partial match to usage-appropriate semantic content distribution can be achieved via user simulations. Furthermore, word error rate can be reduced when limited amounts of in-domain training data are augmented with synthetic data derived by our methods.
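The second-stage idea of filtering a synthetic corpus toward a desired distribution of semantic content can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_to_target`, the (utterance, label) corpus format, and the hand-written target distribution are all assumptions made for illustration. In the paper the distribution is obtained via user simulation or dialogue resynthesis rather than specified by hand.

```python
import random
from collections import defaultdict


def sample_to_target(corpus, target, n, seed=0):
    """Draw n utterances so that semantic labels follow `target`.

    corpus: list of (utterance, semantic_label) pairs (hypothetical format).
    target: dict mapping semantic_label -> desired probability (assumed given).
    """
    rng = random.Random(seed)

    # Group the synthetic utterances by their semantic label.
    by_label = defaultdict(list)
    for utterance, label in corpus:
        by_label[label].append(utterance)

    # Only labels present in the corpus can be sampled; random.choices
    # normalizes the remaining weights implicitly.
    labels = [lab for lab in target if lab in by_label]
    weights = [target[lab] for lab in labels]

    sampled = []
    for lab in rng.choices(labels, weights=weights, k=n):
        # Pick one surface form uniformly within the chosen label.
        sampled.append((rng.choice(by_label[lab]), lab))
    return sampled


# Toy restaurant-domain corpus; utterances and labels are invented examples.
corpus = [
    ("show me italian restaurants", "request_cuisine"),
    ("i want thai food", "request_cuisine"),
    ("what is the phone number", "request_phone"),
    ("where is it located", "request_address"),
]
target = {"request_cuisine": 0.7, "request_phone": 0.2, "request_address": 0.1}
filtered = sample_to_target(corpus, target, n=10000)
```

Sampling with replacement keeps the sketch short; a real filter might instead sample without replacement or weight individual sentences.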

Notes

  1. The transcripts of the flight domain speech data will be made available for research purposes. Check the author’s website at http://people.csail.mit.edu/wangc for updates.

Author information

Corresponding author

Correspondence to Chao Wang.

Additional information

The research at MIT was supported by an industrial consortium supporting the MIT Oxygen Alliance. The research at CNRI was supported in part by SPAWAR SSC-SD. The content of this paper does not necessarily reflect the position or policy of the Government, and no official endorsement should be inferred.

About this article

Cite this article

Wang, C., Chung, G. & Seneff, S. Automatic induction of language model data for a spoken dialogue system. Lang Resources & Evaluation 40, 25–46 (2006). https://doi.org/10.1007/s10579-006-9007-3

  • DOI: https://doi.org/10.1007/s10579-006-9007-3
