Automatic induction of language model data for a spoken dialogue system

Wang, Chao; Chung, Grace; Seneff, Stephanie

doi:10.1007/s10579-006-9007-3

Original Paper
Published: 08 November 2006

Volume 40, pages 25–46 (2006)
Language Resources and Evaluation

Chao Wang¹,
Grace Chung² &
Stephanie Seneff¹

Abstract

In this paper, we address the issue of generating in-domain language model training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. In the second stage, two sampling methods are explored to filter the synthetic corpus to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions from the raw data by modelling after a small set of dialogues produced by the developers during the course of system refinement. Evaluation is conducted on recognition performance in a restaurant information domain. We show that a partial match to usage-appropriate semantic content distribution can be achieved via user simulations. Furthermore, word error rate can be reduced when limited amounts of in-domain training data are augmented with synthetic data derived by our methods.
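The second-stage idea of filtering a synthetic corpus toward a desired class-level distribution can be illustrated with a small sketch. This is not the authors' implementation: the `(utterance, semantic_class)` corpus format, the class labels, the function name `sample_to_target`, and the target probabilities are all hypothetical, and the target distribution is simply assumed to be given (for example, estimated from simulated dialogues).

```python
import random
from collections import Counter, defaultdict


def sample_to_target(corpus, target_dist, n_samples, seed=0):
    """Draw n_samples utterances so class frequencies follow target_dist.

    corpus: list of (utterance, semantic_class) pairs (hypothetical format).
    target_dist: dict mapping semantic_class -> desired probability.
    """
    rng = random.Random(seed)

    # Group the synthetic utterances by their semantic class.
    by_class = defaultdict(list)
    for utterance, sem_class in corpus:
        by_class[sem_class].append(utterance)

    # Keep only target classes that actually occur in the corpus.
    classes = [c for c in target_dist if by_class[c]]
    weights = [target_dist[c] for c in classes]

    sampled = []
    for _ in range(n_samples):
        # Pick a class according to the target distribution,
        # then pick one utterance uniformly within that class.
        sem_class = rng.choices(classes, weights=weights, k=1)[0]
        sampled.append((rng.choice(by_class[sem_class]), sem_class))
    return sampled


# Toy synthetic corpus for a restaurant domain (invented examples).
corpus = [
    ("show me chinese restaurants in boston", "cuisine_query"),
    ("any thai places near harvard square", "cuisine_query"),
    ("what is the phone number", "attribute_query"),
    ("thank you goodbye", "closing"),
]
# Assumed target distribution over semantic classes.
target = {"cuisine_query": 0.6, "attribute_query": 0.3, "closing": 0.1}

sample = sample_to_target(corpus, target, n_samples=1000)
counts = Counter(sem_class for _, sem_class in sample)
```

Sampling with replacement keeps the sketch short; the resulting `sample` could then serve as language model training text whose class frequencies approximate `target`.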

Notes

The transcripts of the flight domain speech data will be made available for research purposes. Check the author’s website at http://people.csail.mit.edu/wangc for updates.

Author information

Authors and Affiliations

MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA, 02139, USA
Chao Wang & Stephanie Seneff
Corporation for National Research Initiatives, 1895 Preston White Drive, Suite 100, Reston, VA, 22209, USA
Grace Chung

Corresponding author

Correspondence to Chao Wang.

Additional information

The research at MIT was supported by an industrial consortium supporting the MIT Oxygen Alliance. The research at CNRI was supported in part by SPAWAR SSC-SD. The content of this paper does not necessarily reflect the position or policy of the Government, and no official endorsement should be inferred.

About this article

Cite this article

Wang, C., Chung, G. & Seneff, S. Automatic induction of language model data for a spoken dialogue system. Lang Resources & Evaluation 40, 25–46 (2006). https://doi.org/10.1007/s10579-006-9007-3

Published: 08 November 2006
Issue Date: February 2006
DOI: https://doi.org/10.1007/s10579-006-9007-3
