Concept Discovery and Automatic Semantic Annotation for Language Understanding in an Information-Query Dialogue System Using Latent Dirichlet Allocation and Segmental Methods

Camelin, Nathalie; Detienne, Boris; Huet, Stéphane; Quadri, Dominique; Lefèvre, Fabrice

doi:10.1007/978-3-642-37186-8_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 348)

Included in the following conference series:

International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Abstract

Efficient statistical approaches have recently been proposed for natural language understanding in the context of dialogue systems. However, these approaches are trained on data semantically annotated at the segmental level, which increases the production cost of these resources. This kind of semantic annotation requires both determining the concepts in a sentence and linking them to their corresponding word segments. In this paper, we propose a two-step automatic method for semantic annotation. The first step is an implementation of latent Dirichlet allocation aimed at discovering concepts in a dialogue corpus. This knowledge is then used as a bootstrap to automatically infer a segmentation of a word sequence into concepts, using either integer linear optimisation or stochastic word alignment models (IBM models). The relation between automatically derived and manually defined task-dependent concepts is evaluated on a spoken dialogue task with a reference annotation.
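
To make the second step more concrete, the following is a minimal sketch of IBM Model 1, the simplest of the stochastic word alignment models mentioned above, used here to link each word of a sentence to one of the concepts attached to that sentence. It is an illustration under stated assumptions, not the authors' implementation: the function names train_ibm1 and align, the English toy corpus, and the concept labels LODGING, CITY and DATE are all hypothetical, the sentence-level concept sets are assumed to be already available (for example from the concept discovery step), and the NULL token and the higher-order IBM models are omitted for brevity.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t(word | concept) with IBM Model 1 EM.

    pairs: list of (words, concepts); concepts is the unordered list of
    concept labels assumed to be attached to the sentence (hypothetical
    input format chosen for this sketch).
    """
    vocab = {w for words, _ in pairs for w in words}
    # Uniform initialisation of t(word | concept).
    t = defaultdict(lambda: 1.0 / len(vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (word, concept)
        total = defaultdict(float)  # expected count of each concept
        for words, concepts in pairs:
            for w in words:
                # E-step: posterior over which concept generated w.
                norm = sum(t[(w, c)] for c in concepts)
                for c in concepts:
                    p = t[(w, c)] / norm
                    count[(w, c)] += p
                    total[c] += p
        # M-step: renormalise the expected counts per concept.
        t = defaultdict(
            float, {(w, c): n / total[c] for (w, c), n in count.items()}
        )
    return t


def align(words, concepts, t):
    """Attach each word to its most probable concept under t."""
    return [(w, max(concepts, key=lambda c: t[(w, c)])) for w in words]


# Invented toy corpus: sentences paired with unaligned concept labels.
corpus = [
    (["hotel", "paris"], ["LODGING", "CITY"]),
    (["hotel"], ["LODGING"]),
    (["paris", "monday"], ["CITY", "DATE"]),
    (["monday"], ["DATE"]),
]

t = train_ibm1(corpus)
alignment = align(["hotel", "paris"], ["LODGING", "CITY"], t)
print(alignment)
```

On this toy corpus the sketch attaches "hotel" to LODGING and "paris" to CITY, because the single-concept sentences disambiguate the co-occurrence counts during EM. Grouping adjacent words that share a concept would then yield a segmentation; that grouping step is not shown here.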

Author information

Authors and Affiliations

LIUM, Université du Maine, Le Mans, France
Nathalie Camelin
LIA-CERI, Université d’Avignon, Avignon, France
Boris Detienne, Stéphane Huet, Dominique Quadri & Fabrice Lefèvre

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av.Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, Henley Business School, University of Reading, RG6 6UD, Reading, UK
Kecheng Liu
INSTICC and IPS, Estefanilha, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Camelin, N., Detienne, B., Huet, S., Quadri, D., Lefèvre, F. (2013). Concept Discovery and Automatic Semantic Annotation for Language Understanding in an Information-Query Dialogue System Using Latent Dirichlet Allocation and Segmental Methods. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_3

DOI: https://doi.org/10.1007/978-3-642-37186-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37185-1
Online ISBN: 978-3-642-37186-8
eBook Packages: Computer Science, Computer Science (R0)
