Concept Discovery and Automatic Semantic Annotation for Language Understanding in an Information-Query Dialogue System Using Latent Dirichlet Allocation and Segmental Methods

  • Nathalie Camelin
  • Boris Detienne
  • Stéphane Huet
  • Dominique Quadri
  • Fabrice Lefèvre
Part of the Communications in Computer and Information Science book series (CCIS, volume 348)

Abstract

Efficient statistical approaches have recently been proposed for natural language understanding in dialogue systems. However, these approaches are trained on data that is semantically annotated at the segmental level, which increases the cost of producing these resources. This kind of semantic annotation requires both determining the concepts in a sentence and linking them to their corresponding word segments. In this paper, we propose a two-step automatic method for semantic annotation. The first step is an implementation of latent Dirichlet allocation that aims to discover concepts in a dialogue corpus. This knowledge is then used as a bootstrap to automatically infer a segmentation of a word sequence into concepts, using either integer linear optimisation or stochastic word alignment models (IBM models). The relation between automatically derived and manually defined task-dependent concepts is evaluated on a spoken dialogue task with a reference annotation.
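
To make the second step more concrete, the following is a minimal sketch of how a stochastic word alignment model can link words to concepts. It is a simplified IBM Model 1 (no NULL token) trained with EM, and it is not the authors' implementation. It assumes each utterance already comes with an unordered set of concept labels; in the method described above those labels would come from the latent Dirichlet allocation step, whereas here they are hand-written. The names train_ibm1, align, segments and PAIRS, and the toy utterances, are all hypothetical.

```python
from itertools import groupby


def train_ibm1(pairs, iterations=20):
    """Estimate t(word | concept) with simplified IBM Model 1 EM (no NULL token).

    pairs: list of (words, concepts), each a list of strings.
    """
    vocab = {w for words, _ in pairs for w in words}
    # Uniform start for every (word, concept) pair seen in the same utterance.
    t = {(w, c): 1.0 / len(vocab)
         for words, concepts in pairs for w in words for c in concepts}
    for _ in range(iterations):
        count = dict.fromkeys(t, 0.0)  # expected (word, concept) counts
        total = {}                     # expected counts per concept
        # E-step: share each word among the concepts of its utterance.
        for words, concepts in pairs:
            for w in words:
                z = sum(t[(w, c)] for c in concepts)
                for c in concepts:
                    p = t[(w, c)] / z
                    count[(w, c)] += p
                    total[c] = total.get(c, 0.0) + p
        # M-step: renormalise the expected counts per concept.
        t = {(w, c): n / total[c] for (w, c), n in count.items()}
    return t


def align(words, concepts, t):
    """Label each word with its most probable concept among `concepts`."""
    return [max(concepts, key=lambda c: t.get((w, c), 0.0)) for w in words]


def segments(words, labels):
    """Merge consecutive words that share a label into (label, words) segments."""
    return [(label, [w for w, _ in group])
            for label, group in groupby(zip(words, labels), key=lambda x: x[1])]


# Hypothetical toy corpus: utterances paired with unordered concept labels.
PAIRS = [
    (["hotel", "paris"], ["lodging", "city"]),
    (["hotel", "cheap"], ["lodging", "price"]),
    (["cheap", "paris"], ["price", "city"]),
]

if __name__ == "__main__":
    t = train_ibm1(PAIRS)
    for words, concepts in PAIRS:
        print(segments(words, align(words, concepts, t)))
```

Model 1 ignores word order, so contiguity of the segments is only imposed afterwards by merging adjacent words with the same label. The integer linear optimisation alternative mentioned in the abstract is not covered by this sketch.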

Keywords

  • Concept discovery
  • Segmental semantic annotation
  • Language understanding
  • Latent Dirichlet analysis
  • Dialogue systems

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Nathalie Camelin (1)
  • Boris Detienne (2)
  • Stéphane Huet (2)
  • Dominique Quadri (2)
  • Fabrice Lefèvre (2)

  1. LIUM, Université du Maine, Le Mans, France
  2. LIA-CERI, Université d’Avignon, Avignon, France
