MEDIA: a semantically annotated corpus of task oriented dialogs in French

Results of the French MEDIA evaluation campaign
  • Hélène Bonneau-Maynard
  • Matthieu Quignard
  • Alexandre Denis

Abstract

The aim of the French MEDIA project was to define a protocol for evaluating the speech understanding modules of dialog systems. To this end, a corpus of 1,257 real spoken dialogs on hotel reservation and tourist information was recorded, transcribed, and semantically annotated, and a semantic attribute-value representation was defined in which each conceptual relation is expressed through attribute names. Two levels of semantic annotation are distinguished. At the first level, each utterance is considered in isolation, and the annotation represents the literal meaning of the statement without taking the dialog context into account. The second level corresponds to the interpretation of the statement in its dialog context; to support it, a semantic representation of the dialog context is defined. This paper describes the data collection, the definition of both annotation levels, and the annotation scheme. It then reports on the two evaluation campaigns carried out during the project and discusses their results.
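The two annotation levels described above can be pictured as a small data-structure sketch. The attribute names, the example utterance, and the resolution mechanism below are hypothetical illustrations, not the actual MEDIA semantic dictionary or annotation tooling:

```python
def literal_annotation(utterance):
    """Level 1: annotate one utterance in isolation (no dialog context).
    Attribute names here are invented for illustration."""
    # e.g. "I'd like a room in that hotel for two nights"
    return [
        {"attribute": "command", "value": "reservation"},
        {"attribute": "object", "value": "room"},
        {"attribute": "hotel", "value": "that hotel"},  # unresolved reference
        {"attribute": "stay-nights", "value": "2"},
    ]

def contextual_annotation(literal, dialog_context):
    """Level 2: reinterpret the literal annotation using the dialog context,
    e.g. by resolving referring expressions against earlier turns."""
    return [
        {"attribute": p["attribute"],
         "value": dialog_context.get(p["value"], p["value"])}
        for p in literal
    ]

# "that hotel" was established as "Hotel du Parc" in an earlier turn
context = {"that hotel": "Hotel du Parc"}
level1 = literal_annotation("I'd like a room in that hotel for two nights")
level2 = contextual_annotation(level1, context)
```

The design point the sketch tries to capture is that level-2 annotations are derived from level-1 annotations plus a context representation, rather than being annotated independently.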

Keywords

Dialog system · Speech understanding · Corpus · Annotation · Evaluation


Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Hélène Bonneau-Maynard (1)
  • Matthieu Quignard (2)
  • Alexandre Denis (2)
  1. LIMSI–CNRS, Orsay Cedex, France
  2. LORIA, Campus Scientifique, Vandoeuvre-lès-Nancy Cedex, France
