Automatic Annotation of Dialogues Using n-Grams

  • Carlos D. Martínez-Hinarejos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)


Developing a dialogue system for any task requires acquiring a dialogue corpus in order to study the structure of the dialogues used in that task. This structure is reflected in the behaviour of the dialogue system, which can be rule-based or corpus-based. In corpus-based dialogue systems, the behaviour is defined by statistical models inferred from an annotated corpus of dialogues. Manual annotation is usually difficult and expensive, so automatic dialogue annotation tools are needed to reduce the annotation effort. This work presents an automatic dialogue labelling technique based on n-grams. Several variants of the technique are evaluated against manual annotations of a dialogue corpus of train information queries.
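As a rough illustration of what an n-gram dialogue labeller can look like, the sketch below tags each turn of a dialogue with a label, using a bigram model over label sequences and an add-one-smoothed unigram word model per label, decoded with Viterbi search. It is a toy under stated assumptions, not the method evaluated in the paper: the abstract does not specify the model, and the label names, the tiny corpus, and the functions `train`, `log_prob` and `annotate` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-dialogue symbol


def train(dialogues):
    """Count label bigrams and per-label word unigrams.

    dialogues: list of dialogues, each a list of (words, label) turns.
    """
    trans = defaultdict(Counter)  # previous label -> counts of next label
    emit = defaultdict(Counter)   # label -> word counts
    vocab = set()
    for dialogue in dialogues:
        prev = START
        for words, label in dialogue:
            trans[prev][label] += 1
            emit[label].update(words)
            vocab.update(words)
            prev = label
    labels = sorted(emit)
    # One extra vocabulary slot is reserved for unseen words.
    return trans, emit, labels, len(vocab) + 1


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log-probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def annotate(turns, model):
    """Return the most likely label sequence for unlabelled turns (Viterbi)."""
    trans, emit, labels, vocab_size = model
    best = {START: (0.0, [])}  # last label -> (log score, label path)
    for words in turns:
        new_best = {}
        for label in labels:
            emission = sum(log_prob(emit[label], w, vocab_size) for w in words)
            score, path = max(
                (s + log_prob(trans[prev], label, len(labels)), p)
                for prev, (s, p) in best.items()
            )
            new_best[label] = (score + emission, path + [label])
        best = new_best
    return max(best.values())[1]


# Toy annotated corpus with hypothetical labels (assumption, not from the paper).
corpus = [
    [(["hello"], "Opening"),
     (["times", "to", "madrid"], "Question"),
     (["thanks", "bye"], "Closing")],
    [(["hello", "good", "morning"], "Opening"),
     (["price", "to", "valencia"], "Question"),
     (["bye"], "Closing")],
]
model = train(corpus)
predicted = annotate([["hello"], ["times", "to", "valencia"], ["bye"]], model)
print(predicted)
```

In a setting like the one the abstract describes, the predicted labels would then be compared with manual annotations of held-out dialogues, for example by counting the turns whose labels differ.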


Machine Translation; Automatic Annotation; Dialogue System; Statistical Machine Translation; Word Error Rate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Carlos D. Martínez-Hinarejos (1)
  1. Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain
