
International Journal of Speech Technology

Volume 3, Issue 1, pp 5–14

The Role of Evaluation in the Development of Spoken Language Systems

  • Wolfgang Minker
Article

Abstract

In this article, several criteria and paradigms are described to measure the performance of spoken language systems developed in the framework of national and international research projects. These evaluations are carried out in the domain of spontaneous human-human interaction as supported by machine translation systems. They are also applied in the domain of spontaneous human-machine interaction typically used in information retrieval applications. Some evaluation paradigms are discussed in more detail. It is also shown that official performance tests and site-specific evaluation criteria are complementary in use.

Keywords: human-machine interaction, human-human communication, natural language understanding, speech recognition, machine translation, system response



Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Wolfgang Minker (1)
  1. Spoken Language Processing Group, LIMSI-CNRS, Orsay Cedex, France
