International Journal of Speech Technology, Volume 19, Issue 2, pp 373–383

Usefulness, localizability, humanness, and language-benefit: additional evaluation criteria for natural language dialogue systems

Special Issue Article

Abstract

Human–computer dialogue systems interact with human users using natural language. We used the ALICE/AIML chatbot architecture as a platform to develop a range of chatbots covering different languages, genres, text-types, and user-groups, to illustrate qualitative aspects of natural language dialogue system evaluation. We present some of the different evaluation techniques used for natural language dialogue systems, including black box and glass box, comparative, quantitative, and qualitative evaluation. Four aspects of NLP dialogue system evaluation are often overlooked: “usefulness” in terms of a user’s qualitative needs, “localizability” to new genres and languages, “humanness” or “naturalness” compared to human–human dialogues, and “language benefit” compared to alternative interfaces. We illustrate these aspects with respect to our work on machine-learnt chatbot dialogue systems; we believe these aspects are worthwhile in impressing potential new users and customers.

Keywords

Chatbot; Usefulness; Localizability; Humanness; Naturalness; Language benefit

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. IT Department, Arab Open University, Amman, Jordan
  2. School of Computing, University of Leeds, Leeds, UK
