Issues and Red Herrings in Evaluating Natural Language Interfaces
Given the growing interest in natural language interfaces, it is time to begin discussing how to evaluate them. This position paper argues that natural language interfaces pose special evaluation problems compared with other computer software. If these problems are ignored, evaluation will be ineffectual. First, the paper describes the purposes of such an evaluation and the dimensions of its problems; different purposes dictate different approaches to evaluating natural language interfaces. Then, the paper points out some issues that turn out to be red herrings, and offers a modest proposal that covers much of the broad spectrum of difficult issues involved. Because some of these issues and problems are common to both expert systems and natural language interfaces, a comparison with issues in evaluating expert systems is also provided.
This position paper offers the point of view of someone engaged in the research and development of natural language processors, addressed to readers who are not involved in developing natural language interfaces. Consequently, it is an overview of issues rather than a case study of an existing system or an argument for a particular mathematical evaluation technique.
Keywords: Natural Language; Expert System; Analytic Hierarchy Process; User Community; Computational Linguistics