
Accuracy and Suitability: New Challenges for Evaluation


References

  • ALPAC (1966). Languages and Machines: Computers in Translation and Linguistics. Report of the Automatic Language Processing Advisory Committee, Division of Behavioral Sciences, National Academy of Sciences, National Research Council Publication 1416, Washington, DC.

  • AMTA (1992). MT Evaluation: Basis for Future Directions. In Proceedings of a workshop held in San Diego, CA. Technical report, Association for Machine Translation in the Americas.

  • Ankherst, M. (2001). Human Involvement and Interactivity of the Next Generation’s Data Mining Tools. Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2001).

  • Blair, D. C. (2002). Some Thoughts on the Reported Results of TREC. Information Processing and Management, 38(3), 445–451.

  • Blasband, M. (1999). Practice of Validation: The ARISE Application of the EAGLES Framework. In Proceedings of the European Evaluation of Language Systems Workshop, Hoevelaken, Holland.

  • Canelli, M., Grasso, D., King, M. (2000). Methods and Metrics for the Evaluation of Dictation Systems: A Case Study. LREC 2000, Athens, 1325–1331.

  • Damerau, F. J. (1980). The Transformational Question Answering System: Description, Operating Experience and Implications. Report RC8287, IBM Thomas J. Watson Research Center, Yorktown Heights, NY.

  • Doyon, J., Taylor, K., White, J. S. (1998). The DARPA MT Evaluation Methodology: Past and Present. In Proceedings of the AMTA Conference, Philadelphia, PA.

  • EAGLES (1996). EAGLES Evaluation of Natural Language Processing Systems. Final Report, EAGLES Evaluation Working Group. Report EAG-EWG-PR.2 (ISBN 87-90708-00-8), Center for Sprogteknologi, Copenhagen.

  • Falkedal, K. (1994). Evaluation Methods for Machine Translation Systems: An Historical Overview and a Critical Account. Internal report, ISSCO. Available from ISSCO.

  • Flickinger, D., Nerbonne, J., Sag, I., Wasow, T. (1987). Towards Evaluation of NLP Systems. Report, Hewlett Packard Laboratories, Palo Alto, CA.

  • Grishman, R. (1997). Information Extraction: Techniques and Challenges. In Pazienza, M.-T. (Ed.), Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology, SCIE-97, Frascati, Italy, 10–26.

  • Grishman, R., Sundheim, B. (1996). Message Understanding Conference-6: A Brief History. Coling-96.

  • Hand, T. F. (1997). A Proposal for Task-based Evaluation of Text Summarization Systems. ACL/EACL Workshop on Intelligent Scalable Text Summarization, Madrid, 31–37.

  • Hawking, D., Craswell, N., Thistlewaite, P., Harman, D. (1999). Results and Challenges in Web Search Evaluation. In Proceedings of the Eighth International Conference on World Wide Web, Elsevier.

  • Hirschman, L. (1998a). Language Understanding Evaluations: Lessons Learned from MUC and ATIS. LREC-98, Granada, Spain.

  • Hirschman, L. (1998b). The Evolution of Evaluation: Lessons from the Message Understanding Conferences. Computer Speech and Language, 12, 281–305. doi:10.1006/csla.1998.0102.

  • Hovy, E. H., King, M., Popescu-Belis, A. (2002a). Principles of Context-Based Machine Translation Evaluation. Machine Translation, 16, 1–33.

  • Hovy, E. H., King, M., Popescu-Belis, A. (2002b). Computer-Aided Specification of Quality Models for Machine Translation Evaluation. LREC-02, 729–753.

  • ISO/IEC 9126 (1991). Information Technology – Software Product Evaluation – Quality Characteristics and Guidelines for their Use. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC 9126-1 (2001). Software Engineering – Product Quality – Part 1: Quality Model. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC DTR 9126-2 (2003a). Software Engineering – Product Quality – Part 2: External Metrics. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC CD TR 9126-3 (2003b). Software Engineering – Product Quality – Part 3: Internal Metrics. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC CD 9126-4 (2004). Software Engineering – Product Quality – Part 4: Quality in Use Metrics. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC CD 9126-30 (in preparation). Software Engineering – Software Product Quality Requirements and Evaluation – Part 30: Quality Metrics – Metrics Reference Model and Guide. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC 14598-1 (1999). Information Technology – Software Product Evaluation – Part 1: General Overview. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC 14598-2 (2000a). Software Engineering – Product Evaluation – Part 2: Planning and Management. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC 14598-3 (2000b). Software Engineering – Product Evaluation – Part 3: Process for Developers. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC 14598-4 (2000c). Software Engineering – Product Evaluation – Part 4: Process for Acquirers. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC 14598-5 (1998). Information Technology – Software Product Evaluation – Part 5: Process for Evaluators. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • ISO/IEC 14598-6 (2001). Software Engineering – Product Evaluation – Part 6: Documentation of Evaluation Modules. Geneva, International Organization for Standardization and International Electrotechnical Commission.

  • Jarke, M., Turner, J. A., Stohr, E. A., Vassiliou, Y., White, N. H., Michielsen, K. (1985). A Field Evaluation of Natural Language for Data Retrieval. IEEE Transactions on Software Engineering, SE-11(1), 97–113.

  • JEIDA (1992). JEIDA Methodology and Criteria on Machine Translation Evaluation. Report, Japan Electronic Industry Development Association, Tokyo.

  • King, M., Underwood, N. (2004). User Oriented Evaluation of Knowledge Discovery Systems. In Proceedings of a Workshop at LREC-04.

  • Minker, W. (2002). Overview on Recent Activities in Speech Understanding and Dialogue Systems Evaluation. International Conference on Spoken Language Processing (ICSLP), Denver, USA.

  • Nomura, H., Isahara, J. (1992). The JEIDA Report on MT Evaluation. Workshop on MT Evaluation: Basis for Future Directions, Association for Machine Translation in the Americas (AMTA), San Diego, CA.

  • Paggio, P., Underwood, N. (1998). Validating the TEMAA Evaluation Methodology: A Case Study on Danish Spelling Checkers. Natural Language Engineering, 4(3), 211–228. doi:10.1017/S1351324998001995.

  • Papineni, K., Roukos, S., Ward, T., Zhu, W.-J. (2001). BLEU: A Method for Automatic Evaluation of MT. Research Report RC22176 (W0109-022), IBM Research Division, T.J. Watson Research Center.

  • Peters, C. (2002). The Contribution of Evaluation: The CLEF Experience. Special Interest Group in Information Retrieval (SIGIR), 2002.

  • Sparck Jones, K. (2001). Automatic Language and Information Processing: Rethinking Evaluation. Natural Language Engineering, 7(1), 29–46.

  • Sparck Jones, K., Galliers, J. R. (1996). Evaluating Natural Language Processing Systems: An Analysis and Review. Lecture Notes in Artificial Intelligence 1083. Springer-Verlag, Berlin/New York.

  • Spiliopoulou, M., Rinaldi, F., Black, W. J., Zarri, G. P., Mueller, R. M., Brunzel, M., Theodoulidis, B., Orphanos, G., Hess, M., Dowdall, J., McNaught, J., King, M., Persidis, A., Bernard, L. (2004). Coupling Information Extraction and Data Mining for Ontology Learning in Parmenides. RIAO 2004, Avignon.

  • Starlander, M., Popescu-Belis, A. (2002). Corpus-Based Evaluation of a French Spelling and Grammar Checker. LREC-02, Las Palmas de Gran Canaria, Spain, 262–274.

  • TEMAA (1996). TEMAA Final Report. Technical Report LRE-62-070 (March 1996), Center for Sprogteknologi, Copenhagen, Denmark.

  • TREC (2005). Text Retrieval Conference (TREC) TREC-9 Proceedings. Available from http://trec.nist.gov.

  • Van Slype, G. (1979). Critical Study of Methods for Evaluating the Quality of MT. Technical Report BR 19142, European Commission, Directorate for General Scientific and Technical Information Management (DG XIII). Available from www.issco.unige.ch/projects/isle.

  • Vasilakopoulos, A., Bersani, M., Black, B. (2004). A Suite of Tools for Marking Up Textual Data for Temporal Text Mining Scenarios. LREC-04, Lisbon.

  • Voorhees, E. M. (2000). The Evaluation of Question-Answering Systems: Lessons Learned from the TREC QA Track. LREC-2000, Athens.

  • Voorhees, E. M. (2003). Evaluating the Evaluation: A Case Study Using the TREC 2002 Question Answering Track. HLT-NAACL 2003.

  • White, J. S., O’Connell, T. A. (1994). The DARPA MT Evaluation Methodologies: Evolution, Lessons and Future Approaches. In Proceedings of the First Conference of the Association for Machine Translation in the Americas (AMTA-94), Columbia, Maryland.

  • White, J. S., Taylor, K. B. (1998). A Task-Oriented Evaluation Metric for Machine Translation. LREC-98.

  • Whittaker, S., Walker, M. (1989). Comparing Two User-Oriented Database Query Languages: A Field Study. Technical Report HPL-ISC-89-060, Hewlett Packard Laboratories, Bristol, UK.

  • Woods, W. A. (1973). Progress in NLU – An Application to Lunar Geology. AFIPS, 42, 441–450.

  • Yeh, A. S., Hirschman, L., Morgan, A. A. (2003). Evaluation of Text Data Mining for Database Curation: Lessons Learned from the KDD Challenge Cup. Bioinformatics, 19(Suppl. 1), i331–i339. doi:10.1093/bioinformatics/btg1046.

Author information

Correspondence to Margaret King.

Cite this article

King, M. Accuracy and Suitability: New Challenges for Evaluation. Language Resources and Evaluation 39, 45–64 (2005). https://doi.org/10.1007/s10579-005-2695-2