Evaluating Web Search Result Summaries
The aim of our research is to produce and assess short summaries that aid users' relevance judgements, for example on a search engine result page. In this paper we present a new metric for measuring summary quality based on representativeness and judgeability, and compare the summary quality of our system to that of Google. We discuss the basis for constructing our evaluation methodology in contrast to previous relevant open evaluations, arguing that the elements which make up an evaluation methodology — the tasks, the data, and the metrics — are interdependent, and that the way in which they are combined is critical to the methodology's effectiveness. The paper discusses the relationship between these three factors as implemented in our own work, as well as in SUMMAC, MUC, and DUC.
Keywords: Evaluation Methodology, Relevance Judgement, Text Summarisation, Summary Quality, Judgeability, Task