Abstract
The aim of our research is to produce and assess short summaries that aid users' relevance judgements, for example on a search engine result page. In this paper we present a new metric for measuring summary quality based on representativeness and judgeability, and compare the summary quality of our system to that of Google. We discuss the basis for constructing our evaluation methodology in contrast to previous relevant open evaluations, arguing that the elements which make up an evaluation methodology (the tasks, data and metrics) are interdependent, and that the way in which they are combined is critical to the methodology's effectiveness. The paper discusses the relationship between these three factors as implemented in our own work, as well as in SUMMAC, MUC and DUC.
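The abstract names the two dimensions of the metric, representativeness and judgeability, but does not define them on this page; judgeability in particular would require human relevance judgements. As a rough, hypothetical sketch only (not the authors' definition), the Python fragment below shows one plausible way the representativeness side could be operationalised: the share of a source document's most frequent content terms that the summary covers. The function names, stopword list and top-N cutoff are all illustrative assumptions.

```python
# Hypothetical sketch: the paper's actual metric is not reproduced here.
# This illustrates one plausible shape for a representativeness score,
# measured as coverage of the document's most frequent content terms.

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}

def content_terms(text: str) -> Counter:
    """Lowercase, tokenise on alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def representativeness(summary: str, document: str, top_n: int = 20) -> float:
    """Share of the document's top-N content terms covered by the summary.

    An illustrative proxy only, not the metric defined in the paper.
    """
    doc_terms = [t for t, _ in content_terms(document).most_common(top_n)]
    if not doc_terms:
        return 0.0
    summary_terms = set(content_terms(summary))
    covered = sum(1 for t in doc_terms if t in summary_terms)
    return covered / len(doc_terms)

if __name__ == "__main__":
    doc = ("Search engines return short result summaries so that users "
           "can judge the relevance of each result before clicking.")
    summ = "Short result summaries help users judge relevance."
    print(f"representativeness ~ {representativeness(summ, doc):.2f}")
```

A score near 1.0 would indicate that the summary reuses most of the document's dominant vocabulary; real evaluations of this kind would also need to weigh query terms and human judgeability ratings, which this toy coverage measure ignores.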
References
Afantenos, S., Karkaletsis, V., Stamatopoulos, P.: Summarization from medical documents: a survey. Artificial Intelligence in Medicine 33(2), 157–177 (2005)
Berger, A., Mittal, V.O.: Query-relevant summarisation using FAQs. In: Proceedings of ACL 2000, pp. 294–301 (2000)
Borko, H., Bernier, C.L.: Abstracting concepts and methods. Academic Press, San Diego (1975)
Chinchor, N., Hirschman, L., Lewis, D.D.: Evaluating message understanding systems: An analysis of the third Message Understanding Conference (MUC-3). Computational Linguistics 19(3), 409–449 (1993)
Chinchor, N.: MUC-3 evaluation metrics. In: Proceedings of the Third Message Understanding Conference (MUC-3), pp. 17–24 (1991)
Harman, D., Over, P.: The effects of human variation in DUC summarization evaluation. In: Proceedings of the ACL 2004 Workshop on Text Summarization Branches Out, Barcelona, Spain, pp. 10–17 (July 2004)
Liang, S.F., Devlin, S., Tait, J.: Using query term order for result summarisation (poster). In: Proceedings of SIGIR 2005, Salvador, Brazil, pp. 629–630 (2005)
Lin, C.Y.: ROUGE: A Package for Automatic Evaluation of Summaries. In: Proceedings of the Workshop on Text Summarization Branches Out, Barcelona, Spain, July 25–26 (2004)
Mani, I., Firmin, T., Sundheim, B.: The TIPSTER SUMMAC text summarization evaluation. In: Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics (EACL), Bergen, Norway, pp. 77–85 (1999)
Mani, I.: Automatic Summarization. John Benjamins, Amsterdam (2001)
Pagano, R.R.: Understanding statistics in the behavioural sciences. Wadsworth/Thomson Learning (2001)
Sparck Jones, K., Galliers, J.R.: Evaluating natural language processing systems: an analysis and review. Springer, New York (1996)
TIPSTER Text Phase III 18-Month Workshop Notes, Fairfax, VA (May 1998)
Voorhees, E.M.: Variations in Relevance Judgements and the Measurement of Retrieval Effectiveness. Information Processing & Management 36(5), 697–716 (2000)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Liang, S.F., Devlin, S., Tait, J. (2006). Evaluating Web Search Result Summaries. In: Lalmas, M., MacFarlane, A., RĂĽger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33347-0
Online ISBN: 978-3-540-33348-7