The Philosophy of Information Retrieval Evaluation

  • Conference paper
Evaluation of Cross-Language Information Retrieval Systems (CLEF 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2406)

Abstract

Evaluation conferences such as TREC, CLEF, and NTCIR are modern examples of the Cranfield evaluation paradigm. In Cranfield, researchers perform experiments on test collections to compare the relative effectiveness of different retrieval approaches. The test collections allow the researchers to control the effects of different system parameters, increasing the power and decreasing the cost of retrieval experiments as compared to user-based evaluations. This paper reviews the fundamental assumptions and appropriate uses of the Cranfield paradigm, especially as they apply in the context of the evaluation conferences.
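
In practical terms, a Cranfield-style experiment is a batch computation: each system's ranked output for a set of topics is scored against the collection's fixed relevance judgments, and systems are compared by an aggregate effectiveness measure such as mean average precision. The sketch below illustrates only that scoring step; the topic identifiers, document identifiers, and judgments are hypothetical and not drawn from the paper.

```python
# Minimal sketch of Cranfield-style batch scoring: ranked lists are evaluated
# against fixed relevance judgments (qrels) and systems are compared by
# mean average precision (MAP). All identifiers and data are hypothetical.
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list given the relevant document ids."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over all topics; `run` maps a topic id to that system's ranked doc ids."""
    return sum(average_precision(run[t], qrels.get(t, set())) for t in run) / len(run)


# Hypothetical relevance judgments and two systems' rankings for two topics.
qrels = {"t1": {"d1", "d4"}, "t2": {"d2"}}
system_a = {"t1": ["d1", "d2", "d4"], "t2": ["d2", "d3"]}
system_b = {"t1": ["d3", "d1", "d4"], "t2": ["d1", "d2"]}

print("System A MAP:", mean_average_precision(system_a, qrels))  # ~0.917
print("System B MAP:", mean_average_precision(system_b, qrels))  # ~0.542
```

Because the judgments and the measure are held fixed, the only varying factor is the retrieval approach itself, which is what lets the paradigm control system parameters at far lower cost than a user-based study.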

References

  1. Martin Braschler. CLEF 2000 - Overview of results. In Carol Peters, editor, Cross-Language Information Retrieval and Evaluation, Lecture Notes in Computer Science 2069, pages 89–101. Springer, 2001.

  2. Chris Buckley and Ellen M. Voorhees. Evaluating evaluation measure stability. In N. Belkin, P. Ingwersen, and M.K. Leong, editors, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 33–40, 2000.

  3. C. W. Cleverdon. The Cranfield tests on index language devices. In Aslib Proceedings, volume 19, pages 173–192, 1967. (Reprinted in Readings in Information Retrieval, K. Sparck Jones and P. Willett, editors, Morgan Kaufmann, 1997).

  4. Cyril W. Cleverdon. The significance of the Cranfield tests on index languages. In Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, pages 3–12, 1991.

  5. Gordon V. Cormack, Christopher R. Palmer, and Charles L.A. Clarke. Efficient construction of large test collections. In Croft et al. [6], pages 282–289.

  6. W. Bruce Croft, Alistair Moffat, C.J. van Rijsbergen, Ross Wilkinson, and Justin Zobel, editors. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998. ACM Press, New York.

  7. C. A. Cuadra and R. V. Katter. Opening the black box of relevance. Journal of Documentation, 23(4):291–303, 1967.

  8. Donna Harman. Overview of the fourth Text REtrieval Conference (TREC-4). In D. K. Harman, editor, Proceedings of the Fourth Text REtrieval Conference (TREC-4), pages 1–23, October 1996. NIST Special Publication 500–236.

  9. Stephen P. Harter. Variations in relevance assessments and the measurement of retrieval effectiveness. Journal of the American Society for Information Science, 47(1):37–49, 1996.

  10. William Hersh, Andrew Turpin, Susan Price, Benjamin Chan, Dale Kraemer, Lynetta Sacherek, and Daniel Olson. Do batch and user evaluations give the same results? In N. Belkin, P. Ingwersen, and M.K. Leong, editors, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 17–24, 2000.

  11. Noriko Kando, Kazuko Kuriyama, Toshihiko Nozue, Koji Eguchi, Hiroyuki Kato, and Souichiro Hidaka. Overview of IR tasks at the first NTCIR workshop. In Proceedings of the First NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition, pages 11–44, 1999.

  12. M.E. Lesk and G. Salton. Relevance assessments and retrieval system evaluation. Information Storage and Retrieval, 4:343–359, 1969.

  13. G. Salton, editor. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1971.

  14. Linda Schamber. Relevance and information behavior. Annual Review of Information Science and Technology, 29:3–48, 1994.

  15. K. Sparck Jones and C. van Rijsbergen. Report on the need for and provision of an “ideal” information retrieval test collection. British Library Research and Development Report 5266, Computer Laboratory, University of Cambridge, 1975.

  16. Karen Sparck Jones. The Cranfield tests. In Karen Sparck Jones, editor, Information Retrieval Experiment, chapter 13, pages 256–284. Butterworths, London, 1981.

  17. Karen Sparck Jones. Information Retrieval Experiment. Butterworths, London, 1981.

  18. Karen Sparck Jones and Peter Willett. Evaluation. In Karen Sparck Jones and Peter Willett, editors, Readings in Information Retrieval, chapter 4, pages 167–174. Morgan Kaufmann, 1997.

  19. Alan Stuart. Kendall’s tau. In Samuel Kotz and Norman L. Johnson, editors, Encyclopedia of Statistical Sciences, volume 4, pages 367–369. John Wiley & Sons, 1983.

  20. M. Taube. A note on the pseudomathematics of relevance. American Documentation, 16(2):69–72, April 1965.

  21. Andrew H. Turpin and William Hersh. Why batch and user evaluations do not give the same results. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 225–231, 2001.

  22. C.J. van Rijsbergen. Information Retrieval, chapter 7. Butterworths, second edition, 1979.

  23. Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing and Management, 36:697–716, 2000.

  24. Ellen M. Voorhees and Donna Harman. Overview of the eighth Text REtrieval Conference (TREC-8). In E.M. Voorhees and D.K. Harman, editors, Proceedings of the Eighth Text REtrieval Conference (TREC-8), pages 1–24, 2000. NIST Special Publication 500–246. Electronic version available at http://trec.nist.gov/pubs.html.

  25. Ellen M. Voorhees and Donna Harman. Overview of TREC 2001. In Proceedings of TREC 2001 (Draft), 2001. To appear.

  26. Justin Zobel. How reliable are the results of large-scale information retrieval experiments? In Croft et al. [6], pages 307–314.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Voorhees, E.M. (2002). The Philosophy of Information Retrieval Evaluation. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Evaluation of Cross-Language Information Retrieval Systems. CLEF 2001. Lecture Notes in Computer Science, vol 2406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45691-0_34

  • DOI: https://doi.org/10.1007/3-540-45691-0_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44042-0

  • Online ISBN: 978-3-540-45691-9
