Abstract
The paper reviews six recent efforts to better understand performance measurements on information retrieval (IR) systems within the framework of the Text REtrieval Conferences (TREC): analysis of variance, cluster analyses, rank correlations, beadplots, multidimensional scaling, and item response analysis. None of this work has yielded any substantial new insights. Prospects that additional work along these lines will yield more interesting results vary but are in general not promising. Some suggestions are made for paying greater attention to richer descriptions of IR system behavior but within smaller, better controlled settings.
Cite this article
Banks, D., Over, P. & Zhang, NF. Blind Men and Elephants: Six Approaches to TREC data. Information Retrieval 1, 7–34 (1999). https://doi.org/10.1023/A:1009984519381
DOI: https://doi.org/10.1023/A:1009984519381