Here or There: Preference Judgments for Relevance
  • Ben Carterette
  • Paul N. Bennett
  • David Maxwell Chickering
  • Susan T. Dumais
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)


Information retrieval systems have traditionally been evaluated over absolute judgments of relevance: each document is judged for relevance on its own, independent of other documents that may be on topic. We hypothesize that preference judgments of the form “document A is more relevant than document B” are easier for assessors to make than absolute judgments, and provide evidence for our hypothesis through a study with assessors. We then investigate methods to evaluate search engines using preference judgments. Furthermore, we show that by using inferences and clever selection of pairs to judge, we need not compare all pairs of documents in order to apply evaluation methods.
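The abstract's point that inference spares assessors from judging every pair can be illustrated with transitivity: if an assessor judges A more relevant than B, and B more relevant than C, then A vs. C need never be asked. The sketch below is only an illustration of this idea under that assumption, not the paper's actual selection method; the document names and data are hypothetical.

```python
def transitive_closure(prefs):
    """Given judged preferences as (better, worse) pairs, return all
    pairs implied by transitivity (Warshall-style closure)."""
    closure = set(prefs)
    docs = {d for pair in prefs for d in pair}
    for k in docs:          # intermediate document
        for i in docs:
            for j in docs:
                if (i, k) in closure and (k, j) in closure:
                    closure.add((i, j))
    return closure

# Three judged pairs over four documents...
judged = {("A", "B"), ("B", "C"), ("C", "D")}
inferred = transitive_closure(judged)
# ...yield all six orderings: (A,C), (A,D), and (B,D) come for free,
# so half of the C(4,2) = 6 pairs never reach an assessor.
```

With n documents there are n(n-1)/2 pairs, so even modest chains of inferred preferences cut the judging burden substantially, which is the motivation for the clever pair-selection strategies the abstract mentions.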


Keywords: Absolute Judgment · Judgment Type · Preference Judgment · Preference Interface · Discounted Cumulative Gain





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  1. Ben Carterette, University of Massachusetts Amherst
  2. Paul N. Bennett, Microsoft Research
  3. David Maxwell Chickering, Microsoft Live Labs
  4. Susan T. Dumais, Microsoft Research
