If I Had a Million Queries

  • Ben Carterette
  • Virgil Pavlu
  • Evangelos Kanoulas
  • Javed A. Aslam
  • James Allan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5478)

Abstract

As document collections grow larger, the information needs and relevance judgments in a test collection must be well-chosen within a limited budget to give the most reliable and robust evaluation results. In this work we analyze a sample of queries categorized by length and corpus-appropriateness to determine the right proportion needed to distinguish between systems. We also analyze the appropriate division of labor between developing topics and making relevance judgments, and show that only a small, biased sample of queries with sparse judgments is needed to produce the same results as a much larger sample of queries.


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ben Carterette (1)
  • Virgil Pavlu (2)
  • Evangelos Kanoulas (2)
  • Javed A. Aslam (2)
  • James Allan (3)

  1. Dept. of Computer and Info. Sciences, University of Delaware, Newark, USA
  2. College of Computer and Info. Science, Northeastern University, Boston, USA
  3. Dept. of Computer Science, University of Massachusetts Amherst, Amherst, USA
