Chapter

Advances in Information Retrieval

Volume 5478 of the series Lecture Notes in Computer Science, pp. 288-300

If I Had a Million Queries

  • Ben Carterette (Dept. of Computer and Info. Sciences, University of Delaware)
  • Virgil Pavlu (College of Computer and Info. Science, Northeastern University)
  • Evangelos Kanoulas (College of Computer and Info. Science, Northeastern University)
  • Javed A. Aslam (College of Computer and Info. Science, Northeastern University)
  • James Allan (Dept. of Computer Science, University of Massachusetts Amherst)

Abstract

As document collections grow larger, the information needs and relevance judgments in a test collection must be well-chosen within a limited budget to give the most reliable and robust evaluation results. In this work we analyze a sample of queries categorized by length and corpus-appropriateness to determine the right proportion needed to distinguish between systems. We also analyze the appropriate division of labor between developing topics and making relevance judgments, and show that only a small, biased sample of queries with sparse judgments is needed to produce the same results as a much larger sample of queries.