A test collection is a standard set of data used to measure search engine performance. It comprises a set of queries, ideally randomly sampled from some space, a set of documents to be searched, and a set of judgments indicating the relevance of each document to each query in the set.
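The three components described above can be sketched as a small data structure. This is a minimal illustration, not a standard format: the class name `RetrievalTestCollection`, the helper `precision_at_k`, the toy queries and documents, and the convention that unjudged documents count as non-relevant are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalTestCollection:
    """Hypothetical container for the three parts of a test collection."""
    queries: dict[str, str]        # query id -> query text
    documents: dict[str, str]      # document id -> document text
    # (query id, document id) -> relevance grade; 0 means not relevant
    judgments: dict[tuple[str, str], int] = field(default_factory=dict)

    def is_relevant(self, query_id: str, doc_id: str) -> bool:
        # Assumption: a pair with no judgment is treated as non-relevant.
        return self.judgments.get((query_id, doc_id), 0) > 0


def precision_at_k(collection: RetrievalTestCollection, query_id: str,
                   ranking: list[str], k: int) -> float:
    """Fraction of the top-k ranked documents judged relevant to the query."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(collection.is_relevant(query_id, d) for d in ranking[:k])
    return hits / k


# Toy data, invented for illustration only.
collection = RetrievalTestCollection(
    queries={"q1": "aircraft wing design"},
    documents={"d1": "text of d1", "d2": "text of d2", "d3": "text of d3"},
    judgments={("q1", "d1"): 1, ("q1", "d2"): 0, ("q1", "d3"): 1},
)

# A ranked list that some search engine might return for q1.
ranking = ["d3", "d2", "d1"]
print(precision_at_k(collection, "q1", ranking, 2))  # 0.5
```

Because the queries, documents, and judgments are fixed, any two systems scored this way against the same collection produce directly comparable numbers.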
The use of test collections for performance evaluation began with Cleverdon and Mills and is now known as the Cranfield methodology. Modern test collections are much larger than Cleverdon’s Cranfield collection, consisting of millions of documents and tens of thousands of relevance judgments. The advantage of having standardized test collections is that experimental results can be compared across research groups and over time.
The National Institute of Standards and Technology (NIST), through its annual Text REtrieval Conferences (TREC), has led the way in providing test collections for information retrieval research. NIST has assembled large-scale test...
- 1. Voorhees EM, Harman DK, editors. TREC: experiment and evaluation in information retrieval. Cambridge: MIT; 2005.