Measuring Effectiveness in the TREC Legal Track

Part of The Information Retrieval Series book series (INRE, volume 29)

Abstract

In this chapter, we report our experiences in attempting to measure the effectiveness of large e-Discovery result sets in the TREC Legal Track campaigns of 2007–2009. Our effectiveness measures are recall, precision and F1. We state the estimators that we have used for these measures, and we outline both the rank-based and set-based approaches to sampling that we have taken. We share our experiences with the sampling error in the resulting estimates of absolute performance on individual topics, relative performance on individual topics, mean performance across topics, and relative performance across topics. Finally, we discuss our experiences with assessor error, which we have often found to have a larger impact than sampling error.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Open Text Corporation, Ottawa, Canada
  2. H5, San Francisco, USA
