Artificial Intelligence and Law, Volume 18, Issue 4, pp 347–386

Evaluation of information retrieval for E-discovery

  • Douglas W. Oard
  • Jason R. Baron
  • Bruce Hedin
  • David D. Lewis
  • Stephen Tomlinson

Abstract

The effectiveness of information retrieval technology in electronic discovery (E-discovery) has become the subject of judicial rulings and practitioner controversy. The scale and nature of E-discovery tasks, however, have pushed traditional information retrieval evaluation approaches to their limits. This paper reviews the legal and operational context of E-discovery and the approaches to evaluating search technology that have evolved in the research community. It then describes a multi-year effort, carried out as part of the Text Retrieval Conference (TREC), to develop evaluation methods for responsive review tasks in E-discovery. This work has led to new approaches to measuring effectiveness in both batch and interactive frameworks, to large data sets, and to some surprising results concerning the recall and precision of Boolean and statistical information retrieval methods. The paper concludes by offering some thoughts about future research in both the legal and technical communities toward the goal of reliable, effective use of information retrieval in E-discovery.
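
For readers unfamiliar with the effectiveness measures named above, the following is a minimal illustrative sketch (in Python, using hypothetical document identifiers and counts not drawn from the paper) of how set-based precision and recall are computed when evaluating a responsive review:

    # Minimal sketch: set-based precision and recall for a responsive review.
    # All document IDs and counts below are hypothetical, for illustration only.
    def precision_recall(retrieved, relevant):
        """retrieved: document IDs returned by a search or review process;
        relevant: document IDs that are actually responsive."""
        true_positives = len(retrieved & relevant)
        precision = true_positives / len(retrieved) if retrieved else 0.0
        recall = true_positives / len(relevant) if relevant else 0.0
        return precision, recall

    # Hypothetical example: a Boolean query retrieves 1,000 documents, of which
    # 300 are among the 900 truly responsive documents in the collection.
    retrieved = {"doc%d" % i for i in range(1000)}
    relevant = {"doc%d" % i for i in range(700, 1600)}
    p, r = precision_recall(retrieved, relevant)
    print("precision = %.2f, recall = %.2f" % (p, r))  # precision = 0.30, recall = 0.33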

Keywords

E-discovery · Information retrieval · Interactive search · Evaluation

Notes

Acknowledgments

The authors first wish to thank a number of individuals who, in discussions with the authors, contributed ideas and suggestions that found their way into portions of the present paper, including Thomas Bookwalter, Gordon Cormack, Todd Elmer, Maura Grossman and Richard Mark Soley. Additionally, the TREC Legal Track would not have been possible without the support of Ellen Voorhees and Ian Soboroff of NIST; the faculty, staff and students of IIT, UCSF, Tobacco Documents Online, and Roswell Park Cancer Institute who helped build IIT CDIP or the LTDL on which it was based; Celia White (the 2006 Track expert interactive searcher); Venkat Rangan of Clearwell Systems, who helped to build the TREC Enron test collection; and Richard Braman of The Sedona Conference® and the hundreds of law students, lawyers and Sedona colleagues who have contributed pro bono time to the project. Finally, the authors wish to thank Kevin Ashley and Jack Conrad for their support of and participation in the First and Third DESI Workshops, held as part of the Eleventh and Twelfth International Conferences on Artificial Intelligence and Law, at which many of the ideas herein were discussed.


Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Douglas W. Oard (1)
  • Jason R. Baron (2)
  • Bruce Hedin (3)
  • David D. Lewis (4)
  • Stephen Tomlinson (5)

  1. College of Information Studies and Institute for Advanced Computer Studies, University of Maryland, College Park, USA
  2. Office of the General Counsel, National Archives and Records Administration, College Park, USA
  3. H5, San Francisco, USA
  4. David D. Lewis Consulting, Chicago, USA
  5. Open Text Corporation, Ottawa, Canada
