The effectiveness of information retrieval technology in electronic discovery (E-discovery) has become the subject of judicial rulings and practitioner controversy. The scale and nature of E-discovery tasks, however, have pushed traditional information retrieval evaluation approaches to their limits. This paper reviews the legal and operational context of E-discovery and the approaches to evaluating search technology that have evolved in the research community. It then describes a multi-year effort carried out as part of the Text REtrieval Conference (TREC) to develop evaluation methods for responsive review tasks in E-discovery. This work has led to new approaches to measuring effectiveness in both batch and interactive frameworks, to large data sets, and to some surprising results concerning the recall and precision of Boolean and statistical information retrieval methods. The paper concludes by offering some thoughts about future research in both the legal and technical communities toward the goal of reliable, effective use of information retrieval in E-discovery.
See, e.g., Sarbanes-Oxley Act, Title 18 of the U.S. Code, Section 1519 (U.S. securities industry requirement to preserve email for 7 years); National Archives and Records Administration regulations, 36 Code of Federal Regulations Part 1236.22 (all email that is considered to fall within the definition of “federal records” under Title 44 of the U.S. Code, Section 3301, must be archived in either paper or electronic systems).
See, e.g., Qualcomm v. Broadcom, 539 F. Supp. 2d 1214 (S.D. Cal 2007), rev’d 2010 WL 1336937 (S.D. Cal. Apr. 2, 2010); In re Fannie Mae Litigation, 552 F.3d 814 (D.C. Cir. 2009).
See also Report of Anton R. Valukas, Examiner, In re Lehman Brothers Holdings Inc. (U.S. Bankruptcy Ct. S.D.N.Y. March 11, 2010), vol. 7, Appx. 5 (350 billion pages subjected to dozens of Boolean searches), available at http://lehmanreport.jenner.com/
See Practice Point 1 in (The Sedona Conference 2007b) (referred to herein as the “Sedona Search Commentary”).
Zubulake v. UBS Warburg LLC, 217 F.R.D. 309, 311 (2003); see generally (The Sedona Conference 2007a).
See Pension Committee of the University of Montreal Pension Plan et al. v Banc of America Securities LLC, et al., 2010 WL 184312, *1 (S.D.N.Y. Jan. 15, 2010) (“Courts cannot and do not expect that any party can reach a standard of perfection.”).
See (The Sedona Conference 2007b) at 202.
There is, at virtually all times, an admitted asymmetry of knowledge as between the requesting party (who does not own, and therefore cannot know the contents of, the target data collection) and the receiving or responding party (who does own the collection and thus in theory could know its contents). For an exploration of the ethical questions that arise when responsive documents are not reached by a given keyword search method, see (Baron 2009).
For example, see, People of the State of California v. Philip Morris, et al., Case No. J.C.C.P. 4041 (Sup. Ct. Cal.) (December 9, 1998 consent decree incorporating terms of Master Settlement Agreement or “MSA”). These documents have for the most part been digitized using Optical Character Recognition (OCR) technology and are available online on various Web sites. See the Legacy Tobacco Documents Library, available at http://legacy.library.ucsf.edu/. Portions of the MSA collection have been used in the TREC Legal Track.
As used by E-discovery practitioners, “keyword search” most often refers to the use of a single query term to identify the set of all documents containing that term, as a pre-processing step to identify documents that merit manual review.
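This culling step can be illustrated with a minimal sketch; the `keyword_cull` helper and the sample corpus are hypothetical illustrations, not drawn from any actual E-discovery tool:

```python
def keyword_cull(documents, term):
    """Single-term keyword culling: keep every document containing the
    term, as a pre-processing step before manual review."""
    term = term.lower()
    return [doc for doc in documents if term in doc.lower().split()]

# A toy "collection" of three document texts.
corpus = [
    "Re: quarterly guidance memo",
    "Lunch on Friday?",
    "Draft guidance for the audit committee",
]

# Only the documents containing the term survive the cull.
print(keyword_cull(corpus, "guidance"))
```

In practice such culling is run against an inverted index over millions of documents rather than a linear scan, but the set-selection semantics are the same.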
Id. at 202–03; 217 (Appendix describing alternative search methods at greater length).
Id. at 202–03.
In re Lorazepam & Clorazepate Antitrust Litigation, 300 F. Supp. 2d 43 (D.D.C. 2004).
J.C. Associates v. Fidelity & Guaranty Ins. Co., 2006 WL 1445173 (D.D.C. 2006).
For example, see Medtronic Sofamor Danck, Inc. v. Michelson, 229 F.R.D. 550 (W.D. Tenn. 2003); Treppel v. Biovail, 233 F.R.D. 363, 368–69 (S.D.N.Y. 2006) (court describes plaintiff’s refusal to cooperate with defendant in the latter’s suggestion to enter into a stipulation defining the keyword search terms to be used as a “missed opportunity” and goes on to require that certain terms be used); see also Alexander v. FBI, 194 F.R.D. 316 (D.D.C. 2000) (court places limitations on the scope of plaintiffs’ proposed keywords in a case involving White House email).
In addition to cases discussed infra, see, e.g., Dunkin Donuts Franchised Restaurants, Inc. v. Grand Central Donuts, Inc, 2009 WL 175038 (E.D.N.Y. June 19, 2009) (parties directed to meet and confer on developing a workable search protocol); ClearOne Communications, Inc. v. Chiang, 2008 WL 920336 (D. Utah April 1, 2008) (court adjudicates dispute over conjunctive versus disjunctive Boolean operators).
242 F.R.D. 139 (D.D.C. 2007)
Id. at 148 (citing to (Paul and Baron 2007), supra).
537 F. Supp. 2d 14, 24 (D.D.C. 2008).
Id. at 16 (quoting U.S. v. O’Keefe, 2007 WL 1239204, at *3 (D.D.C. April 27, 2007)) (internal quotations omitted).
537 F. Supp. 2d at 16.
Based only on what is known from the opinion, the syntax used in this search string is admittedly somewhat difficult to parse. One is left to surmise that the ambiguity on the face of the search protocol may have contributed to the court's finding that adjudicating a proper search string was too difficult a task.
537 F. Supp. 2d at 24.
Equity Analytics v. Lundin, 248 F.R.D. 331 (D.D.C. 2008) (stating that in O’Keefe “I recently commented that lawyers express as facts what are actually highly debatable propositions as to efficacy of various methods used to search electronically stored information,” and requiring an expert to describe scope of proposed search); see also discussion of Victor Stanley, Inc. v. Creative Pipe, Inc., infra.
250 F.R.D. 251 (D. Md. 2008).
Id. at 254.
Id. at 256–57.
Id. at 259 n.9.
Id. at 260 n.10.
William A. Gross Construction Assocs., Inc. v. Am. Mftrs. Mutual Ins. Co., 256 F.R.D. 134, 135 (S.D.N.Y. 2009).
We note that at least one important decision has been rendered by a court in the United Kingdom, which has similarly analyzed the parties' keyword choices at some length and in sophisticated fashion. See Digicel (St. Lucia) Ltd. & Ors. v. Cable & Wireless & Ors., [2008] EWHC 2522 (Ch.).
Strictly speaking, recall and precision are defined for a retrieved set; it is not meaningful to refer to the “recall” or “precision” of a ranking of documents. The popular rank-based measures Recall@K and Precision@K (the recall and precision of the set formed by the top K ranked documents) nominally suggest a recall or precision orientation for ranking, but at a fixed cutoff K they order ranked retrieval systems identically on any individual topic, since both are proportional to the number of relevant documents retrieved in the top K. One can observe the recall-precision tradeoff in a ranking, however, by varying the cutoff K; e.g., increasing K will tend to increase recall at the expense of precision.
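The cutoff behavior described in this footnote can be sketched as follows; the `precision_recall_at_k` helper and the toy ranking are illustrative assumptions, not part of any TREC evaluation software:

```python
def precision_recall_at_k(ranking, relevant, k):
    """Recall@K and Precision@K for the set of the top K ranked documents.

    Both measures are proportional to the number of relevant documents
    in the top K, so at a fixed K they order systems identically on a
    single topic; the tradeoff appears only as K varies.
    """
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranking of ten documents; three are relevant.
ranking = [f"d{i}" for i in range(1, 11)]
relevant = {"d1", "d4", "d9"}

# Increasing the cutoff K trades precision for recall.
for k in (1, 5, 10):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(k, round(p, 2), round(r, 2))
```

On this toy ranking, K=1 gives precision 1.0 at recall 0.33, while K=10 gives full recall 1.0 at precision 0.3, exhibiting exactly the tradeoff the footnote describes.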
Clearwell Systems obtained the collection from Aspen Systems and performed the processing described in this section.
A pilot Interactive task was run in 2007 (Tomlinson et al. 2008), but with a very different task design.
As is common, we use “Boolean query” somewhat loosely to mean queries built using not just the three basic Boolean operators (and, or, not), but also truncation and (unordered) proximity operators.
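As a rough illustration of these extended operators, the following sketch evaluates a truncation operator (`!`) and an unordered proximity operator (within n tokens) against a tokenized document. The helper names, the `!` and within-n conventions, and the sample document are assumptions chosen for illustration, loosely modeled on common litigation-support query languages rather than on any specific product:

```python
import re


def _tokens(doc_text):
    """Lowercased word tokens of a document."""
    return re.findall(r"[a-z0-9]+", doc_text.lower())


def matches(doc_text, term):
    """True if `term` occurs in the document; a trailing '!' marks
    truncation, i.e. prefix matching on each token."""
    tokens = _tokens(doc_text)
    if term.endswith("!"):
        return any(t.startswith(term[:-1]) for t in tokens)
    return term in tokens


def within(doc_text, term_a, term_b, n):
    """Unordered proximity: both terms occur within n tokens of
    each other, in either order."""
    tokens = _tokens(doc_text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


doc = "The auditor shall retain all email records for seven years."

# Evaluate: retain! AND email AND (email within-3 records)
hit = (matches(doc, "retain!") and matches(doc, "email")
       and within(doc, "email", "records", 3))
print(hit)
```

A production system would evaluate such queries against positional inverted indexes rather than re-tokenizing each document, but the set semantics of the extended operators are as shown.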
Software Engineering Institute, http://www.sei.cmu.edu/cmmi/general
American Institute of Certified Public Accountants (2009) Statement on auditing standards no. 70: Service organizations. SAS 70
Aslam JA, Pavlu V, Yilmaz E (2006) A statistical method for system evaluation using incomplete judgments. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 541–548
Bales S, Wang P (2006) Consolidating user relevance criteria: a meta-ethnography of empirical studies. In: Proceedings of the 42nd annual meeting of the American society for information science and technology
Baron JR (2005) Toward a federal benchmarking standard for evaluation of information retrieval products used in E-discovery. Sedona Conf J 6:237–246
Baron JR (2007) The TREC legal track: origins and reports from the first year. Sedona Conf J 8:237–246
Baron JR (2008) Towards a new jurisprudence of information retrieval: what constitutes a ’reasonable’ search for digital evidence when using keywords?. Digit Evid Electronic Signature Law Rev 5:173–178
Baron JR (2009) E-discovery and the problem of asymmetric knowledge. Mercer Law Rev 60:863
Baron JR, Thompson P (2007) The search problem posed by large heterogeneous data sets in litigation: possible future approaches to research. In: Proceedings of the 11th international conference on artificial intelligence and law, pp 141–147
Baron JR, Lewis DD, Oard DW (2007) TREC-2006 Legal Track overview. In: The fifteenth text retrieval conference proceedings (TREC 2006), pp 79–98
Bauer RS, Brassil D, Hogan C, Taranto G, Brown JS (2009) Impedance matching of humans and machines in high-Q information retrieval systems. In: Proceedings of the IEEE international conference on systems, man and cybernetics, pp 97–101
Blair D (2006) Wittgenstein, language and information: back to the rough ground. Springer, New York
Blair D, Maron ME (1985) An evaluation of retrieval effectiveness for a full-text document-retrieval system. Commun ACM 28(3):289–299
Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Computer Netw ISDN Syst 30(1–7):107–117
Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international conference on research and development in information retrieval, pp 25–32
Buckley C, Voorhees EM (2005) Retrieval system evaluation. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 53–75
Buckley C, Dimmick D, Soboroff I, Voorhees E (2006) Bias and the limits of pooling. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 619–620
Büttcher S, Clarke CLA, Yeung PCK, Soboroff I (2007) Reliable information retrieval evaluation with incomplete and biased judgements. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, pp 63–70
Carmel D, Yom-Tov E, Darlow A, Pelleg D (2006) What makes a query difficult? In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 390–397
Carterette B, Pavlu V, Kanoulas E, Aslam JA, Allan J (2008) Evaluation over thousands of queries. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 651–658
Clarke C, Craswell N, Soboroff I (2005) The TREC terabyte retrieval track. SIGIR Forum 39(1):25–25
Cleverdon C (1967) The Cranfield tests on index language devices. Aslib Proceed 19(6):173–194
Cormack GV, Lynam TR (2006) TREC 2005 spam track overview. In: The fourteenth text retrieval conference (TREC 2005), pp 91–108
Cormack GV, Palmer CR, Clarke CLA (1998) Efficient construction of large test collections. In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, pp 282–289
Dumais ST, Belkin NJ (2005) The TREC interactive tracks: putting the user into search. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 123–152
Fox EA (1983) Characterization of two new experimental collections in computer and information science containing textual and bibliographic concepts. Tech. rep. TR83-561, Cornell University
Harman DK (2005) The TREC test collections. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 21–52
Hedin B, Oard DW (2009) Replication and automation of expert judgments: information engineering in legal E-discovery. In: SMC’09: Proceedings of the 2009 IEEE international conference on systems, man and cybernetics, pp 102–107
Hedin B, Tomlinson S, Baron JR, Oard DW (2010) Overview of the TREC 2009 Legal Track. In: The eighteenth text retrieval conference (TREC 2009)
Ingwersen P (1992) Information retrieval interaction. Taylor Graham, London
Ingwersen P, Järvelin K (2005) The turn: integration of information seeking and retrieval in context. Springer
International Organization for Standards (2005) Quality management systems—fundamentals and vocabulary. ISO 9000:2005
Jensen JH (2000) Special issues involving electronic discovery. Kansas J Law Public Policy 9:425
Kando N, Mitamura T, Sakai T (2008) Introduction to the NTCIR-6 special issue. ACM Trans Asian Lang Inform Process 7(2):1–3
Kazai G, Lalmas M, Fuhr N, Gövert N (2004) A report on the first year of the initiative for the evaluation of XML retrieval (INEX’02). J Am Soc Inform Sci Technol 55(6):551–556
Lewis D, Agam G, Argamon S, Frieder O, Grossman D, Heard J (2006) Building a test collection for complex document information processing. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 665–666
Lewis DD (1996) The TREC-4 filtering track. In: The fourth text retrieval conference (TREC-4), pp 165–180
Lynam TR, Cormack GV (2009) Multitext legal experiments at TREC 2008. In: The sixteenth text retrieval conference (TREC 2008)
Majumder P, Mitra M, Pal D, Bandyopadhyay A, Maiti S, Mitra S, Sen A, Pal S (2008) Text collections for FIRE. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 699–700
Moffat A, Zobel J (2008) Rank-biased precision for measurement of retrieval effectiveness. ACM Trans Inf Syst 27(1)
Oard DW, Hedin B, Tomlinson S, Baron JR (2009) Overview of the TREC 2008 Legal Track. In: The seventeenth text retrieval conference (TREC 2008)
Oot P, Kershaw A, Roitblat HL (2010) Mandating reasonableness in a reasonable inquiry. Denver Univ Law Rev 87:533
Paul GL, Baron JR (2007) Information inflation: can the legal system adapt? Richmond J Law Technol 13(3)
PCI Security Standards Council (2009) Payment card industry (PCI) data security standard: requirements and security assessment procedures, version 1.2.1. http://www.pcisecuritystandards.org
Peters C, Braschler M (2001) European research letter: cross-language system evaluation: the CLEF campaigns. J Am Soc Inf Sci Technol 52(12):1067–1072
Roitblat HL, Kershaw A, Oot P (2010) Document categorization in legal electronic discovery: computer classification vs. manual review. J Am Soc Inf Sci Technol 61(1):70–80
Sakai T, Kando N (2008) On information retrieval metrics designed for evaluation with incomplete relevance assessments. Inf Retr 11(5):447–470
Sanderson M, Joho H (2004) Forming test collections with no system pooling. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval, pp 33–40
Sanderson M, Zobel J (2005) Information retrieval system evaluation: effort, sensitivity, and reliability. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp 162–169
Schmidt H, Butter K, Rider C (2002) Building digital tobacco document libraries at the University of California, San Francisco Library/Center for Knowledge Management. D-Lib Mag 8(2)
Singhal A, Salton G, Buckley C (1995) Length normalization in degraded text collections. In: Proceedings of fifth annual symposium on document analysis and information retrieval, pp 15–17
Soboroff I (2007) A comparison of pooled and sampled relevance judgments. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, pp 785–786
Solomon RD, Baron JR (2009) Bake offs, demos & kicking the tires: a practical litigator’s brief guide to evaluating early case assessment software & search & review tools. http://www.kslaw.com/Library/publication/BakeOffs_Solomon.pdf
Spärck Jones K, van Rijsbergen CJ (1975) Report on the need for and provision of an ideal information retrieval test collection. Tech. Rep. 5266, Computer Laboratory, University of Cambridge, Cambridge (UK)
Taghva K, Borsack J, Condit A (1996) Effects of OCR errors on ranking and feedback using the vector space model. Inf Process Manage 32(3):317–327
The Sedona Conference (2007a) The Sedona Principles, second edition: best practice recommendations and principles for addressing electronic document production. http://www.thesedonaconference.org
The Sedona Conference (2007b) The Sedona Conference best practices commentary on the use of search and information retrieval methods in E-discovery. The Sedona Conf J 8:189–223
The Sedona Conference (2009) The Sedona Conference commentary on achieving quality in the E-discovery process. The Sedona Conf J 10:299–329
Tomlinson S (2007) Experiments with the negotiated Boolean queries of the TREC 2006 legal discovery track. In: The fifteenth text retrieval conference (TREC 2006)
Tomlinson S, Oard DW, Baron JR, Thompson P (2008) Overview of the TREC 2007 Legal Track. In: The sixteenth text retrieval conference (TREC 2007)
Turpin A, Scholer F (2006) User performance versus precision measures for simple search tasks. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 11–18
Voorhees EM, Garofolo JS (2005) Retrieving noisy text. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 183–197
Voorhees EM, Harman DK (2005) The text retrieval conference. In: Voorhees EM, Harman DK (eds) TREC: experiment and evaluation in information retrieval. MIT Press, Cambridge, pp 3–19
Wayne CL (1998) Multilingual topic detection and tracking: Successful research enabled by corpora and evaluation. In: Proceedings of the first international conference on language resources and evaluation
Yilmaz E, Aslam JA (2006) Estimating average precision with incomplete and imperfect judgments. In: Proceedings of the 15th international conference on information and knowledge management (CIKM), pp 102–111
Zhao FC, Oard DW, Baron JR (2009) Improving search effectiveness in the legal E-discovery process using relevance feedback. In: ICAIL 2009 DESI III Global E-Discovery/E-Disclosure Workshop. http://www.law.pitt.edu/DESI3_Workshop/DESI_III_papers.htm
Zobel J (1998) How reliable are the results of large-scale information retrieval experiments? In: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, pp 307–314
The authors first wish to thank a number of individuals who in discussions with the authors contributed ideas and suggestions that found their way into portions of the present paper, including Thomas Bookwalter, Gordon Cormack, Todd Elmer, Maura Grossman and Richard Mark Soley. Additionally, the TREC Legal Track would not have been possible without the support of Ellen Voorhees and Ian Soboroff of NIST; the faculty, staff and students of IIT, UCSF, Tobacco Documents Online, and Roswell Park Cancer Institute who helped build IIT CDIP or the LTDL on which it was based; Celia White (the 2006 Track expert interactive searcher); Venkat Rangan of Clearwell Systems who helped to build the TREC Enron test collection; Richard Braman of The Sedona Conference® and the hundreds of law students, lawyers and Sedona colleagues who have contributed pro bono time to the project. Finally, the authors wish to thank Kevin Ashley and Jack Conrad for their support of and participation in the First and Third DESI Workshops, held as part of the Eleventh and Twelfth International Conferences on Artificial Intelligence and Law, at which many of the ideas herein were discussed.
The first three sections of this article draw upon material in the introductory sections of two papers presented at events associated with the 11th and 12th International Conferences on Artificial Intelligence and Law (ICAIL) (Baron and Thompson 2007; Zhao et al. 2009) as well as material first published in (Baron 2008), with permission.
Oard, D.W., Baron, J.R., Hedin, B. et al. Evaluation of information retrieval for E-discovery. Artif Intell Law 18, 347–386 (2010). https://doi.org/10.1007/s10506-010-9093-9