Artificial Intelligence and Law, Volume 18, Issue 4, pp 321–345

E-Discovery revisited: the need for artificial intelligence beyond information retrieval

  • Jack G. Conrad

Abstract

In this work, we provide a broad overview of the distinct stages of E-Discovery. We portray them as an interconnected, often complex workflow process, while relating them to the general Electronic Discovery Reference Model (EDRM). We start with the definition of E-Discovery. We then describe the very positive role that NIST’s Text REtrieval Conference (TREC) has played in advancing the science of E-Discovery, in terms of both the tasks involved and the evaluation of the legal discovery work performed. Given the critical role that data analysis plays at various stages of the process, we present a pyramid model that complements the EDRM, covering gathering and hosting; indexing; searching and navigating; and finally consolidating and summarizing E-Discovery findings. Next we discuss where the current areas of need and growth appear to be, drawing on one of the field’s most authoritative surveys of providers and consumers of E-Discovery products and services. We subsequently address some areas of Artificial Intelligence, both Information Retrieval-related and not, which promise to make future contributions to the E-Discovery discipline. These areas include data mining applied to e-mail and social networks, classification and machine learning, and the technologies that will enable next-generation E-Discovery. The lesson we convey is that the more IR researchers and others understand the broader context of E-Discovery, including the stages that occur before and after primary search, the greater will be the prospects for broader solutions, creative optimizations and synergies yet to be tapped.

Keywords

Electronically stored information, ESI, E-Discovery, E-Disclosure, EDD, Information retrieval, Data mining, Text REtrieval Conference, TREC

Notes

Acknowledgements

We thank Peter Jackson and Khalid Al-Kofahi for the time and resources to pursue this study. We are grateful to Marc Light for his review of this work and his recommendations for improving its clarity. We also thank the formal reviewers of this work for their comments and suggestions for its improvement. Finally, we wish to thank Kevin Ashley and Jason Baron for their support and feedback on this expanded work, as well as for their numerous contributions to the First and Third DESI Workshops, the former of which gave rise to this paper.

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Thomson Reuters Research and Development, Saint Paul, USA