Artificial Intelligence and Law, Volume 18, Issue 4, pp 413–430

Network-based filtering for large email collections in E-Discovery

  • Hans Henseler

Abstract

Information overload in E-Discovery proceedings makes review expensive and increases the risk of failing to produce results on time and consistently. New interactive techniques have been introduced to increase reviewer productivity. In contrast, this article proposes an alternative method that aims to reduce the amount of information during culling, so that less information needs to be reviewed. The method first maps the email collection universe using straightforward statistical methods based on keyword filtering combined with date-time and custodian identities. Subsequently, a social network is constructed from the email collection and analyzed by filtering on date-time and keywords. By using the network context, we expect to provide a better understanding of the keyword hits and the ability to discard certain parts of the collection.
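The two steps described in the abstract can be illustrated with a minimal sketch. This is not the article's implementation: the `Email` record, the function names, the sample messages, the case-insensitive substring matching, and the choice to treat the sender as the custodian are all assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Email:
    # Hypothetical record layout; a real collection carries many more fields.
    sender: str
    recipients: tuple
    sent: datetime
    body: str


def matches(email, keywords, start, end):
    """True if the email was sent within [start, end] and mentions any keyword."""
    text = email.body.lower()
    in_window = start <= email.sent <= end
    return in_window and any(k.lower() in text for k in keywords)


def hits_per_custodian(emails, keywords, start, end):
    """Step 1: count keyword hits per sender (assumed here to be the custodian)."""
    return Counter(e.sender for e in emails if matches(e, keywords, start, end))


def build_network(emails, keywords, start, end):
    """Step 2: directed sender -> recipient edges, weighted by matching emails."""
    edges = Counter()
    for e in emails:
        if matches(e, keywords, start, end):
            for recipient in e.recipients:
                edges[(e.sender, recipient)] += 1
    return edges


# Made-up sample messages, for illustration only.
emails = [
    Email("alice@example.com", ("bob@example.com",),
          datetime(2001, 5, 1), "Please review the merger draft."),
    Email("bob@example.com", ("alice@example.com", "carol@example.com"),
          datetime(2001, 5, 3), "Merger numbers attached."),
    Email("carol@example.com", ("bob@example.com",),
          datetime(2001, 8, 9), "Lunch on Friday?"),
    Email("alice@example.com", ("bob@example.com",),
          datetime(2002, 1, 15), "Merger follow-up."),
]

window = (datetime(2001, 1, 1), datetime(2001, 12, 31))
hits = hits_per_custodian(emails, ["merger"], *window)
network = build_network(emails, ["merger"], *window)

for (sender, recipient), weight in sorted(network.items()):
    print(f"{sender} -> {recipient}: {weight}")
```

In this sketch, an email that falls outside the date window or contains no keyword contributes no edge, so the filtered network shows which correspondents are connected by keyword hits and which parts of the collection carry none.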

Keywords

E-Discovery, Social network analysis, Information retrieval, Email visualisation

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Amsterdam University of Applied Sciences, Amsterdam, The Netherlands