
Legal information retrieval for understanding statutory terms

  • Original Research
Artificial Intelligence and Law

Abstract

In this work we study, design, and evaluate computational methods to support the interpretation of statutory terms. We propose a novel task: discovering sentences for argumentation about the meaning of statutory terms. The task models the analysis of the past treatment of statutory terms, an exercise lawyers routinely perform using a combination of manual and computational approaches. We treat the discovery of sentences as a special case of ad hoc document retrieval. Its distinguishing features are the retrieval of short texts (sentences), specialized document types (legal case texts), and, above all, a unique definition of document relevance, specified in detailed annotation guidelines. To support our experiments we assembled a data set comprising 42 queries (26,959 sentences), which we plan to release to the public in the near future to support further research. Most importantly, we investigate the feasibility of developing a system that responds to a query with a list of sentences that mention the term in a way that is useful for understanding and elaborating its meaning. We do so by systematically assessing different features that model a sentence's usefulness for interpretation, and we combine these features into a compound measure that accounts for multiple aspects of usefulness. The definition of the task, the assembly of the data set, and the detailed task analysis provide a solid foundation for employing a learning-to-rank approach.
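The abstract describes a system that answers a query with sentences mentioning the term and ranks them by a compound measure built from several features. The Python sketch below illustrates only that general pipeline shape. The three features (lexical similarity to the provision, a cue-word flag, and term focus), the weights in `WEIGHTS`, the cue list in `CUES`, the 0 to 3 graded relevance scale, and the NDCG@k evaluation are all hypothetical choices made for illustration; they are not the features, weights, or evaluation protocol reported in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical cue words suggesting that a sentence elaborates a term's meaning.
CUES = ("means", "defined", "definition", "interpret", "construe", "include")
# Hypothetical feature weights for the compound score (not taken from the paper).
WEIGHTS = {"provision_sim": 0.5, "cue": 0.3, "focus": 0.2}


def tokens(text):
    # Lowercased alphanumeric tokens; a crude stand-in for real lemmatization.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def features(sentence, term, provision):
    toks = tokens(sentence)
    low = sentence.lower()
    return {
        # Lexical overlap with the text of the provision containing the term.
        "provision_sim": cosine(Counter(toks), Counter(tokens(provision))),
        # 1.0 if any cue word occurs as a substring (so "includes" also counts).
        "cue": float(any(c in low for c in CUES)),
        # Share of the sentence's tokens taken up by mentions of the term.
        "focus": low.count(term.lower()) * len(tokens(term)) / max(len(toks), 1),
    }


def rank(sentences, term, provision):
    # Only sentences that mention the term are candidates.
    cands = [s for s in sentences if term.lower() in s.lower()]
    if not cands:
        return []
    feats = [features(s, term, provision) for s in cands]
    # Min-max normalize each feature across candidates so weights are comparable.
    for name in WEIGHTS:
        lo = min(f[name] for f in feats)
        hi = max(f[name] for f in feats)
        for f in feats:
            f[name] = (f[name] - lo) / (hi - lo) if hi > lo else 0.0
    scores = [sum(WEIGHTS[n] * f[n] for n in WEIGHTS) for f in feats]
    # Highest compound score first; ties keep the input order (stable sort).
    order = sorted(range(len(cands)), key=lambda i: scores[i], reverse=True)
    return [cands[i] for i in order]


def ndcg_at_k(ranked, labels, k=10):
    # labels maps sentence -> graded relevance (hypothetical 0..3); unlabeled = 0.
    def dcg(gains):
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

    ideal = dcg(sorted(labels.values(), reverse=True)[:k])
    return dcg([labels.get(s, 0) for s in ranked[:k]]) / ideal if ideal else 0.0


if __name__ == "__main__":
    # Illustrative query: a statutory term and the provision it appears in.
    term = "protected computer"
    provision = ("a person who intentionally accesses a protected computer "
                 "without authorization")
    sentences = [
        "Defendant argued the server was not a protected computer.",
        "The term protected computer means a computer used in interstate commerce.",
        "The hearing was adjourned until Monday.",
    ]
    ranked = rank(sentences, term, provision)
    for s in ranked:
        print(s)
    labels = {sentences[0]: 1, sentences[1]: 3, sentences[2]: 0}  # made-up labels
    print(round(ndcg_at_k(ranked, labels), 3))
```

Here the definitional sentence outranks the one that merely applies the term, and the off-topic sentence is filtered out because it never mentions the term. The hand-set weights are the obvious weak point: learning them from labeled queries instead is where a learning-to-rank approach, as the abstract anticipates, would come in.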



Notes

  1. https://github.com/jsavelka/statutory_interpretation/blob/master/annotation_guidelines_v2.pdf.

  2. https://github.com/jsavelka/statutory_interpretation.

  3. A small portion of the data set is available at http://www.case.law. The complete data set can be obtained upon entering into a research agreement with LexisNexis.

  4. https://www.elastic.co/.

  5. https://github.com/vhyza/elasticsearch-analysis-lemmagen.

  6. https://github.com/hlavki/jlemmagen.

  7. http://lemmatise.ijs.si.

  8. Per information provided in an email from info@case.law on 2019-01-07.

  9. https://github.com/jsavelka/statutory_interpretation/blob/master/annotation_guidelines_v2.pdf.

  10. We used Gensim’s phrases module, which is available at https://radimrehurek.com/gensim/models/phrases.html.

  11. See https://radimrehurek.com/gensim/models/word2vec.html.

  12. https://code.google.com/archive/p/word2vec/.

  13. https://nlp.stanford.edu/projects/glove/.

  14. Both available at https://github.com/RaRe-Technologies/gensim-data.

  15. https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md.

  16. https://radimrehurek.com/gensim/models/ldamodel.html.

  17. http://www.case.law.


Acknowledgements

The first author would like to acknowledge the University of Pittsburgh as his home institution during the time this work was conducted. This work was supported in part by a National Institute of Justice Graduate Student Fellowship (Fellow: Jaromir Savelka) Award # 2016-R2-CX-0010, “Recommendation System for Statutory Interpretation in Cybercrime,” and by a University of Pittsburgh Pitt Cyber Accelerator Grant entitled “Annotating Machine Learning Data for Interpreting Cyber-Crime Statutes.”

Author information

Correspondence to Jaromír Šavelka.



About this article


Cite this article

Šavelka, J., Ashley, K.D. Legal information retrieval for understanding statutory terms. Artif Intell Law 30, 245–289 (2022). https://doi.org/10.1007/s10506-021-09293-5


  • DOI: https://doi.org/10.1007/s10506-021-09293-5
