Boosting court judgment prediction and explanation using legal entities

Benedetto, Irene; Koudounas, Alkis; Vaiani, Lorenzo; Pastor, Eliana; Cagliero, Luca; Tarasconi, Francesco; Baralis, Elena

doi:10.1007/s10506-024-09397-8

Original Research
Published: 18 March 2024

Artificial Intelligence and Law (2024)

Irene Benedetto¹,² (ORCID: orcid.org/0000-0001-7086-7898),
Alkis Koudounas¹,
Lorenzo Vaiani¹,
Eliana Pastor¹,
Luca Cagliero¹,
Francesco Tarasconi² &
Elena Baralis¹


Abstract

The automatic prediction of court case judgments using Deep Learning and Natural Language Processing is challenged by the variety of norms and regulations, the inherent complexity of the forensic language, and the length of legal judgments. Although state-of-the-art transformer-based architectures and Large Language Models (LLMs) are pre-trained on large-scale datasets, the underlying model reasoning is not transparent to the legal expert. This paper jointly addresses court judgment prediction and explanation by not only predicting the judgment but also providing legal experts with sentence-based explanations. To boost the performance of both tasks we leverage a legal named entity recognition step, which automatically annotates documents with meaningful domain-specific entity tags and masks the corresponding fine-grained descriptions. In such a way, transformer-based architectures and Large Language Models can attend to in-domain entity-related information in the inference process while neglecting irrelevant details. Furthermore, the explainer can boost the relevance of entity-enriched sentences while limiting the diffusion of potentially sensitive information. We also explore the use of in-context learning and lightweight fine-tuning to tailor LLMs to the legal language style and the downstream prediction and explanation tasks. The results obtained on a benchmark dataset from the Indian judicial system show the superior performance of entity-aware approaches to both judgment prediction and explanation.

Notes

1. https://huggingface.co/models (latest access: January 2024).
2. The results on the test set are not publicly available for the ILDC dataset (Malik et al. 2021).
3. https://anonymous.4open.science/r/NER-Boosting-CJPE (latest access: January 2024).

References

Alali M, Syed S, Alsayed M, et al (2021) Justice: a benchmark dataset for supreme court’s judgment prediction. arXiv:2112.03414
Aletras N, Tsarapatsanis D, Preoţiuc-Pietro D et al (2016) Predicting judicial decisions of the European court of human rights: a natural language processing perspective. PeerJ Comput Sci 2:e93. https://doi.org/10.7717/peerj-cs.93
Angelidis I, Chalkidis I, Koubarakis M (2018) Named entity recognition, linking and generation for greek legislation. In: JURIX, URL https://ebooks.iospress.nl/volumearticle/50829
Arrieta AB, Díaz-Rodríguez N, Del Ser J et al (2020) Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58:82–115
Attanasio G, Pastor E, Di Bonaventura C, et al (2023) ferret: a framework for benchmarking explainers on transformers. In: Croce D, Soldaini L (eds) Proceedings of the 17th conference of the European chapter of the association for computational linguistics: system demonstrations. Association for Computational Linguistics, Dubrovnik, Croatia, pp 256–266, https://doi.org/10.18653/v1/2023.eacl-demo.29, URL https://aclanthology.org/2023.eacl-demo.29
Au TWT, Cox IJ, Lampos V (2022) E-NER—an annotated named entity recognition corpus of legal text. CoRR arXiv:abs/2212.09306. https://doi.org/10.48550/arXiv.2212.09306,
Benedetto I, Cagliero L, Tarasconi F (2022) Automatic inference of taxonomy relationships among legal documents. In: Chiusano S, Cerquitelli T, Wrembel R, et al (eds) New Trends in Database and Information Systems. Springer International Publishing, Cham, pp 24–33, https://doi.org/10.1007/978-3-031-15743-1_3
Benedetto I, Cagliero L, Tarasconi F, et al (2023a) Benchmarking abstractive models for italian legal news summarization. In: Sileno G, Spanakis J, van Dijck G (eds) Legal knowledge and information systems—JURIX 2023: the thirty-sixth annual conference, Maastricht, The Netherlands, 18-20 December 2023, Frontiers in Artificial Intelligence and Applications, vol 379. IOS Press, pp 311–316, https://doi.org/10.3233/FAIA230980,
Benedetto I, Koudounas A, Vaiani L, et al (2023b) PoliToHFI at SemEval-2023 task 6: leveraging entity-aware and hierarchical transformers for legal entity recognition and court judgment prediction. In: Proceedings of the 17th international workshop on semantic evaluation (SemEval-2023). Association for computational linguistics, Toronto, Canada, pp 1401–1411, URL https://aclanthology.org/2023.semeval-1.194
Benedetto I, Sportelli G, Bertoldo S et al (2023) On the use of pretrained language models for legal Italian document classification. Proc Comput Sci 225:2244–2253. https://doi.org/10.1016/j.procs.2023.10.215
Bhambhoria R, Dahan S, Zhu X (2021) Investigating the state-of-the-art performance and explainability of legal judgment prediction. In: Canadian Conference on AI
Bhambhoria R, Liu H, Dahan S, et al (2022) Interpretable low-resource legal decision making. In: Proceedings of the AAAI conference on artificial intelligence, pp 11819–11827
Bibal A, Lognoul M, De Streel A et al (2021) Legal requirements on explainability in machine learning. Artif Intell Law 29:149–169. https://doi.org/10.1007/s10506-020-09270-4
Chalkidis I, Søgaard A (2022) Improved multi-label classification under temporal concept drift: rethinking group-robust algorithms in a label-wise setting. In: Findings of the association for computational linguistics: ACL 2022. Association for computational linguistics, Dublin, Ireland, pp 2441–2454, https://doi.org/10.18653/v1/2022.findings-acl.192, URL https://aclanthology.org/2022.findings-acl.192
Chalkidis I, Androutsopoulos I, Aletras N (2019) Neural legal judgment prediction in English. In: Proceedings of the 57th annual meeting of the association for computational linguistics. Association for computational linguistics, Florence, Italy, pp 4317–4323, https://doi.org/10.18653/v1/P19-1424, URL https://aclanthology.org/P19-1424
Chalkidis I, Fergadiotis M, Malakasiotis P, et al (2020) LEGAL-BERT: the muppets straight out of law school. In: Findings of the association for computational linguistics: EMNLP 2020. Association for computational linguistics, Online, pp 2898–2904, https://doi.org/10.18653/v1/2020.findings-emnlp.261
Choi E, Levy O, Choi Y, et al (2018) Ultra-fine entity typing. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, pp 87–96, https://doi.org/10.18653/v1/P18-1009, URL https://aclanthology.org/P18-1009
Cui J, Shen X, Nie F, et al (2022) A survey on legal judgment prediction: Datasets, metrics, models and challenges. arXiv preprint arXiv:2204.04859
Dai Y, Feng D, Huang J, et al (2023) Laiw: A Chinese legal large language models benchmark (A technical report). CoRR abs/2310.05620. https://doi.org/10.48550/ARXIV.2310.05620
Dettmers T, Lewis M, Shleifer S, et al (2021) 8-bit optimizers via block-wise quantization. CoRR abs/2110.02861
Devlin J, Chang M, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp 4171–4186, https://doi.org/10.18653/v1/n19-1423
DeYoung J, Jain S, Rajani NF, et al (2020) ERASER: a benchmark to evaluate rationalized NLP models. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for computational linguistics, Online, pp 4443–4458, https://doi.org/10.18653/v1/2020.acl-main.408, URL https://aclanthology.org/2020.acl-main.408
Dozier C, Kondadadi R, Light M et al (2010) Named entity recognition and resolution in legal text. Springer, Berlin. https://doi.org/10.1007/978-3-642-12837-0_2
Fei Z, Shen X, Zhu D, et al (2023) Lawbench: Benchmarking legal knowledge of large language models. CoRR abs/2309.16289. https://doi.org/10.48550/ARXIV.2309.16289
Goel K, Rajani NF, Vig J, et al (2021) Robustness gym: unifying the NLP evaluation landscape. In: Proceedings of the 2021 Conference of the North American chapter of the association for computational linguistics: human language technologies: demonstrations. Association for computational linguistics, Online, pp 42–55, https://doi.org/10.18653/v1/2021.naacl-demos.6, URL https://aclanthology.org/2021.naacl-demos.6
Górski L, Ramakrishna S (2021) Explainable artificial intelligence, lawyer’s perspective. In: Proceedings of the eighteenth international conference on artificial intelligence and law. Association for computing machinery, New York, NY, USA, ICAIL ’21, pp 60–68, https://doi.org/10.1145/3462757.3466145
Górski Ł, Ramakrishna S, Nowosielski JM (2021) Towards grad-cam based explainability in a legal text processing pipeline. extended version. In: Rodríguez-Doncel V, Palmirani M, Araszkiewicz M, et al (eds) AI approaches to the complexity of legal systems XI-XII. Springer International Publishing, Cham, pp 154–168, URL https://link.springer.com/chapter/10.1007/978-3-030-89811-3_11
Guha N, Nyarko J, Ho DE, et al (2023) Legalbench: A collaboratively built benchmark for measuring legal reasoning in large language models. arXiv:2308.11462
Hassan F, Domingo-Ferrer J, Soria-Comas J (2018) Anonymization of unstructured data via named-entity recognition. In: Torra V, Narukawa Y, Aguiló I et al (eds) Modeling decisions for artificial intelligence. Springer International Publishing, Cham, pp 296–305
Hendrycks D, Burns C, Chen A, et al (2021) CUAD: an expert-annotated NLP dataset for legal contract review. CoRR abs/2103.06268
Hu EJ, Shen Y, Wallis P, et al (2021) LoRA: Low-rank adaptation of large language models. CoRR abs/2106.09685
Jain D, Borah MD, Biswas A (2021) Summarization of legal documents: where are we now and the way forward. Comput Sci Rev 40:100388. https://doi.org/10.1016/j.cosrev.2021.100388
Jiang AQ, Sablayrolles A, Mensch A, et al (2023) Mistral 7b. arXiv:2310.06825
Kalamkar P, Agarwal A, Tiwari A, et al (2022a) Named entity recognition in Indian court judgments. In: Proceedings of the natural legal language processing workshop 2022. Association for computational linguistics, Abu Dhabi, United Arab Emirates (Hybrid), pp 184–193, URL https://aclanthology.org/2022.nllp-1.15
Kalamkar P, Tiwari A, Agarwal A, et al (2022b) Corpus for automatic structuring of legal documents. In: Proceedings of the thirteenth language resources and evaluation conference. European language resources association, Marseille, France, pp 4420–4429, URL https://aclanthology.org/2022.lrec-1.470
Kaur A, Bozic B (2019) Convolutional neural network-based automatic prediction of judgments of the european court of human rights. In: Irish conference on artificial intelligence and cognitive science, URL https://ceur-ws.org/Vol-2563/aics_42.pdf
Koudounas A, Giobergia F, Baralis E (2023a) Bad exoplanet! explaining degraded performance when reconstructing exoplanets atmospheric parameters. In: NeurIPS 2023 AI for science workshop, URL https://openreview.net/forum?id=9Z4XZOhwiz
Koudounas A, Pastor E, Attanasio G, et al (2023b) Exploring subgroup performance in end-to-end speech models. In: ICASSP 2023 - 2023 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1–5, https://doi.org/10.1109/ICASSP49357.2023.10095284
Koudounas A, Pastor E, Attanasio G, et al (2024a) Prioritizing data acquisition for end-to-end speech model improvement. In: ICASSP 2024 - 2024 IEEE international conference on acoustics, speech and signal processing (ICASSP)
Koudounas A, Pastor E, Attanasio G et al (2024) Towards comprehensive subgroup performance analysis in speech models. IEEE/ACM Trans Audio Speech Lang Process. https://doi.org/10.1109/TASLP.2024.3363447
Kowsrihawat K, Vateekul P, Boonkwan P (2018) Predicting judicial decisions of criminal cases from thai supreme court using bi-directional gru with attention mechanism. In: 2018 5th Asian conference on defense technology (ACDT) pp 50–55. URL https://ieeexplore.ieee.org/document/8592948
Lavie A, Agarwal A (2007) METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the second workshop on statistical machine translation. Association for computational linguistics, Prague, Czech Republic, pp 228–231, URL https://aclanthology.org/W07-0734
Leitner E, Rehm G, Moreno-Schneider J (2020) A dataset of German legal documents for named entity recognition. In: Proceedings of the twelfth language resources and evaluation conference. European language resources association, Marseille, France, pp 4478–4485, URL https://aclanthology.org/2020.lrec-1.551
Li J, Sun A, Han J et al (2020) A survey on deep learning for named entity recognition. IEEE Trans Knowl Data Eng 34(1):50–70
Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Text summarization branches out. Association for computational linguistics, Barcelona, Spain, pp 74–81, URL https://aclanthology.org/W04-1013
Liu H, Tam D, Muqeeth M, et al (2022) Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. arXiv:2205.05638
Liu Y, Ott M, Goyal N, et al (2019) Roberta: a robustly optimized bert pretraining approach. https://doi.org/10.48550/ARXIV.1907.11692, URL https://arxiv.org/abs/1907.11692
Lu J, Henchion M, Bacher I, et al (2021) A sentence-level hierarchical BERT model for document classification with limited labelled data, pp 231–241. https://doi.org/10.1007/978-3-030-88942-5_18
Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, et al (eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
Luo CF, Bhambhoria R, Dahan S, et al (2022) Evaluating explanation correctness in legal decision making. In: Proceedings of the Canadian conference on artificial intelligence https://doi.org/10.21428/594757db.8718dc8b
Malik V, Sanjay R, Nigam SK, et al (2021) ILDC for CJPE: Indian legal documents corpus for court judgment prediction and explanation. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (Volume 1: Long Papers). Association for computational linguistics, Online, pp 4046–4062, https://doi.org/10.18653/v1/2021.acl-long.313
McCallum A, Li W (2003) Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003, pp 188–191, URL https://aclanthology.org/W03-0430
Medvedeva M, Üstün A, Xu X, et al (2021) Automatic judgement forecasting for pending applications of the european court of human rights. In: ASAIL/LegalAIIA@ ICAIL, pp 12–23, URL https://ceur-ws.org/Vol-2888/paper2.pdf
Mosbach M, Pimentel T, Ravfogel S, et al (2023) Few-shot fine-tuning vs. in-context learning: A fair comparison and evaluation. In: Findings of the association for computational linguistics: ACL 2023. Association for computational linguistics, Toronto, Canada, pp 12284–12314, https://doi.org/10.18653/v1/2023.findings-acl.779, URL https://aclanthology.org/2023.findings-acl.779
Napolitano D, Cagliero L (2023) GX-HUI: global explanations of AI models based on high-utility itemsets. In: Shahriar H, Teranishi Y, Cuzzocrea A, et al (eds) 47th IEEE annual computers, software, and applications conference, COMPSAC 2023, Torino, Italy, June 26-30, 2023. IEEE, pp 292–297, https://doi.org/10.1109/COMPSAC57700.2023.00045
Papineni K, Roukos S, Ward T, et al (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for computational linguistics, USA, ACL ’02, pp 311–318, https://doi.org/10.3115/1073083.1073135
Pastor E, Baralis E (2019) Explaining black box models by means of local rules. In: Proceedings of the 34th ACM/SIGAPP symposium on applied computing. Association for computing machinery, New York, NY, USA, SAC ’19, pp 510–517, https://doi.org/10.1145/3297280.3297328
Pastor E, de Alfaro L, Baralis E (2021a) Looking for trouble: analyzing classifier behavior via pattern divergence. In: Proceedings of the 2021 international conference on management of data. Association for computing machinery, New York, NY, USA, SIGMOD ’21, pp 1400–1412, https://doi.org/10.1145/3448016.3457284
Pastor E, Gavgavian A, Baralis E et al (2021) How divergent is your data? Proc VLDB Endow 14(12):2835–2838. https://doi.org/10.14778/3476311.3476357
Pastor E, Baralis E, de Alfaro L (2023) A hierarchical approach to anomalous subgroup discovery. In: 2023 IEEE 39th international conference on data engineering (ICDE), pp 2647–2659, https://doi.org/10.1109/ICDE55515.2023.00203
Pastor E, Koudounas A, Attanasio G, et al (2024) Explaining speech classification models via word-level audio segments and paralinguistic features. In: Proceedings of the 18th conference of the European chapter of the association for computational linguistics. Association for computational linguistics
Paul S, Goyal P, Ghosh S (2022) Lesicin: a heterogeneous graph-based approach for automatic legal statute identification from Indian legal documents. In: Proceedings of the AAAI conference on artificial intelligence, pp 11139–11146, URL https://aaai-2022.virtualchair.net/poster_aaai10463
Quemy A, Wrembel R (2020) On integrating and classifying legal text documents. In: Hartmann S, Küng J, Kotsis G, et al (eds) Database and expert systems applications. Springer International Publishing, Cham, pp 385–399, URL https://dl.acm.org/doi/abs/10.1007/978-3-030-59003-1_25
Ribeiro MT, Singh S, Guestrin C (2016) “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. Association for computing machinery, New York, NY, USA, KDD ’16, pp 1135–1144, https://doi.org/10.1145/2939672.2939778
Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1(5):206–215
Saeed W, Omlin CW (2023) Explainable AI (XAI): a systematic meta-survey of current challenges and future opportunities. Knowl Based Syst 263:110273. https://doi.org/10.1016/j.knosys.2023.110273
Sansone C, Sperlí G (2022) Legal information retrieval systems: state-of-the-art and open issues. Inf Syst 106:101967. https://doi.org/10.1016/j.is.2021.101967
Selvaraju RR, Cogswell M, Das A, et al (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626
Setzu M, Guidotti R, Monreale A et al (2021) Glocalx—from local to global explanations of black box AI models. Artif Intell 294:103457. https://doi.org/10.1016/j.artint.2021.103457
Shaikh RA, Sahu TP, Anand V (2020) Predicting outcomes of legal cases based on legal factors using classifiers. Proc Comput Sci 167:2393–2402. https://doi.org/10.1016/j.procs.2020.03.292
Shukla A, Bhattacharya P, Poddar S, et al (2022) Legal case document summarization: extractive and abstractive methods and their evaluation. In: Proceedings of the 2nd conference of the asia-pacific chapter of the association for computational linguistics and the 12th international joint conference on natural language processing (Volume 1: Long Papers). Association for Computational Linguistics, Online only, pp 1048–1064, URL https://aclanthology.org/2022.aacl-main.77
Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034
Strickson B, De La Iglesia B (2020) Legal judgement prediction for UK courts. In: Proceedings of the 3rd international conference on information science and systems. Association for computing machinery, New York, NY, USA, ICISS ’20, pp 204–209, https://doi.org/10.1145/3388176.3388183
Sundararajan M, Taly A, Yan Q (2017a) Axiomatic attribution for deep networks. In: International conference on machine learning, PMLR, pp 3319–3328
Sundararajan M, Taly A, Yan Q (2017b) Axiomatic attribution for deep networks. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, proceedings of machine learning research, vol 70. PMLR, pp 3319–3328, URL https://proceedings.mlr.press/v70/sundararajan17a.html
Tiersma P (2000) Legal language. Bibliovault OAI Repository, the University of Chicago Press 27. https://doi.org/10.1016/S1352-0237(00)00210-0
Tjong Kim Sang EF, De Meulder F (2003) Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003, pp 142–147, URL https://aclanthology.org/W03-0419
Touvron H, Martin L, Stone K, et al (2023) Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288
Tunstall L, Beeching E, Lambert N, et al (2023) Zephyr: direct distillation of lm alignment. arXiv:2310.16944
Ventura F, Greco S, Apiletti D et al (2022) Trusting deep learning natural-language models via local and global explanations. Knowl Inf Syst 64(7):1863–1907
Visentin A, Nardotto A, O’Sullivan B (2019) Predicting judicial decisions: a statistically rigorous approach and a new ensemble classifier. In: 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI) pp 1820–1824. URL https://ieeexplore.ieee.org/document/8995348
Williams C (2005) Tradition and Change in Legal English. Peter Lang Verlag, Lausanne, Switzerland, https://doi.org/10.3726/978-3-0351-0317-5, URL https://www.peterlang.com/document/1043657
Yamada I, Asai A, Shindo H, et al (2020) LUKE: deep contextualized entity representations with entity-aware self-attention. In: Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP). Association for computational linguistics, Online, pp 6442–6454, https://doi.org/10.18653/v1/2020.emnlp-main.523, URL https://aclanthology.org/2020.emnlp-main.523
Zhang Y, Zhong V, Chen D, et al (2017) Position-aware attention and supervised data improve slot filling. In: Conference on empirical methods in natural language processing
Zhao H, Chen H, Yang F, et al (2023) Explainability for large language models: a survey. arXiv:2309.01029
Zhong L, Zhong Z, Zhao Z, et al (2019) Automatic summarization of legal decisions using iterative masking of predictive sentences. In: Proceedings of the seventeenth international conference on artificial intelligence and law. Association for computing machinery, New York, NY, USA, ICAIL ’19, pp 163–172, https://doi.org/10.1145/3322640.3326728

Author information

Irene Benedetto, Alkis Koudounas, Lorenzo Vaiani and Eliana Pastor have contributed equally to this work.

Authors and Affiliations

Department of Computer and Control Engineering, Politecnico di Torino, Corso Castelfidardo, 39, 10129, Turin, Italy
Irene Benedetto, Alkis Koudounas, Lorenzo Vaiani, Eliana Pastor, Luca Cagliero & Elena Baralis
MAIZE SRL, Via San Quintino, 31, 10121, Turin, Italy
Irene Benedetto & Francesco Tarasconi

Corresponding author

Correspondence to Irene Benedetto.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this appendix, we report:

A detailed description of the hardware and configuration settings used in our experiments (see Sect. A.1);
Additional results for the CJP task (see Sect. A.2);
A deeper analysis of the impact of the boosting parameter on explanation plausibility (see Sect. A.3);
Additional results on the JUSTICE dataset (see Sect. A.4).

A.1 Experimental settings

Training parameters. In the L-NER task, we trained the models based on the LUKE architecture using a batch size of 256 and a learning rate of \(10^{-4}\). In contrast, we trained BERT-based models using a batch size of 1 and a learning rate of \(10^{-5}\). Both model families were trained for a maximum of 10 epochs with early stopping, and both use a weight decay of 0.01 and a warmup ratio of 0.06.

For the CJP task, we trained sentence encoders for a maximum of 15 epochs, utilizing a learning rate of \(5\cdot 10^{-5}\), a warmup ratio of 0.06, a weight decay of 0.01, and a batch size of 64. The hierarchical transformer-based architecture has a maximum length of 256 tokens and undergoes a maximum of 100 epochs of training, with a learning rate of \(5\cdot 10^{-5}\) and a batch size of 256.
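The settings above can be collected in one place. The following is a minimal sketch, not the authors' training script: the dictionary layout, the key names, and the `warmup_steps` helper are assumptions introduced for illustration, and the helper assumes that the warmup ratio denotes the fraction of total optimizer steps spent on learning-rate warmup (rounded up).

```python
import math

# Hyperparameters as stated in the text. The key names are hypothetical
# and only the values reported above are included.
SETTINGS = {
    "lner_luke": {
        "batch_size": 256, "learning_rate": 1e-4, "max_epochs": 10,
        "weight_decay": 0.01, "warmup_ratio": 0.06,
    },
    "lner_bert": {
        "batch_size": 1, "learning_rate": 1e-5, "max_epochs": 10,
        "weight_decay": 0.01, "warmup_ratio": 0.06,
    },
    "cjp_sentence_encoder": {
        "batch_size": 64, "learning_rate": 5e-5, "max_epochs": 15,
        "weight_decay": 0.01, "warmup_ratio": 0.06,
    },
    "cjp_hierarchical": {
        "batch_size": 256, "learning_rate": 5e-5, "max_epochs": 100,
        "max_length_tokens": 256,
    },
}


def warmup_steps(num_samples: int, batch_size: int, epochs: int,
                 warmup_ratio: float) -> int:
    """Number of warmup steps, assuming one optimizer step per batch."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    return math.ceil(total_steps * warmup_ratio)
```

With a hypothetical training set of `n` documents, `warmup_steps(n, 64, 15, 0.06)` gives the number of initial steps during which the learning rate of the sentence encoder would ramp up under this assumption.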

Tables 12 and 13 summarize the main per-model parameter settings, whereas Table 14 reports the training and evaluation times for L-NER and CJP models in terms of the number of samples processed per second.

Hardware. The experiments were run on a machine equipped with an Intel® Core™ i9-10980XE CPU, 2 × Nvidia® RTX A6000 GPUs, and 128 GB of RAM, running Ubuntu 22.04 LTS. We provide detailed information about the models used for the evaluation and the fine-tuning procedure in the official project repository (see footnote 3 in the Notes).

Table 12 Summary of the used models and parameters for the NER task

Table 13 Summary of the used models and parameters for the CJP task

Table 14 Performance Analysis: training and evaluation samples per second for L-NER and CJP models. CJP models have four MLP layers; however, variations in the number of MLP layers do not affect the training and validation time

A.2 Additional results for court judgment prediction

Here we report the values of additional performance metrics on CJP, i.e., Precision and Recall. Table 15 reports the results obtained by including the NER-masking step in the CJP pipeline, whereas Table 16 shows the results obtained by using the original document sentences to make predictions.

Some relevant facts emerge from the analysis of these results:

Relationship between Recall and F1 score: The best recall values are consistently associated with the best F1 scores. Since F1 is the harmonic mean of precision and recall, this indicates that the gains in overall performance are driven mainly by the ability to correctly identify positive instances.
Precision and Recall with NER-masking: In most cases, both precision and recall are higher when NER-masking is included in the pipeline than when it is not, which suggests that masking helps the model capture the relevant information and identify positive instances more accurately. The one exception is precision on the development set, which is higher without NER-masking.
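The link between recall and F1 noted above follows from the standard definitions of the metrics. The snippet below is a small illustration with hypothetical confusion-matrix counts (they are not taken from Tables 15 or 16); the function name is introduced here for convenience.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard binary precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: same precision, different recall.
low_recall = precision_recall_f1(tp=8, fp=2, fn=8)
high_recall = precision_recall_f1(tp=8, fp=2, fn=2)
```

With precision held fixed, the configuration with fewer false negatives has both the higher recall and the higher F1, which matches the pattern described in the first observation.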

Table 15 CJP results obtained applying a hierarchical approach, masking the text with NER tags

Table 16 CJP results obtained applying a hierarchical approach, without masking the text with NER tags

A.3 Impact of boosting parameter on plausibility

We study the impact of the boosting parameter \(\beta \) on plausibility. We compute the explanations for the explain set, for which ground-truth explanations are available, and report the results in Fig. 4. In general, boosting is highly beneficial for Gradient (G) and GradientXInput (GxI) with no masking. These are also the explainers for which the percentage of documents whose explanations are affected by the boosting increases with \(\beta \). The impact is low or absent for LOO without masking and for G with masking, for which the percentage of modified explanations remains low even as \(\beta \) increases.
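To make the notion of "explanations affected by the boosting" concrete, the sketch below uses a hypothetical boosting rule. This rule is an assumption and not necessarily the formulation used in the paper: each non-negative sentence attribution score is multiplied by (1 + β) when the sentence contains at least one entity tag, and the explanation is taken to be the top-k sentences. The function names, the multiplicative rule, and the toy inputs are all illustrative.

```python
def top_k_sentences(scores, has_entity, beta, k):
    """Indices of the k highest-scoring sentences after boosting.

    Assumed rule (hypothetical): scores of sentences that contain an
    entity tag are multiplied by (1 + beta); other scores are unchanged.
    """
    boosted = [s * (1.0 + beta) if e else s
               for s, e in zip(scores, has_entity)]
    ranked = sorted(range(len(boosted)), key=lambda i: boosted[i], reverse=True)
    return set(ranked[:k])


def fraction_modified(documents, beta, k):
    """Share of documents whose top-k explanation changes with boosting."""
    changed = sum(
        top_k_sentences(scores, has_entity, 0.0, k)
        != top_k_sentences(scores, has_entity, beta, k)
        for scores, has_entity in documents
    )
    return changed / len(documents)


# Toy documents: (sentence scores, entity flags per sentence).
toy_documents = [
    ([0.5, 0.4, 0.3], [False, False, True]),
    ([0.9, 0.1], [True, False]),
]
```

Under this assumed rule, an explainer whose top-ranked sentences already contain entities (as in the second toy document) shows few modified explanations regardless of β, while one that ranks entity-free sentences first is affected more as β grows.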

A.4 Court judgment predictions on JUSTICE

We also replicated the court judgment prediction experiments, with entity masking (H-NER-masked) and without it (H-unmasked), on the JUSTICE dataset. The results are reported in Table 17. Note that the entities recognized by the NER step cannot be validated because the JUSTICE dataset lacks entity-level annotations.

Entity masking slightly improves prediction performance. The reasons for the more limited improvement compared to the ILDC dataset (Malik et al. 2021) are: (1) the lack of domain-specific NER annotations (we can only reuse the NER module trained on Indian court judgments), and (2) the lower redundancy of JUSTICE’s judgments, which contain only the facts and no extra information about the verdict.
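The difference between the masked and unmasked inputs can be illustrated with a minimal sketch. It assumes that the NER step returns character-level spans with a label; the `mask_entities` helper, the bracketed tag format, and the example sentence and labels are hypothetical and are not taken from the paper's code.

```python
def mask_entities(text, spans):
    """Replace each (start, end, label) character span with a [LABEL] tag.

    Spans are processed left to right; a span that overlaps an earlier
    one is skipped so that no text is masked twice.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlapping span, keep the earlier one
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical example sentence and entity labels.
example_text = "The appeal was heard by Justice Rao in Delhi."
example_spans = [
    (example_text.index("Rao"), example_text.index("Rao") + 3, "JUDGE"),
    (example_text.index("Delhi"), example_text.index("Delhi") + 5, "GPE"),
]
masked = mask_entities(example_text, example_spans)
# masked == "The appeal was heard by Justice [JUDGE] in [GPE]."
```

In this sketch the unmasked variant would feed `example_text` to the classifier, whereas the masked variant would feed `masked`, which keeps the entity type but hides the fine-grained description.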

Table 17 Results on JUSTICE dataset

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Benedetto, I., Koudounas, A., Vaiani, L. et al. Boosting court judgment prediction and explanation using legal entities. Artif Intell Law (2024). https://doi.org/10.1007/s10506-024-09397-8

Accepted: 25 February 2024
Published: 18 March 2024
DOI: https://doi.org/10.1007/s10506-024-09397-8
