Boosting court judgment prediction and explanation using legal entities

  • Original Research
  • Published in: Artificial Intelligence and Law

Abstract

The automatic prediction of court case judgments using Deep Learning and Natural Language Processing is challenged by the variety of norms and regulations, the inherent complexity of forensic language, and the length of legal judgments. Although state-of-the-art transformer-based architectures and Large Language Models (LLMs) are pre-trained on large-scale datasets, the underlying model reasoning is not transparent to legal experts. This paper jointly addresses court judgment prediction and explanation: it not only predicts the judgment but also provides legal experts with sentence-based explanations. To boost performance on both tasks, we leverage a legal named entity recognition step, which automatically annotates documents with meaningful domain-specific entity tags and masks the corresponding fine-grained descriptions. In this way, transformer-based architectures and Large Language Models can attend to in-domain, entity-related information during inference while neglecting irrelevant details. Furthermore, the explainer can boost the relevance of entity-enriched sentences while limiting the diffusion of potentially sensitive information. We also explore in-context learning and lightweight fine-tuning to tailor LLMs to the legal language style and to the downstream prediction and explanation tasks. The results obtained on a benchmark dataset from the Indian judicial system show the superior performance of entity-aware approaches on both judgment prediction and explanation.
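To make the masking idea concrete, the following minimal Python sketch replaces each recognized entity span with its type tag. It assumes that the NER step returns character-offset spans with a label; the `Entity` class, the `mask_entities` and `span` helpers, the bracketed tag format, and the example labels are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset where the entity begins (inclusive)
    end: int    # character offset where the entity ends (exclusive)
    label: str  # entity type, e.g. "COURT" (illustrative label set)


def mask_entities(text: str, entities: list) -> str:
    """Replace every entity span with a bracketed type tag.

    Hypothetical helper: spans that overlap an already-masked span are skipped.
    """
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.start < cursor:
            continue  # overlaps the previous entity; keep the first one
        pieces.append(text[cursor:ent.start])  # untouched text before the entity
        pieces.append(f"[{ent.label}]")        # fine-grained mention -> type tag
        cursor = ent.end
    pieces.append(text[cursor:])               # trailing text after the last entity
    return "".join(pieces)


def span(text: str, mention: str, label: str) -> Entity:
    """Build an Entity for the first occurrence of `mention` (example helper)."""
    start = text.index(mention)
    return Entity(start, start + len(mention), label)


sentence = "The appeal filed by Ram Kumar before the High Court of Delhi is dismissed."
masked = mask_entities(sentence, [span(sentence, "Ram Kumar", "PETITIONER"),
                                  span(sentence, "High Court of Delhi", "COURT")])
print(masked)
# The appeal filed by [PETITIONER] before the [COURT] is dismissed.
```

Keeping only the tag tells a downstream model that a petitioner and a court are mentioned without exposing which ones, which is the intuition behind both focusing attention and limiting the spread of sensitive details.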

Fig. 1
Fig. 2
Fig. 3

Notes

  1. https://huggingface.co/models (last accessed January 2024).

  2. The results on the test set are not publicly available for the ILDC dataset (Malik et al. 2021).

  3. https://anonymous.4open.science/r/NER-Boosting-CJPE (last accessed January 2024).

References

  • Alali M, Syed S, Alsayed M, et al (2021) Justice: a benchmark dataset for supreme court’s judgment prediction. arXiv:2112.03414

  • Aletras N, Tsarapatsanis D, Preoţiuc-Pietro D et al (2016) Predicting judicial decisions of the European court of human rights: a natural language processing perspective. PeerJ Comput Sci 2:e93. https://doi.org/10.7717/peerj-cs.93

  • Angelidis I, Chalkidis I, Koubarakis M (2018) Named entity recognition, linking and generation for Greek legislation. In: JURIX, URL https://ebooks.iospress.nl/volumearticle/50829

  • Arrieta AB, Díaz-Rodríguez N, Del Ser J et al (2020) Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58:82–115

  • Attanasio G, Pastor E, Di Bonaventura C, et al (2023) ferret: a framework for benchmarking explainers on transformers. In: Croce D, Soldaini L (eds) Proceedings of the 17th conference of the European chapter of the association for computational linguistics: system demonstrations. Association for Computational Linguistics, Dubrovnik, Croatia, pp 256–266, https://doi.org/10.18653/v1/2023.eacl-demo.29, URL https://aclanthology.org/2023.eacl-demo.29

  • Au TWT, Cox IJ, Lampos V (2022) E-NER—an annotated named entity recognition corpus of legal text. CoRR abs/2212.09306. https://doi.org/10.48550/arXiv.2212.09306

  • Benedetto I, Cagliero L, Tarasconi F (2022) Automatic inference of taxonomy relationships among legal documents. In: Chiusano S, Cerquitelli T, Wrembel R, et al (eds) New Trends in Database and Information Systems. Springer International Publishing, Cham, pp 24–33, https://doi.org/10.1007/978-3-031-15743-1_3

  • Benedetto I, Cagliero L, Tarasconi F, et al (2023a) Benchmarking abstractive models for Italian legal news summarization. In: Sileno G, Spanakis J, van Dijck G (eds) Legal knowledge and information systems—JURIX 2023: the thirty-sixth annual conference, Maastricht, The Netherlands, 18-20 December 2023, Frontiers in Artificial Intelligence and Applications, vol 379. IOS Press, pp 311–316, https://doi.org/10.3233/FAIA230980

  • Benedetto I, Koudounas A, Vaiani L, et al (2023b) PoliToHFI at SemEval-2023 task 6: leveraging entity-aware and hierarchical transformers for legal entity recognition and court judgment prediction. In: Proceedings of the 17th international workshop on semantic evaluation (SemEval-2023). Association for computational linguistics, Toronto, Canada, pp 1401–1411, URL https://aclanthology.org/2023.semeval-1.194

  • Benedetto I, Sportelli G, Bertoldo S et al (2023) On the use of pretrained language models for legal Italian document classification. Proc Comput Sci 225:2244–2253. https://doi.org/10.1016/j.procs.2023.10.215

  • Bhambhoria R, Dahan S, Zhu X (2021) Investigating the state-of-the-art performance and explainability of legal judgment prediction. In: Canadian Conference on AI

  • Bhambhoria R, Liu H, Dahan S, et al (2022) Interpretable low-resource legal decision making. In: Proceedings of the AAAI conference on artificial intelligence, pp 11819–11827

  • Bibal A, Lognoul M, De Streel A et al (2021) Legal requirements on explainability in machine learning. Artif Intell Law 29:149–169. https://doi.org/10.1007/s10506-020-09270-4

  • Chalkidis I, Søgaard A (2022) Improved multi-label classification under temporal concept drift: rethinking group-robust algorithms in a label-wise setting. In: Findings of the association for computational linguistics: ACL 2022. Association for computational linguistics, Dublin, Ireland, pp 2441–2454, https://doi.org/10.18653/v1/2022.findings-acl.192, URL https://aclanthology.org/2022.findings-acl.192

  • Chalkidis I, Androutsopoulos I, Aletras N (2019) Neural legal judgment prediction in English. In: Proceedings of the 57th annual meeting of the association for computational linguistics. Association for computational linguistics, Florence, Italy, pp 4317–4323, https://doi.org/10.18653/v1/P19-1424, URL https://aclanthology.org/P19-1424

  • Chalkidis I, Fergadiotis M, Malakasiotis P, et al (2020) LEGAL-BERT: the muppets straight out of law school. In: Findings of the association for computational linguistics: EMNLP 2020. Association for computational linguistics, Online, pp 2898–2904, https://doi.org/10.18653/v1/2020.findings-emnlp.261

  • Choi E, Levy O, Choi Y, et al (2018) Ultra-fine entity typing. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, pp 87–96, https://doi.org/10.18653/v1/P18-1009, URL https://aclanthology.org/P18-1009

  • Cui J, Shen X, Nie F, et al (2022) A survey on legal judgment prediction: Datasets, metrics, models and challenges. arXiv preprint arXiv:2204.04859

  • Dai Y, Feng D, Huang J, et al (2023) Laiw: a Chinese legal large language models benchmark (a technical report). CoRR abs/2310.05620. https://doi.org/10.48550/ARXIV.2310.05620

  • Dettmers T, Lewis M, Shleifer S, et al (2021) 8-bit optimizers via block-wise quantization. CoRR abs/2110.02861

  • Devlin J, Chang M, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp 4171–4186, https://doi.org/10.18653/v1/n19-1423

  • DeYoung J, Jain S, Rajani NF, et al (2020) ERASER: a benchmark to evaluate rationalized NLP models. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for computational linguistics, Online, pp 4443–4458, https://doi.org/10.18653/v1/2020.acl-main.408, URL https://aclanthology.org/2020.acl-main.408

  • Dozier C, Kondadadi R, Light M et al (2010) Named entity recognition and resolution in legal text. Springer, Berlin. https://doi.org/10.1007/978-3-642-12837-0_2

  • Fei Z, Shen X, Zhu D, et al (2023) Lawbench: benchmarking legal knowledge of large language models. CoRR abs/2309.16289. https://doi.org/10.48550/ARXIV.2309.16289

  • Goel K, Rajani NF, Vig J, et al (2021) Robustness gym: unifying the NLP evaluation landscape. In: Proceedings of the 2021 Conference of the North American chapter of the association for computational linguistics: human language technologies: demonstrations. Association for computational linguistics, Online, pp 42–55, https://doi.org/10.18653/v1/2021.naacl-demos.6, URL https://aclanthology.org/2021.naacl-demos.6

  • Górski Ł, Ramakrishna S (2021) Explainable artificial intelligence, lawyer’s perspective. In: Proceedings of the eighteenth international conference on artificial intelligence and law. Association for computing machinery, New York, NY, USA, ICAIL ’21, pp 60–68, https://doi.org/10.1145/3462757.3466145

  • Górski Ł, Ramakrishna S, Nowosielski JM (2021) Towards Grad-CAM based explainability in a legal text processing pipeline. Extended version. In: Rodríguez-Doncel V, Palmirani M, Araszkiewicz M, et al (eds) AI approaches to the complexity of legal systems XI-XII. Springer International Publishing, Cham, pp 154–168, URL https://link.springer.com/chapter/10.1007/978-3-030-89811-3_11

  • Guha N, Nyarko J, Ho DE, et al (2023) LegalBench: a collaboratively built benchmark for measuring legal reasoning in large language models. arXiv:2308.11462

  • Hassan F, Domingo-Ferrer J, Soria-Comas J (2018) Anonymization of unstructured data via named-entity recognition. In: Torra V, Narukawa Y, Aguiló I et al (eds) Modeling decisions for artificial intelligence. Springer International Publishing, Cham, pp 296–305

  • Hendrycks D, Burns C, Chen A, et al (2021) CUAD: an expert-annotated NLP dataset for legal contract review. CoRR abs/2103.06268

  • Hu EJ, Shen Y, Wallis P, et al (2021) LoRA: low-rank adaptation of large language models. CoRR abs/2106.09685

  • Jain D, Borah MD, Biswas A (2021) Summarization of legal documents: where are we now and the way forward. Comput Sci Rev 40:100388. https://doi.org/10.1016/j.cosrev.2021.100388

  • Jiang AQ, Sablayrolles A, Mensch A, et al (2023) Mistral 7B. arXiv:2310.06825

  • Kalamkar P, Agarwal A, Tiwari A, et al (2022a) Named entity recognition in Indian court judgments. In: Proceedings of the natural legal language processing workshop 2022. Association for computational linguistics, Abu Dhabi, United Arab Emirates (Hybrid), pp 184–193, URL https://aclanthology.org/2022.nllp-1.15

  • Kalamkar P, Tiwari A, Agarwal A, et al (2022b) Corpus for automatic structuring of legal documents. In: Proceedings of the thirteenth language resources and evaluation conference. European language resources association, Marseille, France, pp 4420–4429, URL https://aclanthology.org/2022.lrec-1.470

  • Kaur A, Bozic B (2019) Convolutional neural network-based automatic prediction of judgments of the European court of human rights. In: Irish conference on artificial intelligence and cognitive science, URL https://ceur-ws.org/Vol-2563/aics_42.pdf

  • Koudounas A, Giobergia F, Baralis E (2023a) Bad exoplanet! Explaining degraded performance when reconstructing exoplanets atmospheric parameters. In: NeurIPS 2023 AI for science workshop, URL https://openreview.net/forum?id=9Z4XZOhwiz

  • Koudounas A, Pastor E, Attanasio G, et al (2023b) Exploring subgroup performance in end-to-end speech models. In: ICASSP 2023 - 2023 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1–5, https://doi.org/10.1109/ICASSP49357.2023.10095284

  • Koudounas A, Pastor E, Attanasio G, et al (2024a) Prioritizing data acquisition for end-to-end speech model improvement. In: ICASSP 2024 - 2024 IEEE international conference on acoustics, speech and signal processing (ICASSP)

  • Koudounas A, Pastor E, Attanasio G et al (2024) Towards comprehensive subgroup performance analysis in speech models. IEEE/ACM Trans Audio Speech Lang Process. https://doi.org/10.1109/TASLP.2024.3363447

  • Kowsrihawat K, Vateekul P, Boonkwan P (2018) Predicting judicial decisions of criminal cases from Thai supreme court using bi-directional GRU with attention mechanism. In: 2018 5th Asian conference on defense technology (ACDT), pp 50–55. URL https://ieeexplore.ieee.org/document/8592948

  • Lavie A, Agarwal A (2007) METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the second workshop on statistical machine translation. Association for computational linguistics, Prague, Czech Republic, pp 228–231, URL https://aclanthology.org/W07-0734

  • Leitner E, Rehm G, Moreno-Schneider J (2020) A dataset of German legal documents for named entity recognition. In: Proceedings of the twelfth language resources and evaluation conference. European language resources association, Marseille, France, pp 4478–4485, URL https://aclanthology.org/2020.lrec-1.551

  • Li J, Sun A, Han J et al (2020) A survey on deep learning for named entity recognition. IEEE Trans Knowl Data Eng 34(1):50–70

  • Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Text summarization branches out. Association for computational linguistics, Barcelona, Spain, pp 74–81, URL https://aclanthology.org/W04-1013

  • Liu H, Tam D, Muqeeth M, et al (2022) Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. arXiv:2205.05638

  • Liu Y, Ott M, Goyal N, et al (2019) RoBERTa: a robustly optimized BERT pretraining approach. https://doi.org/10.48550/ARXIV.1907.11692, URL https://arxiv.org/abs/1907.11692

  • Lu J, Henchion M, Bacher I, et al (2021) A sentence-level hierarchical BERT model for document classification with limited labelled data, pp 231–241. https://doi.org/10.1007/978-3-030-88942-5_18

  • Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, et al (eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc., URL https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf

  • Luo CF, Bhambhoria R, Dahan S, et al (2022) Evaluating explanation correctness in legal decision making. In: Proceedings of the Canadian conference on artificial intelligence. https://doi.org/10.21428/594757db.8718dc8b

  • Malik V, Sanjay R, Nigam SK, et al (2021) ILDC for CJPE: Indian legal documents corpus for court judgment prediction and explanation. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (Volume 1: Long Papers). Association for computational linguistics, Online, pp 4046–4062, https://doi.org/10.18653/v1/2021.acl-long.313

  • McCallum A, Li W (2003) Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003, pp 188–191, URL https://aclanthology.org/W03-0430

  • Medvedeva M, Üstün A, Xu X, et al (2021) Automatic judgement forecasting for pending applications of the European court of human rights. In: ASAIL/LegalAIIA@ICAIL, pp 12–23, URL https://ceur-ws.org/Vol-2888/paper2.pdf

  • Mosbach M, Pimentel T, Ravfogel S, et al (2023) Few-shot fine-tuning vs. in-context learning: A fair comparison and evaluation. In: Findings of the association for computational linguistics: ACL 2023. Association for computational linguistics, Toronto, Canada, pp 12284–12314, https://doi.org/10.18653/v1/2023.findings-acl.779, URL https://aclanthology.org/2023.findings-acl.779

  • Napolitano D, Cagliero L (2023) GX-HUI: global explanations of AI models based on high-utility itemsets. In: Shahriar H, Teranishi Y, Cuzzocrea A, et al (eds) 47th IEEE annual computers, software, and applications conference, COMPSAC 2023, Torino, Italy, June 26-30, 2023. IEEE, pp 292–297, https://doi.org/10.1109/COMPSAC57700.2023.00045

  • Papineni K, Roukos S, Ward T, et al (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for computational linguistics, USA, ACL ’02, pp 311–318, https://doi.org/10.3115/1073083.1073135

  • Pastor E, Baralis E (2019) Explaining black box models by means of local rules. In: Proceedings of the 34th ACM/SIGAPP symposium on applied computing. Association for computing machinery, New York, NY, USA, SAC ’19, pp 510–517, https://doi.org/10.1145/3297280.3297328

  • Pastor E, de Alfaro L, Baralis E (2021a) Looking for trouble: analyzing classifier behavior via pattern divergence. In: Proceedings of the 2021 international conference on management of data. Association for computing machinery, New York, NY, USA, SIGMOD ’21, pp 1400–1412, https://doi.org/10.1145/3448016.3457284

  • Pastor E, Gavgavian A, Baralis E et al (2021) How divergent is your data? Proc VLDB Endow 14(12):2835–2838. https://doi.org/10.14778/3476311.3476357

  • Pastor E, Baralis E, de Alfaro L (2023) A hierarchical approach to anomalous subgroup discovery. In: 2023 IEEE 39th international conference on data engineering (ICDE), pp 2647–2659, https://doi.org/10.1109/ICDE55515.2023.00203

  • Pastor E, Koudounas A, Attanasio G, et al (2024) Explaining speech classification models via word-level audio segments and paralinguistic features. In: Proceedings of the 18th conference of the European chapter of the association for computational linguistics. Association for computational linguistics

  • Paul S, Goyal P, Ghosh S (2022) LeSICiN: a heterogeneous graph-based approach for automatic legal statute identification from Indian legal documents. In: Proceedings of the AAAI conference on artificial intelligence, pp 11139–11146, URL https://aaai-2022.virtualchair.net/poster_aaai10463

  • Quemy A, Wrembel R (2020) On integrating and classifying legal text documents. In: Hartmann S, Küng J, Kotsis G, et al (eds) Database and expert systems applications. Springer International Publishing, Cham, pp 385–399, URL https://dl.acm.org/doi/abs/10.1007/978-3-030-59003-1_25

  • Ribeiro MT, Singh S, Guestrin C (2016) “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. Association for computing machinery, New York, NY, USA, KDD ’16, pp 1135–1144, https://doi.org/10.1145/2939672.2939778

  • Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1(5):206–215

  • Saeed W, Omlin CW (2023) Explainable AI (XAI): a systematic meta-survey of current challenges and future opportunities. Knowl Based Syst 263:110273. https://doi.org/10.1016/j.knosys.2023.110273

  • Sansone C, Sperlí G (2022) Legal information retrieval systems: state-of-the-art and open issues. Inf Syst 106:101967. https://doi.org/10.1016/j.is.2021.101967

  • Selvaraju RR, Cogswell M, Das A, et al (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626

  • Setzu M, Guidotti R, Monreale A et al (2021) GLocalX—from local to global explanations of black box AI models. Artif Intell 294:103457. https://doi.org/10.1016/j.artint.2021.103457

  • Shaikh RA, Sahu TP, Anand V (2020) Predicting outcomes of legal cases based on legal factors using classifiers. Proc Comput Sci 167:2393–2402. https://doi.org/10.1016/j.procs.2020.03.292

  • Shukla A, Bhattacharya P, Poddar S, et al (2022) Legal case document summarization: extractive and abstractive methods and their evaluation. In: Proceedings of the 2nd conference of the asia-pacific chapter of the association for computational linguistics and the 12th international joint conference on natural language processing (Volume 1: Long Papers). Association for Computational Linguistics, Online only, pp 1048–1064, URL https://aclanthology.org/2022.aacl-main.77

  • Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034

  • Strickson B, De La Iglesia B (2020) Legal judgement prediction for UK courts. In: Proceedings of the 3rd international conference on information science and systems. Association for computing machinery, New York, NY, USA, ICISS ’20, pp 204–209, https://doi.org/10.1145/3388176.3388183

  • Sundararajan M, Taly A, Yan Q (2017a) Axiomatic attribution for deep networks. In: International conference on machine learning, PMLR, pp 3319–3328

  • Sundararajan M, Taly A, Yan Q (2017b) Axiomatic attribution for deep networks. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, proceedings of machine learning research, vol 70. PMLR, pp 3319–3328, URL https://proceedings.mlr.press/v70/sundararajan17a.html

  • Tiersma P (2000) Legal language. Bibliovault OAI Repository, the University of Chicago Press 27. https://doi.org/10.1016/S1352-0237(00)00210-0

  • Tjong Kim Sang EF, De Meulder F (2003) Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003, pp 142–147, URL https://aclanthology.org/W03-0419

  • Touvron H, Martin L, Stone K, et al (2023) Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288

  • Tunstall L, Beeching E, Lambert N, et al (2023) Zephyr: direct distillation of LM alignment. arXiv:2310.16944

  • Ventura F, Greco S, Apiletti D et al (2022) Trusting deep learning natural-language models via local and global explanations. Knowl Inf Syst 64(7):1863–1907

  • Visentin A, Nardotto A, O’Sullivan B (2019) Predicting judicial decisions: a statistically rigorous approach and a new ensemble classifier. In: 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI) pp 1820–1824. URL https://ieeexplore.ieee.org/document/8995348

  • Williams C (2005) Tradition and Change in Legal English. Peter Lang Verlag, Lausanne, Switzerland, https://doi.org/10.3726/978-3-0351-0317-5, URL https://www.peterlang.com/document/1043657

  • Yamada I, Asai A, Shindo H, et al (2020) LUKE: deep contextualized entity representations with entity-aware self-attention. In: Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP). Association for computational linguistics, Online, pp 6442–6454, https://doi.org/10.18653/v1/2020.emnlp-main.523, URL https://aclanthology.org/2020.emnlp-main.523

  • Zhang Y, Zhong V, Chen D, et al (2017) Position-aware attention and supervised data improve slot filling. In: Conference on empirical methods in natural language processing

  • Zhao H, Chen H, Yang F, et al (2023) Explainability for large language models: a survey. arXiv:2309.01029

  • Zhong L, Zhong Z, Zhao Z, et al (2019) Automatic summarization of legal decisions using iterative masking of predictive sentences. In: Proceedings of the seventeenth international conference on artificial intelligence and law. Association for computing machinery, New York, NY, USA, ICAIL ’19, pp 163–172, https://doi.org/10.1145/3322640.3326728

Author information

Correspondence to Irene Benedetto.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this appendix, we report:

  • A detailed description of the hardware and configuration settings used in our experiments (see Sect. A.1);

  • Additional results for the CJP task (see Sect. A.2);

  • A deeper analysis of the impact of the boosting parameter on the plausibility of explanations (see Sect. A.3);

  • Additional results on the JUSTICE dataset (see Sect. A.4).

A.1 Experimental settings

Training parameters. In the L-NER task, we trained the models based on the LUKE architecture using a batch size of 256 and a learning rate of \(10^{-4}\), whereas we trained the BERT-based models using a batch size of 1 and a learning rate of \(10^{-5}\). Both model families are trained for at most 10 epochs with early stopping, a weight decay of 0.01, and a warmup ratio of 0.06.
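The settings above can be collected in a small configuration record. This is a hedged sketch: the `TrainConfig` class and its helper methods are hypothetical, and the training-set size in the usage lines is a placeholder, not the size of the dataset used in the experiments. The warmup length is computed as `ceil(warmup_ratio * total_steps)`, which is one common convention and is assumed here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float
    batch_size: int
    max_epochs: int              # upper bound; early stopping may end training sooner
    weight_decay: float = 0.01
    warmup_ratio: float = 0.06

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps if every epoch up to max_epochs is run."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.max_epochs

    def warmup_steps(self, num_examples: int) -> int:
        """Warmup length as a fraction of the total step budget."""
        return math.ceil(self.warmup_ratio * self.total_steps(num_examples))


# L-NER settings reported in the text.
LUKE_NER = TrainConfig(learning_rate=1e-4, batch_size=256, max_epochs=10)
BERT_NER = TrainConfig(learning_rate=1e-5, batch_size=1, max_epochs=10)

n = 10_000  # placeholder training-set size, for illustration only
print(LUKE_NER.total_steps(n), LUKE_NER.warmup_steps(n))
print(BERT_NER.total_steps(n), BERT_NER.warmup_steps(n))
```

With a batch size of 1, the BERT-based tagger performs one optimizer step per example, so for the same data it runs far more (and noisier) updates per epoch than the LUKE setting, which is consistent with its smaller learning rate.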

For the CJP task, we trained the sentence encoders for at most 15 epochs, using a learning rate of \(5\cdot 10^{-5}\), a warmup ratio of 0.06, a weight decay of 0.01, and a batch size of 64. The hierarchical transformer-based architecture has a maximum length of 256 tokens and is trained for at most 100 epochs, with a learning rate of \(5\cdot 10^{-5}\) and a batch size of 256.
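A warmup ratio of 0.06 means that the learning rate ramps up over the first 6% of optimizer steps. The text does not state what happens after warmup; the sketch below assumes a linear decay to zero, the shape of the widely used linear-with-warmup schedule. The function name and the step budget in the example are hypothetical; the defaults mirror the sentence-encoder settings above.

```python
import math


def linear_warmup_decay_lr(step: int, total_steps: int,
                           peak_lr: float = 5e-5, warmup_ratio: float = 0.06) -> float:
    """Learning rate at `step` under an assumed linear warmup/decay schedule.

    Rises linearly from 0 to peak_lr during warmup, then decays linearly
    back to 0 at total_steps.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total = 1000  # placeholder step budget
for s in (0, 30, 60, 530, 1000):
    print(s, linear_warmup_decay_lr(s, total))
```

With 1,000 steps, warmup covers the first 60 steps, the peak of \(5\cdot 10^{-5}\) is reached at step 60, and the rate is back to half the peak around the midpoint of the decay phase.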

Tables 12 and 13 summarize the main per-model parameter settings, whereas Table 14 reports the training and evaluation throughput of the L-NER and CJP models, measured as the number of samples processed per second.

Hardware. The experiments were run on a machine equipped with an Intel® Core™ i9-10980XE CPU, 2 × NVIDIA® RTX A6000 GPUs, and 128 GB of RAM, running Ubuntu 22.04 LTS. We provide detailed information about the models used for the evaluation and the fine-tuning procedure in the official project repository (see Footnote 3).

Table 12 Summary of the used models and parameters for the NER task
Table 13 Summary of the used models and parameters for the CJP task
Table 14 Performance Analysis: training and evaluation samples per second for L-NER and CJP models. CJP models have four MLP layers; however, variations in the number of MLP layers do not affect the training and validation time

A.2 Additional results for court judgment prediction

Here we report additional performance metrics for CJP, namely Precision and Recall. Table 15 reports the results obtained when the NER-masking step is included in the CJP pipeline, whereas Table 16 shows the results obtained by using the original document sentences to make predictions.

Some relevant facts emerge from the analysis of these results:

  • Relationship between Recall and F1 Score: The best recall values are consistently associated with the best F1 scores. Since the F1 score is the harmonic mean of precision and recall, this indicates that approaches that correctly identify more positive instances (higher recall) generally also achieve better overall performance.

  • Precision and Recall with NER-Masking: When NER-masking is included in the pipeline, precision and recall tend to be higher than without NER-masking. This suggests that incorporating NER-masking improves the model’s ability to correctly identify positive instances and to capture relevant information. The only exception is precision on the development set, which is higher without NER-masking than with it.
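For reference, the sketch below shows how precision, recall, and F1 relate for a binary judgment label. It scores a single positive class; whether the tables average over classes (e.g., macro averaging) is not stated in this section, so treat the function `precision_recall_f1` as an illustrative helper rather than the evaluation code used here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels: every positive is found (recall 1.0), one false alarm (precision 0.75).
print(precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0]))
```

Because F1 is a harmonic mean, the toy model with perfect recall but lower precision lands at about 0.857, closer to the weaker of the two scores.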

Table 15 CJP results obtained applying a hierarchical approach, masking the text with NER tags
Table 16 CJP results obtained applying a hierarchical approach, without masking the text with NER tags

A.3 Impact of boosting parameter on plausibility

Fig. 4

Plausibility results for varying degrees of boosting \(\beta \). Stars mark the results obtained without boosting (\(\beta = 0\)).

We study the impact of the boosting parameter \(\beta \) on plausibility. We compute the explanations for the explain set, for which ground-truth explanations are available, and report the results in Fig. 4. Overall, boosting is highly beneficial for Gradient (G) and GradientXInput (GxI) without masking. Indeed, these are the explainers for which the percentage of documents whose explanations are affected by boosting increases as \(\beta \) grows. The impact is low or nil for LOO without masking and for G with masking, for which only a small percentage of explanations is modified even as \(\beta \) increases.
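The notion of "documents whose explanations are affected by boosting" can be illustrated with a toy computation. The boosting formula is not given in this section, so the sketch assumes a simple multiplicative rule: the non-negative relevance of every sentence containing an entity tag is multiplied by \((1 + \beta)\), and an explanation is the top-k sentences by score. The functions `boost_scores`, `top_k`, and `fraction_changed` and the toy scores are hypothetical.

```python
def boost_scores(scores, has_entity, beta):
    """Assumed boosting rule: scale entity-bearing sentences by (1 + beta)."""
    return [s * (1 + beta) if e else s for s, e in zip(scores, has_entity)]


def top_k(scores, k):
    """Indices of the k highest-scoring sentences (ties broken by position)."""
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(ranked[:k])


def fraction_changed(documents, beta, k):
    """Share of documents whose top-k sentence explanation changes after boosting."""
    if not documents:
        return 0.0
    changed = sum(
        1 for scores, has_entity in documents
        if top_k(scores, k) != top_k(boost_scores(scores, has_entity, beta), k)
    )
    return changed / len(documents)


# Toy documents: (per-sentence relevance, does the sentence contain an entity tag?)
docs = [([0.5, 0.4, 0.1], [False, True, False]),
        ([0.9, 0.1], [False, True])]
for beta in (0.0, 0.2, 0.5):
    print(beta, fraction_changed(docs, beta, k=1))
```

In this toy setting, boosting only changes an explanation when an entity-bearing sentence is already close to the top; this mirrors why explainers whose rankings are dominated by a few high-scoring sentences see few modified explanations even at larger \(\beta \).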

A.4 Court judgment prediction on JUSTICE

We also replicated the court judgment prediction experiments on the JUSTICE dataset, both with entity masking (H-NER-masked) and without it (H-unmasked). The results are reported in Table 17. Note that the entities recognized by the NER step cannot be validated because the JUSTICE dataset lacks entity-level annotations.

Entity masking slightly improves prediction performance. The improvement is more limited than on the ILDC dataset (Malik et al. 2021) for two reasons: (1) the lack of domain-specific NER annotations (we can only reuse the NER module trained on Indian court judgments), and (2) the lower redundancy of JUSTICE judgments, which contain only the facts and no additional information about the verdict.

Table 17 Results on JUSTICE dataset

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Benedetto, I., Koudounas, A., Vaiani, L. et al. Boosting court judgment prediction and explanation using legal entities. Artif Intell Law (2024). https://doi.org/10.1007/s10506-024-09397-8

  • DOI: https://doi.org/10.1007/s10506-024-09397-8
