
Lessons learned building a legal inference dataset

  • Original Research
  • Published in: Artificial Intelligence and Law

Abstract

Legal inference is fundamental for building and verifying hypotheses in police investigations. In this study, we build a Korean Natural Language Inference dataset for the legal domain, focusing on criminal court verdicts. We developed an adversarial hypothesis collection tool that challenges annotators and gives us a deeper understanding of the data, and a hypothesis network construction tool with visualized graphs that demonstrates a use-case scenario for the developed model. The data is augmented using a combination of Easy Data Augmentation approaches and round-trip translation, since crowd-sourcing might not be an option for datasets containing sensitive data. We discuss in depth the challenges we encountered, such as annotators' limited domain knowledge, issues in the data augmentation process, and problems with handling long contexts, and we suggest possible solutions. Our work shows that creating legal inference datasets with limited resources is feasible, and we propose further research in this area.
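The abstract's augmentation recipe pairs Easy Data Augmentation (EDA; Wei and Zou 2019) with round-trip translation. Round-trip translation requires an external machine-translation service, but two of the EDA operations (random swap and random deletion) can be sketched in plain Python; the function name and parameters below are illustrative, not taken from the paper's released code:

```python
import random

def eda_augment(sentence, n_swaps=1, p_delete=0.1, seed=0):
    """Apply two EDA operations, random swap and random deletion,
    to a whitespace-tokenized sentence (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed for reproducible augmentation
    words = sentence.split()
    # Random swap: exchange two word positions, n_swaps times.
    for _ in range(n_swaps):
        if len(words) >= 2:
            i, j = rng.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]
    # Random deletion: drop each word with probability p_delete,
    # always keeping at least one word.
    kept = [w for w in words if rng.random() > p_delete]
    return " ".join(kept) if kept else rng.choice(words)

print(eda_augment("the suspect entered the building at night"))
```

For Korean text, a morpheme-level tokenizer would replace the whitespace split, which is what Korean EDA ports such as KorEDA and KoEDA (cited in the references) do.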


Data availability

The data is available on GitHub (https://github.com/onspark/LEAP_NLI_v2.0).

Notes

  1. Both police manuals for creating investigation result reports and investigation review reports were internal documents. We pursued expert interviews and solicited detailed explanations in written form to understand these processes more accurately.

References

  • Auto-GPT: an autonomous GPT-4 experiment (2023) [Python]. Significant Gravitas. https://github.com/Significant-Gravitas/Auto-GPT

  • Bayer M, Kaufhold M-A, Reuter C (2022) A survey on data augmentation for text classification. ACM Comput Surv. https://doi.org/10.1145/3544558

  • Belinkov Y, Bisk Y (2018) Synthetic and natural noise both break neural machine translation (arXiv:1711.02173)

  • Beltagy I, Peters ME, Cohan A (2020) Longformer: the long-document transformer (arXiv:2004.05150)

  • Bhagavatula C, Bras RL, Malaviya C, Sakaguchi K, Holtzman A, Rashkin H, Downey D, Yih SW, Choi Y (2020) Abductive commonsense reasoning (arXiv:1908.05739)

  • Bowman SR, Angeli G, Potts C, Manning CD (2015) A large annotated corpus for learning natural language inference (arXiv:1508.05326)

  • Bras RL, Swayamdipta S, Bhagavatula C, Zellers R, Peters ME, Sabharwal A, Choi Y (2020) Adversarial filters of dataset biases (arXiv:2002.04108)

  • Clark K, Luong M-T, Le QV, Manning CD (2020) ELECTRA: pre-training text encoders as discriminators rather than generators (arXiv:2003.10555)

  • Conneau A, Rinott R, Lample G, Williams A, Bowman S, Schwenk H, Stoyanov V (2018) XNLI: evaluating cross-lingual sentence representations. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 2475–2485. https://doi.org/10.18653/v1/D18-1269

  • Coulombe C (2018) Text data augmentation made simple by leveraging NLP cloud APIs (arXiv:1812.04718). arXiv. http://arxiv.org/abs/1812.04718

  • Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding (arXiv:1810.04805). arXiv. http://arxiv.org/abs/1810.04805

  • Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. arXiv. http://arxiv.org/abs/1412.6572

  • Gururangan S, Swayamdipta S, Levy O, Schwartz R, Bowman S, Smith NA (2018) Annotation artifacts in natural language inference data. In: Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, vol 2 (Short Papers), pp 107–112. https://doi.org/10.18653/v1/N18-2017

  • Ham J, Choe YJ, Park K, Choi I, Soh H (2020) KorNLI and KorSTS: new benchmark datasets for Korean natural language understanding. Findings of the Association for Computational Linguistics: EMNLP 2020, pp 422–430. https://doi.org/10.18653/v1/2020.findings-emnlp.39

  • Heo J (2021) 110 cases per person... Police investigation examiner who was hit by a “day bomb.” Seoul Economic Daily. https://www.sedaily.com/NewsView/22M7D9OSWB

  • Jia Y, Liu Y, Yu X, Voida S (2017) Designing leaderboards for gamification: perceived differences based on user ranking, application domain, and personality traits. In: Proceedings of the 2017 CHI conference on human factors in computing systems, pp 1949–1960. https://doi.org/10.1145/3025453.3025826

  • Kaushik D, Hovy E, Lipton ZC (2020) Learning the difference that makes a difference with counterfactually-augmented data (arXiv:1909.12434)

  • Kim A (2022) How Democratic Party of Korea-led prosecution reforms fail victims. Korean Herald. https://www.koreaherald.com/view.php?ud=20220501000254&ACE_SEARCH=1

  • Kim M-Y, Rabelo J, Okeke K, Goebel R (2022) Legal information retrieval and entailment based on BM25, transformer and semantic thesaurus methods. Rev Socionetwork Strateg 16(1):157–174. https://doi.org/10.1007/s12626-022-00103-1

  • Kim T (2020) KorEDA [Python]. https://github.com/catSirup/KorEDA

  • KLAID LJP Base (2022) Law&Company. lawcompany/KLAID_LJP_base

  • LBox Open (2022) [Python]. LBOX. https://github.com/lbox-kr/lbox-open

  • Likert R (1932) A technique for the measurement of attitudes. Arch Psychol 22(140):55–55

  • Liu H, Cui L, Liu J, Zhang Y (2020) Natural language inference in context: investigating contextual reasoning over long texts (arXiv:2011.04864)

  • Nakajima Y (2023) BabyAGI [Python]. https://github.com/yoheinakajima/babyagi

  • Nie Y, Williams A, Dinan E, Bansal M, Weston J, Kiela D (2020) Adversarial NLI: a new benchmark for natural language understanding. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 4885–4901. https://doi.org/10.18653/v1/2020.acl-main.441

  • Oshin M (2023) GPT-4 & LangChain—Create a ChatGPT Chatbot for Your PDF Files [TypeScript]. https://github.com/mayooear/gpt4-pdf-chatbot-langchain

  • Park D (2021) KoEDA [Python]. https://github.com/toriving/KoEDA (Original work published 2020)

  • Park J (2022) KoELECTRA [Python]. https://github.com/monologg/KoELECTRA

  • Park S, Moon J, Kim S, Cho WI, Han J, Park J, Song C, Kim J, Song Y, Oh T, Lee J, Oh J, Lyu S, Jeong Y, Lee I, Seo S, Lee D, Kim H, Lee M et al (2021) KLUE: Korean language understanding evaluation (arXiv:2105.09680)

  • Pirolli P, Card S (2005) The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In: Proceedings of the international conference on intelligence analysis

  • Poliak A, Naradowsky J, Haldar A, Rudinger R, Van Durme B (2018) Hypothesis only baselines in natural language inference. In: Proceedings of the seventh joint conference on lexical and computational semantics, pp 180–191. https://doi.org/10.18653/v1/S18-2023

  • Rabelo J, Goebel R, Kim M-Y, Kano Y, Yoshioka M, Satoh K (2022) Overview and discussion of the competition on legal information extraction/entailment (COLIEE) 2021. Rev Socionetwork Strateg 16(1):111–133. https://doi.org/10.1007/s12626-022-00105-z

  • Um J (2022) S. Korean Democrats’ long road to reforming prosecution service: Victory or blunder? Hankyoreh. https://english.hani.co.kr/arti/english_edition/e_national/1041606.html

  • Wei J, Zou K (2019) EDA: easy data augmentation techniques for boosting performance on text classification tasks. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 6381–6387. https://doi.org/10.18653/v1/D19-1670

  • Williams A, Nangia N, Bowman S (2018) A broad-coverage challenge corpus for sentence understanding through inference. In: Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, vol 1 (Long Papers), pp 1112–1122. https://doi.org/10.18653/v1/N18-1101

  • Woo J (2020) S. Korea takes long overdue steps to rein in prosecution service, but task far from over. Yonhap News. https://en.yna.co.kr/view/AEN20201217008100315

  • Xie Q, Dai Z, Hovy E, Luong M-T, Le QV (2020) Unsupervised data augmentation for consistency training (arXiv:1904.12848)

  • Yu AW, Dohan D, Luong M-T, Zhao R, Chen K, Norouzi M, Le QV (2018) QANet: combining local convolution with global self-attention for reading comprehension (arXiv:1804.09541)

  • Zhang WE, Sheng QZ, Alhazmi A, Li C (2020) Adversarial attacks on deep-learning models in natural language processing: a survey. ACM Tran Intell Syst Technol 11(3):1–41. https://doi.org/10.1145/3374217

Acknowledgements

This research was supported and funded by the Korean National Police Agency [Project Name: AI-Based Crime Investigation Support System/ Project Number: PR10-02-000-21]. The authors also thank the Legal Informatics and Forensic Science (LIFS) institute at Hallym University and its researchers for their indispensable help in creating the data.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Joshua I. James.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

See Table 13. Key phrases are highlighted in bold.

Table 13 Types of inference marked with an asterisk (*) were adapted from (Nie et al. 2020)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Park, S., James, J.I. Lessons learned building a legal inference dataset. Artif Intell Law (2023). https://doi.org/10.1007/s10506-023-09370-x
