
Combining prompt-based language models and weak supervision for labeling named entity recognition on legal documents

  • Original Research
  • Published in: Artificial Intelligence and Law

Abstract

Named entity recognition (NER) is a highly relevant task for text information retrieval in natural language processing (NLP). Most recent state-of-the-art NER methods require humans to annotate and provide useful data for model training. However, identifying, circumscribing, and labeling entities manually can be very expensive in terms of time, money, and effort. This paper investigates the use of prompt-based language models (OpenAI’s GPT-3) and weak supervision in the legal domain. We apply both strategies as alternatives to the traditional human-based annotation method, relying on computer power instead of human effort for labeling, and then compare the performance of models trained on computer-generated versus human-generated data. We also introduce combinations of the three methods (prompt-based labeling, weak supervision, and human annotation), aiming to maintain high model performance at low annotation cost. We show that, although human labeling still yields the best overall performance, the alternative strategies and their combinations are valid options, achieving similar model scores at lower cost. Final results show that models trained on the alternative labels preserve, on average, 74.0% of the human-trained models’ scores for GPT-3, 95.6% for weak supervision, 90.7% for the GPT + weak supervision combination, and 83.9% for the GPT + 30% human-labeling combination.
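The weak-supervision strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation (which relies on the skweak library over legal-domain documents); here, three hypothetical labeling functions vote on each token's entity tag, and a simple majority vote aggregates the votes:

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical labeling functions: each inspects a token and either
# votes for an entity tag or abstains by returning None.
def lf_gazetteer(token: str) -> Optional[str]:
    # Tiny illustrative gazetteer of organization names.
    return "ORG" if token in {"UnB", "FAPDF", "CAPES"} else None

def lf_uppercase_acronym(token: str) -> Optional[str]:
    # Heuristic: all-caps tokens of two or more characters are often organizations.
    return "ORG" if token.isupper() and len(token) >= 2 else None

def lf_law_number(token: str) -> Optional[str]:
    # Heuristic: tokens like "8.112/1990" look like statute numbers.
    return "LAW" if any(c.isdigit() for c in token) and "/" in token else None

LFS: list[Callable[[str], Optional[str]]] = [
    lf_gazetteer, lf_uppercase_acronym, lf_law_number
]

def weak_label(token: str) -> str:
    """Aggregate labeling-function votes by majority; 'O' means no entity."""
    votes = [tag for lf in LFS if (tag := lf(token)) is not None]
    return Counter(votes).most_common(1)[0][0] if votes else "O"

labels = [weak_label(t) for t in ["CAPES", "funds", "8.112/1990"]]
# → ["ORG", "O", "LAW"]
```

Note that skweak itself aggregates labeling-function outputs with a hidden Markov model rather than a majority vote; the majority vote above is a simplification for illustration.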



Notes

  1. https://ai.stanford.edu/blog/weak-supervision/.

  2. https://nido.unb.br/.

  3. https://github.com/UnB-KnEDLe.

  4. https://huggingface.co/pierreguillou/bert-base-cased-pt-lenerbr.

  5. https://huggingface.co/adalbertojunior/distilbert-portuguese-cased.

  6. The Critical Difference (CD) is a metric established by Demšar (2006) that determines whether two or more learning algorithms, in a specific domain, are statistically different. The CD value is computed from the algorithms' results and represents a threshold: if the difference between two algorithms' average ranks exceeds the CD, they can be declared statistically different.
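As a sketch, the Nemenyi critical difference commonly used with the Friedman test (the setting of Demšar 2006) is CD = q_α · sqrt(k(k+1)/(6N)), where k is the number of algorithms, N the number of datasets, and q_α a studentized-range critical value; the numbers below (k = 4, N = 10) are illustrative, not taken from the paper:

```python
import math

def critical_difference(q_alpha: float, k: int, n: int) -> float:
    """Nemenyi critical difference: two algorithms differ significantly
    if their average ranks differ by more than this threshold."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

# Example: k = 4 algorithms compared over N = 10 datasets,
# with q_0.05 ≈ 2.569 from Demšar's (2006) table.
cd = critical_difference(2.569, 4, 10)  # ≈ 1.483
```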

References

  • Bach SH, Rodriguez D, Liu Y et al (2019) Snorkel DryBell: a case study in deploying weak supervision at industrial scale. In: Proceedings of the 2019 international conference on management of data, SIGMOD ’19. Association for Computing Machinery, New York, NY, USA, pp 362–375. https://doi.org/10.1145/3299869.3314036

  • Brown TB, Mann B, Ryder N et al (2020) Language models are few-shot learners. arXiv:2005.14165

  • Chowdhary K (2020) Natural language processing. In: Fundamentals of artificial intelligence. Springer, New Delhi, pp 603–649

  • Dai H, Song Y, Wang H (2021) Ultra-fine entity typing with weak supervision from a masked language model. arXiv:2106.04098

  • Dale R (2021) GPT-3: what’s it good for? Nat Lang Eng 27(1):113–118. https://doi.org/10.1017/S1351324920000601


  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30


  • Dozier C, Kondadadi R, Light M et al (2010) Named entity recognition and resolution in legal text. In: Semantic processing of legal texts. Springer, pp 27–43

  • Eddy SR (2004) What is a hidden Markov model? Nat Biotechnol 22(10):1315–1316


  • Floridi L, Chiriatti M (2020) GPT-3: its nature, scope, limits, and consequences. Minds Mach 30(4):681–694


  • Fredriksson T, Mattos DI, Bosch J et al (2020) Data labeling: an empirical investigation into industrial challenges and mitigation strategies. In: Product-focused software process improvement: 21st international conference, PROFES 2020, Proceedings 21, Turin, Italy, November 25–27, 2020. Springer, pp 202–216

  • Giri R, Porwal Y, Shukla V et al (2017) Approaches for information retrieval in legal documents. In: 2017 tenth international conference on contemporary computing (IC3). IEEE, pp 1–6

  • Graves A, Schmidhuber J (2005) Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw 18(5):602–610. https://doi.org/10.1016/j.neunet.2005.06.042


  • Karamanolakis G, Mukherjee S, Zheng G et al (2021) Self-training with weak supervision. arXiv:2104.05514

  • Lison P, Hubin A, Barnes J et al (2020) Named entity recognition without labelled data: a weak supervision approach. arXiv:2004.14723

  • Lison P, Barnes J, Hubin A (2021) skweak: weak supervision made easy for NLP. arXiv:2104.09683

  • Liu Y, Ott M, Goyal N et al (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692

  • Liu P, Yuan W, Fu J et al (2023) Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Comput Surv 55(9):1–35. https://doi.org/10.1145/3560815


  • Luz de Araujo PH, de Campos TE, de Oliveira RR et al (2018) LeNER-Br: a dataset for named entity recognition in Brazilian legal text. In: International conference on computational processing of the Portuguese language. Springer, pp 313–323

  • Maiya AS (2020) ktrain: a low-code library for augmented machine learning. arXiv:2004.10703

  • Marrero M, Urbano J, Sánchez-Cuadrado S et al (2013) Named entity recognition: fallacies, challenges and opportunities. Comput Stand Interfaces 35(5):482–489


  • Meyer S, Elsweiler D, Ludwig B et al (2022) Do we still need human assessors? Prompt-based GPT-3 user simulation in conversational AI. In: Proceedings of the 4th conference on conversational user interfaces, CUI ’22. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3543829.3544529

  • Nasar Z, Jaffry SW, Malik MK (2021) Named entity recognition and relation extraction: state-of-the-art. ACM Comput Surv 54(1):1–39


  • Ratner A, Bach SH, Ehrenberg H et al (2020) Snorkel: rapid training data creation with weak supervision. VLDB J 29(2):709–730


  • Ratner AJ, De Sa CM, Wu S et al (2016) Data programming: creating large training sets, quickly. In: Advances in neural information processing systems 29

  • Sakhaee N, Wilson MC (2021) Information extraction framework to build legislation network. Artif Intell Law 29(1):35–58


  • Smith LN (2015) Cyclical learning rates for training neural networks. arXiv:1506.01186

  • Souza F, Nogueira R, Lotufo R (2020) BERTimbau: pretrained BERT models for Brazilian Portuguese. In: Brazilian conference on intelligent systems. Springer, pp 403–417

  • Sun C, Qiu X, Xu Y et al (2019) How to fine-tune BERT for text classification? In: China national conference on Chinese computational linguistics. Springer, Cham, pp 194–206

  • Torfi A, Shirvani RA, Keneshloo Y et al (2020) Natural language processing advancements by deep learning: a survey. arXiv:2003.01200

  • Vardhan H, Surana N, Tripathy B (2021) Named-entity recognition for legal documents. In: International conference on advanced machine learning technologies and applications. Springer, pp 469–479

  • Vasiliev Y (2020) Natural Language Processing with Python and spaCy: a practical introduction. No Starch Press, San Francisco


  • Wang S, Liu Y, Xu Y et al (2021) Want to reduce labeling cost? GPT-3 can help. arXiv:2108.13487

  • Wang S, Sun X, Li X et al (2023) GPT-NER: named entity recognition via large language models. arXiv:2304.10428

  • Wei X, Cui X, Cheng N et al (2023) Zero-shot information extraction via chatting with ChatGPT. arXiv:2302.10205

  • Zamani H, Croft WB (2018) On the theory of weak supervision for information retrieval. In: Proceedings of the 2018 ACM SIGIR international conference on theory of information retrieval, ICTIR ’18. Association for Computing Machinery, New York, NY, USA, pp 147–154. https://doi.org/10.1145/3234944.3234968

  • Zhang S, He L, Dragut E et al (2019) How to invest my time: Lessons from human-in-the-loop entity extraction. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 2305–2313

  • Zhou ZH (2018) A brief introduction to weakly supervised learning. Natl Sci Rev 5(1):44–53



Acknowledgements

The authors would like to thank Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP - process number 2023/10100-4), and project KnEDLe-UNB.

Author information


Corresponding author

Correspondence to Ricardo Marcacini.



About this article


Cite this article

Oliveira, V., Nogueira, G., Faleiros, T. et al. Combining prompt-based language models and weak supervision for labeling named entity recognition on legal documents. Artif Intell Law (2024). https://doi.org/10.1007/s10506-023-09388-1

