
Transformer-Based Approaches for Legal Text Processing

JNLP Team - COLIEE 2021

Article - The Review of Socionetwork Strategies

Abstract

In this paper, we introduce our approaches using Transformer-based models for the different problems of the COLIEE 2021 automatic legal text processing competition. Automated processing of legal documents is a challenging task because of the characteristics of legal documents and the limited amount of data. Through detailed experiments, we found that Transformer-based pretrained language models can perform well on automated legal text-processing problems when combined with appropriate approaches. We describe in detail the processing steps for each task: problem formulation, data processing and augmentation, pretraining, and finetuning. In addition, we introduce to the community two pretrained models that take advantage of parallel translations in the legal domain, NFSP and NMSP; NFSP achieves the state-of-the-art result in Task 5 of the competition. Although the paper focuses on technical reporting, the novelty of its approaches can also serve as a useful reference for automated legal document processing using Transformer-based models.
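The pipeline summarized above (pretrain, then finetune on pairs of legal text) follows the standard sequence-pair classification recipe. As a minimal sketch of the finetuning step using the Hugging Face Transformers API, where the model name and the toy statement-article pair are illustrative assumptions rather than the authors' actual configuration:

    # Hedged sketch: one finetuning step of a pretrained Transformer on a
    # legal entailment-style pair. Model and data below are placeholders.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "bert-base-uncased"  # any encoder from huggingface.co/models
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Encode a (statement, article) pair as a single input sequence; the
    # label marks whether the article entails the statement.
    enc = tokenizer(
        "A minor may rescind a contract.",  # hypothetical legal statement
        "A minor must obtain the consent of his or her statutory agent.",  # hypothetical article
        truncation=True,
        return_tensors="pt",
    )
    labels = torch.tensor([1])

    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    loss = model(**enc, labels=labels).loss  # cross-entropy over the 2 labels
    loss.backward()
    optimizer.step()

Only one gradient step is shown; in practice this loop would run over the augmented training data described in the paper.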


Notes

  1. https://pypi.org/project/rank-bm25/.

  2. https://huggingface.co/models.

  3. https://www.japaneselawtranslation.go.jp.

  4. https://pypi.org/project/langdetect/.

  5. https://huggingface.co/cl-tohoku/bert-base-japanese.

  6. https://huggingface.co/cl-tohoku/bert-base-japanese-whole-word-masking.

  7. https://huggingface.co/xlm-roberta-base.
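The notes above list the off-the-shelf components referenced in the paper. As a minimal sketch of lexical retrieval with the rank-bm25 package from note 1, where the three toy "articles" stand in for a statute corpus and are not COLIEE data:

    # Hedged sketch: BM25 scoring of statute articles against a query,
    # using the rank-bm25 package (note 1). The corpus is a toy placeholder.
    from rank_bm25 import BM25Okapi

    corpus = [
        "A minor must obtain the consent of his or her statutory agent.",
        "A contract is formed when an offer is accepted.",
        "Possession is acquired with the intent to own a thing.",
    ]
    bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

    query = "consent of the statutory agent".lower().split()
    print(bm25.get_scores(query))              # one relevance score per article
    print(bm25.get_top_n(query, corpus, n=1))  # best-matching article text

In a retrieval task, the top-scoring articles would then be re-ranked or classified by finetuned Transformer models such as the one sketched after the abstract.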


Acknowledgements

This work was supported by JSPS Kakenhi Grant Numbers 20H04295, 20K20406, and 20K20625. The research was also supported in part by the Asian Office of Aerospace R&D (AOARD), Air Force Office of Scientific Research (Grant no. FA2386-19-1-4041).

Author information


Corresponding author

Correspondence to Ha-Thanh Nguyen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Nguyen, HT., Nguyen, MP., Vuong, THY. et al. Transformer-Based Approaches for Legal Text Processing. Rev Socionetwork Strat 16, 135–155 (2022). https://doi.org/10.1007/s12626-022-00102-2

