
Transformer-Based Approaches for Legal Text Processing

JNLP Team - COLIEE 2021

Article - The Review of Socionetwork Strategies

Abstract

In this paper, we introduce our approaches using Transformer-based models for the different problems of the COLIEE 2021 automatic legal text processing competition. Automated processing of legal documents is a challenging task because of the characteristics of legal documents and the limited amount of data. Through detailed experiments, we found that Transformer-based pretrained language models can perform well on automated legal text-processing problems when combined with appropriate approaches. We describe in detail the processing steps for each task: problem formulation, data processing and augmentation, pretraining, and finetuning. In addition, we introduce to the community two pretrained models that take advantage of parallel translations in the legal domain, NFSP and NMSP; NFSP achieves the state-of-the-art result in Task 5 of the competition. Although the paper focuses on technical reporting, the novelty of its approaches can also serve as a useful reference for automated legal document processing using Transformer-based models.
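The pipeline summarized above (pretrain, then finetune on pairs of legal text) follows the standard sequence-pair classification recipe. As a minimal sketch of the finetuning step using the Hugging Face Transformers API, where the model name and the toy statement-article pair are illustrative assumptions rather than the authors' actual configuration:

    # Hedged sketch: one finetuning step of a pretrained Transformer on a
    # legal entailment-style pair. Model and data below are placeholders.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "bert-base-uncased"  # any encoder from huggingface.co/models
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Encode a (statement, article) pair as a single input sequence; the
    # label marks whether the article entails the statement.
    enc = tokenizer(
        "A minor may rescind a contract.",  # hypothetical legal statement
        "A minor must obtain the consent of his or her statutory agent.",  # hypothetical article
        truncation=True,
        return_tensors="pt",
    )
    labels = torch.tensor([1])

    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    loss = model(**enc, labels=labels).loss  # cross-entropy over the 2 labels
    loss.backward()
    optimizer.step()

Only one gradient step is shown; in practice this loop would run over the augmented training data described in the paper.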


Notes

  1. https://pypi.org/project/rank-bm25/.

  2. https://huggingface.co/models.

  3. https://www.japaneselawtranslation.go.jp.

  4. https://pypi.org/project/langdetect/.

  5. https://huggingface.co/cl-tohoku/bert-base-japanese.

  6. https://huggingface.co/cl-tohoku/bert-base-japanese-whole-word-masking.

  7. https://huggingface.co/xlm-roberta-base.
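The notes above list the off-the-shelf components referenced in the paper. As a minimal sketch of lexical retrieval with the rank-bm25 package from note 1, where the three toy "articles" stand in for a statute corpus and are not COLIEE data:

    # Hedged sketch: BM25 scoring of statute articles against a query,
    # using the rank-bm25 package (note 1). The corpus is a toy placeholder.
    from rank_bm25 import BM25Okapi

    corpus = [
        "A minor must obtain the consent of his or her statutory agent.",
        "A contract is formed when an offer is accepted.",
        "Possession is acquired with the intent to own a thing.",
    ]
    bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

    query = "consent of the statutory agent".lower().split()
    print(bm25.get_scores(query))              # one relevance score per article
    print(bm25.get_top_n(query, corpus, n=1))  # best-matching article text

In a retrieval task, the top-scoring articles would then be re-ranked or classified by finetuned Transformer models such as the one sketched after the abstract.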


Acknowledgements

This work was supported by JSPS Kakenhi Grant Numbers 20H04295, 20K20406, and 20K20625. The research was also supported in part by the Asian Office of Aerospace R&D (AOARD), Air Force Office of Scientific Research (Grant no. FA2386-19-1-4041).

Author information


Corresponding author

Correspondence to Ha-Thanh Nguyen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Nguyen, HT., Nguyen, MP., Vuong, THY. et al. Transformer-Based Approaches for Legal Text Processing. Rev Socionetwork Strat 16, 135–155 (2022). https://doi.org/10.1007/s12626-022-00102-2

