
NOWJ at COLIEE 2023: Multi-task and Ensemble Approaches in Legal Information Processing

Article, The Review of Socionetwork Strategies

Abstract

This paper presents the NOWJ team’s approach to the COLIEE 2023 Competition, which focuses on advancing legal information processing techniques and applying them to real-world legal scenarios. Our team tackled all four tasks in the competition: legal case retrieval, legal case entailment, statute law retrieval, and legal textual entailment. We employ state-of-the-art machine learning models and techniques such as BERT, Longformer, the BM25 ranking algorithm, and multi-task learning models. Our participation in COLIEE 2023 has provided useful insights, including the importance of pre-processing and feature engineering and the effectiveness of multi-task models that combine different legal tasks to improve model performance. Although our team did not achieve state-of-the-art results, our findings identify areas for further research and improvement in legal information processing.
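The abstract lists the BM25 ranking algorithm among the retrieval components. A minimal sketch of how BM25 ranks candidate documents for a query is shown below. This is not the team's implementation. It assumes the standard Okapi BM25 formula, the Lucene-style smoothed IDF, the common defaults k1 = 1.2 and b = 0.75, and naive lowercase whitespace tokenization. The helper names `tokenize`, `bm25_scores`, and `rank`, and the toy documents, are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical pre-processing: lowercase + whitespace split only.
    # A real legal pipeline would add stemming, stop-word removal, etc.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (assumed parameters)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF (always positive), in the style of Lucene's BM25.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


def rank(query: str, docs: list[str], top_k: int = 3) -> list[int]:
    """Return indices of the top_k documents, highest BM25 score first."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


if __name__ == "__main__":
    docs = [
        "the lessee must return the leased property",
        "a contract of sale is formed by agreement",
        "the lessor shall repair the leased property",
    ]
    print(rank("leased property repair", docs))  # → [2, 0, 1]
```

In a two-stage retrieval setup, a lexical ranker like this is typically used to shortlist candidates cheaply before a transformer model such as BERT or Longformer re-scores them. Whether and how the NOWJ system combines the stages is described in the full paper, not here.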


Fig. 1
Fig. 2


Data availability

The datasets used in this paper were provided and published as part of the COLIEE 2023 competition.

Notes

  1. https://huggingface.co/bert-base-multilingual-cased.

  2. https://www.elastic.co/.

  3. https://huggingface.co/bert-base-uncased.

  4. https://huggingface.co/lexlms/legal-longformer-base.

  5. https://huggingface.co/bert-base-multilingual-uncased.


Acknowledgements

Hai-Long Nguyen was funded by the Master, Ph.D. Scholarship Programme of Vingroup Innovation Foundation (VINIF), code VINIF.2023.ThS.075.

Author information

Corresponding author

Correspondence to Hai-Long Nguyen.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Vuong, THY., Nguyen, HL., Nguyen, TM. et al. NOWJ at COLIEE 2023: Multi-task and Ensemble Approaches in Legal Information Processing. Rev Socionetwork Strat 18, 145–165 (2024). https://doi.org/10.1007/s12626-024-00157-3


  • DOI: https://doi.org/10.1007/s12626-024-00157-3
