Abstract
This paper presents the NOWJ team’s approach to the COLIEE 2023 competition, which focuses on advancing legal information processing techniques and applying them to real-world legal scenarios. Our team tackled all four tasks in the competition: legal case retrieval, legal case entailment, statute law retrieval, and legal textual entailment. We employ state-of-the-art machine learning models and approaches, such as BERT, Longformer, the BM25 ranking algorithm, and multi-task learning. Our participation in COLIEE 2023 provided useful insights, including the importance of pre-processing and feature engineering and the effectiveness of multi-task models in combining different legal tasks to improve model performance. Although our team did not achieve state-of-the-art results, our findings identify areas for further research and improvement in legal information processing.
Data availability
The datasets used in this paper are provided and published by the COLIEE 2023 competition. Data availability information is included in the paper.
Acknowledgements
Hai-Long Nguyen was funded by the Master, Ph.D. Scholarship Programme of Vingroup Innovation Foundation (VINIF), code VINIF.2023.ThS.075.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Vuong, THY., Nguyen, HL., Nguyen, TM. et al. NOWJ at COLIEE 2023: Multi-task and Ensemble Approaches in Legal Information Processing. Rev Socionetwork Strat 18, 145–165 (2024). https://doi.org/10.1007/s12626-024-00157-3
DOI: https://doi.org/10.1007/s12626-024-00157-3