1 Introduction

The Competition on Legal Information Extraction/Entailment (COLIEE) is an annual international competition held in conjunction with the International Conference on Artificial Intelligence and Law (ICAIL) and the International Workshop on Juris-informatics (JURISIN) [1, 5, 6, 7, 8, 9, 12, 13, 14]. COLIEE 2023 consists of four tasks: Tasks 1 and 2 are case law tasks that use datasets from the Canadian Federal Court, while Tasks 3 and 4 are statute law tasks that use the Japanese legal bar exam. In Task 3, a participant system is given a problem text and asked to retrieve the articles of the Japanese Civil Code relevant to solving it. In Task 4, a participant system is given a problem text together with its relevant articles and asked to determine whether the articles entail the problem text, answering Yes or No. We participated in Task 4.

An analysis of problem types in previous COLIEE tasks [13] showed that the COLIEE dataset contains diverse types of problems. Some are relatively easy to solve because the texts in a pair are very similar, while others are complex and difficult, requiring parsing, semantics, anaphora resolution, logic, and so on. Previous Task 4 participant systems have included rule-based and deep learning-based systems such as BERT [19], ELECTRA [11], and GNN [17]. However, previous systems have not performed well on problems that require inferences about person roles.

In this paper, we focus on person name resolution for problems in which person names/roles are represented by alphabetical letters. We propose a system that extends our previous system in COLIEE 2022, which achieved the highest accuracy among all submissions using data augmentation. Our proposed system provides two main contributions. First, while we keep the ensemble of a rule-based component and a deep learning-based component, we adopt LUKE, an entity-aware model based on RoBERTa that is trained with named entity information, as the deep learning-based component instead of BERT. Second, we fine-tune the pretrained LUKE model in multiple ways, comparing models fine-tuned on training data that contains alphabetical person names and an ensemble of differently fine-tuned models. Our formal run results show that LUKE and our fine-tuning approach for alphabetical person names are effective.

2 Related Work

LUKE [18] is a language model based on RoBERTa [10], which is in turn a derivative of BERT [2]. BERT is a deep learning model commonly used in various NLP tasks and utilizes the encoder part of the Transformer [16] architecture. LUKE additionally uses a mechanism called entity-aware self-attention: it treats not only words but also entities as independent tokens and computes intermediate and output representations for all tokens using the Transformer (Fig. 1). Because entities are treated as tokens, LUKE can directly model the relationships between entities. In this paper, we focus on person-type problems that include named entities of persons, so LUKE is expected to work well on them. Furthermore, at the time of its release, LUKE achieved the highest accuracy on several NLP tasks. We adopt LUKE as the base model and fine-tune the pretrained LUKE model.

Fig. 1 Architecture of LUKE using the input sentence “Beyonce lives in Los Angeles.” LUKE outputs contextualized representation for each word and entity in the text. The model is trained to predict randomly masked words (e.g., lives and Angeles in the figure) and entities (e.g., Los Angeles in the figure). Downstream tasks are solved using its output representations with linear classifiers. Cited from [18]
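As a concrete illustration of this entity-as-token design, the following minimal sketch feeds the sentence from Fig. 1 to LUKE through the Hugging Face transformers library. The English studio-ousia/luke-base checkpoint and the entity spans are purely illustrative; our system uses a Japanese checkpoint (see Sect. 3.1).

```python
# Minimal sketch: LUKE receives word tokens and explicit entity spans,
# and returns separate contextualized representations for both.
from transformers import LukeModel, LukeTokenizer

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Beyonce lives in Los Angeles."
# Character spans of the entity mentions to be treated as entity tokens.
entity_spans = [(0, 7), (17, 28)]  # "Beyonce", "Los Angeles"

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
outputs = model(**inputs)

word_repr = outputs.last_hidden_state            # contextualized word tokens
entity_repr = outputs.entity_last_hidden_state   # contextualized entity tokens
print(word_repr.shape, entity_repr.shape)
```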

Hoshino et al. [4] describes our previous work presented at COLIEE 2019: a rule-based system that parses sentences into clauses based on their original definition. The parsing results were used to extract a set of clauses, each consisting of a subject, predicate, and object, and these clause sets were then compared. They developed several modules, such as the Precise Match module, which compared the clause set of the relevant Civil Code articles with the clause set of the problem text and answered Yes if all elements of the clause sets matched. Fujita et al. [3] is our more recent work at COLIEE 2022, which proposed an ensemble of Hoshino et al.’s rule-based system and a BERT-based system. This system achieved the highest accuracy in the formal run of COLIEE 2022 Task 4. To address the issue of limited training data, we performed data augmentation such as logical inversion, replacement of person terms, and replacement of article numbers. In this paper, we extend our previous system by replacing BERT with LUKE and by modifying the ensemble method to build different fine-tuned models depending on the type of problem.

3 System

3.1 System Overview

Our system comprises a rule-based component and a LUKE-based component. The LUKE-based component uses a LUKE model fine-tuned on three different datasets: all of the training data provided by COLIEE, and two training subsets extracted according to problem type. The rule-based and LUKE-based components are integrated through an ensemble that performs binary classification, predicting Yes or No according to the higher probability value. In the COLIEE Task 4 dataset, alphabetical characters are used to represent persons in the problem text, as illustrated in Fig. 2, which shows an example of a problem involving alphabetical person characters. It is necessary to determine the relationship between each person indicated by an alphabetical character and the person role described in the Civil Code text. In the example, A in the problem text represents a person who contracted as an agent of another person, B represents a different person, and C corresponds to a counterparty, as defined in the Civil Code text. Such problems are considered to be among the most challenging to solve automatically.

We focus on problems that involve alphabetical person names and create separate LUKE models, one trained on such problems and one trained on the other problems. For the LUKE-based part, we prepare three LUKE models for comparison: a LUKE model trained on all data (LUKE-all), a LUKE model trained on problems with alphabetical person names (LUKE-person), and a LUKE model trained on problems without alphabetical person names (LUKE-nonperson).

While our previous system [4] had several modules with different matching methods for the clause sets, our previous study [3] showed that the Precise Match module, which answers Yes only when all pairs of subjects, objects, and predicates match, was the most effective. We therefore adopt the Precise Match module as our rule-based part. For the LUKE-based part, we fine-tuned a publicly available LUKE model (studio-ousia/luke-japanese-base-lite), pretrained on Wikipedia articles, to output binary probabilities of Yes or No given a problem text and a relevant Civil Code article as input.

Fig. 2 An example of a problem in which alphabetical person characters appear

In this section, we describe the design of our system as follows. First, we create additional training data from the Civil Code articles (3.2). Second, after preprocessing the data, we select the Civil Code article most relevant to solving a given problem statement, based on the similarity of their texts (3.3). Third, we expand the training data by performing logical inversion and replacing person terms (3.4). Fourth, we fine-tune the LUKE model on these datasets, splitting them by year and creating multiple models for all possible combinations of training and validation data (3.5).

Based on the methods above, we created three different submission models for our formal run: KIS1, KIS2, and KIS3, which were designed for different types of problems (3.6). Among the three formal run submissions, KIS2 was our proposed system. KIS1 was an ensemble of the rule-based system and a LUKE-based model trained on all of the training data. KIS2 was an ensemble of KIS1 and a model trained specifically for problems in which alphabetical person names appear. KIS3 was an ensemble of a model trained specifically for problems in which alphabetical person names appear and a model trained specifically for problems in which they do not. Figure 3 illustrates these relationships. We applied our article selection preprocess (3.3) to the formal run test dataset as well.

Fig. 3 System overview

3.2 Create Training Data from Article(s)

To increase the size of the official training dataset, we created an additional training dataset from the Civil Code articles alone, without problem texts. In this subsection, we refer to the relevant articles as the premise (t1) and the problem text as the hypothesis (t2) to avoid confusion, since in this additional dataset both sides of a pair are taken from the articles. First, we divided the distributed Civil Code articles into sections and created pairs of identical sections, setting their correct answer labels to Yes. For example, “A minor must obtain the consent of his/her legal representative to perform a legal act. However, this shall not apply to acts merely to obtain rights or to be relieved of obligations.” (Civil Code Article 5) is paired with the same paragraph and labeled Yes. If the text of an article contains an exception sentence or proviso, such as “Provided, however, [...], this shall not apply.”, we divide the original article text into the text before the exception sentence (the principle part) and the exception sentence itself (the proviso part). If the proviso describes an act, person, or right, we manually replace the corresponding act, person, or right in the principle part with the one in the proviso part, and then invert the logic of the predicate as described in Sect. 3.4. In the example in Fig. 4, the proviso of Civil Code Article 5, “However, this shall not apply to acts by which a minor merely acquires a right or is relieved of a duty.”, was rewritten as “A minor need not obtain the consent of his or her legal representative to commit an act merely to obtain a right or to be relieved of a duty.” The subject normally appears in the principle part, but it sometimes appears in the proviso part. In that case, we invert the affirmation/negation of the principle part using the method described later (Sect. 3.4) and add it to the training dataset, sharing the same original premise (t1). Figure 5 shows an example.

Fig. 4 Divide into principle and exception

Fig. 5 \(<\text{t1}>\), \(<\text{t2}>\) pairs created using exceptions
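The principle/proviso split described above can be sketched roughly as follows, under the assumption of a simple textual marker. The actual articles are Japanese, where the proviso begins with “ただし”; the English marker is used here only for readability, and the manual replacement of acts, persons, and rights is not automated in this sketch.

```python
import re

# Hypothetical sketch of the principle/proviso split described in Sect. 3.2.
PROVISO_MARKER = re.compile(r"\s*(Provided, however,|However,)", re.IGNORECASE)

def split_principle_proviso(article_text: str):
    """Return (principle, proviso); proviso is None if the article has no exception."""
    match = PROVISO_MARKER.search(article_text)
    if match is None:
        return article_text.strip(), None
    principle = article_text[:match.start()].strip()
    proviso = article_text[match.start():].strip()
    return principle, proviso

principle, proviso = split_principle_proviso(
    "A minor must obtain the consent of his/her legal representative to perform "
    "a legal act. However, this shall not apply to acts merely to obtain rights "
    "or to be relieved of obligations."
)
# The proviso is then merged back into the principle with its predicate
# logically inverted (see Sect. 3.4) to create an additional <t1, t2> pair.
```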

3.3 Preprocessing and Article Selection

First, we apply the following preprocessing steps to the articles and then select the relevant ones. A problem statement may have multiple related articles. If we concatenated the texts of all these articles as input, the input to the model could become too long, exceeding the upper limit (512 tokens in our case), and important parts could be lost by truncation. To address this issue, we split the relevant articles into sections (each article consists of one or more sections) and then create all possible combinations of the divided sections (Fig. 6). We discard any combination in which the total number of tokens of the combined sections and the given problem text exceeds the upper limit.

Fig. 6 An example of combination reconstruction
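A minimal sketch of this combination step is shown below; the function name and the tokenizer object are assumptions made for illustration.

```python
from itertools import combinations

MAX_TOKENS = 512  # input limit of the model

def candidate_article_texts(sections, problem_text, tokenizer):
    """Enumerate every non-empty combination of article sections whose
    concatenation with the problem text fits within the input limit."""
    candidates = []
    for r in range(1, len(sections) + 1):
        for combo in combinations(sections, r):
            combined = " ".join(combo)
            n_tokens = len(tokenizer.encode(combined, problem_text))
            if n_tokens <= MAX_TOKENS:   # discard over-length combinations
                candidates.append(combined)
    return candidates
```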

If the generated text contains reference notations such as “preceding paragraph” or “Article XX”, we search the given relevant articles for the referred article and replace the reference notations with the text from the referred article (as shown in Fig. 7). The replaced version is then added to the training dataset. Notations such as “listed below” are substituted with the specified items in the article. Figure 8 provides an example of this process.

Fig. 7 An example of article reference

Fig. 8 An example of substituting each item for “listed below”
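The reference resolution can be sketched roughly as follows. The regular-expression patterns are simplified English stand-ins for the Japanese notations our system actually handles, and the helper names are hypothetical.

```python
import re

def resolve_references(section_text, article_lookup, preceding_paragraph):
    """Replace 'Article XX' and 'preceding paragraph' notations with the
    referred text, when it is available among the given relevant articles."""
    def article_repl(match):
        number = match.group(1)
        # Keep the original notation if the referred article is not available.
        return article_lookup.get(number, match.group(0))

    text = re.sub(r"Article (\d+)", article_repl, section_text)
    text = text.replace("the preceding paragraph", preceding_paragraph)
    return text
```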

As shown in Fig. 4, the proviso part of an article describes an exceptional situation in which the principle part does not apply. To understand the meaning of the proviso part, we need to include the principle part as well. Therefore, we concatenate the proviso part with its principle part, inverting the affirmation/negation of the latter. If the proviso part includes an act, person, or right, we replace the corresponding item in the principle part with the one in the proviso part. Among these preprocessed articles, we select the article most relevant to solving the given problem by the similarity scores of the vectors obtained with Sentence LUKE (sonoisa/sentence-luke-japanese-base-lite). Sentence LUKE is a tool for creating sentence vectors using the LUKE model (in other words, a LUKE version of Sentence-BERT [15]), trained on Japanese Wikipedia with a Siamese network. Before computing similarities, we remove the suffixes of predicates, which may contain negation expressions, because we search for the most similar content regardless of whether it is affirmative or negative. Figure 9 shows an example.

Fig. 9 An example of article selection
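A hedged sketch of this selection step is shown below. Loading the Sentence LUKE checkpoint with AutoModel and applying mean pooling is an assumption made for brevity (the model card ships its own wrapper class), and the predicate-suffix removal described above is omitted.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sonoisa/sentence-luke-japanese-base-lite")
model = AutoModel.from_pretrained("sonoisa/sentence-luke-japanese-base-lite")

def embed(sentences):
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state        # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1)         # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)        # mean pooling

def select_article(problem_text, candidate_articles):
    """Return the candidate article whose vector is most similar to the problem."""
    vecs = embed([problem_text] + candidate_articles)
    sims = torch.nn.functional.cosine_similarity(vecs[0:1], vecs[1:])
    return candidate_articles[int(sims.argmax())]
```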

3.4 Data Augmentation

The data augmentation in our previous COLIEE 2022 system [3] consisted of two expansions, negation expansion and person term replacement, which we describe below. In this year’s formal run, we added more negative words and person terms to our manual dictionary. For negation expansion, we create a new sample by reversing the logic at the end of a sentence, together with its Yes or No answer, using a predefined list of affirmative and negative expression pairs. We apply this expansion both to the pairs created from the Civil Code articles as described in the previous sections and to the given problem texts. However, we do not apply it to problems with a gold standard answer of No, since negating the end of a sentence does not always turn a No into a Yes. The COLIEE problems sometimes use alphabetical characters, such as A or B, to represent person names. Our person term replacement expansion addresses this by creating a dataset from the training data in which person names are replaced with alphabetical characters. We assign the letters in order of appearance, mapping identical person names to identical characters.
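The two expansions can be sketched as follows; PERSON_TERMS and NEGATION_PAIRS are small English stand-ins for our manually curated Japanese dictionaries.

```python
import string

PERSON_TERMS = ["the seller", "the buyer", "the agent", "the principal"]
NEGATION_PAIRS = [("may not", "may"), ("must not", "must")]

def replace_person_terms(text):
    """Replace person terms with A, B, C, ... in order of first appearance,
    so that identical terms always map to the same letter."""
    mapping, letters = {}, iter(string.ascii_uppercase)
    for term in PERSON_TERMS:
        if term in text and term not in mapping:
            mapping[term] = next(letters)
    for term, letter in mapping.items():
        text = text.replace(term, letter)
    return text

def invert_negation(text, label):
    """Create a logically inverted sample; applied only to Yes-labelled problems,
    since inverting a No-labelled problem does not reliably yield a Yes."""
    if label != "Yes":
        return None
    for neg, aff in NEGATION_PAIRS:
        if neg in text:
            return text.replace(neg, aff, 1), "No"
        if aff in text:
            return text.replace(aff, neg, 1), "No"
    return None  # no applicable expression at the end of the sentence
```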

3.5 Combinatorial Split of Training and Validation Dataset

To fully utilize the COLIEE official training dataset, we created multiple models trained on different parts of it. We split the official dataset in several patterns following a cross-validation scheme: each 2-year period is selected in turn as the validation dataset, and the rest of the official dataset is used as the corresponding training dataset. After fine-tuning a model for each pattern, we applied an ensemble of these multiple models. We chose 2 years as the splitting unit because splitting by single years would produce too many combinations. Figure 10 illustrates this split method.

Fig. 10 A conceptual figure of the training data split
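A minimal sketch of the 2-year split is shown below; the list of year identifiers is illustrative and does not exactly reproduce the official dataset coverage.

```python
# Illustrative year labels in the COLIEE naming scheme (assumed range).
YEARS = ["H18", "H19", "H20", "H21", "H22", "H23", "H24", "H25",
         "H26", "H27", "H28", "H29", "H30", "R01", "R02", "R03"]

def two_year_splits(years):
    """Yield (train_years, valid_years) pairs, holding out each consecutive
    2-year block once as the validation set."""
    for i in range(0, len(years), 2):
        valid = years[i:i + 2]
        train = [y for y in years if y not in valid]
        yield train, valid

# One model is fine-tuned per split; their predictions are later ensembled.
for train_years, valid_years in two_year_splits(YEARS):
    print(valid_years, "held out")
```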

3.6 Fine-Tune for Alphabetical Person Names

When alphabetical letters are used as person names in the given problem text, a different approach is required, because it becomes necessary to determine which person role in the relevant Civil Code article each alphabetical character corresponds to. We therefore fine-tune a model specifically for such problems, and another model for problems in which alphabetical person names do not appear. Each model is internally an ensemble of the combinatorially split fine-tuned models described in Sect. 3.5, and the preprocessing steps described in Sects. 3.1 to 3.4 are applied before fine-tuning. We regard a problem as an alphabetical person name type problem if it contains any single alphabetical character (the original text is in Japanese except for these characters). As mentioned earlier, KIS2 and KIS3 use the model fine-tuned on problems containing alphabetical characters, while KIS1 uses the model fine-tuned without them. For binary classification, a fully connected linear transformation is applied to the output of the last layer at the position of the “<s>” token (the “[CLS]” token in the case of BERT), producing scores for Yes and No; the larger score determines the answer. During fine-tuning, the scores are converted into label probabilities with the softmax function, and the loss is computed with cross-entropy.
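The classification head can be sketched as follows; this is a hand-written equivalent of a standard sequence-classification head, not our exact implementation, and the checkpoint name is the one cited in Sect. 3.1.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class YesNoClassifier(nn.Module):
    """Binary Yes/No head on top of the "<s>" representation."""
    def __init__(self, model_name="studio-ousia/luke-japanese-base-lite"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.linear = nn.Linear(self.encoder.config.hidden_size, 2)  # Yes / No

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        cls_repr = hidden[:, 0]              # "<s>" token ("[CLS]" for BERT)
        logits = self.linear(cls_repr)       # classification scores
        if labels is None:
            return logits
        loss = nn.functional.cross_entropy(logits, labels)  # softmax + CE
        return loss, logits
```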

3.7 Ensemble Prediction

Finally, we perform an ensemble of our rule-based part and our LUKE-based part. The rule-based part (the Precise Match module) is the same as in our previous work; it has high precision but can answer only a small number of problems. Therefore, we first apply the rule-based part when it is applicable, and fall back to the LUKE-based part otherwise. For the LUKE-based part, we prepared three models: LUKE-all (fine-tuned on all of our datasets), LUKE-person (fine-tuned on problems with alphabetical person names), and LUKE-nonperson (fine-tuned on problems without alphabetical person names). KIS3 applies LUKE-person when the problem includes alphabetical person names and LUKE-nonperson when it does not. KIS2 applies LUKE-person in the same way but uses LUKE-all when the problem does not include any alphabetical person names. KIS1 always applies LUKE-all when the rule-based part is not applicable.
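The routing logic of the three submissions can be sketched as follows. The predictor callables are placeholders for the components described above, and applying the rule-based stage first for all three submissions is a simplification of Sects. 3.1 and 3.7.

```python
def is_person_type(problem_text: str) -> bool:
    """A problem is 'person type' if it contains any single Latin letter
    (the original text is Japanese except for these role symbols)."""
    return any(ch.isascii() and ch.isalpha() for ch in problem_text)

def predict(problem, article, predictors, submission="KIS2"):
    """predictors: dict of callables 'rule', 'all', 'person', 'nonperson';
    'rule' returns 'Yes'/'No' or None when it cannot answer."""
    answer = predictors["rule"](problem, article)   # high precision, low coverage
    if answer is not None:
        return answer
    if submission == "KIS1":
        return predictors["all"](problem, article)
    if is_person_type(problem):                     # KIS2 and KIS3
        return predictors["person"](problem, article)
    key = "all" if submission == "KIS2" else "nonperson"
    return predictors[key](problem, article)
```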

4 Experiments and Results

4.1 Fine-Tune Parameters

We performed fine-tuning with the following parameters: a maximum token length of 512, a batch size of 32, a learning rate of 1e-5, and a maximum of 10 epochs, with training terminated early by early stopping.
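A sketch of this configuration with the Hugging Face Trainer is shown below. The dataset objects are assumed to be tokenized elsewhere with max_length=512, the early-stopping patience is an assumption (it is not stated above), and exact argument names may differ slightly across transformers versions.

```python
import numpy as np
from transformers import (AutoModelForSequenceClassification, EarlyStoppingCallback,
                          Trainer, TrainingArguments)

def accuracy_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

def build_trainer(train_dataset, valid_dataset):
    """Trainer configured with the hyperparameters listed above."""
    model = AutoModelForSequenceClassification.from_pretrained(
        "studio-ousia/luke-japanese-base-lite", num_labels=2)
    args = TrainingArguments(
        output_dir="coliee-task4-luke",
        per_device_train_batch_size=32,
        learning_rate=1e-5,
        num_train_epochs=10,              # upper bound; early stopping ends sooner
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,      # examples tokenized with max_length=512
        eval_dataset=valid_dataset,
        compute_metrics=accuracy_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # assumed value
    )
```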

4.2 COLIEE 2023 Formal Run Results

Table 1 shows the results of all teams in the COLIEE 2023 Task 4’s formal run, where KIS is our team name.

Table 1 COLIEE 2023 Task 4’s formal run results for each participant’s submission. # represents the number of correct answers; Acc represents Accuracy. The submission IDs in bold are our submissions

4.3 Previous COLIEE Formal Run Results

Table 2 shows the results of our experiments using previous formal runs of COLIEE 2019, 2020, and 2021 (test datasets are H30, R01, and R02, respectively) as required by the organizers.

Table 2 Numbers of correct answers and accuracies in previous formal run datasets

4.4 Comparison of BERT and LUKE

Table 3 shows the results of experiments on the formal run and the past formal runs using BERT and LUKE. Each cell shows the number of correct answers and the total number of problems for the datasets from H30 to R04: the all column counts all problems, the person column counts problems containing alphabetical person name characters, and the nonperson column counts problems without them (Table 4). The results show that LUKE answers more problems correctly than BERT on H30 and R04. In particular, on R04, LUKE improved performance on the alphabetical person name problems. On the other hand, BERT performed better on R01 and comparably on R02.

Table 3 The number of correct answers by problem type (person: person-type problems, nonperson: others, all: both person and nonperson) for BERT and LUKE
Table 4 The number of correct answers by problem type (other than person) for BERT and LUKE

4.5 Evaluation of Fine-Tuned Models Without Ensemble Using Previous Formal Runs

Table 5 shows the evaluation results of the individual fine-tuned LUKE models on the formal run of COLIEE 2023 and the formal runs of the past three years. Each fine-tuned model was evaluated independently, without any ensemble. We evaluated the models separately on the problems with alphabetical person names (person) and the others (nonperson). The results show that the person model, which is fine-tuned on person-type problems, worked better than the other models on all of the datasets.

Table 5 Number of correct answers for the three fine-tuned LUKE models (all, person, and nonperson) on each training/test dataset (H30, R01, R02, and R04), divided into person-type problems (P) and others (N)

5 Discussion

The individual results of the fine-tuned models (Table 5) demonstrate that fine-tuning was effective for the corresponding type of problems but not for the other types. Our team’s formal run results (Table 1) and our experiments using past formal runs (Table 2) also showed that KIS2, the ensemble using the model fine-tuned for alphabetical person names, achieved the highest score.

Table 3 shows that LUKE and BERT differ in their percentages of correct answers. We analyzed the patterns in which only one of LUKE or BERT answered a problem correctly. Figure 11 shows an example problem that can be answered without analyzing the alphabetical person names, even though such names appear in the problem text; problems of this kind could be answered correctly by BERT. As shown in Fig. 12, R04–08-A is an example of a person name problem where LUKE was correct and BERT was not. In this problem, the gold label is “No”, because “B consented to this” in the problem text differs from “a third party consented to this” in the article: B is an agent, and C is the third party. LUKE was able to predict the label “No” for this problem. This example suggests that LUKE may understand personal relationships better than BERT. While LUKE by itself only slightly improved performance compared with BERT (Table 3), it works significantly better when fine-tuned on person-type problems, which corresponds to the highlighted cells in Table 5: in every case, the person-type problems (P) were better solved by the person fine-tuned model than by the other models. By manually checking the problems, we found that among the 13 problems that were answered correctly by LUKE and its person fine-tuned model but not by BERT, 11 were of the type described above, requiring analysis of the alphabetical person names.

Fig. 11 An example problem which can be solved without analyzing the alphabetical person names

Fig. 12 An example of a problem where LUKE provided the correct answer

We analyzed the results of our article selection by Sentence LUKE and found an unsuccessful example, shown in Fig. 13. In this example, our system selected Article 5, “A minor shall obtain the consent of his/her legal representative in order to perform a legal act. Any legal act contrary to the provisions of the preceding paragraph may be revoked”, while Article 124-2, item 2 was required to solve the problem. The non-relevant article selected by our system shares tokens such as “minor” and “consent” with the problem text, but the relevant article also shares these tokens. The failure may instead stem from abstract paraphrases like “Any legal act contrary to the provisions of the preceding paragraph may be revoked”, which inflate the cosine similarity. Pretraining and fine-tuning on legal documents, or paraphrasing such expressions into everyday language as preprocessing, may help mitigate this issue.

Fig. 13 Examples of article selection failures

Next, we compare the extent to which our three data extension methods contributed to improving the accuracy of the model. The three methods applied for training data augmentation are: (i) the data created from the Civil Code articles described in Sect. 3.2, (ii) the negation expansion, and (iii) the person term replacement described in Sect. 3.4. As a comparative analysis, we fine-tuned the BERT model after applying each of the three data extension methods individually. We also compare the fine-tuned BERT models with all three extensions (our proposed setting) and without any of them, giving five patterns in total. We use the dataset of each of the years H30, R01, R02, and R04 for evaluation and the datasets of the years prior to the evaluation year for training; these training-evaluation pairs correspond to the past formal run settings. Within each training dataset, we performed 11-fold cross-validation, resulting in 11 fine-tuned models, and the final predictions are decided by majority vote among these 11 models; a sketch of this vote appears after Table 7. Using our manually created problem type classification, we counted the number of correctly answered problems for each fine-tuned model and each problem type (Table 6 reports the counts for H30, R01, R02, and R04, and Table 7 shows the totals for each model and problem type). When expanding the data using articles, accuracy improvements were observed for many problem types. The negation expansion contributed significantly to problems involving negation, as expected. Data augmentation by person replacement was expected to contribute to the Person problem type (where person names are represented by alphabetical symbols such as A and B); H30 and R01 showed a positive contribution, while we could not observe positive contributions in the other years. These results suggest that, even after augmentation by person replacement, the training data for the Person problem type are still insufficient, as the alphabetical symbols can stand for a variety of different roles.

Table 6 Number of correctly answered problems per problem type, by year and model
Table 7 Number of correctly answered problems per problem type, by model
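The majority vote over the 11 cross-validation models mentioned above amounts to the following sketch, which assumes string labels.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: list of 'Yes'/'No' answers, one per fine-tuned model."""
    return Counter(predictions).most_common(1)[0][0]

# With 11 models the vote cannot tie, e.g.:
# majority_vote(['Yes', 'No', 'Yes', 'Yes', ...]) -> 'Yes'
```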

6 Conclusion and Future Work

We extended our previous system from COLIEE 2022 by performing an ensemble of a rule-based part and a LUKE-based part for COLIEE 2023 Task 4. We divided the problems into two types according to whether they include alphabetical person names, and fine-tuned models on three different datasets: the two problem types and all problems. We confirmed that our model fine-tuned for alphabetical person names improved the accuracy on those types of problems, and our system achieved an accuracy of 0.69 in the formal run of COLIEE 2023 Task 4. Our future work includes improving the data split method, handling other types of problems, and improving the accuracy of article selection.