Data Augmentation and Transfer Learning for Cross-lingual Named Entity Recognition in the Biomedical Domain

Given the increase in production of data for the biomedical field and the unstoppable growth of the internet, the need for Information Extraction (IE) techniques has skyrocketed. Named Entity Recognition (NER) is one such IE task useful for professionals in different areas. There are several settings where biomedical NER is needed, for instance, extraction and analysis of biomedical literature, relation extraction, organisation of biomedical documents, and knowledge-base completion. However, the computational treatment of entities in the biomedical domain has faced a number of challenges, including its high cost of annotation, ambiguity, and lack of biomedical NER datasets in languages other than English. These difficulties have hampered data development, affecting both the domain itself and its multilingual coverage. The purpose of this study is to overcome the scarcity of biomedical data for NER in Spanish, for which only two datasets exist, by developing a robust bilingual NER model. Inspired by back-translation, this paper leverages the progress in Neural Machine Translation (NMT) to create a synthetic version of the CRAFT (Colorado Richly Annotated Full-Text) dataset in Spanish. Additionally, a new CRAFT dataset is constructed by replacing 20% of the entities in the original dataset, generating a new augmented dataset. Further, we evaluate two training methods, concatenation of datasets and continuous training, to assess the transfer learning capabilities of transformers using the newly obtained datasets. The best performing NER system achieved an F1 score of 86.39% on the development set. The novel methodology proposed in this paper presents the first bilingual biomedical NER system and has the potential to improve applications across under-resourced languages.


Introduction
In the field of Natural Language Processing, Named Entity Recognition (NER) is not a new notion. Since its introduction at MUC-6 in 1995, it has been a subtask of Information Extraction with substantial research reported. Named Entities (NEs) are textual items of interest, with people, organisations, locations, and numbers being common examples. The aim of NER is to recognise and categorise different types of named entities in structured or unstructured text. This domain has attracted a lot of interest. Starting with rule-based systems (Appelt et al., 1995; Weischedel, 1995), followed by machine learning models (Bam and Shahi, 2014; Borthwick, 1999), and deep learning models (Lin et al., 2017; Chiu and Nichols, 2016; Devlin et al., 2019), the development of systems capable of recognising these NEs has been considerable.
In the general domain, state-of-the-art models can produce excellent results. However, research in specialised domains involving different NE classes has not grown at the same rate. The biomedical domain, for example, is expanding as medical records become more computerised and online biomedical research becomes more accessible. According to Microsoft (n.d.), PubMed adds two biomedical papers every minute, thousands every day, and over a million every year. NER is critical for Natural Language Processing (NLP) since NEs serve both as a referential base for finding information in texts and as important pieces of information in themselves. In the general domain, a news article could be summarised by extracting the five Ws (who, what, when, where, and why), where each W often corresponds to a NE (Zhang, Pan, and Zhang, 2004, p. 1). Similarly, the biomedical domain makes use of NER since denominations for genes, proteins, and diseases, among other things, are crucial bits of information for researchers and biomedical experts in situations such as literature-based discovery and relation extraction. Because biomedical information is primarily published in English, data in other languages, especially NER datasets, are sparse. To fill this gap, this research aims to overcome the lack of data in other languages by investigating and analysing the best opportunities for developing a bilingual model that can be used in both Spanish and English.
The available datasets for training NER and other NLP systems appear to be insufficient to handle all new information as the diversity and size of online data grow. The cost of manually annotating data for the biomedical domain is high due to its level of speciality. Moreover, biomedical NER faces challenges such as variations in spelling, synonyms, and unknown vocabulary, which slows the development of new systems. In this paper we explore two data augmentation techniques to increase the size of the dataset: (a) translation of the dataset using a commercial machine translation system to create a dataset in another language; and (b) entity replacement, in which a new dataset is constructed by replacing part of the entities in the original dataset (Liu et al. 2020).
Additionally, in order to create a cross-lingual NER model, we propose the use of transfer learning, which is the process of training a model using previously learnt parameters from a pretrained model (Hira et al. 2019), i.e., using the parameters of a model pretrained in domain X to initialise a model in domain Y. Continuous training is defined as the sequential training of a system using previously obtained knowledge from one or more data sources. Transfer learning is commonly used to fine-tune general models on new domains or languages, with promising results using transformers.
The original methodology that we put forward in this paper will make it possible for additional languages to benefit from biomedical datasets for NER. Our novel approach is portable to other languages and, to the best of our knowledge, has not been proposed in the biomedical domain.
The following are this work's main contributions: (1) the creation of a synthetic version of the CRAFT corpus in Spanish for the biomedical domain using a cheap translation approach based on back-translation; (2) a separate version of the original CRAFT in English, produced using entity replacement; and (3) the first bilingual NER system (ES-EN) for the biomedical area, which achieved the second highest F1 score compared to the literature's reports for systems trained on the monolingual CRAFT dataset.
The rest of the paper is structured as follows. Section 2 surveys related work in the field and Section 3 presents our methodology, providing details on the experiments conducted. Section 4 discusses the evaluation results and finally Section 5 summarises the conclusions of this research.

Related Work
This section outlines biomedical datasets and NER systems first. Then, cross-lingual approaches to NER are discussed.
Although biomedical NER is not a new concept, there is no global agreement on the entity classes or the annotation criteria, which has led to a handful of datasets with different entity classes and different annotation guidelines, creating inconsistency in the task. For instance, chemical entities are represented in the CHEMDNER dataset, which includes the tags Abbreviation, Family, Formula, Identifier, Multiple, No Class, Systematic, and Trivial (Krallinger et al. 2015). Another chemical dataset is BC5CDR, which also includes diseases (Li, 2016). NCBI is a full disease dataset created from 793 PubMed abstracts (Doğan, Leaman, and Lu, 2014). For protein/gene identification, the GENIA dataset has a total of 23,996 tagged genes/proteins (Tanabe et al. 2005). The CRAFT dataset also contains genes and proteins but is tagged using instructions from ontologies, which might differ from other datasets' annotation. By combining the datasets provided by the MEDIQA challenge with a subset of the MedQuAD dataset, Lamurias et al. (2019) presented research on data augmentation for the Question Answering task, reporting an increase of 0.015 in accuracy in the test set and a decrease of 0.02 in accuracy in the development set. The Spanish language has not seen the same progress in annotation. The first biomedical NER task in Spanish was PharmaCoNER, using a chemical dataset containing four tags: No normalizables, Normalizables, Proteinas, and Unclear (Gonzalez-Agirre et al. 2019). Focusing on cancer vocabulary, Miranda-Escalada et al. (2020) created the CANTEMIST task for named entity recognition. This corpus contains the following entity types: Disease, Drugs, Unit of Measurement, Excipient, Chemical Composition, Pharmaceutical Form, Medicament, Food, Route, and Therapeutic Action, with a total of 2,241 entities. The lack of standard global guidelines has impeded a unified effort such as that seen in the general domain. As shown above, the entity types differ greatly across all datasets in Spanish or English. Since the objective of this study is to build a single efficient bilingual NER system, the use of one dataset per language for a single model is not viable, as might be the case in the general domain using the CoNLL task's datasets for Spanish and English. Rather, this would bring inconsistency and poor performance to the NER system.
Early Named Entity Recognition systems utilised rule-based and supervised methods. For instance, an HMM model worked for the entity classes of proteins, RNA, DNA, and cells (Ponomareva et al., 2007), while a semi-Markov model demonstrated the capacity of such models to integrate information across all tokens in a segment (Leaman and Lu, 2016). Neural Network (NN) models tested Convolutional Neural Networks (CNNs) plus character embeddings and obtained an F1 score of 76.39, while a bidirectional long short-term memory (BiLSTM) character embedding approach achieved a 76.94 F1 score on a diseases dataset (Sahu and Anand, 2016). As in the general domain, BiLSTM-CRF (Conditional Random Field) models are popular in this domain (Li et al. 2019; Cho and Lee, 2019; Wang et al. 2019), with different combinations of character and word embeddings and features such as POS tags. Contextualised word representations based on transformers are the state of the art for many NLP tasks, and biomedical NER is no exception. Lee et al. (2020) introduced BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), based on BERT and trained on PubMed abstracts and PMC full-text articles. Beltagy et al. (2019) re-trained BERT using biomedical data from Semantic Scholar to create SciBERT; the training corpus included 1.14 million full papers. Carrino et al. (2021) trained a RoBERTa-based transformer for Spanish using data from Scielo, Wikipedia, patents, EMEA, and PubMed, among others. Finally, Jofche et al. (2022) present a platform for processing pharmaceutical documents that performs NER and coreference resolution, using a transformer-based architecture to identify NEs in the BC5CDR and BioNLP15CG datasets in English.
Biomedical NER faces a host of issues: the style and speciality of biomedical data; ambiguity, as some names can refer to one entity class or another; and the constant discovery and coining of new terms, which creates the issue of unknown words, among other challenges. For Zhao et al. (2021, p. 5) the difficulties faced when dealing with biomedical NER are that biomedical terms have many variations: (1) small variations such as typos, hyphens, or capitalisation, e.g., 'FOXP2' and 'FOX-P2'; (2) synonyms and abbreviations; and (3) unseen entities. Additionally, for Cho and Lee (2019, p. 5), the difficulties are the entity boundaries, compound noun phrases, bracket-enclosed entities, nested entities, and corpus annotation inconsistency. These variations of the NEs complicate the recognition and normalisation of such entities. Another challenge in biomedical NER is the so-called "long-tail" NEs, i.e. "named entities that are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes" (Liu et al. 2020b, p. 79).
Hence, one might conclude that biomedical NER is far from solved, even with state-of-the-art NLP systems. Efficient NLP techniques are needed for automating the extraction and analysis of biomedical data and would help synchronise the efforts across languages.
Previous works have tested the ability of transformer systems to generalise enough to provide competitive results when trained in another language. Sun and Yang (2019) tested the usability of mBERT (without biomedical data training) and BioBERT (without training on data in Spanish) on the PharmaCoNER dataset. Their results reported competitive F1 scores for both systems (89.24 and 89.02, respectively), as well as the success of current systems in zero-shot transfer. Hakala and Pyysalo (2019) presented an approach using mBERT for Spanish biomedical named entity recognition without further training, achieving an F1 score of 87 on the PharmaCoNER test set. Mueller et al. (2020) used diverse data in multiple languages to examine the polyglot capabilities of transformers. They show how multilingual models share a large number of parameters, how language-specific training uses those common parameters, and how these models preserve the top 5% of weights for each language. This work reveals that transfer learning can be used to create cross-lingual systems in NLP. Saunders et al. (2019) also tested transfer learning for cross-lingual Machine Translation (MT) models, improving the BLEU score by 7 points by starting with a general-domain MT system and training it into a biomedical MT system. Mayhew et al. (2017) used dictionaries to translate an English dataset into several languages with language resources such as PANLEX, and created a phrase translation table in which the labels are copied from the source to the target sentence. They also tested translating a dataset with Google Translate but report difficulties with the alignment and projection of entities into the new language, which according to the authors resulted in a noisy dataset with incorrect entity tags. Finally, Li et al. (2020) created a model that labels a bilingual corpus and performs NER, then uses GIZA++ for alignment and extracts NE translation pairs, which are ranked by calculating their mutual information (MI) value.

Methodology And Experiments
The selection of the dataset(s) to be used is one of the most important factors to consider when training a system for NER (or any other task). A dataset with a wider coverage of NEs would be more beneficial to Spanish users, especially if the NEs it contains have never been explored before. In the general domain there are datasets for English and Spanish that have the same number of entity tags, the same annotation guidelines, and the same class names, such as the CoNLL NER task. This is not the case in the biomedical domain, where most datasets only contain a small number of NE classes based on the task at hand. As previously stated, the use of a dataset from the same family with gold annotation is ruled out. This study will use the CRAFT [4] corpus to train the NER systems as it is one of the datasets with a larger number of entity tags. The CRAFT corpus is a collection of 97 full-text articles from the biomedical domain, manually annotated (version 1 includes 67 articles), with around 100,000 annotations from nine ontologies: Cell Type Ontology (CL), Chemical Entities of Biological Interest (CHEBI) ontology, NCBI Taxonomy (Taxon), Protein Ontology (PR), Sequence Ontology (SO), entries of the Entrez Gene database, and three subontologies from the Gene Ontology (GO) (Bada et al. 2012). Our goal is to develop a system able to transfer the knowledge from one of the biggest and most diverse datasets for biomedical NLP from English to Spanish.
The entities contained in the CRAFT corpus are described as follows: Chemical: Chemical Entities of Biological Interest, referring to atoms, biomedical roles and applications, subatomic particles, molecules, and polyatomic entities.
Cell: All cell mentions except the types of cell line cells.
Gene: Biological processes, including at the level of molecules, and subcellular structures. Also: cellular components representing subcellular structures, both intracellular and extracellular; and macromolecular complexes.
Taxon: Biological taxonomy and their corresponding organisms.
Protein: Based on the Protein Ontology, without regard to sequence type.
We have normalised the annotation in the dataset, meaning that instead of three-letter codes for entities we used the full name of the entity: Protein, Cell, Taxon, Sequence, Chemical, and Gene. This dataset uses the IOB annotation scheme: B for the beginning of an entity, I for inside, and O for tokens not corresponding to any tag, e.g., B-Protein and I-Taxon. The dataset is divided into three parts: a training set (10,875 elements), a test set (7,425 elements), and a validation set (3,730 elements).
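The BIO scheme described above can be illustrated with a minimal sketch using the normalised class names adopted in this study (Protein, Cell, Taxon, Sequence, Chemical, Gene); the sentence and spans below are illustrative, not drawn from the corpus:

```python
# A minimal sketch of the BIO (IOB) scheme: the first token of an entity
# mention receives "B-<Class>", any following tokens of the same mention
# receive "I-<Class>", and every other token receives "O".

def bio_tags(tokens, entity_spans):
    """entity_spans: list of (start, end, label) token ranges, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

tokens = ["The", "Mus", "musculus", "Foxp2", "gene", "was", "analysed", "."]
spans = [(1, 3, "Taxon"), (3, 4, "Gene")]
print(list(zip(tokens, bio_tags(tokens, spans))))
# "Mus musculus" -> B-Taxon I-Taxon; "Foxp2" -> B-Gene; all other tokens -> O
```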
One important point to note is that the CRAFT dataset does not have a Spanish equivalent. Given the scarcity of datasets in Spanish for the biomedical domain, we would like to offer new ways to improve biomedical systems for Spanish users. As a result, to boost the dataset's size, we used data augmentation techniques.

Data Augmentation
Data augmentation is a strategy to increase a system's performance by generating more training data. Two strategies for the construction of the utilised datasets are presented next: cheap translation and entity replacement.

Cheap Translation
This initial data augmentation strategy used to train a bilingual NER system was inspired by the success of back-translation in NMT, which is the production of a synthetic dataset via machine translation (Edunov et al., 2018). Google Translate is one of the most widely used machine translation technologies in the world. It supports over 100 languages and its mobile app has been downloaded over 1 billion times (Pitman, 2021). Caswell and Liang (2020) reported a change to the transformer architecture of Google's MT system, which resulted in a +5 BLEU score increase in high-resource languages and a +7 BLEU score increase in low-resource languages on average. This system was chosen to translate the CRAFT dataset into Spanish because of its constant development, widespread use, and M4 modelling for multilingual transfer learning. This technique, as Mayhew et al. (2017, p. 3) called it, is a "cheap translation" method to get more data (even more than back-translation itself), as no monolingual data in Spanish was back-translated to create an English system, nor was parallel data used. This synthetic new dataset will be used to train NER systems employing different methods. NMT systems, especially those that are not domain-specific, i.e., Google, are prone to errors, so this MT-translated dataset is not intended to be a gold-standard corpus for developing systems for Spanish only.

Mayhew et al. (ibid.) used dictionaries and Google to translate an English dataset. Their dataset was constructed by translating sentences one at a time, using fast_align to obtain alignments, and then projecting the tags, which, according to the authors, resulted in errors in the tag projection and noise. We used a similar process to create a new dataset for Spanish without the use of dictionaries. Because the CRAFT dataset is preformatted in the CoNLL format, we reconstructed all of the words with an "O" tag to make real sentences, as well as multi-word NEs. We fed Google Translate the sentences separated from the NEs, either single words or multi-words. Then, we mapped back the tags to avoid errors from the misalignment of sentences. The translation process of this dataset is as follows:
1. Concatenate words with the same tag to create sentences, as NMT systems tend to create better output when a sentence is provided instead of a single word.
2. Preserve a unique tag per sentence for future mapping.
3. Pass the datasets (training, test, and development) to Google Translate in an .xlsx format to preserve the order of the sentences and ensure the tags match.
4. Collect the output from Google Translate.
5. Assign the labels matching the original annotation previously stored. Since the reconstruction created sentences, a single tag is assigned per sentence.
6. Tokenise each sentence and assign the proper tag using the BIO scheme. All "O" sentences keep only the "O" tag, whereas the entities get "B" for the beginning of the entity and "I" for the rest of the tokens in composed entities.
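The grouping and re-tagging steps above can be sketched as follows. This is a minimal illustration under two assumptions: the MT call is stubbed with an identity function (the study fed the segments to Google Translate via spreadsheet), and adjacent entities of the same class collapse into a single segment for simplicity.

```python
# Sketch of the label-preserving translation pipeline: merge consecutive
# tokens that share an entity class into one segment, translate each segment
# as a unit, then re-tokenise and re-apply BIO tags. Because the tag travels
# with the segment, no cross-lingual word alignment is ever needed.
from itertools import groupby

def to_segments(conll_pairs):
    """Group consecutive (token, tag) pairs by entity class (B-/I- collapsed)."""
    cls = lambda tag: "O" if tag == "O" else tag.split("-", 1)[1]
    return [(" ".join(tok for tok, _ in grp), c)
            for c, grp in groupby(conll_pairs, key=lambda tt: cls(tt[1]))]

def to_conll(segments, translate=lambda text: text):
    """Translate each segment (stubbed here), then restore the BIO tags."""
    pairs = []
    for text, c in segments:
        for i, word in enumerate(translate(text).split()):
            pairs.append((word, "O" if c == "O" else ("B-" if i == 0 else "I-") + c))
    return pairs

sentence = [("The", "O"), ("mouse", "B-Taxon"), ("Foxp2", "B-Gene"),
            ("gene", "I-Gene"), ("is", "O"), ("conserved", "O")]
assert to_conll(to_segments(sentence)) == sentence  # identity MT round-trips
```

With a real `translate` function the Spanish words would simply inherit the segment's class, which mirrors how the tags were mapped back after collecting the Google Translate output.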
Figure 1 shows the translation process of a real sentence from the development subset of the CRAFT corpus.
The process creates a new translated Spanish CRAFT dataset. Polyglot systems tend to preserve essential weights for each language, as well as sharing parameters, according to Mueller et al. (2020). As a result, the construction of this dataset in Spanish is a strategy to improve performance in a transfer learning environment. Table 1 shows an example of the final output of a sentence in the original English dataset and the output obtained from the translation system.
Although the translation technique is not fully automated, it guarantees that the noise caused by aligners and tag projection is completely avoided, resulting in a significantly more reliable synthetic dataset.

Entity Replacement
In order to increase system performance and create a robust cross-lingual NER system, we follow Liu et al.'s (2020) approach for data augmentation: replacement of entities. Random replacement of existing entities with unseen entities is used to create new datasets for a system. We compiled a set of entities based on the official ontologies that were used to annotate the entities in the original CRAFT corpus. To ensure that all of the objects retrieved from the ontologies are, in fact, new, this list was cross-checked against the original dataset's vocabulary. The results for each tag are as follows: Protein [5]. There are 8,846 entities in total. The data gathered from the official ontologies is open-source and available for download on their websites. To mirror the original format of the corpus, the list of monolingual English entities is formatted into the CoNLL format with the BIO scheme: [ENTITY, TAG]. 20% of the entities in the training, test, and evaluation sets were replaced at random from the list corresponding to the tag. We attempted to replace the same number of entities for each tag to include a balanced number of new entities. In total, 7,161 entities were altered in the training set, 5,941 in the test set, and 2,267 in the evaluation set. This was saved as a new dataset to be used in the current experiments. We concatenated some of the datasets to make different training instances that may be used to measure continuous training and compare it to dataset concatenation.
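The replacement procedure can be sketched as follows. The term list here is an illustrative placeholder, not the actual ontology dumps used in the study; the 20% fraction matches the paper's setting.

```python
# Sketch of entity replacement (after Liu et al. 2020): a fixed fraction of
# entity mentions is swapped at random for unseen terms of the same class.
import random

def entity_mentions(pairs):
    """Find (start, end, class) spans of B-/I- runs in a (token, tag) list."""
    spans, start, cls = [], None, None
    for i, (_, tag) in enumerate(pairs + [("</s>", "O")]):  # sentinel flushes the last span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, cls))
            start = None
        if tag.startswith("B-"):
            start, cls = i, tag[2:]
    return spans

def replace_entities(pairs, unseen_by_class, fraction=0.2, rng=random):
    """Swap a random `fraction` of entity mentions for unseen same-class terms."""
    spans = entity_mentions(pairs)
    chosen = set(rng.sample(range(len(spans)), k=round(len(spans) * fraction)))
    out, cursor = [], 0
    for idx, (start, end, cls) in enumerate(spans):
        out += pairs[cursor:start]
        if idx in chosen and cls in unseen_by_class:
            words = rng.choice(unseen_by_class[cls]).split()  # multi-word terms re-tagged below
        else:
            words = [tok for tok, _ in pairs[start:end]]
        out += [(w, ("B-" if i == 0 else "I-") + cls) for i, w in enumerate(words)]
        cursor = end
    out += pairs[cursor:]
    return out

example = [("The", "O"), ("Foxp2", "B-Gene"), ("gene", "I-Gene")]
print(replace_entities(example, {"Gene": ["Shh"]}, fraction=1.0))
# [('The', 'O'), ('Shh', 'B-Gene')] -- the single mention is always chosen at fraction=1.0
```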
We have a total of five datasets at this point:

Named Entity Recognition Systems Training
Inspired by the success of monolingual systems evaluated on another language (Sun and Yang, 2019; Hakala and Pyysalo, 2019), we employed the pretrained BioBERT, a variant of the well-known BERT pretrained on biomedical texts in English. The second transformer option is the "Roberta-base-biomedical-clinical-es", a pretrained RoBERTa-based transformer for Spanish domain-specific corpora. Even though the literature has shown promise for zero-shot transfer, we opted to investigate transfer learning approaches to develop more robust systems and a consolidated, unique bilingual system. Research shows that continuous training improves the metrics for MT systems, and that systems retain and share important parameters across different languages (Mueller et al., 2020). This study will test this strategy in NER. As one of the main issues of biomedical NER is unknown words, these systems can benefit from learning words from two different languages in different training instances.
We conducted two types of fine-tuning: direct fine-tuning and continuous training. Direct fine-tuning entails using concatenated data from the previously constructed augmented datasets, such as the CRAFT EN + ES dataset. This fine-tuning strategy simply employs one system and one dataset to produce a single output, so no additional training or fine-tuning is required. Continuous training, on the other hand, entails fine-tuning a model with an initial dataset, e.g. CRAFT EN, and then fine-tuning the resulting system with a new dataset, e.g. CRAFT ES. Because augmentation methods have not, to the best of our knowledge, been employed in the biomedical NER task, this served as an impetus to evaluate both training strategies.
Since transfer learning is prone to catastrophic forgetting, in which a portion of the original weights is replaced by the new training, we chose to test both training strategies in order to compare the performance of the different datasets in different types of training.We have trained fourteen systems: two base systems on the original English-only dataset, six systems using direct ne-tuning with the concatenated datasets, and the remaining six systems using the continuous training approach for English plus Spanish and entity replacement with the enhanced datasets (EN augmented).
All systems were fine-tuned with the following hyperparameters: learning rate: 3e-05; train batch size: 8; optimiser: Adam; betas: 0.9, 0.99; epsilon: 1e-08; epochs: 4. The following is a list of the names that will be used to refer to them.
Base models: Concatenating the datasets gives competitive results but not the highest possible scores. Nonetheless, the best scores using concatenation were the ones without further data augmentation (only the Spanish dataset). This suggests that using concatenation with augmented datasets, at least with the proposed augmentation technique, does not yield satisfactory results and even results in lower performance than the base models. The basic concatenated models (EN-ES) preserved the top scores in the direct fine-tuning category. However, their fine-tuning times were double those of continuous training, which makes continuous training more suitable for local training.
The translation of the dataset to create a Spanish dataset proved beneficial in terms of enhancing the model's knowledge and providing higher F1 scores. It is worth pointing out that the two systems with the highest scores, 86.39 and 86.25, respectively, were trained on the Spanish dataset. Surprisingly, concatenation of the dataset plus the augmented datasets revealed that the performance figures do not rise but rather drop when employing concatenation. Table 2 shows the results of all the training instances with the different datasets. Both base models at the bottom of the table are monolingual. The results are reported on the evaluation dataset. As shown in Table 3, our top system outperforms the second-best-scoring system in the literature for the same dataset by 7.33 F1 points (Crichton et al., 2017). The top-performing system by Furrer et al. (2021) is a combination of BioBERT and OGER, which uses a dictionary-based approach and fares better than our best-performing system by only 0.46 F1 points. Given that our system will perform in another language and does not employ dictionary-based techniques, these competitive results hold promise.
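The F1 scores above are entity-level, micro-averaged scores, the convention in NER evaluation: a prediction counts as correct only if both the span and the class match the gold annotation exactly. The paper does not include its evaluation script, so the following is an illustrative sketch of that conventional metric:

```python
# Entity-level micro precision/recall/F1 for BIO-tagged sentences, in the
# style of the CoNLL evaluation: exact span + class match counts as correct.

def bio_spans(tags):
    """Extract (start, end, cls) entity spans from one BIO tag sequence."""
    out, start, cls = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        if start is not None and not tag.startswith("I-"):
            out.append((start, i, cls))
            start = None
        if tag.startswith("B-"):
            start, cls = i, tag[2:]
    return out

def ner_f1(gold_tags, pred_tags):
    """gold_tags/pred_tags: lists of BIO tag sequences, one per sentence."""
    gold, pred = set(), set()
    for si, (g, p) in enumerate(zip(gold_tags, pred_tags)):
        gold |= {(si,) + s for s in bio_spans(g)}
        pred |= {(si,) + s for s in bio_spans(p)}
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Note that a system predicting only the first token of a two-token entity scores zero for that mention under this metric, which is why boundary errors (one of the difficulties discussed in Section 2) are so costly.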

Conclusions
One of the main contributions of this study is the use of an English dataset in the biomedical domain to create a bilingual system. This NER system not only identifies NEs in both languages, but does so using one of the datasets with the highest number of NE classes (6); furthermore, these are NEs that have not been annotated in Spanish texts. The synthetic dataset for Spanish was created using a "cheap translation" approach inspired by the success of back-translation for NMT. The F1 score difference from the best performer, as shown in Table 3, is just 0.46 points, surpassing most systems in the literature and placing second, proving that the NER system's training was successful. In contrast to the best performer in the literature, our system is able to recognise NEs in two languages, EN and ES, and it is independent of dictionaries or external information.
Entity replacement was used as a second augmentation technique. Entities of the same tag were replaced with data extracted from the official ontologies employed by the annotators of CRAFT, as one of the issues was out-of-vocabulary (OOV) words, which are common in the biomedical domain due to the constant generation of terms. As a result, a different CRAFT corpus was generated with 20% of its entities replaced. In the future, alternative percentages of replacement might be evaluated and reported, as well as an attempt to replace the entire dataset.
In total, fourteen systems were trained for NER using a variety of dataset combinations, including the original dataset plus the Spanish version or the new CRAFT in English. We employed either direct fine-tuning by concatenating the datasets or continuous fine-tuning by sequentially training on the separate datasets, leveraging the transfer learning success of transformers. Systems trained via transfer learning performed best, while systems trained by concatenating datasets showed a downward trend in F-score. As in Lamurias et al. (2019), the performance of the system did improve by combining datasets, but the transfer learning technique proved to be better. The scarcity and difficulty of biomedical annotated data in other languages can be addressed by combining data augmentation with transfer learning, which also uses state-of-the-art systems. Such a method might be applied to different languages to improve results using in-domain data without relying entirely on the transformers' zero-shot capabilities, as well as for languages where zero-shot is not an option.
Our novel methodology will make it possible for another language to benefit from one of the biggest biomedical datasets for NER. This dataset, and methodology, can be beneficial for researchers and future studies on other datasets to cover more NE classes. Translations of the dataset into other languages can be used, either from English to the target language or by leveraging the already translated Spanish dataset to obtain another in a close (Romance) language. This approach is portable to other languages, provided a modern MT system supports them. Incorporating Elastic Weight Consolidation (EWC) into the training process to prevent catastrophic forgetting and increase the learning of new weights in the NER system is another potential future project.

Declarations
Authors' Contributions: BSL designed and performed the experiments, derived the models, analysed the data, and wrote the main manuscript. Both GC-P and RM contributed to the final version of the manuscript and supervised the project. All authors provided critical feedback and helped shape the research, analysis, and manuscript.

Table 1. Comparison of an original sentence from the CRAFT corpus and the output of the same sentence after using a commercial MT system.
The CRAFT EN + ES dataset combines the English and Spanish data. There are 3,596 distinct entities in total. CRAFT EN + EN Augmented denotes the concatenation of the original dataset with the augmented version obtained by entity replacement. It has 4,876 distinct entities. The final and largest dataset, CRAFT EN + ES + EN Augmented, is a concatenation of the original English dataset, the Spanish version, and the augmented English version. It has a total of 6,131 distinct entities.

Table 2
Results of the different training methods for the NER system, showing the continuous training models and the direct fine-tuning models. Data augmentation increased the performance of all systems; thus, it is safe to say that it is beneficial for training. The success of continuous training over direct fine-tuning can be attributed to the additional fine-tuning phase, as the direct fine-tuning approach lacks the additional training instances carried out in the continuous training strategy. It should be noted that catastrophic forgetting, and strategies to prevent it such as Elastic Weight Consolidation (EWC), are outside the scope of this paper and will be pursued in the future.

Table 3
F1 scores reported in the literature compared to the best performer obtained by this study. All models were evaluated with the CRAFT corpus.