MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records

To address the poor entity recognition performance caused by the lack of Chinese annotations in clinical electronic medical records, this paper proposes a multi-medical entity recognition method, MF-MNER, using a fusion technique combining BART, Bi-LSTM, and CRF. First, after cleaning, encoding, and segmenting the electronic medical records, the obtained semantic representations are dynamically fused using a bidirectional autoregressive transformer (BART) model. Then, sequential information is captured using a bidirectional long short-term memory (Bi-LSTM) network. Finally, a conditional random field (CRF) decodes and outputs the multi-task entity recognition. Experiments are performed on the CCKS2019 dataset, with micro-avg Precision, macro-avg Recall, and weighted-avg Precision reaching 0.880, 0.887, and 0.883, and micro-avg, macro-avg, and weighted-avg F1-scores reaching 0.875, 0.876, and 0.876, respectively. Under the same dataset conditions, our method outperforms the existing literature under all three evaluation mechanisms (micro average, macro average, weighted average). Under the weighted average, the Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher than the existing BERT-BiLSTM-CRF model, respectively. Experiments on an actual clinical dataset with our MF-MNER yield Precision, Recall, and F1-score of 0.638, 0.825, and 0.719 under micro-avg evaluation; 0.685, 0.800, and 0.733 under macro-avg evaluation; and 0.647, 0.825, and 0.722 under weighted-avg evaluation.
The above results show that MF-MNER can integrate the advantages of the BART, Bi-LSTM, and CRF layers, significantly improving the performance of downstream named entity recognition tasks with a small amount of annotation, and achieving excellent recall, which has practical significance. Source code and datasets to reproduce the results in this paper are available at https://github.com/xfwang1969/MF-MNER.


Graphical Abstract
Illustration of the proposed MF-MNER. The method includes four steps: (1) the electronic medical records are cleaned, encoded, and segmented; (2) semantic representations are obtained by dynamic fusion with the bidirectional autoregressive transformer (BART) model; (3) sequence information is captured by a bidirectional long short-term memory (Bi-LSTM) network; (4) multi-task entity recognition is decoded and output by a conditional random field (CRF).

Introduction
With the development of electronic medical information, numerous hospitals have generated a vast amount of clinical electronic medical records. These electronic medical records, as a form of unstructured data, often contain key diagnostic information such as patients' clinical symptoms, diagnostic results, and medication performance [1, 2]. They are rich in medical knowledge, and the accurate and rapid extraction of relevant medical named entity information from them is fundamental to disease research. This plays a crucial role in clinical decision support, medical information retrieval, medical intelligent question answering, medical entity relationship retrieval, and more [3][4][5][6]. Medical Named Entity Recognition (MNER) is the basis for medical relationship recognition, is crucial to smart medical care, and has received widespread attention from researchers [7, 8]. However, clinical medical data are difficult to recognize accurately because of the small amount of annotated data. How to use a small amount of annotated data to build an accurate clinical named entity recognition model is one of the key tasks in medical information processing [1, 4].
The earliest research focused on named entity recognition for English electronic medical records in the medical field [9, 10], followed by methods designed for Chinese electronic medical records [11, 12]. From a technical perspective, named entity recognition in both Chinese and English electronic medical records can be roughly divided into four types: dictionary- and rule-based methods, statistical learning methods, neural network learning methods, and pre-training methods based on large-scale unlabeled data. Dictionary- and rule-based methods mainly rely on a large and comprehensive domain dictionary and on domain experts to construct many rule templates based on grammatical structures. However, such methods cannot handle the recognition of new or irregular entities well and cannot be reused across domains [13]. Feature engineering is a crucial part of statistical learning methods: it improves the performance of machine learning algorithms by preprocessing and extracting features from raw data to obtain a more effective and reliable feature set. Research using Conditional Random Fields (CRF) [14], Support Vector Machines (SVM) [15], Hidden Markov Models (HMM) [16], etc., has achieved good results. However, these methods rely on feature sets created by high-quality feature engineering, a task that depends heavily on manual effort. The subjectivity and labor costs are relatively high, and the quality of feature selection directly affects the MNER results. To address these issues, some scholars have proposed methods based on neural network learning. The most representative are the CNN-CRF model [17, 18] and the BiLSTM-CRF model [12, 19], which use deep neural networks to autonomously extract features at the character, word, and sentence levels, lessening the subjectivity in feature selection and consequently enhancing the accuracy of recognition results [19]. Nonetheless, this technique demands high-quality annotated data in the medical field to achieve good identification performance. The labeled data of clinical medical cases are often particularly limited, and obtaining them usually requires the participation of business personnel or even medical experts, which is especially costly and time-consuming [20]. Therefore, this series of methods often fails to perform as expected.
In recent years, pre-training methods utilizing massive-scale unlabeled data have provided new solutions [21]. Pre-trained language models (PLMs) trained on large-scale corpora, like BERT (Bidirectional Encoder Representations from Transformers), not only contain prior information from the training corpus in their dynamic word vectors but also include context information after sentences are encoded by BERT [22]. Some researchers use LSTM-CRF (Long Short-Term Memory with Conditional Random Field) as the main framework to address the shortcomings of single-neural-network named entity recognition models [23]. To further enhance the model's ability to capture details and extract features, some studies have integrated Convolutional Neural Networks (CNN) and attention mechanisms into the BiLSTM-CRF model [19]. This has improved recognition accuracy and injected new vitality into MNER for English electronic medical records. Despite these advancements, however, challenges remain: the complexity of medical terminology and the variability in how information is recorded in medical records can make named entity recognition more difficult [24, 25].
Gong L et al. established a BiLSTM-Att-CRF model to identify medical entities in the CCKS2017 dataset [26]. They studied only four entities (disease, symptom, drug, and surgery), and the average Precision, Recall, and F1-score reached only 75.06%, 76.40%, and 75.72%, respectively. Luqi Li et al. established a BiLSTM-Att-CRF model to identify medical entities in the CCKS2017 and CCKS2018 datasets [27]. Although their models achieved good performance, the limited number of entity types is not conducive to subsequent research on entity relationship extraction and knowledge graph construction. The Embeddings from Language Models (ELMo)-lattice-LSTM-CRF model was designed in the literature [28] and achieved an F1-score of 85.02% on the CCKS2019 CNER dataset; however, the performance indicators need further improvement. Moreover, the privacy and sensitivity of medical data add another layer of complexity to the task. Therefore, while the integration of advanced techniques such as pre-trained language models and attention mechanisms has greatly improved performance, there is still room for further research and development in this field.
To deal with the above challenges, this paper designs a named entity recognition method for Chinese electronic medical records, named MF-MNER, based mainly on multi-model fusion and multi-task learning. Firstly, the original Chinese electronic medical record data is preprocessed, including sentence segmentation, word segmentation, named entity annotation, and construction of tag sequences, and the training and validation sets are obtained by randomly dividing the annotated data. Then, the BART (Bidirectional and Auto-Regressive Transformer) layer converts the input sentences to fixed length and vectorizes the input sequence. On this basis, a Bidirectional Long Short-Term Memory (Bi-LSTM) layer processes the input sequence in both directions simultaneously, capturing the context information in the sequence. The final output layer is a Conditional Random Field (CRF) layer, which classifies each position based on the features extracted by the Bi-LSTM, outputs the probability distribution over annotated sequences, and determines the output entities and their positions based on probability. The AdamW optimization algorithm is employed throughout the model's training phase for hyperparameter tuning, enabling the model to better predict named entity tags. This design is then assessed on the CCKS 2019 dataset and on a real dataset for named entity recognition in Chinese electronic medical records, proving its efficacy.
The remaining sections of this paper are structured as follows: Sect. 2 introduces the work related to this research, including the task definition, a description of the six medical entity types identified, and the design of the multi-model fusion method and the key technologies involved. Section 3 introduces the practical implementation of each part of the model. The experimental results and their comparative analysis are discussed in Sect. 4. The paper is summarized and concluded in Sect. 5.

Related Work
General clinical electronic medical record named entity recognition mainly involves three stages: preprocessing, feature extraction, and model training. In English electronic medical records, words are clearly separated by spaces, making word segmentation relatively simple [20]. In Chinese electronic medical records, however, there is no obvious word separation, making word segmentation an important and complex step. In addition, preprocessing steps such as cleaning the text and deleting stop words are also required [29]. In the feature extraction stage, besides common features such as word frequency and context information, Chinese electronic medical records may also need to consider specific features, such as the structural information of words (e.g., whether they are phrases or compound words). In the model training stage, Chinese named entity recognition can also use methods such as CRF, SVM, and deep learning, but the models need to be adjusted to the characteristics of Chinese [15]. This section defines and formalizes the MNER task and introduces the relevant technologies used in our research framework.

Task Definition and Description
The task of MNER is to recognize and extract mentions of clinical medical entities from a provided collection of electronic health record texts and categorize them into predefined classes. Generally, Chinese MNER is a sequence tagging problem, with categories such as diseases & diagnoses, imaging examinations, lab tests, surgeries, medications, and anatomical sites [29]. The MNER task can be formally defined as Eq. (1):

Y = G_M(D, C)   (1)

where G_M is the model function; D is the input dataset, which consists of N electronic medical records d_i; C is the set of predefined categories {c_1, c_2, …, c_m}; and Y is the output of the model, representing the set of named entity mentions and their corresponding categories. The objective of this research can be formulated as Eq. (2):

Y = {⟨m_1, c_{m_1}⟩, ⟨m_2, c_{m_2}⟩, …}   (2)

where m_i = ⟨d_i, b_i, e_i⟩ represents a medical named entity mention that appears in the document d_i; b_i and e_i indicate the beginning and ending positions of m_i in d_i, respectively; and c_{m_i} ∈ C is the predefined category assigned to m_i. Named entity mentions do not overlap, that is, e_i < b_{i+1}. The predefined categories C of medical named entities in this study follow the definitions in [1] and cover the six entity types listed above. This research focuses on designing models that improve the accuracy of recognizing these six types of medical named entities.

Method Descriptions
For the above task, we have designed a multi-model fusion method for Chinese clinical electronic MNER. The approach is composed of four layers: a preprocessing layer, a BART layer, a Bi-LSTM layer, and a CRF layer, as shown in Fig. 1. First, the raw Chinese electronic medical record dataset is preprocessed in the preprocessing layer, including sentence segmentation, word segmentation, named entity annotation, and label sequence construction. The training and validation sets are obtained from the original data by random partitioning. Next, the preprocessed training data is fed into the BART layer to convert the input sentences into fixed-length vector representations. The output of the MF-MNER model is then obtained through a Bi-LSTM layer combined with a CRF layer for named entity labeling, which generates the predicted segmentation results. The critical methods used in this research are detailed in the following subsections.

The Preprocessing Layer
The initial stage of MF-MNER is the preprocessing layer, which is responsible for denoising, sentence segmentation, and word segmentation of the electronic medical records, as well as designing a labeling scheme to generate a standardized Chinese named entity recognition (CNER) dataset.
The most obvious marker for sentence recognition in Chinese electronic medical records is the period, so we first use the period as a delimiter to sequentially perform sentence segmentation and corresponding label extraction from the original electronic medical record data. Then, we extract each character from the sentence and construct a list of characters and their corresponding labels to obtain the labeled dataset after "word segmentation" of the electronic medical records. We then use the BIO (Begin, Inside, Outside) labeling strategy to map the given labels to each character for character-level labeling, improving the accuracy of model predictions. Ultimately, we use the Hugging Face tokenizer to process this data and convert the electronic medical record text into tensor data that can be processed by the subsequent BART layer.
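As a concrete illustration of this character-level BIO mapping, here is a minimal sketch. The helper names are hypothetical, and the paper's actual pipeline additionally runs the Hugging Face tokenizer on the result:

```python
def split_sentences(text):
    """Split an EMR document on the Chinese full stop, keeping the delimiter."""
    return [s + "。" for s in text.split("。") if s]

def bio_tags(sentence, entities):
    """Map (start, end, label) entity spans to character-level BIO tags.

    `entities` uses 0-based, end-exclusive character offsets in `sentence`.
    """
    tags = ["O"] * len(sentence)
    for start, end, label in entities:
        tags[start] = "B-" + label            # first character of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label            # inside/end characters
    return list(zip(sentence, tags))

# a toy record with a DRUG entity covering characters 2..5 ("阿司匹林")
pairs = bio_tags("服用阿司匹林。", [(2, 6, "DRUG")])
```

Each character is paired with its tag: the third character receives "B-DRUG", the following three "I-DRUG", and the rest "O".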

BART Layer
The second layer of MF-MNER is the BART layer, which is based on the BART model developed by Facebook [30]. It consists of a bidirectional encoder and an autoregressive decoder. BART samples random tokens and replaces them with masks; the decoder then uses the output of the encoder and the previously uncorrupted tokens to reconstruct the original document. This approach greatly improves natural language processing capabilities. Some Chinese natural language processing researchers have also developed related Chinese pre-training models. Although these models have been applied in multiple domains, they are trained on Chinese Wikipedia and a subset of WuDaoCorpus [31] and are not suitable for professional medical scenarios.
To address this, we fine-tune the BART layer with a limited amount of annotated data. After preprocessing the clinical electronic medical record data, every word x_i in a phrase can be transformed from a one-hot vector into a compact, dense word vector x_i ∈ R^d, where R^d is the d-dimensional real number field and d is the embedding dimension of x_i. During the pre-training of the BART model, multiple noise transformations are applied to the sequences, and the original sequences are rebuilt from the distorted ones, leveraging the bidirectional self-learning ability of BART. This improves the robustness of the model and enables better prediction of named entity labels.

Bi-LSTM Layer
The Bi-LSTM layer is added after the BART layer to further boost the MF-MNER model's capacity to grasp and model the contextual information in the electronic medical record sequences. The Bi-LSTM [32] comprises two LSTM [33] networks, one handling the input sequence in the forward direction and the other in the reverse direction. At each time step, every LSTM unit in the Bi-LSTM layer has access to both the preceding and succeeding context of the electronic medical records. This bidirectional reading capability allows the Bi-LSTM layer to better grasp the associations and relationships between the words in the sequences. By incorporating the Bi-LSTM layer, the method can capture not only the local context of every word but also the global context of the entire sequence. This enables the model to extract more meaningful features from the electronic medical record data, improving the accuracy and performance of the downstream tasks.
For every sentence, the word embedding sequence x = (x_1, x_2, …, x_m) is fed into the Bi-LSTM step by step. At each time step, the hidden states of the forward LSTM and the backward LSTM are concatenated into a single hidden state vector. The role of the subsequent linear layer is to project this hidden state vector to k dimensions, where k corresponds to the number of labels defined in the tagging scheme. Consequently, the sentence features of the electronic medical record data are extracted and represented as a matrix P = (p_1, p_2, …, p_m) ∈ R^{m×k}.
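A shape-level NumPy sketch of this step, under assumed small dimensions (the actual model is a PyTorch Bi-LSTM with input size 768 and hidden size 128; all parameter names here are illustrative):

```python
import numpy as np

def lstm_forward(x, Wx, Wh, b, h0, c0):
    """Run one LSTM direction over x of shape (m, d); gates stacked as [i, f, g, o]."""
    m, _ = x.shape
    n = h0.shape[0]
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c, out = h0, c0, []
    for t in range(m):
        z = Wx @ x[t] + Wh @ h + b                  # (4n,) pre-activations
        i, f = sig(z[:n]), sig(z[n:2 * n])          # input / forget gates
        g, o = np.tanh(z[2 * n:3 * n]), sig(z[3 * n:])
        c = f * c + i * g                           # cell state update
        h = o * np.tanh(c)                          # hidden state
        out.append(h)
    return np.stack(out)                            # (m, n)

def bilstm_features(x, params_f, params_b, W_out, b_out):
    """Concatenate forward and (re-reversed) backward states, project to k labels."""
    hf = lstm_forward(x, *params_f)
    hb = lstm_forward(x[::-1], *params_b)[::-1]
    h = np.concatenate([hf, hb], axis=1)            # (m, 2n)
    return h @ W_out.T + b_out                      # emission matrix P: (m, k)
```

The returned matrix plays the role of P in the CRF scoring below; each row holds the k per-label emission scores for one character.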

CRF Layer
The CRF layer on top of the BART-BiLSTM model can effectively use a transition matrix to capture the dependencies between labels in a sequence, which reduces error propagation in the final predictions. According to Ref. [34], the CRF layer's parameters are represented by a matrix A ∈ R^{(k+2)×(k+2)}, where each entry A_ij denotes the score of the transition from the ith label to the jth label. For a sequence of labels y = (y_1, y_2, …, y_m), the score of the label sequence is determined using Eq. (3):

s(x, y) = Σ_{i=0}^{m} A_{y_i, y_{i+1}} + Σ_{i=1}^{m} P_{i, y_i}   (3)

that is, the total score of the sequence sums, over every word in the sentence, an emission score from the Bi-LSTM output matrix P and a transition score from the CRF transition matrix A.
For a training sample (x, y_x), the probability of the label sequence is obtained during model training by Eq. (4) [34]:

P(y_x | x) = exp(s(x, y_x)) / Σ_{y′} exp(s(x, y′))   (4)

where the summation runs over all possible label sequences y′.
The training process maximizes the log-likelihood of the correct label sequences. During prediction, the model uses the Viterbi algorithm, which relies on dynamic programming, to determine the optimal path by Eq. (5):

y* = argmax_{y′} s(x, y′)   (5)
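A minimal sketch of Viterbi decoding over the emission matrix P and a transition matrix A (simplified to k×k, omitting the start/end rows of the (k+2)×(k+2) matrix used in the paper):

```python
import numpy as np

def viterbi(P, A):
    """Return the best label path and its score.

    P: (m, k) emission scores from the Bi-LSTM.
    A: (k, k) transition scores, A[i, j] = score of moving from label i to j.
    """
    m, k = P.shape
    score = P[0].copy()                      # best score ending at each label
    back = np.zeros((m, k), dtype=int)       # backpointers for path recovery
    for t in range(1, m):
        # total[prev, curr] = score ending in prev + transition + emission
        total = score[:, None] + A + P[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(m - 1, 0, -1):            # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())
```

With zero transitions and emissions P = [[1, 0], [0, 1]], the decoder simply follows the per-position argmax, returning the path [0, 1] with score 2.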

Experimental Software and Hardware Environment
The experimental training for this study was conducted on an i9-11950H CPU and an NVIDIA GeForce RTX 3080 Laptop GPU (16 GB). The training framework was Python 3.7 with TensorFlow 1.14.0; the proposed approach itself was implemented using Python 3.7 with PyTorch 1.13.1 and CUDA 11.7, on a 64-bit virtual machine.

Data Sources and Preprocessing
The experimental data used herein consist of two main parts: the CCKS2019 dataset and an actual clinical dataset. The raw electronic medical record data were preprocessed. The first step was to segment the text into sentences, using the Chinese "。" (full stop) as the delimiter. After this initial processing, further refinement is required. Specifically, five rules are applied: (1) if a number sequence contains Chinese characters, split it; (2) if a special character is preceded by a newline character, skip the split; (3) combinations of letters and numbers must not be cut apart; (4) if there are letters on both sides of a candidate split point, do not split; (5) if there are numbers on both sides of a special symbol, do not split. During preprocessing, we need to constantly inspect the output against the original text and then perform in-depth text preprocessing operations, especially for Chinese data. Preprocessing is therefore a complex and important step that determines the quality of the subsequent experiments.
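The refinement rules above can be sketched roughly as follows. This is a deliberately simplified illustration, not the paper's exact implementation: only semicolons are treated as special symbols, and rules (3)-(5) are collapsed into a single letter/number guard.

```python
import re

def refine_split(text):
    """Split on the Chinese full stop, then split on semicolons except
    where the semicolon sits between two ASCII letters/digits (a
    simplified stand-in for rules 3-5 above)."""
    sentences = [s for s in text.split("。") if s]
    refined = []
    for sent in sentences:
        parts, last = [], 0
        for m in re.finditer(r"[;；]", sent):
            i = m.start()
            prev_c = sent[i - 1] if i > 0 else ""
            next_c = sent[i + 1] if i + 1 < len(sent) else ""
            # keep letter/number runs such as "pH7.4;a" intact
            if (prev_c.isascii() and prev_c.isalnum()
                    and next_c.isascii() and next_c.isalnum()):
                continue
            parts.append(sent[last:i])
            last = i + 1
        parts.append(sent[last:])
        refined.extend(p for p in parts if p)
    return refined
```

For example, "白细胞5.0;正常。pH7.4;a值。" is split after "5.0" (Chinese text follows the semicolon) but "pH7.4;a值" is kept whole because the semicolon separates two ASCII alphanumerics.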
Then, for each sentence, we extracted the corresponding labels and performed sequence sentence segmentation. Each sentence, consisting of m Chinese characters, was represented as x = (x_1, x_2, …, x_i, …, x_m), where x_i is the index of the ith Chinese character in the constructed vocabulary list. We further extracted individual characters from the sentences, constructing a list of characters and their corresponding labels. This process resulted in a labeled dataset of electronic medical records after character-level segmentation. The statistics of the six named entity types (DISEASES, EXAM, TEST, TREAT, DRUG, BODY), derived from the original Chinese clinical electronic medical records dataset, are visualized in Fig. 2.
According to the BIO (Begin, Inside, Outside) labeling strategy, the labels provided in the dataset were mapped to each character to perform character-level tagging, improving the accuracy of the model predictions. For example, if "X" represents a certain named entity category, the beginning character is marked as "B-X", the middle and end characters are marked as "I-X", and characters that are not part of a named entity are marked as "O". In this study, there are a total of 13 labels corresponding to the different entities: B-DISEASES, B-EXAM, B-TEST, B-TREAT, B-DRUG, B-BODY, I-DISEASES, I-EXAM, I-TEST, I-TREAT, I-DRUG, I-BODY, and O. Only the first 12 labels are analyzed in this study, as they are closely related to medical entities.
We designated 256 as the maximum sequence length, and a padding function pads all sequences in a batch to the maximum sequence length in that batch. Subsequently, the preprocessed 1000 electronic medical records are randomly divided into training and validation sets in an 8:2 proportion. The model vocabulary is tokenized using a Tokenizer. With these steps, the data preprocessing is completed.
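A small sketch of this padding and splitting step, with hypothetical helper names (the real pipeline pads token-id tensors produced by the tokenizer):

```python
import random

def pad_batch(seqs, max_len=256, pad_id=0):
    """Truncate to max_len, then pad every sequence in the batch to the
    longest length present in that batch."""
    seqs = [s[:max_len] for s in seqs]
    longest = max(len(s) for s in seqs)
    return [s + [pad_id] * (longest - len(s)) for s in seqs]

def train_val_split(records, ratio=0.8, seed=42):
    """Shuffle the records and split them into training and validation
    sets according to `ratio` (8:2 by default)."""
    rng = random.Random(seed)
    idx = list(range(len(records)))
    rng.shuffle(idx)
    cut = int(len(records) * ratio)
    return [records[i] for i in idx[:cut]], [records[i] for i in idx[cut:]]
```

For 1000 records, the split yields 800 training and 200 validation records.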

Settings for Each Layer of MF-MNER
After the above preprocessing, we use BART-based pre-training. First, various noises are applied to corrupt the original text, which is then reconstructed via the seq2seq model, ultimately enhancing the preprocessed corpus quality. The BART layer is designed based on "fnlp/bart-base-chinese" [31]. The vocabulary size is 51,271, a larger vocabulary built from the training data.
The activation function of each layer's neurons is GELU, and the initialization parameters follow the N(0, 0.02) distribution. The maximum sequence length and the maximum position embeddings are set to 1024. The batch size is set to 2048, and a 2e−5 learning rate and 0.1 warm-up ratio are used. The model architecture is based on a 12-layer stacked bidirectional Transformer with a 768-unit hidden layer and 12 attention heads. The BART model consists of a shared embedding layer and 6 Transformer encoder/decoder layers. The shared embedding layer maps the word indices in the input data sequence to word vector representations; its input dimensions are the vocabulary size (51,271) and the word vector dimension (768), and the padding_idx parameter maps the padding word indices to zero vectors. The 6 Transformer encoder layers encode the input sequence. Each encoder layer is composed of a self-attention mechanism and a pair of fully connected layers. The self-attention mechanism calculates the contextual vectors at every position in the input sequence of the electronic medical record data, and the fully connected layers perform non-linear transformations on the contextual vectors. The decoder part also consists of 6 Transformer decoder layers, which generate the output sequence. Each decoder layer comprises a self-attention mechanism, an encoder-decoder attention mechanism, and a pair of fully connected layers. The self-attention mechanism calculates the contextual vectors of the electronic medical record data at each position in the decoder input sequence. The encoder-decoder attention mechanism determines the alignment between the sequence input to the decoder and the sequence output from the encoder, and the fully connected layers perform non-linear transformations on the contextual vectors to enhance the model's resilience.

Fig. 2 Statistics and distribution of medical entities in the CCKS 2019 original data
The BART layer is followed by the Bi-LSTM layer, which has an input size of 768 and a hidden size of 128. It processes the input sequence of electronic medical record data, capturing the contextual information in the medical record data sequence.
The data processed by the Bi-LSTM layer is then fed into the CRF layer. The CRF layer uses the features extracted by the Bi-LSTM to perform named entity classification for each position, outputting the probability distribution of the labeled sequences. Based on this distribution, the named entity label y_i with the highest probability p_i is selected for each position x_i.

Model Training, Hyperparameter Optimization
The AdamW optimizer [35] is used in this paper to update the parameters. In the optimization process, learning_rate is the initial learning rate and weight_decay is the weight decay coefficient for decoupled L2 regularization. The optimizer initializes the first and second moment estimates m and v to zero. In each training step, the learning rate is updated according to step t and the warm-up steps. The gradients are then zeroed out, the loss is calculated for the current batch, and the gradients are computed and backpropagated through the network. Finally, the optimizer updates the parameters with the calculated gradients and the weight decay coefficient.
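The update described above can be sketched in plain Python as follows (in practice the model presumably uses PyTorch's built-in AdamW; the warm-up handling and default values here are illustrative):

```python
import math

def adamw_step(params, grads, state, t, lr=2e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01, warmup_steps=0):
    """One AdamW update with optional linear warm-up.

    `state` holds per-parameter first/second moment estimates m and v,
    both initialised to zeros before training starts; t is the 1-based step.
    """
    if warmup_steps and t <= warmup_steps:
        lr = lr * t / warmup_steps               # linear learning-rate warm-up
    b1, b2 = betas
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m = state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        v = state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)                # bias-corrected first moment
        v_hat = v / (1 - b2 ** t)                # bias-corrected second moment
        # decoupled weight decay: applied to the parameter, not the gradient
        p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
        out.append(p)
    return out
```

The decoupled decay term is what distinguishes AdamW from Adam with L2 regularization folded into the gradient.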
The key parameter values of our models are listed in Table 1. Model performance is measured by Precision, Recall, and F1-score, defined as Eqs. (6-8):

Precision = TP / (TP + FP)   (6)
Recall = TP / (TP + FN)   (7)
F1 = 2 × Precision × Recall / (Precision + Recall)   (8)

where TP (True Positives) refers to the medical entities correctly identified in the clinical electronic medical record text; FP (False Positives) are instances where words that are not entities are incorrectly identified as entities; and FN (False Negatives) are medical named entity words in the text that are not recognized as entities.
To comprehensively evaluate the performance of our MF-MNER method for recognizing these 6 types of entities (12 labels) in clinical electronic medical records, we employed three evaluation mechanisms: micro-average, macro-average, and weighted-average [37]. The detailed calculations are as follows. (1) The micro-average (micro-avg) is obtained from a global confusion matrix built over every sample in the clinical electronic medical record dataset, regardless of category. We mainly use micro-avg to evaluate our method's overall performance on the 6 types of medical entities. micro_P (Precision), micro_R (Recall), and micro_F1 (F1-score) are defined as Eqs. (9-11):

micro_P = Σ_i TP_i / Σ_i (TP_i + FP_i)   (9)
micro_R = Σ_i TP_i / Σ_i (TP_i + FN_i)   (10)
micro_F1 = 2 × micro_P × micro_R / (micro_P + micro_R)   (11)

(2) The macro-average (macro-avg) first calculates the metrics for each category separately and then uses the average of these category metrics as the final evaluation result. We use macro-avg to evaluate our method's performance when equal attention is paid to the 6 types of entities. macro_P, macro_R, and macro_F1 are defined as Eqs. (12-14), where n is the number of categories and P_i, R_i, F1_i are the per-category metrics:

macro_P = (1/n) Σ_i P_i   (12)
macro_R = (1/n) Σ_i R_i   (13)
macro_F1 = (1/n) Σ_i F1_i   (14)

(3) The weighted average (weighted-avg) weights each per-category metric by the number of samples in that category to account for differences between categories and sample imbalance. weighted_P, weighted_R, and weighted_F1 are shown in Eqs. (15-17):

weighted_P = Σ_i w_i P_i   (15)
weighted_R = Σ_i w_i R_i   (16)
weighted_F1 = Σ_i w_i F1_i   (17)

where w_i = num_i / num_all is the weight, num_i is the number of samples in category i, and num_all is the total number of samples in all categories.
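The three averaging schemes can be sketched from per-label (TP, FP, FN) counts as follows; the helper names are illustrative:

```python
def prf(tp, fp, fn):
    """Precision, Recall, F1 from raw counts, guarding against zero division."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def averaged_metrics(counts):
    """`counts` maps each label to (TP, FP, FN); returns micro-, macro-,
    and weighted-averaged (P, R, F1), weighting by support TP + FN."""
    # micro: pool all counts into one global confusion matrix
    micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))
    # macro: unweighted mean of per-label metrics
    per = {k: prf(*c) for k, c in counts.items()}
    n = len(per)
    macro = tuple(sum(m[i] for m in per.values()) / n for i in range(3))
    # weighted: mean of per-label metrics weighted by label support
    support = {k: c[0] + c[2] for k, c in counts.items()}
    total = sum(support.values())
    weighted = tuple(sum(per[k][i] * support[k] / total for k in per)
                     for i in range(3))
    return micro, macro, weighted
```

For two labels with counts (8, 2, 2) and (1, 1, 3), the micro Precision is 9/12 = 0.75, the macro Precision is (0.8 + 0.5)/2 = 0.65, and the weighted Precision is (0.8·10 + 0.5·4)/14 ≈ 0.714, illustrating how the three mechanisms weight the frequent label differently.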

Performance Analysis of the MF-MNER Model
In this subsection, our proposed model is validated on the CCKS 2019 dataset using Precision, Recall, and F1-score. The prediction results for the 6 entity types (12 label types) and the comprehensive evaluation metrics are listed in Table 2 and shown in Fig. 4.
Table 2 shows that the Precision, Recall, and F1-score of B-DRUG reach 0.939, 0.925, and 0.932, and those of I-DRUG reach 0.964, 0.923, and 0.943. The Precision and Recall of I-TREAT are 0.921 and 0.858. MF-MNER achieves values above 0.9 on the B-DRUG, I-DRUG, and I-TREAT labels, higher than on the other 9 labels. A possible reason is that the public corpus contains more information about these three types of named entity labels.
The comprehensive metrics show that all three averages reached a relatively close level, indicating that the overall performance of MF-MNER is good, whether entities are considered globally (micro), averaged per category (macro), or weighted according to the named entity distribution (weighted).

Comparison of MF-MNER and Unoptimized BERT-BiLSTM-CRF Model
To evaluate the effectiveness of the designed MF-MNER method, the existing BERT-BiLSTM-CRF model [29] is selected as the baseline for comparison, because that work uses the same dataset as ours. Since that work does not explain how its reported metrics were calculated, we list all the metrics from Table 2 for comparison. The comparison results are shown in Table 4.
From Table 4, under micro avg evaluation, the Precision, Recall, and F1-score of our model improve on the existing BERT-BiLSTM-CRF method [29] by 19.24%, 20.19%, and 19.65%, respectively. Under macro avg evaluation, the three metrics improve by 15.67%, 15.14%, and 15.67%, respectively, compared with the BERT-BiLSTM-CRF model [29]. Under weighted avg evaluation, the Precision, Recall, and F1-score are 17.45%, 17.58%, and 17.58% higher than the existing BERT-BiLSTM-CRF model [29], respectively. These results illustrate that the overall performance of MF-MNER is superior to the existing method.

Performance of Our Optimized BERT-BiLSTM-CRF Model
For further study, we use AdamW to improve the performance of the BERT-BiLSTM-CRF [29] model. The same experimental environment and training settings are adopted. The obtained predictions for the 12 named entity tags are shown in Table 5. To analyze the overall performance of the optimized model, we also calculated the micro avg (micro-average), macro avg (macro-average), and weighted avg (weighted-average) metrics using Eqs. (11)-(17); Table 6 shows the results.
From Table 6, the optimized metrics far exceed those of the baseline model [29], indicating that the AdamW optimization method is well suited to this type of clinical electronic medical record data. Whether viewed from the pooled (micro), equally averaged (macro), or distribution-weighted perspective on named entities, the overall model is greatly improved.
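For reference, AdamW differs from Adam by decoupling the weight decay from the gradient-based moment update. A minimal single-parameter sketch of one AdamW step follows; the hyperparameter values are common defaults, not our training settings:

```python
import math

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w * (1 - lr * weight_decay)              # decoupled weight decay
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam-style update
    return w, m, v
```

Because the decay term is applied directly to the weights rather than folded into the gradient, regularization strength stays consistent even when the adaptive step size shrinks, which is often credited for AdamW's better fine-tuning behavior.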

Performance Comparison of Existing Models on the CCKS2019 Dataset
To demonstrate the superiority of the MF-MNER method designed in this paper, we conducted comparative experiments on the same CCKS2019 dataset, mainly comparing our designed MF-MNER, Ref. [29], our optimized version of Ref. [29], and the method in Ref. [28]. To be fair, we used the weighted avg metric values of our model for comparison (Refs. [28, 29] did not specify which evaluation mechanism (micro avg, macro avg, or weighted avg) was used). Table 7 displays the comparison results.
Under datasets of equal size, the MF-MNER method proposed in this study improves the Precision, Recall, and F1-score by 19.24%, 12.22%, and 15.44%, respectively, compared to our optimized Ref. [29] method. In contrast to the baseline model, the three metrics of our proposed method improve by 19.65%, 15.67%, and 17.58%, respectively. Compared with the best method in Ref. [28], MF-MNER improves Precision, Recall, and F1-score by 4.25%, 2.00%, and 3.05%, respectively.

Comparative Analysis of MF-MNER Model on the CCKS2019 Dataset with Varying Training Set Sizes
To further verify the performance of MF-MNER on small data, we conducted comparative experiments, randomly extracting 20%, 40%, 60%, and 80% of the total training data without replacement, as well as using all the data, for comparison. The comparison of F1-scores under the different training set sizes is shown in Fig. 4. The figure shows that with a training ratio of 0.6, our MF-MNER already achieves good performance, with an F1-score close to the optimal level. This indicates that our MF-MNER is well adapted to named entity recognition in medical electronic records on small datasets.
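The subset construction for this experiment can be sketched as follows; the helper name and fixed seed are assumptions for illustration, not details taken from our implementation:

```python
import random

def subsample(examples, ratio, seed=42):
    """Draw a ratio-sized training subset without replacement.

    A fixed seed keeps the nested subsets reproducible across runs
    (illustrative choice; any reproducible strategy works).
    """
    k = int(len(examples) * ratio)
    rng = random.Random(seed)
    return rng.sample(examples, k)

# Example: build the 20%/40%/60%/80%/100% training splits.
train_data = list(range(1000))  # stand-in for annotated records
splits = {r: subsample(train_data, r) for r in (0.2, 0.4, 0.6, 0.8, 1.0)}
```

Sampling without replacement guarantees that no record is duplicated inside a split, so each F1-score in Fig. 4 reflects a genuinely smaller amount of annotation.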

Performance Comparison of MF-MNER and Our Optimized BERT-BiLSTM-CRF on a Real Dataset
To further validate the performance of the MF-MNER method designed in this study, we obtained a real dataset containing 100 Chinese clinical electronic medical records from a tertiary hospital affiliated with Xinxiang Medical University (the First Affiliated Hospital of Xinxiang Medical University), with which we collaborate. After the same preprocessing, the MF-MNER model was evaluated under three assessment scenarios: micro average, macro average, and weighted average. Comparative experiments were conducted against the optimized method from Ref. [29]. The results are presented in Table 8.
From Table 8, it can be observed that our designed MF-MNER method improves over our optimized version of the method in Ref. [29] across all three evaluation criteria: micro avg, macro avg, and weighted avg. Specifically, Precision, Recall, and F1-score increase by 1.75%, 3.12%, and 2.28% in the micro avg category; by 2.39%, 4.03%, and 3.24% in the macro avg category; and by 1.41%, 3.12%, and 2.41% in the weighted avg category, respectively.

Conclusion
In this study, a named entity recognition approach for Chinese electronic medical records, named MF-MNER, was developed, inspired by the principle of model fusion. The main contributions of this paper can be summarized as follows: (1) We leverage the encoder-decoder architecture of BART, with its strong capability for understanding complex contextual information and handling ambiguous entity boundaries during pre-training, to dynamically integrate context information from clinical electronic medical records, thereby improving fine-tuning efficiency and accomplishing clinical entity recognition with minimal training data. (2) We design a novel MF-MNER model for entity recognition in Chinese electronic medical records. Initially, the Chinese electronic medical record data undergo preprocessing and encoding. The Bidirectional Auto-Regressive Transformer (BART) pre-trained model is then selected and refined using the AdamW optimization algorithm. The output of the BART layer is fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) layer, which processes sequences in both forward and reverse directions, integrating each position's "past" and "future" information; the concatenated representation serves as the input to the CRF layer. Finally, the CRF layer employs its parameterized transition matrix and the Viterbi algorithm to decode the most fitting annotation sequence for six entity types (Diseases and diagnoses (DISEASES), Imaging examinations (EXAM), Lab tests (TEST), Surgeries (TREAT), Medications (DRUG), and Anatomical sites (BODY)) in Chinese clinical electronic medical records. (3) On the standard public CCKS2019 dataset, our proposed MF-MNER outperforms the existing literature in all three evaluation scenarios: micro average, macro average, and weighted average. Compared to the existing BERT-BiLSTM-CRF model, our method achieves significant improvements of 19.64% in Precision, 15.67% in Recall, and 17.58% in F1-score.
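As a minimal illustration of the CRF decoding step, the sketch below implements Viterbi decoding over per-position emission scores and a tag-transition matrix. The toy scores are assumptions for demonstration, not learned parameters from our model:

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence by dynamic programming.

    emissions:   T x K list, emissions[t][k] = score of tag k at position t
    transitions: K x K list, transitions[i][j] = score of moving from tag i to j
    (In a trained CRF, `transitions` is a learned parameter matrix.)
    """
    T, K = len(emissions), len(emissions[0])
    score = list(emissions[0])      # best score ending in each tag so far
    backpointers = []
    for t in range(1, T):
        new_score, ptrs = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptrs.append(best_i)
        backpointers.append(ptrs)
        score = new_score
    best = max(range(K), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):   # trace the best path backwards
        best = ptrs[best]
        path.append(best)
    return path[::-1]
```

A strongly negative transition score (e.g. for O to I-DRUG) effectively forbids illegal BIO sequences, which is why the CRF layer produces better-formed entity spans than tag-wise argmax decoding.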
When tested on a real-world Chinese clinical electronic medical record dataset from hospitals, our MF-MNER demonstrated a noticeable enhancement across all three evaluation scenarios in terms of Precision, Recall, and F1-score, relative to our optimized BERT-BiLSTM-CRF approach. This further substantiates the effectiveness of our designed MF-MNER.
This study also has certain shortcomings and limitations. Specifically, the assessment experiments were conducted only on the CCKS2019 dataset and on 100 authentic clinical records. Although there is a significant enhancement in model performance on the standard CCKS2019 dataset, the entity recognition performance on real medical data requires further improvement. Our subsequent efforts will be directed towards gathering a more extensive collection of datasets from various hospitals, investigating methods to improve the model's learning and generalization capabilities, and addressing the issue of clinical information isolation between different domestic hospitals.

(1) Diseases and diagnoses (DISEASES): Medically defined diseases and the judgments made by doctors in clinical practice regarding etiology, pathophysiology, classification, and staging.
(2) Imaging examinations (EXAM): Includes X-ray, CT scan, MRI, PET-CT, etc. It does not include diagnostic procedures such as gastroscopy and colonoscopy, to avoid excessive conflict with surgical procedures.
(3) Lab tests (TEST): Physical or chemical evaluations performed in the laboratory, specifically referring to clinical lab analyses undertaken by the laboratory division. It does not include immunohistochemistry and other broad laboratory tests.
(4) Surgeries (TREAT): Surgical operations conducted by medical professionals on particular regions of the patient's body, including excision, suturing, and other treatments. It is the main treatment method in surgery.
(5) Medications (DRUG): Specific chemical compounds utilized for curing illnesses.
(6) Anatomical sites (BODY): The anatomical places in the human body where diseases, symptoms, and signs manifest.
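Under the BIO tagging scheme used in this paper, these six entity types expand into 12 B-/I- tags plus the outside tag O. A minimal sketch of the label set construction:

```python
# The six entity types defined above.
ENTITY_TYPES = ["DISEASES", "EXAM", "TEST", "TREAT", "DRUG", "BODY"]

def build_bio_labels(entity_types):
    """Expand entity types into a BIO label set: O plus B-/I- per type."""
    labels = ["O"]  # outside any entity
    for t in entity_types:
        labels.append(f"B-{t}")  # beginning of an entity span
        labels.append(f"I-{t}")  # inside / continuation of a span
    return labels
```

This yields the 12 entity tags evaluated in Table 2, with O bringing the full tag set to 13 classes for the sequence labeler.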

Fig. 4
Fig. 4 Comparison of the three kinds of F1-scores under various training set sizes (CCKS2019 dataset). The horizontal axis signifies the ratio of the training set to the total training data. The ratios used in this study are 0.2, 0.4, 0.6, 0.8, and 1, respectively

Table 1
Parameter settings (columns: Parameter, Value)

Table 2
The prediction results of the 12 tag types on the CCKS2019 dataset with MF-MNER (our method). Bold represents rows where all indicators exceed 0.9, indicating higher recognition accuracy for these entities relative to the rest

Table 3
Comparison of Precision, Recall, and F1-score of MF-MNER on the CCKS2019 dataset under three evaluation mechanisms

Table 5
Recognition results of Precision, Recall, and F1-score for the 12 named entity tags with our optimized BERT-BiLSTM-CRF on the CCKS2019 dataset

Table 6
Comparison of Precision, Recall, and F1-score of our optimized BERT-BiLSTM-CRF on the CCKS2019 dataset under three evaluation mechanisms

Table 8
Performance comparison of MF-MNER and our optimized BERT-BiLSTM-CRF on a real Chinese clinical electronic medical records dataset under three evaluation mechanisms