Background

Electronic medical records (EMRs) are used by medical staff to record the text, symbols, charts, graphics, data, and other digital information generated by a hospital information system (HIS). With the tremendous growth in EMR adoption, various sources of clinical information (including demographics, diagnostic history, medications, laboratory test results, and vital signs) are becoming available, establishing EMRs as a treasure trove for large-scale analysis of health data. Unstructured medical text in EMRs is a kind of narrative data that includes clinical notes, surgical records, discharge records, radiology reports, and pathology reports. For convenience of narration, we use "EMR" to denote unstructured EMR text in the following.

Identifying the semantic relations that hold among medical concepts in EMRs is of great importance to various health-related applications. These relations hold between medical problems, tests, and treatments. Table 1 presents two examples of semantic relations: one between the medical concepts e1=“cold” and e2=“fever” in sentence S1, and the other between e1=“Head MRI” and e2=“lacunar infarction” in sentence S2.

Table 1 Examples of the relations between medical entities

On account of the importance of this subject, the 2010 i2b2/VA NLP challenge for clinical records presented a relation classification task focused on assigning relation types between medical concepts in EMRs. Since then, medical concept relation classification has received attention from more and more researchers.

In traditional natural language processing (NLP) research, semantic relations between named entities can be used for many applications, including knowledge graph construction, sentiment analysis, and question answering [1]; relation extraction or classification has therefore always been an important issue [2]. In previous open-domain entity relation extraction studies, researchers applied many traditional machine learning models, including logistic regression, SVM, and CRF, to recognize relations [3–7]. Li et al. used a CRF model to reduce the space of possible label sequences and introduced long-range features for relation recognition [8]. Mintz et al. put forward a distant supervision relation classification method that generates adequate training data by aligning text with a knowledge base, addressing the lack of sufficient training data [9]. Socher et al. first employed recursive neural networks on the task of relation extraction, utilizing the syntactic structure information of sentences [10]. Miwa et al. proposed a neural relation extraction architecture based on a bidirectional LSTM and a tree LSTM to encode entities and sentences simultaneously [11].

Drawing on these studies of open-domain relation extraction, a similar task on EMRs was formally defined in the 2010 i2b2/VA Challenge Evaluation [12], and researchers have since proposed various models for relation classification in EMRs. Bruijn et al. used SVMs to train multiple classifiers for different relation categories and improved classification performance [13]. Rink et al. used external dictionaries to improve entity relation recognition [14]. Fang et al. extracted relations from articles on Chinese herbal medicine based on manually designed rules and created a relation database [15]. Zhou et al. utilized a bootstrapping framework to extract relations from medical articles and created a knowledge base [16]. Li et al. proposed a CNN-LSTM-based relation classification model for electronic health records [17]. Overall, existing models mainly focus on English EMR texts and still cannot deliver satisfactory recognition performance. Given the increasing availability of digitized Chinese EMRs, this paper addresses the problem of identifying semantic relations among medical concepts in Chinese EMRs. We propose an attention-based deep residual network model to classify medical entity relations in Chinese EMRs. Experiments on a manually labeled Chinese EMR corpus show that our model achieves better performance than other methods, with an F1-score of 77.80%.

Methods

Our model is based on a CNN architecture, as shown in Fig. 1. The model consists of five parts: a vector representation layer, a convolution layer, a residual network layer, a position attention layer, and an output layer.

Fig. 1
figure 1

The architecture of our relation extraction model

Character embedding

Given a Chinese sentence S=(c1,c2,…,cn) containing two entities e1 and e2, each character ci is mapped to a low-dimensional dense vector \(V_{i} = (V_{w}^{i}, V_{p}^{i})\), in which \(V_{w}^{i}\) is the character vector and \(V_{p}^{i}\) is the vector of the character's position in the sentence. The character embeddings are initialized with vectors pre-trained by word2vec; dw is the dimension of the character vector.

Position embedding

The position embedding \(V_{p}^{i}\) is also a low-dimensional vector, which encodes the relative positions (see Fig. 2) of the current character with respect to the first entity e1 and the second entity e2. Each relative position corresponds to a position embedding \(V_{p}^{i} \in R^{d_{p}}\), where dp is the dimension of the position embedding.

Fig. 2
figure 2

An example of the relative distance between an entity and a character. The relative distances of a character to the medical entities “(cold)” and “(fever)” are 2 and −2, respectively

The vector \(V_{i} \in R^{d_{v}}\) is the concatenation of the character vector \(V_{w}^{i}\) and the two position vectors, where dv=dw+2dp.
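To make the vector representation concrete, the following is a minimal sketch in PyTorch (our assumed framework; all names and dimensions are illustrative, not the authors' released code) of how the character embedding and the two relative-position embeddings can be concatenated into \(V_{i}\):

import torch
import torch.nn as nn

class InputRepresentation(nn.Module):
    """Builds V_i = (V_w^i, V_p^i): character vector plus two position vectors."""
    def __init__(self, vocab_size, d_w=100, d_p=10, max_dist=60):
        super().__init__()
        # Character embeddings; the paper initializes these from word2vec.
        self.char_emb = nn.Embedding(vocab_size, d_w)
        # One table per entity; relative distances are shifted to non-negative indices.
        self.pos_emb1 = nn.Embedding(2 * max_dist + 1, d_p)
        self.pos_emb2 = nn.Embedding(2 * max_dist + 1, d_p)
        self.max_dist = max_dist

    def forward(self, chars, dist_e1, dist_e2):
        # chars, dist_e1, dist_e2: (batch, seq_len) integer tensors; distances
        # are clipped to [-max_dist, max_dist] and shifted by +max_dist.
        d1 = dist_e1.clamp(-self.max_dist, self.max_dist) + self.max_dist
        d2 = dist_e2.clamp(-self.max_dist, self.max_dist) + self.max_dist
        # Concatenation yields V_i with dimension d_v = d_w + 2 * d_p.
        return torch.cat([self.char_emb(chars),
                          self.pos_emb1(d1), self.pos_emb2(d2)], dim=-1)

For the example in Fig. 2, a character at relative distances 2 and −2 from the two entities would index the two position tables with those (shifted) distances.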

Convolution

The convolution layer extracts effective local feature information from characters and their corresponding contexts. Vj is the vector corresponding to the j-th character in the sentence S=(V1,V2,…,Vn), where n is the sentence length. We use a filter \(W \in R^{h \times d_{v}}\) to extract local features from the sentence S. A feature cj is generated from a window of characters Vj:j+h−1 by

$$ c_{j}=f(W \cdot V_{j:j+h-1}+b), $$
(1)

where b is a bias term and f is a non-linear function. We apply a dropout layer in the convolution to prevent overfitting.
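As a minimal sketch of Eq. (1), assuming PyTorch and illustrative sizes (d_v, h, and the filter count are our assumptions, not reported values), the convolution, non-linearity f, and dropout can be expressed as:

import torch
import torch.nn as nn

d_v, h, n_filters = 120, 3, 128            # assumed input dim, window size, filters
conv = nn.Sequential(
    # Conv1d expects (batch, channels, seq_len); padding keeps the length n.
    nn.Conv1d(d_v, n_filters, kernel_size=h, padding=h // 2),
    nn.ReLU(),                             # the non-linear function f in Eq. (1)
    nn.Dropout(p=0.5),                     # dropout against overfitting
)

V = torch.randn(8, d_v, 50)                # a batch of 8 sentences of length 50
c = conv(V)                                # local features c_j, shape (8, n_filters, 50)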

Residual networks

Residual learning connects low-level representations directly to high-level ones and mitigates the vanishing gradient problem, so we superimpose an identity mapping function on the network. In our model, each residual convolution block (see Fig. 3) has two convolutional layers, each followed by a ReLU activation, and we use a shortcut connection within each residual convolution block. \(W_{1}, W_{2} \in R^{h \times 1}\) are two convolution filters, where h is the convolution kernel size. The first convolutional layer is

$$ {\tilde c_{j}} = f\left(W_{1} \cdot c_{j:j + h - 1} + b_{1}\right), $$
(2)
Fig. 3
figure 3

The residual convolution block

and the second is

$$ \hat c_{j} = f\left(W_{2} \cdot \tilde c_{j:j + h - 1} + b_{2} + c_{j}\right), $$
(3)

here b1 and b2 are bias terms. The output of the residual convolution block is the vector \(\hat c_{j}\). This block is stacked multiple times in our architecture via shortcut connections.
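The two equations above translate directly into a small residual block. The following is a minimal PyTorch sketch under the same illustrative assumptions as before (channel count and kernel size are ours); the shortcut adds the block input c back before the second ReLU, as in Eq. (3):

import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    def __init__(self, channels, h=3):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=h, padding=h // 2)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=h, padding=h // 2)
        self.relu = nn.ReLU()

    def forward(self, c):
        c_tilde = self.relu(self.conv1(c))          # Eq. (2)
        return self.relu(self.conv2(c_tilde) + c)   # Eq. (3): identity shortcut

# Stacking several blocks deepens the network, as in our architecture.
stack = nn.Sequential(*[ResidualConvBlock(128) for _ in range(3)])
out = stack(torch.randn(8, 128, 50))                # shape preserved: (8, 128, 50)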

Position attention

Recently, attention mechanisms have been widely used in machine learning and have achieved great results on various NLP problems. In this paper, we use position attention to enhance the relation extraction ability. Firstly, we apply a max-pooling operation to the residual learning result. Secondly, as shown in Fig. 1, we concatenate the max-pooling result with the position embeddings of the entities. Finally, we use the attention mechanism to weight the sentence representation:

$$ S = \sum\limits_{i} \alpha_{i} \times P_{i}, $$
(4)

where αi is the attention weight and Pi is the concatenation of the max-pooling result with the entity position embedding. Finally, a softmax function over the output layer normalizes the scores into entity relation probabilities. The attention weights themselves are computed by a softmax over the unnormalized attention scores ei:

$$ \alpha_{i} = \frac{\exp \left(e_{i}\right)}{\sum\limits_{k} \exp \left(e_{k}\right)} $$
(5)
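A minimal sketch of this layer, again assuming PyTorch (the linear scoring layer that produces the ei and all dimensions are our assumptions; the paper does not specify how the scores are computed):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttention(nn.Module):
    def __init__(self, feat_dim, pos_dim, n_relations):
        super().__init__()
        self.score = nn.Linear(feat_dim + pos_dim, 1)         # unnormalized scores e_i
        self.out = nn.Linear(feat_dim + pos_dim, n_relations)

    def forward(self, feats, entity_pos):
        # feats: (batch, seq_len, feat_dim) max-pooled residual features;
        # entity_pos: (batch, seq_len, pos_dim) entity position embeddings.
        P = torch.cat([feats, entity_pos], dim=-1)            # P_i
        alpha = F.softmax(self.score(P).squeeze(-1), dim=-1)  # Eq. (5)
        S = (alpha.unsqueeze(-1) * P).sum(dim=1)              # Eq. (4)
        return F.softmax(self.out(S), dim=-1)                 # relation probabilities

pa = PositionAttention(feat_dim=128, pos_dim=20, n_relations=15)
probs = pa(torch.randn(8, 50, 128), torch.randn(8, 50, 20))   # shape (8, 15)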

Results

Dataset and evaluation metrics

Referring to the medical semantic relation annotation guidelines of the 2010 i2b2/VA Challenge, we established our own relation annotation specification for Chinese EMRs, in which semantic relations between medical concepts fall into five coarse-grained categories and fifteen fine-grained categories. All relation categories are detailed as follows, and the full label set is summarized in the sketch after this list.

Coarse-grained category 1: Treatment-Disease Relation. This category contains five fine-grained categories, including TrID (Treatment improves the disease), TrWD (Treatment worsens the disease), TrCD (Treatment causes the disease), TrAD (Treatment is administered for the disease), and TrNAD (Treatment is not administered because of the disease).

Coarse-grained category 2: Treatment-Symptoms Relation. This category also contains five fine-grained categories, including TrIS (Treatment improves the symptoms), TrWS (Treatment worsens the symptoms), TrCS (Treatment causes the symptoms), TrAS (Treatment is administered for the symptoms), and TrNAS (Treatment is not administered because of the symptoms).

Coarse-grained category 3: Test-Disease Relation. This category contains two fine-grained categories, including TeRD (Test reveals the disease) and TeCD (Test conducted to investigate the disease).

Coarse-grained category 4: Test-Symptoms Relation. This category also contains two fine-grained categories, including TeRS (Test reveals the symptoms) and TeBS (Test based on symptoms).

Coarse-grained category 5: Disease-Symptoms Relation. This category contains only one fine-grained category named as DCS (Disease causes symptoms).
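For reference in the sketches above, the label set can be captured as a simple constant (the grouping structure is ours, for readability; the labels are exactly those defined in the specification):

RELATION_LABELS = {
    "Treatment-Disease": ["TrID", "TrWD", "TrCD", "TrAD", "TrNAD"],
    "Treatment-Symptoms": ["TrIS", "TrWS", "TrCS", "TrAS", "TrNAS"],
    "Test-Disease": ["TeRD", "TeCD"],
    "Test-Symptoms": ["TeRS", "TeBS"],
    "Disease-Symptoms": ["DCS"],
}
N_RELATIONS = sum(len(v) for v in RELATION_LABELS.values())   # 15 fine-grained labels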

According to our specification, we manually annotated 3000 de-identified Chinese EMR texts from different clinical departments of a Grade-A secondary hospital in Gansu Province, China. We selected 2000 texts as training data, 500 as development data, and 500 as test data for evaluating our method. The number of relations in each fine-grained category of this dataset is given in Table 2. Precision, recall, and F1-score are used as evaluation metrics.

Table 2 The number of relations in every fine-grained category of the corpus
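The metrics are the standard ones; a minimal sketch for clarity (the counts in the usage line are hypothetical, not results from our corpus):

def precision_recall_f1(tp, fp, fn):
    # tp, fp, fn: true positives, false positives, false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1(tp=778, fp=205, fn=239))   # hypothetical counts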

Models and parameters

We carry out experiments to compare the performance of our model with that of the models described below.

CNN-Max: This model was used by Sahu et al. [18]. It encodes sentence vectors with a CNN and outputs the results after max-pooling and a softmax function.

BLSTM-Attention: This model was proposed by Li et al. [19]. It mainly consists of a bidirectional LSTM and an attention mechanism.

ResNet-Max: This model was proposed by Huang et al. [20]. Compared with our model, it does not incorporate an attention mechanism.

ResNet-BLSTM: The basic framework of this method is close to our model; the difference is that it combines the residual network with a Bi-LSTM.

ResNet-PAtt: This is the model presented in this paper. Table 3 gives the hyper-parameters chosen for all experiments. We tuned the hyper-parameters on the development set by random search and shared as many hyper-parameters as possible across experiments.

Table 3 Hyper-parameters of the residual neural network
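The random search on the development set can be sketched as follows (the search space and the train_and_eval stub are illustrative assumptions, not the values we actually searched):

import random

space = {                                  # assumed, illustrative search space
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "kernel_size": [3, 5, 7],
    "dropout": [0.3, 0.5],
    "n_filters": [64, 128, 256],
}

def train_and_eval(config):                # placeholder: trains the model and
    return random.random()                 # returns its F1-score on the dev set

best_f1, best_config = -1.0, None
for _ in range(20):                        # a fixed budget of random trials
    config = {k: random.choice(v) for k, v in space.items()}
    f1 = train_and_eval(config)
    if f1 > best_f1:
        best_f1, best_config = f1, config
print(best_config, best_f1)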

Experimental results

Table 4 shows the overall classification performance of the different models on our evaluation corpus. Our method, ResNet-PAtt, outperforms the other methods in F1-score, reaching a precision of 79.16% and an F1-score of 77.80%. Among the other methods, ResNet-BLSTM achieves the best F1-score; our model improves on it by 2.97% in F1-score, indicating that our method is more effective. In addition, we find that, overall, the residual network based methods outperform the other relation extraction methods.

Table 4 Comparison of the overall relation classification results of different models

Discussion

The reason our model achieves the best performance may be that the residual network reduces the negative impact of corpus noise on parameter learning, while the character position attention mechanism enhances the identifying information of different entity types. Table 5 gives the classification performance of our model on every fine-grained relation category. As can be seen from these data, our model performs best on category TeRS and worst on category TrNAS, which shows that category TrNAS is more difficult to recognize correctly. We also evaluated the training time of different models. Figure 4 shows the time consumed by these models when the number of epochs is set to 5, 10, and 20. Overall, our model takes the shortest time to complete parameter training, and the traditional machine learning method SVM takes the longest.

Fig. 4
figure 4

Comparison of the training time for different models

Table 5 Classification performance of our model on every fine-grained relation category

Table 6 compares the F1-scores of each model on every fine-grained relation category. Our model delivers better classification performance and faster training speed.

Table 6 Comparison of F1-score for each model on every fine-grained relation category

Conclusions

In this paper, we propose a deep residual network model based on the attention mechanism to classify the relations of entity pairs in Chinese EMRs. The method reduces the influence of data noise on model training and enhances entity-discriminating features with the position attention mechanism, so that entity information can be combined effectively in relation extraction. Experimental results show that the model reaches an F1-score of 77.80% and significantly improves classification performance on categories with few instances. At present, most relation classification is built on entity recognition and requires the entities in a sentence to be specified. In the future, we will study the joint extraction of entities and their relations to further improve the efficiency of recognizing both simultaneously.