Introduction

The identification of railway safety risk is important in the intelligent transportation systems under industry 4.0 [48], which can guarantee stable operation and reduce the probability of incidents/accidents [8, 26]. The key is automatically detecting the safety risk from the “three-in-one” scenario of human, physical and technical defense. Recently, many promising deep neural networks (DNN) models [7, 13, 19, 28, 29, 34] have been proposed to detect safety risk in railway systems. However, these previous models suffer from two challenging problems.

Fig. 1
figure 1

Challenges of existing models. a Existing models fail to mining the semantic relation between railway objects. b Pre-trained language models (e.g., BERT) can not well represent railway corpus due to absence of domain semantics

Neglecting the importance of interconnection between railway objects. These existing models fail to explore the relation between railway objects and thus have to rely extra experiences or prior knowledge for incident/accident inference, as shown in Fig. 1a. For instance, the two-stage DNN model [49] is proposed to detect the abnormal state of a single railway object, but it ignores the relation between railway objects that is important for analyzing train collision accidents. In addition, introducing different experiences or prior knowledge to explore the relation can generate inconsistent results due to differences in human subjectivity.

Lacking domain semantics of railway. Almost all these DNN based models encounter with a severe semantic gap between the railway and universal domains, particularly in text mining. Existing models usually initialize the data representations with pre-trained models (e.g., BERT [9], GPT [32]) trained on the open corpus (that includes Wikipedia, Wall Street Journal, etc.), which lacks domain corpus, especially the Chinese railway corpus, as shown in Fig. 1b. For example, “turnout bolts” is an important and frequent phrase in the safety supervision log of the railway but missing in the open corpus, which do may do harm to understand railway text and thus impairs the final performance. It reveals that enhancing the domain semantics is important for understanding the railway text.

At present, only a few models attempt to address the aforementioned challenges. To our knowledge, a recent method [15] can be used to address the second challenge, which twice implements the pre-training on domain corpus to enrich domain semantics. However, two severe limitations are raised: (1) How to effectively explore the semantic relation between railway objects? and (2) How to fully exploit domain semantics from limited Chinese railway corpus for a better result and extend the practical application?

To solve these severe issues, this paper recasts the identification of railway safety risk as the relation extraction task and proposes a novel and effective Domain Semantics-Enhanced Relation Extraction (DSERE) model. Specifically, we model railway safety risk as the triple knowledge \((e^\textrm{sbj}_i,R_i,e^\textrm{obj}_i)\), where \(e^\textrm{sbj}_i\) and \(e^\textrm{obj}_i\) are the subject (or head entity) and object (or tail entity) of the railway safety risk interconnected by the relation \(R_i\). The main advantage of introducing triple knowledge is that interconnection can be used for incident/accident inference. Afterward, our DSERE implements the relation extraction task by learning railway semantics simply and effectively. A domain semantics-enhanced transformer (DSET) mechanism is designed to obtain the domain semantics by classical N-Grams mechanism on the dedicated railway lexicon. The motivation behind it contains two aspects, for one thing, the scale of the railway lexicon is much smaller than the unsupervised corpus for twice pre-training [5, 18]. For the other thing, the domain semantics based on the statistical mechanism is more helpful in understanding the railway text. Then, we observe that the triple knowledge splits sentence into three segments, each contributing much to the nearest entity. Therefore, piece-wise convolutional neural networks (CNN) are introduced to capture fine-grained features. With the above mechanisms, the enhanced domain semantics is fully exploited to improve the performance of relation extraction to better identify the railway safety risk. Experimental results on real-world dataset show the effectiveness of our proposed model.

The main contributions of this paper are summarized as follows:

  • A novel method for identifying the railway safety risk is proposed by extending the relation extraction task. To the best of our knowledge, this is the first work to project the railway safety risk into a knowledge interconnection space that can be used for incident/accident inference.

  • In order to clearly express the interconnection of railway safety risk, we are the first to formally define three core concepts, that depict railway objects and their semantic relation from the perspective of knowledge.

  • A novel DSERE model is proposed to enhance the domain semantics by designing an effective domain semantics-enhanced transformer mechanism, bridging the semantic gap between the railway and universal domains.

  • Experiments on the real-world dataset verify the superiority and effectiveness of our proposed model. Our proposed DSERE has important practical values in promoting knowledge-driven safety risk management of China railway system.

The rest of this paper is organized as follows. The next section introduces fundamentals about identifying railway safety risk, language representations and the practical application of relation extraction. The subsequent section presents the model for railway safety risk pattern and demonstrates the detail of the proposed DSERE. Experimental results and detailed analysis are shown in the penultimate section. The conclusion and future work are given finally.

Highlights

The main aspects of our work are shown as follows:

  • Recast the identification of railway safety risk as the relation extraction task. Railway safety risk is formally defined from knowledge perspective, which is important for exploring the semantic relation between railway objects.

  • Performance improvement on relation extraction. The proposed DSERE model includes a novel DSET mechanism enhancing the understanding of railway semantics from limited corpus, and thus greatly improves the result of relation extraction.

  • Broad application of the proposed method. Since knowledge interconnection is important for mining semantic relation between railway objects for further incident/accident inference, the proposed DSERE model has broad applicability in scenarios of ITS, such as locomotive scenario and electrical scenario.

Related work

Identification of railway safety risk

Early identifying the railway safety risk significantly reduces the probability of incidents/accidents [1]. Recently, machine learning methods have been used to automatically identify railway safety risk from large-scale heterogeneous data instead of relying on manual detection [4, 24, 39]. Gabriel et al. [17] automatically detected the rail failure risk through an image processing approach to ensure the safety of rail transportation. Zhang et al. [53] proposed an MRSDI-CNN model to detect the rail surfaces, that satisfy the comprehensive, fast, and accurate detection process. Chen et al. [6] investigated using a data-driven method to identify the slight fault information from the signal features of the bogie. Gao et al. proposed the adaptive deep learning model to overcome the challenging task of inspecting the safety status of the swivel clevis, an important component in the high-speed railway. However, these existing studies fail to mine the relation between the railway objects and ignore the rich value contained in the text.

Language representations

Understanding abstract semantics is a severe challenge in the text mining fields [50, 54]. In the early pioneer work [27], a toolkit was developed to project words into continuous vectors through Continuous Bag-of-Words (CBOW) and Skip-gram (SKG) models. Afterward, Vaswani et al. [38] proposed the Transformer mechanism, a milestone to model sequence transduction entirely by the attention mechanism. The BERT [9], GPT [32] are proposed based on the encoder architecture and the decoder architecture of the Transformer. Yang et al. [47] proposed the XLNet that is a, a generalized autoregressive method trained in Transformer(-XL) architecture since the BERT model adopts a masking strategy that corrupts the dependency between masked positions. To fully explore the language universality for understanding low-resource languages, multilingual pre-training models [14, 35] spark interest and show the superiority of cross-lingual performance on the downstream tasks of the natural language processing community. The pre-trained models have already learned the intrinsic textual representations without prior knowledge, but they fail to capture specific domain semantics, especially the Chinese railway corpus.

Relation extraction

Relation extraction is the core process for constructing a knowledge graph, and has been used to address various practical problems in specific domain [42]. For example, Du et al. [10] proposed a relation extraction model to obtain knowledge in the manufacturing domains. Relation extraction is used in the biomedical domain to extract relevant information from text, such as research articles and electronic health records [21, 33]. Amin et al. [2] proposed an accurate benchmark for broad-coverage biomedical relation extraction, which can be used for extracting bio-molecular information extraction and identifying drug–drug interactions. In the webpage mining domain, Locard et al. [25] proposed a ZeroShotCeres model to extract structural triple knowledge through the “zero-shot” method from the webpages. More interestingly, Jie et al. [16] used relation extraction to learn deductive reasoning for the math word problem in the text. However, no relation extraction model is applied to identifying the railway safety risk from the Chinese railway text.

Table 1 provides an overview of related methodologies, including the recent methods of identifying railway safety risk, excellent language models and relation extraction work for solving domain challenges.

Table 1 A survey of relevant papers

The proposed method

Problem formulation

The railway safety risk is the unsafe state of railway systems, including human, infrastructure, environmental and management safety risks. This study models railway safety risks to interconnect railway objects as a triple knowledge \((e_i^\textrm{sbj},R_i,e_i^\textrm{obj})\). Table 2 presents the notations used in this paper. This paper introduces three core concepts, defined as follows.

Table 2 Notations and explanations

Definition 1

(Subject) It describes the subject of the railway safety risk in the text, and let \(E^\textrm{sbj}\) be a set that includes the subject of human safety risk \(E_1^\textrm{sbj}\), the subject of infrastructure safety risk \(E_2^\textrm{sbj}\), the subject of environment safety risk \(E_3^\textrm{sbj}\), and the subject of management safety risk \(E_4^\textrm{sbj}\).

We generate the subset \(E_1^\textrm{sbj}\) with railway workers such as line workers, guard workers and flaw detection workers to explore the subject semantics. For the subset \(E_2^\textrm{sbj}\), we focus on the infrastructure such as rails, ballast beds and roadbeds. When considering the environmental factors that seriously affect the railway operation, we present the subset \(E_3^\textrm{sbj}\) as a set of extreme weather (e.g., strong winds, floods and earthquakes.). The subject of management safety risk \(E_4^\textrm{sbj}\) mainly includes management rules and regulations, including the “Rail Traffic Reliability, Availability, Maintainability and Safety Specifications and Examples”and “Railway Subgrade Major Maintenance Rules”.

Definition 2

(Object) It describes the object of the railway safety risk in the text, and let \(E^\textrm{obj}\) be a set that includes the object of human safety risk \(E_1^\textrm{obj}\), the object of infrastructure safety risk \(E_2^\textrm{obj}\), the object of environment safety risk \(E_3^\textrm{obj}\), and the object of management safety risk \(E_4^\textrm{obj}\).

This paper explores object semantics by mining the object corresponding to the subject. Specifically, \(E_1^\textrm{obj}\) represents unsafe behavior of workers, such as not wearing reflective tape, not implementing “hand-to-mouth”, and not getting off the road in time, which is vulnerable to induce personnel injuries. The \(E_2^\textrm{obj}\) means the unsafe state of infrastructure that will induce potential dangers, for example, cracks, subsidence, siltation and collapse. For \(E_3^\textrm{obj}\), extreme weather can cause safety hazards and thus induce the unsafe states of humans or infrastructure. The \(E_4^\textrm{obj}\) represents the human (e.g., grass-roots workers) or infrastructure who violates those management rules and regulations in \(E_4^\textrm{sbj}\).

Definition 3

(Relation) It describes the interconnected relation between the subject and object of the railway safety risk. Let \(R=\{R_1,R_2,R_3,R_4\}\) be a set representing the types of the relation, where \(R_i\in R\) represents the relation between \(E_i^\textrm{sbj}\) and \(E_i^\textrm{obj}\).

Based on the above definitions, identifying railway safety risk is recast as extracting the relation R between \(E^\textrm{sbj}\) and \(E^\textrm{obj}\) from the input sentence S. Unlike the existing models that regard railway objects as independent, our relation extraction method interconnects the subject and object of railway safety risk by the relation, and thus generates a space of knowledge interconnection [40]. This space can be viewed as a knowledge graph for analyzing incidents/accidents by knowledge inference, since the relation fully explores the chain of incident/accident evolution. As Fig. 2 shows, (No.3 turnout, \(R_2\), missing bolt) is extracted from context “missing bolt is occurred on the east point heel of No.3 turnout”, which expresses the safety risk of turnout induced by a missing bolt. When analyzing the accident depicted as “a train derails beside the No.3 turnout, causing the worker Mr. Zhang to be injured”, the knowledge graph can be searched to get the triple knowledge (No.3 turnout, \(R_2\), missing bolt) and (Mr. Zhang, \(R_1\), fork-heart walking) that show the direct cause of the train derailment accident. Furthermore, to analyze the “Mr. Zhang” node, we discover a triple knowledge (Mr. Zhang, \(R_4\), examination absence), indicating that the above-mentioned accident’s root cause is not strictly implemented in the management measures.

Modeling the railway safety risk as the triple knowledge interconnects the railway objects opens a new dimension for identifying of the railway safety risk. The next subsection explores the new and uncultivated areas that even were considered as a place that cannot be probable. Additionally, we show how to construct dedicated neural networks to automatically extract relation in railway systems.

Fig. 2
figure 2

A novel method to identify the railway safety risk by relation extraction. This method can project the railway safety risk into a knowledge interconnection space and support the incident/accident inference

DSERE model

The most challenging problem when extracting the relation from railway text is how to deeply understand abstract railway semantics. A common method is to twice pretrain the pre-trained model on large unsupervised domain corpus, which reshapes the adaptability of the deep neural networks to enrich the domain semantics. However, this method consumes computing resources and introduces a new problem of large-scale unsupervised railway corpus shortage. This study proposes a novel DSERE model to deeply understand railway semantics with a simple and effective model to solve this issue, as shown in Fig. 3. The DSET mechanism is designed to fuse railway and universal semantics, where the important railway semantics is extracted from a dedicated railway lexicon with railway domain terminologies through the N-Grams mechanism. Furthermore, a piece-wise CNN is used to extract the fine-grained features in the triple knowledge structure. With the above mechanisms, the proposed model can fully understand the railway semantics and extract relation using the softmax classifier.

Fig. 3
figure 3

Overview of DSERE model. a The DSERE model includes two important components—the domain semantics-enhanced transformer mechanism and piece-wise CNN. b The encoder details of the Transformer layer aim to bridge the gap between domain and universal semantics. c The encoder details of piece-wise CNN aim to extract fine-grained features contained in the triple knowledge. The convolution window slides from \(\textbf{x}_1\) to \(\textbf{x}_{k_1}\), and its size is closely related to the input sentence

Specifically, let \(S=\{x_1,x_2,\ldots ,x_{k_1}\}\) be the input sentence, and D be the railway lexicon with railway terminologies. The DSET mechanism aims to obtain enhanced domain semantics to improve the performance of our DSERE model. We introduce the N-Grams mechanism to explore word boundary features from the statistical perspective. Let \(s_g =\{c_i^g \vert c_i^g \in D, i\in [1,n]\}\) be a railway terminology, and the joint probability of N-Grams is expressed as follows:

$$\begin{aligned} p(c_{1}^{g}, c_{2}^{g}, \ldots , c_{n}^{g})=\prod _{i=1}^{n} p(c_{i}^{g} \vert c_{i-N+1}^{g}, \ldots ,c_{i-1}^g), \end{aligned}$$
(1)

the N-Grams matching matrix \(\textbf{M} \in \textbf{R}^{k_1\times k_2}\) is generated, where \(k_2\) means numbers of extracted N-Grams. Then the n-gram embedding is fused with word embedding of S through \(\textbf{M}\), and utilizes the transformer architecture to encode it as follows:

$$\begin{aligned} \left\{ \begin{aligned} u_j^{l+1}&= MHA (Q=u_j^l, K=V=U^l) \\ v_i^*&= v^l+ \sum \nolimits _k u_{i, \textrm{k}}^l \\ \textbf{V}^{l^{*}}&= \textbf{V}^{l}+\textbf{M} \times U^{l}, \end{aligned} \right. \end{aligned}$$
(2)

where \(MHA(\cdot )\) means the multi-head self-attention layer, \(u_{j}^{l}\) is used to query the vector Q that calculates the attention score of other N-Grams elements in the l layer, and \(U^{l}\) is a matrix representing the characteristics of all N-Grams elements in the l layer, providing information for K and V in MHA. The \(v^{l}\) and \( u_{i, \textrm{k}}^{l}\) represent the ith word of l layer and the kth N-Grams element associated with the word. \(\textbf{V}^{l}\) is the embedding matrix of all words and \(v_{i}^{i^{*}}\) is the embedding result of the next input layer.

Capturing only the railway semantics is not capable of identifying the relation, we have to explore the structural features of triple knowledge. Recurrent neural networks (RNNs) are usually used to extract temporal features in the text, but they are hard to train and accumulate redundant information in the last hidden unit [3, 45]. To alleviate this issue, CNN-based models are proposed and perform comparably to RNNs on sequence modeling [12]. In addition, the subject \(e^\textrm{sbj} \in E^\textrm{sbj}\) and the object \(e^\textrm{obj} \in E^\textrm{obj}\) split the input sentence S into three segments, and each segment contributes a lot to the nearest entity. Therefore, a piece-wise max-pooling operation is introduced in each segment to capture fine-grained features instead of implementing a single max-pooling over the entire sentence. Formally, the ith convolutional operation is defined as follows:

$$\begin{aligned} \left\{ \begin{aligned} \hbox {COV}_i&= \hbox {tanh}(Wq+b_1)_i \\ {\textbf {P}}_i&= [\hbox {MaxP}(\hbox {COV}_{i1}),\hbox {MaxP}(\hbox {COV}_{i2}),\hbox {MaxP}(\hbox {COV}_{i3})], \end{aligned} \right. \end{aligned}$$
(3)

where \(q=x_{i-l_1+1:i}\) (\(1\le i\le k_1+l_1-1\), \(l_1\) is the convolution kernel size) is the concatenation of the word vector sequence in the ith sliding window, W is the convolution kernel matrix and b is the bias vector and \(\hbox {tanh}(\cdot )\) is the activation function. The \(\hbox {COV}_{ij}\) corresponds to the convolutional result of the segment split by \(e^\textrm{sbj}\) and \(e^\textrm{obj}\), and the \(\hbox {MaxP}(\cdot )\) is the max pooling operation.

The output of the piece-wise CNN is a score matrix \(\textbf{O}\) that associates with the types of relation. Then for each sentence S, we can calculate probability of the relation type by a softmax operation as follows:

$$\begin{aligned} p(r_{i}\vert S)=\frac{\exp (\textbf{O}_{i})}{ {\textstyle \sum _{j=1}^{k_1}} \exp (\textbf{O}_{j})}. \end{aligned}$$
(4)

Optimization

Our DSERE model aims to automatically enhance the domain semantics and capture fine-grained features for extracting the relation. The objective function of the proposed model is defined as follows:

$$\begin{aligned} \textbf{J}(\theta )=\sum _{S} \log _{}{} p(r_{i}\vert S,\theta ), \end{aligned}$$
(5)

where \(\theta \) is a parameter set learned by deep neural networks and \(r_i\in R\) represents the real label of relation types. Following previous work [36], we adopt gradient descent algorithm to optimize this objective function. Algorithm 1 summarizes the complete procedure for our DSERE model.

figure a

Experiments

This section presents the extensive experiments on DSERE using a real-world dataset. These experiments aim to verify that the proposed DSERE model can identify railway safety risks through relation extraction. We compare our model with the classical models to show its priority in understanding railway semantics. Additionally, an ablation experiment is presented to verify the effectiveness of the DSET mechanism. We also explore how the fine-grained features contained in the triple knowledge structure improve the proposed model’s performance. The overall performance of our model is validated by comparing with recent derivation models. A case study demonstrates that the proposed DSERE greatly enhances the railway semantics, which helps to classify types of the relation.

Table 3 Details of RMSR dataset

Experimental settings

Dataset. The railway maintenance safety risk (RMSR) dataset is the safety supervision log from the South zone of the China Railway System. RMSR depicts the unsafe production status of track maintenance recorded by safety inspectors. It is important to identify the railway safety risk from the RMSR dataset, since the railway maintenance section is the core department. This will ensure the safe operation of the overall railway system. Introducing relation extraction is a novel and intelligent approach to mining the RMSR dataset, showing its practical values and potential values of extending to other departments.

Six months were spent manually annotating the RMSR dataset, and three master volunteers annotated each sentence. According to Definition 1, Definition 2 and Definition 3, four types of railway safety risk relations and eight types of entities were identified. Detailed information on the RMSR dataset is presented in Table 3. Each training input sentence is truncated if the length exceeds 512 tokens. This preprocessing operation helps to speed up the acquisition of railway semantics since controlling the sequence length reduces the training cost of the DSET mechanism.

Evaluation metrics. Following previous studies [31, 43], the held-out evaluation is used to verify the performance of our proposed model. Specifically, 80% of the RMSR dataset was randomly selected as the train set and 20% as the test set In our experiments, Precision@N (P@N), AUC, F1 score, and precision/recall curves were adopted to verify the effectiveness of DSERE.

Baselines. To better understand the domain semantics especially the Chinese railway semantics, we adopt dynamic language representations to initialize the RMSR dataset. In our experiments, we compare the DSERE with different semantics to verify the effectiveness of enhanced domain semantics. Meanwhile, we choose classical DNN models as competitors to demonstrate the importance of fine-grained features for results of relation classification. DSERE was also compared with recent derivation models to further validate the overall performance of our proposed model.

Language models:

W2V: This is a state-of-the-art toolkit of distributed word vector representations proposed in [27], which learns semantics through the two-layer neural networks. Following previous work [23], we represent it as W2V in the subsequent experiments.

ELMo: This is a state-of-the-art language model based on Bi_LSTM neural networks proposed by Matthew et al. [30], which extracts sequence features in the text to obtain context-dependent semantics.

GPT: This model [32] has an excellent semantic understanding ability for text generation tasks. It extracts semantics through the decoder of Transformer mechanism, and adopts a unidirectional left-to-right encoder.

BERT: Devlin et al. [9] proposed this state-of-the-art language model, which improves the natural understanding ability by the deep bidirectional encoder of transformers. The model fully exploits the contextualized features and has been successfully applied in various natural language understanding tasks, such as machine reading comprehension and natural language inference tasks.

XLNet: The core component of this model [47] is the two-stream attention mechanism that takes advantages of auto-regressive and auto-encoding mechanisms to learn semantics. It employs a permutation operation to capture high-order, long-range dependency for downstream tasks.

Classical DNNs:

Bi_LSTM: This model is proposed in the literature [46] to reduce the reliance on handicraft features. Using bidirectional recurrent neural networks, it captures dependency features between entity pairs and context.

CNN: Instead of using Bi_LSTM to capture structural features of the triple knowledge, CNN is proposed in literature [52] to implement relation classification. This network architecture can extract the lexical and syntactic features from the raw sentence by the convolution operations.

PCNN: This is an improved version of CNN architecture that splits the pooling layer into three segments according to the positions of entity pairs. This operation can fully extract fine-grained features for identifying the types of relation.

Recent derivation models:

SeG: This is a light-weight self-attention neural framework [22], which designs a pooling-equipped gate instead of a selective attention mechanism.

MVC: To avoid the impact of external parsers, Amir et al. [41] proposed to extract features from structure view and semantic view. The mutual information is introduced to encourage these two views consistencies.

FIFRE: This model is proposed in literature [55] to extract relation by a representation iterative fusion mechanism based on the heterogeneous graph.

A-GCN: Avoid being affected by the noise of dependency tree, Tian et al. [37] proposed the attention graph convolutional network for the relation extraction task.

FastRE: This model is proposed to improve the efficiency of relation extraction task [20] by taking advantages of the convolutional encoder and the improved cascade binary tagging framework.

TACNN: Geng et al. [12] integrate the CNN model and target attention mechanism for extracting semantic relation to fully explore the information of word embeddings and position embeddings.

Hyperparameter settings

In maintaining a fair comparison, this study chooses the TensorFlow framework to implement the proposed model and tune hyperparameters. In the DSET module, we adopt a bidirectional Transformer networks that consist of 12-layer Transformer blocks and 12 self-attention heads. The dimension of word embedding (or hidden size) is set to 768. In the PCNN module, we use one convolutional layer and a piece-wise maximum pooling layer, where the window size is set to 3 and the hidden size is set to 230 followed the previous work [51]. The batch size is 256, mainly considering better optimization when computing resources are satisfied. We tune the learning rate in {1e-5,3e-5,1e-4,3e-4,1e-3,3e-3,1e-2}, and show the optimal hyper-parameters in Table 4. We conduct all experiments on a server equipped with Intel(R) Xeon(R) Silver 4216 CPU @ 2.10 GHz on 4 NVIDIA Tesla V100 GPU (32GB for each) and Ubuntu 14.04 operating system.

Table 4 Hyper-parameters settings
Table 5 Performance comparisons of different models
Fig. 4
figure 4

The precision–recall comparisons of domain semantics-enhanced analysis

Fig. 5
figure 5

The precision–recall comparisons of knowledge structural features analysis

Performance analysis

Comparison With the SOTA models

To evaluate the performance of our proposed DSERE, we compare it with competitive models on the RMSR dataset. Table 5 shows the performance of the three metrics, and some comparison results are marked to better demonstrate our interesting findings. The rows in the table are models that combine various semantic features with different knowledge structural features. There are interesting observations as follows:

  • The P@N metric calculates the average precision of top N test data, and the higher values are, the better performance obtained by the model. It can be seen that our DSERE model can always get better results than other models by a large margin especially on P@200 and P@300 metrics, which shows the effectiveness of our model. Although the DSERE model on the value of the P@100 metrics is slightly lower than XLNet+PCNN, it surpasses the XLNet+PCNN on values of P@200, P@300 and average accuracy with the increase in the amount of dataset. This indicates that our proposed model better understands the railway semantics and thus helps to identify the relation between the subject and object of the railway safety risk.

  • It is observed that models based on the enhanced domain semantics outperform other models on the AUC metric. For example, observing the CNN-based models and Bi_LSTM-based models, we find that the DSET+CNN model and the DSET+Bi_LSTM model respectively outperform 6.18% and 5.75% with those optimal models based on same structural features. For the PCNN-based models, our DSERE model achieves the best performance, meaning that enhancing domain semantics enables models to deeply understand the Chinese railway text.

  • The proposed model achieves the 76.00% result on F1 score, which is superior to other models on this metric. The DSERE model achieves 2.05% absolute gain in the F1 score compared with the state-of-the-art model XLNet+PCNN. On one hand, this is because the enhanced domain semantics can bridge the gap between railway semantics and universal semantics, breaking through the challenge problem of neural networks in mining the Chinese railway text. On the other hand, the PCNN module can capture the fine-grained structural features of the triple knowledge simply and effectively, promoting the performance of identifying types of relation.

Furthermore, we use aggregate precision/recall curves to intuitively compare the impact of enhanced domain semantics and universal semantics on the real RMSR dataset and show them in Fig.  4. It is observed that enhanced domain semantics-based models consistently outperform universal semantics-based models, demonstrating that enhancing railway semantics improves the performance of identifying relation. The permutation operation of the XLNet architecture that addresses the data corruption problem still fails to address the insufficient domain semantics, especially the Chinese railway semantics. This means that the proposed model automatically enhances railway semantics and demonstrates its potential to support practical applications in railway systems.

Table 6 Ablation analysis of domain enhanced semantics

Ablation study of DSET mechanism

The DSET mechanism is an important module of our proposed model, that aims at enhancing the domain semantics for relation classification. In this section, we conduct ablation experiments to explore how DSET improves the performance of our DSERE model. Through pre-trained models capture features of a universal language framework, they fail to fully understand the specific domain semantics, especially Chinese railway semantics is hidden in a scarce corpus. As Table 6 shows the ablation results of DSERE model on the real-word dataset, we can easily see that the DSET mechanism significantly improves the performances of our proposed model. For example, on the Mean of the P@N metric, the DSERE model increases by 2.43% than the DSERE model without the N-Grams mechanism, which is significant in the practical identification of the railway safety risk. In addition, we observe that without DSET mechanism, our model degenerates BERT+PCNN model only considering universal semantics and fine-grained structural features of the triple knowledge. The performance of the DSERE model without the N-Grams mechanism is superior to the DSERE model without DSET, since the plain transformer can capture more features for the final classification. However, both of these two compared models fail to enhance the domain semantics and show the limitations of neural networks in understanding the Chinese railway text. Our DSET mechanism can fully learn the railway semantics for identifying the types of relation in the practical scenes of China railway system.

Table 7 Performance comparisons of recent derivation models

Fine-grained feature analysis

Three classical neural networks were compared based on the same semantics to verify the effectiveness of fine-grained feature extraction, as shown in Fig. 5. This figure shows that PCNN-based models consistently outperform Bi_LSTM and CNN-based models, consistent with the values in Table 5. This phenomenon verifies that capturing fine-grained features can improve the performance of relation classification, since the entity pair splits the input sentence into three segments and each segment contributes a lot to the nearest entity. The unique contribution of each segment may decrease if we use only a max-pooling operation on the whole sentence. Furthermore, the PCNN module is a convenient and efficient network architecture that captures fine-grained features through the piece-wise pooling operation without introducing extra parameters and complex network layers. This convenience enables the DSERE model to be more practical in real scenes of the China railway systems.

Comparison with the recent derivation models

The comparison results with recent competitive models are shown in the Table 7. We summarize the semantics, solution, metrics of AUC and F1 scores to evaluate the overall performance. The performance of competitors is inferior to our proposed model, because the compared models only use the universal semantics but ignore the domain semantics. This reveals that only capturing universal semantics and then constructing complex networks of relation extraction networks may be ineffective in addressing the challenges of identifying railway safety risk. In contrast, our proposed model can fully explore and utilize the railway semantics for performance improvement. It is also worth mentioning that our model can capture meaningful fine-grained features with a convenient and effective PCNN module.

In our experiments, we also test other configurations to set optimal coefficients. When we change the structure of the DSET mechanism to larger networks with 24-layer Transformer blocks, 1024 hidden units and 16 self-attention heads, the performance of model has no gain in the metrics (in some cases even declines at least 3%). This is because the domain semantics from the railway lexicon overfit in the deep networks, resulting in obtained semantics being close to the universal semantics. For the PCNN architecture, we try to set the window size as 1 and 5; however, this operation does not improve the experimental results. This is because the convolutional operation fails to capture temporal and syntactic features of input sentence with the window size 1. When window size is 5, the convolutional operation cannot effectively obtain phrase features, since the length of the railway phrase is usually 3. Therefore, we present the DSERE architecture as the final model for identifying the railway safety risk.

Table 8 Examples from the RMSR test set for comparison

Case study

To further compare the effectiveness of our proposed DSERE model, we randomly select three samples from test set of RMSR to report some micro-level case studies in Table 8. Specifically, we show each sample in three lines, where the top line is the ground truth, the second line is the result of compared model and the last line is the result of our proposed model. Moreover, we use the “[]”symbol with a subscript to express the entity pairs and their relation types. Inspecting these cases, we have the following observations: (1) Pre-trained model BERT is trained on the open domain corpus that lacks railway words, e.g., “tamping rod”, “switch rail”, and thus fails to understand the semantics of domain words. Then the BERT+Bi_LSTM model mistakes the relation type that influenced by the dependency of the word “artificially”, and incorrectly identifies the relation types of \(S_1\) as \(R_1\) representing the relation of human safety risk. (2) The PCNN module can capture fine-grained structural features to improve the performance of identifying the relation. For example, in the instance \(S_3\), the ELMo+CNN recognizes the relation between “Mr. He” and “absented from the business examination” as \(R_1\) (the relation of human safety risk). At the same time, the fine-grained features from “employee educational ledger” indicate the ground truth is \(R_4\) means the relation of management safety risk. From the above analysis, our proposed DSERE model can correctly identify railway safety risk based on enhanced domain semantics and fine-grained features of knowledge structure.

Discussion

Strengths of the proposed method

This section mainly discusses the strengths of our proposed method from four aspects.

  • This paper uses relation extraction to explore a new and uncultivated area to identify the railway safety risk. The method projects the railway safety risk into a knowledge interconnection space, supporting the incident/accident inference in practical applications.

  • Our proposed DSERE model can bridge the gap between the railway and universal semantics, then obtain better representations.

  • This study captures fine-grained features by a simple and effective PCNN architecture instead of complex encoder networks.

  • The extensive experimental results show that our DSERE model is more suitable for identifying railway safety risk in the practical scenario. The promising performance indicates that the proposed model has potential to be a convenient and intelligent assistant for the knowledge-driven management of railway safety risk.

Weakness of the proposed method

Two limitations of the proposed method are still worth solving. First, the ambiguity of unimodal semantics may occur with a small probability that would affect the model’s understanding of the input sentence. A multi-modal semantic enhanced relation extraction is worth exploring in future work. Second, the design of the proposed model is only for the railway scenario, in which the domain semantics cannot support understanding the other transportation scenarios, such as road transport, air transport and water transport. For example, roads are the key to maintenance in road transport [44], but the semantics of railway (e.g., rail) cannot be directly applied to road scenarios. In the future, we will focus more on identifying safety risk in various transportation scenarios.

Conclusion

In this paper, we introduce the triple knowledge of knowledge graph to model the railway safety risk with a knowledge interconnection mode, and recast its identification as the relation extraction task. Afterward, a novel and effective DSERE model is proposed to extract the relation between the subject and object of the railway safety risk. Instead of twice pre-trained models, a simple and effective domain semantic enhanced transformer mechanism is designed to bridge the semantic gap between Chinese railway and universal domains. Furthermore, our DSERE model capture the fine-grained features contained in the structure of triple knowledge through a convenient piece-wise pooling operation. Experimental results on the real-world dataset verify the effectiveness of our proposed model and the potential to be an intelligent assistant for the knowledge-driven management of railway safety risk.

In the future, we will concentrate on solving the two limitations that discussed in the subsection of Discussion part. In addressing the first limitation, we plan to integrate image semantics of entity pairs to alleviate the ambiguity of natural language. While for the second limitation, we plan to investigate the similarities and differences between the railway and other transportation scenarios, and then reshape the domain lexicon to improve the transferability of the DSERE model.