1 Introduction

The world is making a major push towards a new industrial revolution aimed at effectively automating production systems, exemplified by Industry 4.0 [1]. The Internet of Things (IoT) is a network of linked intelligent devices, and the Industrial Internet of Things (IIoT) is a subtype of IoT. IIoT application environments typically integrate a variety of technologies, such as embedded devices, cloud computing, and machine-to-machine communication, in a closed industrial environment [2], as shown in Fig. 1. The number of IoT-connected devices is growing rapidly, with recent estimates suggesting that there will be around 70 billion internet-connected devices by 2025. IoT links all ‘Things’ to the internet, enabling them to gather data through sensors and to assign tasks to the devices to which they are linked [3]. IoT devices mostly consist of processors, sensors, actuators, and communication hardware, and can process and transmit the data they obtain [4]. Today, smart factories are combining industrial artificial intelligence (IAI) with IIoT for purposes such as data-driven predictive analytics and collaborative modeling in industrial production [5], but challenges remain.

Fig. 1

The general architecture of Industrial Internet of Things (IIoT). In IIoT, local data processing is implemented by edge computing and cloud-based servers, which in turn optimize industrial processes and provide new industrial services

Combining IAI and IIoT raises several issues. For example, in real industrial environments, data are not centralized, and gathering large amounts of data in one place may create serious privacy threats, such as malicious eavesdropping or even malicious attacks, which undermine security during the training of IAI models [6, 7]. Furthermore, real industrial systems contain many nodes and datasets at once, which makes traditional deployments difficult to manage effectively [8, 9]. Therefore, an effective learning framework is needed to support the combination of IAI and IIoT.

Different edge devices in industrial applications generate skewed local data distributions, also known as non-independent and identically distributed (non-i.i.d.) data. Because the data collected at each endpoint do not follow a common distribution, non-i.i.d. characteristics can severely impair the performance and convergence rate of trained neural networks. To achieve performance comparable to centralized solutions, the traditional approach updates the model more frequently by increasing the number of communication rounds, yet the global model still suffers from degraded overall performance and lower accuracy [10]. In particular, complex temporal dependencies between different events and terminals lead to heterogeneity in continuously collected IIoT data [11]. Statistical heterogeneity exists not only in the distribution of the collected data but also in the specific circumstances of different nodes, which results in node heterogeneity. In real IIoT applications, different nodes are subject to different types of cyber attacks, which further induces node heterogeneity and makes the global model less accurate. IIoT security is therefore important, as IIoT is vulnerable to attacks at all layers of its architecture [12]. However, IIoT devices often have limited power, storage capacity, and computing power, which makes IIoT security difficult to guarantee [13]. At the same time, the relatively independent and highly heterogeneous nature of IIoT devices makes it difficult to construct security guarantees [14]. Data security is therefore one of the core concerns of IIoT [15].

Recently, federated learning (FL), which integrates IAI with IIoT, has become a promising approach for the development of smart factories (e.g., IIoT data offloading and caching, and IIoT mobile crowdsensing) [16]. Rather than requiring all endpoints to upload raw data, FL constructs a global model by aggregating sub-models trained on each endpoint's individual dataset [17]. FL aims to involve every endpoint in building the model while preventing leakage of private data [18], thus benefiting IIoT in terms of data privacy, communication cost between devices, and so on [19]. FL was originally designed for edge devices: the endpoints collaborate to train models without exchanging local raw data, and FL then combines the sub-models trained on the endpoints into a global model. McMahan et al. [20] proposed federated averaging (FedAvg), a standard algorithm in which each terminal obtains a local model by stochastic gradient descent (SGD) and the weight parameters of the local models are then averaged. The global model is a weighted average in which the weight of each terminal is proportional to the size of its local dataset. Wang et al. [21] proposed the FedMA algorithm, which matches hidden units with similar feature extraction signatures and averages them to create a shared global model. Li et al. [22] proposed the FedProx algorithm, a generalization and re-parameterization of FedAvg that adds a proximal term to the local objective. The proximal term is computed from the \(l_2\)-norm distance between the weight parameters of the current global model and those of the local model. FedProx therefore works better with non-i.i.d. data, because the proximal term limits how far each local update can drift from the global model. Karimireddy et al. [23] proposed the SCAFFOLD algorithm, an improvement on FedAvg that uses control variates to correct for ‘client drift’ in local updates. SCAFFOLD has also been shown to require significantly fewer communication rounds and to be unaffected by data heterogeneity or client sampling. Chen et al. [24] proposed an asynchronous federated learning model with a lightweight node selection algorithm and demonstrated empirically that the method performs well with both independent and identically distributed (i.i.d.) and non-i.i.d. data. Ouyang et al. [25] proposed ClusterFL, a clustered multi-task federated learning framework that automatically captures the intrinsic clustering relationships among data from different nodes and improves convergence speed and model accuracy by dropping the slower-converging nodes in each cluster.

Researchers have proposed a range of algorithms to detect cyber attacks in IoT systems [7], and deep learning frameworks have been used to improve their intrusion detection performance. Ioannou et al. [26] proposed using support vector machine (SVM) models to identify cyber attacks and to perform multi-class classification of them; the proposed C-SVM detection model achieved high classification accuracy. Zhang et al. [27] proposed an intrusion detection algorithm that combines a deep belief network (DBN) with a genetic algorithm (GA), improving the detection accuracy of DBN intrusion detection models by adaptively generating the number of hidden layers and neurons over several iterations. Li et al. [28] proposed a fused multi-convolutional neural network approach for detecting anomalies in IoT; evaluated on an IoT dataset, it classified attack types with high accuracy. Hassan et al. [29] proposed an algorithm that uses convolutional neural networks (CNNs) to extract useful features from the time-series data of intrusion detection systems (IDSs) and a weight-decreasing long short-term memory (WDLSTM) network to maintain the long-term dependencies of the extracted features; the superiority of this algorithm was validated experimentally. Xin et al. [30] proposed S-TCN, an improved multi-class classification network based on the temporal convolutional network (TCN), and experimentally demonstrated its effectiveness on temporally heterogeneous data in IoT scenarios. However, none of the above algorithms provides an effective way to handle the node-heterogeneous time-series data found in real IIoT; they are poorly suited to high-dimensional time-series data and have not been validated experimentally on node-heterogeneous time-series data from real IIoT.

Many researchers have worked on improving IIoT security, and one of the most classical approaches is to detect attacks through classification. Ge et al. [31] proposed an IoT cyber attack detection model that uses feed-forward deep neural networks to classify various cyber attacks; in experiments, its accuracy exceeded that of traditional support vector machine (SVM)-based intrusion detection networks. Aamir et al. [32] proposed a semi-supervised intrusion detection model based on principal component analysis and random forest clustering techniques, mainly used to identify DDoS attacks. Hara et al. [33] proposed a semi-supervised, autoencoder-based intrusion detection model; simulation experiments on an IIoT dataset showed it to be more accurate than an intrusion detection model based on deep neural networks. McDermott et al. [34] proposed a bidirectional long short-term memory (LSTM) framework to identify botnets in data and verified its high accuracy experimentally. Pacheco et al. [35] proposed a machine learning (ML)-based intrusion detection system with high accuracy for cyber attack intrusion detection in IIoT. However, the above approaches do not consider that different nodes suffer different attacks at different levels. This can cause the distributions of the nodes' local data to differ, and the resulting heterogeneity of node data lowers the accuracy of the trained model.

In this study, we propose federated sequential learning (FSL), an attack detection approach for IIoT networks built on the FL framework. FSL is a TCN-based algorithm for detecting and classifying attacks in time-series industrial datasets with node heterogeneity, and it effectively extracts temporal features from time-series IIoT data. Unlike the contemporary algorithms reviewed above, FSL is designed to handle both temporal features and node heterogeneity. Our main contributions are as follows:

  • We propose the FSL framework for detecting cyber attacks in real-world IIoT. FSL extracts time-series features from sensory signals in real industrial applications while reducing the degradation of model accuracy caused by node heterogeneity.

  • We propose a TCN-based local model to characterize a single client’s time-series dataset and extract time-series features from the one-dimensional signals to improve the model’s classification accuracy against cyber attacks.

  • We propose a FedProx-based federated strategy to handle the low global model accuracy problem due to node heterogeneity. FSL trains sub-models on each endpoint and performs model aggregation with the uploaded sub-models to generate a global model.

  • We built a physical testbed to simulate an industrial environment for our experiments. Raspberry Pi devices serve as endpoints that process both traditional and IIoT datasets, and the good experimental results demonstrate the feasibility of our proposed approach.

The remainder of the paper is organized as follows. Sect. 2 presents the problem we address: IIoT is vulnerable to cyber attacks, which can introduce anomalies into the data and reduce the efficiency of industrial production. Sect. 3 describes the main principles of FSL. Sect. 4 presents experiments showing that the proposed framework is effective for IIoT security. Sect. 5 discusses the experiments and the test platform and outlines directions for future work. Sect. 6 concludes the paper with a summary and comments on potential research areas.

Fig. 2

Cyber attacks towards IIoT. Different endpoints communicating through the communication layer to the cloud server are vulnerable to cyber attacks which can compromise information and cause serious consequences

2 Problem formulation

In this section, we discuss the problem of cyber attack detection in current IIoT systems. When the FL framework is applied in IIoT, IoT applications, communication layers, and cloud servers are all vulnerable to attacks. A real-world IIoT deployment may suffer from a lack of data centralization, poor device scalability, and cyber attacks. In particular, IIoT is vulnerable to cyber attacks that reduce productivity, as shown in Fig. 2. In a realistic IIoT scenario, different nodes may be subject to different types and degrees of cyber attack. These attacks can change the distributions of the transmitted data and make the data distributions heterogeneous across nodes, as shown in Fig. 3. We therefore need to process the time-series data collected by the individual terminals and build a cyber attack detection model. The detection model determines whether the collected industrial data have been subjected to a cyber attack and, if a terminal is under attack, also classifies the type of attack.

Fig. 3

Statistical heterogeneity on distributed nodes. Node heterogeneity exists because different sensors are in different locations and process different objects, and therefore collect data with different distributions

Suppose there are I devices. Each device \(device_i\) holds a private dataset \(\mathcal {D}_i=\left\{ \mathcal {X}_i, \mathcal {Y}_i\right\}\) that is never uploaded, where \(\mathcal {X}_i\) is a one-dimensional time-series dataset and \(\mathcal {Y}_i\) contains the corresponding labels. The data in each \(\mathcal {D}_i\) are used for local supervised model training, which minimizes the loss function:

$$\begin{aligned} \min _{W_i} L_s\left( \mathcal {D}_i, \mathcal {M}_i, W_i\right) , \end{aligned}$$
(1)

where \(L_s\) is the loss function for the model task, whose arguments are the labeled training data \({\mathcal{D}}_i\), the model structure \({\mathcal{M}}_i\) and the weight parameters \(W_i\).

The FL framework trains a cyber attack classification model that minimizes the classification loss over all I devices. The model determines whether the collected data are anomalous and, for anomalous data, classifies the type of attack. The federated objective, in which each device's loss is weighted by its share of the total data and W denotes the weights shared by the global model, can be written as

$$\begin{aligned} \min _{W} \sum _{i=1}^I \frac{|\mathcal {D}_i|}{\left| \bigcup _{j=1}^{I}\mathcal {D}_j\right| } L_s\left( \mathcal {D}_i, \mathcal {M}_i, W\right) . \end{aligned}$$
(2)
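Equation (2) weights each device's loss by that device's share of the data. A minimal Python sketch of this weighting, assuming the local datasets are disjoint so that \(|\bigcup_j \mathcal{D}_j| = \sum_j |\mathcal{D}_j|\); the function name, loss values, and dataset sizes are hypothetical.

```python
from typing import Sequence


def federated_objective(local_losses: Sequence[float], sizes: Sequence[int]) -> float:
    """Weighted sum of per-device losses with weights |D_i| / sum_j |D_j| (Eq. 2)."""
    if len(local_losses) != len(sizes):
        raise ValueError("one loss value is required per device")
    total = sum(sizes)  # equals |union of D_j| only if the local datasets are disjoint
    return sum(n / total * loss for loss, n in zip(local_losses, sizes))


# Hypothetical example: four clients holding unequal amounts of data.
losses = [0.40, 0.25, 0.10, 0.55]
sizes = [1000, 3000, 2000, 4000]
print(round(federated_objective(losses, sizes), 4))  # 0.355
```

Devices with more data dominate the objective, which is also why the same proportional weights are commonly reused when the server averages model parameters.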

Cyber attack detection in IIoT can be implemented as a multi-class classification of the collected data: the model must decide whether the data are normal and, if not, which type of cyber attack they reflect. We use probability-based classification for this purpose. The output of the final layer of the model is passed through a softmax function, which converts the outputs into a probability distribution over the classes, and the class with the highest probability is taken as the type of cyber attack detected by the model. For the ith device, the classification result can be described as follows:

$$\begin{aligned} \begin{aligned} \text{IIoT}\;\text{Status}&=\mathop {\arg \max }_{k}\, P\left( Y=c_k \mid X=x\right) \\&=\mathop {\arg \max }_{k}\, \frac{p\left( X=x \mid Y=c_k\right) P\left( Y=c_k\right) }{\sum _j p\left( X=x \mid Y=c_j\right) P\left( Y=c_j\right) }, \end{aligned} \end{aligned}$$
(3)

where \(c_k\) denotes the kth class (\(c_0\) indicates the normal state and the remaining classes are the different types of cyber attack), and the posterior \(P\left( Y=c_k \mid X=x\right)\) is estimated by the softmax output of the model.

Fig. 4

IIoT architecture based on federated learning. Different edge servers are responsible for collecting local data from different endpoints and training a local model, uploading the local model to the cloud server and obtaining a global model through the model aggregation

3 Methodology

In this section, we propose the FSL framework to address two IIoT challenges: extracting time-series features and handling data heterogeneity across nodes. A generic IIoT architecture based on federated learning is shown in Fig. 4. Different types of edge industrial devices train local models on their local datasets and upload the model updates to the cloud server, which produces a global model. The global model is then sent back to each edge industrial device to update its local model.

3.1 Local time-series modeling

To construct the distributed local models, we use a TCN, which efficiently applies convolutions along the time axis to extract features of the time-series signal across time steps. In a one-dimensional TCN, the output of each layer feeds into the input of the following layer. To help the TCN extract temporal features from one-dimensional time-series signals, we use dilated convolutions. Dilated convolutions enlarge the receptive field so that each output point is computed from a large portion of the input time series, capturing global information about the whole sequence. The dilation factor d is the spacing between the time steps that a filter reads from its input. It differs from layer to layer, increasing with network depth, and suitable values depend on the frequency at which the sensor samples the signal. Larger dilation factors therefore extend the receptive field.
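To make the effect of the dilation factor concrete: for a stack of causal dilated convolutions with one convolution per layer and filter size k, the receptive field is \(R = 1 + (k-1)\sum_l d_l\). The sketch below evaluates this with the dilation factors used in our TCN, \(d=[1,2,4]\). The filter size is not stated in the text, so k = 3 is an assumption, and residual TCN blocks with two convolutions per level would double the sum.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in time steps) of stacked causal dilated convolutions,
    assuming exactly one convolution per dilation factor."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Dilation factors from Sect. 3.1; kernel size 3 is a hypothetical choice.
print(receptive_field(3, [1, 2, 4]))  # 15
print(receptive_field(3, [1, 1, 1]))  # 7 (same depth without dilation)
```

With the same number of layers, doubling the dilation at each layer roughly doubles the reach of each additional layer, whereas undilated layers grow the receptive field only linearly.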

The TCN used in our study has four layers with dilation factors \(d=[1,2,4]\). The output \(X_l\) of each layer is given by the following equation:

$$\begin{aligned} X_l= \,& {} \text {ReLU} \left( \sum _{j=0}^{k-1} w^l_i(j)\, x_{s-d \cdot j} \right) , \end{aligned}$$
(4)
$$\begin{aligned} w_{i}^{(t)}= \,& {} w_{i}^{(t-1)} - \eta \sum \frac{\partial L}{\partial w_{i}^{(t-1)}}, \end{aligned}$$
(5)

where d denotes the dilation factor, l denotes the layer index, \(w^l_i(j)\) denotes the jth tap of the lth-layer filter on the ith device, j indexes the filter taps, and k denotes the filter size. \(x_{s-d \cdot j}\) denotes the signal from the previous layer at time step \(s-d\cdot j\), so the convolution only looks into the past from the current time step s. L denotes the loss function, t denotes the training iteration, and \(\eta\) denotes the learning rate. The weights \(w_i\) of the ith device consist of all \(w^l_i\); when communication takes place, each device uploads the \(w_i\) of its local model.
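A minimal pure-Python sketch of Eq. (4) for a single input channel follows. Left zero-padding (so that each output depends only on current and past inputs) and the function names are assumptions; a practical TCN would use multiple channels, residual connections, and a deep learning library.

```python
from typing import List, Sequence, Tuple


def dilated_causal_conv(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """One layer of Eq. (4): y[s] = ReLU(sum_j w[j] * x[s - d*j]).

    Time steps before the start of the sequence are treated as zeros
    (left zero-padding), so the output has the same length as x and
    never uses future samples.
    """
    out = []
    for s in range(len(x)):
        acc = 0.0
        for j in range(len(w)):
            t = s - d * j  # look d*j steps into the past
            if t >= 0:
                acc += w[j] * x[t]
        out.append(max(0.0, acc))  # ReLU
    return out


def tcn_forward(x: Sequence[float], layers: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Apply (filter, dilation) pairs in order; each layer's output feeds the next."""
    y = list(x)
    for w, d in layers:
        y = dilated_causal_conv(y, w, d)
    return y


print(dilated_causal_conv([1, 2, 3, 4, 5], [1, 1], 2))  # [1.0, 2.0, 4.0, 6.0, 8.0]
```

Feeding a unit impulse through two layers with dilations 1 and 2 and filter size 2 spreads it over \(1 + 1\cdot(1+2) = 4\) output steps, matching the receptive-field formula for one convolution per layer.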

3.2 Heterogeneous node-federated modeling

When local model training is completed, each client uploads the weight parameters of its locally trained model to the cloud, where the server aggregates them. Because different clients experience different types and levels of cyber attacks, the distributions of their collected data differ, which causes node heterogeneity across clients.
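For simulation studies, node heterogeneity of this kind is often emulated by giving each client label proportions drawn from a Dirichlet distribution. The sketch below is one such simulation and is not the procedure used in this paper, whose heterogeneity comes from the testbed data; the function name and the concentration parameter `alpha` are hypothetical. Smaller `alpha` produces more skewed (more non-i.i.d.) clients.

```python
import random
from typing import Dict, List, Sequence


def dirichlet_label_skew(labels: Sequence[int], num_clients: int, alpha: float,
                         seed: int = 0) -> List[List[int]]:
    """Partition sample indices across clients with per-class proportions
    drawn from Dirichlet(alpha, ..., alpha)."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        cum, start = 0.0, 0
        for c in range(num_clients):
            cum += g[c] / total
            # Cumulative cut points; the last client takes the remainder.
            end = len(idxs) if c == num_clients - 1 else int(round(cum * len(idxs)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


labels = [i % 7 for i in range(700)]  # 7 balanced classes
parts = dirichlet_label_skew(labels, num_clients=4, alpha=0.3, seed=1)
for c, part in enumerate(parts):
    print(c, [sum(labels[i] == k for i in part) for k in range(7)])
```

Each index is assigned to exactly one client, so the union of the client datasets is the original dataset, as assumed in Eq. (2).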

To cope with node heterogeneity, we use FedProx for model aggregation, which addresses differences in communication and computing power between devices as well as the non-i.i.d. nature of the data across devices. FedProx generalizes FedAvg by adding a proximal term to the local objective. This term penalizes the squared distance between the local weights and the global weights of the previous round, so that the local update does not deviate too far from the global model. The resulting objective \(h_i\) to be optimized on the ith device is:

$$\begin{aligned} \min _{w_i} h_i\left( w_i; w_\text{global}\right) =L_i(w_i)+\frac{\mu }{2}\left\| w_i-w_\text{global}\right\| ^2, \end{aligned}$$
(6)

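where \(w_\text{global}\) and \(w_i\) are the global model and local model, respectively, \(L_i(\cdot )\) is the loss function for the ith device, and \(\mu \ge 0\) is the proximal coefficient that controls how strongly the local model is pulled toward the global model; setting \(\mu = 0\) recovers the local objective of FedAvg.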
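In gradient form, the proximal term of Eq. (6) adds \(\mu (w_i - w_\text{global})\) to the local gradient. The pure-Python sketch below shows this on a toy one-parameter quadratic loss, a hypothetical stand-in for the TCN loss; the function names, learning rate, and \(\mu\) values are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def fedprox_objective(loss: Callable[[Sequence[float]], float],
                      w_local: Sequence[float], w_global: Sequence[float],
                      mu: float) -> float:
    """h_i(w_i; w_global) = L_i(w_i) + (mu / 2) * ||w_i - w_global||^2  (Eq. 6)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(w_local, w_global))
    return loss(w_local) + 0.5 * mu * sq_dist


def fedprox_step(grad: Callable[[Sequence[float]], Vector],
                 w_local: Sequence[float], w_global: Sequence[float],
                 mu: float, lr: float) -> Vector:
    """One gradient step on h_i; the proximal gradient is mu * (w_i - w_global)."""
    g = grad(w_local)
    return [w - lr * (gi + mu * (w - wg)) for w, gi, wg in zip(w_local, g, w_global)]


# Toy local loss with its minimum at w = 3 (hypothetical).
def toy_loss(w: Sequence[float]) -> float:
    return (w[0] - 3.0) ** 2


def toy_grad(w: Sequence[float]) -> Vector:
    return [2.0 * (w[0] - 3.0)]


for mu in (0.0, 1.0):
    w = [0.0]
    for _ in range(200):
        w = fedprox_step(toy_grad, w, [0.0], mu=mu, lr=0.1)
    print(mu, round(w[0], 4))  # mu=0 -> 3.0; mu=1 -> 2.0 (pulled toward w_global=0)
```

With \(\mu = 1\) the local optimum moves from 3 to 2, the minimizer of \((w-3)^2 + \tfrac{1}{2}w^2\); this is the mechanism that keeps heterogeneous clients from drifting too far from the global model.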

Fig. 5

Federated sequential learning architecture. TCN is used to model the local time-series data under the FL framework. The model parameters are obtained by training the model on each endpoint's local data. The model parameters are then sent to the cloud server, which uses FedProx to aggregate the model parameters to create a global model

Algorithm 1

3.3 Federated sequential learning

To perform attack detection on time-series datasets from real industrial data, we use a TCN model as the local model in the FL framework. We use the TCN to train the local weight parameters \(w_i\) on each local dataset \(D_i\) and then update the global model \(M_\text{global}\) using the FedProx algorithm. This process is repeated until convergence, as shown in Algorithm 1. The overall FSL approach is illustrated in Fig. 5.
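The training loop described above can be sketched as follows under stated assumptions: a toy two-parameter linear model stands in for the TCN, each client runs full-batch gradient descent on its local FedProx objective, and the server averages the uploaded weights in proportion to data size. `local_epochs=2` and `rounds=50` mirror the settings in Sect. 4.3; `mu`, `lr`, the client data, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair of a toy 1-D regression task


def local_train(w_global: Sequence[float], data: Sequence[Sample],
                mu: float, lr: float, epochs: int) -> List[float]:
    """Client side: gradient descent on MSE plus the FedProx proximal term."""
    a, b = w_global
    n = len(data)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in data:
            err = a * x + b - y
            ga += 2.0 * err * x / n
            gb += 2.0 * err / n
        # Gradient of (mu/2)*||w - w_global||^2 pulls weights toward the global model.
        ga += mu * (a - w_global[0])
        gb += mu * (b - w_global[1])
        a, b = a - lr * ga, b - lr * gb
    return [a, b]


def aggregate(models: Sequence[Sequence[float]], sizes: Sequence[int]) -> List[float]:
    """Server side: average uploaded weights, weighting each client by data size."""
    total = sum(sizes)
    return [sum(m[p] * s for m, s in zip(models, sizes)) / total
            for p in range(len(models[0]))]


def fsl_train(clients: Sequence[Sequence[Sample]], rounds: int = 50,
              local_epochs: int = 2, mu: float = 0.01, lr: float = 0.1) -> List[float]:
    w_global = [0.0, 0.0]
    for _ in range(rounds):
        # Every client starts from the current global model.
        local_models = [local_train(w_global, d, mu, lr, local_epochs) for d in clients]
        w_global = aggregate(local_models, [len(d) for d in clients])
    return w_global


def mse(w: Sequence[float], data: Sequence[Sample]) -> float:
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)


# Hypothetical heterogeneous clients: each sees a different range of x,
# drawn from the same underlying relation y = 2x + 1.
clients = [[(x, 2.0 * x + 1.0) for x in (c / 4 + i / 80 for i in range(20))]
           for c in range(4)]
w = fsl_train(clients)
all_data = [s for d in clients for s in d]
print(mse([0.0, 0.0], all_data), mse(w, all_data))
```

Only model weights cross the client/server boundary in this loop; the raw `(x, y)` samples stay inside `local_train`, which is the privacy property FL relies on.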

4 Experiments

In this section, we conduct a series of experiments on our self-built testbed to evaluate the performance of the proposed FSL algorithm on three datasets: MNIST, Bearing Defect Detection (BDD) and Edge-IIoTset. We focus on the effectiveness of FSL after training on these datasets, which indicates whether FSL is more advantageous when training on time-series data with node heterogeneity. To approximate a real industrial environment, we conducted the experiments on a physical testbed, which makes the results more convincing.

Fig. 6

Testbed. Our testbed consists of a server and clients, with the laptop acting as the server and the Raspberry Pis acting as the clients

4.1 Testbed setups

Our testbed consists of a server and four clients, as shown in Fig. 6; Fig. 7 shows a diagram of the testbed. The server is a laptop with a 12th generation Intel Core i7-1260P and NVIDIA GeForce MX550 discrete graphics (2 GB GDDR6), and the clients are four Raspberry Pi 4B boards, each with a quad-core ARM Cortex-A72 CPU (1.5 GHz) and a 500 MHz VideoCore VI GPU. The server and clients communicate over a wireless local area network (WLAN) set up in the lab.

Fig. 7

Diagram of the testbed. The testbed consists of four Raspberry Pi 4Bs and a laptop computer communicating via a wireless local area network

4.2 Dataset

MNIST [36] and Bearings Defect Detection [37] are typical datasets for validating the accuracy of deep learning networks, while Edge-IIoTset [38] is an IIoT dataset that includes cyber attacks. Detailed information on the datasets can be found in Table 1.

Table 1 Detailed information about the data sets

MNIST

MNIST is a greyscale image dataset of handwritten digits 0–9, with training samples from approximately 250 writers; all images are 28\(\times\)28 pixels. It contains 70,000 samples, split into training and test sets at a 6:1 ratio. Each sample is flattened into a one-dimensional sequence of length 28\(\times\)28 = 784.

Bearings defect detection (BDD)

Each sample in the bearing defect detection dataset contains 6000 time steps. The bearing condition is labeled with the numbers 0–9, where 0 indicates the normal condition. There are three types of bearing faults: two faults in the bearing rings and one fault in the bearing balls. Each fault type occurs at three different diameters, giving nine fault conditions.

Edge-IIoTset

Edge-IIoTset, generated by Ferrag et al. [38], is a real-world cybersecurity dataset for IoT and IIoT applications. It was created on a dedicated IIoT testbed with a large number of representative devices, sensors, protocols, and cloud/edge configurations, and it contains both normal data and cyber attacks. To assess the viability of our approach, we selected 40,000 normal and 33,000 anomalous samples and split them into training, testing, and validation sets at a 7:2:1 ratio. The Edge-IIoTset data types include:

  • Normal: Data that have not been subjected to a cyber attack and therefore contain no attack-induced outliers.

  • DDoS: A distributed denial-of-service attack floods the target with a large number of requests, causing data anomalies and preventing the target server or network resources from working properly.

  • Vulnerability scanning attack (VSA): This type of attack is typically conducted by an attacker looking for potential network entry points.

  • SQL injection (SI): This type of attack exploits a security flaw in the application communicating with the database, thereby corrupting the dataset.

  • Uploading attack (UA): This type of attack involves uploading malicious program files to a web server, gaining administrative privileges and thus conducting a cyber attack on the data.

  • Backdoor attack (BA): This type of attack exploits a system vulnerability to give the attacker unauthorized remote access to an infected IoT device.

  • Password breaching attack (PBA): This type of attack cracks passwords by trying successive combinations in order to gain access to the data.
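The preprocessing arithmetic in this subsection can be checked with a short sketch: flattening a 28\(\times\)28 MNIST image into a length-784 sequence, and the split sizes implied by the 6:1 MNIST ratio and the 7:2:1 Edge-IIoTset ratio. Row-major flattening is an assumption (the text only states that samples are made one-dimensional), and the helper names are hypothetical.

```python
from typing import List, Sequence


def flatten_image(img: Sequence[Sequence[float]]) -> List[float]:
    """Row-major flattening of a 2-D image into a 1-D sequence."""
    return [px for row in img for px in row]


def split_sizes(total: int, ratio: Sequence[int]) -> List[int]:
    """Integer split sizes for a ratio such as 6:1 or 7:2:1;
    the last split absorbs any rounding remainder."""
    parts = sum(ratio)
    sizes = [total * r // parts for r in ratio[:-1]]
    sizes.append(total - sum(sizes))
    return sizes


img = [[r * 28 + c for c in range(28)] for r in range(28)]
seq = flatten_image(img)
print(len(seq), seq[28])                            # 784 28 (second row starts at index 28)
print(split_sizes(70_000, [6, 1]))                  # [60000, 10000]
print(split_sizes(40_000 + 33_000, [7, 2, 1]))      # [51100, 14600, 7300]
```

Flattening keeps every pixel but turns neighbouring rows into distant positions in the sequence, which is relevant to the MNIST results discussed in Sect. 5.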

4.3 Comparative models

To evaluate the local model objectively, we compare FSL with classic local models, namely a CNN model and an LSTM model trained under the same conditions. The CNN model is a four-layer 1D convolutional network with two pooling layers, two fully connected layers and a softmax output layer; the LSTM model has four hidden layers followed by two fully connected layers.

In the FL framework, we set the number of local training rounds to 2. After each client completes local training, it communicates with the server to upload its weight parameters for global model aggregation; we set the number of communication rounds to 50.

4.4 Evaluation metrics

We employ accuracy, precision, recall, and F1 score to evaluate the performance of the CNN, LSTM and FSL models on the test data of MNIST, BDD and Edge-IIoTset. The definitions are as follows:

$$\begin{aligned} \text {Accuracy }= \,& {} \frac{\text{TP}+\text{TN}}{\text{TP}+\text{FP}+\text{TN}+\text{FN}}, \end{aligned}$$
(7)
$$\begin{aligned} \text {Precision }= \,& {} \frac{\text{TP}}{\text{TP}+\text{FP}}, \end{aligned}$$
(8)
$$\begin{aligned} \text { Recall }= \,& {} \frac{\text{TP}}{\text{TP}+\text{FN}}, \end{aligned}$$
(9)
$$\begin{aligned} \text {F1}\, \text {score}\,= \,& {} 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}}, \end{aligned}$$
(10)

where true positives (TP) are positive samples that the model correctly identifies as positive; true negatives (TN) are negative samples that the model correctly identifies as negative; false positives (FP) are negative samples that the model incorrectly identifies as positive; and false negatives (FN) are positive samples that the model incorrectly identifies as negative. TP + FP is therefore the total number of samples predicted as positive, and TP + FN is the total number of actual positive samples in the dataset.
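Equations (7)–(10) are defined for a positive and a negative class. For the multi-class tasks here they must be applied per class; a common convention, assumed in this sketch rather than stated in the text, is one-vs-rest counting, which treats the class of interest as positive and all others as negative. The function names and counts are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 score as in Eqs. (7)-(10)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def one_vs_rest_counts(y_true: Sequence[int], y_pred: Sequence[int],
                       cls: int) -> Tuple[int, int, int, int]:
    """(TP, FP, TN, FN) for one class, treating every other class as negative."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, tn, fn


print(binary_metrics(tp=80, fp=20, tn=90, fn=10))
# accuracy 0.85, precision 0.8, recall 0.888..., f1 0.842...
```

When a denominator is zero (for example, a class that is never predicted), the sketch returns 0.0 rather than raising an error; other conventions exist.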

4.5 Results analysis

We trained CNN, LSTM and FSL on our testbed using MNIST, BDD, and Edge-IIoTset, as shown in Figs. 8, 9, and 10. Model accuracy gradually improved and the loss gradually decreased over the training rounds. We then applied the trained CNN, LSTM, and FSL models to MNIST and to the industrial datasets BDD and Edge-IIoTset; the results are given in Tables 2, 3, and 4. We discuss the experimental results in detail below.

Fig. 8

Training accuracy and loss of FSL, LSTM, and CNN on MNIST data set

Table 2 Classification performance of models based on MNIST

On MNIST, both the FSL-trained and CNN-trained models achieved test accuracy above 0.8, with the CNN model reaching a maximum of 0.91 and the FSL model a maximum of 0.82, whereas the LSTM-trained model reached at most 0.78. The loss curves in Fig. 8 show that the FSL and CNN models converged at approximately the same rate and better than the LSTM model, whose convergence was relatively poor and slow. As Table 2 shows, because the LSTM and FSL models are mainly used for data with temporal correlations, they do not handle non-time-series data such as MNIST as well as a CNN model of similar structure. CNNs are generally considered a better choice for this kind of classification task, which is consistent with the CNN performing much better here. The outputs of the LSTM and FSL models depend on the entire input, and both models have memory capabilities; this sensitivity to temporal order leads to poorer classification results on MNIST data, which have no time-series properties.

Fig. 9

Training accuracy and loss of FSL, LSTM, and CNN on BDD data set

Table 3 Classification performance of models based on BDD

On BDD, both the FSL-trained and LSTM-trained models achieved a test accuracy of 0.9 or higher, and the loss of the FSL-trained model converged faster than that of the LSTM-trained model. The CNN-trained model, which classified MNIST well, performed poorly on BDD: its loss converged poorly, fluctuating around 0.5, as shown in Fig. 9 and Table 3. BDD is a bearing vibration dataset, i.e., an industrial time-series dataset whose samples have strong temporal correlation, so the LSTM and FSL models achieve high classification accuracy on it. Because the LSTM and FSL models can remember temporal features, they substantially improve classification accuracy and outperform the CNN-trained model.

Fig. 10

Training accuracy and loss of FSL, LSTM, and CNN on Edge-IIoTset

Table 4 Classification performance of models based on Edge-IIoTset

On Edge-IIoTset, the FSL-trained model achieved the highest test accuracy of 0.95, while the LSTM model reached an accuracy of 0.81 on the training set and the CNN model reached at most 0.78. The FSL loss also converged fastest and gave the best results, whereas the losses of the LSTM and CNN models converged at roughly the same rate, as shown in Fig. 10. Table 4 shows that the CNN, LSTM and FSL models generally classify normal signals with high accuracy. For the binary task of determining only whether a cyber attack is present, all three models recognize attacks well. When the specific type of cyber attack must be classified, however, accuracy varies across attack types. In particular, Table 4 shows that detection and classification are poor for VSA, UA and BA, and the models tend to confuse VSA and UA with normal signals. This is because these two types of attack have less impact on the time-series signal, which makes them harder for the models to detect. Overall, however, the models trained with FSL detect cyber attacks better than those trained with LSTM and CNN.

5 Discussion

In the MNIST experiments, the classification accuracy of the models trained by CNN, LSTM and FSL was lower, and the LSTM model (with its memory module) and the FSL model in particular did not perform as well as the traditional CNN model. This is due to the pre-processing of MNIST, which flattens each two-dimensional image into a one-dimensional sequence and thereby loses its structural information; moreover, MNIST is an image dataset and is not inherently a time series. On BDD and Edge-IIoTset, however, the temporal information in these two industrial datasets allows the LSTM and FSL models to perform better, especially the FSL model, which reaches an accuracy of 0.92 on BDD and 0.95 on Edge-IIoTset. These results demonstrate the efficacy of the proposed technique in identifying cyber attack samples.

In fact, real industrial environments feature many types of cyber attacks, and our experiments cover only six of the more common ones. More types of cyber attack data can be collected in the future to extend the experiments. In addition, only four Raspberry Pi 4Bs were used as clients, so node heterogeneity in the experiments was not pronounced; the experimental platform can be expanded in future work.

6 Conclusions

IIoT provides revolutionary technological support for modern industry. In this work, we proposed a possible solution based on federated sequential learning (FSL) for constructing global models of distributed time-series data under data isolation, with the capability to detect cyber attacks. Experiments on MNIST, BDD and Edge-IIoTset in a distributed testbed verify that FSL detects attacks in industrial time-series sensory data more effectively than traditional methods. Our proposed attack detection method FSL is therefore promising in terms of feasibility, robustness, and accuracy in practical IIoT scenarios, and it can be applied to the detection of cyber attacks on one-dimensional signals in real IIoT. In the future, we plan to study the detection of unknown attacks and to broaden the scope of application of attack detection systems.