Accelerating privacy-preserving momentum federated learning for industrial cyber-physical systems

Federated learning (FL) is a distributed learning approach that allows distributed computing nodes to collaboratively develop a global model while keeping their data local. However, the challenges of preserving privacy and improving performance hinder the application of FL in industrial cyber-physical systems (ICPSs). In this work, we propose a privacy-preserving momentum FL approach, named PMFL, which uses a momentum term to accelerate model convergence during training. Furthermore, the fully homomorphic encryption scheme CKKS is adopted to encrypt the gradient parameters of the industrial agents' models to preserve their local private information. In particular, the cloud server calculates the global encrypted momentum term from the encrypted gradients based on the momentum gradient descent optimization algorithm (MGD). The performance of the proposed PMFL is evaluated on two common deep learning datasets, i.e., MNIST and Fashion-MNIST. Theoretical analysis and experimental results confirm that the proposed approach can improve the convergence rate while preserving the private information of the industrial agents.


Introduction
Industrial cyber-physical systems (CPSs) are an emerging technology that integrates computational applications with physical devices [1-3]. Industrial CPSs facilitate the remote control of large-scale heterogeneous systems, big data analysis, and condition monitoring, and they have a high impact on various industrial fields [2,4-8]. Industrial CPSs contain many edge devices that collect huge amounts of data, which can be used to develop deep learning-based methods for difficult industrial tasks, such as fault diagnosis [9,10], intelligent control [11], degradation prediction [12], and smart cities [13]. The conventional centralized learning approach (CL) gathers all the distributed data on a central server for model training. However, if the central server is attacked, the data may be exposed, with severe consequences. In addition, due to business competition and privacy concerns, data holders (i.e., industrial agents) are unwilling to share their local datasets.
To address this problem, federated learning (FL) has been proposed to enable multiple training participants to collaboratively train a global model [14,15]. In an FL system, the training participants share only the gradients of their local models, rather than their raw data, with the cloud server. The cloud server updates the global model by aggregating the gradients shared by the training participants and sends the updated global model parameters to all participants. Because FL can effectively address the data island problem, it has attracted widespread attention in many industrial fields. For instance, Li et al. [16] presented an FL-based CNN-GRU model for intrusion detection that can effectively detect different types of network threats. Kwon et al. [17] proposed a solution for joint cell association and resource allocation in smart ocean scenarios, using FL to meet the requirements of distributed computing and unexpected time-varying states. Liu et al. [18] proposed an FL-based gated recurrent unit neural network for traffic flow prediction. Brisimi et al. [19] developed an FL-based model to predict the hospitalization of patients with heart diseases from electronic health records distributed among different data sources, which not only improves model performance but also preserves patients' privacy. Zhang et al. [20] proposed an FL-based fault diagnosis method for the rolling bearings of rotating machinery that combines a dynamic verification scheme and a self-supervised learning scheme.
Various extensions have been proposed to improve the performance of FL. Yu et al. [21] explained the effectiveness of periodic model averaging and adopted parallel mini-batch stochastic gradient descent (SGD) to reduce communication costs. Wang et al. [22] proposed a cooperative SGD framework that combines periodic averaging, elastic averaging, and decentralized SGD to optimize the communication cost. Zhao et al. [23] proposed globally sharing a small data subset among all the edge devices to improve the accuracy of FL training on non-IID data. Most existing studies rely on first-order gradient descent (GD) for efficient training; however, they do not exploit previous gradient updates, which can accelerate convergence. Liu et al. [24] proposed momentum federated learning (MFL), which uses a momentum term to accelerate convergence during local model training. These works effectively improve the performance of FL; however, they do not adequately address privacy concerns.
Linlin Zhang and Zehui Zhang contributed equally to this work.
Recent studies demonstrate that FL techniques also suffer from privacy and security issues. Wang et al. [25] proposed a framework that combines generative adversarial networks (GANs) with a multitask discriminator to recover specific private data from samples without interfering with the FL process. Zhu et al. [26] showed that the gradients transmitted in FL systems may leak private training data without relying on a generative model or prior knowledge of the data. Therefore, privacy protection methods for FL have been proposed. Geyer et al. [27] utilized differential privacy to protect the privacy of distributed participants in an FL system. Triastcyn et al. [28] proposed a Bayesian differential privacy method for FL that flexibly adjusts the injected noise to provide a stringent privacy guarantee. Aono et al. [29] introduced homomorphic encryption into FL to protect the data of distributed participants. These works effectively preserve private information in FL under their respective security assumptions; however, they do not consider improving the convergence performance of FL.
For industrial CPSs, FL must not only perform efficiently in terms of model accuracy, convergence rate, and communication cost but also preserve the private information of the industrial agents. To meet these requirements, we introduce a momentum term to accelerate the convergence of the FL system, and we use the CKKS scheme to preserve the private information of the industrial agents. The contributions of this work are summarized below:
1. We utilize a momentum term to accelerate the convergence of privacy-preserving federated learning. In particular, the momentum term is calculated by the cloud server to update the global model parameters, which reduces the cryptographic computation and communication costs.
2. To protect the data privacy of the industrial agents, we design a CKKS-based communication scheme in which the industrial agents use CKKS encryption to encrypt their gradient parameters.
3. We evaluate the effectiveness of the proposed approach on the MNIST and Fashion-MNIST datasets. The experimental results show that the proposed PMFL improves the convergence rate while preserving the data of the industrial agents.
The rest of this article is organized as follows. The next section introduces the system model and theoretical background. The subsequent section reviews two existing solutions, privacy-preserving federated learning and momentum federated learning, after which the proposed PMFL approach is presented in detail. We then analyze the functionality, security, and communication costs of the PMFL. The experimental results are presented in the penultimate section. The final section concludes this work and outlines future research directions.

System model
In this work, the proposed FL system comprises three types of participants, i.e., a trust authority, a cloud server, and multiple industrial agents, as shown in Fig. 1.

Threat model
In the proposed model, we assume that the trust authority is completely honest and rational; hence, it never colludes with the cloud server or outsiders. In addition, we assume that the cloud server and the industrial agents are honest-but-curious entities that execute the protocols correctly but try to extract additional information by making inferences from the intermediate data. Under these assumptions, this work aims to preserve the private information of the individual agents during the entire training process (see Table 1).

Fully homomorphic encryption
Fully homomorphic encryption (FHE) is an encryption technique that supports arbitrary computations, built from additions and multiplications, on encrypted data [30]. In a typical secure computing scheme, the distributed nodes use FHE to encrypt their data and send the encrypted data to the cloud server. The cloud server then performs mathematical operations on the encrypted data without decrypting it and sends the encrypted results back to the distributed nodes. In this scheme, the cloud server cannot obtain the actual data because the ciphertexts cannot be decrypted without the secret key, and the distributed nodes cannot obtain other nodes' data. CKKS [31] is an FHE scheme for approximate arithmetic on real numbers. CKKS places the encryption noise after the significant figures of the message and treats it as part of the approximation error, which is reduced together with the message by ciphertext truncation (rescaling). Owing to this design, the scheme offers high encryption and decryption speed [32]. Therefore, we adopt a CKKS-based communication protocol in the proposed method.
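To make the approximate-arithmetic idea concrete, the following plaintext-only sketch shows the fixed-point encoding and rescaling that CKKS applies inside its ciphertexts. It is a simplification under stated assumptions: there is no encryption at all, values are single real numbers rather than polynomial slots, and the scaling factor DELTA = 2^20 and the helper names encode, decode, and rescale are hypothetical. Addition keeps the scale; multiplication squares it, and rescaling (the truncation step) restores it at the cost of a small rounding error.

```python
DELTA = 2 ** 20  # hypothetical scaling factor; real CKKS parameters are far larger


def encode(x: float) -> int:
    """Fixed-point encoding: scale a real number by DELTA and round to an integer."""
    return round(x * DELTA)


def decode(m: int, scale: int = DELTA) -> float:
    """Undo the scaling to recover an approximation of the real value."""
    return m / scale


def rescale(m: int) -> int:
    """Divide by DELTA with rounding, so a product at scale DELTA**2 returns to scale DELTA.

    Real CKKS performs this on ciphertexts and also reduces the ciphertext modulus.
    """
    return (m + DELTA // 2) // DELTA


a, b = 3.14159, -2.71828
ma, mb = encode(a), encode(b)
sum_approx = decode(ma + mb)            # addition keeps the scale at DELTA
prod_approx = decode(rescale(ma * mb))  # multiplication doubles the scale; rescale truncates it
print(sum_approx, a + b)
print(prod_approx, a * b)
```

The printed pairs agree to roughly six decimal places; the residual difference is the approximation error that CKKS accepts in exchange for efficient real-number arithmetic.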

Model parameters leak information
In Fig. 2, a simple three-layer neural network is used to illustrate how the model parameters may leak information about the input data, where x_i denotes the input data, y_i denotes the ground-truth label, w_i denotes the weight parameters of the neural network, and b denotes the bias.
The forward propagation of the network is computed using the sigmoid activation function f(·). The loss value of the model is then calculated, followed by the gradients of the model with respect to its weights and bias. From Eqs. (4)-(6), we observe that the gradient with respect to each first-layer weight w_i equals the gradient with respect to the corresponding bias b multiplied by the input x_i, so x_i can be recovered as the ratio of these two gradients. Therefore, the cloud server can infer the input data x_i of the industrial agent from its shared gradients. Fig. 3 shows that leaking even a certain percentage of an image can reveal its core information.
Fig. 1 The architecture of the FL system
After the input data x_i is obtained, the model output is calculated through forward propagation, and the true label y_i is then inferred using Eqs. (9)-(10). At this point, the information of the input data is completely leaked.
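The leakage argument can be checked numerically. The sketch below simplifies the three-layer network of Fig. 2 to a single sigmoid neuron with a squared-error loss L = 0.5(f(w·x + b) - y)^2; this loss is our assumption, since the paper's Eqs. (4)-(10) are not reproduced here, and the weights, sample, and function names are hypothetical. Given only the plaintext gradients and the model, the server recovers each x_i as (∂L/∂w_i)/(∂L/∂b) and then inverts the sigmoid derivative to recover the label. The attack requires ∂L/∂b ≠ 0, that is, a prediction that is not already exact.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def agent_gradients(w, b, x, y):
    """Agent side: gradients of L = 0.5 * (f(w.x + b) - y)**2 for one private sample."""
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    delta = (y_hat - y) * y_hat * (1.0 - y_hat)  # dL/dz, using f'(z) = f(z) * (1 - f(z))
    return [delta * xi for xi in x], delta       # dL/dw_i = delta * x_i, dL/db = delta


def server_attack(w, b, grad_w, grad_b):
    """Server side: recover the input from plaintext gradients, then the label."""
    x_rec = [gw / grad_b for gw in grad_w]       # x_i = (dL/dw_i) / (dL/db)
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x_rec)) + b)
    y_rec = y_hat - grad_b / (y_hat * (1.0 - y_hat))  # invert delta = (y_hat - y) * f'(z)
    return x_rec, y_rec


w, b = [0.2, -0.4, 0.1, 0.3], 0.05  # global model, known to the server
x, y = [0.0, 0.5, 1.0, 0.25], 1.0   # private sample of one industrial agent
grad_w, grad_b = agent_gradients(w, b, x, y)
x_rec, y_rec = server_attack(w, b, grad_w, grad_b)
print(x_rec, y_rec)
```

This is why the PMFL encrypts the gradients before upload: the server then sees only ciphertexts and cannot form these ratios.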

Privacy-preserving federated learning
Privacy-preserving federated learning (PFL) [29] uses homomorphic encryption to protect the data of the training participants. In a PFL system, the training participants calculate the gradients of their local models and encrypt them with a homomorphic encryption method. The participants then upload the encrypted gradients to the cloud server. Using these encrypted gradients, the cloud server updates the global model parameters and sends the encrypted global weight parameters to all the training participants. In this update, Enc(·) denotes the encryption function, w_global denotes the weight parameters of the global model, η denotes the learning rate, D_{a,i} denotes the total number of training samples for training participant i, and g_{a,i} denotes the gradients of the ith training participant.

Momentum federated learning
The momentum gradient descent algorithm (MGD) is an improvement on the gradient descent algorithm (GD) that speeds up the learning process. In GD, the parameter update is η∇L(w(t-1)), which is proportional only to the current gradient of the model. As shown in Fig. 4, the update path of GD oscillates because each update always follows the current gradient descent direction. In MGD, the parameter update consists of η∇L(w(t-1)) and the momentum term γ(w(t-2)-w(t-1)), which effectively mitigates the oscillation caused by GD. The momentum term corrects the parameter update direction so that MGD needs fewer iterations than GD to reach the optimal point (Fig. 4), which demonstrates that mitigating the oscillation leads to a faster convergence rate.
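The effect described for Fig. 4 can be reproduced on a toy problem. The sketch below is an illustration under assumptions, not the paper's experiment: it minimizes a hypothetical ill-conditioned quadratic L(w) = 0.5(w_1^2 + 50 w_2^2) with the same learning rate η = 0.035 for GD (γ = 0) and MGD (γ = 0.5). Along the steep direction GD multiplies the error by 1 - 0.035 × 50 = -0.75 at every step, so it oscillates, while the momentum term damps this and speeds up progress along the flat direction. The function names and starting point are hypothetical, and the printed iteration counts are illustrative only.

```python
import math

CURVATURE = (1.0, 50.0)  # hypothetical ill-conditioned quadratic: L(w) = 0.5 * sum(c_k * w_k**2)


def grad(w):
    """Gradient of the quadratic loss above."""
    return [c * wk for c, wk in zip(CURVATURE, w)]


def iterations_to_converge(w0, lr, gamma, tol=1e-6, max_iters=10_000):
    """Run MGD (heavy-ball momentum); gamma = 0 reduces it to plain GD.

    v(t) = gamma * v(t-1) + lr * grad(w(t-1)) and w(t) = w(t-1) - v(t), which equals
    w(t-1) - lr * grad(w(t-1)) - gamma * (w(t-2) - w(t-1)).
    """
    w = list(w0)
    v = [0.0] * len(w)
    for t in range(1, max_iters + 1):
        g = grad(w)
        v = [gamma * vk + lr * gk for vk, gk in zip(v, g)]
        w = [wk - vk for wk, vk in zip(w, v)]
        if math.hypot(*w) < tol:  # distance to the optimum at the origin
            return t
    return max_iters


gd_iters = iterations_to_converge((10.0, 1.0), lr=0.035, gamma=0.0)
mgd_iters = iterations_to_converge((10.0, 1.0), lr=0.035, gamma=0.5)
print(f"GD: {gd_iters} iterations, MGD: {mgd_iters} iterations")
```

With these settings MGD reaches the tolerance in roughly half as many iterations as GD; as the momentum-rate experiments later show, a larger γ is not automatically better.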
Since convergence acceleration can improve the training efficiency of FL, Liu et al. [24] proposed momentum federated learning (MFL), in which the training participants use a momentum term to improve the convergence rate. The detailed process is described below.
1. Initialization The momentum parameters v_{a,i}(0) and the model weight parameters w_{a,i}(0) are initialized.
2. Industrial agent local training Each industrial agent uses the MGD algorithm on its local dataset to compute v_{a,i}(t) and w_{a,i}(t), where t denotes the iteration index of local training.
3. Cloud server parameter aggregation The cloud server aggregates the parameters uploaded by the industrial agents to update the global model. The global momentum parameters v_global and the global model weight parameters w_global are obtained as weighted averages of v_{a,i} and w_{a,i}, as expressed in Eqs. (15) and (16), where |D_{a,i}| denotes the data size of industrial agent i. Finally, the global parameters v_global and w_global are transmitted to all the industrial agents for the next training round.
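The three MFL steps can be sketched as follows. Because the local update and aggregation equations are not reproduced in this text, the sketch assumes the heavy-ball form from the MGD description (v ← γv + η∇L, w ← w - v); Liu et al.'s exact parameterization may differ. The one-parameter least-squares data, the helper names local_mgd and mfl, and the hyperparameters are hypothetical. Note that every agent uploads both w_{a,i} and v_{a,i}, and the server averages both with weights |D_{a,i}|.

```python
def local_mgd(w, v, data, lr, gamma, steps):
    """Agent side (assumed form): v <- gamma*v + lr*grad, w <- w - v, repeated `steps` times."""
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in data) / len(data)  # d/dw of 0.5*mean((w*x - y)**2)
        v = gamma * v + lr * grad
        w = w - v
    return w, v


def mfl(agents, rounds, lr, gamma, steps):
    n = sum(len(d) for d in agents)
    w_global, v_global = 0.0, 0.0  # step 1: initialization
    for _ in range(rounds):
        # step 2: each agent starts from the global parameters and uploads its (w, v)
        updates = [local_mgd(w_global, v_global, d, lr, gamma, steps) for d in agents]
        # step 3: the server computes |D_{a,i}|-weighted averages of w and v
        w_global = sum(len(d) / n * w for d, (w, _) in zip(agents, updates))
        v_global = sum(len(d) / n * v for d, (_, v) in zip(agents, updates))
    return w_global, v_global


agents = [
    [(1.0, 3.0), (-1.0, -3.0)],  # agent 1: small inputs
    [(2.0, 6.0), (-2.0, -6.0)],  # agent 2: larger inputs, same underlying relation y = 3x
]
w, v = mfl(agents, rounds=100, lr=0.05, gamma=0.5, steps=2)
print(round(w, 6))  # 3.0
```

Both agents' data follow y = 3x but with different input scales, so the averaged model converges to w = 3 even though the agents see differently distributed inputs.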

The proposed framework
In this section, we elaborate on the PMFL framework by first introducing the implementation procedure for FL-based applications, then presenting the workflow of the PMFL, and finally describing the CKKS-based secure communication scheme.

Implementation procedure for FL-based applications
Because FL can overcome data barriers, it can be used to develop deep learning-based applications, which mainly involves five steps, as shown in Fig. 5 [15]. First, each industrial agent obtains training samples. Second, the agents use the same preprocessing method to process the raw data to improve model performance. Then, with the assistance of the cloud server, the industrial agents collaboratively train a global model on their training samples according to the federated learning framework (detailed in Sect. 4.1). Finally, the trained model is evaluated on testing data.
Fig. 4 Comparison of the MGD and the GD

The workflow for PMFL
The workflow of the PMFL comprises three phases: system initialization, local model training by the industrial agents, and model parameter aggregation by the cloud server. Each phase is detailed below (see also Fig. 6 and Algorithm 1):
1. System initialization The trust authority generates the public key PK and the private key SK of the CKKS scheme, and the cloud server establishes secure communication channels. Each industrial agent initializes the weight parameters w_{a,i} of its local model, encrypts these weights with the public key, and transmits the encrypted parameters to the cloud server. The cloud server then computes the global model by aggregating all the uploaded parameters. Finally, the server sends the initialized global model Enc(w_global) to all the industrial agents.
2. Local model training by industrial agents After receiving the encrypted global model parameters Enc(w_global) from the cloud server, each industrial agent uses the private key SK to decrypt them and obtain w_global. The agent then loads w_global into its local model and calculates the local gradients g_{a,i} on its private data. Lastly, the agent encrypts the gradients with the public key and sends Enc(g_{a,i}) to the cloud server.
3. Model parameter aggregation by the cloud server After receiving the encrypted gradients Enc(g_{a,i}) from each industrial agent, the cloud server calculates the encrypted global momentum term Enc(v_global(t)) using Eq. (17), where n_{a,i} denotes the total number of training samples for industrial agent i. The encrypted global parameters Enc(w_global(t)) are then updated using Eq. (18) and sent to the industrial agents.
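The server-side momentum step can be sketched in plaintext. Eqs. (17) and (18) are not reproduced in this text, so the sketch assumes they take the heavy-ball form v(t) = γ v(t-1) + η Σ_i (n_{a,i}/n) g_{a,i} and w(t) = w(t-1) - v(t), with n = Σ_i n_{a,i}. In the PMFL these quantities would be CKKS ciphertexts; every server-side operation here is an addition or a multiplication by a public constant, which CKKS can evaluate on ciphertexts. The toy regression data and function names are hypothetical, and each agent computes one full-batch gradient per round.

```python
def local_gradient(w, data):
    """Agent side: gradient of 0.5 * mean((w*x - y)**2) with respect to the scalar weight w."""
    return sum((w * x - y) * x for x, y in data) / len(data)


def server_update(w, v, grads, sizes, lr, gamma):
    """Server side (assumed form of Eqs. (17)-(18)); in PMFL these would be CKKS ciphertexts."""
    n = sum(sizes)
    weighted = sum(n_i / n * g for n_i, g in zip(sizes, grads))  # sample-weighted gradient
    v_new = gamma * v + lr * weighted  # global momentum term
    return w - v_new, v_new            # new global weights and momentum


def run_pmfl(agents, rounds, lr, gamma):
    w, v, history = 0.0, 0.0, []
    for _ in range(rounds):
        grads = [local_gradient(w, data) for data in agents]  # agents would upload Enc(g_{a,i})
        w, v = server_update(w, v, grads, [len(d) for d in agents], lr, gamma)
        history.append(w)
    return history


def run_centralized_mgd(pooled, rounds, lr, gamma):
    w, v, history = 0.0, 0.0, []
    for _ in range(rounds):
        v = gamma * v + lr * local_gradient(w, pooled)
        w -= v
        history.append(w)
    return history


agents = [
    [(1.0, 3.0), (2.0, 6.1)],
    [(3.0, 8.9)],
    [(0.5, 1.4), (1.5, 4.6), (2.5, 7.4)],
]
pooled = [sample for data in agents for sample in data]
fed = run_pmfl(agents, rounds=50, lr=0.1, gamma=0.5)
central = run_centralized_mgd(pooled, rounds=50, lr=0.1, gamma=0.5)
print(max(abs(f - c) for f, c in zip(fed, central)))  # only floating-point round-off
```

Under these assumptions the agents upload only gradients, the momentum state lives on the server, and the trajectory coincides with centralized MGD on the pooled data; setting γ = 0 reduces the update to the sample-weighted gradient step of the PFL.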

CKKS-based secure communication protocol
Several studies [33-35] have indicated that the encryption and decryption operations of homomorphic encryption are computationally expensive. However, CKKS, an emerging encryption scheme, offers considerably faster encryption than the Paillier and RSA encryption schemes [36,37]. To improve the training efficiency of the FL system, the proposed approach uses the CKKS scheme to preserve the data of the industrial agents.
In the PMFL, the CKKS encryption scheme is used to encrypt the model parameters. Let $R = \mathbb{Z}[X]/(X^n + 1)$ denote the polynomial ring, and let $R_q = \mathbb{Z}_q[X]/(X^n + 1)$ denote the ring of polynomials in $R$ with integer coefficients taken modulo $q$. Similarly, $[x]_q$ denotes $x \bmod q$, and $\langle a, b \rangle$ denotes the inner product of vectors $a$ and $b$. The CKKS-based secure communication protocol includes the following four functions (detailed in [31]).
1. KeyGenerate($1^{\lambda}$) Given a security parameter λ, we choose a ring degree n, a ciphertext modulus q, and a special modulus p that is coprime to q. The trust authority generates the secret key SK and the public key PK as specified by the CKKS cryptosystem.
global model sent by the cloud server using the following technique.
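The following sketch illustrates key generation, encryption, homomorphic addition, and decryption with a toy ring-LWE construction in the spirit of CKKS, applied to the gradient-aggregation step of the PMFL. It is not the CKKS scheme of [31] and must not be used for real data: the parameters (N = 16, Q = 2^80, DELTA = 2^30) are insecure toy values, real numbers are packed into polynomial coefficients instead of using the canonical embedding, the special modulus p is omitted, and there is no multiplication, relinearization, or rescaling. All function names are hypothetical. Decryption returns the encoded sum plus a small noise term, which is why the result is approximate.

```python
import random

N = 16           # ring degree n (toy value; practical CKKS uses n >= 2**13)
Q = 1 << 80      # ciphertext modulus q (toy value)
DELTA = 1 << 30  # scaling factor for encoding real numbers


def poly_mul(a, b):
    """Multiply two polynomials in Z_Q[X]/(X^N + 1) (negacyclic convolution)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj  # X^N = -1 in this ring
    return [c % Q for c in res]


def poly_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]


def small(rng, bound):
    """Polynomial with coefficients drawn uniformly from [-bound, bound]."""
    return [rng.randint(-bound, bound) for _ in range(N)]


def key_generate(rng):
    """Toy RLWE key generation: SK = s, PK = (b, a) with b = -a*s + e."""
    s = small(rng, 1)
    a = [rng.randrange(Q) for _ in range(N)]
    b = poly_add([-c for c in poly_mul(a, s)], small(rng, 3))
    return s, (b, a)


def encode(values):
    """Coefficient encoding (CKKS proper uses the canonical embedding instead)."""
    coeffs = [round(v * DELTA) for v in values] + [0] * (N - len(values))
    return [c % Q for c in coeffs]


def decode(poly, count):
    centered = [c - Q if c > Q // 2 else c for c in poly[:count]]
    return [c / DELTA for c in centered]


def encrypt(pk, m, rng):
    b, a = pk
    u = small(rng, 1)
    c0 = poly_add(poly_add(poly_mul(u, b), small(rng, 3)), m)  # u*b + e0 + m
    c1 = poly_add(poly_mul(u, a), small(rng, 3))               # u*a + e1
    return c0, c1


def add(ct1, ct2):
    """Homomorphic addition: add ciphertext components coefficient-wise."""
    return poly_add(ct1[0], ct2[0]), poly_add(ct1[1], ct2[1])


def decrypt(sk, ct):
    c0, c1 = ct
    return poly_add(c0, poly_mul(c1, sk))  # equals m plus a small noise term (mod Q)


rng = random.Random(0)
sk, pk = key_generate(rng)  # run by the trust authority
grads = [[0.12, -0.50, 0.33], [0.05, 0.20, -0.10], [-0.07, 0.10, 0.40]]
cts = [encrypt(pk, encode(g), rng) for g in grads]  # each agent encrypts its gradients
total = cts[0]
for ct in cts[1:]:
    total = add(total, ct)  # the server sums ciphertexts without decrypting them
result = decode(decrypt(sk, total), 3)  # an agent holding SK decrypts the aggregate
print([round(r, 4) for r in result])  # [0.1, -0.2, 0.63]
```

Decryption of c0 + c1·s leaves m + u·e + e0 + e1·s, so with these toy bounds the error per coefficient stays below about 3 × 10^-7 after decoding.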

Functionality analysis
Table 2 compares the functionality of various FL techniques, namely DPFL [28], PFL [29], and MFL [24]. The DPFL and the PFL preserve the privacy of the industrial agents; however, neither uses momentum parameters to accelerate convergence. In contrast, the MFL uses momentum to accelerate the training process but does not preserve the privacy of the industrial agents. The proposed PMFL incorporates the momentum parameters to improve the convergence rate while preserving the privacy of the industrial agents through the CKKS encryption scheme.

Security analysis
CPA-secure: A public-key encryption scheme Π = (Gen, Enc, Dec) is CPA-secure if for every probabilistic polynomial-time (PPT) adversary A there is a negligible function negl such that $\Pr[\mathrm{PubK}^{\mathrm{cpa}}_{A,\Pi}(\lambda) = 1] \le \frac{1}{2} + \mathrm{negl}(\lambda)$, where the probability is taken over the randomness of the adversary A and the randomness of the experiment (the generated PK, the random bit, and any randomness used in the encryption process). Based on this definition, we obtain the following: any encryption scheme that is CPA-secure is also secure in the presence of an eavesdropper.
No deterministic encryption scheme is CPA-secure; an encryption scheme that is CPA-secure must be probabilistic.
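The second consequence above can be illustrated with a toy CPA game in which the adversary has access to an encryption oracle (in the public-key setting it can simply encrypt with PK). Both ciphers below are hypothetical toys built from Python's standard hashlib and hmac modules and are not secure schemes; the key, messages, and trial counts are assumptions. Against the deterministic cipher, re-encrypting m0 and comparing with the challenge reveals the hidden bit every time; against the randomized cipher, the same strategy succeeds only about half the time.

```python
import hashlib
import hmac
import random

KEY = b"hypothetical-demo-key"  # toy key; no real key management


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(d ^ s for d, s in zip(data, stream))


def deterministic_encrypt(m: bytes) -> bytes:
    """Toy deterministic cipher: the same plaintext always yields the same ciphertext."""
    return xor(m, hashlib.sha256(KEY).digest())


_nonce_rng = random.Random(1)  # demo only; real schemes need a cryptographic RNG (e.g. secrets)


def probabilistic_encrypt(m: bytes):
    """Toy probabilistic cipher: fresh nonce r, keystream HMAC-SHA256(KEY, r)."""
    r = _nonce_rng.getrandbits(128).to_bytes(16, "big")
    return r, xor(m, hmac.new(KEY, r, hashlib.sha256).digest())


def cpa_win_rate(encrypt, trials=2000, seed=0):
    """CPA-style game: the challenger encrypts m0 or m1; the adversary may query `encrypt`."""
    rng = random.Random(seed)
    m0, m1 = b"label=0", b"label=1"
    wins = 0
    for _ in range(trials):
        bit = rng.getrandbits(1)
        challenge = encrypt(m1 if bit else m0)
        guess = 0 if encrypt(m0) == challenge else 1  # re-encrypt m0 and compare
        wins += guess == bit
    return wins / trials


print(cpa_win_rate(deterministic_encrypt))  # 1.0: equal ciphertexts reveal the bit
print(cpa_win_rate(probabilistic_encrypt))  # near 0.5: fresh randomness defeats this strategy
```

CKKS encryption is randomized in the same sense, since each encryption samples fresh small polynomials, so two encryptions of the same gradients look unrelated.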
We assume that the communication channel between each industrial agent and the cloud server is sufficiently secure. This allows the server to verify the integrity of the uploaded data and prevents potential attackers from performing malicious activities, such as injecting their own data. The intermediate data obtained by the industrial agents and the cloud server during the training process are presented in Table 3. Notably, during training each industrial agent obtains only the encrypted weight parameters of the global model Enc(w_global) from the cloud server, while the cloud server only receives the encrypted gradient parameters Enc(g_{a,i}) from each industrial agent and sends back the encrypted global model parameters Enc(w_global).

Theorem 1 In the PMFL, if CKKS is CPA-secure and there is no collusion between the industrial agents and the cloud server or external attackers, the data privacy of the industrial agents is preserved.
Proof Assume that there is an adversary A who eavesdrops on the encrypted weight parameters of all the models. Since the secret key SK is generated by the trust authority and never revealed to A, A cannot reproduce SK. Under the above security assumptions, no industrial agent colludes with the cloud server or any external party, so SK is not leaked. Therefore, A cannot obtain SK and decrypt the encrypted parameters to recover their true values. Moreover, the model parameters are stored on the cloud server in ciphertext form. As long as the cloud server does not collude with any industrial agent, no industrial agent can obtain the model parameters uploaded by other industrial agents to infer their data. In addition, the agents transmit information to the cloud server through separate secure communication channels, which prevents the transmitted information from being stolen. Therefore, the proposed PMFL can effectively protect the private information of the industrial agents.
Table 3 Data information obtained by the industrial agents and cloud server
Intermediate data of the industrial agents: Enc(w_global), L_{a,i}, g_{a,i}, w_{a,i}
Intermediate data of the cloud server: Enc(g_global), Enc(g_{a,1}), …, Enc(g_{a,n})

Communication cost analysis
We assume that the FL system includes N industrial agents, and we measure the sizes of w_global and g_{a,i} by their plaintext lengths, without considering the additional transmission cost of ciphertexts. The communication cost of the PMFL is analyzed and compared with that of the MFL in Table 4. This comparison shows that the communication cost of the PMFL is lower than that of the MFL.
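As a rough per-round check of this comparison, the following count rests on our own assumptions, since Table 4 is not reproduced here: each model has |w| parameters; in MFL each agent uploads w_{a,i} and v_{a,i} and downloads w_global and v_global every round, as described in the MFL steps; in PMFL each agent uploads Enc(g_{a,i}) and downloads Enc(w_global). Ciphertext expansion and the one-time upload during system initialization are ignored.

```latex
\begin{aligned}
C_{\mathrm{MFL}}  &= N\,\bigl(\underbrace{2|w|}_{\text{upload } w_{a,i},\, v_{a,i}} + \underbrace{2|w|}_{\text{download } w_{\mathrm{global}},\, v_{\mathrm{global}}}\bigr) = 4N|w|,\\
C_{\mathrm{PMFL}} &= N\,\bigl(\underbrace{|w|}_{\text{upload } \mathrm{Enc}(g_{a,i})} + \underbrace{|w|}_{\text{download } \mathrm{Enc}(w_{\mathrm{global}})}\bigr) = 2N|w|,\\
\frac{C_{\mathrm{PMFL}}}{C_{\mathrm{MFL}}} &= \frac{1}{2}.
\end{aligned}
```

Under these assumptions the per-round saving is one half, which is consistent with the saving reported for Case 1.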

Performance evaluation
In this section, we present the simulation results to evaluate the performance of the proposed PMFL method. We first present the simulation setup, which includes the environmental setup, data resource description, and performance metrics. Then, we evaluate the proposed PMFL.

Experiment settings
The hardware environment consists of an Intel i7-6550 CPU, 16 GB of RAM, and an NVIDIA 1080Ti GPU. We build the FL system using Python, NumPy, PyTorch, MATLAB, and CUDA. The MNIST and Fashion-MNIST (F-MNIST) datasets are used to evaluate the performance of the proposed FL system. The MNIST dataset contains 60,000 training samples and 10,000 test samples in 10 categories, and each sample is a 28 × 28 grayscale image. F-MNIST has the same format (60,000 training and 10,000 test 28 × 28 grayscale images in 10 categories) but contains images of clothing items instead of handwritten digits.
In this work, the FL system includes four industrial agents. To simulate real industrial scenarios, the datasets are split into non-independent and identically distributed (non-IID) subsets: all data held by an industrial agent have the same label, or, when there are more labels than industrial agents, each industrial agent holds data with more than one label but not all labels. This setting is consistent with real industrial scenarios, where training data are generally difficult to obtain. Convolutional neural networks (CNNs) have been widely used in computer vision and image recognition. Therefore, following the PyTorch tutorials, a CNN model is selected as the local model of the industrial agents for evaluating the performance of the proposed PMFL approach. The construction parameters of the CNN are listed in Table 5. The mini-batch size of the industrial agents is set to 64.
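The label-skew split described above can be sketched as follows. The round-robin rule that gives every sample of label l to agent l mod N is our assumption, since the exact assignment used in the experiments is not specified; the function name and the synthetic label list are hypothetical stand-ins for the MNIST and F-MNIST labels.

```python
from collections import defaultdict


def label_skew_partition(labels, num_agents):
    """Give every sample with label l to agent l % num_agents (hypothetical assignment rule).

    With 10 labels and 4 agents, each agent ends up with 2 or 3 labels and never all of them.
    """
    shards = defaultdict(list)
    for index, label in enumerate(labels):
        shards[label % num_agents].append(index)
    return [shards[agent] for agent in range(num_agents)]


labels = [i % 10 for i in range(100)]  # stand-in for MNIST/F-MNIST class labels 0-9
parts = label_skew_partition(labels, num_agents=4)
print([sorted({labels[i] for i in part}) for part in parts])
# [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

With ten or more agents the same rule gives each agent a single label, matching the first case described in the text.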

Case 1 (comparisons of different learning approaches)
In this subsection, we evaluate the performance of CL, FL, MFL and PMFL. The experimental curves and classification results of different learning approaches are presented in Fig. 7 and Table 6, respectively. In general, there is a performance gap between the FL and the CL. It can be seen from Fig. 7 that the accuracy and the loss value curves of the PMFL are closer to the CL compared to the FL. As listed in Table 6, the PMFL achieves the classification accuracies of 86.53% and 98.13% on F-MNIST dataset and MNIST dataset, respectively. The classification results of the PMFL are very close to the CL and superior to the FL. Compared with the MFL, the PMFL not only has almost the same performance but also saves half of the communication cost between the cloud server and the industrial agents.

Case 2 (varying experiment setting)
In this subsection, we verify the convergence performance of the PMFL and explore the effects of the momentum rate γ and the number of the industrial agents N on the PMFL convergence by simulation evaluation.
1. Varying momentum rates In this subsection, the experiment curves of the proposed PMFL with different momentum rates are presented in Fig. 8. When the momentum rate γ is set to 0, the PMFL is identical to the traditional PFL. The figure shows that the accuracy curves for different momentum rates follow a consistent overall trend, and the convergence speed of the PMFL improves as the momentum rate increases. The classification results of the proposed PMFL with different momentum rates are listed in Table 7. When γ = 0.6, the PMFL achieves the best classification performance on the F-MNIST and MNIST datasets. These results reveal that the classification performance of the PMFL is affected by the momentum rate. Therefore, the momentum rate needs to be selected appropriately rather than simply set to a large value.
2. Varying numbers of industrial agents For this case, we set the momentum rate γ to 0.5 and vary the number of industrial agents N. The experiment curves of the PMFL with different numbers of industrial agents are shown in Fig. 9, and the corresponding classification results are listed in Table 8. The trend of the curves shows the convergence performance of the PMFL.
Fig. 11 The momentum rate tuning of the PMFL on F-MNIST and MNIST datasets

Case 3 (hyperparameter tuning)
As in most deep learning tasks, the hyperparameters of the CNN trained by FL need to be tuned to improve model accuracy. In this subsection, the first hyperparameter to tune is the learning rate η. We set the momentum rate γ to 0.5 and the mini-batch size to 64, and vary the learning rate η from 0.01 to 0.1. The learning rate tuning results of the PMFL are shown in Fig. 10.
The second hyperparameter to tune is the momentum rate γ. Based on the above results, we set the learning rate η = 0.07 for the F-MNIST task and η = 0.08 for the MNIST task, and vary the momentum rate γ from 0.1 to 0.9. The momentum rate tuning results of the PMFL are shown in Fig. 11. Based on these results, we set the momentum rate γ = 0.5 for the F-MNIST task and γ = 0.7 for the MNIST task.
In summary, we set the learning rate η to 0.08, the momentum rate γ to 0.7, and the mini-batch size to 64 for the MNIST task, and the learning rate η to 0.07, the momentum rate γ to 0.5, and the mini-batch size to 64 for the F-MNIST task.

Conclusions
In this work, we present a privacy-preserving momentum federated learning (PMFL) approach for industrial cyber-physical systems (ICPSs). In the proposed FL approach, a CKKS-based secure communication protocol is designed to guarantee the privacy of the industrial agents by encrypting the parameters of their local models. The momentum term is used in the PMFL to accelerate the convergence rate. In particular, the momentum term is calculated by the cloud server, which reduces the communication costs compared with the MFL. Theoretical analysis and experimental results demonstrate that the proposed approach can effectively preserve the local privacy of the industrial agents while achieving high accuracy.
In future work, we aim to further investigate the model aggregation process to alleviate the adverse effects of industrial agents with low-quality training data, and to use intelligent algorithms to tune the hyperparameters of FL for better model performance.