A secure and efficient privacy-preserving data aggregation algorithm

As a significant part of the Internet of things, wireless sensor networks (WSNs) is frequently implemented in our daily life. Data aggregation in WSNs can realize limited transmission and save energy. In the process of data aggregation, node data information is vulnerable to be eavesdropped and attacked. Therefore, it is of great significance to the research of data aggregation privacy protection in WSNs. We propose a secure and efficient privacy-preserving data aggregation algorithm (SECPDA) based on the original clustering privacy data aggregation algorithm. In this algorithm, we utilize SEP protocol to dynamically select cluster head nodes, introduce slicing idea for the private data slicing, and generate false information for interference. A comprehensive experimental evaluation is conducted to assess the data traffic and privacy protection performance. The results demonstrate that the proposed SECPDA algorithm can effectively reduce data traffic and further improve data privacy of nodes.


Introduction
Wireless sensor networks (WSNs) (IanF et al. 2016) consist of many low-power wireless sensor nodes with limited storage and computing power.These sensor nodes are used to sense and collect useful information in the network.Currently, WSNs are frequently applied in habitat monitoring (Smita and Mrinal 2019), intelligent space, medical systems (Muhammad et al. 2020), Smart Grid (He et al. 2017) and robotic exploration.In WSNs, sensor nodes are easily captured, physical tampering, denial of service and other attacks, which may lead to a series of challenges in foundational researches.In the data collection process, these sensor nodes may generate some redundant data, and the further data transmission will consume extra energy.Data aggregation (Sabrina et al. 2018), as the crucial technology in WSNs, is widely used to overcome the energy consumption issue.Aggregators can calculate and count the sum, average, minimum and maximum values from the child sensor nodes, and send the aggregated result to higher-level aggregator.Through redundancy process and information synthesis, the network traffic and energy consumption will be decreased.However, in the data aggregation process, when sensor nodes are communicating with each other, anyone with a relevant wireless receiver can detect and intercept messages between sensor nodes.The attacker may use illegal means to communicate with powerful workstations (Yang et al. 2020;Wang et al. 2020a, b, c, d).Illegal interactions and information theft can cause severe harm to the network, and even propel the entire network into a state of paralysis.
In recent years, the researches in data aggregation and privacy protection can be divided into three categories: hop by hop encryption, end-to-end encryption and nonencryption mechanism.He et al. (2007) proposed CPDA and SMART privacy protection algorithm in terms of hop by hop encryption mechanism, which utilize TAG tree model (Madden et al. 2002) to aggregate the data from sensor nodes.Yao and Wn (2008) proposed the DADPP algorithm to meet the needs of different privacy levels, and reach the privacy protection while obtaining accurate aggregation results.Yang et al. (2008) introduced SDAP algorithm, which utilize probability grouping technology to effectively verify the correctness of aggregated data.Feng et al. (2008) utilized interference numbers to protect the privacy of and reduce the communication overhead of node data.He et al. (2009) proposed an iCPDA protocol to overcome the data integrity issue, which increases the data integrity protection and inherits the data privacy protection capability of the CDPA algorithm.Ozdemir and Yang (2008) proposed an IPHCDA protocol, which provides data privacy and integrity protection by employing homomorphic encryption algorithm based on elliptic curve encryption and MAC mechanism.Guo (2012) improved the CPDA scheme, and reduced the computational and communication costs.Based on iPDA and CPDA algorithms, Bista et al. (2012) proposed the DCIDA algorithm, which utilizes real part of complex number to protect data privacy and utilized the imaginary part to verify data integrity by introducing the concept of complex number.Guo and Ding ( 2014) proposed an ILC-CPDA algorithm to reduce data transmission by utilizing the LEACH protocol and simple aggregation methods.This algorithm can detect data integrity by adding homomorphic message authentication code.In recent years, some privacy protection schemes (Liu et al. 2020a, b;He et al. 2019) in WSNs have been proffered.Zhang et al. (2019) proposed an energy efficient and reliable in-network data aggregation scheme for WSNs.Man et al. (2017) proposed an energyefficient cluster-based privacy data aggregation (E-CPDA), Fang et al. ( 2017) proposed a novel energy-efficient secure data aggregation scheme cluster-based private data aggregation (CSDA).These literatures have proffered the data aggregation protocol in smart grid (Afshin et al. 2019;Kong et al. 2020;Liu et al. 2018), Secure multi-party computation (Wang et al. 2020a, b, c, d) can also be applied to data aggregation in the future.These literatures have reduced traffic, but for higher privacy needs, it needs to be a further improvement.
In this work, we focus on the data traffic and privacy issues in CPDA, and proposed a secure and efficient data aggregation privacy protection algorithm (SECPDA).The SECPDA algorithm utilizes SEP protocol to select the cluster head node dynamically, and utilizes false message to enhance privacy protection capability.Experimental evaluations and performance analysis show that the proposed SECPDA algorithm has a lower data traffic, high security, privacy protection ability than other algorithms.
The contributions of our work in this article are shown as follow: 1. We adopt the SEP protocol to dynamically select cluster head nodes and merge them into the simple addition cluster to reduce the communication overhead of data, and propose a secure and efficient data aggregation privacy protection algorithm (SECPDA).
2. We adopt the method of data slicing and node false message to aggregate data for a further improvement the privacy needs in the communication process.
The rest of the paper is structured as follows: Sect. 2 describes the model and background, including sensor network model, encryption method and clustering method.Section 3 presents the SECPDA algorithm, including the formation of the cluster, aggregation process and the SECPDA algorithm flow.Section 4 describes the results of simulation and performance analysis, including the simulation of clustering process of sensor nodes, privacy performance analysis and data traffic analysis.Section 5 summarizes the paper and layout future research.

Sensor network model
Generally, sensor nodes are divided into three categories: base station, cluster head node and sensor node (He et al. 2007).The sensor node uploads the data to the cluster head node, then the cluster head node aggregates the received data with its own data, finally uploads the aggregated results to the base station.The sensor network model is demonstrated in Fig. 1 2

.2 Encryption method
The encryption method adopted by SECPDA is the same as that adopted by CPDA, which employs random key distribution scheme (Laurent and Virgil 2002).First, generate a key pool containing K keys, each of them has its own identity.Then each node randomly selects k keys from the key pool and stores them in the node.Each node broadcasts its own

The base station
Cluster head node Sensor node key, if the neighbor node has a public key with it, they will share a security link.Therefore, the probability of any two nodes sharing the security link is demonstrated in Formula (1): If the public key is not found between two nodes, the intermediate node forms a secure link between the two nodes by means of multiple hop links.In this process, the probability that the shared key is eavesdropped by the attacker is demonstrated in Formula (2): Here, K is the total number of keys in the key pool, k rep- resents the number of keys randomly selected for each node.

Clustering method
SEP protocol (Smaragdakis et al. 2004) is utilized to cluster in our proposed SECPDA algorithm.SEP protocol is an improved clustering protocol based on LEACH protocol (Heinzelman et al. 2000).In SEP protocol, owning to the different initial energy, these sensor nodes is divided into two categories: advanced nodes and normal nodes.In addition, different thresholds are set to make these advanced nodes to be more frequently selected as the cluster head nodes.Therefore, the build and transport process of the cluster is a cycle, and there exists a random proportion of advanced nodes.If the proportion of advanced nodes is a , the number of advanced nodes in the network with n nodes is na .Then, we can find that the number of normal nodes is (1 − a)n .Respectively, if the initial energy of the advanced node is b times than the normal nodes' energy.Each node generates a random number r and the range of r is from 0 to 1.If the threshold T(n) is greater than the random number r, the node is selected as the cluster head, and other nodes add the corresponding cluster according to the signal strength to complete the cluster construction.The probability of the advanced (1) (2) P overhear = k∕K.
node and normal node being selected as cluster head is p 1 , p 2 , as demonstrated in Formulas (3) and (4): Here, p represents the proportion of the heads in the clusters, and the thresholds for advanced nodes and normal nodes are demonstrated in Formulas ( 5) and ( 6): Here, r is the number of rounds, G 1 , G 2 represents a set nodes that are not elected as cluster heads in the nearest round of these sensor nodes.In the transmission phrase, the node sends the collected data to the cluster head, which aggregates the data of all nodes, and then sends the aggregated result to the sink node.After a period of stabilization, the network proceeds to the next round of elections.

Secure and efficient privacy-preserving data aggregation algorithm SECPDA
The functionalities of these components are demonstrated in Table 1.

The formation of the cluster
We utilize SEP protocol to select the cluster head node.Suppose there are 1 base station node and 10 sensor nodes in the network.When the base station sends a data request, the node sends its address to the base station, base station (3) .
Shared key between nodes I and J Enc(a i , K IJ ) Node I encrypts the slice information with a shared key and sends it to node J Enc(a � i , K IJ ) Node I encrypts the false information with a shared key and sends it to node J Dnc(a i , K IJ ) Node J decrypts the received slice information with a shared key Dnc(a � i , K IJ ) Node J decrypts the received false information with a shared key F I Aggregation data values collected by node I is node no.1-10.We select the cluster head node according to the threshold of the advanced node and the ordinary node, other nodes decide which cluster to join based on the strength of the signal.The specific process is demonstrated in Fig. 2 3

.2 Aggregation process
Suppose a cluster has three nodes A , B and C , where A is cluster head node.a , b and c respectively are the privacy data of each node.Here we will divide a into a 1 , a 2 and a 3 , that is a = a 1 + a 2 + a 3 .The node a will generate false informa- tion while slicing, a 1 correspond to a ′ Node A , B and C encrypt the privacy slice information and corresponding false information, then send them to other nodes randomly.
Where, the node A encrypts a 2 and a ′ 2 , then sends them to the node B , encrypts a 3 and a ′ 3 , then sends them to the node C .The process is demonstrated in Formula (7): Similarly, the node B encrypts b 2 and b ′ 2 , then sends them to the node A , encrypts b 3 and b ′ 3 , then sends them to the node C .The process is demonstrated in Formula ( 8): .
Similarly, the node C encrypts c 2 and c ′ 2 , then sends them to the node A , encrypts c 3 and c ′ 3 , then sends them to the node B .The process is demonstrated in Formula ( 9): Now the node A , B and C decrypts the received data by utilizing the shared secret key, which can be calculated to F A , F B and F C as demonstrated in Formulas ( 10)-( 12): The node B and C send F B and F C to node A. Here, F B mainly includes a 2 , a ′ 2 , c 3 , c ′ 3 , and similarly In the initial stage, node B and C send their first slice data and all the false information to the cluster head node A, then 3 and knows a 1 (its first slice data).Finally, A gets the aggregation values of a (11) .

The SECPDA algorithm flow
In SECPDA algorithm, node clustering, node data information processing and data aggregation algorithm flow are as follows: 4 Encrypts and sends

Simulation and performance analysis
In this work, we introduced a secure and efficient privacypreserving data aggregation algorithm.To simulate the clustering aggregation process of sensor nodes, we executed our algorithm on MATLAB.we suppose that 100 sensor nodes are deployed randomly in 100 × 100 area.The base station node is centrally located (50, 50).Set the proportion of the advanced nodes to be 0.1.Figure 6 demonstrates the random deployment of 100 sensor nodes in 100 × 100 area.Figure 7 demonstrates the result of cluster-head election.
Figure 8 is the result of clustering by distance matrix.Figure 9 demonstrates the cluster heads gather the aggregation results to the base station.

Communication overhead
We analyze the communication overhead of CPDA, ILC-CPDA and SECPDA algorithms respectively.In the CPDA algorithm, nodes within the cluster need to broadcast their own seeds within the cluster.Suppose a cluster has n nodes, So there are n nodes broadcasting the seeds, and each node needs to send encrypted data to other neighbor nodes, then each node sends n − 1 data, finally, n sensor nodes will send their aggregation data to the cluster head node.In the ILC-CPDA algorithm, n nodes send values to two nodes.The other nodes in the cluster will send the aggregation values to the cluster head node.In the SECPDA algorithm, each node randomly selects two nodes to send its own two privacy slicing information and corresponding false information respectively, then n nodes emit 4n data.Finally, n sensor nodes send the aggregation data to the cluster head node.We experimentally verified the analysis results.As can be seen from Fig. 10, in the whole network, SECPDA algorithm has less communication overhead than CPDA algorithm and is not significantly different from ILCCPDA algorithm.In SECPDA and ILCCPDA algorithms, both have private data slicing technology.With the increase of the number of clusters, the communication overhead of data is not very obvious, but the CPDA algorithm has been greatly increased.
The communication overhead in the whole network needs to consider the communication overhead formed by the network topology, the communication overhead in a cluster and the communication overhead between clusters.The Fig. 11 is a simulation of the communication overhead in the whole network.As can be seen from the Fig. 11, the communication overhead in the whole network of SECPDA algorithm and ILCCPDA algorithm is less than CPDA algorithm.

Privacy performance analysis
In CPDA algorithm, when the sensor nodes exchange messages within the same cluster, the privacy data will be leaked to the neighbor nodes.For a cluster of size m , the node sends m − 1 encrypted messages to the other m − 1 members of the cluster.The node can only be cracked if the attacker obtains the m − 1 keys, Otherwise, private data cannot be exposed.The average probability of data of all nodes in the cluster being cracked can be obtained as demonstrated in Formula ( 13): Here, m c is the minimum number of nodes in the cluster, and d max is the maximum number of nodes in the cluster, and q is the probability that the node link is cracked.
In ILCCPDA algorithm, if an eavesdropper wants to steal data from node s, they must know the two slices of data from node s and the information from the neighbor node.Therefore, the eavesdropper must break the link between node s and the neighbor node that gets the slice information of node s, as well as the link between the neighbor node and the cluster node.The average probability of data of all nodes in the cluster being cracked can be obtained as shown in Formula ( 14): Here, q 2 represents the probability of information being stolen from two neighboring nodes, and ∑ n−1 k=0 P(in = k)q k represents the probability of all transmitted information being stolen.
In our proposed SECPDA algorithm, each node in the cluster randomly selects two neighbor nodes, then sends encrypted privacy slices and false information to the neighbor node.And, each node only sends two encrypted messages, and ( 14) the number of encrypted messages received by each node is uncertain.The attacker needs to crack the slice information sent by the node and the information received by the node.Therefore, the average probability of data of all nodes in the cluster being cracked as demonstrated in Formula (15): Here, q is the probability that the node link is cracked.C 1 2 qC 1 2 q is the probability of messages being cracked, ∑ n−1 k=0 P(in = k)C 1 2 q k is the probability of the received mes- sage being cracked, C 1 2 stands for which of the hacked information is slice information or false message.P(in = k) represents the probability that k nodes send information to node s.P(in = k) is shown in Formula ( 16): When the probability of node link being cracked takes different values, the comparison of the probability of private data being stolen from CPDA, ILCCPDA and SECPDA algorithm is shown in Fig. 12.In Fig. 12, the privacy protection capability of SECPDA algorithm is higher than that of CPDA algorithm and ILCCPDA algorithm.

Security proof
The attack model is as follows.
We suppose that any adversary (Liu et al. 2020a, b;Wang et al. 2020a, b, c, d) wants to steal the private data of node A, the adversary needs to obtain all slice data (a1, a2 and a3) of private data a.Such an approach is harder to obtain the privacy data a than the unsliced private data.When this adversary attacks the privacy data, the attacker still can't get rid of the interference of false information.This is owning to the false information and slice privacy data are encrypted with this random key distribution scheme.Even if the data (sent by node A to other nodes) is intercepted by the adversary, the adversary also needs to ensure whether the eavesdropped data is a slice data of private data or a false data.
If any adversary wants to impersonate the node and exchange the slice data with some attacked nodes.Owing to the slice data is only a part of private data, the slice data does not make much sense, and the node also have false information to interfered, which increases the difficulty of being intercepted.( 16) We can utilize the probability formula to analyze the privacy protection capability.In Formula (15), when q = 0.02, P 3 (q) is about 0, which has a very low probability of being intercepted.

Conclusion
We propose a new data aggregation privacy protection algorithm called SECPDA, based on CPDA algorithm.Our algorithm adopts SEP protocol for cluster head elections and nodes aggregation.It greatly reduces communication overhead.The ability of privacy protection is improved by adding false information to interfere.After theoretical analysis and simulation experiments, the privacy performance of the SECPDA algorithm is better than CPDA algorithm and ILC-CPDA algorithm, data traffic is also effectively reduced.For the future work, we are going to investigate how to ensure the integrity of data in the transmission process.

1, a 2
correspond to a ′ 2 , a 3 correspond to a ′ 3 , The information of privacy slice a 1 is retained by this node A. Divide b into b 1 , b 2 and b 3 , that is b = b 1 + b 2 + b 3 .The node b will generate false information while slicing, b 1 cor- respond to b ′ 1 , b 2 correspond to b ′ 2 , b 3 correspond to b ′ 3 , the node B the information of privacy slice b 1 to the clus- ter head node A .send false information b ′ 1 , b ′ 2 and b ′ 3 to the cluster head node A .The process of node C is similar to that of node B.
2 and c ′ 3 .If we set S 1 to be the sum of a , b and c , the values of S 1 can be obtained without knowing b and c.Figures 3, 4 and 5 demonstrated the aggregation process of all nodes.

Funding
Funding was provided by National Natural Science Foundation of China (Grant No. 61962009), Major Scientific and Technological Special Project of Guizhou Province (Grant No. 20183001), Open Funding of Guizhou Provincial Key Laboratory of Public Big Data (Grant No. 2018BDKFJJ005).Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http://creat iveco mmons .org/licenses/by/4.0/.References Afshin K, Maede AT, Ladani BT (2019) An efficient privacy-preserving data aggregation scheme in smart grid.In: 27th Iranian conference on electrical engineering (ICEE), pp 1967-1971 Bista R, Kim Y, Song MS, Chang JW (2012) Improving data confidentiality and integrity for data aggregation in wireless sensor networks.IEICE Trans Inf Syst 95:67-77 Fang W, Wen XZ, Xu J (2017) CSDA: a novel cluster-based secure data aggregation scheme for WSNs.Clust Comput 22:5233-5244 Feng TM, Wang C, Zhang WS et al (2008) Confidentiality protection for distributed sensor data aggregation.In: 27th conference on computer communications, pp 56-60 Guo H (2012) A modified scheme for privacy-preserving data aggregation in WSNs.In: 2nd international conference on consumer electronics, pp 790-794 Guo ZW, Ding XJ (2014) Low energy-consuming cluster-based algorithm to enforce integrity and preserve privacy in data Fig. 12 Privacy comparison