LSH-based missing value prediction for abnormal traffic sensors with privacy protection in edge computing

Gao, Ailing; Liu, Xiaomei; Miao, Ying

doi:10.1007/s40747-023-00992-x

Original Article
Open access
Published: 02 March 2023

Volume 9, pages 5081–5091, (2023)
Complex & Intelligent Systems

Abstract

Traffic flow prediction is an important part of intelligent transportation systems (ITS). However, sensor failures and transmission distortion often occur during data acquisition, which inevitably causes the traffic flow data transmitted to the edge server to be lost or abnormal. In this situation, it is necessary to share traffic flow data among different platforms. Existing traffic flow prediction methods face two challenges in this sharing process. First, user privacy is often leaked when traffic data are shared across platforms. Second, as the data are continuously updated, the efficiency and scalability of data sharing between platforms decline. To address these challenges, we propose a novel prediction method for the missing traffic flow data caused by abnormal sensors, named $ASMVP_{distr-LSH}$, which is based on the distributed locality-sensitive hashing (LSH) technique. Finally, a case study illustrates the feasibility and effectiveness of $ASMVP_{distr-LSH}$.

Introduction

Traffic flow prediction remains an indispensable part of intelligent transportation systems (ITS) [1, 2] in today's rapidly changing society. Accurate and reliable traffic prediction helps alleviate traffic congestion, which is of great significance to traffic management and social security [3,4,5]. As a crucial part of 5G, edge computing can optimize data processing performance and reduce the delay of traffic flow prediction [6, 7]. Generally speaking, traffic prediction forecasts the future traffic situation of a road network based on historical traffic data [8,9,10]. Consequently, the integrity of historical traffic data is a key factor in successful prediction.

The data used for traffic flow prediction come from different sensors (e.g., laser sensors and infrared sensors) [11,12,13]. It is therefore necessary for one platform (e.g., the infrared sensor platform) to cooperate with others (e.g., the laser sensor platform) and exploit the integrated traffic data to improve the prediction accuracy of missing sensor data [14]. This requires a way to integrate traffic data [15,16,17]. However, such integration is usually not feasible in actual cooperation between manufacturers. One fundamental reason is that manufacturers rarely share their original traffic data with others, owing to conflicts of interest and users' sensitive information [18,19,20]. Another reason is that the amount of raw data tends to grow over time, which lowers the efficiency of traffic data sharing, processing, mining and analysis [21,22,23].

Considering the above challenges, we propose a novel traffic prediction method named $ASMVP_{distr-LSH}$, which is based on the principle of distributed locality-sensitive hashing (LSH) [24,25,26] to protect privacy and fill in missing traffic data [27]. LSH has the favorable property of preserving similarity, i.e., two adjacent points are likely to be given the same index [28,29,30,31]. Overall, our main contribution is twofold, as specified below.

(1)
We formulate the problem of traffic flow prediction from multi-source data across different sensors and propose a distributed LSH method for traffic flow prediction that protects user privacy, i.e., $ASMVP_{distr-LSH}$. This method converts the traffic flow information into index information and then uses only the index information for prediction, thereby achieving privacy protection.
(2)
We specifically consider data integrity before traffic flow prediction. We use the similarity-search principle of LSH to fill in the missing data of a sensor, which helps ensure the prediction accuracy of missing traffic sensor data.

The rest of this paper is organized as follows. In “Related work”, we summarize the related work in the current traffic data prediction domain. In “Problem description and formulation”, we introduce the motivation of the study through a real-world traffic scenario and formalize the problem of missing value prediction in traffic. In “A distributed LSH-based missing value prediction approach: $ASMVP_{distr-LSH}$”, the method proposed in this paper (i.e., $ASMVP_{distr-LSH}$) is described in detail to achieve privacy-preserving and time-efficient traffic data prediction. In “Case study”, a case study is used to introduce the concrete steps and show the effectiveness of our method $ASMVP_{distr-LSH}$. In addition, its shortcomings are analyzed in this section. Finally, we summarize the conclusions of this paper and point out future work in “Evaluation and further discussion”.

Related work

Next, we briefly review the research progress of traffic flow prediction from the following two aspects, i.e., missing data prediction and user privacy protection.

Missing traffic flow data prediction

The data collected by different types of sensors on the road are the key ingredient of a traffic flow prediction model. However, the data collected by these sensors are occasionally missing for a variety of reasons, e.g., hardware failure or transmission error. Tian et al. [32] address the problem of data missing over long periods from two different perspectives and propose two machine learning methods to update missing data without a gap length limitation. Laña et al. [33] utilize the periodicity of traffic flow data to infer missing values and put forward a method based on the Long Short-Term Memory (LSTM) [34] model. Li et al. [35] propose a Multi-View Learning Method (MVLM) to estimate the missing values of traffic flow in a database.

Although the influence of missing values on prediction performances is often ignored in deep neural network methods, Boquet et al. [36] put forward an online unsupervised data imputation method to tackle this issue. Wang et al. [10] use the characteristic that the traffic flow in the network follows the spatial-temporal patterns to restrain the influence of missing data. Zhang et al. [37] propose a method called FNNTEL, which is a data missing estimation method of tensor heterogeneous ensemble learning based on Fuzzy Neural Network (FNN).

At present, research on missing traffic flow data mostly focuses on models based on temporal information, spatial information, spatiotemporal information, or tensor decomposition. However, these models often lack the ability to protect users’ private information contained in traffic flow data.

Privacy protection in traffic flow data

In ITS, traffic flow data are the key factor and the foundation of many prediction models and analysis methods. However, using these data risks disclosing the user privacy information they contain. To protect the release of real-time vehicle trajectory data, Ma et al. [38] propose a method named RPTR, which is an effective privacy protection mechanism based on differential privacy. In the RPTR mechanism, an ensemble Kalman filter based on the user location transition probability matrix is used to ensure data availability.

To solve the problem of privacy leakage caused by data sharing between different private operators and public institutions, He et al. [39] propose a privacy control algorithm based on k-anonymity diffusion to realize reliable data sharing. Facing the challenge of protecting the privacy of a single vehicle when collecting point-to-point data for geographical location, Zhou et al. [40] propose a privacy-preserving traffic flow measurement method by using bit array to collect data that need to be protected and maximum likelihood estimation to obtain measurement results. In order to protect the location privacy data of users in GPS, Yang et al. [41] propose a virtual travel route system with stealth technology, which promotes the design of distributed architecture considerably.

Wu et al. [42] were the first to use a homomorphic encryption function in compressed sensing data collection, and proposed an efficient data collection method with privacy protection to prevent traffic analysis and tracking in wireless sensor networks. Liu et al. [43] propose a gated recurrent unit neural network algorithm based on federated learning (FL), a privacy-preserving machine learning technology.

Qi et al. [44] put forward an LSH-based service recommendation approach $SerRec_{distr-LSH}$ to secure the sensitive information of users hidden in historical user ratings. However, the authors do not take the time context factor into consideration and neglect the dynamic influence of time towards the unstable service quality. Therefore, the accuracy of the proposed $SerRec_{distr-LSH}$ approach is reduced accordingly.

Meng et al. [45] propose an “optimal publishing” strategy to reveal only the optimal service quality records instead of publishing all the sensitive service quality data observed by users. This way, most private information contained in service quality data can be protected well. However, such a partial publishing way decreases the availability of historical service quality data significantly since there is a tradeoff between data availability and data privacy.

However, the above algorithms have not been applied to deal with the traffic flow prediction scenarios where the traffic flow data generated by multiple sensors are distributed in multiple platforms. Motivated by this fact, we propose a novel traffic flow prediction method based on distributed LSH, as elaborated in the following sections.

Problem description and formulation

In this section, we first describe the LSH-based traffic flow prediction problem with privacy preservation. Then, we formalize the specific problem for easy understanding. The symbols used in this paper can be found in Table 1.

Table 1 Specification of symbols in this paper

Problem description

We use Fig. 1 to help describe the traffic flow prediction problem with missing data. Concretely, suppose there are three kinds of sensors that record the traffic situation of a certain road. In the figure, the three different sensors are represented by three different base stations. To predict the missing data of one of the sensors, LSH is used to find the dates that are similar to the date on which the data are missing. The missing values in the traffic flow data are then predicted from these similar dates.

However, there are two challenges in the process of traffic flow prediction. (1) As the above traffic data contain user privacy information and involve the interests of various companies, the sensor companies do not want to share their collected data with other companies. (2) With the increase in sensor types and traffic flow, the amount of traffic flow data keeps growing. As a result, the efficiency and scalability of data sharing among companies are significantly reduced, and it is difficult to meet the requirements of real-time traffic prediction.

To address these challenges, we propose a new distributed LSH-based approach named $ASMVP_{distr-LSH}$, which has the characteristics of privacy protection and scalability. Details are given in the next section.

Problem formulation

In this paper, we focus on the problem of missing traffic flow prediction from multi-source data. For the convenience of understanding and the following discussion, we further formulate the traffic data prediction problem as a five-tuple problem $ LSH\_MVP(TFS, TP, Day, day_{target}, TD)$, where

(1)
$ TFS = \left\{ TFS_1,...,TFS_{SN} \right\} : TFS_d(1\le d\le SN)$ denotes the d-th traffic sensor, which supplies the d-th part of the traffic flow prediction data collected every day.
(2)
$ TP = \left\{ {TP_1,...,TP_{SN}}\right\} : TP_k(1\le k\le SN)$ denotes the time period in sensor $TFS_k$.
(3)
$ Day = \left\{ {day_1,...,day_{DN} } \right\} : day_i(1\le i\le DN)$ represents the i-th day of the month. Here, the traffic flow prediction data collected every day come from many sensors in the set TFS, so it is multi-source.
(4)
$ day_{target}: $ a target day when we need to predict the missing data of a traffic sensor in a day. Here, $ day_{target}\in Day $ holds.
(5)
TD is the length of a time period, e.g., suppose that we adopt 15 min as a time period, then each day will contain 96 time periods.
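For orientation, the five-tuple can be held in a small container. The following is a minimal sketch, not the authors' data model: the class name `LshMvpProblem`, its field names, and the sample sensor labels are hypothetical, and only the relation between TD and the number of periods per day is taken from item (5).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LshMvpProblem:
    """Hypothetical container for LSH_MVP(TFS, TP, Day, day_target, TD)."""
    sensors: List[str]        # TFS: sensor identifiers, so SN = len(sensors)
    periods: List[List[int]]  # TP: period indices recorded by each sensor
    days: List[int]           # Day: day_1 .. day_DN
    day_target: int           # day whose missing values are to be predicted
    td_minutes: int           # TD: length of one time period, in minutes

    def periods_per_day(self) -> int:
        # A day has 24 * 60 = 1440 minutes.
        return (24 * 60) // self.td_minutes


problem = LshMvpProblem(
    sensors=["infrared", "laser"],            # illustrative labels only
    periods=[list(range(96)), list(range(96))],
    days=list(range(1, 31)),
    day_target=7,
    td_minutes=15,
)
assert problem.day_target in problem.days     # day_target must belong to Day
print(problem.periods_per_day())              # 96, as in item (5)
```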

A distributed LSH-based missing value prediction approach: $ASMVP_{distr-LSH}$

Our traffic flow prediction method can not only protect privacy and fill in the missing data of abnormal sensors, but also make distributed predictions across a variety of sensor platforms [46, 47]. In “LSH: locality-sensitive hashing”, we briefly introduce the locality-sensitive hashing technique. In “$ASMVP_{distr-LSH}$: traffic flow missing value prediction based on distributed LSH”, we introduce $ASMVP_{distr-LSH}$ in detail.

LSH: locality-sensitive hashing

LSH was proposed and refined by Aristides Gionis et al. to achieve high-speed information retrieval. Specifically, the algorithm lets each hash bucket store more than one point. In other words, (1) two data points that are adjacent in the original space are likely to remain neighbors after hashing; and (2) two data points that are not adjacent in the original space are unlikely to be neighbors after hashing.

The above is the main idea of the LSH algorithm. A hash function satisfying these two conditions is called an LSH function. LSH has been shown to handle distributed applications effectively, e.g., distributed information retrieval and the multi-source cloud service recommendation method based on distributed LSH in [20].

Specifically, let f(*) be an LSH function and F(*) a family of LSH functions. Assume that $ x_1$ and $ x_2$ are two points in the original data space, r($ x_1$,$ x_2$) is the distance between them, f(x) is the index (hash value) of x, P(Y) is the probability that condition Y holds, and ($ r_x$, $ r_y$, $ p_x$, $ p_y$) is a set of thresholds. If both conditions (1) and (2) below hold, then f(*) is called ($ r_x$, $ r_y$, $ p_x$, $ p_y$)-sensitive.

$$\text{If } r(x_1,x_2) \le r_x, \text{ then } P(f(x_1) = f(x_2)) \ge p_x,$$

(1)

$$\text{If } r(x_1,x_2) \ge r_y, \text{ then } P(f(x_1) = f(x_2)) \le p_y.$$

(2)
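The two conditions can be checked empirically for one common LSH function, the sign of a random projection. The sketch below is an illustration under assumptions rather than the paper's procedure: the helper names, the sample vectors, the number of trials and the seed are all hypothetical. It estimates how often a near pair and a far pair of vectors receive the same hash bit.

```python
import random


def sign_hash(x, p):
    """Return 1 if the dot product of x and p is positive, else 0."""
    return 1 if sum(a * b for a, b in zip(x, p)) > 0 else 0


def collision_rate(x1, x2, trials=2000, seed=0):
    """Fraction of random vectors p for which x1 and x2 get the same bit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = [rng.uniform(-1.0, 1.0) for _ in x1]
        if sign_hash(x1, p) == sign_hash(x2, p):
            hits += 1
    return hits / trials


# A near pair (almost parallel) and a far pair (opposite directions).
near = collision_rate([1.0, 2.0, 3.0], [1.1, 2.1, 2.9])
far = collision_rate([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])
print(near, far)  # near should be close to 1, far close to 0
```

A high collision rate for the near pair corresponds to condition (1), and a low rate for the far pair corresponds to condition (2).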

We use an example to illustrate the general procedure of LSH-based similar day search. First, assume that the original data space contains m data points ($ data_1$, $ data_2$,..., $ data_m$). They can be mapped into n containers ($ b_1$, $ b_2$,..., $ b_n$) by an LSH function, and each container $ b_k(1\le k \le n)$ contains $ m_k$ data points with similar neighbor characteristics, where $m_k \ll m$ and $\sum_k m_k = m$.

As described above, to find the dates similar to a target date X among ($ data_1$, $ data_2$,..., $ data_m$), we compute the hash value f(X) with the hash function f(*) and then locate the corresponding container (assume it is $ b_k(1\le k \le n)$). According to the main idea of LSH, the $ m_k$ data points contained in bucket $ b_k$ are most likely the days similar to the target day X. In this case, as long as $m_k \ll m$ holds, the size of the search shrinks from m to $ m_k$, and the search efficiency improves significantly.
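The bucket lookup described above can be sketched as follows. This is a minimal illustration: `toy_hash` is a hypothetical stand-in (simple quantization, not a true LSH function), and the sample points are invented solely to show how the search space shrinks from m points to the $m_k$ points of one bucket.

```python
from collections import defaultdict


def build_buckets(points, hash_fn):
    """Map each named data point into the bucket given by its hash value."""
    buckets = defaultdict(list)
    for name, value in points.items():
        buckets[hash_fn(value)].append(name)
    return buckets


def candidates(buckets, hash_fn, query):
    """Return only the points that share the query's bucket."""
    return buckets.get(hash_fn(query), [])


def toy_hash(value):
    # Hypothetical stand-in for f(*): nearby values share a bucket.
    return int(value // 10)


points = {"data_1": 3, "data_2": 8, "data_3": 41, "data_4": 47, "data_5": 95}
buckets = build_buckets(points, toy_hash)
print(candidates(buckets, toy_hash, 44))  # ['data_3', 'data_4']
```

Only two of the five points are compared against the query, and the lookup uses hash values rather than the stored raw values.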

It can be seen from the above examples that the method based on LSH search has three advantages. First, this method uses the hash value or index generated by hash function f(*) to find the similar data points of the target data. In this situation, LSH protects the privacy information in the data. Second, distributed data points ($ data_1$, $ data_2$,..., $ data_m$) can be centralized into a hash table through LSH, and then unified calculation can be carried out. Third, LSH can establish hash table offline and reduce search space, which can not only improve search efficiency, but also increase search scalability. Therefore, the extended LSH method is applied to the missing value prediction of traffic flow to achieve privacy protection and scalable traffic flow prediction in distributed multi-sensor environment.

$ASMVP_{distr-LSH}$: traffic flow missing value prediction based on distributed LSH

In this section, we will introduce our $ASMVP_{distr-LSH}$ method in detail. Generally speaking, our method mainly consists of four steps, as follows:

Step 1 (Establish date sub-indices offline):

Concretely, for each sensor $TFS_k (1\le k \le SN)$, we choose a family of LSH functions $ F_k(*)$ to create a sub-index for each $ day \in Day $ offline, based on the known traffic flow data collected by the sensor. Because the Pearson Correlation Coefficient (PCC) is often used to reflect the degree of linear correlation between two variables X and Y, in this paper we use the LSH function family corresponding to PCC to build the index. In addition, the selected LSH function family $ F_k(*)$ must satisfy conditions (1) and (2) described in the previous section.

First, for a day, all its time periods $\left\{ TP_{k,1},...,TP_{k,TN} \right\} $ are converted into a TN-dimensional vector $ \overrightarrow{day(k)} = (TP_{k,1,TD},...,TP_{k,TN,TD})$, where TD refers to the length of a time period, and missing data in a certain period of the sensor are expressed as $ TP_{k,i,TD}=0$. Then, the LSH function $ f_k(*)$ applied to this TN-dimensional vector is given by Eq. (3).

$$f_k(day) = \begin{cases} 1, & \text{if } \overrightarrow{day(k)} \bullet \overrightarrow{p} > 0, \\ 0, & \text{if } \overrightarrow{day(k)} \bullet \overrightarrow{p} \le 0. \end{cases}$$

(3)

Here, $ \overrightarrow{p}$ is a TN-dimensional vector $ (p_1,..,p_{TN})$ whose elements are random values in the interval [-1,1], and the sign $ \bullet $ denotes the dot product. Eq. (3) can be understood as follows: the vector $ \overrightarrow{p}$ defines a hyperplane that cuts the data space into two sides. Suppose that there are two vectors $ \overrightarrow{x_1}$ and $\overrightarrow{x_2}$. If $ \overrightarrow{x_1}$ and $ \overrightarrow{x_2}$ are on the same side of this hyperplane (i.e., both $ \overrightarrow{x_1} \bullet \overrightarrow{p} > 0 $ and $ \overrightarrow{x_2} \bullet \overrightarrow{p} > 0 $ hold, or both $ \overrightarrow{x_1} \bullet \overrightarrow{p} \le 0 $ and $ \overrightarrow{x_2} \bullet \overrightarrow{p} \le 0 $ hold), then $ \overrightarrow{x_1}$ and $ \overrightarrow{x_2}$ are likely to be similar.

Second, since the elements in vector $ \overrightarrow{p}$ are randomly generated from the data interval [-1,1], the above hashing and mapping process can be repeated $SF_k$ times using different vectors $ \overrightarrow{p}$. Then, the sub-index (i.e., $ F_k(day) = (f_{k}^{1}(day),..., f_{k}^{SF_k}(day))$) of the sensor in one day can be obtained, in which $ f_{k}^{j}(day)(1 \le j \le SF_k)$ is calculated by Eq. (3). In particular, the sub-index $ F_k(day)$ here is a 0–1 vector with $SF_k$ dimensions.
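The sub-index construction of Step 1 can be sketched in a few lines. This is an illustrative sketch under assumptions, not the authors' Algorithm 1: the function names, the seed, and the choice of 10 periods with 4 hash functions are hypothetical, while the sign test follows Eq. (3) and the elements of each $\overrightarrow{p}$ are drawn from [-1,1] as stated above.

```python
import random


def make_hash_family(tn, sf, rng):
    """Draw sf random vectors p, each with tn elements in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(tn)] for _ in range(sf)]


def sub_index(day_vector, family):
    """Apply Eq. (3) once per vector p to obtain a 0-1 vector of length sf."""
    bits = []
    for p in family:
        dot = sum(v * w for v, w in zip(day_vector, p))
        bits.append(1 if dot > 0 else 0)
    return tuple(bits)


rng = random.Random(42)                      # fixed seed, for repeatability only
family = make_hash_family(tn=10, sf=4, rng=rng)
day_vector = [6, 3, 0, 0, 1, 17, 81, 121, 87, 83]   # one day, 0 = missing
index = sub_index(day_vector, family)
print(index)                                 # a tuple of four 0/1 bits
```

Because only the sign of each dot product is kept, scaling a day's traffic vector by a positive constant does not change its sub-index, and the raw flow values are not recoverable from the bits alone.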

In addition, we can use the following pseudo code to represent the above process (see Algorithm 1).

Step 2 (Establish date index by amalgamating sub-indices offline):

In the previous step, based on the traffic data of the different sensors, we obtained the SN sub-indices $F_1(day),...,F_{SN}(day)$. In this step, we amalgamate the SN sub-indices offline into an integrated date index $ F(day) = (F_1(day),...,F_{SN}(day))$ with dimension $\sum _{i=1}^{SN} SF_i$. We repeat this process for each $day \in Day$ until all the mapping relationships “$day \rightarrow F(day)$” are established. Next, we record the mapping relationships “$day \rightarrow F(day)$” in a pre-defined hash table FTab.

In addition, we can use the following pseudo code to represent the above process (see Algorithm 2).
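A minimal sketch of the merging step follows. It is an assumption-laden illustration rather than the paper's Algorithm 2: the function names and the sample sub-indices are hypothetical, and the hash table FTab is modeled as a dictionary from the integrated index to the list of days that share it.

```python
from collections import defaultdict


def merge_sub_indices(sub_indices):
    """Concatenate the per-sensor sub-indices F_1(day), ..., F_SN(day)."""
    merged = ()
    for sub in sub_indices:
        merged += tuple(sub)
    return merged


def build_ftab(day_to_sub_indices):
    """Record day -> F(day) as buckets keyed by the integrated index."""
    ftab = defaultdict(list)
    for day, subs in day_to_sub_indices.items():
        ftab[merge_sub_indices(subs)].append(day)
    return dict(ftab)


# Invented sub-indices for three days and two sensors (SF_1 = 4, SF_2 = 2).
day_to_sub_indices = {
    1: [(0, 0, 0, 0), (0, 1)],
    2: [(0, 0, 0, 0), (0, 1)],
    3: [(0, 0, 0, 0), (0, 0)],
}
ftab = build_ftab(day_to_sub_indices)
print(ftab)
```

Days 1 and 2 end up in the same bucket because they agree on every bit of both sub-indices, whereas day 3 differs in one bit of the second sensor's sub-index.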

Step 3 (Find similar days of $ day_{target}$ online):

Using the hash function families $F_m(*)(1 \le m \le SN)$ selected in Step 1 to obtain the sub-indices, and the amalgamation of Step 2, we compute the index $ F(day_{target})$ of $day_{target}$ online. Next, we look up the bucket whose key equals $ F(day_{target})$ in the FTab produced in Step 2. If such a bucket is found, every date contained in it is regarded as a similar day of $ day_{target}$ and is put into a dataset named DS-Set. If no such bucket is found, we cannot simply conclude that $ day_{target}$ has no similar days, because LSH is a probabilistic technique: it cannot guarantee that all similar days are found every time, i.e., some qualified results may be missed.

Therefore, we create T hash tables $ FTab_1,...,FTab_T$ by repeating Step 1 and Step 2, which relaxes the condition for judging similar days. If the condition in Eq. (4) is true, we regard $ day_{target}$ as having similar days, and the dates whose index in a table equals $ {F(day_{target})}_x$ are the similar days of $ day_{target}$. We put these similar days into the data set DS-Set.

$$\exists \ day \in Day \ \text{and} \ x \in \{1,...,T\} \ \text{such that} \ F(day)_x = F(day_{target})_x \ \text{in} \ FTab_x.$$

(4)

In addition, we can use the following pseudo code to represent the above process (see Algorithm 3).
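The relaxed search over T hash tables can be sketched as a union of bucket lookups. The sketch below is illustrative and is not the paper's Algorithm 3: the function name and the two toy tables are hypothetical, and a day is accepted as similar if it collides with $day_{target}$ in at least one table, following Eq. (4).

```python
def find_similar_days(tables, target_indices, day_target):
    """Collect every day that shares a bucket with day_target in any table.

    tables[x] maps an integrated index to the days holding that index in
    FTab_x, and target_indices[x] is F(day_target)_x for the same table.
    """
    ds_set = set()
    for ftab, key in zip(tables, target_indices):
        ds_set.update(ftab.get(key, []))
    ds_set.discard(day_target)        # a day is not its own similar day
    return sorted(ds_set)


# Two invented hash tables over days 1..3 (T = 2).
tables = [
    {(0, 1): [1, 2], (1, 1): [3]},
    {(1, 0): [1], (0, 0): [2, 3]},
]
print(find_similar_days(tables, [(0, 1), (1, 0)], day_target=1))  # [2]
print(find_similar_days(tables, [(0, 1), (0, 0)], day_target=2))  # [1, 3]
```

In the second call, day 3 is found only through the second table, which shows how additional tables recover similar days that a single table misses.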

Step 4 (Top-K missing value prediction):

In the previous step, we have gained a similar date set (i.e., DS-Set) for $ day_{target}$. Next, we use DS-Set to predict the missing values in $ day_{target}$ (here, we can set a threshold for |DS-Set|). Specifically, we use Eq.(5) to predict the missing values of abnormal sensors in traffic over the time period TD.

$$TP.F_{target} = \frac{1}{|DS\text{-}Set|} \sum _{day_j\in DS\text{-}Set} TP.F_j.$$

(5)

Here, $TP.F_j$ represents the traffic flow of the corresponding time period in the sensor TFS, which is included in the days similar to $ day_{target}$ (i.e., a day with abnormal sensor values). Finally, we rank all the time periods of the sensor according to the prediction results by Eq.(5), and take the traffic flow data corresponding to the first k time periods as the final prediction results.
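Eq. (5) and the subsequent ranking can be sketched as follows. This is a hedged illustration, not the paper's Algorithm 4: the function names are hypothetical, the ranking is assumed to be in descending order of predicted flow (the text does not state the direction), and the sample flows are the values that days 1 and 2 of the first sensor take in periods 8 and 10 of the case study data in (13).

```python
def eq5_average(flows):
    """Mean traffic flow over the days in DS-Set for one time period (Eq. 5)."""
    if not flows:
        raise ValueError("DS-Set is empty, so no prediction can be made")
    return sum(flows) / len(flows)


def top_k_periods(predictions, k):
    """Rank periods by predicted flow (descending order is an assumption)."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Flows of the similar days, keyed by the time period with a missing value.
similar_day_flows = {8: [121.0, 112.0], 10: [83.0, 83.0]}
predictions = {
    period: eq5_average(flows) for period, flows in similar_day_flows.items()
}
print(predictions)                    # {8: 116.5, 10: 83.0}
print(top_k_periods(predictions, 1))  # [(8, 116.5)]
```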

In addition, we can use the following pseudo code to represent the above process (see Algorithm 4).

After the above four steps of the $ASMVP_{distr-LSH}$ approach, we can predict the missing values of abnormal sensors while preserving privacy.

Case study

To illustrate the feasibility of our method, in this section we use a case study to demonstrate the execution process of the proposed method. Suppose there are 2 different sensors that collect traffic flow data. If we adopted 60 min as a time period, each day would contain 24 time periods. For ease of understanding and calculation, however, the traffic flow data used here cover only 5 days, each of which is divided into 10 time periods (each period is equal to 2.4 h).

Step 1 (Establish date sub-indices offline):

In this section, we use 4 hash functions to form a family of hash functions (i.e., $ F_k(day) = (f_{k}^{1}(day),..., f_{k}^{4}(day))$) for better illustration. Concretely, the hash function family is shown in (6).

$$F_{10\times 4}= \begin{bmatrix}
-0.16595599 & -0.16161097 & 0.60148914 & -0.80330633 \\
0.44064899 & 0.370439 & 0.93652315 & -0.15778475 \\
-0.99977125 & -0.5910955 & -0.37315164 & 0.91577906 \\
-0.39533485 & 0.75623487 & 0.38464523 & 0.06633057 \\
-0.70648822 & -0.94522481 & 0.7527783 & 0.38375423 \\
-0.81532281 & 0.34093502 & 0.78921333 & -0.36896874 \\
-0.62747958 & -0.1653904 & -0.82991158 & 0.37300186 \\
-0.30887855 & 0.11737966 & -0.92189043 & 0.66925134 \\
-0.20646505 & -0.71922612 & -0.66033916 & -0.96342345 \\
0.07763347 & -0.60379702 & 0.75628501 & 0.50028863
\end{bmatrix}.$$

(6)

First, according to Eq. (3), a dot multiplication operation is adopted between a hash function and the vector corresponding to a sensor. Then, the above process is repeated four times based on different hash functions to get the sub-index of the sensor. Here, the sub-index of the sensor is a 0–1 vector. To facilitate the subsequent understanding and analysis, we transform the 0–1 vector of the sensor into a corresponding decimal number. The sub-indexes of the two sensors are shown in (7). Here, $ f_{k}(day)$ represents the sub-index of the k-th traffic sensor.

$$\begin{aligned}
f_{1}(day) &= \begin{bmatrix} 0 & 0 & 0 & 3 & 11 \end{bmatrix}, \\
f_{2}(day) &= \begin{bmatrix} 1 & 1 & 0 & 1 & 1 \end{bmatrix}.
\end{aligned}$$

(7)
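The conversion from a 0–1 sub-index to a decimal number can be sketched as below. The bit order is an assumption (first hash function treated as the most significant bit), since the text does not state it, and the function name is hypothetical. With four hash functions, the decimal values range from 0 to 15, which is consistent with the entries in (7).

```python
def bits_to_decimal(bits):
    """Read a 0-1 vector as a binary number, most significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


print(bits_to_decimal((0, 0, 0, 0)))  # 0
print(bits_to_decimal((0, 0, 1, 1)))  # 3
print(bits_to_decimal((1, 0, 1, 1)))  # 11
```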

Step 2 (Establish date index by amalgamating sub-indices offline):

In Step 1, we have obtained the sub-indexes of the two traffic sensors. Next, we separately merge the sub-indexes of the two sensors, and the final merging results are shown in Eq. (8). At the same time, the combined index is sent to each sensor platform.

$$F_{1,2}(day)= \begin{bmatrix} 0 & 0 & 0 & 3 & 11 \\ 1 & 1 & 0 & 1 & 1 \end{bmatrix}.$$

(8)

Step 3 (Find similar days of $ day_{target}$ online):

Repeat Step 1 and Step 2 four times to get four different hash tables. Here, the sub-indexes obtained by four different groups of hash function families are presented in (9) and (10), respectively, and the combined indexes of four groups of sub-indexes are shown in (11). Here, $ f_{k}^{T}(day)$ represents the sub index of the k-th sensor obtained through the T-th group of hash function family.

$$\begin{aligned}
f_{1}^{1}(day) &= \begin{bmatrix} 0 & 0 & 0 & 3 & 11 \end{bmatrix}, & f_{1}^{2}(day) &= \begin{bmatrix} 4 & 4 & 4 & 1 & 1 \end{bmatrix}, \\
f_{1}^{3}(day) &= \begin{bmatrix} 2 & 0 & 0 & 2 & 2 \end{bmatrix}, & f_{1}^{4}(day) &= \begin{bmatrix} 1 & 3 & 10 & 5 & 5 \end{bmatrix},
\end{aligned}$$

(9)

$$\begin{aligned}
f_{2}^{1}(day) &= \begin{bmatrix} 1 & 1 & 0 & 1 & 1 \end{bmatrix}, & f_{2}^{2}(day) &= \begin{bmatrix} 5 & 5 & 4 & 1 & 1 \end{bmatrix}, \\
f_{2}^{3}(day) &= \begin{bmatrix} 0 & 0 & 0 & 0 & 6 \end{bmatrix}, & f_{2}^{4}(day) &= \begin{bmatrix} 1 & 1 & 10 & 1 & 5 \end{bmatrix},
\end{aligned}$$

(10)

$$\begin{aligned}
F_{1,2}^{1}(day) &= \begin{bmatrix} 0 & 0 & 0 & 3 & 11 \\ 1 & 1 & 0 & 1 & 1 \end{bmatrix}, & F_{1,2}^{2}(day) &= \begin{bmatrix} 4 & 4 & 4 & 1 & 1 \\ 5 & 5 & 4 & 1 & 1 \end{bmatrix}, \\
F_{1,2}^{3}(day) &= \begin{bmatrix} 2 & 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 & 6 \end{bmatrix}, & F_{1,2}^{4}(day) &= \begin{bmatrix} 1 & 3 & 10 & 5 & 5 \\ 1 & 1 & 10 & 1 & 5 \end{bmatrix}.
\end{aligned}$$

(11)

Next, according to Eq. (4) and the index values in (11), we obtain a similar date matrix, shown in (12). In the similarity matrix in (12), the rows correspond to the days of the first sensor and the columns correspond to the days of the second sensor.

$$sim_{5\times 5}= \begin{bmatrix}
1 & 1 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}.$$

(12)

Step 4 (Top-K missing value prediction):

According to Eq. (5) and the similarity matrix obtained in Step 3 (i.e., the matrix in (12)), we can predict the missing values of the abnormal sensors. Specifically, we assume that the original data of the two abnormal sensors are as presented in (13), where, as in Step 1, a missing value is recorded as 0. It can be seen from (13) that the value in the 3rd period of the 1st day of the first abnormal sensor is missing. According to (12), the 1st day is similar to the 2nd day and the 3rd day, so the missing value in the 1st day is predicted to be 1 according to Eq. (5). By analogy, the complete data of the two sensors after prediction are shown in (14).

$$\begin{aligned}
FirSenMiss_{5\times 10} &= \begin{bmatrix}
6 & 3 & 0 & 0 & 1 & 17 & 81 & 121 & 87 & 83 \\
1 & 2 & 1 & 3 & 1 & 8 & 51 & 112 & 87 & 83 \\
3 & 1 & 0 & 1 & 4 & 12 & 92 & 0 & 111 & 0 \\
8 & 0 & 2 & 3 & 8 & 0 & 112 & 173 & 0 & 117 \\
0 & 3 & 4 & 4 & 4 & 34 & 0 & 157 & 0 & 119
\end{bmatrix}, \\
SecSenMiss_{5\times 10} &= \begin{bmatrix}
3 & 0 & 0 & 0 & 0 & 3 & 7 & 7 & 18 & 17 \\
0 & 2 & 0 & 0 & 2 & 2 & 7 & 9 & 19 & 10 \\
0 & 0 & 0 & 0 & 0 & 0 & 11 & 0 & 18 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 8 & 11 & 0 & 33.25 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 4 & 0 & 38
\end{bmatrix},
\end{aligned}$$

(13)

$$\begin{aligned}
FirSenFore_{5\times 10} &= \begin{bmatrix}
6 & 3 & 1 & 2 & 1 & 17 & 81 & 121 & 87 & 83 \\
1 & 2 & 1 & 3 & 1 & 8 & 51 & 112 & 87 & 83 \\
3 & 1 & 1 & 1 & 4 & 12 & 92 & 116.5 & 111 & 83 \\
8 & 3 & 2 & 3 & 8 & 34 & 112 & 173 & 0 & 117 \\
8 & 3 & 4 & 4 & 4 & 34 & 112 & 157 & 0 & 119
\end{bmatrix}, \\
SecSenFore_{5\times 10} &= \begin{bmatrix}
3 & 2 & 0 & 0 & 2 & 3 & 7 & 7 & 18 & 17 \\
3 & 2 & 0 & 0 & 2 & 2 & 7 & 9 & 19 & 10 \\
3 & 2 & 0 & 0 & 2 & 2.5 & 11 & 8 & 18 & 13.5 \\
0 & 0 & 1 & 2 & 0 & 0 & 8 & 11 & 0 & 33.25 \\
0 & 0 & 1 & 2 & 0 & 0 & 8 & 4 & 0 & 38
\end{bmatrix}.
\end{aligned}$$

(14)

Evaluation and further discussion

Next, we measure the performance of our proposed $ASMVP_{distr-LSH}$ method and compare it with two existing methods: $SerRec_{distri-LSH}$ [44] and $Optimal-Pub$ [45]. The dataset used is WS-DREAM [48]. Experiments are run on a laptop with a 2.50 GHz CPU and 8.0 GB RAM, and each experiment is repeated 50 times.

Profile 1: Accuracy comparison

In this profile, we test and compare the prediction accuracy (RMSE, smaller is better) of our method with that of the other two. Parameters are as follows: SN = 300, TN is varied from 1000 to 4000, threshold of $|DS-Set|$ = 4. Experimental results are reported in Fig. 2. As the results indicate, the accuracy of $ASMVP_{distr-LSH}$ is the highest (i.e., its RMSE is the smallest) among the three methods, because the time-aware LSH technique lets $ASMVP_{distr-LSH}$ find the sensors most similar to the target sensor whose data are missing.
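As a reminder of how the accuracy metric is computed, the following is a minimal sketch of RMSE over a set of predicted and true values. The function name `rmse` and the sample numbers are illustrative assumptions and are not taken from the experiments.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


# Hypothetical predicted and ground-truth traffic flow values.
predicted = [116.5, 83.0, 8.0]
actual = [118.5, 80.0, 8.0]
print(rmse(predicted, actual))
```

A smaller value means the predicted flows are closer to the true flows, which is why the method with the lowest curve in the accuracy comparison is the most accurate.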

Profile 2: Efficiency comparison

In this profile, we compare the prediction efficiency of our method with that of the other two. Parameters are as follows: SN = 300, TN is varied from 1000 to 4000, threshold of $|DS-Set|$ = 4. Experimental results are reported in Fig. 3. As Fig. 3 indicates, $Optimal-Pub$ is the most efficient because it does not need to protect all of the users' sensitive information when predicting missing values. In contrast, $ASMVP_{distr-LSH}$ and $SerRec_{distri-LSH}$ need additional time to secure user privacy during prediction, so their time cost is higher. Of these two, $ASMVP_{distr-LSH}$ performs better than $SerRec_{distri-LSH}$ because it only needs to consider the similar time periods.

Profile 3: Performances with respect to the threshold of $|DS-Set|$.

The threshold of $|DS-Set|$ affects the prediction performance of $ASMVP_{distr-LSH}$, so we next measure this relationship. Parameters are as follows: SN = 300, TN = 4000, threshold of $|DS-Set|$ = 2, 4, 6, 8. Experimental results are reported in Figs. 4 and 5. As the results show, both the RMSE and the time cost of $ASMVP_{distr-LSH}$ decline as the threshold grows. This can be explained as follows: a larger threshold requires the selected sensors to agree with the target on the monitored data at more time periods, so the selected sensors are “more similar” but “fewer”. Predicting from fewer, more similar sensors therefore improves accuracy and time cost simultaneously.

Profile 4: RMSE convergence of three methods.

The RMSE convergence of the three methods is presented in Fig. 6. Parameters are as follows: SN = 300, TN is varied from 1000 to 4000, threshold of $|DS-Set|$ = 4. The results show that repeating the experiments 50 times is reasonable, since the RMSE of all three methods becomes approximately stable. This indicates that the convergence of our proposal is satisfactory.
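One plausible way to check this kind of stabilization is to track the running mean of the RMSE over repeated runs and see whether it flattens out. The sketch below assumes that reading; the helper name `running_mean` and the sample values are hypothetical, and the paper does not state how the convergence curve was produced.

```python
def running_mean(values):
    """Return the cumulative mean after each successive run."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


# Hypothetical per-run RMSE values from repeated experiments.
per_run_rmse = [2.0, 4.0, 6.0, 4.0]
print(running_mean(per_run_rmse))
```

If the last entries of the running mean change only slightly from one run to the next, adding more repetitions is unlikely to alter the reported RMSE much.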

Next, we briefly analyze the limitations of our proposal and point out possible directions for improvement in upcoming studies.

(1)
In our prediction method for missing traffic flow data caused by abnormal sensors, the LSH technique is employed to achieve privacy protection. Overall, our method can secure sensitive user information while predicting missing traffic values for abnormal sensors. However, it is still difficult to measure or evaluate the degree of privacy preservation that the method provides. This is because LSH is essentially a hash-based technique, and its privacy-preservation effect cannot be measured directly and quantitatively.
(2)
Traffic flow data depend heavily on time because users' daily driving behavior shows an obvious time-varying fluctuation. Therefore, this paper takes the time factor into consideration when predicting traffic data. However, traffic flow is also related to other influencing factors besides time, such as location, weather and climate. It is therefore beneficial to extend the current prediction method by incorporating more influencing factors. Such an extension would help create a more comprehensive prediction framework for missing traffic flow data, especially in complex city management.
(3)
In the current time-based traffic flow prediction method, each day is divided into 96 time periods of 15 min each. This segmentation is fixed and lacks flexibility. For example, during busy hours, traffic conditions change frequently, so a shorter time period describes the traffic flow of the city better. Conversely, during off-peak hours, traffic conditions seldom change, so a longer time period is sufficient. Therefore, a flexible setting of the time period in time-aware traffic flow prediction is necessary and would benefit both prediction accuracy and efficiency.
(4)
LSH is a probability-based similar-object search technique; therefore, there is some uncertainty in traffic flow data prediction. In other words, the prediction performance may not be as good as expected, especially in terms of accuracy. In view of this limitation, we need to improve prediction accuracy by improving the traditional LSH technique. One promising optimization is to use multiple hash functions instead of only one when creating traffic sensor indexes or time slot indexes with LSH. By repeating the LSH process multiple times, we can achieve a good tradeoff between prediction accuracy and efficiency.
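The multiple-hash-function idea in item (4) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes a random-hyperplane (sign projection) hash family, and the names `make_hash_function`, `build_tables` and `candidates`, as well as the toy data, are hypothetical.

```python
import random
from collections import defaultdict


def make_hash_function(dim, num_bits, rng):
    """Build one LSH function from num_bits random hyperplanes (assumed hash family)."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def hash_vector(vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
            for plane in planes
        )

    return hash_vector


def build_tables(vectors, num_tables, num_bits, seed=0):
    """Index every vector in num_tables independent hash tables."""
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    tables = []
    for _ in range(num_tables):
        hash_vector = make_hash_function(dim, num_bits, rng)
        buckets = defaultdict(set)
        for key, vec in vectors.items():
            buckets[hash_vector(vec)].add(key)
        tables.append((hash_vector, buckets))
    return tables


def candidates(query, tables, exclude=None):
    """Union of the keys that share a bucket with the query in at least one table."""
    found = set()
    for hash_vector, buckets in tables:
        found |= buckets.get(hash_vector(query), set())
    found.discard(exclude)
    return found


# Toy daily traffic flow vectors keyed by a hypothetical date label.
daily_flows = {
    "day1": [6, 3, 1, 2, 1, 17, 81, 121, 87, 83],
    "day2": [1, 2, 1, 3, 1, 8, 51, 112, 87, 83],
    "day3": [3, 0, 0, 0, 0, 3, 7, 7, 18, 17],
}
tables = build_tables(daily_flows, num_tables=4, num_bits=6)
print(candidates(daily_flows["day1"], tables, exclude="day1"))
```

Using more tables raises the chance that a truly similar item collides with the query in at least one table, at the cost of more hashing work and more candidates to examine; this is the accuracy versus efficiency tradeoff mentioned above.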

Conclusions and future work

Missing data from abnormal sensors is common in the traffic domain and poses a major challenge for accurate traffic flow prediction and traffic route scheduling in smart city management. Motivated by this challenge, this paper presents a distributed, privacy-preserving missing value prediction method for abnormal traffic sensors, i.e., $ASMVP_{distr-LSH}$. First, our method integrates known traffic flow data from different sensors offline and sends these data to the edge server [49, 50]. Then, similar dates with close traffic conditions are selected using the LSH technique. Finally, the Top-k dates and their traffic flow data are used to predict the missing traffic data of abnormal sensors on a given day. To verify the feasibility of $ASMVP_{distr-LSH}$, we provide a case study that demonstrates the concrete process of missing value prediction with privacy preservation.
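The final Top-k step can be illustrated with a small sketch. It assumes that candidate dates have already been retrieved, that candidates are ranked by Euclidean distance over the time slots observed in the target, and that a missing slot is filled with the mean of the k closest candidates. These choices, the function name `predict_missing`, and the use of `None` to mark a missing value are all assumptions made for illustration rather than details confirmed by the paper.

```python
def predict_missing(target, candidate_rows, k=2):
    """Fill None entries of target with the mean of the k closest candidate rows.

    candidate_rows are assumed to be complete (no None entries).
    """
    if not candidate_rows:
        raise ValueError("at least one candidate row is required")
    observed = [i for i, value in enumerate(target) if value is not None]

    def distance(row):
        # Euclidean distance restricted to the slots observed in the target.
        return sum((row[i] - target[i]) ** 2 for i in observed) ** 0.5

    nearest = sorted(candidate_rows, key=distance)[:k]
    return [
        value if value is not None else sum(row[i] for row in nearest) / len(nearest)
        for i, value in enumerate(target)
    ]


# Hypothetical flows: the target day is missing its second time slot.
target_day = [1, None, 3]
similar_days = [[1, 10, 3], [1, 20, 3], [100, 0, 100]]
print(predict_missing(target_day, similar_days, k=2))
```

Only the two days that match the target on its observed slots contribute to the filled value, so the outlying third day has no influence on the prediction.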

In the future, we will use a set of real traffic sensor data to test the performance of our method and compare it with other related methods. Compared with cloud AI, edge AI has lower latency, which will greatly improve the performance of traffic flow prediction [51,52,53]. Therefore, we will consider applying edge AI technology in future research. For abnormal sensors, it is still challenging to consider user privacy, prediction accuracy and scalability simultaneously [54, 55]. We will consider a more complex traffic flow prediction scenario in an upcoming study [56, 57]. We will also further study how missing values in traffic flow are generated and compare our approach with contemporary methods. In addition to missing values in traffic flow, we will also consider sensor-induced anomalous data through anomaly detection.

References

Wang Z, Liu J, Shen S, Li M (2021) Restaurant recommendation in vehicle context based on prediction of traffic conditions. Int J Pattern Recognit Artif Intell 35(10):2159044. https://doi.org/10.1142/S0218001421590448
Chen L (2021) Road vehicle recognition algorithm in safety assistant driving based on artificial intelligence. Soft Comput 1–10. https://doi.org/10.1007/s00500-021-06011-w
Hong L, Lamberson P, Page SE (2021) Hybrid predictive ensembles: synergies between human and computational forecasts. J Soc Comput 2(2):89–102
Catlett C, Beckman P, Ferrier N, Nusbaum H, Papka ME, Berman MG, Sankaran R (2020) Measuring cities with software-defined sensors. J Soc Comput 1(1):14–27
Waggoner PD (2021) Pandemic policymaking. J Soc Comput 2(1):14–26
Xu X, Zhang X, Gao H, Xue Y, Qi L, Dou W (2020) Become: Blockchain-enabled computation offloading for iot in mobile edge computing. IEEE Trans Industr Inf 16(6):4187–4195. https://doi.org/10.1109/TII.2019.2936869
He Q, Dong Z, Chen F, Deng S, Liang W, Yang Y (2022) Pyramid: Enabling hierarchical neural networks with edge computing. In: Proceedings of the ACM Web Conference 2022. WWW ’22, pp. 1860–1870. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3485447.3511990
Jin J, Zhu X, Wu B, Zhang J, Wang Y (2022) A dynamic and deadline-oriented road pricing mechanism for urban traffic management. Tsinghua Sci Technol 27(1):91–102. https://doi.org/10.26599/TST.2020.9010062
Guezzaz A, Asimi Y, Azrour M, Asimi A (2021) Mathematical validation of proposed machine learning classifier for heterogeneous traffic and anomaly detection. Big Data Min Anal 4(1):18–24. https://doi.org/10.26599/BDMA.2020.9020019
Wang F, Li G, Wang Y, Rafique W, Khosravi MR, Liu G, Liu Y, Qi L (2022) Privacy-aware traffic flow prediction based on multi-party sensor data with zero trust in smart city. ACM Trans Internet Technol. https://doi.org/10.1007/978-981-16-6554-7_78
Huang H, Zeng Z, Yao D, Pei X, Zhang Y (2022) Spatial-temporal convlstm for vehicle driving intention prediction. Tsinghua Sci Technol 27(3):599–609. https://doi.org/10.26599/TST.2020.9010061
Malek YN, Najib M, Bakhouya M, Essaaidi M (2021) Multivariate deep learning approach for electric vehicle speed forecasting. Big Data Min Anal 4(1):56–64. https://doi.org/10.26599/BDMA.2020.9020027
Zhong G, Xiong K, Zhong Z, Ai B (2021) Internet of things for high-speed railways. Intell Converg Netw 2(2):115–132
Zhao B, Zeng W, Gao W, Zhang Q (2022) Guest editorial on “computational intelligence in analysis and integration of complex systems’’. Complex Intell Syst. https://doi.org/10.1007/s40747-022-00736-3
Xu X, Li H, Xu W, Liu Z, Yao L, Dai F (2022) Artificial intelligence for edge service optimization in internet of vehicles: A survey. Tsinghua Science and Technology 27(2):270–287. https://doi.org/10.26599/TST.2020.9010025
Xu S, Chen X, He Y (2021) Evchain: an anonymous blockchain-based system for charging-connected electric vehicles. Tsinghua Sci Technol 26(6):845–856. https://doi.org/10.26599/TST.2020.9010043
Liu Y, Song Z, Xu X, Rafique W, Zhang X, Shen J, Khosravi MR, Qi L (2021) Bidirectional gru networks-based next poi category prediction for healthcare. Int J Intell Syst. https://doi.org/10.1002/int.22710
Hou C, Wu J, Cao B, Fan J (2021) A deep-learning prediction model for imbalanced time series data forecasting. Big Data Min Anal 4(4):266–278. https://doi.org/10.26599/BDMA.2021.9020011
Hu C, Fan W, Zeng E, Hang Z, Wang F, Qi L, Bhuiyan MZA (2022) Digital twin-assisted real-time traffic data prediction method for 5g-enabled internet of vehicles. IEEE Trans Industr Inf 18(4):2811–2819. https://doi.org/10.1109/TII.2021.3083596
Qi L, Hu C, Zhang X, Khosravi MR, Sharma S, Pang S, Wang T (2021) Privacy-aware data fusion and prediction with spatial-temporal context for smart city industrial environment. IEEE Trans Industr Inf 17(6):4159–4167. https://doi.org/10.1109/TII.2020.3012157
Sandhu AK (2022) Big data with cloud computing: discussions and challenges. Big Data Min Anal 5(1):32–40
Sun L, Ping G, Ye X (2022) Privbv: distance-aware encoding for distributed data with local differential privacy. Tsinghua Sci Technol 27(2):412–421. https://doi.org/10.26599/TST.2021.9010027
Qi L, Yang Y, Zhou X, Rafique W, Ma J (2021) Fast anomaly identification based on multi-aspect data streams for intelligent intrusion detection toward secure industry 4.0. IEEE Trans Ind Inf. https://doi.org/10.1109/TII.2021.3139363
Li D, Zhang W, Shen S, Zhang Y (2017) Ses-lsh: Shuffle-efficient locality sensitive hashing for distributed similarity search. In: 2017 IEEE International Conference on Web Services (ICWS), pp. 822–827. https://doi.org/10.1109/ICWS.2017.99
Kong L, Wang L, Gong W, Yan C, Duan Y, Qi L (2021) Lsh-aware multitype health data prediction with privacy preservation in edge environment. World Wide Web, 1–16. https://doi.org/10.1007/s11280-021-00941-z
Lin W, Zhang X, Qi L, Li W, Li S, Sheng VS, Nepal S (2021) Location-aware service recommendations with privacy-preservation in the internet of things. IEEE Transactions on Computational Social Systems 8(1):227–235. https://doi.org/10.1109/TCSS.2020.2965234
Qi L, Wang F, Xu X, Dou W, Zhang X, Khosravi MR, Zhou X (2022) Time-aware missing traffic flow prediction for sensors with privacy-preservation. In: Liu, Q., Liu, X., Chen, B., Zhang, Y., Peng, J. (eds.) Proceedings of the 11th International Conference on Computer Engineering and Networks, pp. 721–730. Springer, Singapore. https://doi.org/10.1007/978-981-16-6554-7_78
Wang L, Zhang X, Wang T, Wan S, Srivastava G, Pang S, Qi L (2021) Diversified and scalable service recommendation with accuracy guarantee. IEEE Transactions on Computational Social Systems 8(5):1182–1193. https://doi.org/10.1109/TCSS.2020.3007812
Xu X, Huang Q, Zhang Y, Li S, Qi L, Dou W (2021) An lsh-based offloading method for iomt services in integrated cloud-edge environment. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 16(3s):1–19. https://doi.org/10.1145/3408319
Zhang K, Fan S, Wang HJ (2018) An efficient recommender system using locality sensitive hashing. In: Proceedings of the 51st Hawaii International Conference on System Sciences. https://doi.org/10.24251/HICSS.2018.098
Liu Y, Pei A, Wang F, Yang Y, Zhang X, Wang H, Dai H, Qi L, Ma R (2021) An attention-based category-aware gru model for the next poi recommendation. Int J Intell Syst 36(7):3174–3189. https://doi.org/10.1002/int.22412
Tian Y, Zhang K, Li J, Lin X, Yang B (2018) Lstm-based traffic flow prediction with missing data. Neurocomputing 318:297–305. https://doi.org/10.1016/j.neucom.2018.08.067
Laña I, Olabarrieta II, Vélez M, Del Ser J (2018) On the imputation of missing data for road traffic forecasting: New insights and novel techniques. Transportation research part C: emerging technologies 90:18–33. https://doi.org/10.1016/j.trc.2018.02.021
Liu Y, Li D, Wan S, Wang F, Dou W, Xu X, Li S, Ma R, Qi L (2022) A long short-term memory-based model for greenhouse climate prediction. Int J Intell Syst 37(1):135–151. https://doi.org/10.1002/int.22620
Li L, Zhang J, Wang Y, Ran B (2019) Missing value imputation for traffic-related time series data based on a multi-view learning method. IEEE Trans Intell Transp Syst 20(8):2933–2943. https://doi.org/10.1109/TITS.2018.2869768
Boquet G, Vicario JL, Morell A, Serrano J (2019) Missing data in traffic estimation: A variational autoencoder imputation method. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2882–2886. https://doi.org/10.1109/ICASSP.2019.8683011
Zhang T, Zhang D-G, Yan H-R, Qiu J-N, Gao J (2021) A new method of data missing estimation with fnn-based tensor heterogeneous ensemble learning for internet of vehicle. Neurocomputing 420:98–110. https://doi.org/10.1016/j.neucom.2020.09.042
Ma Z, Zhang T, Liu X, Li X, Ren K (2019) Real-time privacy-preserving data release over vehicle trajectory. IEEE Trans Veh Technol 68(8):8091–8102. https://doi.org/10.1109/TVT.2019.2924679
He BY, Chow JY (2019) Optimal privacy control for transport network data sharing. Transportation Research Procedia 38:792–811. https://doi.org/10.1016/j.trpro.2019.05.041
Zhou Y, Mo Z, Xiao Q, Chen S, Yin Y (2016) Privacy-preserving transportation traffic measurement in intelligent cyber-physical road systems. IEEE Trans Veh Technol 65(5):3749–3759. https://doi.org/10.1109/TVT.2015.2436395
Yang J, Mu Z, Liu G, Gong S, Javed A, Cao J, Yang B, Yuan F (2018) Double-layered wrap-around sensor network for three-dimensional positioning of acoustic emission sources and its effect on monitoring crack propagation in rock samples. Int J Distrib Sens Netw 14(9):1550147718803068. https://doi.org/10.1177/1550147718803068
Wu B, Chen X, Wu Z, Zhao Z, Mei Z, Zhang C (2021) Privacy-guarding optimal route finding with support for semantic search on encrypted graph in cloud computing scenario. Wireless Communications and Mobile Computing 2021. https://doi.org/10.1155/2021/6617959
Liu Y, Yu JJQ, Kang J, Niyato D, Zhang S (2020) Privacy-preserving traffic flow prediction: A federated learning approach. IEEE Internet Things J 7(8):7751–7763. https://doi.org/10.1109/JIOT.2020.2991401
Qi L, Zhang X, Dou W, Ni Q (2017) A distributed locality-sensitive hashing-based approach for cloud service recommendation from multi-source data. IEEE J Sel Areas Commun 35(11):2616–2624
Meng S, Li Q, Zhang J, Lin W, Dou W (2020) Temporal-aware and sparsity-tolerant hybrid collaborative recommendation method with privacy preservation. Concurrency and Computation: Practice and Experience 32(2):5447
Qi L, Yang Y, Zhou X, Rafique W, Ma J (2021) Fast anomaly identification based on multi-aspect data streams for intelligent intrusion detection toward secure industry 4.0. IEEE Transactions on Industrial Informatics, 1. https://doi.org/10.1109/TII.2021.3139363
Wang W, Lv Z, Lu X, Zhang Y, Xiao L (2021) Distributed reinforcement learning based framework for energy-efficient uav relay against jamming. Intelligent and Converged Networks 2(2):150–162
Zheng Z, Zhang Y, Lyu MR (2012) Investigating qos of real-world web services. IEEE Trans Serv Comput 7(1):32–39
Wu X, Qi L, Gao J, Ji G, Xu X (2022) An ensemble of random decision trees with local differential privacy in edge computing. Neurocomputing 485:181–195. https://doi.org/10.1016/j.neucom.2021.01.145
Xu X, Jiang Q, Zhang P, Cao X, R. Khosravi M, T. Alex L, Qi L, Dou W (2022) Game theory for distributed iov task offloading with fuzzy neural network in edge computing. IEEE Transactions on Fuzzy Systems, 1. https://doi.org/10.1109/TFUZZ.2022.3158000
Jia Y, Liu B, Dou W, Xu X, Zhou X, Qi L, Yan Z (2022) Croapp: A cnn-based resource optimization approach in edge computing environment. IEEE Transactions on Industrial Informatics, 1. https://doi.org/10.1109/TII.2022.3154473
Xu X, Tian H, Zhang X, Qi L, He Q, Dou W (2022) Discov: Distributed covid-19 detection on x-ray images with edge-cloud collaboration. IEEE Transactions on Services Computing, 1. https://doi.org/10.1109/TSC.2022.3142265
Yuan L, He Q, Chen F, Zhang J, Qi L, Xu X, Xiang Y, Yang Y (2022) Csedge: Enabling collaborative edge storage for multi-access edge computing based on blockchain. IEEE Trans Parallel Distrib Syst 33(8):1873–1887. https://doi.org/10.1109/TPDS.2021.3131680
Li L, Shao W, Zhou X (2021) A flexible scheduling algorithm for the 5th-generation networks. Intelligent and Converged Networks 2(2):101–107
Dowling A, Huie L, Njilla L, Zhao H, Liu Y (2021) Toward long-range adaptive communication via information centric networking. Intelligent and Converged Networks 2(1):1–15
Qi L, Lin W, Zhang X, Dou W, Xu X, Chen J (2022) A correlation graph based approach for personalized and compatible web apis recommendation in mobile app development. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2022.3168611
Su Y-S, Ruan Y, Sun S, Chang Y-T (2020) A pattern recognition framework for detecting changes in chinese internet management system. Journal of Social Computing 1(1):28–39

Author information

Authors and Affiliations

Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang, 261000, China
Ailing Gao & Xiaomei Liu
School of Computer Science, Qufu Normal University, Rizhao, 276800, China
Ying Miao

Corresponding author

Correspondence to Ying Miao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gao, A., Liu, X. & Miao, Y. LSH-based missing value prediction for abnormal traffic sensors with privacy protection in edge computing. Complex Intell. Syst. 9, 5081–5091 (2023). https://doi.org/10.1007/s40747-023-00992-x

Received: 02 May 2022
Accepted: 27 January 2023
Published: 02 March 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s40747-023-00992-x
