Introduction

Modern society is characterized by high-density flows of people and vehicles, so forecasting the future trajectories of moving targets plays an increasingly critical role in many applications. For example, in autonomous driving [1,2,3,4], an assistant system equipped with a trajectory prediction module can help drivers control vehicles [5,6,7], anticipate pedestrians’ walking intentions [8] on crowded roads, and reduce traffic accidents caused by driver fatigue or inattention. Meanwhile, studying crowd walking paths and behavior is also significant for intelligent social robots [9], smart city construction [10], and the cultural and entertainment industries. For example, restaurant service robots [11] need to predict the trajectories of guests to optimize their service paths, and intelligent tracking and monitoring systems [12] in cities need to understand the interactions between pedestrians to prevent dangerous situations and maintain social stability. Therefore, predicting pedestrian trajectories has become a pressing research problem.

Fig. 1

The illustration of training a trajectory prediction model with the trajectory data in multiple scenes using a federated training paradigm. The three scenes in the figure represent different real-life scenes, thus containing diverse trajectory patterns. The federated server cannot directly access the data in federated clients, thus protecting data privacy

The high performance of pedestrian trajectory prediction models relies mainly on rich trajectory data from different scenes. However, current research mostly adopts a centralized manner, artificially aggregating data from multiple scenes for centralized training. Such a manner does not match the reality that data are scattered across various surveillance devices in cities that cannot be connected and shared. Moreover, the most severe issue is the possible leakage of user privacy, which undermines data security. With the growing self-protection consciousness of ordinary people, the security of personal data [13] has aroused great concern across all social strata. According to the General Data Protection Regulation [14], data is the user’s private property, and no organization has the right to use it without prior consent. Therefore, the research challenge is to combine the data of multiple scenes and federally train the trajectory prediction model on the server without resorting to traditional centralized aggregation or violating users’ privacy.

Federated learning [13, 15, 16] can provide an intelligent solution for breaking data island limitations and securing privacy in collaborative training over data from multiple scenes. It has excellent potential for federated training of multi-scene data [17]. Instead of gathering data from different scenes, which may compromise data privacy [18, 19], federated learning maintains a shared global model on a federated server, which exchanges information with federated clients where local models are trained on the data of different scenes. Federated learning has been applied in many fields [13], especially in medicine [20] for protecting patients’ privacy. However, studies that combine pedestrian trajectory prediction with federated learning remain lacking. Therefore, we introduce a pedestrian trajectory prediction model trained on data collected from different scenes based on a federated learning paradigm (as shown in Fig. 1).

In summary, the main contributions of this work are as follows:

  1. (1)

    To forecast trajectories in a local scene, we propose a lightweight destination-oriented LSTM-based trajectory prediction (DO-TP) network in an encoder-decoder manner. The learned destination information can guide the model to generate credible future trajectories with limited trajectory data.

  2. (2)

    To improve trajectory prediction performance, we leverage trajectory data from different scenes while preserving data privacy. A federated learning strategy is introduced to solve the data island problem by keeping data on federated clients for cooperative training.

  3. (3)

    To find the most suitable federated learning paradigm for trajectory prediction, we conduct quantitative and qualitative evaluations on the reintegrated ETH, UCY, and SDD datasets to fairly compare different federated learning algorithms.

The rest of this work is arranged as follows. The following section reviews the related work. The proposed method is introduced in detail in “Methods”. The evaluation results are presented in “Experiments”. Conclusions and discussions are presented in “Conclusion and future works”.

Related work

Federated learning

Federated learning is a recently proposed machine learning framework that has become popular for protecting personal privacy and addressing the data island problem [21]. FedAvg [15], proposed in 2016, learns a shared global model by jointly training on local data across mobile devices, so data privacy is largely protected. The survey by Yang et al. [13] showed a bright future for integrating federated learning into various fields. Tan et al. [22] introduced federated learning into online recommendation systems, solving the data island problem in recommendation and protecting user privacy. Bai et al. [23] proposed the FedFace framework to apply federated learning to face recognition, training on multi-party data while avoiding user privacy leakage.

In recent years, federated learning has shown a clear trend of being combined with reinforcement learning, knowledge distillation, and contrastive learning. Nadiger et al. [24] combined reinforcement learning with federated techniques so that users with similar tasks could learn from each other, which in turn reduced the personalization time. Qi et al. [25] summarized existing research on federated reinforcement learning and its classification, and discussed future directions. Chen et al. [26] proposed a federated learning concept based on cyclic knowledge distillation: by removing the central server, public knowledge within each federation is accumulated cyclically and then personalized through training, leading to MetaFed, a highly credible and personalized federated framework. Li et al. [27] proposed model-contrastive federated learning (MOON), which performs contrastive learning at the model level to address data heterogeneity and improve the performance of federated models on image datasets.

At present, federated learning has received little attention in the trajectory prediction community. However, research on pedestrian trajectory prediction urgently needs to break the current situation in which data are siloed and cannot be effectively shared. Therefore, we propose a pedestrian trajectory prediction model based on federated learning, which can efficiently use trajectory data from multiple scenes while protecting data privacy. Moreover, the performance of three federated algorithms in trajectory prediction is explored, and the most suitable one is selected.

Trajectory prediction

The core of trajectory prediction is to predict the future trajectory from the historical trajectory of pedestrians [28]. Early studies relied on kinematic, statistical [29], and probabilistic models [30]. However, due to their weak fitting ability, these methods often produced predictions that deviated from the actual trajectories.

With the development of deep learning [31], data-driven trajectory prediction has become a significant research direction. In 2016, Alahi et al. proposed Social-LSTM [32], which learned pedestrians’ motion patterns with LSTM and captured social interactions with a social pooling layer. Later, SGAN [33] introduced generative adversarial networks to generate socially acceptable and diverse trajectories. Kosaraju et al. [34] proposed a graph-based generative adversarial network (Social-BiGAT) that improves the multi-modality of future trajectories with a latent scene encoder; a graph attention module models all pedestrian interactions within a scene. A multi-modal end-to-end trajectory prediction network (Goal-GAN) [35] was proposed that combines goal estimation and route navigation modules: the pedestrian’s final goal location is predicted from the observed path and environmental information, and a feasible trajectory to reach that goal is then generated.

Fig. 2

The overall framework of Fed-TP, which does not share trajectory data between different federated clients. Fed-TP trains different local trajectory prediction models in different federated clients based on the global model sent by the federated server. Afterward, all model parameters are aggregated in the federated server to train a new global model. Then, the federated server broadcasts the new global model to the federated clients participating in each round. Specifically, Y denotes the historical trajectory in the time period \(t=1 \sim T_\textrm{obs}\), and \({\hat{Y}}\) represents the future trajectory in the time period \(t=T_{\textrm{obs}+1} \sim T_\textrm{pred}\). In the right rectangle, solid and dashed lines represent pedestrians’ historical and future trajectories, respectively

Pedestrian trajectory prediction suffers from future uncertainties arising from pedestrians’ internal and external factors, such as potential destinations, interference from surrounding pedestrians, and scene constraints. To better handle the uncertainty of future pedestrian movements and improve prediction performance, Yang et al. [36] proposed a POP (pseudo oracle predictor) module that generates an informative latent variable by learning the future behavior of pedestrians; it can be used directly in the testing phase and facilitates the broad application of trajectory prediction. A scene-oriented inverse reinforcement learning method [37] was proposed for trajectory prediction that exploits the strong correlation between scene and trajectory, alleviating the over-fitting problem of existing trajectory methods. In realistic crowded scenes, a person’s walking route is often influenced by a group. Therefore, Bae et al. [38] proposed GP-Graph, which first assigns pedestrians to the group with the highest similarity and then builds graphs based on the interpersonal interactions within and between groups.

The above research ignores the privacy of data and the reality that data are scattered and difficult to collect. We introduce federated learning to make up for these shortcomings and explore the feasibility of combining federated learning with pedestrian trajectory prediction. For the trajectory prediction model itself, a lightweight DO-TP is proposed to perform prediction in each scene. Unlike former works, we resort only to trajectory data to estimate the potential destination, which avoids the added complexity of semantic segmentation.

The research mentioned above and our proposed method are based on data-driven models. With the vigorous development of artificial intelligence, we also note that recent research uses mathematical structures [39] to constrain machine learning algorithms and guide the generation of optimal strategies. Borodin et al. [40] established a mathematical model of the indicators that affect financial situations, studied the profitability of enterprises, and analyzed their development prospects. Tutsoy et al. [41] proposed a multi-dimensional artificial-intelligence-based decision algorithm that uses a mathematical model to derive constraints; experimental results showed that it can generate specific optimal strategies according to the importance of each sub-model. Bouchnita et al. [42] combined a mathematical model with deep learning to rapidly predict patients’ specific responses to anticoagulant therapy, supporting clinical decision-making and effective management of coagulopathy.

Methods

Trajectory prediction performance is difficult to improve when scene data cannot be effectively shared, yet manually concentrating trajectory data from different scenes carries a high risk of privacy leakage. Therefore, we propose a federated learning-based trajectory prediction (Fed-TP) method to address these two significant limitations. Fed-TP consists of two parts: (1) a lightweight DO-TP that forecasts pedestrians’ future trajectories from their historical trajectories with an LSTM-based encoder-decoder, where potential destinations are learned from observed ground-truth and predicted positions without any scene information; and (2) a federated learning framework that trains the trajectory prediction model with data from multiple scenes in a privacy-preserving manner. As presented in Fig. 2, the trajectory data of each scene are retained locally, and the model is trained in each federated client, avoiding privacy leakage during data transmission. All parameters are then aggregated in the federated server to train a global model. Afterward, different federated learning algorithms are evaluated to select the most suitable training paradigm.

Fig. 3

The pipeline of DO-TP. \(Pos\) and \(\widetilde{Pos}\) represent the observed and future predicted coordinates, respectively. Z represents the motion feature vector

Destination-oriented trajectory prediction

This section presents the definition of trajectory prediction. Assuming pedestrians \(P_{1}, P_{2},\ldots , P_{n}\) exist in the scene, we first term the position \(\left\{ x_{i}^t, y_{i}^t \right\} \) of pedestrian \(P_{i}\) at time step t as \(P_{i}^t\). The purpose of trajectory prediction is to forecast the future trajectory \( \hat{Y_{i}} =\left\{ P_{i}^{T_{\textrm{obs}+1}},\ldots , P_{i}^{T_\textrm{pred}}\right\} \), considering the historical trajectory \(Y_{i}=\left\{ P_{i}^1,\ldots , P_{i}^{T_\textrm{obs}}\right\} \). \(T_\textrm{obs}\) and \(T_\textrm{pred}\) are the lengths of observation and prediction, respectively.

A lightweight DO-TP is proposed considering the computational burdens in intelligent edge devices. DO-TP takes historical trajectories of all pedestrians in the scenes as input and outputs their predicted trajectories. To achieve trajectory prediction, the model comprises the LSTM-based encoding and decoding modules, which encode pedestrian motion patterns from their historical trajectories and decode the predicted trajectories from the learned motion patterns. Moreover, the model contains two LSTMs and two fully connected (FC) layers to predict destinations. Similar to [36], pedestrians’ relative displacements are fed into the encoding module to obtain the hidden states that represent pedestrians’ motion patterns from their observed trajectories, as follows:

$$\begin{aligned} Z_i^t=\theta \left( \left\{ \Delta x_i^t, \Delta y_i^t \right\} ; W_v \right) \end{aligned}$$
(1)
$$\begin{aligned} H_i^t = F_\textrm{enc} (H_i^{t-1}, Z_i^t; W_e ) \end{aligned}$$
(2)

where a linear transformation layer \(\theta (\cdot )\) with learnable parameter \(W_v\) is used to map the input displacement \(\left\{ \Delta x_i^t,\Delta y_i^t\right\} \) into the 64-dimensional motion feature vector \(Z_i^t\). \(F_\textrm{enc}\) denotes the LSTM-based encoding module with learnable parameter \(W_e\). After Xavier initialization, \(W_v\) and \(W_e\) are gradually learned while updating the network through error back-propagation until given epochs. \(H_i^{t-1}\) and \(H_i^t\) represent the hidden states of \(F_\textrm{enc}(\cdot )\) at time steps \(t-1\) and t, respectively. The relative displacement \(\left\{ \Delta x_i^t, \Delta y_i^t\right\} \) is defined as:

$$\begin{aligned} \Delta x_i^t = \left( x_i^t - x_i^{t-1} \right) \end{aligned}$$
(3)
$$\begin{aligned} \Delta y_i^t = \left( y_i^t-y_i^{t-1} \right) \end{aligned}$$
(4)

where \((x_i^t, y_i^t)\) denotes the two-dimensional spatial coordinate of pedestrian \(P_i\) at time step t.

Inspired by GTPPO [36], we propose a destination prediction strategy without scene semantic segmentation to keep the model lightweight. Specifically, we use two LSTMs to extract sequential position information from the observed and future predicted spatial coordinates. Afterward, two fully connected (FC) layers map the position information into 32-dimensional destination-aware latent vectors \(D_i\) and \(\hat{D_i}\). Considering that the observed ground-truth spatial coordinates contain information about potential destinations, we minimize the KL divergence between \(D_i\) and \(\hat{D_i}\) during training. Subsequently, the latent vector \(D_i\) learned from the observed coordinates can provide destination information that guides the decoding module to generate more precise future trajectories.

\(F_\textrm{dec}(\cdot )\) denotes the LSTM-based decoding module with learnable parameter \(W_d\), which generates future trajectories based on the encoded motion feature vector, the hidden state, and the destination-aware latent vector, as follows:

$$\begin{aligned} Q_i^{T_{\textrm{obs}+1}} = F_\textrm{dec} \left( H_i^{T_\textrm{obs}}, Z_i^{T_\textrm{obs}} \Vert D_i; W_d \right) \end{aligned}$$
(5)
$$\begin{aligned} \left\{ \Delta x_i^{T_{\textrm{obs}+1}}, \Delta y_i^{T_{\textrm{obs}+1}} \right\} = \delta \left( Q_i^{T_{\textrm{obs}+1}}, W_c \right) \end{aligned}$$
(6)

where \(\Vert \) denotes the concatenation operation. \(\delta (\cdot )\) is a linear transformation layer with learnable parameter \(W_c\) that converts the hidden state of the decoder \(Q_i^{T_{\textrm{obs}+1}}\) into the predicted relative displacement \(\left\{ \Delta x_i^{T_{\textrm{obs}+1}}, \Delta y_i^{T_{\textrm{obs}+1}} \right\} \), which is further used to forecast pedestrian \(P_i\)’s future trajectory \(\hat{Y_i}\) through the inverse operation of Eqs. (3) and (4). Figure 3 shows the pipeline of DO-TP. After local training, the model in a local scene changes from \(\omega _g\) to \(\omega _{g+1}^k\), where k denotes the index of the participating client scene.
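The following is a minimal PyTorch sketch of the DO-TP pipeline under our reading of Eqs. (1)–(6): a 64-dimensional motion embedding, 32-dimensional LSTM hidden states and destination latents (matching the implementation details given later), an LSTM/FC destination branch, and autoregressive decoding of relative displacements. The module names, the softmax-based KL treatment of the latent vectors, and the decoding details are illustrative assumptions, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DOTP(nn.Module):
    """Sketch of the destination-oriented trajectory predictor (DO-TP)."""

    def __init__(self, feat_dim=64, hidden_dim=32, dest_dim=32):
        super().__init__()
        self.embed = nn.Linear(2, feat_dim)                       # theta(.), Eq. (1)
        self.encoder = nn.LSTM(feat_dim, hidden_dim)              # F_enc, Eq. (2)
        self.decoder = nn.LSTM(feat_dim + dest_dim, hidden_dim)   # F_dec, Eq. (5)
        self.out = nn.Linear(hidden_dim, 2)                       # delta(.), Eq. (6)
        # destination branch: one LSTM + FC per coordinate stream
        self.dest_obs_lstm = nn.LSTM(2, hidden_dim)
        self.dest_fut_lstm = nn.LSTM(2, hidden_dim)
        self.dest_obs_fc = nn.Linear(hidden_dim, dest_dim)
        self.dest_fut_fc = nn.Linear(hidden_dim, dest_dim)

    def forward(self, obs_pos, fut_pos=None, pred_len=12):
        # obs_pos: (T_obs, N, 2) absolute coordinates of N pedestrians
        disp = obs_pos[1:] - obs_pos[:-1]                         # Eqs. (3)-(4)
        _, (h, c) = self.encoder(self.embed(disp))                # Eqs. (1)-(2)

        # destination-aware latent D_i from the observed coordinates
        _, (h_obs, _) = self.dest_obs_lstm(obs_pos)
        d_obs = self.dest_obs_fc(h_obs[-1])
        kl = None
        if fut_pos is not None:                                   # training only
            _, (h_fut, _) = self.dest_fut_lstm(fut_pos)           # \hat{D}_i branch
            d_fut = self.dest_fut_fc(h_fut[-1])
            # one possible KL treatment over the latents (assumption)
            kl = F.kl_div(F.log_softmax(d_obs, dim=-1),
                          F.softmax(d_fut, dim=-1), reduction="batchmean")

        # autoregressive decoding of relative displacements
        last_pos, last_disp = obs_pos[-1], disp[-1]
        preds = []
        for _ in range(pred_len):
            inp = torch.cat([self.embed(last_disp), d_obs], dim=-1).unsqueeze(0)
            out, (h, c) = self.decoder(inp, (h, c))               # Eq. (5)
            last_disp = self.out(out.squeeze(0))                  # Eq. (6)
            last_pos = last_pos + last_disp                       # invert Eqs. (3)-(4)
            preds.append(last_pos)
        return torch.stack(preds), kl          # (pred_len, N, 2) trajectory, KL term
```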

Federated learning framework

To overcome the drawbacks of privacy leakage in data aggregation-based trajectory prediction, we introduce the federated learning framework and propose Fed-TP. Given that each scene in \((S_1, S_2,\ldots , S_m)\) has its own trajectory data, where m denotes the number of scenes, Fed-TP is trained with the data in each scene without aggregating the private data of all scenes. In each training round of federated learning, K scenes are randomly selected from the total m scenes. The training is divided into local and global steps, as follows:

  1. (1)

    Local training: each federated client trains the model \(\omega _g\) for E local rounds to update the model parameters on its own data. As shown in Eq. (7), after E rounds of training, each client updates the global model \(\omega _g\) to its local model \(\omega _{g+1}^k\). The model parameters of all federated clients are then transmitted to the federated server for the joint update using encryption and privacy protection technology. The local training loss \(L_k\) is defined in Eq. (8) below:

    $$\begin{aligned} \forall k \quad \omega _{g+1}^k \leftarrow \omega _g - \eta * \gamma _k \end{aligned}$$
    (7)
    $$\begin{aligned} L_k = \min \left\| Y_i - {\hat{Y}}_i \right\| _2 + \beta * \textrm{KL}(D_i, {\hat{D}}_i) \end{aligned}$$
    (8)

    where \(\gamma _k\) represents the model parameters of the k-th selected scene. The hyper-parameter \(\beta \) balances the trajectory loss and the KL divergence. Since the KL divergence in Eq. (8) serves as an additional constraint on the trajectory loss, we empirically set \(\beta \) to a value less than 1. We then calculate the ADE/FDE values of the proposed method on the used datasets while increasing \(\beta \) from 0.1 to 0.9 with a step of 0.05, and the best performance is achieved when \(\beta \) is set to 0.1.

  2. (2)

    Global training: the federated server jointly exploits all local datasets without any raw data being transmitted between the clients and the server, thus protecting data privacy while enhancing data diversity. The federated server first sends the initial global model \(\omega _{g}\) to each client. The new global model \(\omega _{g+1}\) is then obtained by aggregating the client models according to the federated client weight parameters \(u_k\) and the federated client model parameters \(\gamma _k\). The federated server encrypts the global model \(\omega _{g+1}\) and sends it back to each federated client, overwriting the original model, as shown in Eq. (9). SGD is used to update the model until the loss defined in Eq. (10) converges. The calculations are as follows:

    $$\begin{aligned} \omega _{g+1} \leftarrow \omega _g - \eta \sum _{k=1}^{K} u_k * \gamma _k \end{aligned}$$
    (9)
    $$\begin{aligned} L_s = \frac{1}{K} \sum _{k=1}^K L_k \end{aligned}$$
    (10)

    where \(L_s\) denotes the global training loss obtained by averaging \(L_k\) over the federated clients. In Eq. (9), the federated server model is aggregated from the federated client models; because no raw data is transmitted between them, the risk of data leakage is reduced and data privacy is protected. The pseudo-code of the proposed Fed-TP is presented in Algorithm 1, and a minimal code sketch of one communication round is given below.
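The sketch below illustrates one communication round of the local and global steps described above (Eqs. (7)–(10)), with the server-side update written as the standard weighted average of client parameters. Function and variable names are our own, the model interface follows the DO-TP sketch above, and encryption of the exchanged parameters is omitted for brevity.

```python
import copy
import torch

def federated_round(global_model, client_loaders, local_epochs, lr, client_weights, beta=0.1):
    """One communication round: local updates (Eqs. (7)-(8)) followed by
    server-side weighted aggregation (Eqs. (9)-(10)). No raw trajectories
    leave the clients; only model parameters are exchanged."""
    client_states, client_losses = [], []
    for loader in client_loaders:
        local_model = copy.deepcopy(global_model)               # start from omega_g
        opt = torch.optim.SGD(local_model.parameters(), lr=lr)
        for _ in range(local_epochs):                           # E local rounds
            for obs, fut in loader:                             # local data only
                pred, kl = local_model(obs, fut)
                loss = (fut - pred).norm(dim=-1).mean() + beta * kl   # L_k, Eq. (8)
                opt.zero_grad()
                loss.backward()
                opt.step()
        client_states.append(local_model.state_dict())          # omega_{g+1}^k
        client_losses.append(loss.item())                       # last-batch loss as a proxy for L_k

    # aggregate client parameters with weights u_k into omega_{g+1}, Eq. (9)
    new_state = copy.deepcopy(client_states[0])
    for key in new_state:
        new_state[key] = sum(u * s[key] for u, s in zip(client_weights, client_states))
    global_model.load_state_dict(new_state)
    return global_model, sum(client_losses) / len(client_losses)   # L_s, Eq. (10)
```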

Different federated learning frameworks

A suitable federated learning framework is critical for Fed-TP to achieve satisfactory trajectory prediction and privacy protection performance. Therefore, we compare commonly used federated learning frameworks, including FedAvg, FedProx, and FedAtt. Details are introduced as follows:

  1. (1)

    FedAvg [15] consists of four steps: ① The federated server sends a global model to each participating client. ② All participating clients use local data to perform stochastic gradient descent to train local models. ③ Each participating client sends its trained model parameters to the federated server. ④ The federated server averages the aggregated model parameters to generate a global model for the next round of training.

  2. (2)

    FedProx [43] adds a proximal term to the local objective function of each federated client on top of FedAvg to limit the deviation of the client-updated model from the global model, which improves stability compared with FedAvg (see the proximal-term sketch after this list).

  3. (3)

    FedAtt [44] highlights the respective importance of each participating client during model aggregation by introducing an attention mechanism. It minimizes the weighted distance between the federated server and the federated clients by iteratively updating the parameters, thereby achieving good generalization.
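As a concrete illustration of how FedProx modifies the local objective relative to FedAvg, the following hedged sketch adds the proximal term \((\mu /2)\left\| \omega - \omega _g \right\| ^2\) to the client loss; the coefficient \(\mu \) shown here is an illustrative value, not a setting reported in this work.

```python
def fedprox_local_loss(task_loss, local_model, global_model, mu=0.01):
    """FedProx-style local objective: the task loss plus a proximal term that
    penalizes the distance between client and global parameters. mu is an
    illustrative value only."""
    prox = 0.0
    for w_local, w_global in zip(local_model.parameters(), global_model.parameters()):
        prox = prox + ((w_local - w_global.detach()) ** 2).sum()
    return task_loss + 0.5 * mu * prox
```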

Algorithm 1

m is the total number of clients; K is the number of clients participating in the training; B is the batch size; E is the number of client training rounds; G denotes the number of federated server training epochs; \(\eta \) is the learning rate; \(u_k\) is the federated client weight parameter.
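Using the notation above, the overall Fed-TP training loop can be sketched as follows. This builds on the federated_round sketch given earlier and is an illustrative reconstruction of Algorithm 1, not a verbatim transcription.

```python
import random

def fed_tp_training(global_model, all_client_loaders, G, K, E, lr, weights):
    """Overall Fed-TP loop: for G global epochs, sample K of the m clients,
    run one federated_round (defined above), and keep the updated global model."""
    m = len(all_client_loaders)
    losses = []
    for _ in range(G):
        chosen = random.sample(range(m), K)                     # K participating clients
        loaders = [all_client_loaders[k] for k in chosen]
        u = [weights[k] for k in chosen]
        global_model, l_s = federated_round(global_model, loaders, E, lr, u)
        losses.append(l_s)                                      # global loss L_s per round
    return global_model, losses
```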

Experiments

Datasets

The proposed method is evaluated on three public datasets, ETH [45], UCY [46], and the Stanford Drone Dataset (SDD) [47], which are widely used for pedestrian trajectory prediction [32, 33, 35, 36, 48]. These datasets are taken from diverse scenes, including hotel, university, and zara scenes as well as different places at Stanford. The trajectory data are captured with cameras from different angles, and the scene layouts and the number of pedestrians vary greatly, so the trajectory data are reliable and rich. Concretely, the ETH and UCY datasets contain 1536 pedestrians with walking interactions and other social activities. The ETH dataset contains two scenes: eth and hotel. The UCY dataset contains three scenes: univ, zara1, and zara2. The SDD dataset collects the trajectory data of pedestrians and vehicles with a drone and contains eight scenes: gates, little, nexus, coupa, bookstore, deathCircle, quad, and hyang. All trajectories are sampled at 2.5 Hz. We pre-process the trajectory data by extracting 20 consecutive frames to form a sample, in which the first 8 frames are the input and the last 12 frames are the ground truth. Therefore, the observed and predicted horizons are 8 (3.2 s) and 12 (4.8 s) time steps [49], respectively. A small pre-processing sketch follows.
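The windowing described above can be sketched as follows; the per-frame stride is an assumption made for illustration.

```python
def make_samples(track, obs_len=8, pred_len=12):
    """Split one pedestrian track (coordinates sampled at 2.5 Hz, array-like of
    shape (T, 2)) into 20-frame samples: the first 8 frames form the observation
    and the last 12 frames form the ground-truth future."""
    total = obs_len + pred_len
    samples = []
    for start in range(len(track) - total + 1):                 # stride of 1 frame (assumed)
        window = track[start:start + total]
        samples.append((window[:obs_len], window[obs_len:]))
    return samples
```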

Table 1 The division of training and testing data in FD1 and FD2

To evaluate our method under the federated framework, we reintegrate ETH, UCY, and SDD into federated dataset 1 (FD1) and federated dataset 2 (FD2). As presented in Table 1, FD1 contains five scenes: eth, hotel, univ, zara1, and zara2. Since the training and testing scenes need to correspond one to one with complete data for comparative experiments, FD2 comprises the coupa, gates, hyang, and nexus scenes from SDD. The training and testing sets of FD1 and FD2 follow the official data partitioning of the three public datasets. In practical applications, data are distributed across scattered scenes; accordingly, for FD1 and FD2, the training set of each scene only contains the local training data of that scene. We then use the federated framework to train the scattered data jointly: data never leave the local scene, in contrast to the previous centralized training. Such a training manner avoids data leakage during transmission, thus protecting data privacy. After training the federated model, testing is performed separately in each scene with the federated model.

Evaluation metrics

Two metrics are used to evaluate trajectory prediction performance: the average displacement error (ADE) and the final displacement error (FDE), which are defined as follows (a small computational sketch is given after the definitions):

  1. (1)

    At each time step, ADE calculates the L2 distance between the ground-truth and predicted trajectories. ADE is defined as follows:

    $$\begin{aligned} \textrm{ADE} = \dfrac{ \sum _{i=1}^n \sum _{t=T_{\textrm{obs}+1}}^{T_\textrm{pred}} \left\| \left( x_i^t,y_i^t\right) - \left( {\hat{x}}_i^t,{\hat{y}}_i^t \right) \right\| _2 }{n \times T_\textrm{pred}} \end{aligned}$$
    (11)

    where n is the total number of observed pedestrians, and \(\left( {\hat{x}}_i^t,{\hat{y}}_i^t \right) \) and \(\left( x_i^t,y_i^t \right) \) represent the predicted and ground-truth coordinates of pedestrian i at time step t, respectively.

  2. (2)

    FDE calculates the L2 distance between the ground-truth and predicted trajectories at the final time step. FDE is defined as follows:

    $$\begin{aligned} \textrm{FDE} = \dfrac{ \sum _{i=1}^n \left\| \left( x_i^{T_\textrm{pred}},y_i^{T_\textrm{pred}}\right) - \left( {\hat{x}}_i^{T_\textrm{pred}},{\hat{y}}_i^{T_\textrm{pred}} \right) \right\| _2 }{ n } \end{aligned}$$
    (12)
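A small computational sketch of the two metrics, assuming the predicted and ground-truth trajectories are stored as arrays over the 12 predicted steps; following common practice, the errors are averaged over the predicted horizon.

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE/FDE in the spirit of Eqs. (11)-(12).

    pred, gt: arrays of shape (n, 12, 2) holding the predicted and ground-truth
    coordinates of n pedestrians over the 12 predicted time steps."""
    dist = np.linalg.norm(pred - gt, axis=-1)   # L2 distance per pedestrian and step
    ade = dist.mean()                           # average over pedestrians and predicted steps
    fde = dist[:, -1].mean()                    # error at the final time step only
    return ade, fde
```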

Implementation details

One-layer LSTMs are used for the encoder and decoder, with 32-dimensional hidden states. The total number of training epochs is set to 300. The initial learning rates for FD1 and FD2 are 0.001 and 0.0001, respectively. The proposed Fed-TP is built with the PyTorch framework and trained on an NVIDIA RTX-3080 GPU.

Table 2 Comparison results of different parameter K using three federated algorithms (lower is better)
Table 3 Comparison results of different parameter E using three federated algorithms (lower is better)
Table 4 Comparison results of different parameter B using three federated algorithms (lower is better)
Fig. 4

ADE curves of the three federated algorithms for a univ and b coupa

Table 5 Comparison results of different federated algorithms (FA) on FD1 (lower is better)
Table 6 Comparison results of different federated algorithms (FA) on FD2 (lower is better)

Evaluation of key parameters

This section evaluates Fed-TP’s key parameters (K, E, and B) with the three federated paradigms. Tables 2, 3, and 4 report the comparison results for different key parameters. When a specific parameter is evaluated, the other two are fixed. From the results, we conclude the following:

  1. (1)

    Parameter K denotes the number of clients participating in the training. The trajectory data involved in training are not artificially integrated but retained locally for collaborative training; hence, data privacy protection is strengthened because there is no data transmission. Table 2 shows that a larger K leads to lower ADE/FDE values; that is, the more clients participate in training, the better the trajectory prediction performance. Considering the different scenes in the two datasets, K is set to 5 for FD1 and 4 for FD2.

  2. (2)

    Parameter E denotes the number of training rounds for each federated client. This parameter affects the computational efficiency and controls the local model training performance. A small E indicates insufficient client training, whereas a large E may result in over-fitting. Table 3 reports that all three methods achieve nearly their best performance when E is set to 7. Considering the computational efficiency and trajectory prediction performance, E is set to 7 in subsequent evaluations.

  3. (3)

    Parameter B denotes the batch size. Table 4 shows that the setting of B slightly influences FD1 but significantly influences FD2. Considering the comparison results, the batch sizes for FD1 and FD2 are set to 128 and 16, respectively.

Table 7 Comparison of global training time (s) for three federated algorithms (FA) on FD1 and FD2 for one round
Fig. 5

The training effect of federated multi-scene training compared with single-scene training on FD1. a shows the total amount of data held by each of the five scenes, and b shows the ADE curves for training on a single scene, federated training on two scenes, and federated training on four scenes, respectively

Table 8 Comparison results of different training paradigms (TP) on FD1 (lower is better)
Table 9 Comparison results of different training paradigms (TP) on FD2 (lower is better)
Fig. 6

Trajectory visualization results in three scenes of eth, univ, and coupa. The red, green, and blue lines represent the observed, predicted, and ground-truth trajectories. a Shows that trajectories predicted by the single-scene training paradigm significantly differ from the ground-truth trajectories. b Indicates that trajectories predicted by the centralized training paradigms are closer to the ground-truth trajectories. c Shows that trajectories predicted by Fed-TP fit the ground-truth trajectories better than those predicted by the single-scene training paradigm but are slightly inferior to trajectories predicted by the centralized training paradigm

Comparisons of different federated algorithms

Comparisons of different federated algorithms are conducted using the key parameters K, B, and E determined experimentally on FD1 and FD2. Using the same datasets and hyper-parameters ensures a fair comparison of the three federated algorithms. Generally, the three federated algorithms present similar trajectory prediction performance in different scenes, and all of them can solve the trajectory data island problem and avoid data privacy leakage when jointly training over scenes. Figure 4a, b show the error curves of the three algorithms for univ of FD1 and coupa of FD2, respectively; the disparity between the ADE curves of the three algorithms is small. Comparison results for the other scenes of the two datasets are reported in Tables 5 and 6. On FD1, the results of the three federated algorithms are similar. On FD2, however, the average ADE/FDE of FedAtt is 0.45/0.62 lower than that of FedAvg and 0.16/0.12 lower than that of FedProx. Meanwhile, Table 7 shows that the global training time per round is close for the three federated algorithms on FD1 and FD2, with FedAtt taking the least time. Therefore, FedAtt is used as the training paradigm of the proposed Fed-TP in the following evaluations.

Comparisons of different training paradigms

“Evaluation of key parameters” and “Comparisons of different federated algorithms” compared the key parameters and the effects of the three federated algorithms. This section compares the performance of different training paradigms: single-scene, centralized, and federated. For single-scene training, only the training data of one scene is used to train the trajectory prediction model, and the ADE/FDE values are then calculated on the testing data of all scenes. Compared with Fed-TP, centralized training uses manually integrated training data from all scenes instead of the federated manner.

Tables 8 and 9 show that single-scene training cannot perform satisfactorily due to the lack of training samples. Compared with single-scene training, centralized training achieves the best performance by introducing multi-scene collaborative training: by aggregating all scene data for unified training, the average ADE/FDE values decrease by 0.34/0.66 and 4.68/8.71 on FD1 and FD2, respectively. However, directly aggregating all scene data ignores data privacy and may lead to leakage. The proposed Fed-TP can protect data privacy while exploiting all scene data to train a satisfactory trajectory prediction model. As shown in Fig. 5a, there are five scenes, and each scene holds a different amount of data. Owing to the data island problem, a single scene can only be trained locally, so the prediction error of a scene with little data is high, as shown by the black line in Fig. 5b, whereas the error decreases as the number of scenes trained collaboratively under the federated framework increases. However, as shown in Tables 8 and 9, Fed-TP is slightly inferior to centralized training, with average ADE/FDE values increasing to 0.41/0.83 and 13.90/27.39 on FD1 and FD2, respectively.

Qualitative evaluations

Figure 6 shows the visualization of generated trajectories in the three scenes of eth, univ, and coupa to evaluate the proposed Fed-TP qualitatively. The red, green, and blue lines represent the observed, predicted, and ground-truth trajectories. Figure 6a shows the trajectory visualization under single-scene training, and the predicted trajectory deviates significantly from the ground-truth trajectories. In contrast, trajectories predicted by the centralized and Fed-TP training paradigms are closer to the ground-truth trajectories, which indicates the effectiveness of multi-scene training. Moreover, Fed-TP can protect data privacy, which is more suitable for real-world applications.

Conclusion and future works

Fed-TP is proposed to forecast pedestrians’ future trajectories with data privacy protection. A lightweight DO-TP is used to conduct trajectory prediction in each local scene. Subsequently, the privacy and security of personal data in each scene are protected by co-training the multi-scene trajectory data under the federated learning architecture. Moreover, three federated algorithms are compared to find the most suitable training paradigm for trajectory prediction. Evaluations are carried out on reintegrated ETH, UCY, and SDD. Results demonstrate that Fed-TP can effectively balance the trajectory prediction performance and user data privacy protection.

The proposed method addresses the real-world problem that pedestrians’ motion behaviors in different scenes cannot be effectively analyzed jointly because of data islands. At the same time, no data are transmitted during training, protecting pedestrian privacy from being leaked. However, Fed-TP involves the transmission of model parameters between the server and clients, which still exposes privacy to threats such as model inversion attacks in real networks. In the future, we will study the combination of federated learning with privacy protection technologies, such as homomorphic encryption and differential privacy, to better protect pedestrian privacy. On the other hand, despite its good performance in several simple scenes, Fed-TP may degrade in complex scenes because social interactions between pedestrians are not considered. Therefore, our future work will focus on introducing pedestrian interactions and a lightweight scene understanding strategy to improve the robustness of the model against unexpected changes in dynamic environments.