1 Introduction

Mobility trajectory data covers a wide range of information generated by diverse moving objects, with each trajectory consisting of a sequence of ordered points (Kong et al. 2018a). This data is important because it provides valuable insights into movement patterns and behaviors. In the field of urban computing, trajectory data enables the development of intelligent transportation systems, optimization of traffic flow, and prediction of congestion (Yan et al. 2014; Wang et al. 2019; Kong et al. 2022). In the area of city science, mobility trajectory data aids researchers in understanding urban dynamics, identifying activity hotspots, and improving resource allocation and public services (Halim et al. 2022; Bao et al. 2020; Han et al. 2020; Zhao et al. 2021a). Additionally, mobility trajectory data is essential for self-driving cars and intelligent transportation systems. It assists in training algorithms, allowing autonomous vehicles to navigate complex urban environments and make informed decisions (Kong et al. 2017; Waqas et al. 2020; Benko Loknar et al. 2023).

Currently, large-scale mobility trajectory data has been extensively utilized in practical applications. For instance, the study conducted by Hu et al. (2023) demonstrates the utilization of historical trajectory datasets and road networks for traffic predictions, thereby mitigating potential threats stemming from abrupt surges in traffic volume and ensuring the safety of public transportation. The analysis of mobile phone data conducted by Fan et al. (2021) and Li and Mostafavi (2022) improves the general public’s capacity to respond effectively to natural disasters. Furthermore, Wang et al. (2017) investigate taxi trajectory recognition to discern trip purposes and offer insights for smart city planning.

Although a large amount of mobility trajectory data is collected through sensors and various applications, there are challenges in direct utilization of this data in practice due to privacy concerns, commercial considerations, missing values, and expensive deployment costs. Firstly, there are privacy issues associated with mobility trajectory data, as it involves sensitive information about individuals’ activities and behaviors (Kong et al. 2018; Gursoy et al. 2019; Romero-Tris and Megías 2018). Secondly, there are commercial considerations as mobility trajectory data holds commercial value, but data sharing can be challenging due to conflicts of interest (Pan et al. 2019; Wang et al. 2020). Thirdly, data may contain missing values. In real-world mobility trajectory datasets, it is common to encounter corrupted or missing values due to sensor failures, communication loss, and data transmission issues (Ren et al. 2021; Hou et al. 2023). Finally, obtaining high-quality mobility trajectory data can be costly in terms of deployment. Setting up and maintaining sensors, data collection infrastructure, and computational resources require substantial investment (Halim et al. 2016; Zhang et al. 2020b; Kanaya et al. 2012). These factors mentioned above limit the accessibility and availability of mobility trajectory data. Therefore, trajectory data generation addresses the challenges of privacy protection, commercial considerations, missing values, and high investment costs faced in data collection. It helps professionals such as traffic managers, urban planners, and decision-makers optimize traffic systems, predict congestion, evaluate urban policies, and improve resource allocation.

The research topic of mobility trajectory data generation has attracted sustained attention in recent years, and many impressive models and methods have been proposed. Some of these recast the generation problem as predicting the origin–destination matrix with spatial interaction theory (Roy and Thill 2003; Yan et al. 2017; Yan and Zhou 2019). These works model mobility patterns based on gravity theory (Odlyzko 2015), the Weber–Fechner law (Slovic et al. 1977), intervening opportunities (Stouffer 1940), and game theory (Su et al. 2007) to estimate coarse-grained mobility preferences between two urban regions. The generation or simulation process is then carried out by microscopic traffic simulation engines such as VISSIM (Fellendorf and Vortisch 2010) and SUMO (Simulation of Urban Mobility) (Brockfeld et al. 2001). With the development of artificial intelligence, related techniques have been adopted in many fields. Among the available mobility generation methods, deep neural networks stand out (Liu et al. 2020; Park et al. 2018; Zang et al. 2021; Zhang et al. 2020b; Bao et al. 2022). The idea behind these works is to learn the nonlinear spatio-temporal correlations preserved in traffic datasets by leveraging the strong approximation ability of deep neural networks.

The increasing popularity of mobility trajectory data generation has led to numerous publications in interdisciplinary fields. For example, in transportation and operational research, traffic patterns are simulated or modeled using knowledge or theories of human mobility. However, most existing research generates data by estimating the possible distributions from an existing dataset and is therefore incapable of generating trajectories across different types of moving objects. For example, the patterns learned from taxi trajectories cannot be directly applied to generating trajectories of private cars. Therefore, theory-guided or knowledge-based models also play an important role in mobility trajectory data generation.

In this paper, we attempt to address this issue by presenting a comprehensive survey of mobility trajectory data generation. The main audience of this survey is practitioners interested in studying mobility trajectory data generation from different research perspectives. We first outline the problem of mobility trajectory data generation and introduce some related fundamentals. Then, the framework of this survey is given, and the categorization is discussed. Afterward, based on our categorization, we elaborate on 55 mobility trajectory data generation papers. These papers mainly cover work in the field of transportation, but we also cover several publications from the data science and deep learning fields. Finally, we discuss the current and future challenges of mobility trajectory data generation. The insights readers can extract from this survey are:

  • Comprehensive definitions of mobility trajectory data generation in different application scenarios.

  • The strengths and weaknesses of different categories of methods and models in mobility trajectory data generation.

  • Commonly used open datasets in mobility trajectory data generation and the associated open code.

  • Future challenges facing mobility trajectory data generation and possible opportunities to deal with these challenges.

Comparison to other survey papers. Several previously published works focus on the topic of mobility trajectory data generation. One of the early surveys on this topic is Harri et al. (2009). This work mainly presents a framework for introducing vehicle mobility models, which can be used to generate realistic vehicular motion patterns for Vehicular Ad Hoc Networks (VANETs). It focuses on knowledge-based models and therefore neglects deep learning-related work. Our survey aims to provide a more comprehensive review of the work on mobility trajectory data generation.

Recently, Shin et al. (2020) provided a survey of mobility trace generation. This survey focuses on synthesizing user mobility traces with Generative Adversarial Networks (GANs) and categorizes the reviewed papers according to different types of GAN. However, it concentrates on GAN techniques without much focus on the domain knowledge in trace generation. Furthermore, it offers limited insights into future challenges, which may not be sufficient to inspire readers who are dedicated to the generation of mobility trajectory data. Our work provides in-depth discussions of the challenges and future directions in Sect. 8.

The work of Gao et al. (2020) provides another survey of spatio-temporal data mining. This survey presents a detailed categorization based on different application scenarios of GAN in spatio-temporal modeling. However, this survey focuses on spatio-temporal data mining without consideration of the mobility trajectory generation work. Our work provides a deep and comprehensive survey of mobility trajectory data generation.

To the best of our knowledge, this is the first survey to organize and introduce mobility trajectory data generation from the perspective of two paradigms: knowledge-driven and data-driven. In this survey, we first provide a deep insight into these two paradigms and introduce the categorization and framework of our survey. Then, we give a detailed definition of mobility trajectory generation according to different scenarios. Moreover, we elaborate on the fundamentals (theories and techniques) commonly used in knowledge-driven and data-driven methods. We review each specific work based on the scenarios we presented and the fundamentals we discussed. Finally, we discuss future challenges and possible trends in mobility trajectory data generation.

The rest of this paper is organized as follows. In Sect. 2, we introduce the detailed methodology that explains how we conducted the literature survey and identified the articles to be included in the study. In Sect. 3, we discuss the taxonomy of this survey. In Sect. 4, fundamentals and comprehensive definitions of mobility trajectory data generation are given. The core of our review is in Sect. 5, which we split into two subsections: Sect. 5.1 discusses the knowledge-driven methods, while Sect. 5.2 elaborates on the data-driven methods. In Sect. 6, we introduce the evaluation metrics commonly used in mobility trajectory data generation. Then, in Sect. 7, we summarize the existing resources for mobility trajectory data generation, including datasets, simulation tools, and related open code. Section 8 describes the challenges and future opportunities in mobility trajectory data generation research. Finally, we summarize our work in Sect. 9.

Fig. 1 Overview of the categories in mobility trajectory generation

2 Methodology

In the initial stage of the study, in accordance with the recommendations by Wohlin (2014), we utilized Google Scholar to conduct a literature search by employing diverse keywords, thereby mitigating potential publisher bias. The search was carried out in March 2020 without specifying a particular time frame. Duplicate papers and non-English articles were excluded, while all relevant journal articles, conference papers, and book sections pertaining to mobile trajectory data were included. Subsequently, a snowballing approach was employed on the identified papers. Firstly, the reference lists of each paper were scrutinized to identify potentially relevant new publications pertaining to the research topic. Subsequently, papers were selected or excluded based on the aforementioned criteria, and the process was concluded when no further relevant papers were discovered. Overall, 55 papers were utilized in this study.

3 Taxonomy

From the model’s perspective, we categorize the mobility trajectory generation works into knowledge-driven and data-driven. From the perspective of application scenarios, we divide mobility trajectory generation into three scenarios. Figure 1 shows the categories of mobility trajectory generation. We discuss our categorization in detail below.

In the early stage, researchers proposed hypotheses or theories, and various collected datasets were then used to confirm or refute them, e.g., the gravity model in traffic flow estimation. However, it must be acknowledged that data mining and deep learning techniques have become the mainstream paradigm of current mobility trajectory generation research. Some researchers even propose that the rise of data science is the end of theory (Karpatne et al. 2017). The underlying idea is to leverage abundant data to construct models by optimizing a loss function, without relying on scientific theories.

Nevertheless, black-box deep learning methods have many limitations in applications. Firstly, deep learning methods rely largely on high-quality training samples. However, it is not easy to collect representative labeled data involving many complex physical variables, so generalization has become a major problem that plagues deep learning methods. The second limitation is the interpretability of deep learning methods. Although an ‘end-to-end’ or a ‘task-specific’ method achieves impressive performance on real-world datasets or application tasks, the process of knowledge discovery in the scientific domain does not end there. Interpretable models or methods are based on explainable theories, which help prevent the acquisition of erroneous patterns from noisy data. This supports the model’s capacity for generalization.

Methods of mobility trajectory generation can be categorized into two classes from the macroscopic view. Some works designed their models based on theories or hypotheses, while others learned the mobility patterns from a large number of datasets. In this survey, we aim to introduce the mobility trajectory generation methods from these two paradigms. We hope that readers can get more in-depth insights or inspirations from the advantages and disadvantages of these two classes of methods we reviewed.

We divide the reviewed literature into two categories: knowledge-driven and data-driven. Moreover, we classify the data-driven methods by their specific techniques into Recurrent Neural Network (RNN)-based approaches and GAN-based approaches.

Table 1 The description of notations

4 Definitions and fundamentals

In this section, we first give definitions of mobility trajectory and mobility trajectory generation as shown in Table 1. We introduce three common application scenarios of mobility trajectory generation. Then, we give a detailed discussion about the fundamentals used in existing mobility trajectory generation work.

4.1 Definitions

Mobility trajectory. A mobility trajectory is defined as a sequence of spatio-temporal moving records \({\mathcal {S}}=\{x_{1}, x_{2},\ldots , x_{n} \}\in {\mathbb {R}}^{N\times 2}\), where the ith element is a record defined as a tuple \((l_{i}, t_{i})\). \(l_{i}\) denotes the spatial information, which can be GPS coordinates (longitude, latitude) or a region ID. \(t_{i}\) represents the temporal information, such as the timestamp of the ith record.

Figure 2 shows an example of the mobility trajectories of two objects. The top trajectory is recorded as GPS coordinates, which is the most common form of mobility trajectory data. The bottom trajectory is obtained by transforming the GPS coordinates into another representation, such as region IDs, to help model the latent semantic information in trajectories.
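The two representations described above can be illustrated with a small data structure. The following is a minimal sketch under our own assumptions: the names `Record`, `make_trajectory`, and `to_region_id`, as well as the uniform-grid scheme for region IDs, are hypothetical and are not taken from any of the surveyed works.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# l_i: either GPS coordinates (longitude, latitude) or an integer region ID.
Location = Union[Tuple[float, float], int]


@dataclass(frozen=True)
class Record:
    """One spatio-temporal record x_i = (l_i, t_i)."""
    location: Location
    timestamp: float  # t_i, e.g. seconds since some reference time


def make_trajectory(records: List[Record]) -> List[Record]:
    """Return the records ordered by timestamp, i.e. a trajectory S."""
    return sorted(records, key=lambda r: r.timestamp)


def to_region_id(lon: float, lat: float, lon0: float, lat0: float,
                 cell_deg: float, n_cols: int) -> int:
    """Map a GPS point to the ID of a uniform grid cell.

    Assumption: regions are square cells of `cell_deg` degrees, numbered
    row by row from the south-west corner (lon0, lat0).
    """
    col = int((lon - lon0) // cell_deg)
    row = int((lat - lat0) // cell_deg)
    return row * n_cols + col
```

In this sketch, a GPS trajectory becomes a region-ID trajectory by replacing each record's location with `to_region_id(...)` while keeping its timestamp.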

Fig. 2 Examples of spatio-temporal trajectories

Domain knowledge. Domain knowledge is a set \({\mathcal {K}}\) that contains the information related to the trajectories or mobility patterns.

In this paper, we will mainly introduce four types of domain knowledge information that are commonly used in existing mobility trajectory generation work.

  • Report information: Governments publish reports on transportation, urbanization, and mobility analysis every year. The information in these reports reflects mobility and transportation conditions at a macroscopic level and can assist in generating trajectories. For example, Kong et al. (2018) generated trajectories of social cars by estimating parameters from the 2015 Beijing Transport Annual Report (footnote 1).

  • Demographic information: The size of the population directly affects the formulation and improvement of policies for employment, elderly care, medical care, and social security. It also affects the distribution of educational and medical institutions in the area where citizens live, the construction of public service facilities, the distribution of commercial service outlets, the supply of urban housing, and the construction of urban roads. Demographic information is related to travel demand and shapes the mobility patterns in a city. Researchers use it to compute the demand and then propose schemes to address urban problems such as traffic congestion (Kong et al. 2018).

  • Spatial information: Spatial information can be categorized into two classes. The first class is road network information. A road network is composed of points, lines, and planes. It shows the basic spatial structure of a city and contains a large amount of information; for example, different road networks exhibit different road, hierarchy, and path structures. The second class is the Point of Interest (POI). A POI contains text descriptions of a spatial entity and can be utilized to extract the latent semantic information preserved in trajectories. A mobility trajectory can be transformed into mobility activities between POIs, and mobility patterns can be extracted by learning the relationships among POIs (Yao et al. 2018). Common sources of spatial information are Google Maps (footnote 2), AMAP (footnote 3), and OpenStreetMap (OSM) (footnote 4).

  • Demand information: Demand information can be seen as hybrid, fine-grained information affected by various factors such as demographic, economic, and environmental information. To simplify the discussion and help readers build a clear understanding, we list demand information as a separate type of information in the following discussions. Demand information determines the flow from each origin to each destination. It is structured as an Origin–Destination (OD) matrix, which can be converted into the individual trips of vehicles. Thus, the OD matrix describes each vehicle’s departure and arrival place in a specific region during the simulation.

It should be noted that domain-specific knowledge is varied, and within this survey, we have selected four frequently employed sources of information in works on generating mobility trajectories for inclusion.
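To make the step from an OD matrix to individual vehicle trips concrete, the following minimal sketch expands each OD entry into that many trips. It illustrates the idea only and does not reproduce the procedure of any specific simulator; the function name `od_to_trips` and the assumption that departure times are spread uniformly at random over the simulated period are ours.

```python
import random
from typing import Dict, List, Tuple

Trip = Tuple[float, str, str]  # (departure time, origin region, destination region)


def od_to_trips(od: Dict[Tuple[str, str], int], begin: float, end: float,
                seed: int = 0) -> List[Trip]:
    """Expand an OD matrix into one trip per vehicle.

    `od[(o, d)]` is the number of vehicles travelling from region `o` to
    region `d` between `begin` and `end` (in seconds).
    Assumption: departures are uniformly distributed over [begin, end].
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    trips: List[Trip] = []
    for (origin, destination), count in od.items():
        for _ in range(count):
            trips.append((rng.uniform(begin, end), origin, destination))
    trips.sort()  # order trips by departure time
    return trips


od_matrix = {("A", "B"): 3, ("B", "C"): 2}
trips = od_to_trips(od_matrix, begin=0.0, end=3600.0)
```

Each resulting trip fixes only the departure time and the two end regions; choosing a route between them is a separate step that requires the road network.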

Mobility trajectory generation. Given a predefined information set \({\mathcal {M}}\subseteq {\mathcal {S}}\cup {\mathcal {K}}\), mobility trajectory generation aims to learn a model or function \({\mathcal {F}}: {\mathcal {M}} \rightarrow \hat{{\mathcal {S}}}=\{{\hat{x}}_{1},{\hat{x}}_{2}, \ldots, {\hat{x}}_{n}\}\in {\mathbb {R}}^{N \times 2}\). The information set \({\mathcal {M}}\) draws on two components: \({\mathcal {S}}\), which is a set of sequential spatio-temporal movement records, and \({\mathcal {K}}\), which is a set of domain knowledge containing various types of information. The set \(\hat{{\mathcal {S}}}\) represents the collection of trajectories generated using the model or function.

The generated mobility trajectory data has statistical characteristics similar to those of real data and can be used for analysis and verification. The requirements for generating mobility trajectory data vary across scenarios. In the context of smart cities, generated trajectory data is used to assess traffic congestion and accidents, thereby improving urban transportation. Therefore, generated mobility trajectories usually account for factors beyond location, such as weather, peak hours, and holidays (Fan et al. 2021). For autonomous driving, generated trajectory data is used for training to enhance the vehicle’s understanding of and response to the surrounding environment. Therefore, there is no need to generate long-term trajectories for autonomous driving; instead, the focus is on the interactions among different objects in the same space (Alahi et al. 2016). In terms of optimizing basic transportation infrastructure, generated trajectory data is used to evaluate the deployment of new infrastructure in cities and provide recommendations for urban planners and managers. Therefore, data is often generated for a specific area based on given historical conditions (Zhang et al. 2020b). In this paper, we divide mobility trajectory generation into three application scenarios.

  • Scenario 1: The first scenario of mobility trajectory generation is validation in VANETs and traffic simulation. For validation, some research (Codeca et al. 2015) used real information to build traffic scenarios for evaluating and comparing new communication protocols. For traffic simulation, the urban traffic state is estimated by generating trajectories (Dian Khumara et al. 2018).

  • Scenario 2: The second scenario of mobility trajectory generation is missing value imputation for urban data. A complete mobility trajectory dataset is hard to obtain because of privacy and security restrictions, power outages, device malfunctions, and transfer errors. To solve this problem, Xia et al. (2017) and Kong et al. (2018) introduce relevant domain knowledge to generate trajectories. This line of work can also be used to fill in missing data.

  • Scenario 3: The third scenario of mobility trajectory generation is autonomous driving. To enhance autonomous driving safety, researchers (Alahi et al. 2016; Gupta et al. 2018) have started to focus on making the algorithm understand the surrounding environment and the behavior of pedestrians and vehicles by generating possible trajectories.

4.2 Fundamentals

In this subsection, we first introduce the theories and tools used in the knowledge-driven methods, including spatial interaction models, traffic models, and two simulation tools. Then, we introduce the widely used techniques in data-driven methods, including Convolutional Neural Network (CNN), RNN, and GAN.

4.2.1 Spatial interaction models

For more than 100 years, researchers have proposed many models for predicting the flow of people, goods, and information between origins and destinations. These models have different names in different disciplines; in transportation science they are called travel distribution prediction models (Yan 2017). Prediction of flows can reduce the cost of spatial interaction while maintaining the diversity of choices in transportation.

The gravity model has been successfully applied in mobility pattern analysis. The flow distribution between multiple places follows a law similar to Newton’s law of universal gravitation. Jung et al. (2008) found that the traffic flow in the Seoul subway network in South Korea can be calculated using the following model:

$$T_{ij}=\alpha \frac{m_{i} m_{j}}{d_{ij}^{\beta }}, \tag{1}$$

where \(T_{ij}\) is the passenger flow from station \(i\) to station \(j\), \(m_{i}, m_{j}\) are the populations of stations \(i\) and \(j\), \(d_{ij}\) is the distance between two stations \(i\) and \(j\), \(\alpha\) and \(\beta\) are two parameters.
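As a worked illustration of Eq. (1), the following minimal sketch evaluates the gravity model for a pair of stations and for a whole set of stations. The function names and the numeric values of \(\alpha\) and \(\beta\) used in the example are placeholders chosen by us; in practice both parameters would be estimated from observed flows.

```python
from typing import List


def gravity_flow(m_i: float, m_j: float, d_ij: float,
                 alpha: float, beta: float) -> float:
    """Flow T_ij = alpha * m_i * m_j / d_ij**beta, as in Eq. (1)."""
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return alpha * m_i * m_j / d_ij ** beta


def gravity_matrix(populations: List[float], distances: List[List[float]],
                   alpha: float, beta: float) -> List[List[float]]:
    """Flows between all pairs of stations; the diagonal is set to zero."""
    n = len(populations)
    return [
        [0.0 if i == j else gravity_flow(populations[i], populations[j],
                                         distances[i][j], alpha, beta)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: two stations 10 distance units apart.
flow = gravity_flow(m_i=100.0, m_j=200.0, d_ij=10.0, alpha=0.5, beta=2.0)
```

Because Eq. (1) is symmetric in \(m_{i}\) and \(m_{j}\), the resulting flow matrix is symmetric whenever the distance matrix is.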

In addition to the railway network, the law of gravity also holds for commuting travel (Viboud et al. 2006), population migration (Tobler 1995), and international trade (Fagiolo 2010). However, the gravity model parameters take different values in different regions and may also take different values for the same region in different periods; that is, its applicability is limited. Stouffer (1940) provided another spatial interaction model called the intervening opportunities (IO) model. This model does not use the actual distance but sorts the destinations from near to far. The decision-maker selects a destination with a certain probability according to the ranking. In practice, the IO model does not require the actual distance as input; the population and the number of trips in each location are sufficient to complete the travel distribution forecast for the entire region. However, because its theoretical basis is not easy to understand and it contains many parameters to be estimated, it is rarely adopted in practical applications.
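The rank-based selection of the IO model can be sketched as follows. The formula is not given in the text above, so this sketch assumes one formulation commonly found in the transportation literature: a traveller accepts each opportunity with a small constant probability \(L\), so the probability of stopping at the destination of rank \(k\) is \(e^{-L V_{k-1}}-e^{-L V_{k}}\), where \(V_{k}\) is the cumulative number of opportunities up to rank \(k\). The function name and the parameter value are hypothetical.

```python
import math
from typing import List


def io_probabilities(opportunities_by_rank: List[float], L: float) -> List[float]:
    """Destination choice probabilities under an intervening opportunities model.

    `opportunities_by_rank[k]` is the number of opportunities (e.g. population)
    at the k-th nearest destination; only this ranking is used, not the
    actual distances. `L` > 0 is the assumed per-opportunity acceptance rate.
    Probabilities are normalised so that every trip ends at some destination.
    """
    probs = []
    cumulative = 0.0
    for opportunities in opportunities_by_rank:
        previous = cumulative
        cumulative += opportunities
        probs.append(math.exp(-L * previous) - math.exp(-L * cumulative))
    total = 1.0 - math.exp(-L * cumulative)
    return [p / total for p in probs]


# Three destinations with equal opportunities, sorted from near to far.
probs = io_probabilities([100.0, 100.0, 100.0], L=0.01)
```

With equal opportunities, nearer destinations receive higher probabilities purely because of their rank, which is the behavior described above.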

4.2.2 Traffic models

Traffic models have a history of more than a hundred years. They are generally divided into the macro model at the strategic planning level and the micro model at the operational planning level. Establishing a traffic model is the basic method for traffic analysis. The four-step model (FSM) is currently the most commonly used macroscopic traffic model (McNally 2007).

The FSM is one of the first trip demand models that attempt to link land use and travel behavior for transportation planning. It includes trip generation, trip distribution, mode choice, and traffic assignment. Trip generation is determined by the population size, social economy, land use, travel frequency, and other factors. Trip distribution predicts the inter-regional trip flow based on the growth trend of regional trip volume, trip resistance, and other factors. Mode choice differs among travelers because the modes of transportation differ in travel time and other factors and because travelers have different preferences for them. Traffic assignment loads the OD traffic onto each road section through route selection.

Fig. 3 The simulation process of SUMO

4.2.3 Simulation tool

Traffic simulation is the use of simulation technology to assist in the study of traffic. A simulation can be microscopic or macroscopic and usually contains random characteristics. It involves a mathematical model that describes the real-time movement of the transportation system within a certain period of time. Two simulation tools are widely used: Simulation of Urban Mobility (SUMO) and VISSIM.

  • SUMO: SUMO was proposed in 2001 and first released in 2002 (Brockfeld et al. 2001; Krajzewicz et al. 2002). SUMO is an open-source simulation package that can process and simulate traffic-related data. Behrisch et al. (2011) introduced the developments and prospects of SUMO in different research topics. SUMO is an effective simulation tool that is highly portable, microscopic, and continuous. It contains multiple applications. The common ones are dfrouter, which builds vehicle routes; duarouter, which uses the Gawron model (1998) to compute shortest paths and the dynamic user equilibrium; netconvert, which converts road networks; od2trips, which imports OD matrices and converts them into individual trips; and TraCITestClient, which can be used to explore communication with external applications such as network simulator version 2/3 (NS2/3) (footnote 5). As shown in Fig. 3, the two most important components for simulating vehicle mobility are road network import and demand modeling. With the help of SUMO, researchers can easily study changes in urban traffic conditions. For instance, combining SUMO with NS2/3 makes it possible to simulate vehicle-to-vehicle (V2V) data transmission and generate vehicle trajectories.

  • VISSIM: VISSIM is a discrete, stochastic, microscopic traffic simulation software based on time steps and driving behavior, developed by PTV in Germany. The traffic simulator relies on the “Wiedemann 74” or the “Wiedemann 99” car-following model, which is classified as a psycho-physical car-following model (Aycin and Benekohal 1999). Lateral lane changes use a rule-based algorithm. Internally, VISSIM is composed of a traffic simulator and a signal state generator. The simulator includes a car-following model and a lane-change model. The signal generator is signal control software that implements traffic flow control through programs. The two components exchange data and signal status information through an interface. VISSIM can perform functions such as road network evaluation and optimization and traffic impact evaluation. It can also realistically simulate the behavior of cars, trucks, buses, subways, light rails, bicycles, and pedestrians. For example, VISSIM supports the layout of light rail and public transportation systems, the evaluation of public transport priority schemes (such as bus lanes), indoor and outdoor pedestrian flow analysis, and public short-distance traffic simulation. Similar to SUMO, VISSIM can also simulate trajectories by importing the OD matrix.

4.2.4 Convolutional neural network (CNN)

CNN usually plays an important role in hybrid deep network design. Its main purpose is to learn inherent features gradually, beginning with low-level features and then building more complex concepts through a series of layers. Similar to a traditional neural network, the architecture of a typical CNN (Fig. 4) includes an input layer, an output layer, and hidden layers. Convolution layers, pooling layers, fully connected layers, and Rectified Linear Unit (ReLU) activations are the most commonly used components of the hidden layers. The purpose of convolution is to extract features from the input. In contrast, pooling aims to gradually reduce the spatial size of the data volume while preserving vital information. Convolutional layers can also handle temporal dependencies (Nikhil and Morris 2019). Pooling layers, which commonly include max pooling and average pooling, perform downsampling on the spatial dimensions between successive convolutional layers. A ReLU layer applies the activation function element-wise and leaves the data size unchanged. Fully connected layers are similar to the traditional multilayer perceptron (MLP), in which every neuron connects to all neurons in the previous layer.
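The roles of the three hidden-layer operations can be seen in a small one-dimensional example. The following is a minimal sketch in plain Python, not an implementation from any surveyed work; the input sequence and the difference kernel `[-1, 1]` are arbitrary choices of ours, and a real CNN would learn its kernels and use a deep learning library.

```python
from typing import List


def conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1-D convolution as used in CNNs (no kernel flipping)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def relu(x: List[float]) -> List[float]:
    """Element-wise ReLU; the output has the same size as the input."""
    return [max(0.0, v) for v in x]


def max_pool1d(x: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; downsamples by a factor of `size`."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


# A short sequence, e.g. positions along one axis at consecutive time steps.
sequence = [1, 2, 4, 7, 11, 16]
features = conv1d(sequence, [-1, 1])   # local differences between neighbours
activated = relu(features)             # same length as `features`
pooled = max_pool1d(activated, 2)      # shorter, keeps the largest responses
```

Here the convolution extracts a local feature (the step between neighbouring points), ReLU keeps the size unchanged, and pooling shortens the sequence while retaining the strongest responses.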

Fig. 4 The structure of the typical CNN model

CNN is widely utilized not only for image data and natural language processing tasks (Krizhevsky et al. 2017; Nagarhalli et al. 2021), but also for addressing spatio-temporal data mining challenges. In transportation, CNN serves as a prevalent technique for extracting features that capture the spatial characteristics of traffic. For instance, Chen et al. (2020) proposed a methodology to extract spatio-temporal features across multiple layers, where CNN is employed to transform road representations into image format. This approach enables the extraction of pertinent information by considering the spatial structure of the roads. Similarly, Lv et al. (2018) treat trajectory data as two-dimensional images and utilize multi-layer CNNs to integrate trajectory patterns at different scales, facilitating accurate prediction tasks. More about the use of CNN in mobility trajectory generation tasks can be found in Sect. 5.2.

4.2.5 Recurrent neural network (RNN)

RNN (Mikolov et al. 2010) is a type of neural network designed to capture temporal information in sequential data. Unlike other neural networks such as CNN, an RNN can handle inputs and outputs of varying lengths.

Fig. 5 The structure of the typical RNN model

A classical RNN cell also consists of three layers (input, hidden, and output). It can be seen as a chain of nodes, as depicted in Fig. 5, where X represents the input data, Y represents the output data, H refers to the hidden state, and W and b refer to the parameters. Specifically, the state of node \({H}_{t}\) not only processes the input data \({x}_{t}\) at time t but also processes the information stored in \({H}_{t-1}\) and memorizes the important parts of the sequence. Then, the state of node \({H}_{t}\) conveys the processed information to the next node state \({H}_{t+1}\). To calculate the loss, the output \({Y}_{t}\) is compared with the ground truth. In mobility trajectory generation, the input of the RNN is composed of the historical trajectories. A continuous time period is divided into multiple time steps, and the historical trajectory is read at each time step and sent to the RNN (Ma et al. 2019).
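The recurrence described above can be written out in a few lines. The following is a minimal sketch in plain Python; the tanh activation, the separate weight matrices `W_x`, `W_h`, and `W_y`, and all numeric values are assumptions made for illustration, since the text only states that W and b are the parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vectors: Vector) -> Vector:
    return [sum(values) for values in zip(*vectors)]


def rnn_step(x_t: Vector, h_prev: Vector,
             W_x: Matrix, W_h: Matrix, b: Vector) -> Vector:
    """H_t = tanh(W_x x_t + W_h H_{t-1} + b): combines input and memory."""
    pre_activation = add(matvec(W_x, x_t), matvec(W_h, h_prev), b)
    return [math.tanh(p) for p in pre_activation]


def rnn_forward(xs: List[Vector], h0: Vector, W_x: Matrix, W_h: Matrix,
                b: Vector, W_y: Matrix, b_y: Vector) -> Tuple[List[Vector], Vector]:
    """Read a sequence step by step; return outputs Y_t and the final state."""
    h = h0
    ys = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)    # H_t depends on x_t and H_{t-1}
        ys.append(add(matvec(W_y, h), b_y))  # Y_t = W_y H_t + b_y
    return ys, h


# Hypothetical normalised (longitude, latitude) points of a short trajectory.
xs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
W_x = [[0.5, 0.0], [0.0, 0.5]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
W_y = [[1.0, 0.0], [0.0, 1.0]]
ys, h_last = rnn_forward(xs, [0.0, 0.0], W_x, W_h, [0.0, 0.0], W_y, [0.0, 0.0])
```

The same parameters are reused at every time step, which is why the loop works for trajectories of any length.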

However, RNN suffers from vanishing gradients when it is trained in an auto-regressive manner on long input sequences. To address this problem, Long Short-Term Memory (LSTM) was proposed (Hochreiter and Schmidhuber 1997) and further improved by Gers et al. (2000).

LSTM also consists of multiple layers but, compared with a simple RNN, possesses a stronger memorization capability. LSTM adds a memory state C: the current state \(C_{t}\) combines the previous state \(C_{t-1}\) with newly computed content. In addition, LSTM has three gates that control the propagation of information in the network. The first is the input gate, which determines how much of the new information is written into the memory state. The second is the forget gate, which determines how much of the previous memory state is retained or forgotten. The third is the output gate, which determines how much of the memory state is output and passed on to the next step. As shown in Fig. 6, LSTM maintains the recurrent structure of the RNN but uses these three gates to control the transmission of information.
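
A single LSTM step with the three gates can be sketched as follows. The sketch assumes the widely used formulation with a forget gate (Gers et al. 2000) and stacks the four pre-activations in one weight matrix; the variable names and dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to the four stacked gate pre-activations."""
    d_h = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i = sigmoid(z[0 * d_h:1 * d_h])       # input gate: how much new content to write
    f = sigmoid(z[1 * d_h:2 * d_h])       # forget gate: how much of C_{t-1} to keep
    o = sigmoid(z[2 * d_h:3 * d_h])       # output gate: how much of C_t to expose
    g = np.tanh(z[3 * d_h:4 * d_h])       # candidate content (the new part)
    c_t = f * c_prev + i * g              # C_t = retained old memory + admitted new part
    h_t = o * np.tanh(c_t)                # hidden state passed to the next step
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_h = 2, 4
W = rng.normal(scale=0.1, size=(4 * d_h, d_h + d_in))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):    # five trajectory points
    h, c = lstm_step(x_t, h, c, W, b)
```

The additive update of the memory state is the part that eases gradient flow over long sequences: when the forget gate is close to one and the input gate close to zero, the previous memory is carried forward almost unchanged.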

Fig. 6 The structure of the typical LSTM units

The main advantage of RNN-based methods is their memorization capability. Knowing when to memorize or forget information has made RNN-based methods popular for sequence data. However, because of the recurrent structure, training takes considerably longer than for other deep neural network models.

In transportation, RNNs are primarily utilized to capture the temporal and spatial movement patterns of individuals. These models often incorporate various types of data, such as weather conditions and holiday schedules, for modeling purposes. For instance, Feng et al. (2018) introduced a mobility prediction model based on a recurrent neural network with an attention mechanism. This attention mechanism captures multi-level periodic characteristics, thereby improving the prediction performance of the recurrent neural network. Additionally, Kong and Wu (2018) proposed the Hierarchical Spatio-temporal LSTM (HSTLSTM) model to address data sparsity and capture periodic variations for predicting short-term correlations among individuals. In Sect. 5.2.1, we will provide an overview of the common usage of RNN-based models.

4.2.6 Generative adversarial network (GAN)

GAN was proposed by Goodfellow et al. (2014). As shown in Fig. 7, the basic architecture of GAN comprises two fundamental components, the generator \(G\left( {\varvec{z}} ; \theta _{g}\right)\) and the discriminator \(D\left( {\varvec{x}} ; \theta _{d}\right)\), which compete against each other. On the one hand, the generator maps noise variables drawn from \(p_{z}(z)\) to a distribution \(p_{g}\) over the data \({\varvec{x}}\), learning to generate fake data that look real enough to fool the discriminator. On the other hand, the discriminator acts as a classifier that estimates the probability that a sample is real rather than generated. In the ideal state, the generator G produces fake data G(z) that are indistinguishable from the real data, so that the discriminator can no longer tell whether a sample is real or generated. The two components then reach a dynamic equilibrium in which D(G(z)) equals 0.5.
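
The equilibrium value of 0.5 can be checked numerically on a small discrete example. The sketch below is illustrative only: it assumes a finite sample space and uses the closed-form optimal discriminator from Goodfellow et al. (2014), D*(x) = p_data(x) / (p_data(x) + p_g(x)); the function names and the toy distributions are our own assumptions.

```python
import numpy as np

def optimal_discriminator(p_data, p_g):
    """Discriminator that maximizes V for a fixed generator: p_data / (p_data + p_g)."""
    return p_data / (p_data + p_g)

def gan_value(p_data, p_g, d):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] on a finite support."""
    return np.sum(p_data * np.log(d)) + np.sum(p_g * np.log(1.0 - d))

p_data = np.array([0.1, 0.2, 0.3, 0.4])   # toy "real" distribution

# A generator that has not converged: the best discriminator still separates the two.
p_g_bad = np.array([0.4, 0.3, 0.2, 0.1])
d_bad = optimal_discriminator(p_data, p_g_bad)
v_bad = gan_value(p_data, p_g_bad, d_bad)

# A generator that matches the data: the best discriminator outputs 0.5 everywhere.
d_eq = optimal_discriminator(p_data, p_data)
v_eq = gan_value(p_data, p_data, d_eq)    # equals -log 4, the minimum over generators
```

In this toy setting the discriminator's best achievable value drops to -log 4 exactly when the generated distribution coincides with the data distribution, which is the equilibrium described above.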

Fig. 7 The structure of the typical GAN

However, the original GAN also has some shortcomings. For instance, GAN is not well suited to discrete data such as text. In addition, GAN suffers from unstable training, vanishing gradients, and mode collapse/dropping. To cope with these problems, many variants of the vanilla GAN have been presented. Mirza and Osindero (2014) proposed the conditional generative adversarial net (CGAN), which adds prior conditions to the original model, making GAN more controllable. Arjovsky et al. (2017) proposed the Wasserstein generative adversarial network (WGAN), which uses the Wasserstein distance instead of the JS divergence: even when the two distributions do not overlap, the Wasserstein distance still reflects how far apart they are. WGAN not only alleviates training instability but also provides a reliable indicator of training progress.
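
The difference between the two measures for non-overlapping distributions can be illustrated with two point masses on a one-dimensional grid. This is a toy sketch under our own assumptions (discrete distributions, unit grid spacing, natural logarithm); it is not the critic-based estimator used to train WGAN.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                      # terms with a = 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1d(p, q, dx=1.0):
    """Wasserstein-1 distance on an evenly spaced 1-D grid via the CDF difference."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx

def point_mass(n, k):
    p = np.zeros(n)
    p[k] = 1.0
    return p

real = point_mass(20, 0)
results = {}
for shift in (1, 5, 10):
    fake = point_mass(20, shift)          # generated mass moved away from the real mass
    results[shift] = (js_divergence(real, fake), wasserstein_1d(real, fake))
```

For every shift the JS divergence stays at log 2, so it gives the generator no signal about how far off it is, whereas the Wasserstein distance equals the shift and therefore shrinks as the generated distribution approaches the real one.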

In the field of transportation, GANs have become a significant paradigm in data-driven generation methods, supplanting conventional stochastic models. GANs capture spatial dimension features that are beyond the reach of traditional methods and encompass additional information, including temporal dimension, social dimension, and complex nonlinear relationships in the data (Gupta et al. 2018; Ouyang et al. 2018). Recent studies have employed GANs as a stochastic generator for synthesizing realistic mobile trajectory data. For detailed information on GANs-based methods, refer to Sects. 5.2.2 and 5.2.3.

5 Mobility trajectory generation techniques

Table 2 Summary of advantages and disadvantages of mobility trajectory generation

In this section, we will elaborate on the representative methods of mobility trajectory generation based on the categorization we presented. For each work, we will introduce the scenario in which it is applied and discuss the theories or techniques it has developed.

5.1 Knowledge-driven approaches

Early generation of mobility trajectories was mainly used for simulating human daily dynamics in regional planning or observing and dealing with traffic congestion. Raney et al. (2003) designed a multi-agent traffic system that simulated 24-h micro-traffic in Zurich, Switzerland. They generated vehicle trajectories covering metropolitan areas with a population of 10 million, which were used for regional planning. They utilized demographic information and spatial information as knowledge, using micro-queue simulation and the Dijkstra algorithm for generating routes. Likewise, Cetin et al. (2003) also conducted dynamic micro-simulation of car traffic throughout Switzerland using traffic flow queue models based on Scenario 2. The generated dataset has a long duration but mainly focuses on the morning peak period and does not consider daily traffic conditions. However, both of these studies solely focus on car traffic, overlooking other modes of transportation.

Kanaya et al. (2012) combined spatial information and SUMO to propose a human sensing system simulator that synthesizes realistic human movements. Under Scenario 2, it can assist in locating individuals for navigation purposes. In the simulation part, they utilized map data, sensor information, and network data as prior knowledge. However, validating the system requires setting up sensors in different cities, which is challenging in terms of cost. Moreover, this method only simulates human behavior in urban areas.

Considering the previously discussed constraints of privacy and security protection, the absence of authentic, publicly accessible mobile trajectory datasets capable of capturing regional traffic dynamics poses a challenge for evaluating and validating vehicular networking protocols outlined in Scenario 1. To mitigate this concern, Ferreira et al. (2009) provided an alternative method to obtain the urban mobility of vehicles and the respective driving speeds based on traffic images. They extracted trajectory-related knowledge, e.g., the distribution of buildings, from the Spatial information contained in stereoscopic aerial photos. This work generates fine-grained trajectories through SUMO, and the spatial knowledge is utilized to estimate an accurate O/D matrix between two regions. The spatial knowledge is mainly learned by feature selection.

$$\begin{aligned} P(Z)&= \sum _{i\in I} \sum _{j\in J}P(Z\cap Y_j \cap X_{i})\\&= \sum _{i\in I} \sum _{j\in J}P(Z\vert Y_j\cap X_{i})P(Y_j\vert X_{i})P(X_{i}),\\ \end{aligned}$$
(2)

where X, Y, and Z represent three events, and I and J index two partitions of the event space. Given that events X and Y have already occurred, the probability that Z happens can be expressed through conditional probabilities as in Eq. 2. This work treats Z as the destination choice event and utilizes the demand information and spatial information to estimate the corresponding probabilities in Eq. 2. The estimated choice probabilities can be represented as an O/D matrix, which is then fed into simulation tools to generate the trajectories. However, the short duration of connectivity in the aircraft and the cost of aerial photography make the data collection hard.
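
As a worked illustration of Eq. 2, the double sum can be evaluated directly once the three factors are available. All numbers below are hypothetical toy values, not estimates from Ferreira et al. (2009), and the interpretation of X as an origin context and Y as a demand class is our own assumption for the example.

```python
import numpy as np

# Hypothetical toy inputs: 2 contexts X_i, 3 demand classes Y_j, 4 destination zones Z.
p_x = np.array([0.6, 0.4])                              # P(X_i)
p_y_given_x = np.array([[0.5, 0.3, 0.2],                # P(Y_j | X_i), rows sum to 1
                        [0.1, 0.6, 0.3]])
rng = np.random.default_rng(0)
p_z_given_xy = rng.dirichlet(np.ones(4), size=(2, 3))   # P(Z | Y_j, X_i), shape (2, 3, 4)

# Eq. 2: P(Z) = sum_i sum_j P(Z | Y_j, X_i) P(Y_j | X_i) P(X_i)
p_z = np.einsum("ijz,ij,i->z", p_z_given_xy, p_y_given_x, p_x)

# Scaling by an assumed number of departures gives one row of an O/D matrix.
od_row = 1000 * p_z
```

Repeating the computation for each origin zone yields the full O/D matrix that a simulator such as SUMO takes as demand input.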

Subsequently, Thakurzx et al. (2012) acquired traffic flow data from roadside surveillance cameras in cities including London, Sydney, and Toronto, in order to calibrate the mobility of micro-vehicles. However, like previous research, this approach is also burdened by high filming costs and the need for advanced image processing techniques. Moreover, aerial photography has a limited time interval, rendering it unsuitable for generating large-scale datasets.

For knowledge-driven methods to generate mobility trajectories, traffic simulation tools are indispensable. Typically, they combine prior knowledge to generate macroscopic traffic flow, which refers to traffic volume between regions, for the purpose of trajectory generation tasks. Uppoor et al. (2014) synthesized real vehicle trajectory datasets for the City of Cologne based on Scenario 1 using SUMO. This work combines Spatial information, Demographic information, and Report information to generate possible mobility distributions in urban areas. Firstly, they obtained road topology information from the OpenStreetMap database. Secondly, they utilized population, Points of Interest (POI), and time usage patterns (i.e., residents’ time planning) as knowledge to calculate traffic demand. Then, the authors chose to utilize the Gawron algorithm for traffic assignment to achieve dynamic user equilibrium. Compared to the Dijkstra algorithm, the Gawron algorithm maximizes the road network capacity more effectively. The authors provided solutions to the issues encountered during the simulation process.

Codeca et al. (2015) described the process of creating realistic scenarios based on SUMO in a medium-sized European city, Luxembourg, using Scenario 1. The authors extracted the road topology structure using OpenStreetMap (OSM). With the help of the simulator, they needed to verify the accuracy of the manually corrected topology structure. They generated realistic traffic patterns based on activity-based demand using data easily obtained from government websites, such as population data. Additionally, they considered the reasonableness of traffic patterns. Bedogni et al. (2015) provided an openly available real trajectory dataset. Knowledge was extracted from Spatial information, particularly road network information. They used the SUMO road network conversion tool NETCONVERT, which allows automated and clean importing of OSM data, and generated original circular movement trajectory datasets for the Bologna region in Italy. This work considered fine-grained road features such as connectivity and traffic lights when simulating trajectories. All three works mentioned above have long trajectory durations and wide coverage areas. However, these methods cannot be used for trajectory generation without relevant government research reports.

Gramaglia et al. (2016) generated a trajectory dataset based on Scenario 1 to characterize vehicular network connectivity. The Intelligent Driver Model (IDM; Liebner et al. 2012) is utilized to estimate the statistical driving status to simulate the traces. IDM estimates the driver behavior of a vehicle i through the instantaneous acceleration \(dv_{i}(t)/dt\) as:

$$\begin{aligned} \frac{dv_{i}(t)}{dt}&= a \left[ 1-\left( \frac{v_{i}(t)}{v_{i}^{max}} \right) ^{4}- \left( \frac{\Delta x_{i}^{des}(t)}{\Delta x_{i}(t)} \right) ^{2}\right] ,\\ \Delta x_{i}^{des}(t)&= \Delta x^{safe}+ \left[ v_{i}(t)\Delta t_{i}^{safe}-\frac{v_{i}(t)\Delta v_{i}(t)}{2\sqrt{ab}}\right] ,\\ \end{aligned}$$
(3)

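where \(v_{i}(t)\) is the current speed of vehicle i, \(v_{i}^{max}\) denotes the maximum speed, \(\Delta x_{i}(t)\) is the distance to the leading vehicle, and \(\Delta x_{i}^{des}(t)\) represents the desired dynamical distance (the distance the driver would like to keep from the leading vehicle). In the standard IDM formulation, a and b are the maximum acceleration and the comfortable deceleration, \(\Delta x^{safe}\) is the minimum gap, \(\Delta t_{i}^{safe}\) is the safe time headway, and \(\Delta v_{i}(t)\) is the speed difference with respect to the leading vehicle. This work analyzed the data collected by sensors deployed on highway loops and incorporated the Demand information into traffic models to generate trajectory data. This work generated trajectories with a duration of 24 h and a coverage range of 10 km. However, it focuses primarily on the study of vehicle networks in highway environments.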
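
Eq. 3 translates directly into a small function. The sketch below is a hedged illustration: the parameter values are common textbook-style defaults rather than those calibrated by Gramaglia et al. (2016), and we assume that the speed difference is measured as leader speed minus own speed, so that a negative value means the vehicle is closing in and the desired distance grows.

```python
import math

def idm_acceleration(v, gap, dv, v_max=30.0, a=1.0, b=1.5, gap_safe=2.0, t_safe=1.5):
    """Acceleration from Eq. 3.

    v: own speed (m/s); gap: distance to the leading vehicle (m);
    dv: leader speed minus own speed (m/s), an assumed sign convention.
    """
    gap_des = gap_safe + (v * t_safe - v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v_max) ** 4 - (gap_des / gap) ** 2)

# Free road: standing vehicle with no leader nearby accelerates at about a.
acc_free = idm_acceleration(v=0.0, gap=1e6, dv=0.0)

# Closing in fast on a slower leader at short distance: strong braking.
acc_closing = idm_acceleration(v=20.0, gap=10.0, dv=-10.0)

# Steady following at 15 m/s: the gap at which the acceleration vanishes.
gap_eq = (2.0 + 15.0 * 1.5) / math.sqrt(1.0 - (15.0 / 30.0) ** 4)
acc_eq = idm_acceleration(v=15.0, gap=gap_eq, dv=0.0)
```

Integrating this acceleration over time for every vehicle on a road segment yields the microscopic traces, while the demand information determines how many vehicles enter the segment and when.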

SUMO is a highly regarded simulation tool primarily designed for right-hand traffic. However, countries that follow left-hand traffic need to make specific modifications to the SUMO files. Lim et al. (2017) proposed a method that enables the simulation of left-hand traffic in Malaysia using SUMO, building upon the foundation of Scenario 1. The research focused on making primary modifications to the road connections and traffic signal files. Nevertheless, due to the challenges of modifying extensive maps, this method is not suitable for large-scale areas.

The acquisition of inter-regional traffic flow is vital for simulating traffic using the SUMO platform, and numerous studies have relied on publicly available government data for estimation purposes. Kong et al. (2018) utilized floating car data in Beijing to generate a dataset of social vehicle trajectories within SUMO. It is important to highlight that the objective of their work was to produce trajectory datasets specifically for private cars, based on the floating car dataset, which is applicable to Scenario 1. The study integrated Report information, Demographic information, and Spatial information into a spatial interaction model to estimate the macroscopic travel distribution across different areas in Beijing. In a subsequent work, Kong et al. (2022) introduced an alternative method for generating mobility trajectories in the same application scenario. They proposed a three-layer framework, wherein the first layer focused on developing a regional partition scheme. The second layer presented a novel spatiotemporal interaction model to estimate traffic flow between two regions and conducted simulations using SUMO. Lastly, the third layer analyzed the validation results from both macroscopic and microscopic perspectives. However, it is important to acknowledge that this method exhibits certain limitations, performing better in high-density scenarios compared to low-density scenarios. Moreover, it lacks a comprehensive consideration of factors that influence travel behavior and requires specific urban road segmentation in the regional partition. The two aforementioned studies encompass an analysis of macroscopic traffic flow and microscopic driving behavior, resulting in extended duration and coverage of the entire Beijing Fifth Ring Road. Nevertheless, it is essential to note that utilizing simulation tools for route selection may contribute to traffic congestion.

In summary, knowledge-driven methods are predominantly used in Scenario 1 because they are applied to validate VANET protocols or to simulate traffic, which requires larger volumes of data, wider coverage, and longer duration. While realistic data, such as traffic flow or average traffic speed, can be collected (Gramaglia et al. 2016), these data are solely utilized for estimating statistical characteristics rather than learning features. When simulation tools are employed, knowledge-driven methods demonstrate a more effective capability in generating large-scale and long-term datasets. However, this approach heavily relies on supplementary information in addition to specific datasets. Furthermore, this generation paradigm is primarily based on spatial theories and traffic models. Nonetheless, theories or models tend to oversimplify real-world variables, leading to suboptimal performance in capturing intricate correlations or dependencies at a fine-grained microscopic level.

5.2 Data-driven approaches

Compared to knowledge-driven methods, data-driven methods make use of large-scale real datasets collected by sensors and incorporate deep learning techniques into mobility trajectory generation. This paradigm aims to learn the spatio-temporal dependencies preserved in realistic data and then generates trajectories based on the learned spatio-temporal correlations.

Fig. 8 Social pooling in Social LSTM. Three hidden states in different colors are aggregated into two different pools

5.2.1 RNN-based models

RNNs and their variations have achieved certain accomplishments in generating pedestrian trajectories. Alahi et al. (2016) proposed Social LSTM for generating pedestrian trajectories. The model designed an aggregation strategy to connect neighboring LSTM units and learn the interactive behaviors among individuals in a larger spatial context. Social pooling aggregates the hidden states of adjacent pedestrians within a certain spatial distance, as shown in Fig. 8. However, this method neglects the influence of other factors, such as scene layout. Additionally, in crowded scenes, the strategy becomes more complex due to the use of an LSTM for each individual. Inspired by the aforementioned work, Fernando et al. (2018) presented an attention-based LSTM model that considers the past interactions between pedestrians and their neighbors in the contextual scene to generate future trajectories. The introduced attention model can handle highly congested scenarios.
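
The pooling idea can be sketched as follows: for each pedestrian, the hidden states of neighbors are summed into the cells of a small grid centered on that pedestrian, and neighbors outside the grid are ignored. This is a simplified illustration under our own assumptions (square grid, fixed cell size, plain summation); the function name and parameters are hypothetical and the original model embeds the pooled tensor further before feeding it to the LSTM.

```python
import numpy as np

def social_pooling(positions, hidden, cell_size=1.0, grid=2):
    """Sum neighbors' hidden states into a grid x grid neighborhood around each pedestrian.

    positions: (N, 2) coordinates; hidden: (N, D) LSTM hidden states.
    Returns an (N, grid, grid, D) tensor.
    """
    n, d = hidden.shape
    pooled = np.zeros((n, grid, grid, d))
    half = grid * cell_size / 2.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx, dy = positions[j] - positions[i]
            if abs(dx) >= half or abs(dy) >= half:
                continue                  # too far away to influence pedestrian i
            col = int((dx + half) // cell_size)
            row = int((dy + half) // cell_size)
            pooled[i, row, col] += hidden[j]
    return pooled

positions = np.array([[0.0, 0.0], [0.5, 0.5], [5.0, 5.0]])
hidden = np.eye(3)                        # one-hot states make the result easy to read
pooled = social_pooling(positions, hidden)
```

In this example the first two pedestrians see each other, while the third is too far away and receives an all-zero pooled tensor. The double loop also makes the cost in crowded scenes visible: the work grows with the square of the number of pedestrians.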

Xue et al. (2017) further extended the previous work and presented a framework named Bi-Prediction for predicting pedestrian trajectories in a scene. Bi-Prediction designed a two-stage architecture based on bidirectional LSTM to learn fine-grained entry and exit trajectories in a given scene. Unlike the previous work that clusters trajectories, Bi-Prediction divides the scene into multiple regions and utilizes bidirectional LSTM classification to predict the destination selection probability of pedestrians.

Unlike previous studies that disregard the present intention of nearby pedestrians while concentrating solely on their adjacent hidden states, Zhang et al. (2019) introduced a states refinement module based on LSTM network. Acting as a feature extractor, this module employs an information passing mechanism to engage neighboring pedestrians’ intentions and jointly handles the current states of all pedestrians in congested scenarios. Furthermore, an information selection mechanism is introduced to selectively extract valuable features from individual neighbors.

In contrast to Social LSTM and Bi-Prediction, Lisotto et al. (2019) proposed three tensors to enhance the performance of the basic LSTM model. The first tensor is the Social Tensor, which aggregates neighboring interactions using a pooling mechanism. The Social Tensor follows a similar pooling strategy as in Social LSTM. The second tensor is the Navigation Tensor, which incorporates environmental content information for path selection. Specifically, a Navigation Map \({\mathcal {N}}\) was developed to quantify the frequency of crossings during navigation. Average pooling is employed to mitigate abrupt frequency transitions. The third tensor is the Semantic Tensor, which captures the semantic characteristics of spatial areas. The study defined a semantic class \({\mathcal {C}}=\{grass, building, obstacle, bench, car, road, sidewalk\}\) and encoded it using one-hot representations. However, this approach also models each pedestrian as an LSTM network, making it equally unsuitable for crowded scenarios.

In real-life scenarios, pedestrians influence each other’s movements and are also affected by the presence of obstacles in their surroundings. Therefore, it is essential to consider various factors when generating future trajectory predictions. The application of attention mechanisms has proven to be effective in generating more plausible trajectories, and its effectiveness has been demonstrated in many tasks.

Haddad et al. (2019) introduced a graph-based LSTM framework for generating pedestrian trajectories. In contrast to previous approaches, this framework represents spatial and temporal interactions using a spatio-temporal graph, as shown in Fig. 9. The graph components are decomposed into three LSTM-based modules: temporal edge LSTM, spatial edge LSTM, and node LSTM. Vanilla LSTM is employed to incorporate spatial and temporal relationships into deep representations.

Fig. 9 The crowded scene and corresponding spatio-temporal graph over 2-time steps

Al-Molegi et al. (2018) proposed a neural network model that combines RNN and attention mechanisms. This model employs representation learning techniques to extract essential information from sequential trajectories. It tends to generate pedestrian trajectories that correspond to specific locations. However, the model lacks the capability to handle unseen locations. Similarly, Vemula et al. (2018) incorporated attention mechanisms to capture the relative importance of each individual in the crowd, irrespective of their proximity. However, the computational complexity increases due to the larger number of model parameters. The attention mechanism was also incorporated by Jiang et al. (2019) to distinguish the importance of different neighbors and tackle the issue of generating pedestrian trajectories. However, because destination information is first extracted from past trajectory data, the model neglects the influence of pedestrians on one another. Consequently, the intended destination can be misestimated, and the generated trajectories deviate from the actual paths.

The utilization of soft attention and hard attention (Fernando et al. 2018), implemented with the LSTM model, addresses pedestrian interactions in densely populated scenarios by incorporating the trajectory information of nearby neighbors into future trajectory generation. In a similar vein, Bhujel et al. (2019) proposed two attention mechanisms within the LSTM framework. The first one is physical attention, which leverages input images to identify locations and generate contextual information. The second one is social attention, which computes social context vectors based on the encoder’s hidden states. Furthermore, the authors employed CNN as an extractor to acquire scene information. Notably, this study employs a single LSTM, effectively reducing the complexity of the training process. In the study conducted by Xue et al. (2020), the generation of pedestrian future trajectories relies exclusively on the observed partial trajectories. The model adopts the LSTM architecture and incorporates temporal attention mechanisms into the location and velocity LSTM layers. However, this research does not integrate comprehensive background information, such as static obstacles and scene details.
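
The common core of these soft attention mechanisms is a softmax-weighted sum of neighbor states. The sketch below assumes a plain dot-product score between the target pedestrian's state and each neighbor's state; the cited works use their own learned scoring functions, so the scoring rule and the names here are illustrative assumptions only.

```python
import numpy as np

def soft_attention(query, neighbor_states):
    """Return softmax weights over neighbors and the weighted context vector."""
    scores = neighbor_states @ query              # one relevance score per neighbor
    scores = scores - scores.max()                # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ neighbor_states           # convex combination of neighbor states
    return weights, context

query = np.array([1.0, 0.0])                      # hidden state of the target pedestrian
neighbor_states = np.array([[2.0, 0.0],           # moving the same way: high score
                            [0.0, 2.0],           # unrelated direction
                            [-2.0, 0.0]])         # opposite direction: low score
weights, context = soft_attention(query, neighbor_states)
```

Because the weights depend on the states rather than on distance alone, a far but relevant neighbor can receive more weight than a close but irrelevant one, which is the property exploited by the attention-based models above.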

The main objective of the aforementioned trajectory generation tasks is to generate pedestrian trajectories. However, there have also been several studies that focus on generating trajectories from the perspective of vehicles. Park et al. (2018) proposed a framework specifically designed for vehicle trajectory generation. In this framework, an LSTM encoder is utilized to capture the trajectory samples and state information of the ego vehicle. Subsequently, the LSTM decoder leverages a beam search algorithm to generate future trajectories. The architecture of the LSTM encoder–decoder framework is shown in Fig. 10.
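
The beam search used at decoding time can be sketched independently of the network. In the illustration below, a fixed transition matrix over three grid cells stands in for the LSTM decoder; this stand-in, the function names, and the beam width are our own assumptions and do not reproduce the architecture of Park et al. (2018).

```python
import numpy as np

def beam_search(step_log_probs, start, horizon, beam_width=3):
    """Keep the beam_width most likely sequences while decoding horizon steps.

    step_log_probs(seq) returns a 1-D array of log-probabilities over the next cell.
    """
    beams = [([start], 0.0)]                      # (sequence, cumulative log-probability)
    for _ in range(horizon):
        candidates = []
        for seq, score in beams:
            log_p = step_log_probs(seq)
            for nxt in np.argsort(log_p)[-beam_width:]:
                candidates.append((seq + [int(nxt)], score + float(log_p[nxt])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]           # prune to the best hypotheses
    return beams

# Toy stand-in for the decoder: transition probabilities between 3 grid cells.
P = np.array([[0.05, 0.80, 0.15],
              [0.10, 0.20, 0.70],
              [0.60, 0.25, 0.15]])

def toy_decoder(seq):
    return np.log(P[seq[-1]])

beams = beam_search(toy_decoder, start=0, horizon=3, beam_width=2)
best_seq, best_score = beams[0]
```

Keeping several hypotheses instead of only the single most likely next cell is what allows the decoder to return multiple plausible future trajectories together with their likelihoods.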

Fig. 10 The encoder–decoder LSTM architecture

The following works are built upon the LSTM Encoder–decoder framework. Deo and Trivedi (2018) introduced a unique approach by enhancing the social pooling layer with convolution, enabling robust learning of interdependencies in the data. Messaoud et al. (2019) tackled the challenge of long-term trajectory prediction (5 s) on highways by integrating attention mechanism and LSTM to capture spatio-temporal dependencies. Khakzar et al. (2020) aimed to overcome the limitations of existing methods, including computational complexity and dataset dependence, by employing ConvLSTM. This replaces the inner product of LSTM with convolution, ensuring the preservation of spatio-temporal motion patterns.

Existing LSTM models inadequately capture the spatial interactions and temporal relations among distinct vehicles. Furthermore, basic LSTM models encounter challenges with the vanishing gradient problem, impeding their training on long time series. Choi et al. (2019) proposed an attention mechanism to enhance the basic RNN and elucidate the impact of network-level traffic state information on generating trajectories for urban vehicles. Ma et al. (2019) devised an algorithm comprising two primary levels: an instance level for capturing agent mobility and interactions, and a category level for learning from agents of the same type. Nevertheless, its practical application is limited by the algorithm’s high computational cost and overreliance on traffic conditions and historical trajectories. Dai et al. (2019) integrated spatial interactions and temporal relations into the LSTM model to quantify the interactions among diverse vehicles. Additionally, they mitigated the vanishing gradient problem by introducing two consecutive LSTM layers between the input and output.

It is crucial to emphasize that the previously mentioned trajectory generation works utilizing RNN are employed in Scenario 3 with the goal of comprehending pedestrian and vehicle behaviors, and preventing collisions with obstacles in the surrounding environment. These works play a crucial role in the future advancement of socially compliant agents and autonomous vehicles.

5.2.2 GAN-based models

GAN has proven effective in generating pedestrian mobility trajectories. For instance, Gupta et al. (2018) designed an early GAN model named SocialGAN, which utilized a purely data-driven approach to model interactions among individuals. L2 loss was employed in this work to measure the distance between generated samples and real samples, as illustrated in Eq. 4. In contrast to the conventional GAN discussed in Sect. 4.2.6, SocialGAN integrates a new pooling mechanism within the encoder–decoder framework to capture information about individuals and generate trajectories in a scene.

$${\mathcal {L}}_{2}=\min _{m}\left\| Y_{t}-{\hat{Y}}_{t}^{(m)}\right\| _{2},$$
(4)

where m indexes the candidate trajectories sampled from the generator; the number of sampled candidates is a hyperparameter.
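
Eq. 4 can be read as follows: several candidate trajectories are sampled, and only the one closest to the ground truth is penalized. A minimal sketch, assuming the distance is computed over the whole predicted trajectory and using hypothetical toy samples in place of generator outputs:

```python
import numpy as np

def variety_loss(y_true, y_samples):
    """L2 distance between the ground truth and the closest sampled trajectory.

    y_true: (T, 2) ground-truth future; y_samples: (K, T, 2) sampled candidates.
    Returns the minimum distance and the index of the best candidate.
    """
    diffs = (y_samples - y_true).reshape(len(y_samples), -1)
    dists = np.linalg.norm(diffs, axis=1)
    best = int(np.argmin(dists))
    return dists[best], best

y_true = np.stack([np.arange(4.0), np.arange(4.0)], axis=1)    # straight diagonal path
offsets = np.array([1.0, 0.1, 0.5])                            # three toy candidates
y_samples = y_true[None, :, :] + offsets[:, None, None]
loss, best = variety_loss(y_true, y_samples)
```

Since only the best candidate contributes to the loss, the remaining samples are free to cover other plausible futures, which encourages diverse generation.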

Ouyang et al. (2018) designed a non-parametric trajectory generator that combines WGAN-GP (Gulrajani et al. 2017) to capture high-order geographic and semantic features. Non-parametric means that the generator does not assume any explicit parameters for the movement trajectories. They evaluated the synthetic trajectories by comparing the geographic and semantic features with real trajectories. In the model proposed by Amirian et al. (2019), L2 loss was excluded during the training of the generator to avoid mode collapse issues. This work not only integrated the Info-GAN structure into their network but also defined an attention aggregation mechanism to capture interactions between humans.

Song et al. (2019) analyzed data from macro and micro perspectives within the GAN framework. The former applied the k-means clustering method, while the latter focused on understanding the correlations between different points. They used a four-layer CNN to generate trajectories represented as matrices. However, constrained by specific locations, the model's capability is limited by high randomness. Additionally, this approach lacked a quantitative evaluation of the model’s realism. Subsequently, Liu et al. (2020) proposed CoL-GAN, a generative adversarial network whose generator uses an attention mechanism and whose discriminator is a convolutional neural network. The model includes a social attention module to capture pedestrians’ historical patterns.

GANs have also been utilized for generating vehicle movement trajectories. For example, a GAN-based framework for predicting vehicle trajectories was proposed by Roy et al. (2019) to model the interactions between vehicles with diverse types and driving styles. The crucial aspect involves integrating the social environment into the GAN model, which incorporates the LSTM encoder–decoder architecture and has demonstrated superior performance compared to certain purely RNN- or LSTM-based approaches. To account for the interactions among multiple vehicles, Wang et al. (2020c) proposed a collaborative learning approach based on GANs to generate multi-modal distributions of vehicle trajectories. This approach comprises two modules: the autoencoder social convolution module and the recursive social module, enabling the modeling of spatiotemporal information for distinct vehicles. Zhao et al. (2021) introduced a GAN model for trajectory generation and a vehicle turning model to adapt the prediction process in urban scenarios. During the dataset preparation, the complex spatial dependencies of road topology were addressed through vehicle coordinate transformation.

The above-mentioned GAN-related models, similar to the RNN-based methods for generating pedestrian and vehicle movement trajectories, are applied in Scenario 3.

5.2.3 Hybrid methods

The majority of the models presented in Sects. 5.2.1 and 5.2.2 generate trajectories depicting the movement of pedestrians or vehicles within a shared scene. Subsequent approaches integrate multiple neural network models within their frameworks and capture intricate scenarios. Unless otherwise indicated, these methods are also employed for Scenario 3. Zhao et al. (2019b) presented the Multi-Agent Tensor Fusion (MATF) network, which generates trajectories considering both vehicles and pedestrians. Specifically, this method utilizes an LSTM encoder–decoder architecture and employs Conditional Generative Adversarial Networks (CGAN; Mirza and Osindero 2014) to learn a stochastic generative model that captures uncertainties across multiple modes. The future trajectories are subsequently obtained through iterative decoding processes.

Vishnu et al. (2023) further expanded upon the previously mentioned approach and introduced three prediction models with distinct architectures: TS-Transformer, Generative Adversarial Network-based (TS-GAN), and Conditional Variational Autoencoder-based (TS-CVAE). These models are designed to generate trajectories for multiple agents in interactive driving scenarios. Sadeghian et al. (2019) proposed SoPhie, a GAN-based model, to predict future paths of multiple interacting agents in a scene that comply with social and physical constraints. This method, similar to SocialGAN, employs LSTM to estimate temporal states. However, it distinguishes itself by integrating two attention mechanisms (physical attention and social attention) to enable interpretable generation. Furthermore, CNN is utilized as a feature extractor to capture scene features.

In contrast to most scene generation models, which require extensive condition settings and parameters, Wu et al. (2020) introduced a fully data-driven model called LSTM-GAN, which relies solely on historical data. Moreover, the data generated by this method can simultaneously cover continuous time periods and locations.

5.2.4 Others

The Graph Neural Network (GNN) is a machine learning model that operates on graph structures. Graph attention networks (GAT), which combine GNN with attention mechanisms, have been employed for trajectory generation tasks. Kosaraju et al. (2019) utilized a GAN based on the GAT, known as Social-BiGAT, to generate multimodal pedestrian trajectories in interactive scenes. Huang et al. (2019) introduced a spatial-temporal graph attention mechanism, using LSTM to capture the temporal correlations of pedestrian movements and GAT to model spatial interactions.
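
A single-head graph attention layer can be sketched as follows, where each node stands for a pedestrian and edges connect pedestrians that interact. The sketch assumes the standard GAT scoring function (a LeakyReLU applied to a learned linear function of the two projected node features) without the final nonlinearity or multiple heads; all weights are random placeholders and the graph is a hypothetical toy example.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention over neighbors.

    h: (N, F) node features; adj: (N, N) 0/1 adjacency including self-loops;
    W: (F, F_out) projection; a: (2 * F_out,) attention vector.
    """
    z = h @ W                                              # projected node features
    f_out = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into a source and a target term
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj > 0, e, -np.inf)                      # attend to neighbors only
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)       # softmax over each neighborhood
    return alpha, alpha @ z

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))                                # e.g. LSTM states of 3 pedestrians
adj = np.array([[1, 1, 0],
                [1, 1, 0],
                [0, 0, 1]])                                # pedestrian 2 walks alone
alpha, out = gat_layer(h, adj, W=rng.normal(size=(4, 5)), a=rng.normal(size=10))
```

In a spatial-temporal model of this kind, such a layer would typically be applied at each time step to the per-pedestrian states, while a recurrent network carries the information across time steps.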

In contrast to the aforementioned research, Gao et al. (2020) introduced a hierarchical graph neural network known as VectorNet. Rather than employing CNN for encoding, they utilize vector representations to handle high-definition maps and agent movement trajectories. Additionally, they stack multiple GNN layers to capture higher-order interactions among all components. Lv et al. (2023) designed a model that combines Graph Convolutional Networks (GCN) with attention mechanisms to capture interactions among pedestrians and between pedestrians and the environment in complex scenes. However, since the functions are specifically designed based on inherent graph structures, they are not compatible with non-GCN methods.

Lv and Yuan (2023) integrate social knowledge (such as the distance, speed, and visual range between pedestrians) as a matrix and combine it with the GAT to generate pedestrian movement trajectories. However, this approach primarily emphasizes the interaction between pedestrians, neglecting the interaction between pedestrians and the environment.

Kang et al. (2021) proposed a method called TraG for urban crowd mobility, which automatically captures contextual and statistical mobility features, ranging from simple empirical data to synthetic trajectories, using real-world datasets. This study primarily focuses on Scenario 1 and Scenario 2 for evaluating network simulation and planning decisions.

To address the probabilistic generation task for multiple interacting entities, Li et al. (2019) employed a variational recurrent neural network (VRNN) to improve coordination classification accuracy and used a Coordination-Bayesian Conditional Generative Adversarial Network to generate future vehicle trajectories based on historical information and coordination outcomes of multiple vehicles.

Si et al. (2019) designed an Adaptive Generation (AGen) method for generating vehicle trajectories. This method combines online adaptation and offline learning models to account for individual variances and temporal behaviors. It also incorporates an RNN model.

To simulate human spatio-temporal mobility patterns, Luca Pappalardo (2018) designed a data-driven algorithm called DIary-based TRAjectory Simulator (DITRAS), which achieves realistic simulation of human mobility. The basic idea is to separate the temporal characteristics and spatial characteristics of human mobility. Specifically, it constructs a mobility diary from real data and transforms it into a mobility trajectory.
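The separation of temporal and spatial characteristics can be sketched as a two-stage procedure. The code below is a toy illustration of that idea only and is not the DITRAS algorithm itself, whose models for both stages are more elaborate. The per-hour home probabilities, the candidate locations, and the function names `sample_diary` and `diary_to_trajectory` are all hypothetical.

```python
import random


def sample_diary(p_home_by_hour, rng):
    """Stage 1 (temporal): sample an abstract diary, one label per hour.

    Label 0 denotes home; positive integers denote abstract non-home
    locations. A new label is created each time the agent leaves home.
    """
    diary, next_label, current = [], 1, 0
    for p_home in p_home_by_hour:
        if rng.random() < p_home:
            current = 0
        elif current == 0:
            current = next_label
            next_label += 1
        diary.append(current)
    return diary


def diary_to_trajectory(diary, home, candidates, rng):
    """Stage 2 (spatial): bind each abstract label to a physical location."""
    binding = {0: home}
    trajectory = []
    for hour, label in enumerate(diary):
        if label not in binding:
            binding[label] = rng.choice(candidates)
        trajectory.append((hour, binding[label]))
    return trajectory


rng = random.Random(0)
# Hypothetical daily rhythm: at home overnight, away during working hours.
p_home = [1.0] * 8 + [0.0] * 8 + [1.0] * 8
home = (39.90, 116.40)
candidates = [(39.95, 116.30), (39.99, 116.48)]
diary = sample_diary(p_home, rng)
trajectory = diary_to_trajectory(diary, home, candidates, rng)
```

In a data-driven setting, the home probabilities would be estimated from real diaries rather than fixed by hand, and the spatial stage would follow a mobility model instead of a uniform random choice.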

To address the issue of data scarcity in emerging cities, recent works have combined prior knowledge with data-driven methods based on Scenario 2. He et al. (2020) proposed a framework that integrates transfer learning and multiple-source data from the target city to generate mobility data for new cities. Inspired by this work, Rong et al. (2023) combined GNN and GAN to generate OD flow data in emerging cities using data from source cities.

For improved estimation of traffic conditions and patterns in urban development planning and management (Scenario 2), TrafficGAN, a deep generative model proposed by Zhang et al. (2020b), captures the underlying patterns of how traffic evolves with changing travel demands and the evolving structure of the underlying road network. Within their framework, they developed a generative adversarial network (GAN) architecture featuring a generator and a discriminator equipped with dynamic convolutional layers. Additionally, Zhang et al. (2020a) proposed a conditional GAN (cGAN) to address traffic planning problems by treating traffic demands as conditions for generating traffic estimates. They used dynamic convolutional layers to extract spatial correlations within localized networks and self-attention mechanisms to capture temporal relationships.

For the task of generating future trajectories of moving objects and forecasting traffic flow in urban areas (Scenario 2 and Scenario 3), Karimzadeh et al. (2021) employ both reinforcement learning and transfer learning techniques to design the architecture of LSTM models. Additionally, they leverage high-order convolution operations and adaptive distance adjacency matrices to effectively capture the spatiotemporal dependencies within urban environments.

In summary, data-driven approaches have the capability to uncover complex and latent factors or correlations from the data itself. In Sect. 3, we discussed two limitations of the data-driven paradigm. From this subsection, it can be concluded that the performance of data-driven approaches is contingent upon the quality of the training data. Furthermore, for the majority of existing data-driven approaches, the training process is not clearly understood. Taking GAN as an example, the adversarial learning process remains poorly understood, and training consequently still poses significant challenges in GAN-related research. Nevertheless, data-driven methods offer distinct advantages when compared to knowledge-driven methods. Our current understanding and theories may not fully grasp the inherent complexity of the mobility trajectory process. For instance, accurately modeling the subtle psychological state of individual drivers, which significantly influences trajectory generation, remains elusive.

Data-driven methods are more commonly utilized in Scenario 2 and Scenario 3. In Scenario 2, they are primarily employed to generate and evaluate data for emerging cities by leveraging historical traffic data, and to assess the impact of new buildings on future traffic. These tasks are frequently complemented by knowledge-driven approaches, and the corresponding works primarily concentrate on generating traffic speed and traffic flow data. Data-driven methods in Scenario 3 have the capability to generate both vehicle and pedestrian movement trajectories. The generated trajectory data usually exhibit shorter durations and cover smaller spatial areas, enabling a more detailed exploration of spatio-temporal dependencies.

In the trajectory generation process, knowledge-driven methods, in addition to the relevant information mentioned in Sect. 4, have taken into account the temporal patterns of residents’ travel in Scenario 1 (Uppoor et al. 2014; Bedogni et al. 2015). Conversely, data-driven methods incorporate a wider range of features for training, going beyond the sole reliance on location-based information. Within Scenario 2, data-driven methods primarily focus on the average speeds within specific regions and the traffic flow within each respective region (Zhang et al. 2020a). Moreover, certain studies also encompass demographic data, income-related data, epidemiological conditions, and policies (Bao et al. 2022), alongside the generated time periods (peak hours, holidays) and weather conditions (Wu et al. 2020). In Scenario 3, there is a notable emphasis on ensuring safe distances between vehicles (Dai et al. 2019) and considering environmental factors (Gao et al. 2020).

The complexity of knowledge-driven methods in handling large-scale and long-duration simulations depends on various factors, including the number of vehicles, road segment complexity, vehicle interactions, and traffic signal controls. The processing time increases as more factors are taken into consideration. In data-driven methods, the CNN module exhibits relatively low complexity, whereas RNN and GAN training involves higher complexity, demanding more computational resources (Huang et al. 2019). The complexity of hybrid methods is contingent upon network design, parameter size, and training process iterations. A summary is shown in Table 2.

Table 3 Evaluation methods for generating mobility trajectories

6 Evaluation metrics

A major problem with generated mobility trajectory data is that they are produced by model simulations and thus require validation. However, the way accuracy and effectiveness are validated differs between knowledge-driven and data-driven generation methods.

For knowledge-driven methods, most evaluations either rely on prior knowledge (e.g., real traffic conditions and navigation service data) or visualize the generated data to analyze whether it is consistent with common sense. For instance, Dian Khumara et al. (2018), Kong et al. (2018), Pigné et al. (2011), Bedogni et al. (2015), Zhao et al. (2019a) and Raney et al. (2003) demonstrate the quality of the generated dataset by comparing it with real traffic conditions. In addition, some works (Kong et al. 2018; Uppoor et al. 2014) visualize the generated data and analyze its plausibility. In some cases (Codeca et al. 2015), the generated data are used directly in practical scenarios, such as evaluating and testing network protocols. Kanaya et al. (2012) simulate a monitoring-based flow estimation system to validate the usefulness of the model.

For data-driven methods, the evaluation usually includes both qualitative and quantitative components. The qualitative evaluation mainly presents the generated results through visual comparative analysis. The metrics used to quantitatively evaluate mobility trajectory generation models are listed in Table 3. First, for pedestrian trajectories, the common error metrics used to quantitatively evaluate the accuracy of the generation model are the average displacement error (ADE) and the final displacement error (FDE).

  1. (1)

    ADE (Gupta et al. 2018; Roy et al. 2019; Sadeghian et al. 2019): this metric is the average Euclidean distance between each generated position and the corresponding ground-truth position over the generation horizon. A related metric, the average non-linear displacement error (NL-ADE), restricts this average to the non-linear regions of a trajectory, i.e., the segments around the turning points that arise as a pedestrian walks (Jiang et al. 2019). ADE is calculated as follows:

    $$A D E=\frac{\sum _{j=1}^{N} \frac{\sum _{t=1}^{n} \sqrt{\left( {\hat{x}}_{t}^{j}-x_{t}^{j}\right) ^{2}+\left( {\hat{y}}_{t}^{j}-y_{t}^{j}\right) ^{2}}}{n}}{N},$$
    (5)

    where N is the number of pedestrians, n is the number of generated time steps, \(({\hat{x}}_{t}^{j}, {\hat{y}}_{t}^{j})\) are the generated coordinates of pedestrian j at time t and \((x_{t}^{j}, y_{t}^{j})\) are the real position coordinates of pedestrian j at time t.

  2. (2)

    FDE (Gupta et al. 2018; Roy et al. 2019; Sadeghian et al. 2019): this metric is the average Euclidean distance between the final generated positions and the corresponding ground-truth positions. The calculation formula of this metric is as follows:

    $$F D E=\frac{\sum _{j=1}^{N} \sqrt{\left( {\hat{x}}_{n}^{j}-x_{n}^{j}\right) ^{2}+\left( {\hat{y}}_{n}^{j}-y_{n}^{j}\right) ^{2}}}{N},$$
    (6)

    where N is the number of pedestrians, \(({\hat{x}}_{n}^{j}, {\hat{y}}_{n}^{j})\) are the generated coordinates of pedestrian j at the final time step n and \((x_{n}^{j}, y_{n}^{j})\) are the real position coordinates at time step n.

  3. (3)

    Jensen–Shannon Divergence (JSD) (Ouyang et al. 2018; Feng et al. 2020): JSD is a symmetric measure of the distance between two probability distributions P and Q. The smaller the JSD between the distribution of the generated data and that of the real-world data, the better. The calculation formula of this metric is as follows:

    $$\begin{array}{l} J S \text{ divergence } (P \Vert Q)=\frac{1}{2} {\mathbb {E}}_{P}\left[ \log \frac{P}{X}\right] +\frac{1}{2} {\mathbb {E}}_{Q}\left[ \log \frac{Q}{X}\right] , \end{array}$$
    (7)

    where \(X=\frac{1}{2}(P+Q)\).

    In addition, for vehicle trajectories, the more common metrics are average accuracy (AA), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root-Mean-Squared Error (RMSE).

  4. (4)

    AA (Zhao et al. 2021): it represents the average generation accuracy of the generated vehicle trajectory. The calculation formula of this metric is as follows:

    $${\text{AA}}=\frac{1}{n} \sum _{i=1}^{n}\left( 1-\frac{|{\hat{y}}_{i}-y_{i}|}{K}\right) ,$$
    (8)

    where \({y}_{i}\) is the real traffic value; \({\hat{y}}_{i}\) is the predicted value of \(y_{i}\); n represents the number of vehicles; and K is a constant.

  5. (5)

    MAE (Zhao et al. 2021; Li et al. 2018; Park et al. 2018): this metric is the average of the absolute errors and therefore directly reflects the actual magnitude of the generation error. The calculation formula of this metric is as follows:

    $${\text{MAE}}=\frac{1}{n}\sum _{i=1}^{n}|{\hat{y}}_{i}-y_{i}|,$$
    (9)

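    where \({y}_{i}\) is the real value; \({\hat{y}}_{i}\) is the predicted value of \(y_{i}\); n represents the number of vehicles. MAPE can be viewed as a weighted version of MAE in which each absolute error is divided by the magnitude of the corresponding real value, i.e., \({\text{MAPE}}=\frac{1}{n}\sum _{i=1}^{n}\left| \frac{{\hat{y}}_{i}-y_{i}}{y_{i}}\right|\).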

  6. (6)

    RMSE (Deo and Trivedi 2018; Khakzar et al. 2020; Zhang et al. 2020a; Wang et al. 2020c): this metric is the square root of the mean of the squared generation errors over the n generated values. Because the errors are squared, RMSE is more sensitive to outliers than the other metrics. The calculation formula of this metric is as follows:

    $${\text{RMSE}}=\left[ \frac{1}{n} \sum _{i=1}^{n}\left( {\hat{y}}_{i}-y_{i}\right) ^{2}\right] ^{\frac{1}{2}},$$
    (10)

    where \({y}_{i}\) is the information of real data; \({\hat{y}}_{i}\) represents the predicted value of \(y_{i}\); n represents the number of vehicles.
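The metrics above can be sketched directly from Eqs. (5)–(10). The following standard-library implementation is illustrative only: the function names and the input layout (trajectories as lists of (x, y) tuples, distributions as lists of probabilities) are our assumptions, MAPE follows its standard definition, and JSD is computed for discrete distributions with the convention that zero-probability terms contribute nothing.

```python
import math


def ade(pred, true):
    """Eq. (5): pred/true are lists of N trajectories of n (x, y) points."""
    per_agent = []
    for p_traj, t_traj in zip(pred, true):
        dists = [math.hypot(p[0] - t[0], p[1] - t[1])
                 for p, t in zip(p_traj, t_traj)]
        per_agent.append(sum(dists) / len(dists))
    return sum(per_agent) / len(per_agent)


def fde(pred, true):
    """Eq. (6): displacement at the final time step, averaged over agents."""
    finals = [math.hypot(p[-1][0] - t[-1][0], p[-1][1] - t[-1][1])
              for p, t in zip(pred, true)]
    return sum(finals) / len(finals)


def jsd(p, q):
    """Eq. (7) for two discrete distributions given as probability lists."""
    x = [(a + b) / 2 for a, b in zip(p, q)]

    def expected_log_ratio(d):
        # E_d[log(d / X)]; terms with zero probability are skipped.
        return sum(dk * math.log(dk / xk) for dk, xk in zip(d, x) if dk > 0)

    return 0.5 * expected_log_ratio(p) + 0.5 * expected_log_ratio(q)


def aa(pred, true, k):
    """Eq. (8): average accuracy with normalizing constant k."""
    return sum(1 - abs(p - t) / k for p, t in zip(pred, true)) / len(true)


def mae(pred, true):
    """Eq. (9): mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mape(pred, true):
    """Standard MAPE: absolute errors relative to the (non-zero) real values."""
    return sum(abs((p - t) / t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Eq. (10): root-mean-squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))
```

For spatial distributions, `jsd` would typically be applied to normalized visit histograms of the generated and real data, computed over the same set of regions.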

    Table 4 Open mobility trajectory datasets

7 Open mobility trajectory datasets and source code

In this section, we summarize the open datasets and source code from existing research. We hope this section will help future researchers produce more valuable work in this domain.

7.1 Open datasets

We categorize the datasets into three types. The first type is road network data. As previously mentioned, road network data consist of points, lines, and planes. These data show the basic structure of a region and can be easily obtained from the Internet, for example from OpenStreetMap. The second type is trajectory data of pedestrians and vehicles. These data mainly include longitude and latitude information and can be matched with road network data. The third type is data generated by simulation tools. The key to generating these data is the calculation of regional traffic demand using domain knowledge.
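As a rough illustration of how latitude and longitude records can be matched with road network data, the sketch below snaps each GPS point to its nearest road-network node using the haversine distance. This is a deliberate simplification: practical map matching also uses road geometry and the order of the points. The node identifiers, coordinates, and function names are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def snap_to_nodes(points, nodes):
    """Assign each (lat, lon) point to the id of its nearest node.

    nodes: dict node_id -> (lat, lon)
    """
    return [min(nodes, key=lambda n: haversine_m(lat, lon, *nodes[n]))
            for lat, lon in points]


# Hypothetical road network with two nodes and a two-point trajectory.
nodes = {"A": (0.0, 0.0), "B": (0.0, 1.0)}
matched = snap_to_nodes([(0.0, 0.1), (0.0, 0.9)], nodes)
```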

The relevant open datasets are shown in Table 4.

(1) Geolife: this dataset includes timestamped points with latitude and longitude information collected from 182 users from April 2007 to August 2012. Al-Molegi et al. (2018) use it as a test dataset.

(2) ETH/UCY: these datasets contain thousands of trajectories from 1536 pedestrians and include abundant real-world pedestrian interactions. ETH contains two subsets with different scenes (ETH-univ and ETH-hotel). UCY contains three subsets with different scenes (UCY-univ, UCY-zara01, and UCY-zara02).

(3) NGSIM: this dataset contains four trajectory subsets, namely US-101, I-80, Lankershim Boulevard, and Peachtree Street. The first two, which record vehicle trajectories on highways, are the most commonly used. The I-80 dataset was collected on Interstate 80 in Emeryville, California on April 13, 2005. The US-101 dataset was collected on US Highway 101 in Los Angeles, California on June 15, 2005.

(4) PeMS: this dataset is provided by the California Department of Transportation and covers the period from 2001 to 2019. It contains various traffic-related data, such as congestion.

(5) METR-LA: this dataset contains highway traffic information collected by loop detectors on roads in Los Angeles County. Li et al. (2018) used the period from March 1st to June 30th, 2012.

(6) Cologne trace: this vehicle mobility dataset is provided by the Institute of Transportation Systems at the German Aerospace Center (ITS-DLR) as part of the TAPASCologne project. This dataset covers a region of 400 km\(^{2}\) over a period of 24 h.

Table 5 Open source code

7.2 Open source code

Open source code not only helps researchers compare their results with other methods but also helps later researchers deepen their understanding by running it. Therefore, we provide the available hyperlinks to open source code in this paper (as shown in Table 5).

All the open source code is built on the PyTorch framework. The core of the SocialLSTM model is an LSTM sequence network, which can be trained on a single GPU. The SocialGAN model consists of three components: a generator, a max pooling module, and a discriminator. Its code was developed on Ubuntu 16.04 with Python 3.5 and PyTorch 0.4. The SocialWay model builds on the SocialGAN model; for example, it replaces max pooling with attention pooling. The code of the CurbGAN and TrafficGAN models runs on Ubuntu 16.04 with Python 3.6.7 and PyTorch 0.4.1.

8 Challenges and future opportunities

Mobility trajectory generation is very challenging because of the complicated spatio-temporal relationships in mobility trajectory data. In addition, evaluating the generated results is also an important aspect of mobility trajectory generation. In this section, based on our comprehensive survey and comparison of knowledge-driven and data-driven approaches, we introduce four common challenges and their potential solutions.

8.1 Long-term mobility trajectory generation

As mentioned above, most existing work generates mobility trajectory data over a short-term range (\(\le 30\) min). Although the knowledge-driven methods reviewed in Sect. 5.1 can generate large-scale and long-term mobility trajectory data, the fine-grained quality of these generated data is still worse than that of data generated by data-driven methods. Moreover, prior or external knowledge is hard to obtain in some scenarios.

As stated in Sect. 5.2, many data-driven methods learn the temporal correlations in data with RNN-based approaches. However, the long training time and the vanishing/exploding gradient problems of these approaches limit their capability to generate long-term sequences. Therefore, long-term temporal dependency learning is one of the most important challenges for mobility trajectory generation.
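The gradient problem can be illustrated with a scalar recurrent unit. In the sketch below, which assumes the toy recurrence \(h_t=\tanh (w h_{t-1}+x_t)\) rather than any model from the surveyed works, the gradient of the final state with respect to the initial state is a product of one factor per time step, so it shrinks or grows exponentially with the sequence length.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2).
        grad *= w * (1.0 - h * h)
    return abs(grad)


# With |w| < 1 every factor is below 1, so the gradient vanishes with length.
short_grad = gradient_through_time(0.5, [0.1] * 5)
long_grad = gradient_through_time(0.5, [0.1] * 50)

# With |w| > 1 and an unsaturated unit (all-zero inputs keep h at 0),
# every factor equals w and the gradient explodes instead.
exploding_grad = gradient_through_time(1.5, [0.0] * 50)
```

Gated units such as LSTM mitigate this effect but do not remove it for very long sequences.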

In future research, it is crucial to focus on the development of models capable of capturing global temporal dependencies. Currently, Attention-based methods (Vaswani et al. 2017; Kitaev et al. 2020; Zhou et al. 2021) have proven effective in learning long and global temporal dependencies. Moreover, integrating finer-grained knowledge into data-driven methods can guide the model in learning long-term dependencies within mobility trajectory data (Karpatne et al. 2017).
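The reason attention captures global dependencies is that every output position is a weighted combination of all time steps, so distant steps interact in a single operation. A minimal sketch of scaled dot-product self-attention follows; to keep it short we assume the queries, keys, and values are the raw inputs, omitting the learned projections, multiple heads, and positional encodings used in practice.

```python
import math


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x: list of T feature vectors (one per time step), each of dimension d.
    Returns the attended sequence and the T x T attention weight matrix.
    """
    d = len(x[0])
    out, weights = [], []
    for q in x:
        # Similarity of this time step to every time step, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        weights.append(w)
        # Output: weighted average of all time steps.
        out.append([sum(wi * v[j] for wi, v in zip(w, x)) for j in range(d)])
    return out, weights


# Hypothetical sequence in which the first and last steps are identical.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended, attn_weights = self_attention(seq)
```

In this example the first step attends to the last step as strongly as to itself, regardless of how many steps lie between them.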

8.2 Spatio-temporal interactions

In mobility trajectory data generation, the basic requirement is that a model sufficiently learns the spatio-temporal dependencies or correlations. Knowledge-driven methods achieve impressive performance in macroscopic scenarios but are inferior to data-driven methods in microscopic data generation.

Although data-driven methods can learn more fine-grained spatio-temporal correlations, their sequential learning manner still limits their capability to learn spatio-temporal interactions. Guo et al. (2019) argued that different spatially correlated locations at different time slots have different impacts on the future state of a given region.

Most existing methods model the spatio-temporal correlations separately in generating mobility trajectory data. To overcome this challenge, future work should investigate representing mobility-related data in a more structured mode, such as using graph representations (Ye et al. 2020; Sheng et al. 2022) and knowledge graph triplets (Wang et al. 2020a). These representations can explicitly enhance the model’s learning of spatial and temporal interactions.

8.3 Model limitations

We reviewed the mobility trajectory data generation work according to the driving force behind the modeling. Knowledge-driven methods rely very little on data and perform well on macroscopic mobility trajectory data generation, e.g., area trajectory status generation (Kong et al. 2018). However, the accuracy of these methods cannot support fine-grained downstream application tasks. Data-driven methods depend largely on data and can produce accurate mobility trajectory generation results. However, missing data, privacy protection, or difficulty in data acquisition may limit the application of data-driven methods.

To tackle this challenge, future research should explore the integration of prior knowledge into data-driven approaches. Knowledge-assisted learning has garnered considerable interest in recent years due to its potential in reducing the complexity of learning and mitigating overfitting problems when dealing with limited data (Karpatne et al. 2017). An exemplary application of this approach is seen in COVID-GAN (Bao et al. 2020, 2022), which incorporates various factors such as population demographics, median income, epidemic conditions, and policy parameters into a generative adversarial network (GAN). This integration allows for the generation of accurate mobility data specifically tailored to the COVID-19 period.

8.4 Fixed representation

In mobility trajectory data generation, the most popular way to represent data is the image-based representation. The map is divided into regular grids and the mobility trajectory data are mapped onto the grid cells. Then, a CNN is utilized to extract the features of these data. However, the spatial structure of mobility trajectories has been shown to be more complex than Euclidean space (Ye et al. 2020). This fixed representation of mobility trajectory data is therefore an obstacle to generating more accurate results.
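The image-based representation can be sketched as a simple rasterization step. The bounding box, grid size, and function name below are assumptions for illustration; each cell counts the trajectory points that fall into it, which yields the kind of single-channel image a CNN would take as input.

```python
def rasterize(points, bbox, rows, cols):
    """Count trajectory points per cell of a regular grid.

    points: iterable of (lon, lat) pairs
    bbox:   (min_lon, min_lat, max_lon, max_lat) of the study area
    Returns a rows x cols matrix of counts; points outside bbox are ignored.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    grid = [[0] * cols for _ in range(rows)]
    for lon, lat in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue
        # Points on the upper boundary are clamped into the last cell.
        c = min(int((lon - min_lon) / (max_lon - min_lon) * cols), cols - 1)
        r = min(int((lat - min_lat) / (max_lat - min_lat) * rows), rows - 1)
        grid[r][c] += 1
    return grid


# Hypothetical 2 x 2 grid over a 10 x 10 area.
grid = rasterize([(1, 1), (9, 9), (10, 10), (11, 1)], (0, 0, 10, 10), 2, 2)
```

Two cells that are adjacent in such a grid are treated as neighbours even when no road connects them, which is the limitation of the fixed Euclidean representation discussed above.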

In future works, the exploration of graph-structured data learning continues to hold significant promise (Wu et al. 2021; Lv et al. 2023). Given the prevalence of graph structures in traffic data, the integration of GNNs into deep learning frameworks, such as RNNs and GANs, offers a means to capture non-Euclidean spatial dependencies and obtain more precise generation outcomes.
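As a minimal sketch of how a graph representation captures non-Euclidean structure, the code below performs one round of neighbourhood averaging over a road graph, which is the basic aggregation step underlying many GNN layers (here without learned weights). The segment names, speed values, and function name are hypothetical.

```python
def mean_aggregate(values, edges):
    """One round of neighbourhood averaging on an undirected graph.

    values: dict node -> float (e.g. average speed on a road segment)
    edges:  list of (u, v) pairs for segments that are connected by road
    """
    nbrs = {n: {n} for n in values}  # each node also keeps its own value
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {n: sum(values[m] for m in nbrs[n]) / len(nbrs[n]) for n in values}


# Segment "c" may be spatially close to "a" and "b", but no road connects
# it to them, so it does not influence their aggregated values.
speeds = {"a": 10.0, "b": 20.0, "c": 60.0}
smoothed = mean_aggregate(speeds, [("a", "b")])
```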

9 Conclusion

In this survey, we present a novel taxonomy for the literature on mobility trajectory generation. We categorize the methods into two different paradigms: knowledge-driven and data-driven. We then provide clear definitions of trajectory generation and analyze three common application scenarios of mobility trajectory generation. We give a detailed introduction to the fundamentals, including the theories, tools, and techniques commonly used in mobility trajectory generation, and elaborate on the knowledge-driven and data-driven methods according to these scenarios and fundamentals. We also introduce evaluation metrics, public datasets, and open-source code. Finally, we present four open challenges and the corresponding future research directions based on the work we surveyed. We anticipate that this survey will help readers comprehend the fundamental concepts, application scenarios, relevant theories, and techniques in the area of mobility trajectory generation, thereby providing valuable insights.