1 Introduction

The increasing complexity of urban environments [2, 5] and the crucial role played by human displacements in the diffusion of epidemics, not least the COVID-19 pandemic [24, 28, 36, 43, 47, 51], have created a great deal of interest around the study of individual and collective human mobility [4, 34, 60]. The prevention of detrimental collective phenomena such as traffic congestion, air pollution, segregation, and the spread of epidemics, which is crucial to making our cities inclusive, safe, resilient, and sustainable [7, 25, 29, 57], depends on how accurately we can predict and simulate people’s movements within an urban environment.

In this regard, a particularly challenging task is generating realistic mobility flows, i.e., flows of people among a set of geographic locations given their demographic and geographic characteristics (e.g., population and distance) [4, 34, 37, 54, 60]. Traditionally, flow generation is addressed through the Gravity model [4, 8, 15, 30, 35, 65], the Radiation model [4, 55, 60], and their variants [4, 48, 54, 63]. The Gravity model assumes that the number of travelers between two locations (flow) increases with the locations’ populations while decreasing with the distance between them. The Radiation model is a parameter-free model that only requires information about geographic locations (e.g., population) and their intervening opportunities. The Gravity and the Radiation models are designed to generate single flows between pairs of locations and are typically used to complete a network in which some mobility flows are missing.

In this paper, we address mobility network generation, a variation of flow generation that consists in generating a city’s entire mobility network. A mobility network is a weighted directed graph in which nodes are geographic locations and weighted edges represent people’s movements between those locations, thus describing the entire set of mobility flows within a city.

Our solution to mobility network generation – MoGAN (Mobility Generative Adversarial Network) – is based on Generative Adversarial Networks (GANs) [20], deep learning architectures composed of a discriminator, which maximizes the probability of correctly classifying real and artificial mobility networks, and a generator, which maximizes the probability of fooling the discriminator by producing artificial mobility networks that the discriminator classifies as real. The choice of GANs is motivated by the fact that mobility networks can be represented as weighted adjacency matrices, similarly to how images are typically represented, and by the fact that GANs are tremendously effective in generating realistic images [13, 18, 20, 49]. While several papers show that GANs can generate individual mobility trajectories [16, 22, 26, 33, 34, 39, 44, 64] with a realism comparable to or better than that of mechanistic mobility models [4, 12, 23, 45], to what extent GANs can generate realistic mobility flows has never been explored in the literature.

We train MoGAN on a set of real mobility networks and develop a tailored evaluation methodology to test the model’s effectiveness in generating realistic mobility networks. We conduct extensive experiments on four public mobility datasets, describing flows of bikes and taxis in New York City and Chicago, US, to demonstrate that MoGAN generates synthetic mobility networks that are way more realistic than those generated by several baseline models, i.e., the Gravity, the Radiation, and the Random Weighted models. Our results show that our solution can synthesize the aggregated movements within a city into a realistic generator, which can be used for data augmentation and for performing simulations and what-if analyses.

2 Mobility network generation

Mobility network generation consists of generating a realistic mobility network, i.e., a weighted directed graph in which nodes are locations and edges represent flows between those locations. The locations are obtained by discretizing geographic space with a spatial tessellation, i.e., a covering of the bi-dimensional space using a countable number of geometric shapes called tiles, with no overlaps and no gaps [34]. In a mobility network, nodes are tiles of the spatial tessellation and edges represent flows of people among these tiles.

Formally, we define a mobility network as a weighted directed graph \(\mathcal{G} = (V, E, w)\), where:

  • V is the set of nodes, i.e., tiles of the spatial tessellation;

  • \(w: V \times V \mapsto \mathbb{N} \) is a function that assigns to each pair of nodes the number of people moving between the two nodes (mobility flow);

  • \(E = \{(x,y) | (x,y) \in V \times V \land w(x,y) \neq 0 \} \) is the set of the weighted directed edges in the network.

A mobility network may contain self-loops (edges in which the origin and destination coincide), which describe movements of people within the same tile. Here, we represent a mobility network as a weighted adjacency matrix \(\mathcal{A}_{n \times n}\) with \(n = |V|\). Thus, an element \(a_{i,j} \in \mathcal{A}\) represents the number of people moving from node i to node j, with \(i, j \in V\).
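As an illustration (not the authors’ code), the following sketch builds the weighted adjacency matrix of a mobility network by counting trips between tiles; the names `trips` and `n_tiles` are hypothetical.

```python
# Minimal sketch: building the weighted adjacency matrix A of a mobility
# network from a list of trips between tiles. Tile indices are assumed to
# lie in {0, ..., n_tiles - 1}.
import numpy as np

def build_adjacency_matrix(trips, n_tiles):
    A = np.zeros((n_tiles, n_tiles), dtype=int)
    for origin, destination in trips:
        A[origin, destination] += 1  # self-loops (origin == destination) are allowed
    return A

# Example: three trips among 64 tiles, including one self-loop.
A = build_adjacency_matrix([(0, 3), (0, 3), (5, 5)], n_tiles=64)
assert A[0, 3] == 2 and A[5, 5] == 1
```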

A generative model of mobility networks M is any algorithm able to generate a set of N synthetic mobility networks \(\mathcal{X}_{M} = \{ \hat{\mathcal{G}}_{1}, \dots , \hat{\mathcal{G}}_{N} \}\), which describe the set of mobility flows on a given spatial tessellation. The realism of M is evaluated with respect to:

  1. A set of network patterns \(\mathcal{K} = \{ s_{1}, \dots , s_{m}\}\) that describe some statistical properties of mobility networks. A realistic set \(\mathcal{X}_{M}\) of synthetic mobility networks is expected to reproduce as many of these mobility patterns as possible.

  2. A set \(\mathcal{X} = \{ \mathcal{G}_{1}, \dots , \mathcal{G}_{N} \}\) of real mobility networks that describe real flows on the same spatial tessellation. Typically, a portion \(\mathcal{X}_{\text{train}} \subset \mathcal{X}\) is used to train M or to fit its parameters. The remaining part \(\mathcal{X}_{\text{test}} \subset \mathcal{X}\) is used to compute the set \(\mathcal{K}\) of patterns, which are compared with the patterns computed on \(\mathcal{X}_{M}\).

  3. A function D that computes the dissimilarity between two distributions. Specifically, for each measure \(f \in \mathcal{K}\), \(D(P_{(f, \mathcal{X}_{M})}||P_{(f, \mathcal{X}_{ \text{test}})})\) indicates the dissimilarity between \(P_{(f, \mathcal{X}_{M})}\), the distribution of the measures computed on the synthetic mobility networks in \(\mathcal{X}_{M}\), and \(P_{(f, \mathcal{X}_{\text{test}})}\), the distribution of the measures computed on the mobility networks in \(\mathcal{X}_{\text{test}}\). The lower \(D(P_{(f, \mathcal{X}_{M})}||P_{(f, \mathcal{X}_{ \text{test}})})\), the more realistic model M is with respect to f and \(\mathcal{X}_{\text{test}}\).

3 MoGAN: a mobility generative adversarial network

To solve the problem of mobility network generation, we design MoGAN (Mobility Generative Adversarial Network), a deep learning architecture based on Deep Convolutional Generative Adversarial Networks (DCGANs) [49]. MoGAN consists of a generator G, which learns how to produce new synthetic mobility networks, and a discriminator D, which has the task of distinguishing between real and fake (artificial) mobility networks. G and D are trained in an adversarial manner: D maximizes the probability of correctly classifying real and fake mobility networks, while G maximizes the probability of fooling D, i.e., of producing fake mobility networks that D classifies as real. Both D and G are Convolutional Neural Networks (CNNs), which have proven effective at capturing spatial patterns in the data [34].

During the training phase, G repeatedly takes a \(1 \times 100\) noise vector as input and applies a series of transposed convolutions that upsample the input vector into a \(64 \times 64\) adjacency matrix representing a mobility network. Then, D takes a set of real and generated \(64 \times 64\) matrices as input and performs a binary classification task, classifying each matrix as real or fake. This process is repeated for a certain number of epochs and stopped when some criteria are met (see Supplementary Note 1).
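The following is a minimal PyTorch sketch of a DCGAN-style generator and discriminator for \(64\times 64\) single-channel matrices. The layer configuration follows the standard DCGAN recipe [49] and is only an assumption; it may differ from MoGAN’s exact configuration (see Supplementary Note 1).

```python
# DCGAN-style generator/discriminator for 64x64 single-channel adjacency matrices.
import torch
import torch.nn as nn

NZ = 100  # length of the input noise vector

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # noise (NZ x 1 x 1) -> 4x4
            nn.ConvTranspose2d(NZ, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),  # 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(True),    # 32x32
            nn.ConvTranspose2d(64, 1, 4, 2, 1, bias=False), nn.Tanh(),                              # 64x64
        )

    def forward(self, z):              # z: (batch, NZ, 1, 1)
        return self.net(z)             # (batch, 1, 64, 64), values in [-1, 1] (flows must be rescaled)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, inplace=True),                 # 32x32
            nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),  # 16x16
            nn.Conv2d(128, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True), # 8x8
            nn.Conv2d(256, 512, 4, 2, 1, bias=False), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, True), # 4x4
            nn.Conv2d(512, 1, 4, 1, 0, bias=False), nn.Sigmoid(),                                   # 1x1
        )

    def forward(self, x):              # x: (batch, 1, 64, 64)
        return self.net(x).view(-1)    # probability of being real, one value per sample

# Example: generate a batch of 8 synthetic 64x64 mobility networks and score them.
G, D = Generator(), Discriminator()
fake = G(torch.randn(8, NZ, 1, 1))
scores = D(fake)                       # shape (8,)
```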

MoGAN leverages the architecture of DCGAN [49] and, as highlighted above, this implies that the adjacency matrices must have shape \(64\times 64\). MoGAN could easily be extended to geographic areas with fewer than 64 zones, for example by testing MoGAN’s ability to work with zero-padded mobility networks. On the other hand, working with more than 64 zones would necessarily require either some form of aggregation of the zones or a substantially different GAN architecture.

Once MoGAN is trained, G can be used to generate as many mobility networks as desired. A visual representation of the networks generated during the training phase can be found in Supplementary Note 2. Figure 1 schematizes and describes MoGAN’s architecture. Further details on MoGAN’s architecture and training can be found in Supplementary Note 1.

Figure 1

Architecture of MoGAN. The generator (a Convolutional Neural Network, or CNN) performs transposed convolution operations that upsample the input random noise vector, transforming it into a \(64\times 64\) adjacency matrix representing a mobility network. The discriminator (a CNN) takes as input both the generated mobility networks and the real ones from the training set and performs a series of convolution operations that end with a probability, for each sample, of being fake or real. The discriminator’s and generator’s weights are then updated through backpropagation

4 Baseline models

We compare MoGAN with the Gravity and the Radiation models, two classical approaches for the generation of mobility flows [4, 34, 54, 55], using the implementations provided in the library scikit-mobility [46].

The singly-constrained Gravity model [4, 8, 30, 65] prescribes that the expected flow \(\bar{y}(l_{i},l_{j})\) between an origin location \(l_{i}\) and a destination location \(l_{j}\) is generated according to the following equation:

$$ \bar{y}(l_{i},l_{j}) = O_{i} p_{ij} = O_{i} \frac{m_{j}^{\beta _{1}}f(r_{ij})}{\sum_{k} m_{k}^{\beta _{1}} f(r_{ik})}, $$
(1)

where \(O_{i}\) is the number of people leaving location \(l_{i}\), \(m_{j}\) is the population of location \(l_{j}\) (estimated as \(O_{j}\)), \(p_{ij}\) is the probability of observing a trip (unit flow) from location \(l_{i}\) to location \(l_{j}\), \(\beta _{1}\) is a parameter, and \(f(r_{ij})\) is the deterrence function, a function of the distance \(r_{ij}\) between the two locations. We model the deterrence function as a power law, \(f(r) = r^{\alpha}\), where α is another parameter. Both parameters can be fitted from a subset of available flows; we report the values of α and \(\beta _{1}\) resulting from the fitting of the model in Supplementary Note 3.
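For illustration, the following sketch implements Equation (1) directly in NumPy (the experiments use the scikit-mobility implementation [46]); `O`, `m`, and `r` are hypothetical arrays containing the total outflows, the location relevances, and the pairwise distances.

```python
# Illustrative implementation of the singly-constrained Gravity model, Eq. (1),
# with a power-law deterrence function f(r) = r**alpha.
import numpy as np

def gravity_flows(O, m, r, alpha, beta1):
    """Return the matrix of expected flows ybar[i, j] between all location pairs."""
    n = len(m)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal (r_ii = 0) is excluded to avoid a degenerate deterrence term
                scores[i, j] = (m[j] ** beta1) * (r[i, j] ** alpha)
    p = scores / scores.sum(axis=1, keepdims=True)  # p_ij: destination probabilities per origin
    return O[:, None] * p                           # ybar_ij = O_i * p_ij

# Example with 3 locations (alpha < 0 penalizes long distances).
O = np.array([100.0, 50.0, 80.0])
m = np.array([10.0, 20.0, 30.0])
r = np.array([[0.0, 2.0, 5.0], [2.0, 0.0, 3.0], [5.0, 3.0, 0.0]])
flows = gravity_flows(O, m, r, alpha=-2.0, beta1=1.0)
```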

The Radiation model [4, 55] is a parameter-free model that generates flows between locations given their characteristics (e.g., population) and the intervening opportunities between them. The choice of the destination consists of two steps: (i) each opportunity is assigned a fitness z, sampled from a distribution \(p(z)\) that represents the quality of that opportunity for the traveler; (ii) the traveler ranks the opportunities according to their distance from the origin location and chooses the nearest one with a fitness higher than a certain threshold. As a result, the mean flow between two locations \(l_{i}\) and \(l_{j}\) is calculated as:

$$ \bar{y}(l_{i}, l_{j}) =O_{i} \frac{1}{1-\frac{m_{i}}{M}} \frac{m_{i} m_{j}}{(m_{i} + s_{ij}) (m_{i} + m_{j} +s_{ij})}, $$
(2)

where \(O_{i}\) is the number of people leaving location \(l_{i}\), \(m_{i}\) and \(m_{j}\) are the numbers of opportunities in \(l_{i}\) and \(l_{j}\), M is the total number of opportunities, and \(s_{ij}\) is the number of opportunities within a circle of radius \(r_{ij}\) centered in \(l_{i}\) (excluding the origin and the destination).
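Analogously, the following NumPy sketch implements Equation (2) for illustration only; `O`, `m`, and `r` are the same hypothetical arrays as above, and \(s_{ij}\) is computed here by summing the opportunities closer to \(l_{i}\) than \(l_{j}\), excluding origin and destination.

```python
# Illustrative implementation of the Radiation model, Eq. (2).
import numpy as np

def radiation_flows(O, m, r):
    n = len(m)
    M = m.sum()
    ybar = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # opportunities within a circle of radius r_ij centered at l_i,
            # excluding the origin (and the destination, which lies on the circle)
            inside = r[i] < r[i, j]
            s_ij = max(m[inside].sum() - m[i] - (m[j] if inside[j] else 0.0), 0.0)
            norm = 1.0 / (1.0 - m[i] / M)  # finite-size correction
            ybar[i, j] = O[i] * norm * m[i] * m[j] / ((m[i] + s_ij) * (m[i] + m[j] + s_ij))
    return ybar
```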

Note that the Gravity and the Radiation models do not solve mobility network generation directly. While MoGAN, once trained, can generate an entire mobility network, the Gravity and the Radiation models are designed to generate single flows between pairs of locations. To generate a mobility network using the Gravity and the Radiation models, we proceed as follows: (i) we take a real mobility network; (ii) for each node, we compute its relevance \(m_{i}\) and total outflow \(O_{i}\); and (iii) we use \(m_{i}\) and \(O_{i}\) in Equations (1) and (2). For the Gravity model, we also fit the parameters \(\beta _{1}\) and α from the real mobility network, assuming a power-law deterrence function. For both the Gravity and the Radiation models, we use the implementations available in the library scikit-mobility [46], which provides methods to fit the parameters and generate flows from the locations’ relevance and outflow.

As a further comparison, we consider a Random Weighted (RW) model that creates a mobility network in which the weight of each edge is drawn at random from the distribution of that edge’s weights in the training set. In other words, given an edge \(e = (i,j)\) connecting node i to node j in the mobility network, the edge weight \(\hat{w}(e)\) is a value picked at random from \(\{ w_{1}(e), w_{2}(e), \dots , w_{n}(e) \}\), i.e., the weights of e across the training networks.
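A minimal sketch of this baseline, assuming the training networks are stacked into a hypothetical array `train_networks` of shape (n_train, 64, 64):

```python
# Random Weighted baseline: each edge weight of the generated network is drawn
# at random from the weights that edge takes across the training networks.
import numpy as np

def random_weighted_network(train_networks, rng=np.random.default_rng()):
    n_train, n, _ = train_networks.shape
    idx = rng.integers(0, n_train, size=(n, n))   # one training-network index per edge
    rows, cols = np.indices((n, n))
    return train_networks[idx, rows, cols]        # generated[i, j] = train_networks[idx[i, j], i, j]
```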

In terms of the computational time required to generate a new mobility network, MoGAN is way faster (<1 second) than the Gravity model (about one minute) and the Random Weighted model (10-20 seconds). However, MoGAN needs a training phase that requires from 1 to 3 hours, depending on the dataset. In our experiments, we train MoGAN on a server with a Tesla P100 GPU with 16 GB of VRAM, 13 GB of RAM, and a 2-core Intel Xeon CPU.

5 Experimental setup

5.1 Datasets

We use four real-world public datasets, which describe trips with taxis and bikes in New York City and Chicago during 2018 and 2019 (730 days). Two datasets contain daily information regarding the use of bike-sharing services: the City Bike Dataset for New York City [11] and the Divvy Bike Dataset for Chicago [14]. Each record describes the coordinates of a ride’s starting and ending stations, and the starting and ending times. We remove trips with a duration lower than 60 seconds because they could be false starts or users trying to re-dock a bike to ensure it is secure [11, 14]. We also use two datasets containing daily information about the movements of taxis: the New York City taxi dataset [41] and the Chicago taxi dataset [9]. Each record describes a ride’s starting and ending locations and the starting and ending times. Both datasets are already preprocessed to remove dummy and noisy rides. In the Chicago taxi dataset, we know the GPS points corresponding to the starting and ending points of each taxi trajectory. In the New York City taxi dataset, we only know the trajectories’ starting and ending zones, i.e., administrative areas of New York City; we use an administrative area’s centroid as the reference starting or ending point of a taxi ride. We select the island of Manhattan for New York City and the central districts for Chicago (see Supplementary Figure S3) and split the selected areas into 64 equally-sized squared tiles (1840 meters per side for New York City, 1405 meters per side for Chicago). For each dataset, we count the daily number of taxis or bikes moving between each pair of tiles, obtaining an origin-destination matrix that represents the daily mobility network, i.e., the daily flows of people among the city’s 64 tiles. We recall that, since MoGAN is based on DCGANs, which are designed to work with images (matrices) of size \(64 \times 64\), we are constrained to tessellate each city into 64 equally-sized tiles; as a consequence, different cities have different tile sizes, depending on the city’s extent.
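For illustration, the following pandas/NumPy sketch reproduces the counting step, assuming each trip has already been mapped onto a tile of the tessellation; the DataFrame `trips` and its column names are hypothetical.

```python
# Counting, for each day, the number of rides between every pair of tiles.
# `trips` has columns 'date', 'origin_tile', 'dest_tile' (tile ids in 0..63).
import numpy as np
import pandas as pd

def daily_mobility_networks(trips, n_tiles=64):
    networks = {}
    counts = trips.groupby(['date', 'origin_tile', 'dest_tile']).size()
    for date, day_counts in counts.groupby(level='date'):
        A = np.zeros((n_tiles, n_tiles), dtype=int)
        for (_, o, d), flow in day_counts.items():
            A[o, d] = flow
        networks[date] = A
    return networks  # one 64x64 adjacency matrix per day
```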

We compute the relevance of each location (tile), which is needed for generating flows in the Gravity and the Radiation models, as the total number of daily drop-offs in that location. Table 1 shows some statistics about the datasets used in our study. As an example, Fig. 2 visualizes where bike stations concentrate and a mobility network representing daily flows in Manhattan, New York City.

Figure 2

Example of a real mobility network. (a) Position of bike stations in Manhattan. (b) A daily mobility network in Manhattan, where the width of each edge is proportional to the flow it represents

Table 1 Statistics of the four datasets used in our study. For each dataset, we provide the link to download it, the number of rides, the number of distinct locations, and the number of bikes/taxis. CHI = Chicago, NYC = New York City. For the NYC taxi dataset, taxi identifiers are not available, so we do not know the total number of taxis. All datasets refer to trips in 2018 and 2019

5.2 Validation

We develop a tailored approach to evaluate the realism of the mobility networks generated by MoGAN. For each dataset, we construct a mobility network for each day, obtaining 730 real mobility networks in total. We split the 730 networks into a training set (584 networks) and a test set (146 networks). We train MoGAN on the training set and generate 146 synthetic mobility networks (synthetic set). We evaluate the model’s realism by computing the difference between each network in the synthetic set and each network in the test set, obtaining \(146\times 146=21{,}316\) values. If the generated mobility networks are realistic, they should differ from the real networks to the same extent to which real networks differ among themselves. To stress this aspect, we create a set of 146 mobility networks (mixed set), in which half of the networks are chosen uniformly at random from the test set and the other half uniformly at random from the synthetic set. We then compute the pairwise difference between every pair of mobility networks in the mixed set.

The idea behind this validation methodology is that we need to verify whether MoGAN is capable of reproducing the variability of the mobility networks in the training set. If the distribution of differences among the networks in the synthetic set is similar to that among the networks in the test set, MoGAN approximates well the variability of the mobility networks in the training set. The mixed set further tests this ability: by verifying that the distribution of pairwise differences within the mixed set (which contains both real and synthetic networks) is similar to the distributions of pairwise differences within the test set and within the synthetic set separately, we can argue that MoGAN reproduces the variability of the networks in the training set.
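The following sketch outlines this evaluation protocol; `dist` stands for any of the network comparison metrics defined below, while `test_set` and `synthetic_set` are hypothetical lists of adjacency matrices.

```python
# Sketch of the evaluation protocol: pairwise differences within the test set,
# within the synthetic set, between the two sets, and within a mixed set.
import numpy as np

def pairwise_values(networks_a, networks_b, dist, same=False):
    values = []
    for i, A in enumerate(networks_a):
        for j, B in enumerate(networks_b):
            if same and j <= i:   # within one set, skip self-comparisons and duplicates
                continue
            values.append(dist(A, B))
    return np.array(values)

def evaluate(test_set, synthetic_set, dist, rng=np.random.default_rng()):
    cross = pairwise_values(synthetic_set, test_set, dist)                 # 146 x 146 values
    within_test = pairwise_values(test_set, test_set, dist, same=True)
    within_synth = pairwise_values(synthetic_set, synthetic_set, dist, same=True)
    half = len(test_set) // 2
    mixed_set = [test_set[i] for i in rng.choice(len(test_set), half, replace=False)] + \
                [synthetic_set[i] for i in rng.choice(len(synthetic_set), half, replace=False)]
    within_mixed = pairwise_values(mixed_set, mixed_set, dist, same=True)
    return cross, within_test, within_synth, within_mixed
```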

A crucial aspect is how to compute the difference between two mobility networks, considering that directed weighted networks are hard to compare, even in the case of known-node correspondence (i.e., networks with the same nodes but different edges) [56]. We compute this difference in two ways.

The first one consists of computing an error metric between two networks’ adjacency matrices. In our experiments, we try three error metrics: (i) Normalized Root Mean Square Error (NRMSE), (ii) Common Part of Commuters (CPC), and (iii) Cut Distance (CD). The Root Mean Square Error (RMSE) [34, 54] is defined as:

$$ \mathrm{RMSE}(A,B) = \sqrt{\frac{1}{n^{2}} \sum _{i,j=1}^{n} (a_{ij} - b_{ij})^{2} }, $$

where \(a_{ij}\) and \(b_{ij}\) are the elements (flows) in position \((i,j)\) of the adjacency matrices of A and B, and n is the dimension of the matrices (64), so that \(n^{2} = 64\times 64\) is the number of elements. Note that the RMSE is essentially equivalent to the Frobenius norm (see Supplementary Note 5). The NRMSE is a min-max normalization of the RMSE, defined as:

$$ \mathrm{NRMSE} = \frac{\operatorname{RMSE}(A,B)}{\max(A,B) - \min(A,B)}. $$
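For clarity, a minimal sketch of the two quantities, interpreting \(\max(A,B)\) and \(\min(A,B)\) as the maximum and minimum entries over both matrices:

```python
# RMSE / NRMSE between two 64x64 adjacency matrices.
import numpy as np

def rmse(A, B):
    return np.sqrt(np.mean((A - B) ** 2))            # average over all matrix elements

def nrmse(A, B):
    values = np.concatenate([A.ravel(), B.ravel()])
    return rmse(A, B) / (values.max() - values.min())  # min-max normalization
```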

The Common Part of Commuters (CPC), also known as the Sørensen-Dice index [4, 31, 34, 54], is a well-established measure of the similarity between real and generated flow matrices and is defined as:

$$ \mathrm{CPC}(A,B) = \frac{2\sum_{i,j=1}^{n} \min(a_{ij}, b_{ij})}{\sum_{i,j=1}^{n} a_{ij} + \sum_{i,j=1}^{n} b_{ij} }. $$

CPC is a widely used metric in human mobility studies [30, 34] and ranges between 0 and 1. A CPC of 1 indicates a perfect match between the generated flows and the ground truth, while a CPC of 0 indicates no overlap between them. In other terms, CPC can be interpreted as a measure of accuracy.
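A short sketch of the CPC between two flow matrices:

```python
# Common Part of Commuters between two flow matrices A and B.
import numpy as np

def cpc(A, B):
    return 2.0 * np.minimum(A, B).sum() / (A.sum() + B.sum())

# cpc(A, A) == 1 (perfect match); two matrices with disjoint flows give cpc == 0.
```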

The Cut Distance (CD) [32] is based on the notion of cut weight, widely used in network theory [56], and quantifies the maximum discrepancy between the flows of two networks across all possible cuts. The cut norm \(\|A\|_{C}\) of a real matrix \(A = (a_{ij} )\), \(i\in R\), \(j \in S\), with rows indexed by R and columns indexed by S, is the maximum over all \(I \subset R\), \(J \subset S\) of the quantity \(|\sum_{i\in I, j\in J} a_{ij} |\). The Cut Distance between two adjacency matrices A and B is the cut norm of their difference:

$$ \mathrm{CD}(A,B) = \max_{S \subseteq V} \frac{1}{ \vert V \vert } \bigl\vert e_{A}\bigl(S, S^{C}\bigr) - e_{B} \bigl(S, S^{C}\bigr) \bigr\vert $$

where V is the set of nodes (\(|V| = 64\) in our case), \(e_{G}(S,T) = \sum_{i \in S, j\in T} w_{ij} \) is the cut weight of the adjacency matrix G with weights \(w_{ij}\), i.e., the sum of the weights of the edges that start in S and end in T, and \(S^{C} = V \setminus S\) [1]. Maximizing this quantity is computationally heavy, so we use the Semidefinite Program (SDP) approximation proposed by Chan and Sun [42]. To calculate the CD, we use the Python implementation available in the library cutnorm [10].
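Purely as an illustration of the definition (not of the SDP approximation actually used), the following sketch estimates the cut distance by sampling random subsets S; it therefore only provides a lower bound on the true value.

```python
# Naive randomized estimate of the Cut Distance between two adjacency matrices.
import numpy as np

def cut_distance_sampled(A, B, n_samples=10000, rng=np.random.default_rng()):
    n = A.shape[0]
    D = A - B
    best = 0.0
    for _ in range(n_samples):
        mask = rng.integers(0, 2, size=n).astype(bool)     # random subset S of V
        # e_D(S, S^C): sum of D's entries with origin in S and destination in S^C
        cut = np.abs(D[np.ix_(mask, ~mask)].sum())
        best = max(best, cut)
    return best / n
```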

The second approach to computing the difference between two mobility networks consists of comparing their distributions of edge weights and of weight-distances. Edge weights are the values (flows) of the adjacency matrices describing the two mobility networks. Weight-distances combine an edge’s weight (flow) with the distance between the two nodes composing the edge. We compute the weight-distance adjacency matrix of a mobility network as \(\hat{A} = A / (d+\epsilon )\), where A is the network’s weighted adjacency matrix and d is the distance matrix, which has the same dimension and node ordering as A and represents the geographic distances between all pairs of nodes. We add the small term \(\epsilon = 0.01\) to the denominator to avoid divisions by zero, which would otherwise occur only for the elements on the diagonal of the adjacency matrices. Given two mobility networks, the more similar their distributions of edge weights or weight-distances are, the more similar the two mobility networks are. We quantify the difference between two distributions using the Jensen-Shannon (JS) divergence [17, 45]:

$$ \mathrm{JS}\bigl(P \vert \vert Q\bigr) = \frac{1}{2} \mathrm{KL} \bigl(P \vert \vert M\bigr) + \frac{1}{2} \mathrm{KL}\bigl(Q \vert \vert M\bigr), $$

where P and Q are two density distributions, \(M = \frac{1}{2} (P + Q)\), and KL is the Kullback–Leibler divergence (KL) [27, 58], defined as:

$$ \mathrm{KL}\bigl(P \vert \vert Q\bigr) = \sum_{x \in X} P(x) \log \biggl( \frac{P(x)}{Q(x)} \biggr). $$
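A minimal sketch of the JS divergence between the edge-weight distributions of two networks, obtained by histogramming the weights on a common set of bins (the number of bins is an arbitrary choice of this sketch):

```python
# JS divergence between two empirical distributions built from edge weights.
import numpy as np

def kl(P, Q, eps=1e-12):
    P, Q = P + eps, Q + eps            # avoid log(0) and division by zero
    P, Q = P / P.sum(), Q / Q.sum()
    return np.sum(P * np.log(P / Q))

def js(P, Q):
    M = 0.5 * (P / P.sum() + Q / Q.sum())
    return 0.5 * kl(P, M) + 0.5 * kl(Q, M)

def js_between_networks(A, B, bins=50):
    all_weights = np.concatenate([A.ravel(), B.ravel()])
    edges = np.histogram_bin_edges(all_weights, bins=bins)
    P, _ = np.histogram(A.ravel(), bins=edges)
    Q, _ = np.histogram(B.ravel(), bins=edges)
    return js(P.astype(float), Q.astype(float))
```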

An alternative to divergence metrics is the use of graph kernels to measure similarities between graphs [40, 59]. However, while kernel methods compare networks’ representations in a latent space, in this paper we aim to capture the mobility networks’ macro-scale topological features (e.g., degree distribution, clustering coefficient). For the sake of completeness, in Supplementary Note 10 we compare topological properties of the generated and real mobility networks, such as the clustering coefficient and the weighted degree distribution.

6 Results

Figure 3 shows the distribution of the Cut Distance (CD) in the four datasets’ test (red), synthetic (blue), and mixed sets (green) for MoGAN (left), the Gravity model (center), and the Radiation model (right). MoGAN’s CD distributions overlap almost entirely in all four datasets, meaning that MoGAN generates mobility networks that are indistinguishable from real ones and way more realistic than those generated by the baselines (except in two cases, see Supplementary Note 6). Similar results hold for the other metrics: MoGAN typically outperforms the baselines regarding CPC (Fig. 4) and RMSE (Supplementary Note 7). Table 2 shows, for each model, the JS-divergence between (i) the CPC distribution of the mixed and test sets and (ii) the CPC distribution of the synthetic and test sets.

Figure 3

Results for the Cut Distance. Distributions of the pairwise cut distances between mobility networks in the test set (red), synthetic set (blue), and mixed set (green), for the four datasets. For each dataset, we compare the overlap among the distributions for MoGAN and the two baselines (Gravity and Radiation). The distributions of the Radiation model’s mixed and synthetic sets differ significantly from that of the test set for all datasets, while the Gravity model clearly outperforms the Radiation model on all datasets

Figure 4

Results for the CPC. Distributions of the pairwise CPC scores between mobility networks in the test set (red), synthetic set (blue), and mixed set (green), for the four datasets. For each dataset, we compare the overlap of the distributions for MoGAN and the two baselines (Gravity and Radiation). For both the Gravity and the Radiation models, the three distributions differ significantly, especially for the latter

Table 2 JS divergences of the distributions of the CPC scores. For each model, we report the JS divergence between mixed set and test set (column \(JS_{m}\)) and the JS divergence between synthetic set and test set (column \(JS_{s}\)). The last four \(\Delta _{x,Z}\)-like columns represent the improvement of MoGAN compared to the Gravity model on the mixed and the synthetic sets (columns \(\Delta _{m,G}\) and \(\Delta _{s,G}\)) and the improvement of MoGAN compared to the Radiation model on the mixed and synthetic sets (columns \(\Delta _{m,R}\) and \(\Delta _{s,R}\))

To compute the improvement in performance of MoGAN with respect to the baseline models, for each metric, each set and each baseline, we define the quantity:

$$ \Delta = - \biggl( \frac{JS^{(\text{MoGAN})} - JS^{(\mathrm{baseline})} }{JS^{(\mathrm{baseline})}} \biggr) \times 100, $$

where \(JS^{(\text{MoGAN})}\) is the JS divergence between the set (synthetic or mixed) of networks generated by MoGAN and the test set, while \(JS^{(\mathrm{baseline})}\) is the JS divergence between the set (synthetic or mixed) of networks generated by the baselines (Gravity, Radiation or Random Weighted) and the test set.
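As a purely illustrative example of this formula (the numbers below are not taken from Table 2):

```python
# Relative-improvement score Delta of MoGAN over a baseline.
def delta(js_mogan, js_baseline):
    return -((js_mogan - js_baseline) / js_baseline) * 100.0

print(round(delta(0.02, 0.14)))  # -> 86, i.e., an 86% reduction of the JS divergence
```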

Table 2 shows that, according to the CPC, MoGAN outperforms the Gravity and Radiation models on all datasets, with a relative improvement of up to 86% on the Gravity model and 91% on the Radiation model over the mixed set, and a relative improvement of up to 49% on the Gravity model and 37% on the Radiation model over the synthetic set. We report the results of the comparison with the Gravity and Radiation models for RMSE, CD, weights distribution and weight-distances distribution in Supplementary Notes 7, 8 and 9.

MoGAN also outperforms the Random Weighted model for all proposed metrics. Figure 5 compares the performance of MoGAN and the Random Weighted model according to the CPC. For each dataset, MoGAN’s test, synthetic, and mixed set distributions overlap more than those of the Random Weighted model. We report the results of the comparison with the Random Weighted model for RMSE, CD, weights distribution, and weight-distances distribution in Supplementary Notes 11-14. Table 3 shows that, according to the CPC, MoGAN outperforms the Random Weighted model for all datasets.

Figure 5

Results for the CPC with the Random Weighted model. Distributions of the pairwise CPC scores between mobility networks in the test set (red), synthetic set (blue), and mixed set (green), for the four datasets. For each dataset, we compare the overlap of the distributions for MoGAN and the Random Weighted model. MoGAN’s distributions are perfectly overlapping, while the Random Weighted model’s show significant differences

Table 3 JS divergences of the distributions of the CPC scores with the Random Weighted model. For each model, we report the JS divergence between mixed set and test set (column \(JS_{m}\)) and the JS divergence between synthetic set and test set (column \(JS_{s}\)). The last two \(\Delta _{x,Z}\)-like columns represent the improvement of MoGAN compared to the Random Weighted model on the mixed and the synthetic sets (columns \(\Delta _{m,\mathrm{RW}}\) and \(\Delta _{s,\mathrm{RW}}\))

MoGAN’s JS-divergences between the mixed and test sets and between the synthetic and test sets are the lowest for each dataset, meaning that our model produces the most overlapping distributions (see Table 2). Our results also show that the difference (either in terms of CD, CPC, or RMSE) between a real network and a synthetic one is similar to the difference between two real networks or two synthetic networks. This means that MoGAN generates realistic mobility networks that are, to a certain extent, indistinguishable from real ones.

Figure 6 shows the distributions of the pairwise similarities among the edge weights for the synthetic, mixed, and test sets built over the four datasets. For each dataset, we report the performances of MoGAN, the Gravity model, and the Radiation model. Again, MoGAN significantly outperforms the baselines, except for two cases (mixed set of NYC and CHI taxi) in which the Gravity model and MoGAN achieve similar performance. We find a similar result for the weight-distances (see Supplementary Note 7).

Figure 6

Results for the weights distribution. Distributions of the pairwise JS divergences between the weight distributions of the mobility networks in the test set (red), synthetic set (blue), and mixed set (green), for the four datasets. For each dataset, we compare the overlap of the distributions for MoGAN and the two baselines (Gravity and Radiation). The distributions of the Radiation model’s mixed and synthetic sets differ significantly from that of the test set for all datasets, and the situation is similar for the Gravity model. MoGAN’s distributions are almost completely overlapping for all four datasets

In Fig. 7, we compare a subset of the entries of the adjacency matrices representing the generated mobility networks with the adjacency matrix of a real mobility network. We compare only this part of the matrices for visualization purposes: the outer part is, in fact, made up of zero entries. For each model, we visualize the generated mobility network with the maximum sum of flows. We observe that MoGAN’s adjacency matrix is way more similar to the real one than those of the other models. The Gravity model produces an adjacency matrix that looks quite similar to the real one, but it lacks the self-loops (the elements on the diagonal). The Random Weighted model’s matrix resembles the real one, but the magnitude of the flows differs in several parts of the network. The Radiation model’s adjacency matrix is way different from the real one.

Figure 7

Visual comparison of the adjacency matrices of the mobility networks. Visualization of the densest part of the NYC bike mobility network with the maximum sum of flows in the test set (Real, zoomed) and of the mobility networks with the maximum sum of flows in the synthetic sets produced by the other models. For each generated matrix, we report the RMSE with respect to the real matrix. In the top-left panel, we show the full 64×64 mobility network and highlight the densest zones, on which we focus in the other plots of the figure

Figure 7 shows that MoGAN is way better than the Gravity model at predicting flows between close tiles, whereas the two models reach a similar performance for flows between tiles that are very distant from each other. In Supplementary Note 15, we report the correlation between the error and the distance between the flows’ tiles for the BikeNYC dataset, for both MoGAN and the Gravity model.

7 Conclusion

This paper introduces MoGAN, a deep-learning-based model for generating realistic urban mobility networks. Our experiments, conducted on four public datasets representing flows of bikes and taxis in New York City and Chicago, show that the networks generated by MoGAN are more realistic than those generated by classic models such as the Gravity and the Radiation models.

Although MoGAN’s performance is encouraging, the model also has some limitations. Being based on DCGAN [49], MoGAN can only generate \(64\times 64\) adjacency matrices, that is, mobility networks with 64 locations (4096 possible flows). As a future improvement, we plan to extend MoGAN’s architecture to generate mobility networks with an arbitrary number of nodes. Other technical improvements may be the use of Graph Neural Networks (GNNs) [52], which would better capture the network dependencies and allow including other location-related information (e.g., population or relevance), and the use of the Wasserstein loss [3], which improves the performance of GANs in several contexts [21, 62]. It would also be interesting to test MoGAN’s effectiveness on cities of different sizes and shapes and on the generation of individual mobility trajectories, which represent the movements of single individuals among a city’s locations [6, 50, 53]. Finally, we plan to design a version of MoGAN capable of generating the mobility network of a weekday, a weekend day, or a specific day of the week.

An important aspect to investigate in future work is to what extent MoGAN is geographically transferable [34], i.e., whether it can be trained on a specific city and then used to effectively generate mobility networks for a different city. Geographic transferability can be crucial when there is a scarcity or even an absence of mobility data for a city.

In this study, we use data from Chicago and New York City, which differ considerably in size, population, and shape, as well as in socio-demographic factors, POI distribution, land use, etc. Therefore, it would not be meaningful to transfer Chicago’s MoGAN to New York City and vice versa. We leave experiments on the geographic transferability of MoGAN among cities to future work.

As MoGAN leverages the architecture of DCGAN, it only works with \(64\times 64\) matrices. While representing geographic areas with fewer than 64 zones is not an issue (using, e.g., padding techniques [19]), in its current version MoGAN cannot handle areas split into more than 64 zones. Future extensions of MoGAN may rely on GAN architectures that deal with matrices larger than \(64\times 64\) [61]. Adapting such models to deal with temporal and spatial aspects would allow us to design a new GAN for mobility flows that can handle larger geographic areas.

Another promising future direction is developing a GAN to generate a realistic mobility network for a specific condition (e.g., a rainy day or a day with some public events in the city). Having a so-called conditional GAN [38] may represent a unique opportunity for policymakers to generate realistic scenarios for specific circumstances. Finally, an exciting open challenge consists in interpreting which rules or well-known mobility laws (e.g., the gravity law) generative models are learning.

In the meantime, our study demonstrates the great potential of artificial intelligence to improve solutions to crucial problems in human mobility, such as the generation of realistic mobility networks. MoGAN can synthesize aggregated movements within a city into a realistic generator, which can be used for data augmentation, simulations, and what-if analysis. Given the flexibility of the training phase, our model can be easily extended to synthesize specific types of mobility, such as aggregated movements during workdays, weekends, specific periods of the year, or in the presence of pandemic-driven mobility restrictions, events, and natural disasters.