1 Introduction

Since 2010, we have witnessed the dawn of personal handheld devices that have completely transformed our digital lives. These devices have been continuously enriched with potent positioning capabilities thanks to GPS, WiFi and Bluetooth sensors. As a result, users can now link their digital life to actual real-world physical places.

This has generated an unprecedented amount of human location data [1] and has promoted the development of the mobility mining discipline as a prominent line of work within the data science ecosystem [2]. In this context, one of the major findings of this discipline is that human mobility is highly predictable [3]. Therefore, many techniques and solutions have emerged for human mobility forecasting [4]. Moreover, the anticipation of human flows has important applications in diverse domains like healthcare [5, 6], urban services [7] and transportation management [8].

These solutions often rely on human movement trajectories on different spatial and temporal scales from a large variety of sources like GPS traces [9], Call Detail Records (CDRs) [10] or Online Social Media (OSM) posts [11]. To analyze these sources, different techniques based on statistical methods [12], machine learning algorithms [13] and, more recently, deep learning [6] have been used.

However, two important limitations in the existing human-mobility predictors are observed:

  • First, most of the current solutions are geographically bound to urban settlements and cover only intra-urban displacements (see Fig. 1). They provide predictions only regarding the movement of people within such constrained areas. In contrast, there is a lack of proposals that predict mobility over larger spatial scenarios, such as nationwide or inter-urban human mobility. Yet the anticipation of large-scale crowd flows could be instrumental for developing, for example, effective lockdown policies if a pandemic scenario occurs.

  • Second, these solutions take, as a primary data source, the raw spatiotemporal trajectories generated by different moving objects like vehicles or crowds of people. Nevertheless, in a real-world setting, access to this raw data is usually restricted due to several privacy and economic policies defined by data providers and operators [14]. At the same time, the open data movement has promoted the release of an increasing number of human-mobility datasets [15,16,17]. However, these datasets are usually filtered and pre-processed to comply with several restrictions, as location data is sensitive in terms of privacy. Predictors that leverage such coarse-grained mobility data are still scarce in the mobility mining domain.

Fig. 1 Scales of human mobility. The red arrows of the leftmost figure represent the intra-urban human flows among different areas within a city A. The rightmost figure depicts the mobility flows defined at a larger scale among spatial regions (R1, R2, R3) that include several cities

This study therefore proposes a novel mechanism for human mobility forecasting on a nationwide scale. In brief, the solution can anticipate the flows of people among administrative areas defined on a large geographical scale. Consequently, the solution operates at a spatial granularity larger than existing solutions. Besides, it does not require the collection of raw spatiotemporal trajectories, but only aggregated mobility data regarded as emissions of the geographical regions. As shown in the rightmost part of Fig. 1, the solution aims to predict the flows of people among regions R1, R2 and R3. Our proposal has been tested with a nationwide human mobility open dataset released by the Spanish Ministry of Transportation, based on mobile phone location data from several telephone carriers (Footnote 1).

Because the latent human flow connections among the target areas can be regarded as a sparse graph, a compelling alternative to forecast the mobility in these areas is the use of Graph Neural Networks (GNNs). These neural networks have been successfully applied in many classification and regression tasks in graph-based scenarios [18]. However, the composition of an accurate GNN calls for the proper definition of the edges among the geographical areas acting as nodes. Much effort has been devoted within the spatial analysis field to the formal definition of techniques and mechanisms to evaluate the latent correlations and relationships among spatial areas [19]. Therefore, given the target spatial granularity of the proposal, we leveraged several techniques and concepts from the spatial analysis field to define the adjacency among areas (nodes) of the target graph.

The contributions of this study are twofold: (1) the development of a large-scale human mobility predictor based on a GNN, and (2) the use of spatial analysis knowledge to uncover connections among the target spatial areas to compose the input graph.

The rest of the paper is structured as follows. Section 2 summarizes current approaches for human mobility prediction. Section 3 is devoted to describing the human mobility feed used in the study. Section 4 puts forward the proposed forecasting mechanism. Section 5 discusses the major results of the experiments. Finally, the main conclusions and the future work are summed up in Section 6.

2 Related work

Many proposals for human mobility forecasting can be found in the literature. They can be classified according to their input data, applied methods, or prediction type (see Table 1).

Table 1 Human mobility prediction approaches

The applied methods can be categorized into parametric and non-parametric approaches [20]. The former are proposals that generate a model based on a set of theoretical or physical foundations and then tune its parameters. Examples of these approaches are ARIMA [21] and linear regression models [22]. However, these studies do not offer reliable performance under abnormal mobility situations caused by, for example, hazardous weather events. In contrast, non-parametric methods can cope with this type of scenario [23]. Within this group, Deep Learning (DL) algorithms have been widely used for the forecasting task, as Table 1 shows.

As the table shows, different forms of Recurrent Neural Networks (RNNs) are among the most widely applied DL solutions. One clear example is the Long Short-Term Memory (LSTM) network. For instance, Miyazawa et al. [28] fused data from two mobility feeds, online social networks (OSNs) and aggregated GPS trajectories, to provide predictions about individual movements in the city of Tokyo. The authors fed an LSTM network with the embeddings generated from the spatiotemporal and textual data extracted from the OSN and GPS traces. Likewise, Kong and Wu [26] made use of a spatiotemporal LSTM model (ST-LSTM) to predict the next area of interest visited by a particular user based on his/her previous trace of visits to other areas of the city. The proposal was successfully tested using location data extracted from a Location-Based Service (LBS) in Beijing. Moreover, at a fine-grained spatial scale, Zou et al. [27] composed an LSTM network to predict the collision-free trajectories of a set of pedestrians. Another example is the work of Zhao et al. [29], which combines an attention mechanism with a bidirectional LSTM to predict user destinations based on LBS data. The system relies on a mapping process in which the raw spatiotemporal trajectories of the users are translated into grid-based tessellations of the urban setting. Wang et al. [30] extended this bidirectional LSTM model with an attention mechanism to forecast urban traffic conditions in peak hours using external factors such as weather conditions. The authors tested their proposal using taxi and bike datasets from New York. All these approaches are limited in that they require the collection of individual trajectories. In contrast, our approach focuses on forecasting the flows of people that move from one spatial region to another in an aggregated manner.

Convolutional LSTM (ConvLSTM) networks have also been applied for human mobility prediction [24, 25]. The work by He, Chow, and Zhang [24] defines a grid-based spatial tessellation of a city. A ConvLSTM model was then used to predict the incoming and outgoing public transportation flows of each city at an hourly granularity. The model was tested using taxi and bike-sharing mobility data from New York and Beijing. Yang et al. [25] used a ConvLSTM for traffic condition estimation in Beijing. Here, the proposed methodology first identifies the critical road segments of a city with a correlation analysis. Hence, it is possible to feed the ConvLSTM with only such critical roads instead of the full road network of the target city.

Another important type of RNN is the Gated Recurrent Unit (GRU) model. An interesting usage of this model is put forward by Fan et al. [32], where an ensemble of GRU models, each focusing on a particular target day, was composed to detect both regular and abnormal citywide mobility. Then, a predictor was built on top of this ensemble to predict the next locations of the monitored users. The same model was applied by Feng et al. [31], where two attention modules were added to the network to improve the selection of the historical trajectories used to perform the prediction. The mechanism was tested using location data at different granularities in three cities: New York, Beijing, and Shanghai. Wang et al. [30] proposed the combination of a GRU with a CNN along with an attention mechanism to predict the traffic speed on certain roads of Paris. This proposal can detect traffic trends depending on the day of the week and without using past events.

Last, GNNs have also been proposed as enablers for human mobility prediction. For example, Zhao et al. [35] applied a traffic graph-convolutional LSTM to predict the traffic conditions of the cities of Shenzhen and Los Angeles, using two datasets from inductive loop detectors and aggregated GPS trajectories to compose the latent input graphs. Likewise, Cui, Henrickson, Ke, and Wang [34] used a similar approach for the city of Seattle. In this case, the authors consider the free-flow traffic dependencies among road networks to compose the graphs.

In summary, there are multiple proposals for human mobility prediction based on DL algorithms. Such works rely on vastly different data sources and target different types of citywide human mobility (e.g., individual vs. crowd). In contrast, our proposal focuses on providing a nationwide predictor based on a GNN, a model not yet fully explored for human mobility analysis. Moreover, the models generated for an intra-urban scale cannot be directly transferred to an inter-urban scenario, because the relationships among nodes in an intra-urban graph differ from those in an inter-urban one. In citywide GNN proposals, such links are modeled based on the underlying road topology of the city [34, 35]. The geographical regions on a nationwide scale, however, cannot be linked solely on the basis of such an infrastructure network; it is also necessary to consider other spatial, temporal, and demographic factors that might govern the human flows among such regions, as discussed in Section 4.4.

3 Description of the human mobility dataset

This study analyzes a nationwide human mobility dataset released by the Spanish Ministry of Transportation (SMT) in December 2020. This dataset covers a 9-month period of mobility data in Spain, from February 29th to November 30th, 2020, and it contains the number of trips per hour among 3216 administrative areas (herein Mobility Areas, MAs). These areas were created ad hoc by the SMT and cover the entire country (peninsular and insular extension). A single trip in the dataset is defined as the spatial displacement of an individual over a distance above 500 meters. Consequently, this dataset can be regarded as a set of tuples taking the form:

$$\langle date, hour, m_{origin}, m_{dest}, n_{trp} \rangle $$

reporting that there were \(n_{trp}\) trips from the MA \(m_{origin}\) to the MA \(m_{dest}\) during the indicated date and hour.
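As a minimal sketch of how such tuples might be handled, the following Python snippet aggregates them into hourly outgoing-trip counts per MA, which is the quantity forecast later in the paper. The record values, the area identifiers and the function name `outgoing_trips` are illustrative assumptions and are not taken from the SMT dataset.

```python
from collections import defaultdict

# Hypothetical records following the tuple layout
# (date, hour, m_origin, m_dest, n_trp); all values are made up.
records = [
    ("2020-03-01", 8, "MA_001", "MA_002", 120),
    ("2020-03-01", 8, "MA_001", "MA_003", 30),
    ("2020-03-01", 8, "MA_002", "MA_001", 75),
    ("2020-03-01", 9, "MA_001", "MA_002", 90),
]

def outgoing_trips(rows):
    """Sum n_trp over all destinations for each (date, hour, origin)."""
    totals = defaultdict(int)
    for date, hour, origin, _dest, n_trp in rows:
        totals[(date, hour, origin)] += n_trp
    return dict(totals)

ot = outgoing_trips(records)
```

Each key of `ot` identifies one MA at one hour, so the values for a fixed MA form the hourly series used as model input.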

According to the official documents [36], these data have been collected through CDRs from 13 million users of an unspecified mobile phone carrier. Once anonymized, this dataset was used to infer representative mobility statistics at the national level for the population of Spain and was made publicly available as open data.

In its raw form, the dataset comprises 830,450,300 trips between MAs. Figure 2 shows the spatial distribution of the total number of outgoing trips per MA during the entire period of the study, where each polygon represents a particular area. There is a strong spatial aggregation among the MAs: areas with similar values are spatially grouped together.

Fig. 2 Geographical distribution of the sheer number of outgoing trips of the Spanish MAs during the entire period of study

Furthermore, Fig. 3 depicts descriptive features of the MAs. Most areas emitted between 5 and 20 million trips on average, as Fig. 3a depicts. In terms of spatial size, each area covered around 2.5 km² (Fig. 3b) and had an average population of 14,428 people in most cases (Fig. 3c). These are values larger than usual for regular city neighborhoods. For example, New York City has 390 neighborhood tabulation areas (instead of 3216 as in this study) whose average population was 45,322 in 2010 (Footnote 2). Finally, Fig. 3d depicts the distribution of the number of neighboring areas per MA based on the Queen contiguity definition, i.e., MAs that share either a corner or an edge. The irregular tessellation of the regions makes this contiguity feature fluctuate to a great extent, unlike regular grid-based spatial partitions where all the regions have the same number of adjacent cells. Fig. 4 also shows that almost 60% of the outgoing trips of an MA go toward non-adjacent areas, whereas the other 40% end in the adjacent areas of the MA.

Fig. 3 Descriptive parameters of the target MAs

Fig. 4 Number of trips between adjacent and non-adjacent mobility areas

Furthermore, Fig. 5a confirms that there is a strong positive correlation between the population of an area and its sheer number of outgoing trips; thus, the larger the MA population the larger its outgoing trips.

Fig. 5 Evolution of the number of outgoing trips based on different parameters. Each blue dot represents a particular pair of origin-destination MAs

We also studied the distribution of the number of inter-MA trips based on the distance between the origin and destination MAs. Fig. 5b shows that the number of trips is largest between areas close to each other. This distribution seems to follow a power law, a pattern widely discussed within the human-mobility mining discipline [37]. Briefly, this distribution dictates that the magnitude of trips between two areas decays as a power of the distance between them.
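A relation of the form trips ≈ c · distance^(−β) becomes a straight line in log-log space, so its exponent can be estimated with an ordinary least-squares fit on the logarithms. The sketch below illustrates this under stated assumptions: the data are synthetic, the function name `fit_power_law` is hypothetical, and the exponent 2 is chosen only for illustration, since the study does not report a fitted value.

```python
import math

def fit_power_law(distances, trips):
    """Least-squares fit of log(trips) = log(c) - beta * log(distance)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(t) for t in trips]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (c, beta)

# Synthetic example: trips generated exactly as 1000 * d^-2.
distances = [1.0, 2.0, 4.0, 8.0, 16.0]
trips = [1000.0 * d ** -2.0 for d in distances]
c, beta = fit_power_law(distances, trips)
```

On real origin-destination data the points would scatter around the line, and zero-trip pairs would have to be excluded before taking logarithms.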

Concerning the connectivity level among MAs, Fig. 6 shows the number of origins and destinations per MA. An MA is regarded as an origin of another MA if the former emits trips toward the latter, whereas a destination MA receives trips from another MA. As can be observed, the mobility dataset exhibits sparse connectivity among areas, since an MA is connected on average with 328 other MAs. This represents only 10% of the total areas of the setting.
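The origin and destination counts described above can be derived directly from the set of MA pairs that exchange at least one trip. The following sketch shows one way to do so; the pair list and the names `connectivity` and `density_A` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical (origin, destination) pairs with at least one trip.
od_pairs = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "A"), ("D", "A")]

def connectivity(pairs):
    """Return, per MA, the MAs it receives trips from and sends trips to."""
    origins = defaultdict(set)       # MAs emitting trips toward the key
    destinations = defaultdict(set)  # MAs receiving trips from the key
    for origin, dest in pairs:
        destinations[origin].add(dest)
        origins[dest].add(origin)
    return origins, destinations

origins, destinations = connectivity(od_pairs)
n_areas = len({ma for pair in od_pairs for ma in pair})
# Share of the other areas that area "A" sends trips to.
density_A = len(destinations["A"]) / (n_areas - 1)
```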

Fig. 6 Box plots for the number of origins and destinations per MA

Finally, Fig. 7 depicts the spatial autocorrelation of MAs based on their spatial lag. For an MA \(m_{i}\), its spatial lag \(sl_{i}\) is calculated as:

$$sl_{i}=\sum\limits_{j=0}^{3215} w_{i,j} \times ot_{j}$$

where \(w_{i,j}\) is the spatial weight of \(\langle m_{i},m_{j}\rangle\) and \(ot_{j}\) is the number of outgoing trips of \(m_{j}\). Here, the spatial weights are defined with the Queen adjacency described above; thus, \(w_{i,j}=1\) if the two areas are adjacent and \(w_{i,j}=0\) otherwise.
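To make the spatial lag concrete, the sketch below evaluates the sum for a toy setting with four areas and binary Queen weights. The weight matrix and the trip counts are made-up values, and `spatial_lag` is a hypothetical helper name.

```python
def spatial_lag(weights, outgoing):
    """sl_i = sum_j w_ij * ot_j for every area i."""
    return [
        sum(w_ij * ot_j for w_ij, ot_j in zip(row, outgoing))
        for row in weights
    ]

# Toy binary Queen weights for four areas (1 = adjacent, 0 = otherwise).
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
ot = [100, 200, 50, 400]  # made-up outgoing trips per area
sl = spatial_lag(W, ot)   # [250, 150, 700, 50]
```

With binary weights, the spatial lag of an area is simply the total number of outgoing trips emitted by its Queen neighbors.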

Fig. 7 Spatial autocorrelation of the MAs regarding their outgoing number of trips

In Fig. 7, the spatial lag analysis confirms the high spatial autocorrelation observed in Fig. 2. Note that the largest values (depicted by a darker color) occur in particular regions of Spain. These regions spatially overlap with the actual locations of the most important Spanish cities.

4 Prediction of human flows with GNNs

This section describes the proposed solution to predict the human mobility flows among MAs based on GNNs.

4.1 Problem formulation

The prediction problem this work deals with can be formulated as follows:

Given the set of MAs, \({\mathscr{M}}\), and the sequence of outgoing trips from each \(m_{i}\in {\mathscr{M}}\) during the last \(h_{prev}\) hours, \(\mathcal {OT}_{i}^{h}=\langle ot_{i}^{t},ot_{i}^{t-1},\ldots ,ot_{i}^{t-h_{prev}}\rangle \), find a mapping function \(\mathcal {F}\),

$${\mathcal{F}(\mathcal{OT}_{i}^{h}) \rightarrow ot_{i}^{t+T}}$$

where \(ot_{i}^{t+T}\) is the number of outgoing trips of each MA in T hours ahead \(\forall m_{i} \in {\mathscr{M}}\).

By solving this problem, authorities can anticipate the nation's short-term global movement. This has practical applications such as enabling adaptive prices for certain public transport services or the adoption of spatially constrained lockdown actions in pandemic scenarios where only certain sets of MAs are affected.

4.2 Suitability of a GNN-based approach

To solve the prediction problem described above, we must consider the features of the mobility dataset described in Section 3. In particular, two important patterns observed in this mobility feed should be highlighted:

  • MAs do not emit trips only toward their adjacent areas. As Fig. 4 depicts, most of the outgoing trips from MAs go toward non-adjacent areas. Therefore, spatial closeness is not the only factor that affects the human mobility behavior of an area. Furthermore, the number of adjacent areas varies greatly among MAs. Therefore, a solution relying on regular grid convolution operators cannot capture such non-adjacent and complex relationships among areas.

  • However, the underlying graph of MAs that arises when considering their trip connections is actually a sparse model where an MA is connected, on average, to only 10% of the total areas of the setting (Fig. 6). These sparse connections among MAs are meaningful and thus seem a reasonable parameter to be included in the model for human mobility forecasting.

Consequently, it seems a feasible approach to leverage the classification and regression capabilities of a GNN model to solve the target prediction problem for this topological structure with irregular and sparse connections among MAs.

4.3 Design of the GNN

This section describes the GNN model used for the prediction problem. For clarity, Table 2 summarizes the key acronyms and symbols used in this section.

Table 2 Acronyms of the model

To generate the GNN model, we first compose a graph \(\mathcal {G}=\langle {\mathscr{M}}, E\rangle \) where the nodes \({\mathscr{M}}\) are the MAs, \(|{\mathscr{M}}|=N=3216\), and the edges E represent the connections among them. Such connections are defined by an adjacency matrix \(\mathcal {A}\). Moreover, a set of feature matrices \(\mathcal {OT}_{t} \forall t \in \mathcal {T}\) comprising the last \(h_{prev}\) outgoing trips of the MAs at time t is also generated.

The pipeline of the GNN used for the present setting is depicted in Fig. 8. The model uses as an input the matrices \(\mathcal {A}\) and \(\mathcal {OT}\) to perform the prediction task. Then, the spatial dependencies among MAs are modeled with a set of graph-convolutional layers whereas the temporal ones are processed by a stack of LSTM layers. Last, a Multilayer Perceptron (MLP) processes the sequences from the LSTM layers.

Fig. 8 Pipeline of the used GNN with its general structure of layers

Each graph-convolutional (GCN) layer defines a filter H in the Fourier domain that acts on the first-order neighbors of each node. This way, the output of the l-th graph-convolution layer can be expressed as

$$ H^{(l)}=ReLU (\tilde{D}^{-\frac{1}{2}} \hat{\mathcal{A}} \tilde{D}^{-\frac{1}{2}} H^{(l-1)} \theta^{(l-1)}) $$
(1)

where \(\hat {\mathcal {A}}\) is the adjacency matrix with self-connections, defined as \(\mathcal {A} + I_{N}\), with \(I_{N}\) the identity matrix; \(\tilde {D}\) is the degree matrix of the graph; \(H^{(l-1)}\) and \(\theta^{(l-1)}\) are the output and the weights of the previous layer, respectively; and ReLU() stands for the Rectified Linear Unit function [38]. Thus, the first layer, \(H^{(1)}\), takes the incoming matrix \(\mathcal {OT}_{t}\) as its input.
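The propagation rule of Eq. (1) can be sketched in a few lines of plain Python. This is only an illustration on a three-node toy graph and not the implementation used in the study: the helper names `matmul` and `gcn_layer`, the toy inputs, and the choice of computing the degree matrix from the adjacency matrix with self-connections are all assumptions.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, h_prev, theta):
    """One graph-convolution layer following Eq. (1)."""
    n = len(adj)
    # A_hat = A + I_N (self-connections).
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Assumption: node degrees are taken from A_hat.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    a_norm = [
        [d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]
    z = matmul(matmul(a_norm, h_prev), theta)
    return [[max(0.0, v) for v in row] for row in z]  # ReLU

# Toy graph: 3 nodes in a line (0-1-2), 2 input features, 1 output feature.
A = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [[1.0], [-1.0]]
H1 = gcn_layer(A, H0, theta)
```

Because the normalized adjacency only mixes each node with its direct neighbors and itself, one layer propagates information exactly one hop, which matches the first-order filter described above.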

The rationale for using a filter that targets only the one-hop neighbors of an MA is that the target mobility feed only indicates the initial origin and final destination of a trip, without including its stopovers, as pointed out in Section 3. Consequently, the underlying relationships are only defined among pairs of MAs.

Regarding the LSTM layers, Fig. 9 depicts a general schema of this type of cell. It is a prominent type of RNN cell that can retain short-term as well as long-term patterns in time series. As inputs, \(h_{(t-1)}\) is the short-term state from the previous time instant, \(\mathcal {OT}_{t}\) is the feature matrix of outgoing trips at instant t and \(\mathcal {A}\) is the adjacency matrix. Concerning the cell outputs, \(y_{(t)}\in \mathbb {R}^{N}\) is the vector with the predicted outgoing trips at instant t, whereas \(c_{(t)}\) is the long-term state that traverses the network from left to right. Moreover, GC represents the stack of \(L_{g}\) graph-convolution layers described before.

Fig. 9 Inner structure of the LSTM cells of the model. FC: Fully Connected

The four inner gates of the cell, which modulate the outputs of the model, can be formulated as follows:

$$ f_{(t)}= \sigma(W_{f}\ GC^{out}_{t} + U_{f}\ h_{(t-1)} + b_{f}) $$
(2)
$$ g_{(t)}= \tanh(W_{g}\ GC^{out}_{t} + U_{g}\ h_{(t-1)} + b_{g}) $$
(3)
$$ i_{(t)}= \sigma(W_{i}\ GC^{out}_{t} + U_{i}\ h_{(t-1)} + b_{i}) $$
(4)
$$ o_{(t)}= \sigma(W_{o}\ GC^{out}_{t} + U_{o}\ h_{(t-1)} + b_{o}) $$
(5)

where \(GC^{out}_{t}\) are the features extracted from the graph convolution layers, \(W_{\{f,i,o,g\}}\) are the weight matrices for these features, \(U_{\{f,i,o,g\}}\) are the weight matrices for the connections to the previous short-term state \(h_{(t-1)}\) and \(b_{\{f,i,o,g\}}\) are the bias terms of the four gates.

Finally, the outputs of the cell are computed as follows:

$$ c_{(t)} = f_{(t)} \otimes c_{(t-1)} + i_{(t)} \otimes g_{(t)} $$
(6)
$$ y_{(t)} = h_{(t)} = o_{(t)} \otimes tanh(c_{(t)}) $$
(7)

where ⊗ is the element-wise multiplication operator. Note that the outputs of the last LSTM layer feed an MLP with \(L_{mlp}\) layers. Thus, the last layer of the MLP provides the vector \(\hat {ot}^{t+1}_{{\mathscr{M}}}\in \mathbb {R}^{N}\) with the final outgoing trips prediction \(\hat {ot}^{t+1}_{i}\) for each \(m_{i} \in {\mathscr{M}}\). This vector is computed as follows:

$$ \hat{ot}_{\mathcal{M}}^{t+1}= \sigma(W_{L_{mlp}}\ X + b_{L_{mlp}}) $$
(8)

where X is the output of the previous layer \(L_{mlp}-1\), whereas \(W_{L_{mlp}}\) and \(b_{L_{mlp}}\) are the weight matrix and bias term of the last MLP layer, respectively.
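The cell update of Eqs. (2)-(7) can be illustrated with a small plain-Python sketch. It is a simplified stand-in rather than the actual model: the helper names (`sigmoid`, `affine`, `lstm_step`), the two-unit cell and the zero-valued placeholder parameters are assumptions chosen so that the result can be checked by hand.

```python
import math

def sigmoid(x):
    """Logistic function used by the f, i and o gates."""
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]

def lstm_step(gc_out, h_prev, c_prev, params):
    """One cell update; params maps each gate name to its (W, U, b)."""
    def gate(name, activation):
        W, U, b = params[name]
        return [activation(v) for v in affine(W, gc_out, U, h_prev, b)]

    f = gate("f", sigmoid)    # Eq. (2)
    g = gate("g", math.tanh)  # Eq. (3)
    i = gate("i", sigmoid)    # Eq. (4)
    o = gate("o", sigmoid)    # Eq. (5)
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]  # Eq. (6)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                    # Eq. (7)
    return h, c

# Placeholder parameters: zero weights and biases for a 2-unit cell,
# chosen only so that the result is easy to verify by hand.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: (zeros, zeros, [0.0, 0.0]) for name in "fgio"}
h, c = lstm_step(gc_out=[3.0, 1.0], h_prev=[0.0, 0.0], c_prev=[2.0, -4.0], params=params)
```

With these placeholders the three sigmoid gates evaluate to 0.5 and the candidate g to 0, so the new long-term state is half of the previous one, `c == [1.0, -2.0]`, and `h` is half of its hyperbolic tangent.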

This model is loosely based on the one proposed by Cui, Henrickson, Ke, and Wang [34]. However, our pipeline includes two major differences. First, a dense layer is included on top of the model. This allows us to achieve better performance and less overfitting than the original model. Second, the temporal dependencies are captured with LSTM cells instead of GRU ones. Although GRU cells are more computationally efficient, we have chosen LSTM cells due to their capability to handle long-term dependencies.

4.4 Definition of the adjacency matrix

One paramount input parameter of the proposed GNN model is the adjacency matrix \(\mathcal {A}\), which indicates the latent connections among MAs in the graph \(\mathcal {G}\). Unlike other proposals that focus on the road traffic environment [34, 35], in the present setting there is no latent infrastructure of streets or paths defining the links among nodes.

Therefore, a set of five alternative adjacency matrices has been defined to establish the connections among the areas. Each focuses on a particular spatial and mobility feature of the setting. The matrices are:

  • Distance-based matrix \(\mathcal {A}_{d}\). This is a straightforward weighted adjacency matrix where each cell \(a_{ij}\in \mathcal {A}_{d}\) indicates the distance in kilometers between MAs \(m_{i}\) and \(m_{j}\).

  • Queen-based matrix \(\mathcal {A}_{q}\). This adjacency matrix is based on the Queen neighborhood concept (see Section 3). Thus, a cell \(a_{ij}\in \mathcal {A}_{q}\) takes value 1 when MAs \(m_{i}\) and \(m_{j}\) are Queen-based neighbors, 0 otherwise.

  • Gravity-based matrix \(\mathcal {A}_{gr}\). This adjacency matrix is based on the gravity model for human mobility [39]. Briefly, this model establishes that the magnitude of displacements \(MD_{a,b}\) between two regions \(r_{a}\), \(r_{b}\) can be computed with the formula \(MD_{a,b}=\frac {P_{a}\times P_{b}}{Dist_{a,b}}\), where \(P_{\{a,b\}}\) is the population of each region and \(Dist_{a,b}\) is the distance between them. This weighted adjacency matrix is populated with such a model, so that a cell \(a_{ij} \in \mathcal {A}_{gr}\) is set to \(MD_{i,j}\).

  • Queen gravity-based matrix \(\mathcal {A}_{qgr}\). This matrix combines the gravity model and the Queen adjacency concept. A cell \(a_{ij}\in \mathcal {A}_{qgr}\) takes value 1 if \(m_{j}\) is a Queen neighbor of \(m_{i}\) or if it is one of the top 100 MAs with the largest magnitude of displacements with \(m_{i}\), 0 otherwise. With this matrix, two MAs are adjacent if they are geographically close (due to the Queen neighborhood) or if, theoretically, a large flow of people should move between them (because of the gravity model).

  • Spatial-lag based matrix \(\mathcal {A}_{sl}\). This last matrix is based on the spatial lag concept described in Section 3. Each cell \(a_{ij}\in \mathcal {A}_{sl}\) is defined as \(a_{ij}=|sl_{i}-sl_{j}|\), where \(sl_{\{i,j\}}\) is the spatial lag of each MA. Hence, this matrix captures the similarity among areas in terms of the outgoing trips of their neighbors.

Notably, all the matrices consider the spatial closeness among areas at different degrees along with other external factors such as their population or spatial lag. Moreover, whereas \(\mathcal {A}_{q}\) and \(\mathcal {A}_{qgr}\) are unweighted adjacency matrices, \(\mathcal {A}_{d}\), \(\mathcal {A}_{gr}\) and \(\mathcal {A}_{sl}\) are weighted. This allows testing the model by considering different latent relationships among areas.
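As an illustration of how the gravity-based and Queen gravity-based matrices might be assembled, consider the following sketch for four toy areas. The populations, distances and Queen adjacency are made-up values; the function names, the `top_k` parameter (set to 100 in the study, to 1 here) and the decision to leave the diagonal of the gravity matrix at zero are assumptions of this example.

```python
def gravity_matrix(pop, dist):
    """a_ij = P_i * P_j / Dist_ij for i != j; the diagonal is left at 0."""
    n = len(pop)
    return [
        [pop[i] * pop[j] / dist[i][j] if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]

def queen_gravity_matrix(queen, gravity, top_k):
    """a_ij = 1 if j is a Queen neighbor of i or one of i's top_k gravity partners."""
    n = len(queen)
    result = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: gravity[i][j],
            reverse=True,
        )
        top = set(ranked[:top_k])
        result.append([1 if (queen[i][j] == 1 or j in top) else 0 for j in range(n)])
    return result

# Made-up populations, pairwise distances (km) and Queen adjacency.
pop = [100, 200, 50, 400]
dist = [
    [0, 10, 20, 40],
    [10, 0, 10, 20],
    [20, 10, 0, 4],
    [40, 20, 4, 0],
]
queen = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
A_gr = gravity_matrix(pop, dist)
A_qgr = queen_gravity_matrix(queen, A_gr, top_k=1)
```

In this toy case area 1 gains a link to the non-adjacent area 3 because of their large populations, whereas area 3 does not link back since its own strongest partner is area 2. Under this reading of the definition, the resulting matrix is therefore not necessarily symmetric.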

5 Evaluation of the model

This section evaluates the model described in Section 4 using the nation-wide mobility feed described in Section 3. A comparison with other alternative approaches for nation-wide human mobility prediction is also included as part of the evaluation.

5.1 Dataset pre-processing

For this evaluation we have set \(h_{prev}=12\), so that the model uses the last 12 hourly outgoing trips of each \(m_{i}\) to forecast the number of outgoing trips T hours ahead, \(\hat {ot}_{i}^{t+T}\). Consequently, a set of 12 matrices \(\mathcal {OT}_{t-j}\in \mathbb {R}^{N \times 12} \forall j \in \langle 0,..,11 \rangle \forall t \in \mathcal {T}\) has been composed. It is worth mentioning that the whole 9-month dataset described in Section 3 has been used for the evaluation.
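The windowing step can be sketched as follows for a single MA. The series is synthetic and the function name `make_windows` is hypothetical; the sketch only shows how a history of 12 hourly values is paired with the value T hours ahead.

```python
def make_windows(series, h_prev, horizon):
    """Build (history, target) pairs from one MA's hourly outgoing trips.

    history holds the last h_prev values up to time t (oldest first);
    target is the value observed horizon hours after t.
    """
    samples = []
    for t in range(h_prev - 1, len(series) - horizon):
        history = series[t - h_prev + 1 : t + 1]
        target = series[t + horizon]
        samples.append((history, target))
    return samples

series = list(range(20))  # made-up hourly counts 0..19
samples = make_windows(series, h_prev=12, horizon=1)
```

Applying the same procedure to every MA and stacking the histories row by row would give one N × 12 feature matrix per time instant.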

5.2 GNN configuration

Table 3 shows the key parameters of the GNN model used in the evaluation. These values were obtained with a grid-search approach using a training rate (tr) of 0.9.

Table 3 Model parameters for the experiments

Five different GNN models were actually composed, each with a particular adjacency matrix as defined in Section 4.4. Nevertheless, all share the same configuration parameters listed in Table 3. Therefore, it is possible to easily compare the effect of each adjacency matrix on the accuracy of the final model.

5.3 Baseline methods

We have compared our approach with three baseline and state-of-the-art models:

  • Autoregressive Integrated Moving Average model (ARIMA) [21]. This is a foremost predictor within the time series analysis domain. In brief, it fits a regressive model to a time series to predict new values.

  • LSTM network. An RNN using LSTM cells has also been developed as a candidate predictor. As discussed in Section 2, this is a well-known method for human mobility prediction in different domains.

  • Finally, a naive method that just returns the last value \(o{t_{i}^{t}}\) in each series as the predicted value \(\hat {ot}_{i}^{t+T}\). This type of straightforward mechanism is sometimes quite difficult to outperform.

For clarity, Table 4 includes the key parameters of the ARIMA and LSTM models listed above.

Table 4 Parameters of the candidate models

Notably, each of the three candidate models was individually fit for each MA in the dataset. Therefore, 3216 models of each candidate were generated for the evaluation performed in this work. In this manner, we can compare our solution, which comprises a single model covering all the MAs in the dataset, with an ensemble of models targeting each MA in an isolated manner.

5.4 Metrics

Three well-known metrics in the mobility forecasting domain have been used to evaluate our approach and the baseline methods:

  • Mean Absolute Error (MAE): MAE= \(\frac {1}{N} \times \sum \limits _{i=1}^{N} |ot^{t+1}_{i} - \hat {ot}^{t+1}_{i}|\)

  • Mean Squared Error (MSE): MSE= \(\frac {1}{N} \times \sum \limits _{i=1}^{N} (ot^{t+1}_{i} - \hat {ot}^{t+1}_{i})^{2}\)

  • Root Mean Squared Error (RMSE): RMSE= \(\sqrt {MSE}\)
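For reference, the three metrics can be computed as in the sketch below, where each position of the lists corresponds to one MA. The actual and predicted values are made-up numbers used only to illustrate the arithmetic.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error over the N areas."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error over the N areas."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error, the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

actual = [100.0, 250.0, 80.0, 400.0]     # made-up outgoing trips per MA
predicted = [110.0, 240.0, 80.0, 370.0]  # made-up predictions
scores = (mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

Here the absolute errors are 10, 10, 0 and 30, giving an MAE of 12.5 and an MSE of 275. The MSE and RMSE penalize the single large error more strongly than the MAE does.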

5.5 Results discussion

Table 5 reports the evaluation results for the three metrics over different prediction horizons (T). First, it shows the results of the different adjacency matrices for the proposed GNN model. The highest accuracy for most of the prediction horizons is achieved by using the adjacency matrix based on the gravity model (GNNgr), with MAEs ranging from 136.877 to 156.773 for T up to 6h. In contrast, the model using the distance matrix achieves a worse result, with an MAE around 200 in all cases.

Table 5 Metric values of the candidate models for different time horizons (T)

Furthermore, the GNN based on the gravity model combined with the Queen neighborhood, GNNqgr, achieves a result quite similar to that of the model with the spatial-lag adjacency matrix (GNNsl) for short prediction horizons (T = 1h). This might indicate that both models capture different but inter-related connections among MAs. This is plausible because both types of matrices consider, at different levels, the number of trips emitted by each MA and their geographical connections with their neighborhoods.

Regarding the unweighted adjacency matrices (GNNq and GNNqgr), Table 5 shows that enriching the Queen-based neighborhood with relevant connections extracted from the gravity model creates a more robust model. Hence, the GNN based on this matrix (GNNqgr) obtains better results in the three metrics than the one based solely on the Queen neighborhood (GNNq) for all the evaluated prediction horizons. Besides, it is worth mentioning that the GNNqgr model achieved the highest accuracy for T = 12h (MAE value of 159.373). This suggests that the adjacency relationships and the most relevant gravity-based connections included in this model (but not in the GNNgr matrix) help to better capture long-term mobility behaviors. This is reasonable because, for such a long prediction horizon, most of the knowledge captured by the models might correspond to regular commuting patterns. Therefore, one can assume that most of such patterns co-occur in spatially adjacent regions or between regions with large populations.

Next, we compared the results of the GNN models with the baseline methods. Table 5 shows that all the alternative versions of the proposed GNN outperform the LSTM and naïve approaches. In contrast, we observe that the ARIMA algorithm obtains the lowest error in the three metrics. However, ARIMA requires generating an individual model for each MA, whereas the GNN solution only involves a single model. Therefore, our GNN model would involve a much simpler infrastructure in a production setting than ARIMA.

Figure 10 shows the geographical distribution of the MAE for each of the GNN and baseline methods. One interesting finding from this distribution is that the prediction capability of all the evaluated models varies meaningfully depending on the particular area. In more detail, the highest accuracy (lighter colors in the figure) is mostly achieved in the central regions of Spain, whereas the lowest one (darker colors) mostly occurs in the coastal regions. Indeed, these two types of regions have very different population profiles. Whereas the central regions of Spain usually have very low population densities (except for Madrid), many of the most crowded Spanish cities are located near the coast.

Fig. 10 Geographical distribution of the MAE per model in the Spanish MAs for T = 1h

To explore this correlation between population and the accuracy of the models, we built on the finding observed in Fig. 5a, i.e., the larger the population of an MA, the larger its flow of outgoing trips. Thus, Fig. 11 shows the MAE of each model with respect to the magnitude of the outgoing flows of the MAs. As shown, the accuracy of all the models degrades as the number of trips generated by the target MA increases. This is a common pattern in all the evaluated models.

Fig. 11 Evolution of the MAE per model with respect to the MAs' number of outgoing trips for T = 1h

Last, Fig. 12 compares the total number of trips generated by all the MAs in the test dataset with the corresponding prediction of GNNgr. We have chosen this model as it was the GNN instance with the best results (Table 5). The predicted value is computed by aggregating the per-MA predictions made by the GNN model at each time instant. As we can see, the model closely follows the overall trend of trips.
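The aggregation described above can be sketched as follows. This is only an illustrative sketch: the array layout (one row per hour, one column per MA), the helper name `total_trips_per_hour` and the toy values are assumptions made here, not taken from the dataset or from the original implementation.

```python
import numpy as np


def total_trips_per_hour(per_ma_trips: np.ndarray) -> np.ndarray:
    """Sum an array of shape (time_steps, num_mas) over the MA axis."""
    return per_ma_trips.sum(axis=1)


# Toy values for 3 hours and 4 MAs (illustrative only, not real data).
observed = np.array([[120, 80, 40, 10],
                     [150, 90, 35, 12],
                     [100, 70, 50, 8]], dtype=float)
predicted = np.array([[110, 85, 42, 9],
                      [140, 95, 30, 15],
                      [105, 65, 48, 10]], dtype=float)

# Nationwide totals per hour, as compared in the aggregated plot.
observed_total = total_trips_per_hour(observed)
predicted_total = total_trips_per_hour(predicted)

# Absolute deviation of the aggregated prediction at each hour.
total_abs_error = np.abs(observed_total - predicted_total)
```

Note that per-MA errors of opposite sign cancel out in the aggregate, so the deviation of the total is usually smaller than the sum of the per-MA absolute errors.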

Fig. 12 Comparison of the total number of outgoing trips and the aggregated predictions of GNNgr for T = 1h

The results show that the GNN solution achieves a high accuracy for human mobility prediction on a nationwide scale. Our approach can anticipate the outgoing trips of MAs with an accuracy similar to that of models trained for each MA individually. Using a single model for the overall prediction, instead of individual models for each MA, brings important benefits in terms of deployment and usage in real infrastructures.

In operational terms, the GNNgr model has an MAE of roughly 137 trips per hour (Table 5). Considering the volume of trips emitted by the MAs, this prediction deviation seems small enough to make the solution suitable for a real-world setting. Therefore, it could be used as part of a decision-support system to help authorities order the pre-emptive closure of certain MAs in a pandemic scenario. For example, the system could trigger an alarm when the predicted number of outgoing trips of a region at certain hours exceeds a threshold based on its current epidemiological state. Likewise, a similar approach could be used to proactively adapt the prices of certain long-distance means of transport of a region based on the predicted outgoing flow of travelers at certain hours.
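A minimal sketch of such an alarm rule is given below. The MA identifiers, the threshold values and the function name `areas_to_alert` are hypothetical; how thresholds would be derived from the epidemiological state of each region is outside the scope of this sketch.

```python
from typing import Dict, List


def areas_to_alert(predicted_trips: Dict[str, float],
                   thresholds: Dict[str, float]) -> List[str]:
    """Return the MAs whose predicted outgoing trips exceed their threshold.

    MAs without a configured threshold are ignored.
    """
    return sorted(
        ma for ma, trips in predicted_trips.items()
        if ma in thresholds and trips > thresholds[ma]
    )


# Hypothetical predictions for the next hour and per-MA thresholds.
predicted = {"MA_001": 950.0, "MA_002": 310.0, "MA_003": 1200.0}
thresholds = {"MA_001": 1000.0, "MA_002": 300.0, "MA_003": 1500.0}

alerts = areas_to_alert(predicted, thresholds)
```

In this toy case only `MA_002` is flagged, since it is the only area whose predicted flow exceeds its threshold.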

5.6 Mobility prediction including weather condition

One key feature with an impact on the human mobility behavior in a region is its current weather condition [40]. Therefore, we enriched our prediction approach by considering the temperature at each MA as a new contextual input of the model.

Recalling the formulation of the problem stated in Section 4.1, in this case we intend to find a mapping function \(\mathcal {F}'\):

$${\mathcal{F}'(\mathcal{OT}_{i}^{h}, \mathcal{W}_{i}^{h}) \rightarrow ot_{i}^{t+T}}$$

where \( \mathcal {W}_{i}^{h} = \langle w_{i}^{t}, w_{i}^{t-1}, \dots , w_{i}^{t-h_{prev}}\rangle \) is the temperature sequence of each MA during the last \(h_{prev}\) hours, \(\forall m_{i} \in {\mathscr{M}}\).

To collect the meteorological data, we made use of the Reliable Prognosis web service (Footnote 3). This platform provides an open repository with the meteorological conditions collected by multiple weather stations deployed at national and international airports worldwide. We extracted the temperature values covering the study period from the weather stations of all the Spanish airports available in the platform. Then, each MA was associated with the temperature sequence \(\mathcal {W}\) of its closest airport. Fig. 13 shows the resulting association, in which 55 airport locations were extracted.
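The MA-to-airport association can be sketched as a nearest-neighbour search. The sketch below assumes that each MA is represented by a centroid and that closeness is measured with the great-circle (haversine) distance; neither detail is stated in the text. The coordinates are rough, illustrative values.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def closest_airport(ma_coords, airport_coords):
    """Return, for each MA, the index of its nearest airport.

    ma_coords: (N, 2) array of (lat, lon); airport_coords: (K, 2) array.
    """
    ma = np.asarray(ma_coords, dtype=float)
    ap = np.asarray(airport_coords, dtype=float)
    # Broadcast to an (N, K) matrix of pairwise distances.
    dist = haversine_km(ma[:, None, 0], ma[:, None, 1],
                        ap[None, :, 0], ap[None, :, 1])
    return dist.argmin(axis=1)


# Approximate (lat, lon) of three airports: Madrid, Barcelona, Malaga.
airports = [(40.47, -3.56), (41.30, 2.08), (36.67, -4.50)]
# Approximate centroids of three hypothetical MAs near Toledo, Girona, Marbella.
mas = [(39.86, -4.02), (41.98, 2.82), (36.51, -4.88)]

assignment = closest_airport(mas, airports)
```

Each MA then inherits the temperature sequence of the airport at index `assignment[i]`, so all the MAs assigned to the same airport share identical weather inputs.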

Fig. 13 Association between MAs and airports. Each black dot indicates the location of an airport. The MAs around each dot with the same color indicate the areas covered by the airport (the area of influence of the airport)

Next, we trained a new GNN. Unlike in the previous experiment, this model took as input a feature matrix \(\mathcal {OTW}_{t} \in \mathbb {R}^{N\times 12\times 2}\), \(\forall t \in \mathcal {T}\), comprising the last 12 outgoing-trip and temperature values of the MAs at time instant t. Table 6 shows the parameters of the model, obtained following a grid-search approach with tr set to 0.9. Given the results discussed in Section 5.5, the model used the \(\mathcal {A}_{gr}\) matrix as the adjacency matrix.
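As an illustration of how such an input could be assembled, the sketch below stacks the trip and temperature windows into a tensor of shape N × 12 × 2. It assumes that both series are stored as arrays of shape (hours, N), that the window includes the current instant t, and that values are ordered from oldest to newest; these choices, as well as the function name `build_features`, are assumptions rather than details given in the text.

```python
import numpy as np


def build_features(trips: np.ndarray, temps: np.ndarray,
                   t: int, window: int = 12) -> np.ndarray:
    """Build the (N, window, 2) input for time index t.

    trips, temps: arrays of shape (num_hours, N), aligned in time.
    Channel 0 holds outgoing trips, channel 1 holds temperatures.
    """
    if trips.shape != temps.shape:
        raise ValueError("trips and temps must have the same shape")
    if t + 1 < window or t >= trips.shape[0]:
        raise ValueError("time index t does not allow a full window")
    trips_win = trips[t - window + 1: t + 1].T  # (N, window)
    temps_win = temps[t - window + 1: t + 1].T  # (N, window)
    return np.stack([trips_win, temps_win], axis=-1)


# Toy series: 24 hours, 3 MAs (illustrative values only).
trips = np.arange(72, dtype=float).reshape(24, 3)
temps = np.full((24, 3), 20.0)

features = build_features(trips, temps, t=23)
```

Here `features[i, :, 0]` is the recent trip history of MA i and `features[i, :, 1]` is the temperature history of its associated airport.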

Table 6 Model parameters for the experiments with weather data

Table 7 shows the metric values of the resulting model \(GNN_{gr}^{w}\). Compared with the results obtained by the baseline GNNgr model (Table 5), this novel approach does not improve the accuracy of the proposal. In fact, the new model yielded poorer results for all the metrics and time horizons. For instance, GNNgr obtained an MAE of 136.877 for T = 1h whereas \(GNN_{gr}^{w}\) obtained 206.388. Likewise, for T = 12h the baseline model obtained an MAE of 193.796 whereas the model enriched with weather data obtained 300.965.

Table 7 Metric values of \(GNN_{gr}^{w}\) for different time horizons (T)

To find the reason for this accuracy degradation, the map in Fig. 14a shows in white those MAs where \(GNN_{gr}^{w}\) achieved better results than the baseline GNN model for all the time horizons. This map reveals a clear pattern in the spatial distribution of those MAs. As we can see, most of these improved MAs are located close to the boundary of their airport’s area of influence. Therefore, these areas receive and emit flows from and toward other MAs with different temperature sequences. Hence, this enriches the model and improves its accuracy.

Fig. 14 Geographical distribution of the MAs according to the \(GNN_{gr}^{w}\) results. The red dots indicate the location of the airports

Furthermore, Fig. 14b depicts in black those MAs where \(GNN_{gr}^{w}\) obtained an MAE, MSE or RMSE above the 90th percentile of such metrics with respect to GNNgr. Here, we can also see a clear spatial pattern because most of these degraded MAs are in the center of their airports’ area of influence (instead of the borders). Therefore, these MAs are mostly influenced by surrounding areas whose temperatures are all the same, as they belong to the same weather area defined by an airport. Consequently, the new input layer comprising such temperatures just adds complexity to the model without providing relevant knowledge.
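The two groups of MAs can be identified with a few array operations. The sketch below follows one possible reading of the criteria: an MA is improved when the weather-enriched model has a lower error than the baseline on every horizon, and degraded when its error exceeds the 90th percentile of the baseline per-MA errors. The exact percentile criterion used in the study may differ, and the error values are invented for illustration.

```python
import numpy as np


def improved_mask(err_weather: np.ndarray, err_base: np.ndarray) -> np.ndarray:
    """True for MAs where the weather model beats the baseline on all horizons.

    Both inputs have shape (num_horizons, N).
    """
    return (err_weather < err_base).all(axis=0)


def degraded_mask(err_weather: np.ndarray, err_base: np.ndarray,
                  q: float = 90.0) -> np.ndarray:
    """True for MAs whose weather-model error exceeds the q-th percentile
    of the baseline errors (assumed criterion). Inputs have shape (N,)."""
    return err_weather > np.percentile(err_base, q)


# Invented per-MA MAE values for 2 horizons and 5 MAs.
err_base = np.array([[100.0, 120.0, 90.0, 150.0, 110.0],
                     [140.0, 160.0, 130.0, 200.0, 150.0]])
err_weather = np.array([[95.0, 180.0, 85.0, 400.0, 130.0],
                        [135.0, 250.0, 125.0, 500.0, 170.0]])

improved = improved_mask(err_weather, err_base)
degraded = degraded_mask(err_weather[0], err_base[0])
```

The resulting boolean masks can then be joined with the MA geometries to draw maps similar to the ones discussed above.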

Besides, Fig. 15 shows the number of trips emitted by improved and degraded MAs. From this figure, it is possible to see that both groups of MAs emitted trips on a similar scale. Consequently, the improvement and degradation of the GNN accuracy do not seem to be influenced by such a factor.

Fig. 15 Range of outgoing trips of the MAs based on whether they are improved by \(GNN_{gr}^{w}\)

Last, we can conclude that the key drawback of including weather data for mobility prediction was that the spatial scale defining the outgoing trips differed from the scale used for the temperature data. Whereas the former was defined at MA granularity, the latter was based on the airports’ areas of influence, which cover broader land areas. This mismatch prevented the model from leveraging the new input data source appropriately. One possible solution would be to inject a different weather data stream for each MA, following the open-data approach this study advocates. However, in operational terms, it would be rather difficult to find, collect and process such streams from numerous weather stations. Besides, we cannot expect all the MAs to have a functional weather station, especially those covering rural and sparsely populated regions. Another solution would be to aggregate the trip flows so that they are defined at the same spatial scale as the temperature stream. Nonetheless, this would reduce the utility of the proposal, as the spatial granularity of the predictions would not fit many potential use cases.

6 Conclusion and future works

The study of human mobility is gaining momentum due to the continuous increase of data produced by different types of mobile devices and to the socioeconomic benefits obtained from its range of applications, from traffic forecasting and urban planning to the prediction of human flows in the context of a pandemic such as COVID-19. However, most studies focus on scenarios covering relatively small distances, such as intra-urban scenarios. To extend the utility of human mobility data, this study deals with a nationwide dataset to analyze its feasibility for forecasting large-scale human flows. In particular, we studied a dataset published by the SMT that includes trip data throughout Spain during a 9-month period.

The analysis of this large-scale trip data reveals that the connections between origin and destination areas could be considered as a sparse graph. Therefore, we propose the use of a GNN as a compelling alternative to forecast the number of outgoing trips within an hour from any mobility area identified in the dataset. With this GNN, several factors such as the spatial distance among areas, their population, or their spatial lag can be included in the study. The results show that this technique achieves a high accuracy in the predicted number of trips for multiple time horizons up to 12 hours, outperforming other techniques such as LSTM and with a similar performance to ARIMA, but with a notable advantage. Our proposal only needs a single model for processing the whole dataset, whereas ARIMA requires a different model for each mobility area in the study (up to 3216 in this study). This may be a determining factor when deploying the model in a real infrastructure in terms of simplicity and computational efficiency. Last, the influence of weather conditions on the mobility prediction has also been studied; however, the results do not show a general improvement over the baseline dataset due to the different granularity of both types of data.

In the future, we plan to study the accuracy of our model when applied to other nationwide datasets to validate the model at a general level. An extension of the forecasting period will also be explored in these experiments. Another line of research consists in combining the mobility dataset with data from Online Social Networks to check the feasibility of the latter as a mobility predictor.