1 Introduction

The number of air passenger trips has continued to increase in recent years; in 2023, the number of Chinese air passengers exceeded 620 million. The airport light rail transit line (ALRTL), as one of an airport's land-based transportation modes, typically handles 10–30% of passenger transfers [1], for example reaching 32.1% at Beijing Daxing International Airport [2]. The growing demand for air travel poses a challenge for short-term passenger flow forecasting, which is a crucial aspect of ALRTL operations [3]. In urban rail transit systems, short-term passenger flow forecasting has garnered significant attention, but research on the ALRTL remains insufficient. Passenger flow is characterized by nonlinearity, non-stationarity, uncertainty, and periodicity [4]. However, owing to its different passenger composition, ALRTL passenger flow differs from that of ordinary urban rail transit, with pronounced differences between weekdays and weekends.

The passenger flow on the ALRTL can be further categorized into inbound flow and outbound flow. The inbound direction is linked to the arrival flight wave, which is usually used as the main feature for prediction, whereas the correlation between outbound passenger flow and departure flight waves is weaker, making outbound forecasting more challenging and in need of further research.

Forecasting the outbound passenger flow of the ALRTL is meaningful for both passengers and airport managers. Because the airport is one of the main transfer nodes, the forecast results reflect the congestion level of the ALRTL and may help passengers plan their travel in advance [5]. Meanwhile, the forecast results help airport managers deploy security measures in advance and mitigate the risks posed by large passenger flows.

From the perspective of short-term passenger flow forecasting, the primary research avenues include time series techniques, artificial intelligence, and hybrid models combining multiple methods. Time series techniques apply smoothing to reveal the underlying trend, periodicity, and seasonality of the data, and include the autoregressive integrated moving average (ARIMA) model [6] and exponential smoothing methods [7]. In recent years, artificial intelligence methodologies such as statistical learning techniques [8] and artificial neural networks [9, 10] have gained widespread use. Liu et al. [11] introduced a forecasting model in which particle swarm optimization (PSO) was used to optimize the hyperparameters of a long short-term memory (LSTM) model, making predictions from automated fare collection (AFC) data with enhanced accuracy. Du et al. [12] also built on the LSTM model, combining it with a deep irregular convolution residual (DST-ICRL) structure to account for the irregularity of multiple channels. Artificial intelligence methods are widely used and can adaptively learn parameters from data; nonetheless, they often demand substantial amounts of data for effective learning.

To address the limitations of single models and enhance predictive performance, researchers have in recent years shifted their attention toward the integration of multiple models [13,14,15]. In a combined model, two or more models complement each other to achieve higher accuracy in passenger flow prediction. This approach leverages the strengths of the different models, thereby yielding more robust and accurate predictions.

In multidimensional data scenarios, the various types of data must be processed by different models and then organically integrated to form combined models. Ma et al. [16] used daily and historical passenger flow data as input, applied the Kalman filter algorithm followed by a K-nearest neighbor fusion algorithm to formulate predictions, and weighted and regrouped the prediction results to construct a forecasting model. He et al. [17] introduced a combined model of multi-image convolution and a recurrent neural network (RNN) that considers the temporal–spatial correlation between urban metro stations to predict passenger flow.

In one-dimensional data scenarios, the combined model uses different methods at different stages. Several scholars have proposed prediction models that combine coarse-grained and fine-grained data to obtain locally more accurate results. Jing et al. [18] proposed a hybrid model for short-term forecasting of passenger flow at Chengdu East Railway Station, combining a light gradient boosting decision tree, LSTM, and dynamic regression approaches.

The application of decomposition techniques to passenger flow time series is also a research hotspot. A decomposition algorithm is typically combined with a predictive model, with the decomposition algorithm used to extract simpler periodic components from the complex passenger flow time series, thereby reducing the cost of training. Fu et al. [19] presented a combination of ensemble empirical mode decomposition (EEMD) and a backpropagation (BP) neural network to forecast incoming passenger flow at a subway station. Li et al. [20] studied monthly passenger flow at railway stations and built basic models with an extreme learning machine (ELM) and BP; when these were paired with empirical mode decomposition (EMD) or variational mode decomposition (VMD), the VMD-BP method performed best in the comparison experiments. Chen et al. [21] used an EMD-LSTM hybrid model to forecast passenger flow, in which Kendall correlation analysis was used to select the components with higher correlations.

To summarize, combined models exhibit superior performance relative to individual models. In this work, we concentrate on the short-term forecasting of ALRTL passenger flow, which exhibits nonlinearity, uncertainty, and periodicity, and we propose a forecasting model that combines the Holt–Winters additive model (HWAM), EMD, and the gated recurrent unit (GRU), which we call HWAM-EMD-GRU. Firstly, HWAM is used to extend the passenger flow series on the right side (the historical dataset is used to extend the left side) to eliminate the edge effect of EMD. Secondly, the EMD method is applied to decompose the passenger flow and obtain decomposed components, namely the intrinsic mode functions (IMFs) and the residual (Res). Then, correlation analysis is used to merge low-correlation IMFs into the Res to form aggregated components; the aggregated components have simpler periodicity and are easier for general predictive models to predict. Finally, the GRU network is employed to forecast each component, and the component predictions are reconstructed to obtain the final forecast. To verify the predictive performance of the HWAM-EMD-GRU model, we performed empirical studies using hourly outbound passenger data from Daxing Airport Station spanning 12 days.

The remainder of this paper is structured as follows. In the next section, we propose the HWAM-EMD method to eliminate the edge effects and decompose the passenger flow. In Section 3, we introduce the HWAM-EMD-GRU forecasting model and provide the general forecasting process; in this model, the GRU serves as the predictive module, forecasting each component of the original passenger flow, and the component forecasts are reconstructed to obtain the final prediction. In Section 4, we conduct a case study of Daxing Airport Station on the Beijing Daxing International Airport Express to evaluate the predictive performance of the combined HWAM-EMD-GRU model. Section 5 presents the conclusion.

2 Hybrid HWAM-EMD Decomposition Method

In this section, we employ the EMD method to decompose the passenger flow to extract the underlying periodicity and trend, which can facilitate passenger flow forecasting. However, the edge effect of EMD will lead to data drift at both ends of the IMF components. To address this issue, we utilize the HWAM technique to extend the passenger flow data. Subsequently, we discard the extended part after the decomposition process to mitigate the impact of the edge effect.

2.1 Triple Exponential Smoothing Additive Model

Triple exponential smoothing is a time series prediction method proposed by Holt and Winters [22] and is widely used when the series exhibits both trend and seasonal characteristics. Here, the seasonal cycle is taken as one operating day of the ALRTL, and this seasonal pattern remains roughly constant over time, so we chose the additive form, called the Holt–Winters additive model (HWAM), whose core formula is as follows:

$$\hat{y}_{i + h|i} = l_{i} + hb_{i} + s_{i - m + 1 + \left[ \left( h - 1 \right) \bmod m \right]}$$
(1)

The level component \(l_{i}\) is expressed as

$$l_{i} = \theta_{l} \left( {y_{i} - s_{i - m} } \right) + \left( {1 - \theta_{l} } \right)\left( {l_{i - 1} + b_{i - 1} } \right)$$
(2)

The trend component \(b_{i}\) is expressed as

$$b_{i} = \theta_{t} \left( l_{i} - l_{i - 1} \right) + \left( 1 - \theta_{t} \right)b_{i - 1}$$
(3)

The seasonal component \(s_{i}\) is expressed as

$$s_{i} = \theta_{s} \left( {y_{i} - l_{i - 1} - b_{i - 1} } \right) + \left( {1 - \theta_{s} } \right)s_{i - m}$$
(4)

The initial value condition is expressed as

$$s_{0} = y_{0}$$
(5)

where \(i\) and \(h\) are time indices, \(y_{i}\) is the historical passenger flow at time \(i\), \(\hat{y}_{i+h|i}\) is the prediction for time \(i+h\) made using the data available at time \(i\), and \(m\) is the seasonal frequency, here equal to the number of hourly time slots in one operating day of the ALRTL. \(\theta_{l}\), \(\theta_{t}\), and \(\theta_{s}\) are the level, trend, and seasonal smoothing parameters, respectively.
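As an illustrative sketch (not the authors' original implementation), the HWAM of Eqs. (1)–(5) can be fitted with the ExponentialSmoothing class from statsmodels; the file name, the assumption of 17 hourly slots per operating day (6:00–23:00), and the one-day forecast horizon are assumptions introduced here.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

SLOTS_PER_DAY = 17  # assumed hourly slots per operating day (6:00-23:00)

# Hypothetical file holding the cleaned hourly outbound flow, one value per slot.
flow = pd.Series(np.loadtxt("hourly_outbound_flow.txt"))

# Additive trend and additive seasonality correspond to the HWAM of Eqs. (1)-(5);
# the smoothing parameters theta_l, theta_t, theta_s are estimated by fit().
model = ExponentialSmoothing(
    flow, trend="add", seasonal="add", seasonal_periods=SLOTS_PER_DAY
)
fit = model.fit()

# One operating day ahead, later used as the right-hand extension (Section 2.3).
right_extension = fit.forecast(SLOTS_PER_DAY)
```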

2.2 Empirical Mode Decomposition Method

The EMD [23] technique can adaptively decompose a signal into its constituent IMFs and a Res. The IMFs capture the different scales or modes of variation present in the data. The Res represents the remaining trend or low-frequency components that cannot be further decomposed into IMFs.

Application of the IMFs satisfies the following two characteristics [23]:

2.2.1 Extreme and Zero-Crossing Points

Extreme points are the local maxima and minima of the original signal, and zero-crossing points are the points where the signal changes sign. In EMD, a necessary condition for a signal to be considered an IMF is that the number of extreme points and the number of zero-crossing points differ by at most one.

2.2.2 Upper and Lower Envelopes

EMD also involves the determination of upper and lower envelopes. Cubic spline interpolation is applied to construct the upper and lower envelopes of the signal, where the upper (lower) envelope is fitted through the local maxima (minima) of the data; for an IMF, the mean of the two envelopes should be zero at every point.
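The two conditions above can be checked numerically. The sketch below uses SciPy's argrelextrema and CubicSpline; the helper name imf_diagnostics is hypothetical, and the signal is assumed to have enough interior extrema for the spline fit.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def imf_diagnostics(x: np.ndarray):
    """Check the two IMF conditions for a 1-D signal (illustrative only)."""
    t = np.arange(len(x))

    # Condition 1: the numbers of extrema and zero crossings differ by at most one.
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    n_extrema = len(maxima) + len(minima)
    n_zero_cross = int(np.count_nonzero(np.diff(np.sign(x)) != 0))

    # Condition 2: the mean of the cubic-spline upper and lower envelopes
    # should be close to zero everywhere.
    upper = CubicSpline(maxima, x[maxima])(t)
    lower = CubicSpline(minima, x[minima])(t)
    mean_env = (upper + lower) / 2.0

    return abs(n_extrema - n_zero_cross) <= 1, float(np.max(np.abs(mean_env)))
```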

The hourly passenger flow data of the ALRTL can be treated as a time series, which can be decomposed into IMFs and a Res. The IMFs are independent of each other, and their number is determined by the local characteristic time scale of the passenger flow; the Res retains the trend information of the original signal and the periodic information beyond the current data scale. Applying EMD, the passenger flow data \(y(\tau)\) can be decomposed into \(n\) IMFs and one residual Res, expressed as follows:

$$y\left( \tau \right) = \mathop \sum \limits_{j = 1}^{n} c_{j} \left( \tau \right) + r\left( \tau \right)$$
(6)

where \(y(\tau)\) is the time series, \(\tau\) is the time index, \(c_{j}(\tau)\) is the \(j\)-th IMF, \(r(\tau)\) is the Res, and \(j\) is the index of the IMFs.
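As a minimal sketch of Eq. (6), assuming the open-source PyEMD (EMD-signal) package and a synthetic stand-in series rather than the actual data:

```python
import numpy as np
from PyEMD import EMD  # the PyEMD (EMD-signal) package is assumed

# Synthetic stand-in for the hourly flow: daily cycle plus trend and noise.
t = np.arange(24 * 12, dtype=float)
y = 500 + 5 * t / 24 + 200 * np.sin(2 * np.pi * t / 24) + 30 * np.random.randn(t.size)

emd = EMD()
emd.emd(y)                                  # sifting process
imfs, res = emd.get_imfs_and_residue()      # c_j(tau) and r(tau)

# Eq. (6): the components reconstruct the original signal.
assert np.allclose(imfs.sum(axis=0) + res, y)
print(f"{imfs.shape[0]} IMFs and one residual extracted")
```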

2.3 Empirical Mode Decomposition Edge Effect

The EMD edge effect [24] is manifested as data drift at both ends of the IMF component, since the two ends of the series may not be exactly extreme points when using cubic spline interpolation. This results in envelope distortion and affects the accuracy of the IMF components at both ends, which in turn will cause a large deviation in the subsequent prediction results of each component. To solve the edge effect of EMD, researchers have proposed a diverse range of continuation methods, including the linear regression method [25], the Coughlin method [26], and the Rato method [27]. The principle of these methods is to extend the original signal, then decompose it by EMD and delete the part of the IMFs corresponding to the range of extension, thereby reducing the influence of EMD edge effects on the signal decomposition results.

To address the specific characteristics of hourly passenger flow data on the ALRTL, this paper introduces a bidirectional extension technique that uses HWAM and historical data to eliminate the edge effect. The left edge is extended with one period (usually 24 hours) of historical data, and the right edge is extended with predictions: using the HWAM method, we employ all available historical data in the research period (including the data of the left extension) to forecast one period of data as the right extension. The hourly passenger flow of the ALRTL has obvious periodicity, so the HWAM method can achieve high prediction accuracy for this extension.
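A sketch of this bidirectional extension is given below; it reuses the statsmodels and PyEMD calls from the earlier sketches, and the function name decompose_with_extension and the 17-slot operating day are assumptions introduced here.

```python
import numpy as np
import pandas as pd
from PyEMD import EMD
from statsmodels.tsa.holtwinters import ExponentialSmoothing

SLOTS_PER_DAY = 17  # assumed hourly slots per operating day (6:00-23:00)

def decompose_with_extension(left_day: np.ndarray, flow: np.ndarray):
    """Extend the flow on both sides, decompose by EMD, then trim the extensions.

    left_day: one operating day of historical data preceding the study period.
    flow:     hourly flow of the study period.
    """
    # Right extension: one-day-ahead HWAM forecast fitted on all available history.
    history = pd.Series(np.concatenate([left_day, flow]))
    fit = ExponentialSmoothing(
        history, trend="add", seasonal="add", seasonal_periods=SLOTS_PER_DAY
    ).fit()
    right = fit.forecast(SLOTS_PER_DAY).to_numpy()

    extended = np.concatenate([left_day, flow, right])
    emd = EMD()
    emd.emd(extended)
    imfs, res = emd.get_imfs_and_residue()

    # Keep only the study period; discard both extensions and their edge distortion.
    keep = slice(len(left_day), len(left_day) + len(flow))
    return imfs[:, keep], res[keep]
```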

3 HWAM-EMD-GRU Combined Model

In this section, we propose the framework of the HWAM-EMD-GRU combined forecasting model. Using the hourly flow as the input, the framework applies the HWAM-EMD method for decomposition, generating multiple IMF components and a Res component. Next, a correlation analysis is conducted between the original passenger flow, its IMFs, and its Res. Components with high correlation are retained, while components with low correlation are merged. Finally, the individual predictions of each reconstructed component are made using the GRU. The predictions of each component are added together to obtain the prediction of the original passenger flow.

3.1 GRU Neural Network

The GRU network proposed by Cho et al. [28] is a lightweight variant of the RNN that is widely applied in predictive tasks such as time series forecasting. Like LSTM, the GRU alleviates the long-term dependency and vanishing-gradient problems of standard RNNs and performs well on time series prediction problems [29], but it achieves comparable predictive performance with less computational overhead.

The GRU unit structure is shown in Fig. 1. Its core is a gating structure, each gate combining an activation function with an element-wise multiplication operator; the unit comprises an update gate and a reset gate.

Fig. 1
figure 1

The structure of the GRU cell

The function of the update gate \(z_{\tau}\) is analogous to the combined roles of the input gate and forget gate in LSTM: it filters the information from the previous hidden state and passes it to the next step. The reset gate \(r_{\tau}\) determines how much past information is forgotten. The gates are expressed as follows:

$$r_{\tau } = \sigma \left( w_{r} \left[ h_{\tau - 1} ,x_{\tau } \right] + b_{r} \right)$$
(7)
$$z_{\tau } = \sigma \left( w_{z} \left[ h_{\tau - 1} ,x_{\tau } \right] + b_{z} \right)$$
(8)

where \(h_{\tau - 1}\) is the hidden state at time \(\tau - 1\), \(x_{\tau}\) is the input at time \(\tau\), \(\sigma\) is the sigmoid function, \(w_{r}\) and \(w_{z}\) are the weights of the reset and update gates, and \(b_{r}\) and \(b_{z}\) are the corresponding biases.
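For concreteness, a single GRU step can be written out in NumPy as below. Eqs. (7) and (8) give the gates; the candidate state and the final update follow the standard GRU formulation (one common sign convention), which the text does not write out, so those two lines are an assumption of this sketch.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_tau, h_prev, w_r, w_z, w_h, b_r, b_z, b_h):
    """One GRU time step (illustrative; weight shapes: (hidden, hidden + input))."""
    concat = np.concatenate([h_prev, x_tau])
    r = sigmoid(w_r @ concat + b_r)   # reset gate, Eq. (7)
    z = sigmoid(w_z @ concat + b_z)   # update gate, Eq. (8)
    # Candidate state and update (standard formulation, not given in the text).
    h_tilde = np.tanh(w_h @ np.concatenate([r * h_prev, x_tau]) + b_h)
    return (1.0 - z) * h_prev + z * h_tilde
```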

The hourly passenger flow time series of the ALRTL has obvious periodicity, so RNNs are well suited to passenger flow forecasting. Considering the limited availability of historical data in this study, we chose the more lightweight GRU network.

Each component of the original passenger flow time series is itself a time series with relatively regular periodicity, so the GRU can also be used for its prediction. Because the periodic pattern of each component is simpler than that of the original series, the GRU can be trained more fully on the decomposed data and achieve better prediction accuracy with the same amount of data.

The GRU network requires four key parameters to be determined: the lookback and the sizes of the layers. The input layer size is set to 1, corresponding to the dimension of the feature used for prediction; the output layer size is set to 1, corresponding to the dimension of the prediction label; the hidden layer size is set according to experiments; and the lookback, the amount of historical data used for each prediction, must also be determined experimentally.
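A sketch of such a network in PyTorch is shown below; the class name GRURegressor, the windowing helper, and the placement of the ReLU layer in a small dense head are assumptions introduced here, with the hidden size and lookback left as experimentally tuned parameters.

```python
import torch
import torch.nn as nn

class GRURegressor(nn.Module):
    """Single-feature GRU forecaster: input size 1, output size 1 (a sketch)."""

    def __init__(self, hidden_size: int = 32, dropout: float = 0.1):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Sequential(
            nn.Dropout(dropout), nn.Linear(hidden_size, hidden_size),
            nn.ReLU(), nn.Linear(hidden_size, 1),
        )

    def forward(self, x):                    # x: (batch, lookback, 1)
        out, _ = self.gru(x)                 # (batch, lookback, hidden_size)
        return self.head(out[:, -1, :])      # (batch, 1)

def make_windows(series, lookback: int):
    """Turn a 1-D series into (lookback -> next value) supervised pairs."""
    x = torch.as_tensor(series, dtype=torch.float32)
    xs = torch.stack([x[i:i + lookback] for i in range(len(x) - lookback)])
    return xs.unsqueeze(-1), x[lookback:].unsqueeze(-1)
```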

Determining the optimal values for the lookback and the size of the hidden layer is crucial for achieving high prediction accuracy. However, it is important to note that the relationship between these parameters and prediction accuracy is not linear. Additionally, increasing the number of neurons can lead to higher computational overhead. Hence, it is essential to perform repeated experiments to determine the optimal values.

To find the optimal combination of parameters, the root mean square error (RMSE) is used as the selection criterion, defined as follows:

$${\text{RMSE}} = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }$$
(9)

where \(y_{i}\) is the actual hourly flow of the ALRTL at time \(i\), \(\hat{y}_{i}\) is the predicted flow of the ALRTL at time \(i\), and \(n\) is the number of time slots in the evaluation period.

3.2 HWAM-EMD-GRU Forecasting Model

The HWAM-EMD-GRU model is proposed to realize short-term forecasting of outbound passenger flow on the ALRTL. The model is composed of three basic methods: HWAM, EMD, and GRU. The EMD method decomposes the original hourly passenger flow data into multiple components with simple periodicity and a longer-term trend, which are easier to predict than the original data. The HWAM method is used to extend the series before decomposition, eliminating the component distortion caused by the EMD edge effect. The algorithm flow of the HWAM-EMD-GRU combined model is as follows:

Step 1. Time series extension of hourly flow:

Extend the flow in both directions. The left extension uses a period of historical data, and the right extension uses the prediction result of the HWAM method.

Step 2. Time series decomposition of hourly passenger flow by EMD:

Decompose the extended data into several IMFs and a Res.

Step 3. Extension part cutting of decomposed data:

Remove the extension part of the decomposed data.

Step 4. Important component identification:

Apply correlation analysis to identify important components. The Pearson correlation coefficient between the original data and each component is calculated separately, and the components with weak correlations are merged into Res.

Step 5. Component prediction and reconstruction:

The GRU is used to predict each component separately, and the predicted values of all components are then summed to reconstruct the final forecast, as sketched below.
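A minimal sketch of Steps 4 and 5 follows; the correlation threshold of 0.3 and the helper names are assumptions rather than values fixed by the paper.

```python
import numpy as np
from scipy.stats import pearsonr

def merge_components(y, imfs, res, threshold: float = 0.3):
    """Step 4: keep IMFs strongly correlated with the original flow y and
    merge the weakly correlated ones into the Res (threshold is assumed)."""
    kept, merged = [], np.asarray(res, dtype=float).copy()
    for imf in imfs:
        r, _ = pearsonr(y, imf)
        if abs(r) >= threshold:
            kept.append(imf)
        else:
            merged = merged + imf
    return kept + [merged]            # aggregated components C_1, ..., C_k

def reconstruct(component_predictions):
    """Step 5: the forecast of the original flow is the sum of the component forecasts."""
    return np.sum(component_predictions, axis=0)
```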

4 Experiment

In this section, to evaluate the predictive performance of the HWAM-EMD-GRU combined model, we conducted a numerical experiment on the desensitized hourly outbound passenger flow dataset for Daxing Airport Station on the Beijing Daxing International Airport Express. Firstly, we performed data cleaning on the desensitized dataset. Subsequently, following the prediction framework of the HWAM-EMD-GRU combined model, we conducted predictive experiments on the dataset. During the training process, we focused on optimizing certain hyperparameters and used a grid search to determine the best hyperparameter combination.

To fully validate the performance of the prediction framework, we also performed comparative experiments. We compared HWAM-EMD-GRU with the classic models GRU [28], ARIMA [6], HWAM [22], and Prophet [30] by applying these models to the same dataset for forecasting, serving as baseline models.

4.1 Passenger Flow Desensitized Dataset

The data used in this article are the hourly outbound passenger flow at Daxing Airport Station of the Beijing Daxing International Airport Express from September 7 to 18, 2020, a total of 12 days.

Considering the security of passenger travel information, the dataset is aggregated and desensitized. The details of the dataset are shown in Table 1. STAIONNAME is the name of the station on the Beijing Daxing International Airport Express; STARTTIME is the start time of the time slot (e.g., “2020-08-01 02” means that the record reflects the passenger flow between 2 o’clock and 3 o’clock on August 1, 2020); INSTATIONFLOW and OUTSTATIONFLOW are the inbound and outbound passenger counts, respectively, aggregated from AFC data. Due to privacy concerns, the original AFC data have not been provided.

Table 1 Details of desensitized data

4.2 Desensitized Data Preprocessing

Although the data have been processed in terms of aggregation and desensitization, numerous issues still persist for technical reasons. Hence, preprocessing of the data is crucial for enhancing data quality, which in turn benefits subsequent predictions. The preprocessing is performed as follows:

4.2.1 Data Filtering

We select the outbound passenger flow of the Beijing Daxing International Airport Express at Daxing Airport Station from the desensitized dataset and preserve only the data within operating hours (6:00–23:00), discarding the other periods, during which the flow remains consistently at 0.

4.2.2 Outlier Detection

The interquartile range (IQR) method is used to detect outliers, and the detected outliers are set as missing values.

4.2.3 Missing Data Imputation

The Lagrange interpolating polynomial is used to impute both the values missing from the raw data and the values marked as missing in the outlier detection step.

4.2.4 Dataset Partitioning

The desensitized passenger flow data from September 7 to 16, 2020, are used as the training set for a total of 10 days, where the data for September 7 are set as the left extension; the data from September 17 to 18, 2020, are used as the test set for a total of 2 days.
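The outlier handling and imputation steps above can be sketched as follows; the IQR multiplier of 1.5 and the use of up to three valid neighbours on each side of a gap for the local Lagrange polynomial are assumptions, as the paper does not specify them.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import lagrange

def iqr_outliers_to_nan(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Mark values outside [Q1 - k*IQR, Q3 + k*IQR] as missing (k = 1.5 assumed)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.mask((s < q1 - k * iqr) | (s > q3 + k * iqr))

def lagrange_impute(s: pd.Series, k: int = 3) -> pd.Series:
    """Fill each missing value with a Lagrange polynomial fitted to the k nearest
    valid points on each side; fitting locally keeps the polynomial stable."""
    positions = np.arange(len(s), dtype=float)
    values = s.to_numpy(dtype=float).copy()
    for i in np.flatnonzero(np.isnan(values)):
        valid = np.flatnonzero(~np.isnan(values))
        near = valid[np.argsort(np.abs(valid - i))[: 2 * k]]
        poly = lagrange(positions[near], values[near])
        values[i] = float(poly(positions[i]))
    return pd.Series(values, index=s.index)
```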

4.3 Evaluation Metrics

To assess the predictive performance of the HWAM-EMD-GRU combined model, besides RMSE, we also considered the following three performance metrics:

Mean squared error (MSE):

$${\text{MSE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2}$$
(10)

Mean absolute error (MAE):

$${\text{MAE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {y_{i} - \hat{y}_{i} } \right|$$
(11)

Mean absolute percentage error (MAPE) [31]:

$${\text{MAPE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \frac{{\left| {y_{i} - \hat{y}_{i} } \right|}}{{y_{i} }} \times 100\%$$
(12)
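These four metrics can be computed directly with NumPy, as in the sketch below; the function name is hypothetical, and the actual flow is assumed to be nonzero within operating hours so that the MAPE is well defined.

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and MAPE as defined in Eqs. (9)-(12)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err) / y_true) * 100.0),
    }
```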

4.4 Ablation Study

To investigate the impact of the HWAM method on the predictive performance of the forecasting model, an ablation study was conducted by removing the HWAM-related steps described in Section 3.2. Specifically, we remove Step 1 (the bidirectional extension) and Step 3 (the extension cutting) from the HWAM-EMD-GRU model to obtain the EMD-GRU model. The evaluation metrics of the ablation study are shown in Table 2, and the effect of HWAM on improving the predictive performance is verified.

Table 2 Evaluation metrics of the ablation study

4.5 Experimental Process

Figure 2 shows the bidirectional extension of the passenger flow by applying HWAM technology.

Fig. 2
figure 2

Bidirectional extension of original passenger flow

The extended hourly passenger flow is decomposed by EMD, as shown in Fig. 3. The hourly flow time series is decomposed into five IMFs and a Res. The IMF components show different periodic characteristics of the original series, while the Res component retains the trend. The decomposition results also show the edge effect of EMD: it is most obvious in IMF 1 and IMF 2, whose left and right ends are severely distorted. Using such distorted data would reduce the prediction accuracy when making predictions with the GRU.

Fig. 3
figure 3

IMF and Res of decomposed passenger flow time series

We compute the Pearson correlation coefficient of each component with respect to the original hourly passenger flow data and perform a correlation test, as depicted in Table 3. The results show that the correlations among the IMF components themselves are low, consistent with the relative independence of IMFs after EMD decomposition. IMF 1 to IMF 4 have strong correlations with the original data, while IMF 5 and the residual Res have low correlations. Therefore, IMF 1 to IMF 4 are taken as input components \(C_{1}\) to \(C_{4}\), respectively, while IMF 5 and the residual Res are merged to form input component \(C_{5}\).

Table 3 Pearson correlation coefficient

The GRU is employed to predict each component; it is also applied directly to the original series, and this direct forecast serves as a baseline.

We use a single GRU hidden layer, followed by a dropout layer to avoid overfitting, with the dropout probability set to 0.1. The activation function is the rectified linear unit (ReLU). The initial learning rate is 0.001, and Adamax is used as the optimizer. We adopt a learning rate decay mechanism with a factor of 0.9 and a step size of 100, meaning that the learning rate is multiplied by 0.9 every 100 training steps.
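In PyTorch terms, this training configuration could look roughly as follows; the hidden size of 32, the lookback of 12, and the dummy batch are placeholders for illustration only, and the ReLU dense head from the earlier sketch is omitted for brevity.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=1, hidden_size=32, batch_first=True)        # hidden size assumed
head = nn.Sequential(nn.Dropout(0.1), nn.Linear(32, 1))             # dropout probability 0.1
params = list(gru.parameters()) + list(head.parameters())

optimizer = torch.optim.Adamax(params, lr=1e-3)                     # Adamax, lr = 0.001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.9)
loss_fn = nn.MSELoss()

x = torch.randn(64, 12, 1)   # dummy batch: (batch, lookback, features)
y = torch.randn(64, 1)

for step in range(200):
    out, _ = gru(x)
    loss = loss_fn(head(out[:, -1, :]), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()          # learning rate multiplied by 0.9 every 100 steps
```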

These settings are the same for both component prediction and original series prediction. The remaining hyperparameters of the GRU for each component and for the original series, including the number of epochs, batch size, lookback, and hidden layer size, are determined through experiments.

We conduct a grid search on the training set to determine the optimal hyperparameter combination, using the RMSE as the evaluation metric. The search space is set as follows: [100, 200] for the number of epochs, [6, 12, 18, 24] for batch size, [1, 2, …, 20] for lookback, and [8, 16, 32, 64] for hidden layer size. The grid search results are shown in Table 4.

Table 4 Optimal hyperparameter combination for each component
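The grid search itself can be sketched as follows; train_and_validate is a hypothetical stand-in for the actual training-and-evaluation routine and here simply returns a random score so that the loop runs end to end.

```python
import itertools
import random

def train_and_validate(epochs, batch_size, lookback, hidden_size):
    """Hypothetical stand-in: train a GRU with these settings and return its
    validation RMSE. Replace with the real training routine."""
    return random.random()

search_space = {
    "epochs":      [100, 200],
    "batch_size":  [6, 12, 18, 24],
    "lookback":    list(range(1, 21)),
    "hidden_size": [8, 16, 32, 64],
}

best_rmse, best_params = float("inf"), None
for values in itertools.product(*search_space.values()):
    params = dict(zip(search_space.keys(), values))
    rmse = train_and_validate(**params)
    if rmse < best_rmse:
        best_rmse, best_params = rmse, params

print(best_params, best_rmse)
```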

4.6 Comparison and Analysis

Figure 4 shows the comparison between HWAM-EMD-GRU and the other baselines; the forecast passenger flow curves are plotted in the same figure. Except for the ARIMA model, these models exhibit favorable predictive performance, indicating that they capture the patterns of outbound passenger flow on the ALRTL.

Fig. 4
figure 4

Comparison of prediction results between the HWAM-EMD-GRU combined model and the baselines

The evaluation metrics are shown in Table 5. Comparing GRU with the other baselines for short-term ALRTL passenger flow forecasting, the predictive performance of GRU is worse than that of Prophet and HWAM on almost all metrics. This may be caused by the small size of the training set (even though we deliberately increased the proportion of the training set during data partitioning) and the resulting insufficient training of the artificial neural network, leading to underfitting.

Table 5 Comparison of evaluation metrics between HWAM-EMD-GRU and baseline

Comparing the predictive performance of HWAM-EMD-GRU with the other baselines, the HWAM-EMD-GRU model is superior on all metrics. Its evaluation metrics are much lower than those of Prophet, the best-performing baseline in this paper: the MSE is 42.56% lower, the RMSE is 24.21% lower, the MAE is 16.01% lower, and the MAPE is 8.41% lower. Thus, the predictive performance of the HWAM-EMD-GRU combined model is demonstrated by these evaluation metrics. The results also indicate that HWAM-EMD decomposes the original signal into multiple simpler signals, reducing the complexity of fitting each component and thus reducing the amount of data required when applying the GRU for prediction.

5 Conclusion

EMD technology was employed to decompose the original time series of ALRTL passenger flow into multiple components. To address the edge effect inherent in EMD, the HWAM extension was utilized. Subsequently, a GRU was employed to predict each component independently, and by reconstructing these predictions, the forecast passenger flow of the ALRTL was obtained.

Experiments were conducted on the hourly outbound passenger flow dataset for Daxing Airport Station of the Beijing Daxing International Airport Express and verified that the accuracy of HWAM-EMD-GRU is much higher than that of the baselines in short-term passenger flow forecasting. The bidirectional extension employed in this study effectively addresses the edge effect and improves the decomposition quality of each component. Another notable advantage of the HWAM-EMD-GRU model is its ability to achieve high accuracy in passenger flow prediction using minimal historical data, without the need for additional features.

In future research, we will further study ALRTL passenger flow forecasting at finer time granularity. It would also be possible to collect multidimensional data and apply multisource data fusion techniques to achieve higher prediction accuracy.