1 Introduction

The aviation industry faces the challenges of air traffic congestion and reduced flight operation efficiency [1]. These issues stem mainly from flight demand surpassing the capacity of the available airspace and airports [2]. As an essential component of air traffic management (ATM), air traffic flow management (ATFM) is designed to achieve demand–capacity balancing (DCB) [3]. Conceptually, ATFM encompasses three distinct phases based on the time of implementation [4]: (1) strategic planning (a few months ahead) involving measures such as runway expansion [5] and reduced separation standards [6]; (2) pre-tactical planning (1 day ahead) that includes traffic flow and sector splitting [7]; and (3) tactical planning (on the day of implementation) that entails aircraft sequencing and re-sequencing during flight operations [8]. Traditional ATFM methods include ground delay programs [9, 10], airport surface management [11,12,13], flight rerouting [14, 15], flight scheduling [16, 17], and flight sequencing [18, 19].

Air traffic flow is an important indicator of smooth flight operation. Accurate traffic flow prediction helps identify air traffic operation bottlenecks and serves as the prerequisite and basis for effective ATFM [20]. Traditional air traffic flow prediction methods, i.e., those used before the advent of modern data-driven and machine learning techniques, rely on analytical and statistical approaches such as time-series analysis, regression analysis, exponential smoothing, moving averages, seasonal decomposition, and historical averages [21,22,23,24]. Generally, traditional methods are valuable when historical data are limited and simpler models are preferred for their ease of implementation and interpretability. However, because of their simplified linear assumptions, traditional methods may be less capable of capturing the complex patterns and relationships present in large and dynamic datasets [25]. In contrast, data-driven and machine learning techniques have revolutionized air traffic flow prediction, allowing for more accurate and adaptive forecasts. These methods utilize historical data, real-time information, and various features to learn patterns and relationships in air traffic flow. Notably, deep learning neural networks, including convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM) networks, are commonly used for air traffic flow prediction [26,27,28,29,30]. These models can capture complex spatiotemporal dependencies and have shown promising results in air traffic flow prediction.

Airspace is divided into sectors based on various factors, such as geographical location, traffic density, and complexity of the airspace. The sector capacity determines the number of aircraft that can be safely handled within a sector at a given time, depending on factors such as airspace configuration, available resources, and controller workload [31]. Airspace sector management plays a crucial role in flow management, especially during high-traffic periods or in congested areas. Clearly, air traffic flow prediction in sectors is crucial for managing airspace capacity, balancing traffic flows, ensuring safety, optimizing sector utilization, and enabling collaborative decision-making among stakeholders [32]. Accurate predictions can help authorities proactively manage air traffic, enhance operational efficiency, and maintain a safe and orderly flow of aircraft through the airspace. Figure 1 shows the schematic diagram of the air traffic flow management.

Fig. 1 Schematic diagram of air traffic prediction and air traffic flow management

However, sector-based traffic flow prediction is a complex task that comes with several challenges. First, air traffic flow in sectors can be highly variable and is influenced by factors such as weather conditions, peak travel times, special events, and unforeseen incidents. Predicting the flow of traffic under such dynamic conditions requires robust models that can adapt to changing patterns. Then, air traffic flow exhibits nonlinear dependencies, where the relationships between input features and traffic flow can be complex and nonlinear. Traditional linear models may not capture these intricate dependencies effectively, necessitating the use of more sophisticated machine learning techniques. Additionally, there are temporal dependencies due to the sequential nature of flight operations. Capturing these dependencies accurately requires specialized modeling techniques. Last, the location and size of sector boundaries and the complicated and unique internal airway structure within the sector can also affect the flow of traffic. The traffic flow of one sector is also impacted by interactions with adjacent downstream and upstream sectors. Designing sector-specific prediction models that take into account the topological structure characteristics and traffic flow patterns of the sector and tailor predictions for individual sectors is an area that requires exploration. These challenges arise due to the dynamic and generally unpredictable nature of air traffic, as well as the need for accurate and timely forecasts several steps ahead to ensure safe and efficient airspace and air traffic flow management [33].

The utilization of deep learning technologies, specifically those rooted in graph neural networks, has found extensive applications across diverse domains. These include but are not limited to natural language processing [34, 35], computer vision [36, 37], recommendation systems [38, 39], graph analysis [40, 41], and traffic prediction [42, 43].

In both the road traffic and air traffic domains, graph convolutional networks (GCN) combined with long short-term memory (LSTM) networks are used to model the spatiotemporal correlation features of the prediction target when forecasting traffic metrics such as flow, speed, and traffic complexity. Li et al. [44] integrated GCN and LSTM models to extract spatial–temporal traffic features and then applied a soft attention mechanism to make the final road traffic flow prediction. He et al. [45] validated the effectiveness of GC-LSTM in capturing spatial and temporal characteristics, intra-station correlations, and exogenous factors for passenger flow forecasting in high-speed rail networks. Guo et al. [46] combined GCN and LSTM to build a Seq2Seq model for multi-step road traffic speed prediction. Du et al. [47] employed a fusion of GCN and gated recurrent units (GRU) for forecasting traffic flow across multiple airports. Li et al. [48] combined graph convolutional modules with attention-based temporal convolutional modules to formulate a prediction model for airspace complexity.

Within the realm of sector traffic prediction, graph neural networks showcase substantial versatility, forming the core of our investigation. In response to the aforementioned research challenges, we propose an attention-enhanced graph convolutional long short-term memory network (AGC-LSTM) model for short-term multi-step-ahead sector flow prediction. The model combines the attention mechanism with the graph convolution layer and captures the temporal dependencies of flight data using the LSTM layer.

The major contributions and highlights of this study can be summarized as follows:

(1) This study considers the topological structures of the route segments both within and beyond a sector. These structures are used as inputs to construct a graph representation, owing to the spatiotemporal correlations observed in traffic flow data between adjacent route segments inside the sector, as well as those upstream and downstream of the focal sector. By utilizing the traffic data from those related route segments, a comprehensive representation and prediction of sector traffic flow are achieved. This approach captures the influence of both traffic complexity and sector airspace structure complexity on sector-level traffic flow, consequently enhancing the accuracy of sector-level traffic flow predictions.

(2) The AGC-LSTM model integrates the attention-enhanced graph convolutional network and the long short-term memory network. The graph convolutional layer employs the multi-head attention mechanism to capture the multiple spatiotemporal dependencies of sector-based traffic flow, and the LSTM layer is applied to capture the temporal dependencies.

(3) We evaluate our approach using typical sector traffic datasets. The proposed model generates more accurate predictions of air traffic flows than the baseline models, which has the potential to help air traffic control officers (ATCOs) manage air traffic flow efficiently.

The rest of this paper is organized as follows. Section 2 reviews the related studies on air traffic prediction. Section 3 proposes the AGC-LSTM model for airspace sector-based traffic flow prediction. Section 4 introduces the flight data, generates the network graphs, and presents the experimental results of the proposed model. Section 5 discusses the limitations and contributions of this study, as well as the scalability of the model to related fields. Finally, Sect. 6 summarizes the research conducted in this paper and outlines future research directions.

2 Related works

Air traffic flow prediction constitutes a crucial component of ATFM [49]. This section briefly reviews the related works from two aspects, i.e., airport-based traffic flow prediction and sector-based traffic flow prediction.

2.1 Airport-based traffic flow prediction

Airport-based traffic flow prediction is of significant importance for efficient air traffic management and airport operations [50]. Airports have a limited capacity to handle a certain number of flights and passengers within a given timeframe [51]. Airport flow prediction allows airport operators to allocate resources optimally such as runways, shuttles, gates, and taxiways [52]. Consequently, airport flow prediction directly impacts the passenger experience. By accurately estimating the flow of flights, airports can provide real-time information about expected wait times, gate changes, and potential disruptions. This allows passengers to plan their journeys better, manage their time, and navigate the airport more efficiently, reducing stress and enhancing the overall passenger experience [53, 54]. Additionally, many busy airports use slot management systems to schedule and allocate arrival and departure slots to airlines. Airport flow prediction plays a crucial role in determining the availability of slots and optimizing their allocation [55]. By accurately predicting the flow of flights, airport authorities can make informed decisions about slot assignments, reducing delays and maximizing the utilization of available slots [56]. Besides, air traffic control officers (ATCOs) rely on accurate arrival flow prediction to maintain safe separation between arriving aircraft [57]. By knowing the estimated arrival times and sequence of flights, controllers can plan and execute the necessary air traffic control instructions, including sequencing, spacing, and vectoring of aircraft for a safe and orderly flow of arrivals [58].

In recent years, there has been a surge in research focused on air traffic flow prediction at airports, with machine learning and deep learning models gaining popularity due to their superior prediction accuracy and learning capabilities [59]. Li and Wang [60] utilized a stacked autoencoder model, a long short-term memory (LSTM) network, and a gated recurrent unit (GRU) model to predict short-term traffic flow at capital airports. Similarly, LSTM-based air traffic flow prediction has been explored for Diyarbakır Airport [61]. Recognizing the impact of meteorological conditions, Yang et al. [62] proposed a combined LSTM and extreme gradient boosting method for predicting airport flight arrival flow. Zhu et al. [63] introduced a novel graph attention RNN model to forecast short-term airport throughput over a national air traffic network. Building on the strength of residual neural networks, GCN, and LSTM, Zang et al. [27] developed a deep learning architecture for predicting the spatiotemporal distribution of traffic flow at the airport network level. Considering the influence of the topological airport network, Yan et al. [64] introduced an airport traffic flow prediction network designed to capture spatial–temporal dependencies of historical airport traffic flow (departure and arrival) for multiple step situational (network-level) arrival flow predictions. To model network-wide spatial dependencies among airports based on flight duration and flight schedule factors, a multi-view attention-based spatial–temporal network was presented [65]. Addressing heterogeneous and dynamic network dependencies, Yan et al. [66] proposed a novel large-range air traffic flow prediction model to improve airport arrival flow prediction.

2.2 Sector-based traffic flow prediction

Air traffic flow prediction at sectors is essential for efficient air traffic management and ensuring the safe and orderly flow of aircraft through specific airspace sectors. Each sector within the airspace has a limited capacity to handle a certain number of aircraft at a given time [67]. Therefore, accurately predicting the flow of air traffic in sectors can help prevent traffic congestion, flight delays, and airspace saturation, ensuring that aircraft can flow smoothly and safely through sectors. In addition, sector flow prediction can contribute to balancing the flow of air traffic among different sectors. By forecasting the expected demand and traffic volume in each sector, ATCOs can adjust the flow of aircraft [68], distribute the workload evenly [69], and detect potential aircraft conflicts [70, 71]. Besides, sector flow prediction can help authorities optimize the utilization of available airspace capacity by opening or closing certain routes or sectors, adjusting sector boundaries, or implementing flow management measures to accommodate the predicted traffic flows effectively [72].

Airspace operation complexity evaluation is significant for ensuring flight safety [73], optimizing airspace capacity [74], improving traffic flows [75], and supporting decision-making [76]. By evaluating and understanding complexity, authorities can implement measures and strategies that contribute to efficient and safe airspace operations. Accordingly, Shi-Garrier et al. [77] adopted a novel encoder–decoder LSTM neural network to predict ATC tasks based on the presented intrinsic complexity metric. Furthermore, a novel end-to-end learning framework was introduced by Xie et al. [78] to assess sector operation complexity. This approach employed a deep CNN to transform air traffic data into images, marking the first application of this technique for comprehensive complexity analysis. Subsequently, Sui et al. [79] extended the study by abstracting the multi-sector airspace scenario as an undirected graph. They then introduced a spatiotemporal GCN model to capture the correlations between changes in sector operational conditions over time and space. Xu et al. [80] proposed a Bayesian ensemble graph attention network for predicting stochastic traffic density near the terminal. Their model accounted for the intricate spatial–temporal variations in traffic patterns and considered the inter-dependencies within air traffic networks.

Currently, sector flow prediction has been carried out based on GCN [81], supervised learning [82], machine learning algorithms, RNN, and LSTM [83]. Moreover, researchers have explored various approaches to capture meaningful spatiotemporal correlations within high-dimensional feature space for traffic flow prediction. These methods include an end-to-end deep learning-based model [26], a three-dimensional CNN [84], and several machine learning models [85]. In the context of traffic flow coordination at major intersections, the flow-centric paradigm has been utilized to aid controllers in effectively managing intersecting traffic movements [86]. In line with this, Delahaye et al. [87] presented a transformer neural network model for flow prediction at coordination points. Additionally, studies have approached air traffic flow prediction from the perspective of air traffic flow networks. For instance, in enroute airspace, a dynamic network-based approach has been employed to achieve short-term air traffic flow prediction, characterizing the topological structure of airspace and the dynamics of air traffic flow [88]. Zhang et al. [89] proposed a hybrid model based on fuzzy c-means and GCN to capture the upstream and downstream dependencies within air traffic flow networks. Similarly, Cai et al. [90] introduced a temporal attention aware dual-graph convolution network for predicting air traffic flow, considering the airspace structure and routes of air traffic flows. Unlike these studies, this article will apply the AGC-LSTM method taking into account multiple spatial–temporal dependencies including spatial adjacency, and long-term and short-term temporal dependencies. The graph is constructed with route segments inside both the focal sector and its surrounding upstream and downstream sectors to capture the complex impact of inner traffic flow and airspace structure dependency characteristic on the sector traffic flow.

3 Methodology

Firstly, this section provides an introduction to the method of spatiotemporal feature extraction. Following that, a comprehensive explanation of the AGC-LSTM model is presented.

3.1 Spatiotemporal feature representation and graph modeling

This study aims to predict the traffic flow within the sector for future time intervals. To achieve this, the task involves learning a mapping function that calculates the traffic flow \(Y\) for the upcoming \(Q\) time steps, based on the topological structure \(G\) of the flight route segment network and the feature matrix \(X\) of the preceding \(P\) time steps. The schematic diagram of this process is depicted in Fig. 2, and the model function is expressed as follows:

$$\left[ {Y_{t + 1} , \cdots ,Y_{t + Q} } \right] = f\left( {G;\left( {X_{t - P + 1} , \cdots ,X_{t - 1} ,X_t } \right)} \right)$$
(1)
Fig. 2 Graph to sequence learning for multi-step sector traffic flow prediction

The spatial relationship between segments is transformed into an adjacency matrix \(A^{n \times n}\) as shown in (2), where \(n\) is the number of flight route segments, and the values in the matrix represent the connectivity between segments. The topological structure of the graph \(G\) is established based on \(A^{n \times n}\). A feature matrix is then constructed to describe the features of each node in \(G\). Each row of the matrix represents a flight route segment, and each column represents a time interval. The vector \(\left( {{\text{rs}}_{ij} ,{\text{ra}}_{ij} ,w_{ij} } \right)\) in each element of the matrix holds the features of flight route segment \(i\) in time interval \(j\), where \({\text{rs}}_{ij}\) and \({\text{ra}}_{ij}\) represent the scheduled and actual traffic flows of the route segment, respectively, and \(w_{ij}\) represents the weekly periodicity of the flight schedules. Ultimately, a comprehensive input feature matrix \(X^{n \times t \times 3}\) is obtained as shown in (3), where \(t\) is the number of time slots, and 3 is the number of input features in our model. The output target variable vector \(Y^{1 \times (t - P)}\) is shown in (4), where \(P\) is the input sequence length.

$$A = \left[ {\begin{array}{*{20}c} {a_{11} } & {a_{12} } & \cdots & {a_{1n} } \\ {a_{21} } & {a_{22} } & \cdots & {a_{2n} } \\ \vdots & \vdots & \ddots & \vdots \\ {a_{n1} } & {a_{n2} } & \cdots & {a_{nn} } \\ \end{array} } \right]$$
(2)
$$X = \left[ {\begin{array}{*{20}c} {\left( {{\text{rs}}_{11} ,{\text{ra}}_{11} ,w_{11} } \right)} & {\left( {{\text{rs}}_{12} ,{\text{ra}}_{12} ,w_{12} } \right)} & \cdots & {\left( {{\text{rs}}_{1t} ,{\text{ra}}_{1t} ,w_{1t} } \right)} \\ {\left( {{\text{rs}}_{21} ,{\text{ra}}_{21} ,w_{21} } \right)} & {\left( {{\text{rs}}_{22} ,{\text{ra}}_{22} ,w_{22} } \right)} & \cdots & {\left( {{\text{rs}}_{2t} ,{\text{ra}}_{2t} ,w_{2t} } \right)} \\ \vdots & \vdots & \ddots & \vdots \\ {\left( {{\text{rs}}_{n1} ,{\text{ra}}_{n1} ,w_{n1} } \right)} & {\left( {{\text{rs}}_{n2} ,{\text{ra}}_{n2} ,w_{n2} } \right)} & \cdots & {\left( {{\text{rs}}_{nt} ,{\text{ra}}_{nt} ,w_{nt} } \right)} \\ \end{array} } \right]$$
(3)
$$Y = \left[ {\begin{array}{*{20}c} {{\text{sr}}_{1,P + 1} } & {{\text{sr}}_{1,P + 2} } & \cdots & {{\text{sr}}_{1,t} } \\ \end{array} } \right]$$
(4)
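
The mapping in Eq. (1) is usually trained on sliding windows cut from \(X\) and the sector flow series. The following is a minimal sketch of that windowing step in plain Python, assuming \(X\) is stored as nested lists of shape \(n \times t \times 3\) and the sector flow as a list of length \(t\) aligned with the time slots. The helper name `make_windows` and the toy values are hypothetical and are not taken from the study.

```python
def make_windows(X, y, P, Q):
    """Cut (input, target) pairs for Eq. (1).

    X: nested list of shape n x t x 3 (segment, time slot, feature).
    y: list of length t holding the sector flow per time slot.
    Returns a list of (x_win, y_win): x_win has shape n x P x 3 and
    y_win holds the sector flow of the following Q time slots.
    """
    t = len(y)
    samples = []
    # last_in is the 0-based index of the last observed time slot.
    for last_in in range(P - 1, t - Q):
        x_win = [row[last_in - P + 1:last_in + 1] for row in X]
        y_win = y[last_in + 1:last_in + 1 + Q]
        samples.append((x_win, y_win))
    return samples


# Toy data: n = 2 segments, t = 6 slots, features (rs, ra, w).
X = [[(s, s + 1, 0) for s in range(6)],
     [(2 * s, 2 * s + 1, 1) for s in range(6)]]
y = [10, 11, 12, 13, 14, 15]
samples = make_windows(X, y, P=3, Q=2)
```

With \(t\) time slots, this construction yields \(t - P - Q + 1\) training pairs, each pairing \(P\) observed slots with the next \(Q\) target values.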

3.2 AGC-LSTM prediction model

To better explore spatiotemporal dependencies of sector traffic flows, we propose the AGC-LSTM model. Figure 3 illustrates the architecture of the AGC-LSTM model, comprising five layers, namely, the input layer, the multi-head attention-based graph convolutional layer, the dropout layer, the LSTM layer, and the output layer.

Fig. 3 AGC-LSTM architecture diagram

The input layer receives the feature dataset X and the target label dataset Y. The subsequent graph convolutional layer captures spatial dependencies within the data, enhancing feature extraction. It employs a multi-head attention mechanism to capture diverse relationships and feature importance in the graph structure. After the graph convolutional layer, a dropout layer randomly discards part of the output, which encourages the network to learn from different subsets of features and improves robustness. The resulting sequence is input to the LSTM layer, which captures temporal dependencies and extracts temporal features. High-level features from the previous layers are then propagated through fully connected layers to predict sector traffic flow. This comprehensive approach enables the AGC-LSTM model to explore both spatial and temporal dependencies and accurately predict traffic flow.

3.2.1 Graph convolutional layer with a multi-head attention mechanism

In recent years, GCNs have been introduced to handle graph-structured data. They capture spatial features between vertices by constructing filters in the Fourier domain, which act on vertices and their first-order neighbors. The multi-head attention mechanism extends the attention mechanisms used for sequential data [91]. It expands the single attention head to multiple parallel attention heads, thereby enhancing the expressive and modeling capabilities of the model. The operation of the multi-head attention graph convolutional layer can be represented by the following equations:

$$H_1 = {\text{Concatenate}}\left[ {\sigma \left( {D^{ - \frac{1}{2}} A_1 D^{ - \frac{1}{2}} XW_1 } \right), \sigma \left( {D^{ - \frac{1}{2}} A_2 D^{ - \frac{1}{2}} XW_2 } \right), \cdots ,\sigma \left( {D^{ - \frac{1}{2}} A_K D^{ - \frac{1}{2}} XW_K } \right)} \right]$$
(5)
$$A_k = {\text{softmax}}(e_k )$$
(6)
$$e_{ij} = {\text{LeakyReLU}}\left( {a_k^T [W_k X_i ||W_k X_j ]} \right)$$
(7)

where \(H_1\) is the output of the multi-head attention graph convolution layer, and \(K\) represents the number of attention heads in the multi-head attention mechanism, where each head has its own attention weight matrix and linear transformation matrix. For the \(k\)th head (\(k = 1, \ldots ,K\)), \(A_k\) denotes the attention weight matrix and \(W_k\) the linear transformation matrix; \(D\) is the node degree matrix, \(X\) is the feature matrix, and \(\sigma ( \cdot )\) denotes the activation function. \(e_k\) collects the attention scores between all nodes in the \(k\)th attention head, which are normalized using the \({\text{softmax}}\) function to ensure that the attention weights of each node sum up to 1. \(e_{ij}\) represents the attention score between node \(i\) and node \(j\) in that head, while \(a_k\) denotes the parameter vector of the \(k\)th attention head. \(X_i\) and \(X_j\) represent the feature vectors of node \(i\) and node \(j\), respectively, \(||\) represents the concatenation operation of vectors, and \({\text{LeakyReLU}}\) represents the leaky rectified linear unit activation function.
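
To make Eqs. (5)–(7) concrete, the following is a dependency-free sketch of one forward pass through a multi-head attention graph convolution. Several details are assumptions made for illustration and are not stated in the paper: the degrees in \(D\) are taken from the binary adjacency matrix plus a self-loop, \(\sigma\) is taken to be ReLU, the LeakyReLU slope is 0.2, and all function names and toy weights are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def attention_head(X, adj, W, a):
    """One head: sigma(D^-1/2 A_k D^-1/2 X W_k), with A_k from Eqs. (6)-(7)."""
    n = len(X)
    XW = matmul(X, W)  # row i holds the projected features of node i
    # Eq. (7): score every node pair from the concatenated projections.
    e = [[leaky_relu(sum(p * q for p, q in zip(a, XW[i] + XW[j])))
          for j in range(n)] for i in range(n)]
    A_k = [softmax(row) for row in e]  # Eq. (6): each row sums to 1
    # Assumption: degrees come from the binary adjacency plus a self-loop.
    deg = [sum(adj[i]) + 1 for i in range(n)]
    norm = [[A_k[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(norm, XW)
    return [[max(0.0, v) for v in row] for row in out]  # sigma = ReLU (assumed)


def multi_head_gcn(X, adj, heads):
    """Eq. (5): concatenate the K head outputs along the feature axis."""
    outputs = [attention_head(X, adj, W, a) for W, a in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(X))]


# Toy graph: 3 nodes, 2 input features, K = 2 heads with 2 outputs each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
heads = [([[1.0, 0.0], [0.0, 1.0]], [0.1, 0.2, 0.3, 0.4]),
         ([[0.5, 0.5], [0.5, -0.5]], [0.4, 0.3, 0.2, 0.1])]
H1 = multi_head_gcn(X, adj, heads)
```

Each node ends up with a feature vector whose length is the number of heads times the per-head output size, which is what the concatenation in Eq. (5) implies.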

3.2.2 Dropout layer

In the context of deep learning, dropout has been demonstrated to be effective in preventing overfitting in deep neural networks [92]. It is a commonly used regularization technique that randomly sets a portion of neuron outputs to zero during the training process of a neural network to prevent overfitting. The dropout operation can be seen as an ensemble learning technique that enhances the robustness of the network to small perturbations in the input by randomly dropping the outputs of neurons. In the constructed model, the formula for dropout can be expressed as follows:

$$H_2 = M \odot H_1$$
(8)

where \(H_1\) represents the input vector of the dropout layer, and \(M\) is a binary mask vector with the same shape as \(H_1\), indicating which neurons should be dropped. \(\odot\) denotes element-wise multiplication, and \({ }H_2\) represents the output vector after dropout.
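
Equation (8) can be sketched in a few lines. The example below follows the equation literally by drawing a binary mask and multiplying element-wise. Note that common deep learning frameworks additionally rescale the kept activations by \(1/(1 - p)\) during training and disable dropout at inference; that rescaling is not part of Eq. (8) and is omitted here. The function name, the dropout rate, and the toy values are hypothetical.

```python
import random


def dropout(H1, rate, rng):
    """Apply Eq. (8): H2 = M * H1 element-wise, with a random binary mask M."""
    # Each entry is dropped (mask 0) with probability `rate`, kept otherwise.
    M = [[0.0 if rng.random() < rate else 1.0 for _ in row] for row in H1]
    H2 = [[m * h for m, h in zip(m_row, h_row)]
          for m_row, h_row in zip(M, H1)]
    return H2, M


rng = random.Random(0)  # fixed seed so the sketch is reproducible
H1 = [[0.5, 1.5, 2.0], [1.0, 0.0, 3.0]]
H2, M = dropout(H1, rate=0.5, rng=rng)
```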

3.2.3 LSTM layer

By leveraging the aforementioned steps, we effectively extract spatial structural features from the data. Subsequently, the LSTM layer is employed to capture the temporal dependencies. Due to the limitations of traditional RNNs, such as the vanishing gradient and exploding gradient problems, the LSTM model [93] was proposed as a variant that can address these issues. As illustrated in Fig. 4, the core idea of LSTM is to control the flow of information through gate units, including the forget gate, input gate, and output gate.

Fig. 4 Principle diagram of LSTM

These gates utilize learnable parameters to selectively retain and discard information, enabling the capture and propagation of important information within the sequence. The formulas below describe the implementation of the LSTM layer. \(H_2\) is fed in sequentially, and the input gate determines the information that needs to be updated. It processes the input and the previous hidden state using the \({\text{Sigmoid}}\) activation function to obtain a value between 0 and 1, representing the importance of each input. Equation (9) defines \(i_t\) as the activation value of the input gate, which controls the importance of the current input. \(b_i\) is the bias vector, and \(x_t\) represents the input at the current time step. Similarly, the forget gate uses the \({\text{Sigmoid}}\) activation function to determine how much of the previous cell state is retained. Equation (10) defines \(f_t\) as the activation value of the forget gate, which controls the degree of forgetting of the previous cell state \(c_{t - 1}\). Equation (11) defines \(\tilde{c}_t\) as the candidate value for the update, generated by the hyperbolic tangent (\({\text{tanh}}\)) activation function. \(c_t\) represents the cell state, responsible for transmitting and storing information, which is updated at each time step according to Eq. (12). \(o_t\) represents the activation value of the output gate, and \(h_t\) represents the current hidden state, which is the output of the LSTM model. \(W\) and \(b\) with the corresponding subscripts represent the weight matrices and bias vectors of the respective gates, and the products between gate activations and states in Eqs. (12) and (14) are element-wise.

$$i_t = \sigma (W_{{\text{xi}}} \cdot x_t + W_{{\text{hi}}} \cdot h_{t - 1} + b_i )$$
(9)
$$f_t = \sigma (W_{{\text{xf}}} \cdot x_t + W_{{\text{hf}}} \cdot h_{t - 1} + b_f )$$
(10)
$$\tilde{c}_t = {\text{tanh}}(W_{{\text{xc}}} \cdot x_t + W_{{\text{hc}}} \cdot h_{t - 1} + b_c )$$
(11)
$$c_t = f_t \cdot c_{t - 1} + i_t \cdot \tilde{c}_t$$
(12)
$$o_t = \sigma (W_{{\text{xo}}} \cdot x_t + W_{{\text{ho}}} \cdot h_{t - 1} + b_o )$$
(13)
$$h_t = o_t \cdot {\text{tanh}}(c_t )$$
(14)

By incorporating the gate mechanism in the LSTM layer, selective retention and forgetting of information can be achieved. This design addresses the challenge of capturing long-term dependencies in traditional RNN and enables better extraction of temporal features from air traffic data.
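
As an illustration of Eqs. (9)–(14), the sketch below performs a single LSTM time step in plain Python. It assumes that the gate parameters are stored in a dictionary keyed by gate name; this layout, the function names, and the toy values are hypothetical and chosen only to mirror the equations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(Wx, Wh, b, x_t, h_prev):
    """Compute W_x x_t + W_h h_{t-1} + b for one gate."""
    return [sum(w * x for w, x in zip(wx_row, x_t))
            + sum(w * h for w, h in zip(wh_row, h_prev)) + bias
            for wx_row, wh_row, bias in zip(Wx, Wh, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step; params maps a gate name to its (Wx, Wh, b) triple."""
    i_t = [sigmoid(z) for z in affine(*params["i"], x_t, h_prev)]      # Eq. (9)
    f_t = [sigmoid(z) for z in affine(*params["f"], x_t, h_prev)]      # Eq. (10)
    c_hat = [math.tanh(z) for z in affine(*params["c"], x_t, h_prev)]  # Eq. (11)
    # Eq. (12): element-wise blend of the old cell state and the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(z) for z in affine(*params["o"], x_t, h_prev)]      # Eq. (13)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                 # Eq. (14)
    return h_t, c_t


# Toy check: 2 input features, 1 hidden unit, all-zero parameters.
zeros = ([[0.0, 0.0]], [[0.0]], [0.0])  # Wx (1 x 2), Wh (1 x 1), b
params = {gate: zeros for gate in "ifco"}
h_t, c_t = lstm_step([1.0, -1.0], [0.3], [2.0], params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved. This makes the roles of the forget and input gates in Eq. (12) easy to verify by hand.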

3.2.4 Loss function

Within the proposed model, we use the mean squared error of the regression model with an L2 regularization term as the loss function. The formula is as follows:

$${\text{loss}} = \frac{{\sum_{i = 1}^n (\hat{y}_i - y_i )^2 }}{n} + \lambda L_{{\text{reg}}}$$
(15)

where \(\hat{y}_i\) and \(y_i\) represent the predicted value and the actual value, respectively, and \(n\) is the number of samples. \(\lambda\) denotes the L2 regularization coefficient, and \(L_{{\text{reg}}}\) represents the L2 regularization term computed over the trainable weights of the regression model.
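
A small numeric sketch of Eq. (15) follows. The paper does not spell out the exact form of \(L_{{\text{reg}}}\); the example assumes the common choice of the sum of squared trainable weights. The function name and the values are hypothetical.

```python
def regularized_mse(y_pred, y_true, weights, lam):
    """Mean squared error plus lam times an L2 penalty, as in Eq. (15)."""
    n = len(y_true)
    mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n
    # Assumption: L_reg is the sum of squared trainable weights.
    l_reg = sum(w * w for w in weights)
    return mse + lam * l_reg


# MSE = (1 + 4) / 2 = 2.5 and L_reg = 0.25 + 1.0 = 1.25.
loss = regularized_mse([2.0, 4.0], [1.0, 6.0], weights=[0.5, -1.0], lam=0.01)
```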

3.2.5 Evaluation metric

Four metrics were utilized to evaluate the predictive performance of AGC-LSTM: mean absolute error (MAE), root-mean-square error (RMSE), symmetric mean absolute percentage error (SMAPE), and coefficient of determination (\(R^2\)). The formulas are as follows:

$${\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^n |\hat{y}_i - y_i |$$
(16)
$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^n (\hat{y}_i - y_i )^2 }$$
(17)
$${\text{SMAPE}} = \frac{1}{n}\sum\limits_{i = 1}^n \frac{{|\hat{y}_i - y_i |}}{{(|\hat{y}_i | + |y_i |)/2}} \times 100$$
(18)
$$R^2 = 1 - \frac{{\sum_{i = 1}^n (\hat{y}_i - y_i )^2 }}{{\sum_{i = 1}^n (y_i - \overline{y})^2 }}$$
(19)

where \(\hat{y}_i\) represents the predicted value, \(y_i\) represents the true value, \(\overline{y}\) denotes the mean of the true values, and \(n\) represents the number of samples. MAE and RMSE quantify the differences between the predicted and true values, SMAPE measures the magnitude of relative errors, and \(R^2\) evaluates the extent to which the predictive model explains the total variability.
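
The four metrics in Eqs. (16)–(19) can be computed as in the sketch below. One detail is an assumption: when the predicted and true values are both zero, the SMAPE term of Eq. (18) is undefined, and the sketch counts it as zero error. The function name and the sample values are hypothetical.

```python
import math


def evaluate(y_pred, y_true):
    """Return MAE, RMSE, SMAPE (in percent), and R^2 for paired lists."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    smape_terms = []
    for e, p, t in zip(errors, y_pred, y_true):
        denom = (abs(p) + abs(t)) / 2
        # Assumption: a 0/0 term is treated as zero error.
        smape_terms.append(abs(e) / denom if denom else 0.0)
    smape = 100.0 * sum(smape_terms) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "SMAPE": smape, "R2": r2}


scores = evaluate([2.0, 4.0, 6.0], [1.0, 4.0, 7.0])
```

Because flow counts in short intervals are often small, SMAPE can be large even when MAE is modest, which is one reason to report the metrics side by side.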

4 Experiments and results

Following the introduction of the AGC-LSTM model for sector-based traffic flow prediction, this section first introduces the experimental datasets, for which two key sectors in the central–southern region of China are selected as a case study. Subsequently, graph structures are constructed, taking into account the actual internal routes and traffic flow patterns within the selected sectors. Then, to verify the feasibility and validity of the model, AGC-LSTM is compared with five baseline models on single-step-ahead prediction at a time granularity of 15 min. Additionally, comparative experiments are conducted to evaluate the prediction results at statistical time granularities of 15, 30, 45, and 60 min for a single time interval. Finally, the prediction results for different look-ahead time steps are compared.

4.1 Datasets

The Automatic Dependent Surveillance-Broadcast (ADS-B) data are adopted to extract the spatiotemporal features of air traffic flows. ADS-B is based on the Global Navigation Satellite System (GNSS) and can provide comprehensive datasets including flight ID, timestamp, latitude, longitude, altitude, aircraft type, etc. Detailed information about ADS-B can be found in [94, 95]. The two selected sectors, namely, ZGGGAR11 and ZGGGAR22, are situated in the central–southern region of China. They act as crucial junctions connecting numerous airports, including those in the Greater Bay Area and the southwest region of China. In comparison with other airspace sectors, ZGGGAR11 and ZGGGAR22 accommodate a higher volume of flights and play a more significant role in air traffic management. Figure 5 provides a detailed spatial depiction of these two sectors. Both sectors share identical horizontal extents, but they differ in their vertical ranges (ZGGGAR11: 9200–12,500 m; ZGGGAR22: 6000–9200 m).

Fig. 5
a Screenshot of the overview of flight routes in the central–southern region of China. The dots denote airport locations. b and c depict the vertical view and horizontal view of the ZGGGAR11/ZGGGAR22 sectors, respectively

This study utilizes ADS-B data from March 2019, obtained from Variflight (http://www.variflight.com). The dataset comprises 72,512 records from a total of 34,167 flights, encompassing 20 flight route segments within the two airspace sectors (ZGGGAR11 and ZGGGAR22). For each segment, both the scheduled and actual traffic flows were determined, using the scheduled arrival time and the ADS-B arrival time, respectively. However, when the overflying times of the waypoints were calculated from the original ADS-B data, some abnormal records arose from inconsistencies in the adjusted entry/exit times of the segments and sectors. To address this issue, a set of criteria was applied to clean the flight data and ensure its reliability and accuracy. Specifically, only records with a flying time of between 2 and 6 min on each route segment were retained, which yielded a valid dataset of 70,276 records from 33,942 flights. The entry/exit times of each flight route segment were then used to calculate the air traffic flow on the 32 one-way flight route segments in sectors ZGGGAR11/ZGGGAR22. These flows were computed for specific time intervals, with each one-way segment corresponding to a node in the graph network constructed subsequently.
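The cleaning rule and the flow calculation described above can be sketched as follows. The record layout (dictionary keys, times in minutes from midnight), the inclusive bounds of the 2–6 min filter, and the choice to assign each traversal to the time bin of its entry time are all assumptions made for illustration; the sample records are hypothetical.

```python
from collections import Counter

MIN_FLY_MIN, MAX_FLY_MIN = 2.0, 6.0  # valid flying time per segment (from the text)

def clean_records(records):
    """Keep traversals whose flying time on the segment lies in [2, 6] min."""
    return [r for r in records
            if MIN_FLY_MIN <= r["exit_min"] - r["entry_min"] <= MAX_FLY_MIN]

def segment_flow(records, bin_min=15):
    """Count traversals per (segment, time bin); binning by entry time is an assumption."""
    flow = Counter()
    for r in records:
        flow[(r["segment"], int(r["entry_min"] // bin_min))] += 1
    return flow

# Hypothetical records: times are minutes after midnight.
records = [
    {"flight": "F1", "segment": "ONEMI-VQ", "entry_min": 3.0, "exit_min": 7.0},
    {"flight": "F2", "segment": "ONEMI-VQ", "entry_min": 10.0, "exit_min": 13.5},
    {"flight": "F3", "segment": "ONEMI-VQ", "entry_min": 14.0, "exit_min": 15.0},  # 1 min: dropped
    {"flight": "F1", "segment": "VQ-MAMSI", "entry_min": 16.0, "exit_min": 21.0},
    {"flight": "F4", "segment": "VQ-MAMSI", "entry_min": 20.0, "exit_min": 29.0},  # 9 min: dropped
]
valid = clean_records(records)
flow = segment_flow(valid)
```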

4.2 Graph generation

This section presents the principle of graph generation through a series of transformations as depicted in Fig. 6. Specifically, Fig. 6a illustrates the flight route segment network within the sectors ZGGGAR11/ZGGGAR22, comprising the 32 flight route segments listed on the right side of the figure. To represent its topological structure, this network is converted into a directed graph denoted as \(G=(V,E)\), where \(V\) denotes the nodes (flight route segments) and \(E\) denotes the edges (waypoints). For further clarity, Fig. 6b depicts the correspondence between the route segment structure of the main air route ONEMI-VQ-MAMSI-ENKUS within the sector and \(G\). Similarly, Fig. 6c shows the correspondence between the route segment structure of NODOG-QP-MAMSI-SJG-AKNAV-ELKAL and \(G\).
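The conversion from a route segment network to \(G\) can be illustrated with the two routes named above. This is a sketch under stated assumptions: a directed edge is drawn from segment \((a, b)\) to segment \((b, c)\) whenever the first ends at the waypoint where the second begins, which also links segments of different routes that meet at MAMSI. The exact edge rule used for Fig. 6 may differ, and the helper names are hypothetical.

```python
def route_to_segments(route):
    """Split 'A-B-C' into one-way segments [('A', 'B'), ('B', 'C')]."""
    wps = route.split("-")
    return list(zip(wps[:-1], wps[1:]))

def build_segment_graph(routes):
    """Nodes are one-way segments; an edge u -> v exists when u ends where v starts."""
    nodes = []
    for route in routes:
        for seg in route_to_segments(route):
            if seg not in nodes:
                nodes.append(seg)
    index = {seg: i for i, seg in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for (a, b) in nodes:
        for (c, d) in nodes:
            # Shared waypoint b == c; a != d excludes an immediate U-turn.
            if b == c and a != d:
                adj[index[(a, b)]][index[(c, d)]] = 1
    return nodes, adj

routes = ["ONEMI-VQ-MAMSI-ENKUS", "NODOG-QP-MAMSI-SJG-AKNAV-ELKAL"]
nodes, adj = build_segment_graph(routes)
```

With these two routes the sketch yields eight segment nodes, and the waypoint MAMSI produces four of the edges because two inbound segments each connect to two outbound segments.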

Fig. 6
Graph structure of ZGGGAR11/ZGGGAR22

4.3 Experimental settings

In the experiment, 70% of the data was used as the training set and the remaining 30% as the test set. These datasets were used to predict the sector traffic flow for multiple subsequent time steps. The AGC-LSTM model is trained with a learning rate of 0.001 for 300 epochs, using a batch size of 64. The number of attention heads in the graph convolutional layer is set to 2. Given the significant impact of hyperparameters on predictive accuracy, we conducted hyperparameter tuning to enhance model performance. The tuned hyperparameters were the number of hidden neurons in the GCN layer, the number of hidden neurons in the LSTM layer, and the dropout ratio. Figure 7 illustrates the prediction errors for different hyperparameter values. In particular, Fig. 7a and b depict the binary search used to identify the optimal number of hidden neurons in the GCN and LSTM layers, and Fig. 7c shows the prediction error for each dropout ratio. The resulting hyperparameters are presented in Table 1.

Fig. 7
Process of hyperparameter tuning and results

Table 1 Hyperparameter values obtained through hyperparameter tuning

To gain a comprehensive understanding of the AGC-LSTM model, its algorithmic complexity is analyzed. Let \(\left\| A \right\|_0\) represent the number of nonzero elements in the adjacency matrix, \(N\) the number of nodes, \(F\) the feature dimensionality, and \(L\) the number of GCN layers. Within each GCN layer, forward propagation involves feature propagation and aggregation operations, while backward propagation requires gradient computation and parameter updates, resulting in an overall complexity of \(O(L\left\| A \right\|_0 F + LNF^2)\). When a multi-head attention mechanism with \(K\) attention heads is incorporated into the GCN, the overall complexity becomes \(O\left( LK(\left\| A \right\|_0 F + NF^2) \right)\). For the LSTM, with an input sequence length of \(T\) and a hidden state dimensionality of \(h\), the complexity is \(O(4Th^2 + 4TFh)\). Therefore, the total complexity of the AGC-LSTM model is \(O\left( LK\left( \left\| A \right\|_0 F + NF^2 \right) + \left( 4Th^2 + 4TFh \right) \right)\).
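To get a feel for which term dominates, the total expression can be evaluated for concrete sizes. In the sketch below, \(N = 32\) and \(K = 2\) come from the text, while the values of \(\left\| A \right\|_0\), \(F\), \(L\), \(T\), and \(h\) are hypothetical placeholders rather than the tuned settings of Table 1. The result is an order-of-magnitude operation count, not a measured cost.

```python
def agc_lstm_cost(nnz, n_nodes, feat, layers, heads, seq_len, hidden):
    """Evaluate the two terms of O(LK(|A|_0 F + N F^2) + (4 T h^2 + 4 T F h))."""
    gcn_term = layers * heads * (nnz * feat + n_nodes * feat ** 2)
    lstm_term = 4 * seq_len * hidden ** 2 + 4 * seq_len * feat * hidden
    return gcn_term, lstm_term

# n_nodes=32 and heads=2 follow the text; all other sizes are hypothetical.
gcn_term, lstm_term = agc_lstm_cost(nnz=64, n_nodes=32, feat=16, layers=1,
                                    heads=2, seq_len=8, hidden=32)
total = gcn_term + lstm_term
```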

To evaluate the performance of the AGC-LSTM model, it is compared with five baseline models: (1) Historical average (HA) is a forecasting method that predicts future values by taking the average of past observations. (2) Autoregressive integrated moving average (ARIMA) is a classical time-series prediction technique. (3) Support vector regression (SVR) is a regression method that applies a linear support vector machine to series prediction. (4) LSTM is a specialized variant of the recurrent neural network (RNN) commonly used for sequential prediction tasks. (5) GCN-LSTM combines a GCN with an LSTM to leverage the advantages of both graph-based and sequential data modeling.
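The simplest of these baselines, HA, can be sketched in a few lines. The optional `window` argument is an assumption added for illustration; the text only states that past observations are averaged, and the sample values are hypothetical.

```python
def historical_average(history, window=None):
    """Predict the next value as the mean of past observations.

    If `window` is given, only the most recent `window` values are averaged
    (an illustrative option, not taken from the paper).
    """
    past = history if window is None else history[-window:]
    return sum(past) / len(past)

# Hypothetical 15-min flow counts for one route segment.
history = [4, 6, 8, 10]
pred_all = historical_average(history)
pred_recent = historical_average(history, window=2)
```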

The proposed AGC-LSTM is compared with the five baseline models on single-step-ahead prediction at a time granularity of 15 min. Comparative experiments are also conducted for single-step-ahead prediction at statistical time granularities of 15, 30, 45, and 60 min, and for multi-step-ahead prediction. All experiments are run on a machine with an NVIDIA GeForce GTX1050 GPU (2 GB memory), an i7-7700HQ CPU (2.80 GHz), and 8 GB of RAM.

4.4 Experimental results

4.4.1 Single-step ahead prediction results compared to baseline models

The training time of the AGC-LSTM model is 0.91 s per epoch, and training runs for 8700 iterations. On the test dataset, the model takes 0.39 s on average to run. AGC-LSTM is compared with the five baseline models for single-step-ahead prediction of the sector-based traffic flow. Table 2 provides a detailed comparison of each prediction method under different input sequence lengths. “Input_seq_length” denotes the input sequence length, and the best value of each model across the input sequence lengths is highlighted in boldface. The results indicate that the AGC-LSTM model achieves better predictive performance on all four metrics (MAE, RMSE, SMAPE, and R2). Compared with GCN-LSTM, the best-performing of the five baselines, AGC-LSTM reduces the MAE by 14.4%, the RMSE by 14.1%, and the SMAPE by 10.2%, and increases R2 by 9.96%. This gain is primarily attributed to the attention mechanism: whereas GCN-LSTM only extracts spatiotemporal features, AGC-LSTM incorporates multi-head attention into the graph convolutional layer, which exploits the spatial correlations in the data more fully and focuses on critical nodes in the flight route segment network, so that the topological structure of the whole graph is learned better. In addition, because sector traffic flow fluctuates significantly over short periods, the dropout layer in AGC-LSTM mitigates the impact of these perturbations and thereby enhances the robustness of the predictions.
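The percentage changes quoted above are presumably relative to the GCN-LSTM values. Assuming that convention (the text does not state it explicitly), the reduction in an error metric \(M \in \{\text{MAE}, \text{RMSE}, \text{SMAPE}\}\) and the increase in R2 would be computed as

\[
\Delta_M = \frac{M_{\text{GCN-LSTM}} - M_{\text{AGC-LSTM}}}{M_{\text{GCN-LSTM}}} \times 100\%, \qquad
\Delta_{R^2} = \frac{R^2_{\text{AGC-LSTM}} - R^2_{\text{GCN-LSTM}}}{R^2_{\text{GCN-LSTM}}} \times 100\%.
\]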

Table 2 Performance comparison of the six prediction methods for single-step ahead prediction with features of 15-min time granularity

It can also be observed that the prediction accuracy first improves and then deteriorates as the input sequence length increases. A plausible explanation is that lengthening the input initially adds features relevant to the predicted value, up to the point of optimal accuracy. As the input dimension grows further, however, the data become sparser, making it harder for the model to capture effective patterns and trends and thereby degrading its performance.

To assess the impact of the spatial range of the route segments on the prediction results, Table 2 also includes the results obtained when the extended segment network shown in Fig. 8 is used as the spatial network structure of the AGC-LSTM model. Compared with the AGC-LSTM model whose network is built only from route segments inside the focused sectors, the model using the expanded spatial network reduces the MAE by 1.1%, RMSE by 1.2%, and SMAPE by 6.7%.

Fig. 8
Expanded route segment network constructed with route segments inside and outside the sectors ZGGGAR11/ZGGGAR22 (route segments in red are the added segments)

The features have a time granularity of 15 min and vary in input sequence length. Figure 9a presents the real sector-based traffic flow together with the flows predicted by AGC-LSTM and GCN-LSTM. Figure 9b shows the absolute prediction errors for each time slot of the whole day. For the forecasted time slots 1–20 (0–5 h), where the actual values are relatively low, the AGC-LSTM model exhibits errors within 2, with most errors being within 1. In contrast, the errors of the GCN-LSTM model are mostly above 2 for the same slots, with the highest error reaching 6. Ordinarily, 5–10% of sector capacity is reserved to take care of all “non-adherence issues” caused by the low traffic predictability in the pre-tactical and tactical stages [96]. Since the maximum traffic flow of sectors in the central–southern region of China ordinarily does not exceed 21, 5–10% of such a flow is roughly 1–2 flights, so an improvement of 1–2 flights in forecasting accuracy has a significant impact. Furthermore, for the subsequent time slots beyond 20 (5–24 h), the AGC-LSTM model achieves a better fit, with smaller absolute prediction errors in the majority of the time slots.

Fig. 9
a Real and predicted sector-based traffic flow on a typical day and b absolute errors of the predictions made by AGC-LSTM and GCN-LSTM

4.4.2 Single-step ahead prediction results under different time granularities

To analyze the impact of different time granularities of the input and output features on the prediction performance, we extracted features at time granularities of 15 min, 30 min, 45 min, and 60 min to predict the traffic flow of the next time step. The experimental results are presented in Table 3.
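One way to obtain the coarser granularities is to sum consecutive 15-min flow counts, as sketched below. This assumes that each flight is counted once per 15-min bin, so that sums of bins equal the flow of the longer interval; the paper may instead recompute flows directly from the entry/exit times. The sample counts are hypothetical.

```python
def aggregate_flow(counts_15min, factor):
    """Sum consecutive 15-min counts into bins of factor * 15 min.

    An incomplete trailing bin is dropped (illustrative choice).
    """
    usable = len(counts_15min) - len(counts_15min) % factor
    return [sum(counts_15min[i:i + factor]) for i in range(0, usable, factor)]

# Hypothetical 15-min counts covering two hours.
counts = [1, 2, 3, 4, 5, 6, 7, 8]
flow_30 = aggregate_flow(counts, 2)  # 30-min granularity
flow_45 = aggregate_flow(counts, 3)  # 45-min granularity
flow_60 = aggregate_flow(counts, 4)  # 60-min granularity
```

Coarser bins leave fewer samples for the same observation period, which is relevant to the discussion of data volume that follows.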

Table 3 Performance comparison of prediction under different time granularities for single-step ahead prediction with AGC-LSTM model

It can be seen that as the time granularity becomes coarser, the model’s prediction performance deteriorates according to MAE and RMSE, whereas SMAPE tends to decrease and R2 continues to increase. The deterioration may be attributed to the smaller number of time-series samples available at coarser granularities, which reduces the data volume and increases the share of outliers, thereby disrupting the model’s learning process. This is consistent with the fact that the mean absolute error is less sensitive to outliers, whereas the root-mean-square error amplifies their effect. The increase in R2 may be due to the smoothing effect of aggregation as the time granularity increases.

The statistical time granularity affects the monitoring and implementation of traffic flow management (TFM). The Enhanced Traffic Management System in the US typically uses a 15-min flow statistics interval, while the Air Traffic Flow and Capacity Management (ATFCM) system in Europe typically uses a 1-h interval. The intervals can also be adjusted manually [97, 98].

4.4.3 Multi-step ahead prediction results under different output sequence lengths

To analyze the performance of the AGC-LSTM model in multi-step-ahead prediction at a 15-min granularity, we conducted experiments using different input sequence lengths to predict the sector-based traffic flow for different output sequence lengths. In tactical air traffic flow management operations, flow prediction is often less accurate when the look-ahead time (LAT) exceeds 1 h [99, 100], so output sequence lengths of two to five steps are used here. The experimental results are presented in Table 4. The “Predict_seq_length” column indicates the number of time steps predicted ahead, and the optimal results for each number of steps are highlighted in boldface.
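The relationship between the input sequence length and the output sequence length can be made concrete with a sliding-window construction. The sketch below is an assumption about how samples might be formed from a single flow series (the paper does not give the procedure), and the function name is hypothetical.

```python
def make_windows(series, input_len, predict_len):
    """Build (input, target) pairs with a stride of one time step.

    input_len plays the role of Input_seq_length and predict_len that of
    Predict_seq_length; the stride of 1 is an illustrative choice.
    """
    samples = []
    last_start = len(series) - input_len - predict_len
    for s in range(last_start + 1):
        x = series[s:s + input_len]
        y = series[s + input_len:s + input_len + predict_len]
        samples.append((x, y))
    return samples

# Hypothetical series of ten 15-min flow values.
series = list(range(10))
windows = make_windows(series, input_len=4, predict_len=2)
```

For a fixed series, longer inputs or longer prediction horizons both reduce the number of samples that can be formed.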

Table 4 Performance comparison for 2–5 steps ahead prediction with AGC-LSTM model

The prediction results show that the performance of the AGC-LSTM model degrades only slightly as the number of predicted steps increases. The results also indicate that the optimal input sequence length increases with the number of prediction steps. This can be attributed to the long-term memory of the LSTM layer: the AGC-LSTM model loses less information when processing longer time series and thus maintains high prediction accuracy in longer-horizon forecasting.

5 Discussion and implications

The presented model retains certain limitations. For instance, weather conditions have a significant impact on sector traffic flow, yet weather-related features have not been incorporated. In addition, because the route segment structure differs from sector to sector, the current methodology requires reconstructing the graph structure and retraining the network in order to predict traffic flow accurately for a different sector.

Nevertheless, this research offers several noteworthy contributions. Firstly, the utilization of the innovative AGC-LSTM model for sector traffic flow prediction based on sector-related flight segments demonstrates exceptional performance. Secondly, an exploration into the optimal input sequence length and granularity of time intervals for prediction markedly enhances prediction accuracy. Furthermore, the proposed model holds promise in aiding ATCOs in efficiently managing traffic flow and mitigating their operational burdens in practical settings. In practice, this entails a process of selecting the target region for prediction, followed by rigorous data collection, feature extraction, and model training. Subsequently, the model’s real-time application enables precise sector traffic flow prediction in a short time, facilitating controllers in evaluating airspace congestion and implementing effective management strategies. The practical application framework for the proposed AGC-LSTM method is shown in Fig. 10.

Fig. 10
Practical application framework for the proposed AGC-LSTM method

Regarding the scalability of the model, the enhancement of its performance can be pursued through the exploration of diverse attention mechanisms. Beyond conventional self-attention mechanisms, the incorporation of multi-scale attention and positional attention, among others, can augment the model’s adaptability to various focal points. Moreover, the enrichment of graph data representation by integrating additional node features, edge features, or subgraph structural information holds promise for further amplifying the model’s efficacy. In essence, the AGC-LSTM model exhibits commendable extensibility, permitting subsequent refinements and expansions tailored to specific task exigencies to elevate both performance and applicability.

6 Conclusions and future work

Data-driven and machine learning techniques offer the advantage of adaptability and flexibility, allowing models to learn from data and adjust predictions based on changing patterns in air traffic flow. These advanced approaches have significantly improved the accuracy and effectiveness of air traffic flow prediction, supporting safer and more efficient airspace operations. In this paper, we presented an innovative AGC-LSTM model for sector-based traffic flow prediction, which combines graph convolution networks with attention mechanisms. The traffic flow in the sector is influenced not only by the number of planned flights but also by the airspace complexity within the sector. To address this, our model integrates the multi-head attention mechanism into the GCN, effectively capturing the sector’s topological structure and focusing on critical nodes. Furthermore, an LSTM is incorporated to capture temporal dynamics in node attributes, allowing the extraction of essential spatiotemporal features from the data. Through comprehensive comparisons with the five baseline models, our proposed method demonstrates superior performance across all evaluation metrics. Notably, the AGC-LSTM model reduces the MAE to 1.6, which is significant for sector-based traffic flow management. The prediction performance improves further when the AGC-LSTM model is constructed with an expanded graph spatial range. Additionally, we explore the optimal input sequence length and time interval granularity for prediction, which leads to significant improvements in prediction accuracy. The AGC-LSTM model also maintains its accuracy in multi-step-ahead prediction when longer input sequences are used. Together, the results highlight the effectiveness of our AGC-LSTM approach in predicting sector traffic flow and showcase its potential for real-world applications in the aviation domain. Specifically, the proposed model has the potential to help ATCOs manage traffic flow efficiently and reduce their workload.

Based on the analysis of real aviation datasets, the following observations can be made regarding the proposed method. (1) The proposed method outperforms the GCN-LSTM model, showing superior predictive ability, particularly in high-flow scenarios. (2) The prediction accuracy of the model initially improves as the input time sequence becomes longer, but beyond a certain point a further increase in the input sequence length leads to a deterioration in prediction accuracy. (3) The model’s prediction performance improves when the route segment range is extended beyond the current sector to include the neighboring sectors. (4) The model’s prediction performance is negatively affected as the time granularity increases. In other words, the model’s accuracy decreases at coarser time intervals. (5) Extending the prediction time steps beyond a certain point does not significantly impact the prediction performance of the model. This verifies the stability of AGC-LSTM.

However, the findings presented in this paper come with certain limitations, which open avenues for future investigation. Firstly, although the input features for prediction encompass periodicity and historical traffic flow, incorporating additional factors such as weather conditions may further enhance prediction performance. Secondly, the current method requires reconstructing the graph structure and retraining the network when predicting the traffic flow for a different sector, which could be improved in future research. Exploring approaches that allow for more flexible and generalized predictions without the need for complete restructuring would be beneficial. Thirdly, different sectors may exhibit unique spatial structural characteristics and traffic flow patterns, which can impact sector traffic flow differently. Investigating the extraction of universal input features that can accommodate diverse sector characteristics would further enhance the robustness and applicability of the prediction model. In summary, while this research lays a foundation for sector traffic flow prediction, addressing these limitations and exploring new research directions will contribute to advancing the accuracy, flexibility, and generalization of the prediction model.