1 Introduction

The continuous and rapid growth of civil aviation flights has led to increasingly high requirements for air traffic safety, capacity, and efficiency. As the wake separation and arrival runway occupancy time (AROT) are the most critical factors directly affecting runway capacity, studies on AROT prediction are essential. Accurately predicting the runway occupancy time (ROT) can help air traffic controllers determine the necessary interval and improve the runway operation capacity and air traffic flow.

Currently, research on AROT is mainly focused on three aspects: the positioning of runway exits, the primary factors influencing runway occupancy, and analysis of runway capacity.

In the investigation of runway exit location, many researchers have established aircraft deceleration models to predict exit location based on actual data collected from airport observations, without taking into account the impact of airport layout and environmental factors [1]. Alternatively, some have utilized dynamic programming methods to simulate landing flight profiles and determine the optimal exit crossing [2, 3].

In the research analyzing the main factors affecting runway occupancy, the National Aeronautics and Space Administration (NASA) analyzed the sensitivity of AROT to factors such as aircraft weight, speed, airline carrier, and meteorological conditions, concluding that AROT is mainly dependent on aircraft speed and weight and that it increases slightly, up to a maximum of 5%, in wet runway conditions [4].

Herrema conducted a study on the main factors affecting AROT using Automatic Dependent Surveillance-Broadcast (ADS-B) and Advanced Surface Movement Guidance and Control System (A-SMGCS) data and found that runway exit position, aircraft type, airline, aircraft final approach speed, and following aircraft were the most significant factors [5]. Meijers et al. utilized a data-driven approach to investigate the ROT of aircraft during their landing process and explored the influence of various factors on this duration [6]. In another study by Koenig, it was revealed that AROT is closely related to the pilot's intention, as well as the wake category and runway surface conditions [7].

In the field of runway capacity analysis, researchers have primarily utilized simulation techniques to study runway capacity, delay, and dual runway occupancy, and have highlighted the significance of ROT in mixed runway operations and airport capacity improvement [8,9,10,11]. Nikoleris identified that AROT variation can lead to airport capacity constraints and flight delays [12, 13]. Kang developed a runway capacity evaluation model for regional airports and quantified the constraints on runway capacity imposed by runway structure by computing takeoff and arrival runway occupancy time [14]. Zhang analyzed the joint impact of landing time interval and runway occupancy time on runway capacity [15].

Currently, research on ROT mainly revolves around the structure of runways and taxiways and the analysis of runway capacity. However, there is a lack of research on predicting AROT, which limits the reference available for improving airport runway capacity. To fill this gap, this paper proposes the use of machine learning algorithms to predict the runway occupancy time of arriving aircraft.

This paper focuses on predicting ROT during aircraft arrival. Statistical analysis of historical data and related factors of arriving aircraft is conducted to establish an initial prediction model based on the Back Propagation (BP) neural network. Then, a combination of the particle swarm optimization algorithm and the genetic algorithm is used to optimize the initial model and improve its prediction accuracy. The main research contents of the paper are as follows:

  1. Analysis of the current situation and process of runway occupancy during the arrival phase

By investigating the current situation of ROT for aircraft arrival operations, as well as the definitions of ROT in different operating units and foreign regulations, this study proposes a clear definition of the AROT. Furthermore, a comprehensive analysis of the occupancy time during each phase of the aircraft arrival runway occupancy process is conducted, in order to prepare for the subsequent data selection and acquisition.

  2. Data extraction and processing

QAR data from A320 aircraft, METAR weather reports, and AIP runway configuration data were collected, and the parameters related to arrival runway occupancy were extracted. Missing values were then filled using the K-Nearest Neighbor algorithm, the data were standardized with Z-score normalization, and outliers were identified and handled with the boxplot method.

  3. Building AROT initial prediction models using BP neural network

This study mainly focused on predicting the AROT. Based on the statistical analysis of historical data and related factors of arriving aircraft, a BP neural network-based AROT initial prediction model was established. The activation function, training algorithm, number of hidden layers, and number of neurons of the BP neural network were determined through analysis of the correlation of the feature parameters. Finally, the prediction results of the AROT model were analyzed, and the model was improved and optimized.

  4. Joint optimization of AROT prediction model using particle swarm optimization algorithm and genetic algorithm

This study addresses the deficiencies of the BP neural network in predicting AROT, such as the uncertainty of the initial weights and thresholds, which makes the network prone to falling into a local minimum, as well as uncertain convergence speed and unstable network structure. To address these issues, this study combines the crossover and mutation operations of the GA with the fast convergence that the PSO algorithm achieves through inter-particle communication to jointly optimize the initial BP neural network model for predicting AROT.

2 Research Progress

The civil aviation operations regulations prohibit simultaneous occupation of the runway [16]. This means that only one arrival aircraft can occupy the runway at a time, thereby reducing the risk of collision. If the preceding aircraft occupies the runway for an extended period, it may cause the subsequent aircraft to abort landing and decrease the runway capacity. Therefore, reducing the ROT is crucial to improve airport capacity. While the application of wake re-categorization (RECAT) improves runway capacity by reducing wake separation, excessively long ROT may negate the benefits of shortened intervals and limit the enhancement of runway operational efficiency.

2.1 Related Definitions

The Civil Aviation Administration of China (CAAC) defines ROT as the duration in which an aircraft occupies the runway, including the time required for the aircraft to take off and land within the ground protection area. The ground protection area encompasses the runway, the portion of the taxiway from the applicable runway holding position to the actual runway, the surface area within 75 m on each side of the runway centerline, the Instrument Landing System (ILS) sensitive area, the ILS critical area, and the runway end safety area. As such, the AROT refers to the period from the moment the aircraft passes over a position 300 m away from the runway threshold to the instant it departs from a location 75 m away from the centerline of the runway.

Various countries and organizations have differing definitions of AROT. For instance, the American MITRE corporation defines AROT as the time from when an aircraft crosses the runway threshold to when it crosses the runway boundary [17, 18], while Trani defines it as the time interval from the runway threshold to the runway exit intersection [19]. Kumar, on the other hand, defines AROT as the time from when the aircraft crosses the runway threshold to when it is 25 ft. from the runway boundary (average of the aircraft wingtip and tail leaving the runway boundary) [20]. Kolos defines AROT as the time from crossing the runway threshold to leaving the runway exit holding position [5]. Additionally, Eurocontrol defines AROT as the time from when the aircraft crosses the runway threshold to when the tail of the aircraft leaves the runway boundary [21]. In this study, the Federal Aviation Administration (FAA) definition of AROT, which defines the aircraft exiting the runway as the entire fuselage completely crossing the runway holding position, is used.

2.2 The Impact of ROT

The runway operation phase of an aircraft is critical in airport ground operations, and any irregularities in runway occupancy time can significantly impact operational efficiency. In practical operations, aircraft are often unable to implement reduced wake turbulence separation through RECAT due to excessive runway occupancy time, thus limiting the potential improvements in runway operation efficiency. During peak hours, irregularities in runway occupancy time can trigger a series of chain reactions, leading to conflicts in runway occupancy and increasing the likelihood of go-around situations for landing aircraft, thereby affecting airport operational efficiency. Additionally, the arrival and departure processes are interdependent, and the runway occupancy of arrival aircraft directly affects the operational efficiency of departing aircraft.

It is evident that runway occupancy time has a significant impact on runway capacity, particularly in the context of RECAT-CN implementation. By standardizing aircraft runway occupancy time while ensuring safe aircraft operations, it is possible to enhance runway capacity and operational efficiency. Moreover, standardizing aircraft runway occupancy time enables air traffic controllers to accurately determine final approach spacing adjustments, maximizing the utilization of airspace resources and achieving the goal of increasing runway capacity. Currently, there are no unified regulations in China to standardize aircraft runway occupancy time, and different air traffic control units, airports, and airlines adopt flexible approaches based on their own operational conditions, resulting in significant variations in runway occupancy time for the same aircraft type at different airports and under different carriers.

Therefore, it is of great practical significance to define aircraft runway occupancy time, standardize it, and enhance airport runway capacity. This will contribute to the successful implementation of RECAT operations and the overall improvement of airport operational efficiency.

3 Data Extraction and Processing

3.1 Data Source

To predict the AROT, flight path data, weather information during landing, and runway configuration information are necessary. However, traditional surveillance systems applied to surface operation suffer from low precision. To overcome this shortcoming, Quick Access Recorder (QAR) data, which can continuously record more than 600 h of raw flight data, including thousands of flight parameters such as altitude, speed, flight attitude, and acceleration, are used to provide more detailed and reliable data. Weather information is extracted from historical meteorological airport reports, usually updated every 30 min, while arrival airport runway and taxiway configuration information can be obtained from airport Aeronautical Information Publication (AIP) data.

3.2 Parameter Extraction

A portion of the A320’s QAR data was acquired from Air China and pertinent parameters related to arrival runway occupancy behavior were extracted. Specifically, time parameters, runway threshold speed, runway departure speed, and taxi distance were extracted during the descent-to-taxiing phase of flight. Additionally, the METAR message was utilized to obtain temperature, wind speed, wind direction, and visibility during landing. The Aeronautical Information Publication (AIP) data was consulted to retrieve information regarding airport arrival runway configuration, runway length, and angles of departure from the crossing. Statistical analyses were conducted on each parameter, as demonstrated in Figs. 1, 2, 3, 4, 5, 6, 7, and 8, and the resulting statistical values are presented in Table 1.

Fig. 1 Distribution of arrival runway occupancy time

Fig. 2 Distribution of taxi distance

Fig. 3 Distribution of runway entry speeds

Fig. 4 Distribution of runway exit speeds

Fig. 5 Distribution of visibility

Fig. 6 Distribution of wind speed

Fig. 7 Distribution of wind direction

Fig. 8 Distribution of temperature at arrival time

Table 1 Statistical table of the distribution value of each parameter

3.3 Data Processing

Because the QAR device stores a large number of parameters, the extraction of relevant data can be interfered with by other parameters, so the collected data may contain deviations or missing values. Hence, the raw data must be preprocessed. In this section, the data preprocessing is divided into three steps. First, the K-Nearest Neighbor (KNN) algorithm is employed to fill in the missing values. Second, the Z-score normalization method is used to standardize the data. Finally, the boxplot method is used to identify and handle outliers. The data before and after outlier processing are depicted in Figs. 9 and 10.

Fig. 9 Boxplots of raw data before outlier processing

Fig. 10 Boxplot of data after outlier processing
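The three preprocessing steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the processing code used in this study: the function names, the choice of k, the use of Euclidean distance on the observed columns, and the 1.5 × IQR whisker rule are all assumptions, and the sample values in the usage note are hypothetical.

```python
import math
import statistics


def knn_impute(rows, k=3):
    """Fill None entries with the column mean over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        known = [j for j, v in enumerate(row) if v is not None]

        def dist(other, row=row, known=known):
            # Euclidean distance measured on the observed columns only.
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in known))

        neighbours = sorted(complete, key=dist)[:k]
        filled.append([
            v if v is not None else statistics.fmean(n[j] for n in neighbours)
            for j, v in enumerate(row)
        ])
    return filled


def zscore(column):
    """Standardize one column to zero mean and unit (population) variance."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(v - mu) / sigma for v in column]


def boxplot_bounds(column, whisker=1.5):
    """Return the lower and upper fences of the boxplot (IQR) rule."""
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr
```

For example, with hypothetical rows of (AROT in s, taxi distance in m) such as `[[50.0, 1500.0], [52.0, 1600.0], [48.0, None], [51.0, 1550.0], [120.0, 1580.0]]`, `knn_impute(rows, k=2)` fills the missing distance from the two rows with the closest AROT, and `boxplot_bounds` applied to the AROT column flags the 120 s record as lying outside the fences.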

4 Initial AROT Prediction with the BP Neural Network

The BP neural network is a supervised learning algorithm that effectively addresses the weight adjustment problem of multi-layer feedforward networks in solving nonlinear continuous function problems [22,23,24]. The BP neural network comprises an input layer, one or more hidden layers, and an output layer, with neurons fully connected between adjacent layers but with no connections between neurons within the same layer [25].

During the training process, the dataset was randomly split into a training set (80%) and a test set (20%). The scikit-learn 1.0.2 open-source machine learning library was used to construct a regression model with the MLPRegressor algorithm. After repeated debugging and manual comparison of candidate settings, the following parameters were chosen: hidden_layer_sizes = (n), a single hidden layer with n = 110 neurons; max_iter = 10,000, the maximum number of iterations; learning_rate_init = 0.001, the initial learning rate; activation = 'relu', the ReLU activation function; and solver = 'adam', the Adam optimization algorithm. random_state was set to 123 to ensure that each run produced identical results, and training stopped when the training set error reached 1e−2. The prediction results are depicted in Fig. 11.

Fig. 11 Comparison between predicted and actual values of BP neural network
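For reference, the settings reported above can be written as keyword arguments using the scikit-learn `MLPRegressor` parameter names, together with a reproducible 80/20 split of sample indices. This is a sketch rather than the training script of this study: the helper `train_test_split_indices` is a hypothetical name, the split is done with the standard library so the block runs without extra dependencies, and the 1e−2 stopping criterion is deliberately left out because the text does not say which library parameter implemented it.

```python
import random

# Keyword arguments mirroring the reported settings; the keys are the
# scikit-learn MLPRegressor parameter names.
mlp_params = {
    "hidden_layer_sizes": (110,),   # one hidden layer with 110 neurons
    "max_iter": 10_000,             # maximum number of iterations
    "learning_rate_init": 0.001,    # initial learning rate
    "activation": "relu",           # ReLU activation function
    "solver": "adam",               # Adam optimizer
    "random_state": 123,            # fixed seed for repeatable runs
}


def train_test_split_indices(n_samples, test_fraction=0.2, seed=123):
    """Shuffle sample indices reproducibly and split them into train/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]
```

With scikit-learn installed, `MLPRegressor(**mlp_params)` would build a model with these settings, which could then be fitted on the rows selected by the training indices.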

5 GA–PSO Optimization of the AROT Prediction Model

5.1 The GA and PSO Algorithm

Despite the BP neural network's robust capability in nonlinear mapping, high adaptability and self-learning, and excellent generalization and fault-tolerance abilities [26], it has certain drawbacks, such as a high risk of getting stuck in local minima, slow convergence of the network algorithm, and difficulty in determining network topology [27,28,29]. To address these limitations, this paper proposes a novel approach that combines the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm. Specifically, the GA’s crossover and mutation operations are integrated into the PSO algorithm. The particles of the corresponding dimensions are generated by calculating the number of parameters to be optimized, and the model is then trained. After several iterations, the optimal particle parameters for each dimension are determined as the best parameters, leading to the development of the GA–PSO algorithm with improved overall performance [30].

During the update process of the PSO algorithm, the velocity and position of each particle are updated in accordance with Eqs. (1) and (2).

$$ V_{i} (k + 1) = \omega V_{i} (k) + c_{1} r_{1} [p_{i} - X_{i} (k)] + c_{2} r_{2} [p_{g} - X_{i} (k)] $$
(1)
$$ X_{i} (k + 1) = X_{i} (k) + V_{i} (k + 1) $$
(2)

In these equations, \(X_{i}(k)\) and \(V_{i}(k)\) are the position and velocity of particle \(i\) at iteration \(k\); \(p_{i}\) is the personal best position of particle \(i\) and \(p_{g}\) is the global best position; \(c_{1}\) and \(c_{2}\) are the acceleration coefficients; \(\omega\) is the inertia coefficient; and \(r_{1}\) and \(r_{2}\) are random numbers uniformly distributed between 0 and 1. Substituting Eq. (1) into Eq. (2), the particle update process can be expressed by Eq. (3).

$$ X_{i} (k + 1) = X_{i} (k) + \omega V_{i} (k) + c_{1} r_{1} [p_{i} - X_{i} (k)] + c_{2} r_{2} [p_{g} - X_{i} (k)] $$
(3)
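A single standard PSO update following Eqs. (1) and (2) can be sketched as below. The function name `pso_step` and the default values of the inertia and acceleration coefficients are assumptions made for illustration; the text does not report the values used in this study.

```python
import random


def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One PSO update of a single particle.

    x, v    : current position and velocity (lists of floats)
    p_best  : personal best position of this particle
    g_best  : global best position of the swarm
    w       : inertia coefficient (assumed value)
    c1, c2  : acceleration coefficients (assumed values)
    """
    # r1 and r2 are drawn uniformly from [0, 1) once per particle update.
    r1, r2 = rng.random(), rng.random()
    # Eq. (1): inertia term plus attraction to the personal and global bests.
    new_v = [
        w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        for xi, vi, pi, gi in zip(x, v, p_best, g_best)
    ]
    # Eq. (2): move the particle by its new velocity.
    new_x = [xi + nvi for xi, nvi in zip(x, new_v)]
    return new_x, new_v
```

A particle that already sits at both its personal best and the global best with zero velocity does not move, which is a quick sanity check on the update.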

Here, the second term on the right side of Eq. (3) represents the particle's original inertia, which is equivalent to applying the GA mutation operation to the velocity term of the previous moment. Similarly, the first, third, and fourth terms combine the particle's current position with its personal best and the global best, which is equivalent to applying the GA crossover operation between the particle and these two extrema.

Therefore, the mutation operation in GA is first applied to replace the second term in Eq. (3): the velocity term of the updated particle is no longer the original velocity multiplied by the inertia coefficient but a mutated particle velocity. Then, the crossover operation in GA is used to replace the first, third, and fourth terms in Eq. (3): the particle is first crossed with its individual extremum and then with the global extremum. Finally, the particle that has completed the crossover operation is added to the corresponding velocity term to complete the particle update.
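One possible reading of this hybrid update is sketched below. The text does not specify the mutation or crossover operators, so Gaussian perturbation of the velocity and arithmetic (blend) crossover are assumptions chosen for illustration, as are the function names and the `scale` parameter; the default probabilities follow the values pc = 0.8 and pm = 0.01 reported in Sect. 5.3.

```python
import random


def mutate(v, pm, scale, rng):
    """GA-style mutation: perturb each velocity component with probability pm."""
    return [vi + rng.gauss(0.0, scale) if rng.random() < pm else vi for vi in v]


def crossover(a, b, pc, rng):
    """Arithmetic crossover: with probability pc, blend a towards b."""
    if rng.random() >= pc:
        return list(a)
    alpha = rng.random()
    return [alpha * ai + (1 - alpha) * bi for ai, bi in zip(a, b)]


def ga_pso_step(x, v, p_best, g_best, pc=0.8, pm=0.01, scale=1.0, rng=random):
    """One hybrid update: mutation replaces inertia, crossover replaces attraction."""
    new_v = mutate(v, pm, scale, rng)              # replaces the inertia term
    crossed = crossover(x, p_best, pc, rng)        # cross with individual extremum
    crossed = crossover(crossed, g_best, pc, rng)  # then with global extremum
    new_x = [ci + vi for ci, vi in zip(crossed, new_v)]  # add the velocity term
    return new_x, new_v
```

Under these assumed operators, a particle with zero velocity and no mutation ends up somewhere between its current position and the two extrema, which mirrors the attraction terms of Eq. (3) without using explicit acceleration coefficients.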

5.2 GA–PSO Joint Optimization Process of BP Neural Network

The GA–PSO algorithm described above replaces the gradient descent method used in the traditional BP algorithm and is utilized in the parameter optimization process of the BP neural network to explore all of its weights and thresholds, which are subsequently employed as encoding information for the particle swarm individuals. This approach not only avoids the problem of excessive computation brought on by the gradient descent method but also mitigates the risk of getting trapped in a local minimum [31]. In the optimization process, the memory function of the particle swarm algorithm is harnessed to retain the global optimum, thus enabling each particle to quickly approach the global optimum solution and enhance the convergence speed.

The fitness function of the particle swarm in the BP neural network is the mean square error of the learning samples, which is to be minimized, as given by Eq. (4).

$$ f = \frac{1}{N}\sum\limits_{i = 1}^{N} \sum\limits_{j = 1}^{q} (a_{ij} - t_{ij} )^{2} $$
(4)

In Eq. (4), \(a_{ij}\) is the actual output of the network at output node \(j\) for sample \(i\), \(t_{ij}\) is the corresponding expected output, \(q\) is the number of output nodes, and \(N\) is the total number of samples.
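As a small worked illustration of Eq. (4), the fitness can be computed from nested lists of network outputs and expected outputs. The function name `mse_fitness` and the numbers in the usage note are hypothetical and are not results from this study.

```python
def mse_fitness(actual, expected):
    """Eq. (4): squared error summed over output nodes, averaged over N samples.

    actual, expected : lists of N samples, each a list of q output values.
    """
    n_samples = len(actual)
    total = sum(
        (a - t) ** 2
        for row_a, row_t in zip(actual, expected)
        for a, t in zip(row_a, row_t)
    )
    return total / n_samples
```

For a single-output network (q = 1) with two samples, outputs of 50 s and 60 s against expected values of 52 s and 57 s give f = (4 + 9) / 2 = 6.5.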

The GA–PSO algorithm for optimizing the BP network involves eight specific steps.

  1. The objective function is determined, which is the average mean square error (MSE) of the test set obtained from fivefold cross-validation of the BP neural network.

  2. The population is initialized as a three-dimensional structure, where the first dimension represents the seed, the second dimension represents the dimensions to be optimized, and the third dimension represents the coding gene. The initial numbers of neurons in hidden layers 1 and 2 of the two-hidden-layer neural network are used as the two entries in the second dimension of each particle individual.

  3. The parameters are initialized, including the size of the particle population in PSO, the upper and lower bounds for each dimension, and the number of iterations, crossover probability, and mutation probability for GA.

  4. The fitness value is calculated, with the accuracy of fivefold cross-validation used as the fitness index.

  5. The optimal solution is determined: the best fitness value is used to identify the group's best extreme value and the individual extreme value in PSO and the global extreme value in GA, and the global extreme value of GA–PSO is then obtained by comparing these values. If the obtained global extreme value meets the conditions or the maximum number of iterations is reached, the global optimal solution is output; otherwise, the iteration continues to update the values.

  6. The population is updated by updating the global extremum in GA and the global and individual extrema in PSO. The population is selected using the improved selection method, and crossover and mutation operations are carried out on the selected population to create new particles. These new particles update their positions and velocities and undergo additional mutation operations based on the crossover probability.

  7. The algorithm outputs the global optimal solution, which gives the global optimal values of \(N_{1}\) and \(N_{2}\).

  8. The output results are used as new parameters to replace the parameters in the initial BP neural network model.
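The overall loop formed by these steps can be sketched as follows. This is a simplified illustration under several assumptions, not the implementation used in this study: `toy_objective` is a hypothetical stand-in for the fivefold cross-validated MSE of the network, particles are plain real-valued vectors rather than coded chromosomes, the selection step is omitted, and the mutation and crossover operators are assumed forms. The bounds, population size, and probabilities follow Sect. 5.3.

```python
import random

# Search ranges for (n1, n2, lr) as listed in Sect. 5.3.
BOUNDS = [(10, 300), (10, 300), (0.001, 0.1)]


def toy_objective(p):
    """Hypothetical stand-in for the cross-validated MSE of the network."""
    n1, n2, lr = p
    return ((n1 - 120) / 100) ** 2 + ((n2 - 60) / 100) ** 2 + (100 * (lr - 0.01)) ** 2


def clip(p):
    """Keep every dimension inside its search range."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(p, BOUNDS)]


def crossover(a, b, pc, rng):
    """Arithmetic crossover (assumed operator)."""
    if rng.random() >= pc:
        return list(a)
    alpha = rng.random()
    return [alpha * ai + (1 - alpha) * bi for ai, bi in zip(a, b)]


def mutate(v, pm, rng):
    """Gaussian velocity mutation scaled to each range (assumed operator)."""
    return [
        vi + rng.gauss(0.0, 0.05 * (hi - lo)) if rng.random() < pm else vi
        for vi, (lo, hi) in zip(v, BOUNDS)
    ]


def ga_pso(objective, size_pop=10, iterations=200, pc=0.8, pm=0.01, seed=123):
    rng = random.Random(seed)
    # Steps 2-3: initialize particles, velocities, and extrema.
    xs = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(size_pop)]
    vs = [[0.0] * len(BOUNDS) for _ in range(size_pop)]
    p_best = [list(x) for x in xs]
    p_fit = [objective(x) for x in xs]            # step 4: fitness values
    g = min(range(size_pop), key=p_fit.__getitem__)
    g_best, g_fit = list(p_best[g]), p_fit[g]     # step 5: global extremum
    for _ in range(iterations):
        for i in range(size_pop):
            # Step 6: mutate velocity, cross with both extrema, then move.
            vs[i] = mutate(vs[i], pm, rng)
            child = crossover(xs[i], p_best[i], pc, rng)
            child = crossover(child, g_best, pc, rng)
            xs[i] = clip([c + v for c, v in zip(child, vs[i])])
            fit = objective(xs[i])
            if fit < p_fit[i]:
                p_best[i], p_fit[i] = list(xs[i]), fit
                if fit < g_fit:
                    g_best, g_fit = list(xs[i]), fit
    return g_best, g_fit                          # step 7: global optimum
```

In a real run, the first two entries of the returned vector would be rounded to integers before being used as hidden-layer sizes (step 8), and the objective would train and cross-validate the network for each particle, which is by far the dominant cost.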

5.3 Parameter Settings

The GA–PSO algorithm is used to optimize the parameters of a BP neural network with two hidden layers, where the learning rate is also treated as a parameter to be optimized. The key parameters of the model include: (1) hidden layer sizes = (n1, n2), where n1 and n2 are the numbers of neurons in the first and second hidden layers; (2) max iter = 10,000, which sets the maximum number of iterations to 10,000; (3) learning rate init = lr, where lr ∈ [0.001, 0.1] is the interval from which the optimal learning rate is selected; (4) random state = 123, which fixes the random seed to ensure that each run gives the same result; (5) size pop = 10, which sets the population size to 10; (6) lenchrom = 10, which defines the length of the chromosome as 10; (7) pc = 0.8, which sets the crossover probability to 0.8; (8) pm = 0.01, which sets the mutation probability to 0.01; (9) w1 = [10, 300], the search range for the number of neurons in the first hidden layer; (10) w2 = [10, 300], the search range for the number of neurons in the second hidden layer. The best model parameters are obtained after 10,000 iterations, resulting in 68 neurons in the first hidden layer, 218 neurons in the second hidden layer, and a learning rate of lr = 0.008.
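To make the link between a particle and the network settings concrete, the sketch below maps a real-valued particle (n1, n2, lr) onto model keyword arguments while enforcing the search ranges listed above. The function name `particle_to_params` and the rounding and clamping behaviour are assumptions for illustration; the keys follow the scikit-learn `MLPRegressor` parameter names.

```python
# Search ranges from the parameter settings above.
BOUNDS = {"n1": (10, 300), "n2": (10, 300), "lr": (0.001, 0.1)}


def clamp(value, low, high):
    """Restrict a value to the closed interval [low, high]."""
    return min(max(value, low), high)


def particle_to_params(particle):
    """Decode a particle (n1, n2, lr) into network keyword arguments."""
    n1, n2, lr = particle
    return {
        # Neuron counts must be integers, so the clamped values are rounded.
        "hidden_layer_sizes": (
            int(round(clamp(n1, *BOUNDS["n1"]))),
            int(round(clamp(n2, *BOUNDS["n2"]))),
        ),
        "learning_rate_init": clamp(lr, *BOUNDS["lr"]),
        "max_iter": 10_000,
        "random_state": 123,
    }
```

For instance, a particle close to the reported optimum, such as `[68.2, 217.6, 0.008]`, decodes to hidden layer sizes of (68, 218) and a learning rate of 0.008.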

5.4 Predicted Results After Optimization

The prediction results obtained before and after the optimization process are presented in Fig. 12. It can be observed that the values predicted by the model after joint optimization using the GA–PSO approach (green line) are more accurate and closer to the actual values (red line). The comparison of the error values is illustrated in Fig. 13.

Fig. 12 The comparison of prediction results before and after optimization

Fig. 13 The comparison chart of error values before and after optimization

In order to assess the accuracy of the model predictions, three indicators were used: mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). A smaller value for these evaluation metrics indicates a more accurate prediction result. Table 2 presents a comparison of the evaluation metrics before and after the model was improved and optimized. After using GA–PSO to jointly optimize the initial BP neural network model, the prediction accuracy of the model improved by 9.8%, and the MAPE was reduced to less than 5%, demonstrating that the model achieved high-precision prediction after GA–PSO optimization.

Table 2 Index comparison table after improvement and optimization
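The three evaluation indicators can be computed as in the following sketch. The function names are illustrative, and the values in the usage note are hypothetical rather than results from Table 2.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires non-zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For actual AROT values of 50 s and 40 s predicted as 45 s and 44 s, these give an MSE of 20.5, an MAE of 4.5 s, and a MAPE of about 10%. Because AROT is always positive, the division in the MAPE is well defined.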

5.5 SHAP Explains the Degree of Influence of Characteristic Parameters

The influence of each parameter on the prediction results was analyzed using the SHapley Additive exPlanations (SHAP) approach, which decomposes a prediction into the contributions of the individual features. For each feature, it calculates the marginal contribution to the model output when the feature is added to different subsets of the other features and then averages these marginal contributions. In addition to expressing the impact of individual features on AROT, it can also express the impact of feature groups and the synergistic effects that exist between features [32, 33].
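The averaging of marginal contributions can be illustrated with an exact Shapley value computation for a small model. This is a didactic sketch and not the SHAP library itself, which relies on approximations because exact enumeration grows exponentially with the number of features. The function name, the linear toy model, and the convention of replacing absent features with a fixed baseline value are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the sample's values; the rest stay at baseline.
        return predict([x[j] if j in subset else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a hypothetical linear model `30 + 0.01 * distance - 0.2 * exit_speed`, the Shapley value of each feature reduces to its coefficient times the difference between the sample value and the baseline value, and the values sum to the difference between the sample prediction and the baseline prediction.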

5.5.1 Force Plot

The force plot is used to show how each parameter contributes to the prediction for a particular sample from the overall population. Each feature is depicted as a force that either strengthens or weakens the prediction, with positive contributions shown in red and negative contributions shown in blue. Figure 14 displays the degree of influence of each characteristic parameter of Sample No. 385. An increase in air temperature at arrival time, runway exit speed, and taxiing distance leads to a reduction in the AROT of this sample, while an increase in runway threshold speed results in an increase in AROT. Among all factors, the temperature at the time of arrival is found to have the greatest impact on AROT in this particular sample.

Fig. 14 Force plot

Figure 15 depicts a SHAP interpretation plot of AROT prediction for a specific sample, with a predicted value of 49.71 s. In this sample, only the runway threshold speed is a positive contributor, meaning that higher runway threshold speed results in longer runway occupancy by the aircraft. The runway exit speed has the most significant negative contribution, followed by visibility.

Fig. 15 Predictive power of feature parameters for individual samples

5.5.2 Summary Plot

The summary plot is used to interpret the predictive ability of parameters across all samples. Figure 16 is a standard bar graph of the mean absolute SHAP value of each feature, displaying the overall importance and relative ranking of the features. The analysis indicates that taxiing distance has the highest global importance, followed by the angle of departure from the crossing, with visibility ranking third. This result aligns with reality, where AROT is mainly affected by taxiing distance, which depends on the position of the exit crossing. Additionally, the angle of departure from the crossing also impacts AROT, with a rapid exit at an acute angle being more efficient than a 90° right-angle exit.

Fig. 16 Standard bar chart of feature parameters
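The ranking shown in such a bar chart is obtained by averaging the absolute SHAP values of each feature over all samples, which can be sketched as follows. The function name and the numbers in the usage note are hypothetical and are not values from this study.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Rank features by the mean absolute SHAP value over all samples.

    shap_matrix   : list of samples, each a list of per-feature SHAP values.
    feature_names : names of the features, in column order.
    """
    n_samples = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Most important feature first.
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)
```

With two hypothetical samples `[[2.0, -1.0], [-4.0, 0.5]]` and features `["taxi_distance", "wind_direction"]`, the result is `[("taxi_distance", 3.0), ("wind_direction", 0.75)]`. Taking absolute values means that contributions of opposite sign do not cancel in this ranking, so it measures the magnitude of a feature's effect rather than its direction.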

The importance analysis shown in Fig. 17 reveals that runway entry speed has the lowest importance for AROT prediction, as its positive and negative contributions cancel out, thereby having minimal impact on AROT. This may be because the series of braking operations during landing reduces the speed to a certain level, rendering the effect of approach speed on the arrival runway occupancy time insignificant. Wind direction also exhibits low importance for AROT prediction, which may be attributed to air traffic controllers altering the runway operation mode in response to changes in wind direction during actual operation.

Fig. 17 The distribution plot of the feature parameters

6 Conclusion

This study reviewed the theoretical research on ROT both domestically and internationally, including its different definitions. Machine learning algorithms were selected to predict AROT and to fill existing research gaps. Based on QAR data from six airports, an initial prediction of ROT was made using the BP neural network, and the GA–PSO algorithm was used to jointly improve the initial prediction model, resulting in a highly accurate prediction model. The results demonstrate that the proposed model can accurately predict AROT and can quantitatively analyze the impact of factors such as taxiing distance, exit angle, and visibility on AROT. The research findings can provide a reference for reducing ROT, runway planning and design, runway capacity analysis, and coupled analysis of takeoff and landing. Additionally, this study can assist air traffic controllers in making interval judgments and provide technical support for airport runway and taxiway design. The main research work of this paper is summarized as follows:

  1. According to the current operational situation of runways in China and the arrival occupancy process of aircraft, nine influential factors directly related to arrival runway occupancy behavior, including runway entry speed, runway exit speed, and taxiing distance, were selected. The corresponding QAR landing data were collected and preprocessed.

  2. Theoretical foundations of the BP neural network, genetic algorithm, and particle swarm optimization were studied. A preliminary single-hidden-layer BP neural network prediction model with 9 input parameters and 110 hidden nodes, and a decision tree model with 10 nodes and a maximum depth of 3 were constructed to predict ROT. The accuracy of the prediction results was compared and analyzed, and the defects and improvement directions of the models were discussed.

  3. To address the uncertainty in the initial weights and thresholds of the BP neural network, which leads to local minima, uncertain network convergence speed, and unstable network structure, intelligent optimization was chosen for improvement. A combined genetic algorithm and particle swarm optimization (GA–PSO) algorithm was proposed to optimize the BP neural network AROT prediction model. The prediction results before and after optimization were compared, and overall optimization improved by 15.2%, demonstrating the higher accuracy of the optimized prediction model. Finally, the SHAP model was used to analyze the influence of each feature parameter on the prediction results.

The present study suggests several avenues for future research in the prediction of AROT:

  1. Although this study primarily focused on the BP neural network algorithm and optimized it using a combination of genetic algorithm and particle swarm optimization, other models and optimization algorithms such as ant colony algorithm, convolutional neural network, radial basis function, and gated recurrent unit algorithm remain unexplored in the prediction of AROT. Therefore, further investigation is warranted to explore these models and algorithms.

  2. The data collection in this study was not comprehensive enough, as only flight parameters from the aircraft's QAR devices and meteorological and airport parameters were collected. In fact, the Advanced Surface Movement Guidance and Control System (A-SMGCS) at the airport can be useful in the arrival process. A-SMGCS primarily manages aircraft and vehicles on the airport surface by monitoring, controlling, route planning, and guiding traffic flow on the airport and in nearby areas to ensure the safety of surface movement. Therefore, for further research on ROT, it is essential to collaborate not only with airlines but also with airport groups and air traffic control agencies to obtain a broader data source and deepen the scientific conclusions for practical operational guidance.