1 Introduction

Airflow characteristics, especially speed and direction, are pivotal in aviation because of their critical influence on flights (Gultepe et al., 2019). Strong winds pose challenges to both airport operations and air traffic regulation, underscoring the need for meticulous monitoring of wind patterns near areas critical to flying, such as takeoff and landing zones (World Meteorological Organization, 2018). Traditional methods, such as numerical weather prediction (NWP) models, often struggle to capture nuanced local wind shifts and require massive computational power and time (Mazzarella et al., 2022; Saoud et al., 2021; Schultz et al., 2021). Machine learning (ML), however, is emerging as a game-changer in weather prediction (Liu et al., 2021; Markuna et al., 2023; Menegardo-Souza et al., 2022). ML excels at processing extensive datasets, analyzing complex patterns, and rapidly accommodating new data, suggesting significant potential for enhancing local weather forecasting accuracy (Baïle & Muzy, 2023). Such techniques might not only complement but could potentially supplant established NWP methodologies, both in the quality of their results and in their efficiency, producing forecasts with lower computational requirements and in less time (Bentsen et al., 2023).

This work was based on data from Gran Canaria Airport (GCLP), which is strategically located on the eastern coast of Gran Canaria Island, nestled within the bay of Gando, a mere 22 km south of Las Palmas de Gran Canaria, as shown in Fig. 1 (Suárez Bravo de Molina & Méndez Frades, 2019). At an altitude of 24 m, its runways are oriented at 030°–210°, in alignment with the dominant northeastern trade winds, blowing at a frequency of 90% during the summer and around 50% for the rest of the year (Suárez Molina et al., 2019). However, the island's complex topography can disrupt these wind patterns, often leading to orography-induced wind shear. Such disturbances pose challenges, especially during aircraft approaches. Crosswinds, sudden wind shifts, and relative intensities can exceed 60 kt, especially between regions such as Tufia and the GCLP's runway 21 head. Consequently, both turboprop and jet aircraft operations at the airport can be significantly impacted by these conditions (Suárez Bravo de Molina & Méndez Frades, 2019; Suárez Molina et al., 2019).

Figure 1 Canary Islands and GCLP airport location

Given the challenges posed by wind patterns in areas such as Gran Canaria and the shortcomings of conventional forecasting methods, ML models offer a potential solution for enhanced accuracy at GCLP. This study compared the effectiveness of several standard models for time-series analysis, specifically Long Short-Term Memory (LSTM), Vanilla Recurrent Neural Network (vRNN), One-Dimensional Convolutional Neural Network (1dCNN), Convolutional Neural Network-LSTM (CNN-LSTM), and Gated Recurrent Unit (GRU). These models fall under the umbrella of Deep Learning (DL), a subset of ML that excels in capturing intricate patterns within large datasets. Their proficiency in handling time-series data renders them particularly appropriate for tasks involving temporal predictions, such as wind nowcasting (Farah et al., 2022; Sarp et al., 2022; Siami-Namini et al., 2019; Valdivia-Bautista et al., 2023; Zhang et al., 2018). These models were selected for evaluation due to their frequent use in DL for time-series forecasting (Rajagukguk et al., 2020) and in light of recent advances in DL research, particularly the use of attention mechanisms and transformer architectures to improve performance and minimize errors (Yu et al., 2024).

The performance was assessed based on the ability to predict wind speed and direction, at 3-h steps (Novotny et al., 2021), until reaching a 24-h span, using data from the Meteorological Aerodrome Report (METAR) wind records.

Preceding research at Tenerife Sur Airport (GCTS) has concentrated on wind dynamics, especially wind shear, and thorough analyses of synoptic events (Quitián-Hernández et al., 2021; Suárez Bravo de Molina & Méndez Frades, 2019; Suárez Molina et al., 2019; Suárez-Molina & González, 2021). While these studies have consistently utilized traditional models, there is a notable gap in the state of the art regarding wind speed and direction nowcasting using ML-based techniques. These methods have primarily been applied to study extreme weather events (Castro et al., 2022; Chkeir et al., 2023) and low visibility conditions (Bari et al., 2023; Bartok et al., 1684; Li et al., 2022), with some investigations covering extreme wind speeds (Chkeir et al., 2023). Nevertheless, there appears to be an absence of studies focusing specifically on wind speed and direction nowcasting using ML, and none have been centered on GCTS, indicating a specific research gap to be addressed.

The primary objective of this study was to explore the potential of DL in providing up to 24-h wind forecasts based on METAR observations, to assess how such forecasts could support the refinement of Terminal Aerodrome Forecasts (TAF), and to identify the most suitable ML approach for predicting the multivariate wind speed and direction values. A secondary objective was to assess model performance at a finer granularity, specifically the accuracy over the 24-h span segmented into 3-h intervals (Novotny et al., 2021). Additionally, the research evaluates the best model and its forecasts in more detail, examining the reliability and precision of wind forecasting at GCLP, an aerodrome affected by complex atmospheric conditions. In this study, the autoregressive integrated moving average (ARIMA) model, known for its efficacy in wind and time-series predictions, was evaluated as a reference model alongside the DL models. While prior research focused on univariable performance, this investigation examines a multivariable ARIMA approach (Elsaraiti & Merabet, 2021; Liu et al., 2021).

This paper is structured as follows: Sect. 2 describes the dataset and methodology utilized; Sect. 3 presents the results and associated discussion; and Sect. 4 offers the concluding remarks.

2 Data and Methodology

2.1 Data

The dataset used for this research was drawn from METAR reports from the GCLP airport, covering the timeframe from 2018 to 2022. METAR is an international standard for reporting current weather conditions, primarily intended for aviation-related activities; at GCLP airport, it is issued every 30 min (International Civil Aviation Organization, 2018). Within these reports, wind information is denoted through a sequence indicating both its speed (in knots) and direction (in degrees). Figure 2 shows a GCLP METAR example, highlighting the date and wind information.

Figure 2 GCLP METAR example, with date and wind information decoded

The data was retrieved from Iowa State University's Iowa Environmental Mesonet (IEM), which links directly with the National Oceanic and Atmospheric Administration (NOAA) Automated Surface Observing System (ASOS). NOAA is a U.S. agency responsible for monitoring climate, oceanic, and atmospheric conditions. Accessing data through IEM, which integrates with NOAA's ASOS, helps ensure that the data utilized in this study is both consistent and reliable.

The final dataset spans from January 1, 2018, at 00:00 UTC to December 31, 2022, at 23:30 UTC, with a temporal resolution of 30 min, which was set as the input base granularity for the models. This dataset includes measurements for both wind speed and direction. The mean wind speed across this period is 7.71 m/s, the maximum wind speed recorded is 19.55 m/s, and the average wind direction is 188.45°.

2.2 Methodology

Wind information extracted from METAR data served as the base for this study. First, a preprocessing step was executed, converting the original METAR information into a more adequate format. The new structure featured a timestamp, the wind speed in meters per second (m/s), and the wind direction presented in degrees (°). The data was then segmented chronologically. Years 2018 to 2020 were designated for training, the year 2021 was used for validation, and the year 2022 was employed for the testing phase.
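The preprocessing and chronological split described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names `parse_wind` and `split_by_year`, the regular expression, and the sample report are assumptions introduced here, and the timestamp is assumed to be supplied alongside each report by the data source.

```python
import re
from datetime import datetime

# One knot is one nautical mile (1852 m) per hour.
KNOT_TO_MS = 1852.0 / 3600.0

# Wind group such as 03015KT or 03015G25KT: three-digit direction,
# two- or three-digit speed, optional gust, unit KT.
# Variable-direction groups (VRBxxKT) are deliberately not matched.
WIND_RE = re.compile(r"\b(\d{3})(\d{2,3})(?:G\d{2,3})?KT\b")


def parse_wind(metar):
    """Return (speed in m/s, direction in degrees), or None if no wind group is found."""
    match = WIND_RE.search(metar)
    if match is None:
        return None
    direction_deg = float(match.group(1))
    speed_ms = int(match.group(2)) * KNOT_TO_MS
    return speed_ms, direction_deg


def split_by_year(records):
    """Split (timestamp, speed, direction) records chronologically:
    2018 to 2020 for training, 2021 for validation, 2022 for testing."""
    splits = {"train": [], "val": [], "test": []}
    for record in records:
        year = record[0].year
        if 2018 <= year <= 2020:
            splits["train"].append(record)
        elif year == 2021:
            splits["val"].append(record)
        elif year == 2022:
            splits["test"].append(record)
    return splits


# Hypothetical report used only for illustration.
sample = "METAR GCLP 011200Z 03015KT 9999 FEW030 22/15 Q1018"
speed, direction = parse_wind(sample)
records = [(datetime(2022, 1, 1, 12, 0), speed, direction)]
splits = split_by_year(records)
```

Reports without a decodable wind group return `None`, which gives a natural hook for the cleaning step that discards invalid entries.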

Prior to any modeling, the dataset underwent a cleaning process where entries with invalid values were discarded. Subsequently, the data was standardized, ensuring uniformity in the dataset's scale and distribution. Wind information was also separated into \(\mathop{u}\limits^{\rightharpoonup}\) and \(\mathop{v}\limits^{\rightharpoonup}\) components, as used in previous works (Quintero Plaza & García-Moya Zapata, 2019), using

$$\overrightarrow{u}={-\omega }_{i}\times \text{sin}\left({\uptheta }_{i}\times \left(\frac{\pi }{180}\right)\right)$$
(1)
$$\overrightarrow{v}= {-\omega }_{i}\times \text{cos}\left({\uptheta }_{i}\times \left(\frac{\pi }{180}\right)\right)$$
(2)

where \(\omega\) is the wind speed in m/s and \(\uptheta\) is the wind direction in degrees.
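Equations (1) and (2) can be illustrated with a short sketch. The forward function follows the two equations directly; the inverse function `uv_to_wind` is not given in the text and is added here as an assumption about how predicted components could be mapped back to speed and direction.

```python
import math


def wind_to_uv(speed, direction_deg):
    """Apply Eqs. (1) and (2): speed in m/s, direction in degrees."""
    theta = math.radians(direction_deg)
    u = -speed * math.sin(theta)
    v = -speed * math.cos(theta)
    return u, v


def uv_to_wind(u, v):
    """Invert Eqs. (1) and (2). The direction is returned in [0, 360)
    up to floating-point rounding, and is arbitrary for calm wind."""
    speed = math.hypot(u, v)
    direction_deg = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction_deg


# A 10 m/s wind from 90 degrees has u close to -10 and v close to 0.
u, v = wind_to_uv(10.0, 90.0)
speed, direction = uv_to_wind(u, v)
```

Working with the two components avoids the discontinuity between 360° and 0° that a model would otherwise face when predicting direction directly.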

With the data prepared, each ML architecture was trained separately for each output, producing a separate model for each forecasting lead time. During model training and evaluation, the input length varied from 30 min to 24 h. The predictions were made for the subsequent 24 h, in alignment with the forecast span of the GCLP TAF. The output granularity was set to 3-h intervals, in accordance with the TAF verification standards set by the Northern European Meteorological Institutes' TAF (NORTAF) scheme (Novotny et al., 2021).
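One way to build the training samples for a single lead time is a sliding window over the 30-min series, sketched below. The helper name `make_windows` and the exact alignment between the last input step and the target are assumptions made for illustration; the text does not specify them.

```python
STEPS_PER_HOUR = 2                      # 30-min base granularity
LEAD_TIMES_H = list(range(3, 25, 3))    # 3, 6, ..., 24 h


def make_windows(series, input_hours, lead_hours):
    """Pair each input window with the value observed lead_hours after
    the last input step. `series` is a chronologically ordered list."""
    n_in = int(round(input_hours * STEPS_PER_HOUR))
    n_lead = int(round(lead_hours * STEPS_PER_HOUR))
    inputs, targets = [], []
    last_start = len(series) - n_in - n_lead
    for start in range(last_start + 1):
        end = start + n_in
        inputs.append(series[start:end])
        targets.append(series[end - 1 + n_lead])
    return inputs, targets


# Toy series standing in for (u, v) samples; one dataset per lead time,
# matching the one-model-per-lead-time setup described above.
toy_series = list(range(100))
datasets = {h: make_windows(toy_series, input_hours=20, lead_hours=h)
            for h in LEAD_TIMES_H}
```

Each entry in `datasets` would then feed a separate model, so that the 3-h and 24-h forecasts never share weights.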

The DL models' general architecture consisted of three layers: an input layer shaped to match the number of features; a second layer, specific to each model, containing up to four times the quantity of training data as the number of neurons or units; and an output layer producing two units that correspond to the values of the \(\mathop{u}\limits^{\rightharpoonup} \; {\text{and}}\; \mathop{v}\limits^{\rightharpoonup}\) wind components.

The standard optimizer employed was Adam, with the mean squared error (MSE) as the loss function. The activation function used for all models was the hyperbolic tangent, while a sigmoid was used as the activation function for the recurrent step (gate activations). For the dense layers, a linear function was used in all models. Training involved early stopping that monitored the validation loss with a patience of 10 epochs, and a batch size of 512. Unless otherwise specified, all other parameters remained at their default TensorFlow values. The ARIMA fitting was performed in a standalone structure; because ARIMA is a univariable model, two parallel models were used to fit the \(\mathop{u}\limits^{\rightharpoonup} \;{\text{and}} \;\mathop{v}\limits^{\rightharpoonup}\) wind components.
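The early-stopping rule can be illustrated independently of any framework. The sketch below is a plain-Python approximation of a patience rule on the validation loss; it is not the TensorFlow callback used in the study, whose behavior may differ in details such as a minimum-improvement threshold. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 0-based epoch at which training would stop, given the
    validation loss observed at the end of each epoch."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: reset the patience counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience was never exhausted: training ran through every epoch.
    return len(val_losses) - 1


# Hypothetical curve: improvement for three epochs, then a plateau.
losses = [1.0, 0.9, 0.8] + [0.85] * 20
stop_epoch = early_stopping_epoch(losses, patience=10)
```

With a patience of 10, training continues for ten epochs past the last improvement before halting, which limits overfitting without fixing the number of epochs in advance.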

Post-training, the models were evaluated based on their forecasting accuracy. Wind speed predictions were assessed via the mean absolute error (MAE), MSE, and mean absolute percentage error (MAPE) metrics. Wind direction forecasts were evaluated using the circular mean absolute error (cMAE), circular mean squared error (cMSE), and circular mean absolute percentage error (cMAPE) using

$$MAE=\frac{1}{m}\sum_{i=1}^{m}\left|{Y}_{i}-{\widehat{Y}}_{i}\right|,$$
(3)
$$MSE=\frac{1}{m}\sum_{i=1}^{m}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2},$$
(4)
$$MAPE=\frac{1}{m}\sum_{i=1}^{m}\left|\frac{{Y}_{i}-{\widehat{Y}}_{i}}{{Y}_{i}}\right|\times 100,$$
(5)
$$\mathrm{cMAE} = \frac{1}{m}\sum_{i=1}^{m} \left| \left( Y_i - \hat{Y}_i \right) - 360 \times \left\lfloor \frac{Y_i - \hat{Y}_i + 180}{360} \right\rfloor \right|,$$
(6)
$$\mathrm{cMSE}=\frac{1}{m}\sum_{i=1}^{m}\left(\left(Y_i-{\hat{Y}}_i\right)-360\times\left\lfloor\frac{Y_i-{\hat{Y}}_i+180}{360}\right\rfloor\right)^2,$$
(7)
$$\mathrm{cMAPE}=\frac{1}{360\times m}\sum_{i=1}^{m}\left(\left|\left(Y_i-{\hat{Y}}_i\right)-360\times\left\lfloor\frac{Y_i-{\hat{Y}}_i+180}{360}\right\rfloor\right|\right)\times100,$$
(8)

where \(Y\) denotes the actual values, \(\widehat{Y}\) the predicted values, and \(m\) the number of predictions. For comparisons, MAE and cMAE were chosen because they are expressed in the same units and order of magnitude as the original data, enabling a more direct analysis of each model's predictive performance in the context of real-world wind conditions at GCLP airport.
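Equations (3) to (8) translate directly into code. The sketch below uses only the standard library; the function names are chosen here for illustration, and inputs are assumed to be equal-length sequences with wind directions in degrees.

```python
import math


def circular_diff(y, y_hat):
    """Signed angular difference wrapped into [-180, 180), as in Eqs. (6) to (8)."""
    d = y - y_hat
    return d - 360.0 * math.floor((d + 180.0) / 360.0)


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    # Undefined when an observed value is exactly zero.
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y) * 100.0


def cmae(y, y_hat):
    return sum(abs(circular_diff(a, b)) for a, b in zip(y, y_hat)) / len(y)


def cmse(y, y_hat):
    return sum(circular_diff(a, b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def cmape(y, y_hat):
    # Eq. (8) normalizes by the full circle rather than by the observation.
    return cmae(y, y_hat) / 360.0 * 100.0


# Directions of 350 and 10 degrees are 20 degrees apart on the circle,
# not 340 degrees as a plain difference would suggest.
observed = [350.0, 10.0]
predicted = [10.0, 350.0]
direction_error = cmae(observed, predicted)
```

The wrapping step is what distinguishes the circular metrics: without it, forecasts that straddle north would be penalized as if they were almost a full rotation off.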

3 Results and Discussion

The initial test aimed to identify the optimal input length for each model. Fig. 3, which focuses on a 9-h forecast window for wind direction, shows a clear trend: the accuracy of all models improves with increasing input length. The same trend was observed for wind speed prediction and across all forecast windows.

Figure 3 Models' performance over varying input lengths on a 9-h wind direction forecast

Aggregating data across all models, variables, and predicted times, the average optimal input length is 20 h. This value can serve as a benchmark for future wind speed and direction forecasts up to a 24-h window using the models under study. Table 1 presents the best-performing input lengths for every combination of model, variable, and forecast window.

Table 1 Optimal input lengths in hours across all models, targets, and forecasting durations

For the 3-h forecast window, most models, especially CNN-LSTM, require longer input lengths, predominantly spanning 20 to 24 h. As the forecast window extends to 12, 15, and 18 h, the optimal input lengths display greater variability; while LSTM and GRU models generally prefer prolonged input durations, models like 1dCNN and vRNN often select shorter lengths. Interestingly, when forecasting for the longest duration of 24 h, most models lean towards extensive input durations, with vRNN being a notable exception, favoring a shorter 23-h input for direction prediction.

Turning to the models' predictions for wind speed and direction, Table 2 presents the MAE of each model for wind speed forecasting, and Table 3 provides the cMAE for wind direction forecasts, at all forecasted steps. Best performances are marked in bold.

Table 2 Wind speed MAE for the test dataset
Table 3 Wind direction cMAE for the test dataset

Based on data from Tables 2 and 3, the LSTM model consistently demonstrates a competitive error rate in predicting wind speed and direction across various forecast windows. Specifically, for wind speed at the 3-h forecast, the LSTM and GRU both have an MAE of 1.23, with the 1dCNN recording 1.30, vRNN 1.27, and CNN-LSTM 1.25. By the 12-h forecast, the CNN-LSTM, LSTM, and GRU models all yield an MAE of 1.71, highlighting similar performance.

In terms of wind direction at this 12-h interval, the LSTM model has a cMAE of 20.91, while the CNN-LSTM and GRU are at 21.02 and 20.95, respectively. Contrasting with these, the 1dCNN and vRNN consistently report slightly higher errors, evidenced by values such as 21.71 and 21.48 for the 12-h forecast for wind direction.

Figure 4 presents a visual comparison of the models' performance in forecasting wind speed and direction across multiple forecast intervals. It is evident from the graph that the error increases with the length of the forecast period for both speed and direction, indicating the challenge of predicting wind characteristics over extended hours.

Figure 4 Comparative analysis of wind speed and direction across all forecast intervals for all tested models

While the LSTM frequently emerges as the most accurate model, the GRU and CNN-LSTM models display competitive performance throughout the evaluated time frames, with the GRU closely aligned and even surpassing the LSTM in particular instances. Nevertheless, the LSTM sustains its leading performance as forecasted time increases, and the CNN-LSTM model distinctly outperforms in wind speed at the 6 and 12-h forecasts. Conversely, 1dCNN and vRNN models consistently rank with the highest error rates, indicating their lower overall performance in the evaluated scenarios.

Figure 5 compares the forecasted wind speed and direction to the actual wind speed and direction for the leading model (LSTM) at the more challenging 24-h forecast for the full year of 2022. Figure 6 zooms in on the 3-h forecast, for a more detailed view, over a 5-day period with significant wind speed and direction fluctuations.

Figure 5 Comparative wind rose representing observed and modeled wind data for the year 2022

Figure 6 LSTM 3-h wind forecast at GCLP

During the timeframe presented in Fig. 6, the MAE for wind speed was 1.38, and the cMAE for wind direction reached 20.50. This error exceeds the annual average, indicating the challenging nature of predicting variations for this specific period. However, the LSTM performance, even in this case, aligns with the established thresholds for wind direction (± 20°) and wind speed (± 2.5 m/s) in TAF forecasts (International Civil Aviation Organization, 2018). The MSE for wind speed was recorded at 3.28, and its MAPE stood at \(9.94 \times 10^{13}\)%. Conversely, the wind direction had a cMSE of 1299.50 and a cMAPE of 5.70%. The extraordinarily high MAPE for wind speed can be attributed to near-zero observed wind speeds, which appear in the denominator of Eq. (5) and inflate the percentage error. Since wind direction values range between 1 and 360 and the cMAPE is normalized by 360, this issue does not manifest for wind direction.
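The sensitivity of MAPE to small observed values can be reproduced with a toy example. The numbers below are hypothetical and chosen only to show the mechanism; they are not taken from the GCLP dataset.

```python
def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y) * 100.0


# Hypothetical wind speeds in m/s; the last observation is almost calm.
observed = [6.0, 8.0, 1e-12]
predicted = [5.0, 9.0, 1.0]

speed_mae = mae(observed, predicted)    # about 1 m/s for every sample
speed_mape = mape(observed, predicted)  # dominated by the near-calm sample
```

All three absolute errors are about 1 m/s, yet a single near-calm observation pushes the MAPE beyond \(10^{13}\)%, which is why MAE is the more informative metric for wind speed here.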

Given these high variations in wind direction and speed, the models are expected to encounter more difficulty, since rapid changes introduce more chaotic behavior into the wind conditions captured in the data. To potentially enhance model performance under these conditions, increasing the temporal resolution of the database (that is, using a finer granularity) could capture more detailed and frequent variations in wind patterns, providing richer datasets that could lead to improved accuracy and more robust performance across different forecasting scenarios (Alves et al., 2023). Such enhancements in data granularity might offer better insights into the chaotic dynamics of wind behavior, ultimately aiding in the development of more effective predictive models.

To enhance understanding of model behavior and facilitate a comprehensive comparison across all models, an extreme scenario featuring high wind speeds followed by significant fluctuations in wind direction was analyzed, corresponding to the period between 2022-12-03 at 23:00 UTC and 2022-12-07 at 00:00 UTC. The results of the 6-h forecast for this scenario are displayed in Fig. 7, while the full forecast duration errors are detailed in Tables 4 and 5, for wind speed and direction, respectively.

Figure 7 Comparative analysis of all models for a 6-h forecast during a high wind speed event followed by a period of high wind variability

Table 4 Wind speed MAE for the period between 2022-12-03 at 23:00 UTC and 2022-12-07 at 00:00 UTC
Table 5 Wind direction cMAE for the period between 2022-12-03 at 23:00 UTC and 2022-12-07 at 00:00 UTC

Analyzing Fig. 7 and Tables 4 and 5, the overall performance of the ML models is similar, yet certain models excel in specific forecasting scenarios. The CNN-LSTM and GRU models demonstrate the best performance for rapid changes in wind speed and high wind forecasts, with lower MAE values in earlier forecast hours. In contrast, the 1dCNN model outperforms the others over time in wind direction forecasting, consistently showing a lower cumulative cMAE, particularly evident at the 6-h forecast interval. These findings highlight the strengths of ML techniques, and of each model in particular, under more severe conditions such as rapid changes in wind speed and direction.

The ARIMA performance over the entire dataset was inferior to that of all DL models, with wind speed MAE, MSE, and MAPE of 3.27, 15.02, and \(2.03 \times 10^{14}\)%, respectively. For the wind direction, the cMAE and cMSE stood at 28.82 and 2,492.13, respectively. Therefore, compared to the best-performing model (LSTM), the ARIMA errors are higher by 2.04 in MAE for wind speed and by 13.02 in cMAE for wind direction. Furthermore, these values are outside the limits of TAF forecasts (International Civil Aviation Organization, 2018), further supporting the need for DL models.

4 Conclusions

This study investigated the efficacy of five DL models (LSTM, vRNN, 1dCNN, CNN-LSTM, and GRU) in forecasting wind speed and direction at GCLP airport, an area renowned for its complex wind dynamics. Utilizing METARs from this location, wind data was extracted and used to train and evaluate each model's performance.

The analysis confirmed the LSTM model as the most reliable, particularly for extended forecasting periods, consistently producing the lowest MAE and aligning with acceptable thresholds for TAF. The LSTM recorded an MAE of 1.23 and a cMAE of 15.80 for wind speed and direction, respectively, at the 3-h forecast.

During a challenging period marked by significant fluctuations in wind speed and direction, the LSTM model's predictions aligned with the International Civil Aviation Organization's thresholds for TAF forecasts. Additionally, this study explored 24-h forecasts in 3-h intervals, aligning with the full GCLP TAF forecasting period and adhering to the 3-h NORTAF scheme. Consequently, this work underscores the potential of DL, especially LSTM-based methods, in enhancing weather nowcasting and forecasting and in providing a basis for automating and improving the accuracy of TAF wind forecasts at airports with intricate terrains or wind patterns, such as GCLP.

Moreover, the CNN-LSTM and GRU models also delivered robust results, notably during periods of significant wind speed and direction fluctuations. These models demonstrated superior performance in scenarios involving rapid changes, evidenced by lower MAE values at earlier forecast hours. Conversely, the 1dCNN model showed strength in wind direction forecasting, outperforming other models in terms of cumulative cMAE, particularly over a 6-h forecast interval. These insights reveal the differentiated capabilities of each model under varying atmospheric conditions, highlighting the utility of ML techniques for specific meteorological forecasting challenges.

Our study also highlighted the limitations of the ARIMA model, which consistently underperformed across all metrics when compared to the DL models. This reinforces the superior predictive capabilities of DL techniques in meteorological applications at complex sites like GCLP.

This study has some limitations. It depends on METARs from the GCLP airport, which represent a single-point wind measurement; utilizing multiple measurement points might yield a more comprehensive view. While the models showed promise for GCLP, it would be beneficial to evaluate their performance across diverse geographical contexts. Lastly, the models were developed on five years of data (2018 to 2022), with 2022 reserved for testing; longer records would allow extended temporal analyses that capture varying climatic patterns over more years.

Future research is recommended to test wind nowcasting in different geographical areas, ensuring a broader understanding of model adaptability. It would also be valuable to refine and optimize DL models, potentially enhancing their predictive performance. Additionally, incorporating multi-point observations, as opposed to relying on single-point data, might provide a more comprehensive view of wind behaviors and improve forecast accuracy. These steps could significantly advance the field of meteorological forecasting.