1 Introduction

Many thousands of Uncrewed Aircraft Systems (UASs), such as drones and air taxis, will fly in the National Airspace System (NAS) to carry out a wide range of missions, from emergency management to delivery and passenger transport. Today’s drone delivery occurs in low-population areas, such as rural or peri-urban areas. Numerous technology companies, such as Amazon and Google Wing Aviation [1], are investing in drone delivery and diversifying their operations into larger urban markets [2].

The weather component, and wind-related restrictions in particular, imposed by Advanced Air Mobility (AAM) designs remain under-studied, although they are of paramount importance in urban areas. Current solutions rely on expensive sensors such as lidars and radars that have operational limitations and require continuous calibration and maintenance, on low-resolution forecasts not tailored for UAS operations, or on observations at airports. The prediction of winds, including wind speed (WS), wind direction (WD), wind gusts (WGs), and turbulence quantified by the eddy dissipation rate (EDR), along the AAM routes is an important part of the AAM ecosystem, as it is directly correlated with mission safety, battery performance and endurance, delivery scheduling, and client comfort through the Vortex-Induced Vibrations (VIVs) over the vehicles. Wind and turbulence predictions over cities, mountainous areas, and coastal areas pose challenges and require innovative techniques and operational capabilities to provide high-resolution, timely, and accurate predictions. This paper proposes the first version of a wind and turbulence prediction system, “WindAware,” that uses turbulence-resolving simulations, existing ground-based wind sensing infrastructure, and deep learning techniques to connect real-time and future winds at flight altitude along an air route with real-time and historical surrounding wind data at the ground, in order to support GO/NO-GO decisions for UAS operations. Deep learning is an active research topic, and its algorithms are increasingly used in weather forecasting research, as they have shown promising results in multiple research areas [3]. The motivation for using deep learning techniques is their proven ability to learn nonlinear relationships [4], in addition to their low latency, which allows them to deliver rapid operational nowcasts compared to physics-based models.
These capabilities can be used to feed other AAM models to help with mission efficiency and schedule management, or be integrated onboard the UAS to improve users’ understanding and awareness of future wind-related hazards along a corridor. Three widely used types of machine learning models will be compared and evaluated: the first is based on the multilayer perceptron (MLP) [5], the second is based on a recurrent neural network (RNN) composed of long short-term memory (LSTM) cells [6], and the third is a boosting algorithm, namely the XGBoost model [7]. MLP models are feed-forward artificial neural networks composed of multiple layers that connect inputs to outputs [8]. LSTMs are based on a recurrent and cyclic architecture that includes an additional connection linking the layer’s values with values from the previous time step. This recurrent feature allows the model to learn long-term dependencies [9]. The eXtreme Gradient Boosting (XGBoost) model (Chen and Guestrin 2016) is a scalable, end-to-end tree boosting system that uses second derivatives of the loss function in its gradient-based optimization and regularization to prevent over-fitting. These models have been widely used in multiple domains, namely medical research [10], energy [11], and economics [12]. Here, these models will be compared in their ability to nowcast wind and turbulence parameters.

The Chicago area is one of the most urbanized areas in the USA, with buildings of different heights and high population densities. Chicago has an additional specific geographical feature: it sits on the shore of Lake Michigan. Lake breezes, which are local circulations induced by the thermal contrast between the lake and nearby land, pose an important constraint on Chicago’s microclimate [13]. Therefore, properly simulating the lake-breeze phenomenon is crucial for validating the data used to build WindAware.

Traditional Numerical Weather Prediction (NWP) models use horizontal resolutions on the order of 10 km or more for global models and 1 km or more for regional models [14]. Therefore, turbulent eddies in these mesoscale simulations are implicitly filtered out, and the dynamical transfer between turbulent scales is modeled using boundary layer parameterizations [14]. Nevertheless, physical parameterizations and coarse resolution are two major factors impacting the performance of numerical predictions, especially over lakeshores and coastal areas. Although advancements have been made in terms of model treatments of various physical and dynamical processes, uncertainties in the prediction of lake breezes remain and biases are high [15,16,17]. Besides, remote sensing instruments such as lidars and radars are expensive because several of them are needed to retrieve the three components of the wind and turbulence. Therefore, they cannot be a scalable solution for every flying area, given maintenance costs and operational limitations.

Fortunately, high-resolution simulations (HRSs) are powerful numerical tools to reduce numerical prediction biases. Ref. [18] used HRSs (111 m of horizontal resolution) to explicitly resolve weather phenomena that present hazards to UAS, such as horizontal and vertical shear, thermals, boundary layer turbulence, fog, low cloud ceilings, and thunderstorms, across the northern half of the San Luis Valley, Colorado, during the Lower Atmospheric Profiling Studies at Elevation-A Remotely-Piloted Aircraft Team Experiment (LAPSE-RATE) measurement campaign in July 2018. Ref. [14] used large eddy simulations (LESs) to simulate urban wind flows over Downtown Oklahoma City under stable atmospheric conditions using Computational Fluid Dynamics (CFD) models. They succeeded in improving simulated wind speeds by coupling LESs with atmospheric simulations over Downtown Oklahoma City. Ref. [19] also used HRSs to study the sensitivity of lake breezes to atmospheric stability and surface thermal flux. They found that LESs, although computationally expensive, simulate lake-breeze development and inland penetration more accurately than mesoscale simulations. Using the nesting capability in the Weather Research and Forecasting (WRF) model, HRSs can accurately depict unresolved-scale motions over complex terrain [14, 15, 20]. The WRF nested LES capability has shown promising results in several meteorological research areas, such as boundary layer turbulence [21], stratocumulus clouds [22], and deep convection [23]. Ref. [24] used real-case HRSs with a horizontal resolution of 111 m in order to simulate turbulence magnitudes within the first 10 m of the boundary layer. The 25-day evaluation against sonic anemometer measurements from a 60 m tower showed that WS and WD are fairly well modeled, but the Turbulent Kinetic Energy (TKE) is underestimated during daytime due to the low vertical velocity, and the turbulent heat flux is misrepresented because of uncertainties in the sub-grid scale scheme.
In the present study, the WRF-LES model will be evaluated during a lake-breeze event across an urban area, namely the Chicago area.

The aim of this study is twofold: first, to evaluate the performance of 100 m HRSs in reproducing winds over a complex area such as Chicago using available observations; second, to examine the feasibility and validation of WindAware, an operational prediction model of WS, WD, WG, and EDR along conceptual urban routes. These four quantities have an impact on flight time, scheduling management, UAS uncertainty volume, client comfort, tracking precision, battery life, and overall flight safety.

The novelty of this paper lies in the following: (1) the overall concept of using ground-based sensor data to predict future wind and turbulence aloft and (2) the consideration of bandwidth limits and information overload by providing data only over flight corridors instead of 2D maps at flight altitudes.

This paper is structured as follows: Section 2 presents the setup of the WindAware model and the validation of the data used. Section 3 discusses the performance of the HRSs and the validation of WindAware, including during lake-breeze events (LBEs). Section 4 highlights the key findings, limitations, and research areas that need to be addressed before an operational WindAware can be deployed.

2 Data description and model setup

2.1 Datasets description

WindAware utilizes ground-based data from the UrbaNet network and interpolated data from HRSs.

2.1.1 UrbaNet data

The ground-based anemometer data are provided by the 45 stations of the UrbaNet network shown in Fig. 1. UrbaNet is a network of ground-based weather stations located in 20 metropolitan areas across the USA that aims to create urban testbeds and improve weather forecasting in complex urban environments. The data are publicly accessible using the NOAA Meteorological Assimilation Data Input System (MADIS) platform (https://data.eol.ucar.edu/dataset/100.024). These urban testbeds were created and are managed by NOAA’s Air Resources Laboratory (ARL) and AWS Convergence Technologies, Inc. [25], and these stakeholders are planning to expand these testbeds to, first, develop, validate, and improve next-generation urban models using real-world data; second, develop tools to characterize communities’ exposure to airborne agents and pollutants; and third, work toward operational tools to be used by the emergency response community. The data used in this study, namely WS, are reported approximately every 5 min. WD, WG, and EDR data were not available over the simulated period. The data went through various quality control checks to ensure their spatial and temporal consistency, as described by [26]. If data are missing, the data at the same hour will be used.

Fig. 1 Simulated domains, UrbaNet and METAR stations over Chicago

2.1.2 Data over the airway

2.1.2.1 Flight airway definition

For this study, the characteristics of the low-altitude environment for operations of UAM vehicles have been analyzed through the UAM CONcept of OPerationS (CONOPS). The US Federal Aviation Administration (FAA) has established the concept that low-altitude UAS controlled by UTM systems will operate at or below 400 ft above ground level (a.g.l.) [27]. Additionally, UAM vehicles will operate within a defined UAM corridor when cruising above 400 ft a.g.l.

Potential operations over Chicago are planned between the six vertiports shown in Fig. 2: O’Hare International Airport, Schaumburg Municipal Heliport, Vertiport Chicago FBO, Marine Heliport, Tinley Park Police Department Heliport, and Chicago Midway International Airport (https://eveairmobility.com/chicago/). The airspace over O’Hare International Airport and Chicago Midway International Airport is Class B and Class C, respectively. The CONOPS suggests that an Urban Air Traffic Management (UATM) system be established, and that information be exchanged through integration with current Air Traffic Management (ATM) providers. These routes should be designed at an altitude above 400 ft a.g.l. (121.92 m a.g.l.). A conceptual route at a fixed altitude of 200 m is used as a proof of concept to test the software. Direct routes following the geodesic between different vertiports, which are based on distance optimization, are not adopted here because they would pass over residential areas, O’Hare International Airport, and Downtown Chicago on the route between O’Hare International Airport and the Marine Heliport. The notional air routes used here are designed along rivers, lakes, and major roads to avoid noise-sensitive areas, obstacles, and congested airspaces over the inner circles of Class B and Class C airports. Specifically, the routing network follows highways and ground routes, such as Interstate 355 and the Chicago-Kansas City Expressway, and rivers or lakes, such as Lake Michigan. Figure 2 shows the direct routes between vertiports and the notional routing network used in the present study.

Fig. 2 UAM flight routes and airspace segregation over the Chicago area

A new field of research on Urban Air Mobility (UAM) tries to answer questions about how future urban airspace will function and where urban air routes should be placed, because design scenarios will have an impact not only on UAM systems but also on ground-level communities, ground mobility features, micro-economies, as well as future urban planning and architecture. Multiple UAM simulations are being developed and tested. For instance, AirMap, a US-based company working on UAS Traffic Management (UTM), is testing its platform in upper New York State (New York State Governor’s Office 2019). The NTU Air Traffic Management Research Institute is designing and modeling urban corridors in Singapore with the purpose of simulating various scenarios of airspace capacity [28].

Multiple studies have shown that UAM traffic in urban areas is unlikely to follow direct origin-to-destination routes due to multiple factors summarized in [29]. Ref. [30] underlined that a future UAM airspace would include elements of airspace design, existing airspace restrictions, and dynamic geo-fencing, among other restrictions that would limit the paths UASs could take through the airspace. The altitude at which different types of UASs will operate is also an open regulatory question. In a proposal submitted in 2015, Ref. [31] suggested a two-tiered airspace design for UASs of different capabilities and autonomy levels, with “low-speed localized traffic” operating up to an altitude of 61 m and a “high-speed transit” zone from 61 to 122 m for highly automated vehicles operating beyond line of sight.

2.1.2.2 Data description

Data preprocessing


The wind data along the route consist of WS, WD, WG, and EDR values along the urban corridors at a resolution of 100 m. A total of 3487 points spaced 100 m apart, constituting the 348 km route, are simulated using an NWP microscale simulation with the LES model to resolve turbulence, and are retrieved in the following two steps:

  1. Simulating winds with 100 m resolution over Chicago using the WRF model [32] in the LES mode (WRF-LES). These simulations are considered LESs during clear daytime periods, when atmospheric conditions are unstable and the boundary layer is convective, and turbulence gray-zone simulations during stable conditions (nighttime and early morning). As stability increases, the scale and magnitude of turbulent eddies decrease and finer meshes are required to resolve the dominant turbulent motions [33].

  2. Retrieving wind data along the corridors using 3D interpolation from the NWP grid. The nearest neighbor interpolation is used to avoid smoothing turbulence effects and to conserve hyperlocal resolved wind features. In this proof of concept, the air route at one altitude of 200 m is used. In another use case whereby the altitude of the air route changes, the same interpolation technique can be used. The WG and EDR are derived from the simulated WS and Turbulent Kinetic Energy (TKE) fields following Eqs. 1 [34] and 2 [35], where L is the integral length scale of large turbulent eddies that is assumed to be constant and equal to 300 m following [35]. The WG and EDR parameterizations, designed based on results from mesoscale simulations (with a horizontal resolution coarser than 1 km), will be tested here using data from sub-kilometer simulations. The total TKE is calculated in Eq. 3 as the sum of the resolved TKE (TKEresolved) and the modeled Sub-Grid Scale (SGS) TKE (TKESGS).

    $${\text{WG}} = {\text{WS}} + \sqrt {2 \cdot {\text{TKE}}}$$
    (1)
    $${\text{EDR}} = \left( {\frac{{{\text{TKE}}^{\frac{3}{2}} }}{L}} \right)^{\frac{1}{3}}$$
    (2)
    $${\text{TKE}} = {\text{TKE}}_{{{\text{resolved}}}} + {\text{TKE}}_{{{\text{SGS}}}}$$
    (3)
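As a minimal sketch of how Eqs. 1 to 3 can be evaluated at a single waypoint, the snippet below computes the total TKE, the wind gust, and the EDR. The function names and the numerical inputs are illustrative assumptions, not values taken from the simulations; only the formulas and L = 300 m come from the text.

```python
import math

L_INTEGRAL_M = 300.0  # integral length scale L, assumed constant as stated in the text


def total_tke(tke_resolved, tke_sgs):
    """Eq. 3: total TKE = resolved TKE + sub-grid-scale TKE (m^2 s^-2)."""
    return tke_resolved + tke_sgs


def wind_gust(ws, tke):
    """Eq. 1: WG = WS + sqrt(2 * TKE), in m s^-1."""
    return ws + math.sqrt(2.0 * tke)


def eddy_dissipation_rate(tke, length_scale=L_INTEGRAL_M):
    """Eq. 2: EDR = (TKE^(3/2) / L)^(1/3), in m^(2/3) s^-1."""
    return (tke ** 1.5 / length_scale) ** (1.0 / 3.0)


# Illustrative values only (hypothetical, not simulation output)
tke = total_tke(tke_resolved=1.5, tke_sgs=0.5)
wg = wind_gust(ws=6.0, tke=tke)
edr = eddy_dissipation_rate(tke)
print(f"TKE={tke:.2f} m2 s-2, WG={wg:.2f} m s-1, EDR={edr:.3f} m2/3 s-1")
```

Because both WG and EDR depend only on WS and the total TKE, they can be evaluated pointwise after the interpolation step without storing additional model fields.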

Simulations at 100 m of horizontal resolution were performed from March 1, 2022, to July 25, 2022, over domain D2, shown in Fig. 1. This simulation is dynamically downscaled via the one-way nesting capability in the WRF model. The parent mesoscale simulation over D1 has a horizontal resolution of 1 km and is used to provide initial and boundary conditions for the 100 m simulation over D2. Both the parent D1 and nested D2 domains are centered over 41.82405° N and 87.64975° W and are run concurrently. The model setup and the parameterizations used are given in Table 1. The simulation is split into 24 h simulations with an additional 6 h period for spin-up. Ref. [18] showed that a 6 h spin-up time is sufficient for the dissipation of spurious gravity waves and the production of balanced flows for the coupling through the lateral boundary conditions. The 30 h simulations are initialized every day using the High-Resolution Rapid Refresh (HRRR) analysis data. The HRRR provides real-time 3 km resolution, hourly updated, cloud-resolving, convection-allowing forecasts for 48 h every 6 h. Radar data are assimilated in the HRRR every 15 min over a 1 h period in addition to other conventional observational data [36]. No other data are assimilated in the simulations over the parent and child domains. A vertically stretched terrain-following sigma coordinate is used with 80 vertical levels, and the lowest 30 levels are below 1 km. The cell perturbation technique described by [37, 38] was not used in this WRF setup because, in this urban area, the turbulence should be generated by local processes and vertical turbulent transport driven by the surface-based heat fluxes [37]. Therefore, adding synthetic turbulence at the lateral inflow boundaries is not necessary.
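The daily run schedule described above can be sketched as follows. The assumption that the 6 h spin-up precedes the retained 24 h (so that each run starts 6 h before the day it covers) is ours, since the text does not state the initialization hour; the function name is also hypothetical.

```python
from datetime import datetime, timedelta

SPINUP_H = 6      # spin-up length given in the text
ANALYSIS_H = 24   # retained period per daily run


def daily_runs(first_day, last_day):
    """Return (init_time, analysis_start, analysis_end) for each daily run.

    Assumption: the spin-up is placed before the retained 24 h, so each
    30 h run is initialized 6 h before the day it covers.
    """
    runs = []
    day = first_day
    while day <= last_day:
        init = day - timedelta(hours=SPINUP_H)
        runs.append((init, day, day + timedelta(hours=ANALYSIS_H)))
        day += timedelta(days=1)
    return runs


runs = daily_runs(datetime(2022, 3, 1), datetime(2022, 7, 25))
print(len(runs), "daily runs; first run initialized at", runs[0][0])
```

Under this layout the simulated period corresponds to one 30 h run per calendar day, which is the day count used later when reporting the frequency of lake-breeze events.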

Table 1 WRF model configuration and input physics parameterizations

The coupling between the two simulations is performed following the mesoscale-to-microscale (M2M) coupling [46]. A refinement ratio of 10:1 between D1 (1 km) and D2 (100 m) is used. This large ratio is used in order to minimize the effect of the turbulence gray zone (terra incognita) resolutions, for which neither boundary layer parameterizations nor LES are suitable [14, 47]. Ref. [18] used a ratio of 9:1 for the M2M system over the northern half of the San Luis Valley, Colorado, during the LAPSE-RATE field campaign, and the simulations were successful in capturing thunderstorms, thermals, wind shear, and low ceiling events. Ref. [48] used a ratio of 11:1 for the M2M system during the Crop-Wind Energy Experiment (CWEX) field experiments with 90 m resolution to ensure a proper modeling of turbulence under stably stratified conditions. The turbulent sub-grid-scale parameterization used for the nested simulation is the 1.5-order TKE closure [49]. Cumulus convection is not parameterized in either domain, so that convection is explicitly resolved. The configurations and parameterizations used are similar to those used in [14]. Given the computational expense required by simulations over D2, the widely used Single-Layer Urban Canopy Model is chosen. This model estimates momentum exchange in the urban area using approximations of roughness lengths and displacement heights based on morphometric parameters of buildings of uniform heights derived from idealized laboratory experiments [50]. Currently, substantial efforts are being devoted to refining the urban canopy models using satellite and Geographical Information Systems [51], coupled WRF-CFD simulations [14], the Immersed Boundary Method (IBM [52]), and multilayer urban canopy models [53].

Lake-breeze events identification


Five criteria defined by Ref. [54] are used to identify lake-breeze events (passage of the lake-breeze front) during the simulated period (hereafter, LDT refers to Local Daylight Time, which corresponds to UTC − 5 h):

  1. An abrupt change in average wind direction from offshore to onshore during daytime (0600 LDT to 1900 LDT). The onshore and offshore directions are defined relative to the coastline orientation (330°–150°).

  2. A positive difference between the daily maximum temperature at an inland METeorological Aerodrome Reports (METAR) station and the lake surface temperature measured at the same hour. The temperature of the lake surface is obtained from the Great Lakes Surface Environmental Analysis (GLSEA [54]). These temperatures are derived from the NOAA Advanced Very High-Resolution Radiometer (AVHRR), the Visible Infrared Imaging Radiometer Suite onboard the Suomi National Polar-Orbiting Partnership spacecraft (VIIRS S-NPP), and the NOAA-20 Visible Infrared Imaging Radiometer Suite (VIIRS NOAA-20) imagery obtained through the NOAA Great Lakes CoastWatch program (https://coastwatch.glerl.noaa.gov/glsea/doc/). The offshore temperature grid point on the lake surface is located 10 km offshore from the nearest point on the shoreline, along an azimuth equal to the 1 h average wind direction after the Time of the Lake-Breeze Arrival (TLBA), following [55].

  3. An average air temperature in the morning (0600–0800 LDT) lower than during the afternoon (1700–1900 LDT).

  4. An average wind speed in the morning (0600–0800 LDT) less than 5.5 m s−1.

  5. No precipitation or trace of precipitation after 3 h of the wind direction shift.

The Time of the Lake-Breeze Arrival (TLBA) is defined as the time of the greatest shift in WD, computed using the temporal rate of change in WD. A dataset of lake-breeze events is built that contains only data from LBEs detected at the METAR station nearest to the lakeshore (PWK), over a period of 3 h before and after the TLBA. This dataset is used to validate the HRSs during these events in terms of WS, WD, WG, TLBA, and inland penetration, and to assess the performance of WindAware during these events. After applying these criteria to data from PWK, we found that LBEs occurred on 19% of the days in the simulated period, which corresponds to 28 of 147 days.
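The TLBA definition and the first criterion can be illustrated with a short sketch. The onshore sector used here (winds blowing from 330° through north to 150°, with the lake lying east of the shoreline) is our reading of the coastline orientation and should be treated as an assumption, as are the function names and the synthetic wind directions.

```python
def angular_diff(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return 360.0 - d if d > 180.0 else d


def is_onshore(wd_deg):
    """Assumption: with a 330-150 deg shoreline and the lake to the east,
    winds blowing FROM the sector 330 -> 360/0 -> 150 deg are onshore."""
    wd = wd_deg % 360.0
    return wd > 330.0 or wd < 150.0


def time_of_lake_breeze_arrival(times_min, wd_deg):
    """Return the time stamp at which the WD rate of change is largest."""
    best_rate, best_time = -1.0, None
    for k in range(1, len(wd_deg)):
        dt = times_min[k] - times_min[k - 1]
        rate = angular_diff(wd_deg[k], wd_deg[k - 1]) / dt
        if rate > best_rate:
            best_rate, best_time = rate, times_min[k]
    return best_time


# Synthetic example: westerly (offshore) flow turning easterly (onshore)
times = [0, 5, 10, 15, 20, 25]                 # minutes after an arbitrary start
wd = [265.0, 270.0, 262.0, 95.0, 90.0, 100.0]  # wind direction in degrees
tlba = time_of_lake_breeze_arrival(times, wd)
shift_ok = (not is_onshore(wd[0])) and is_onshore(wd[-1])
lbe_fraction = 28 / 147                        # share of LBE days reported in the text
print(tlba, shift_ok, round(100 * lbe_fraction))
```

Using the circular difference avoids spurious 360° jumps when the wind direction crosses north, which would otherwise be mistaken for a frontal passage.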

2.2 WindAware description

Two deep learning models will be trained and tested using the same inputs and outputs: The first one is WindAware, which is based on an RNN composed of LSTM units, as detailed in Sects. 2.2.1 and 2.2.2. The second one is based on the MLP network, as detailed in Sect. 2.2.3.

2.2.1 Model description and design

WindAware is an RNN model that uses as inputs:

  (1) current WS from 45 UrbaNet stations at time t

  (2) historical WSs from 45 UrbaNet stations with a frequency of 5 min (times: t − 5 min, t − 10 min, …, t − 55 min, t − 1 h)

The model makes inferences about four quantities along the route in the future with a frequency of 5 min. These quantities are WSs, WDs, WGs, and EDRs at times (t, t + 5 min, t + 10 min, …, t + 6 h).

Overall, the model requires 540 inputs (a WS value for 12 time steps (every 5 min for one hour back for the 45 UrbaNet stations)) and will predict 1,004,256 outputs (4 × 3487 × 72) corresponding to values of WS, WD, WG, and EDR for 6 h ahead every 5 min (72 time steps) over the conceptual route (3487 waypoints). Figure 3 shows the design of the model with its inputs and outputs.
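The input and output sizes quoted above follow directly from the station, waypoint, and time-step counts. The sketch below reproduces this arithmetic and shows one possible way to flatten a window of station wind speeds into a feature vector; the helper name and the [time step][station] layout are assumptions made for illustration.

```python
N_STATIONS = 45       # UrbaNet stations
N_INPUT_STEPS = 12    # 5 min samples over the past hour
N_VARIABLES = 4       # WS, WD, WG, EDR
N_WAYPOINTS = 3487    # route points spaced 100 m apart
N_OUTPUT_STEPS = 72   # 6 h ahead at 5 min intervals

n_inputs = N_STATIONS * N_INPUT_STEPS
n_outputs = N_VARIABLES * N_WAYPOINTS * N_OUTPUT_STEPS


def flatten_window(ws_window):
    """Flatten a [time step][station] window of WS values into one vector."""
    if len(ws_window) != N_INPUT_STEPS or any(len(r) != N_STATIONS for r in ws_window):
        raise ValueError("expected a 12 x 45 window of wind speeds")
    return [ws for row in ws_window for ws in row]


window = [[5.0] * N_STATIONS for _ in range(N_INPUT_STEPS)]  # synthetic WS values
features = flatten_window(window)
print(len(features), n_outputs)
```

For a recurrent model the same window would typically be kept in its two-dimensional (time step, station) form, whereas a flat vector suits the MLP and tree-based models.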

Fig. 3 WindAware block diagram

2.2.2 Model architecture

The architecture of WindAware includes input, two hidden, and output layers:

  • The input layer feeds the input value to each cell in the hidden layer.

  • Every hidden layer is composed of four LSTM cells that perform calculations through weight adjustment to produce the outputs [56]. The basic components of the LSTM cell are described in [57]. The LSTM cell was designed particularly to mitigate the vanishing gradient issue encountered in RNNs [58].

  • The output layer is a dense layer in which each neuron receives the output of every neuron in the last hidden layer and reduces this vector to one predicted value.

The Keras framework [59] is used here, which implements the extended LSTM described in [60].
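To make the recurrence concrete, the following is a minimal pure-Python sketch of one LSTM layer with forget gates, stepped over 12 time steps. It is not the Keras implementation; the weights are random, the inputs are synthetic, and reading "four LSTM cells" as four units per layer is our assumption.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(w_row, v):
    return sum(w * x for w, x in zip(w_row, v))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of an LSTM layer with input, forget, and output gates.

    params maps each gate name ('i', 'f', 'o', 'g') to (W, U, b): W acts on
    the input x, U on the previous hidden state, and b is the bias.
    """
    def gate(name, activation):
        W, U, b = params[name]
        return [activation(dot(W[j], x) + dot(U[j], h_prev) + b[j])
                for j in range(len(b))]

    i = gate("i", sigmoid)      # input gate
    f = gate("f", sigmoid)      # forget gate
    o = gate("o", sigmoid)      # output gate
    g = gate("g", math.tanh)    # candidate cell state
    c = [f[j] * c_prev[j] + i[j] * g[j] for j in range(len(c_prev))]
    h = [o[j] * math.tanh(c[j]) for j in range(len(c))]
    return h, c


def random_params(n_in, n_units, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {name: (mat(n_units, n_in), mat(n_units, n_units), [0.0] * n_units)
            for name in "ifog"}


rng = random.Random(0)
n_in, n_units = 45, 4           # 45 station inputs, four units (assumed)
params = random_params(n_in, n_units, rng)
h, c = [0.0] * n_units, [0.0] * n_units
for _ in range(12):             # 12 time steps of synthetic wind speeds
    x = [rng.uniform(0.0, 10.0) for _ in range(n_in)]
    h, c = lstm_step(x, h, c, params)
print(h)
```

The cell state c is carried from one time step to the next through the forget gate, which is the mechanism that lets the layer retain information over the one-hour input window.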

2.2.3 MLP description

MLP is a classical neural network that is widely used to solve classification and regression problems [61]. It consists of a sequence of four layers that contain a number of nonlinear activation functions [62]. The first layer is the input layer, the second and third layers are the hidden layers, and the fourth layer is the output layer. Each hidden layer is formed of 20 neurons. The nonlinear activation function used here is the rectified linear unit (ReLU) function, defined as the positive part of its input. All weights and biases are randomly initialized. Detailed information about MLP and the training process using backpropagation can be found in [63,64,65].
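A forward pass through an MLP of this shape can be sketched in a few lines. The weight range, the random seed, and the reduced output size are illustrative assumptions; only the two hidden layers of 20 neurons, the ReLU activation, and the random initialization come from the text.

```python
import random


def relu(x):
    """Rectified linear unit: the positive part of the input."""
    return x if x > 0.0 else 0.0


def init_layer(n_in, n_out, rng):
    """Randomly initialized weights and biases (range chosen for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.1, 0.1) for _ in range(n_out)]
    return weights, biases


def dense(x, layer, activation=None):
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def mlp_forward(x, layers):
    """ReLU on the hidden layers, linear output layer for regression."""
    for layer in layers[:-1]:
        x = dense(x, layer, relu)
    return dense(x, layers[-1])


rng = random.Random(42)
n_inputs, n_hidden, n_outputs = 540, 20, 8   # n_outputs reduced for illustration
layers = [init_layer(n_inputs, n_hidden, rng),
          init_layer(n_hidden, n_hidden, rng),
          init_layer(n_hidden, n_outputs, rng)]
y = mlp_forward([rng.uniform(0.0, 10.0) for _ in range(n_inputs)], layers)
print(len(y))
```

Unlike the LSTM, this network sees the whole one-hour window as a single flat vector and has no explicit notion of time ordering among its inputs.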

The input and output layers of both the MLP model and the RNN model are similar.

2.2.4 Description of XGBoost

XGBoost is a classic gradient-boosting algorithm that combines multiple decision tree algorithms and linear models to create a strong learner with high predictive power [66]. Each new tree is fitted to the residuals of the previous trees using a gradient algorithm based on Hessians. The XGBoost algorithm prevents over-fitting and optimizes the use of computational resources by simplifying the objective function so that predictive and regularization terms are combined while maintaining an optimal computational speed [7]. The pseudo-code of the XGBoost algorithm is provided by [67]. The parallelized implementation is used to speed up the tree-building process. The GridSearchCV algorithm [68] uses the grid search technique to find the optimal hyperparameters by testing all possible combinations and selecting the one that produces the best performance.
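The exhaustive search idea can be illustrated independently of any library. The sketch below is not the GridSearchCV implementation: the grid values are hypothetical and the scoring function is a stand-in for a cross-validated score, used only to show how every combination is enumerated and the best one retained.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and keep the best score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: the names mirror common boosting hyperparameters,
# but the candidate values are purely illustrative.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.3],
    "n_estimators": [100, 300],
}


def toy_score(params):
    """Stand-in for a cross-validated score (higher is better)."""
    return -(abs(params["max_depth"] - 5)
             + abs(params["learning_rate"] - 0.1)
             + abs(params["n_estimators"] - 300) / 100.0)


best, score = grid_search(param_grid, toy_score)
print(best)
```

The cost of this approach grows with the product of the grid sizes (18 combinations here), which is why the grid is usually kept small when each evaluation requires training a model.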

2.2.5 Training/testing setup

The 5 min data described above were first generated for the period from March 1 to July 25, 2022. Then, the first 70% of the generated data from every month (except LBE days) is used as the training dataset, and the remaining 30% from every month (except LBE days) is used as the testing dataset. This method aims at capturing the maximum wind and turbulence variability and seasonal regime changes. Moreover, 70% of the data from LBE days, chosen by random selection, is used as training data and the rest is used as testing data. This approach is adopted because the LBEs are not uniformly distributed in time during the simulated period. Various hyperparameters were fine-tuned to reach a trade-off between accuracy and low training time. In order to be used in an operational environment, the model should be trained over longer periods and seasons to enable the learning of different complex microscale features.
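The splitting strategy can be sketched as below, under two assumptions that the text leaves open: the random selection for LBE data is made per day rather than per sample, and the 70% share of LBE days is rounded to the nearest integer. The sample records and the set of LBE days are synthetic.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def split_dataset(samples, lbe_days, train_fraction=0.7, seed=0):
    """Split time-ordered (day, record) samples into training and testing sets.

    Non-LBE data: first 70% of each month for training, the rest for testing.
    LBE data: a random 70% of LBE days for training (assumed per-day selection).
    """
    train, test = [], []
    by_month = defaultdict(list)
    for day, record in samples:
        if day not in lbe_days:
            by_month[(day.year, day.month)].append((day, record))
    for month_samples in by_month.values():
        cut = int(len(month_samples) * train_fraction)
        train.extend(month_samples[:cut])
        test.extend(month_samples[cut:])
    shuffled = sorted(lbe_days)
    random.Random(seed).shuffle(shuffled)
    train_days = set(shuffled[:round(len(shuffled) * train_fraction)])
    for day, record in samples:
        if day in lbe_days:
            (train if day in train_days else test).append((day, record))
    return train, test


# Synthetic data: one record per day, with 21 hypothetical LBE days
start = date(2022, 3, 1)
samples = [(start + timedelta(days=k), {"ws": 5.0}) for k in range(147)]
lbe_days = {samples[k][0] for k in range(3, 147, 7)}
train, test = split_dataset(samples, lbe_days)
print(len(train), len(test))
```

Splitting chronologically within each month keeps every month represented in both sets, while the random per-day selection compensates for the uneven timing of LBEs.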

The loss function is calculated at the end of every training epoch, after the weights have been updated. The loss function used is the mean squared error (MSE), the average squared error between the desired output and the predicted output, which training seeks to minimize. It is given by Eq. 4, where N is the number of data points, Yi is the target output, and yi is the predicted output at a given epoch. The normalized root MSE (NRMSE), defined in Eq. 5, is used to assess the temporal evolution of nowcasting errors.

$${\text{MSE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {y_{i} - Y_{i} } \right)^{2}$$
(4)
$${\text{NRMSE}} = \frac{{\sqrt {\frac{1}{N}\mathop \sum \nolimits_{i = 1}^{N} \left( {y_{i} - Y_{i} } \right)^{2} } }}{{\frac{1}{N}\mathop \sum \nolimits_{i = 1}^{N} Y_{i} }}$$
(5)
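Equations 4 and 5 translate directly into code. The sketch below uses small made-up target and prediction lists to show the computation; the function names are ours.

```python
import math


def mse(pred, target):
    """Eq. 4: mean squared error between predictions y_i and targets Y_i."""
    return sum((y - t) ** 2 for y, t in zip(pred, target)) / len(target)


def nrmse(pred, target):
    """Eq. 5: root MSE normalized by the mean of the targets."""
    mean_target = sum(target) / len(target)
    return math.sqrt(mse(pred, target)) / mean_target


target = [4.0, 6.0, 8.0, 6.0]   # illustrative target values Y_i
pred = [5.0, 5.0, 9.0, 7.0]     # illustrative predictions y_i
print(mse(pred, target), nrmse(pred, target))
```

Normalizing by the mean target makes errors comparable across quantities with different magnitudes, such as WS and EDR, although it becomes unstable when the mean target approaches zero.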

Other statistical metrics defined in Ref. [14] are used to evaluate the accuracy and temporal correlation of numerical simulations: the simulated mean (\(\overline{s}\)), the observed mean (\(\overline{o}\)), the root-mean-square error (RMSE), the correlation (R), and the mean bias error (MBE). Their definitions are presented in Eqs. (6)–(10), where oi and si are the observed and the simulated parameters at time i.

$$\overline{s} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} s_{i}$$
(6)
$$\overline{o} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} o_{i}$$
(7)
$${\text{RMSE}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {s_{i} - o_{i} } \right)^{2} }$$
(8)
$$R = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {s_{i} - \overline{s}} \right)\left( {o_{i} - \overline{o}} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{N} \left( {s_{i} - \overline{s}} \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{N} \left( {o_{i} - \overline{o}} \right)^{2} } }}$$
(9)
$${\text{MBE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {s_{i} - o_{i} } \right)$$
(10)
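A minimal sketch of the evaluation metrics follows, with the means taken as plain arithmetic averages of the simulated and observed series. The sample values are invented for illustration and the function names are ours.

```python
import math


def mean(values):
    """Arithmetic mean of a series."""
    return sum(values) / len(values)


def rmse(sim, obs):
    """Root-mean-square error between simulated and observed values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def correlation(sim, obs):
    """Pearson correlation coefficient R between the two series."""
    s_bar, o_bar = mean(sim), mean(obs)
    cov = sum((s - s_bar) * (o - o_bar) for s, o in zip(sim, obs))
    var_s = sum((s - s_bar) ** 2 for s in sim)
    var_o = sum((o - o_bar) ** 2 for o in obs)
    return cov / (math.sqrt(var_s) * math.sqrt(var_o))


def mbe(sim, obs):
    """Mean bias error: positive when the simulation overestimates."""
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)


obs = [2.0, 4.0, 6.0, 8.0]   # illustrative observations
sim = [3.0, 5.0, 7.0, 9.0]   # illustrative simulation with a constant +1 offset
print(rmse(sim, obs), correlation(sim, obs), mbe(sim, obs))
```

The example shows why the three metrics are reported together: a constant offset leaves the correlation at its maximum while the RMSE and MBE both reveal the bias.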

3 Results and discussion

3.1 Validation against METAR data

WRF-LES model performance from March 1 to July 25, 2022, over Chicago is shown in Fig. 4 using the 10 m WS, WD, and WG every 5 min at the ORD station located at O’Hare International Airport.

Fig. 4 Comparison of simulated WS, WD, and WG with the observations over ORD

TKE and EDR observations are not available at the three stations, which prevents the evaluation of the simulated values. Besides, observations of winds and turbulence aloft using remote sensing or airborne instruments are needed to validate the model at flight altitude. Table 2 shows the statistics for this comparison over the three stations (ORD, PWK, and MDW) shown in Fig. 1 (white cells) over the simulated period.

Table 2 Statistics for the comparison of WRF-LES with the three ASOS stations

Simulated WS and WD show a reasonable agreement with measured data during the simulated period, with an average bias over the three stations of 30.6% for WS and − 25.3% for WD and correlations of more than 50%. WS is overestimated, probably owing to the overestimation of nighttime WS caused by uncertainties in the resolution used under stable conditions, as shown in Fig. 6. Another possible source of overestimation of near-surface WS is the urban canopy model used here, in which the urban area is considered a 2D area with idealized roughness estimation, as explained by [52] and [69]. The RMSE of simulated WD is high because of the low-magnitude fluctuations in the measurement data that were not captured by WRF-LES. WG is highly overestimated, with biases exceeding 40%. These errors are mainly due to the overestimation of WS and uncertainties related to the parameterization used in Eq. 1. In fact, this parameterization is designed for mesoscale simulations where large eddies are parameterized. Therefore, WG is expected to be overestimated when large eddies are resolved in HRSs.

To further validate the WRF-LES model in terms of winds aloft, modeled and measured vertical wind profiles up to 20 km are compared in Fig. 5 against radiosonde data at 00 UTC (two left panels) and 12 UTC (two right panels) on August 16, 2022, over a land point of coordinates (40.15° N, 89.33° W). Sounding profiles obtained from radiosondes are provided by the Department of Atmospheric Science Sounding Archive of the University of Wyoming (http://weather.uwyo.edu/upperair/sounding.html). These upper-air wind data, namely WS and WD, are based on GPS wind retrievals with a very high sampling rate (1–2 s) and an accuracy of 0.15 m s−1 [70]. The WRF-LES simulations show good overall agreement with the sounding observations at both times and for both WS and WD. For simulated wind speeds, the largest discrepancies are found at mid altitudes (5–13 km), which correspond to the tropopause. For WD, both the measurements and the model agree that winds are northerly up to 2.5 km, which corresponds to the Planetary Boundary Layer (PBL), while winds are easterly at 12 UTC. Moreover, both WRF-LES and the observations capture, at slightly different altitudes, the reversal from easterly to westerly winds (2.5 km at 00 UTC and 4.5 km at 12 UTC). Within the PBL (below 2.5 km), reasonable agreement is found between WRF-LES and observations, with an average RMSE of 19°. However, larger discrepancies are found in the WD comparison at the upper heights (17–21 km), as the simulation was not able to capture fine-scale gradients of wind direction. This difference may be due to the coarse vertical resolution of the model at these altitudes, as the vertical resolution is finer at low altitudes within the PBL and gradually stretches at upper altitudes.

Fig. 5 Comparison of the vertical profiles of WS and WD

The comparison of the WS diurnal cycle in Fig. 6 shows that daytime WS is better reproduced, as the boundary layer is deep owing to convection and surface heating. More discrepancies are found during the night and early morning, when the boundary layer is shallow. Therefore, higher resolution is needed to improve wind HRSs under stable conditions. As EDR observations are not available, the evaluation was based on a comparison between the probability density functions (PDFs) of the EDR at PWK shown in Fig. 6 and findings from the literature [19] for both nighttime and daytime. This comparison confirms the findings of Ref. [19] that the nighttime (daytime) EDR follows a log-normal (log-Weibull) distribution with a peak of 0.009 (0.09) m2/3 s−1. The fitted distributions (represented by dashed lines) are found using nonlinear least squares minimization. By comparison with 20 Hz anemometer data, Ref. [19] shows that during nighttime stable conditions the PDF of the EDR follows a log-normal distribution, while the unstable daytime EDR follows a log-Weibull distribution with a higher peak. High-frequency data acquisition is needed at METAR stations to allow ground-based EDR validation.

Fig. 6 From left to right: comparison of WS, WD on June 27, and WS diurnal cycle and EDR distributions

3.2 Data validation during lake-breeze events

The objective of this validation is to evaluate the model’s skill in predicting a breeze blowing from Lake Michigan. The statistical comparison of the model against LBE data is shown in the blue columns in Table 2.

The model reproduced 21 of the 28 LBEs observed over PWK, which corresponds to 75% of the total events. The missed events might be due to the absence of water–air coupling in the NWP model, because lake surface temperature is a critical factor in simulating winds and various unsteady circulations over Chicago, such as lake-breeze circulations, as demonstrated by [15]. Among the days on which an LBE occurred, data from 15 days are assigned to the training dataset and data from the remaining six days to the testing dataset.

The WRF-LES model performance is strongly degraded during LBEs, with biases around 50%, especially for WD, because the model could not reproduce both the WD shift and its timing TLBA, as shown in Table 3, where the error exceeds 1 h. These discrepancies can probably be reduced by coupling the model with a lake model to better represent lake surface temperature and the thermal contrast between land and water surfaces. The inland penetration is estimated as the distance between the lakeshore and the furthest UrbaNet station where the lake-breeze front was detected. The furthest station is located 39 km from the lakeshore. We found that the inland penetration is overestimated in the model and that the lake-breeze front is advected out of domain D2, highlighting the model’s difficulty in predicting the decay of the lake breeze. A lake breeze that was accurately reproduced in terms of capturing the TLBA over PWK is the case on June 27, 2022, shown in Fig. 6. The WD shift is well reproduced. The discrepancies in WS are probably due to the misrepresentation of WS during stable conditions or to uncertainties related to the urban canopy model.
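
The inland penetration estimate described above can be sketched as a great-circle distance computation. This is a minimal illustration, not the authors' procedure: the lakeshore is reduced to a single reference point, and the station names, coordinates, and detection flags below are hypothetical placeholders rather than UrbaNet data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def inland_penetration_km(shore, stations, front_detected):
    """Distance from the shore point to the furthest station that saw the front.
    Returns 0.0 when no station detected the front."""
    distances = [
        haversine_km(shore[0], shore[1], lat, lon)
        for name, (lat, lon) in stations.items()
        if front_detected.get(name, False)
    ]
    return max(distances) if distances else 0.0

# Hypothetical example: one shore point and three stations along a parallel.
shore_point = (41.88, -87.61)
stations = {"S1": (41.88, -87.70), "S2": (41.88, -87.90), "S3": (41.88, -88.10)}
front_detected = {"S1": True, "S2": True, "S3": False}
penetration = inland_penetration_km(shore_point, stations, front_detected)
```

Because the estimate is tied to station locations, it can only resolve penetration up to the furthest station in the network, which is consistent with the 39 km limit noted above.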

Table 3 Comparison of TLBA over the three METAR stations

To examine the vertical structure of the lake-breeze event on June 27 at 1400 LDT, the cross section of the zonal winds and barbs of the vertical wind components from point A over Lake Michigan to point B (in the urban area) is shown in Fig. 7. The coordinates of A and B are (87.2834264° W, 41.879705° N) and (89.649976° W, 41.877705° N), respectively. The vertical velocity is multiplied by 200 so that the vectors highlight the vertical motion. The spatial pattern of the zonal wind shows that the winds are easterly (negative zonal wind) as far inland as longitude 87.99° W, indicating that the inland penetration during this event at this time is estimated to be more than 39 km. From that point on, easterly and westerly winds are mixed, as onshore and offshore winds meet over that convergence area. The mixing layer over the lakefront urban area becomes stable, and the vertical convective mixing is pushed inland as cooler, more humid air is transported inland. Another pattern shown in Fig. 7 concerns the vertical wind component, which is very weak, with very low vertical and horizontal gradients, over the lake and the urban penetration area. Over the convergence area, however, thermal perturbations and very strong gradients are simulated because of the urban heat island effect during daytime. These strong gradients are indicative of a lake-breeze front that was successfully captured [21]. This difference is due to the thermal contrast between the lake and the city: warm temperatures over the city with stratified layers and strong vertical temperature gradients, and weak vertical gradients over the lake [7]. Moreover, the vertical motion is very small over the urban area within the inland penetration area. This indicates that the convective forcing due to the lake breeze was higher than the radiative forcing there. Starting from the point at longitude 87.99° W, the radiative forcing begins to increase within the convergence area.

Fig. 7 Cross section (A to B) of the zonal wind field (shades) and vertical wind component (arrows)

3.3 WindAware v1.0 learning evaluation

Figure 8 shows the loss functions of the training process of both models. The loss function is the MSE. This figure demonstrates that the training of both models was successfully completed, because the loss functions decreased and converged to a very small MSE of approximately 0.08 on both the training and testing datasets. The convergence of both training and testing curves also shows that the models neither over-fit nor under-fit the training dataset. Moreover, the training of both models required 200 epochs. The errors are reduced more rapidly with the MLP than with the RNN, as the MLP loss function starts its descent during the first epochs, earlier than that of the RNN (71 epochs).
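
The convergence argument above can be made concrete with a small sketch that checks two things on per-epoch loss curves: whether the loss has flattened, and how far the testing loss sits from the training loss. The window length, tolerance, and synthetic curves are assumptions chosen for illustration; they are not the monitoring settings used for WindAware.

```python
import math

def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def has_converged(losses, window=10, tol=1e-3):
    """True if the loss changed by less than `tol` over the last `window` epochs."""
    if len(losses) <= window:
        return False
    return abs(losses[-1 - window] - losses[-1]) < tol

def generalization_gap(train_losses, test_losses):
    """Final testing loss minus final training loss.
    A large positive gap would suggest over-fitting."""
    return test_losses[-1] - train_losses[-1]

# Synthetic curves (assumed shape): exponential decay toward an MSE near 0.08.
epochs = range(200)
train_curve = [0.080 + 0.9 * math.exp(-e / 20.0) for e in epochs]
test_curve = [0.085 + 0.9 * math.exp(-e / 20.0) for e in epochs]

converged = has_converged(train_curve) and has_converged(test_curve)
gap = generalization_gap(train_curve, test_curve)
```

With these synthetic curves both losses flatten well before epoch 200 and the gap stays small, which is the pattern the paragraph above describes.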

Fig. 8 Monitoring of the learning process

3.4 WindAware v1.0 validation and discussion

After training, the three models were compared using unseen inputs in terms of the temporal variation of the prediction errors of the four parameters (Fig. 10). The error is quantified using the NRMSE between the data inferred by the deep learning models (WindAware and MLP) and the WRF-LES data along the routes, averaged over the testing dataset. Figure 9 shows an example of a 3 h WindAware prediction that uses testing data on March 2, 2022, at 0120 UTC as model input, together with the absolute difference between the WRF-LES and predicted values along the route. Both models were able to learn the spatiotemporal correlations of historical and real-time WSs at ground level with real-time and future WS, WD, EDR, and WG at an altitude of 200 m a.g.l. covering a corridor.

Fig. 9 Example of WindAware predictions and absolute errors (AE)

The maximum differences at the first prediction time between the predicted (WindAware) and WRF-LES WS and WG at 3 forecast hours are as low as 1.5 m s−1, which corresponds to approximately 37% of the predicted wind speeds and gusts. For WD prediction at 3 h, the maximum errors are 40°. For EDR, the maximum errors are 0.01 m2/3 s−1, localized on a segment of the routes. Figure 10 shows WindAware’s normalized error as a function of nowcasting lead time. Errors in the predictions of the four parameters are scattered along the route and increase with lead time, ranging from approximately 10–15% in real time (forecast time of 0 s) to 40–55% at a forecast time of 6 h. The prediction errors confirm error amplification at a rate of approximately 0.11 percent/min for both WindAware and XGBoost, averaged over the four parameters, slightly lower than the MLP error amplification rate of 0.12 percent/min.
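
A minimal sketch of how an NRMSE and an error amplification rate of this kind can be computed is given below. Two assumptions are made: the RMSE is normalized by the mean of the reference values, and the rate is taken as the least-squares slope of NRMSE against lead time. Neither choice is confirmed by this section, and the sample values are synthetic.

```python
import math

def nrmse_percent(pred, ref):
    """RMSE normalized by the mean reference value, in percent (assumed normalization)."""
    n = len(ref)
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / n)
    return 100.0 * rmse / (sum(ref) / n)

def amplification_rate(lead_min, nrmse):
    """Least-squares slope of NRMSE (percent) against lead time (minutes)."""
    n = len(lead_min)
    mean_x, mean_y = sum(lead_min) / n, sum(nrmse) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lead_min, nrmse))
    sxx = sum((x - mean_x) ** 2 for x in lead_min)
    return sxy / sxx

# Synthetic example: errors growing linearly from 10% at 0.11 percent/min.
lead_times = [0, 60, 120, 180, 240, 300, 360]       # minutes
errors = [10.0 + 0.11 * t for t in lead_times]      # percent
rate = amplification_rate(lead_times, errors)       # percent per minute
```

A simpler end-point estimate, (final error − initial error) / 360 min, gives a similar order of magnitude for the ranges quoted above. The slope fit is used here only because it is less sensitive to noise at the two end points.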

Fig. 10 Temporal variation of prediction errors

In addition, the errors of the RNN model are slightly lower than those of the MLP-based model, especially for long prediction times. This slight difference in performance may be due to the recurrent nature of the LSTM, which is suited to the spatiotemporal correlations in the data. The XGBoost model shows intermediate performance, between WindAware as the best-performing model and MLP as the model with the lowest performance, except when predicting EDR at long lead times, where XGBoost performs best.

Although trained on a limited amount of data as proof of concept, both models were successful in predicting the wind features along the routes. It is worth mentioning that even with a low number of hidden layers, the models were able to reasonably reproduce different wind parameters. Ref. [51], who compared multiple deep learning models with different cell numbers in terms of forecasting accuracy over the nowcasting time steps, found that increasing the size of hidden layers may produce under-fitting results and hence less accurate predictions.

3.5 WindAware v1.0 evaluation during lake-breeze events

WindAware is evaluated during lake-breeze events included in the LBE testing dataset. The NRMSE in Fig. 10 is computed using predictions every 5 min during the LBEs and the corresponding LBE testing dataset (6 days). This comparison shows that model errors grow with lead time but are higher when the model is tested only on the LBE dataset, especially for WD, because the change in WD is larger than that in the other parameters. Errors in the associated gusts and turbulence also increase, as they are associated with the front passage. The error amplification rate averaged over the four parameters during LBEs (0.18 percent/min) is higher than over the entire testing dataset (0.11 percent/min).

Figure 11 shows the absolute error of 1 h WindAware predictions on June 27, 2022, at 1430 LDT, initialized at 1330 LDT. June 27, 2022, is part of the testing dataset. Errors are higher than those of the 3 h predictions under normal conditions (no LBE; example shown in Fig. 9). This poor performance can be explained by the fact that these events are underrepresented in the training dataset (15%, corresponding to 15 of the 103 days included in the training dataset), and more data collected during LBEs could improve performance. Developing a separate model trained only on LBE data, to be deployed when LBE conditions are met, should be tested in the future.

Fig. 11 Absolute errors of predictions on June 27, 2022

3.6 Computational performance evaluation

Different model configurations are compared in terms of inference time, training time, and accuracy in Table 4. The NWP simulation was computed on 50 dual-core processors. The comparison between data-based and physics-based models shows that inference using the physics-based model is, understandably, slower than inference using the data-based models, by more than 6000%. Moreover, increasing the depth of the network does not improve prediction accuracy, and the resulting training time is higher than WindAware’s. The MLP model takes longer to train than the other data-based models and has a higher error. Models with higher cell numbers have been tested, but no improvement in prediction accuracy was found, and longer training times are required to reach low biases. XGBoost is trained more rapidly because it was run and optimized in parallel using the configuration described in [67].

Table 4 Comparison of inference, training times, and accuracy between different models

4 Conclusion and future work

This paper presents the first version of WindAware, a data-based model that predicts the wind and turbulence parameters WS, WD, WG, and EDR using an RNN composed of LSTM cells. This model can have added value for UAS, eVTOL, and helicopter operators, as it can predict avoidance areas in the predefined routing network. The model is based on sparse data from ground-based sensors of the urban network UrbaNet and on wind and turbulence data along a corridor from high-resolution simulations using the WRF model. The M2M coupling is adopted using the WRF nesting capability to downscale forecasts to 100 m HRSs. The simulated data are validated over a period of 147 days against ground-based data at three airports in the Chicago area. A reasonable agreement between simulated parameters and METAR and upper-air data over airports is found, with an overestimation of WS during stable conditions because of resolution uncertainties and an overestimation of WG due to parameterization uncertainties. Higher resolution is needed to improve the model’s performance during nights and early mornings. The EDR, modeled using a parameterization based on mesoscale simulations, was validated using findings from the literature, but an adapted WG and EDR parameterization based on sub-kilometer simulations is needed. In addition, airborne or remote sensing data are needed to validate the simulations at flight altitude. An LBE on June 27, 2022, was investigated to evaluate the modeled vertical structure of the lake-breeze event and the simulated WD shift. The evaluation during LBEs showed that the model slightly misrepresented the LBE arrival timings and inland penetration. Assimilating the ground-based data into the forecast might improve the LBE modeling, and coupling the NWP model with a lake dynamics model might improve the modeling of air–water interaction in terms of air temperature above the lake and the thermal contrast driving the LBE. In the future, the validation of the HRSs at flight altitude can be conducted using remote sensing instruments.

These HRSs were then used to build validated training and testing datasets of high variability to train and test three widely used data-based models, namely RNN (WindAware), MLP, and XGBoost. WindAware was able to predict wind and turbulence parameters, including WS, WD, WG, and EDR, every 5 min up to 6 h. WindAware was evaluated against predictions from HRSs. It is shown that WindAware was able to predict the four parameters with reasonable accuracy, with performance similar to that of XGBoost and slightly better than that of the MLP model, showing a better ability to fit the WRF-LES data and measured wind data from ground sensors and to learn complex spatiotemporal correlations.

This work demonstrates that nowcasting errors are amplified as a function of prediction time, with an error growth rate of 0.11 percent/min: real-time errors are 10% and become as high as 45–50% at a forecast time of 6 h. Therefore, in operations, the model can be re-initialized frequently so that forecast errors are reduced. The evaluation over LBE data shows that the model errors are high because these events are underrepresented in the training dataset. Including temperature data from UrbaNet as model inputs might improve WindAware’s skill in predicting LBEs and WD shifts.
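
The reported figures are mutually consistent under a simple linear growth assumption, as the sketch below illustrates: starting from 10% and growing at 0.11 percent/min gives about 49.6% at 6 h. The linear model and the 30% tolerance in the example are assumptions made for illustration; no operational error threshold is specified in this work.

```python
def projected_error(lead_min, initial_error=10.0, rate=0.11):
    """Projected error (percent) at a lead time in minutes, assuming linear growth."""
    return initial_error + rate * lead_min

def max_lead_time(threshold, initial_error=10.0, rate=0.11):
    """Longest lead time (minutes) for which the projected error stays at or
    below `threshold` percent. Returns 0.0 if the threshold is already exceeded."""
    if threshold < initial_error:
        return 0.0
    return (threshold - initial_error) / rate

error_at_6h = projected_error(360)       # about 49.6 percent
reinit_interval = max_lead_time(30.0)    # about 182 minutes for a hypothetical 30% tolerance
```

Under these assumptions, an operator who tolerates errors of up to 30% would re-initialize the model roughly every 3 h.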

Future work should focus on training the model over a longer period to improve its performance and predictability in unseen wind regimes. From an operational standpoint, future work can investigate ways to connect it with the UAS autopilot onboard the vehicle, evaluate its added value during simulated flights before real-world UAS navigation, and collect feedback from end-users. Future work should also focus on increasing the efficiency of the model’s training by speeding up the process using Graphics Processing Units (GPUs) or parallelization on Central Processing Units (CPUs).

A limitation of WindAware is its generalizability. First, the investigated deep learning algorithms are “black boxes”, which makes interpretability and trustworthiness challenging. Second, the sensitivity to the number of sensors calls into question the ability to apply this concept in another location. Future work should also explore techniques for quantifying the uncertainty of estimated outputs using stochastic ML algorithms and explainability techniques to overcome these limitations. Furthermore, testing generic Temporal Convolutional Networks (TCNs) on real-world correlated data is also worth investigating. Ref. [66] demonstrated that TCNs have longer effective memory and outperform recurrent networks such as LSTMs on tasks such as audio synthesis and machine translation.