Characterizing mesoscale variability in low-level jet simulations for CBLAST-LOW 2001 campaign

A low-level jet (LLJ) event observed during a frontal passage in the 2001 Coupled Boundary Layers and Air–Sea Transfer Experiment in Low Winds campaign was simulated using the Weather Research and Forecasting model (WRF). The sensitivity of the modeled LLJ characteristics, such as formation time, height and the strength of the LLJ core, to the choice of initial and boundary conditions, planetary boundary layer (PBL) schemes and vertical resolution was evaluated with a suite of diagnostic tools. The model simulations were compared against available soundings from the campaign observations as well as with surface observations from the Automated Surface Observing Systems. The simulation initialized with ERA-interim reanalysis and using the Mellor–Yamada–Nakanishi–Niino PBL scheme gave the best mix of diagnostic scores for surface temperature and wind speed predictions. The choice of boundary conditions introduced a stronger variability in the LLJ characteristics than the changes in PBL schemes or vertical resolution. The variability emerged primarily due to the timing of the frontal passage in the boundary condition datasets.


Introduction
A low-level jet (LLJ) can be described as accelerated winds aloft, confined within the atmospheric boundary layer. Such events are characterized by strong surface shear and wind speed maximum of more than 10 m s −1 . LLJs have been frequently documented in many regions of the world (Bonner 1968;Doyle and Warner 1993;Whiteman et al. 1997;Banta et al. 2002;Song et al. 2005;Lundquist and Mirocha 2008;Colle and Novak 2010;Rife et al. 2010;Hu et al. 2013;Berg et al. 2015;Du et al. 2015;Vanderwende et al. 2015;Smith et al. 2019) and have an impact on the transportation of momentum, moisture and pollutants (Angevine et al. 2006). Furthermore, LLJs can be a source of strong winds for wind power production. However, the associated strong wind shear and veer have significant influences on the stresses experienced by a wind turbine (Kelley et al. 2004). A better understanding of LLJ interaction with wind turbines can advance optimal wind turbine design strategies.
The traditional approach for wind resource modeling is based on Jackson-Hunt-type linear flow models such as WAsP, MS3DJH and MS-Micro (Brower 2012). Linear flow models solve steady-state linearized Navier-Stokes equations with simplified assumptions for terrain effects (Jackson and Hunt 1975). They are computationally efficient and are able to provide a suitable first approximation for the wind climate of a chosen site. However, the increased availability of computational resources and power has made it more attractive to consider more complex models to be used for wind resource assessment. Higher-order models are able to better resolve nonlinear effects and hence provide better approximation of the available wind resource (Mahoney et al. 2012;Badger et al. 2015;Beaucage et al. 2014;Gopalan et al. 2014).
Numerical weather prediction models (NWP) are used as operational weather forecast models although the interest in using NWP for wind resource assessment is growing (Storm et al. 2009;Badger et al. 2015;Beaucage et al. 2014;Hahmann et al. 2014;Wilczak et al. 2015). NWP models are typically compressible, non-hydrostatic models that are able to simulate a wider scale of meteorological Responsible Editor: S. Trini Castelli. 1 3 phenomena driven by atmospheric stability such as lowlevel jets and barrier jets.
With increased focus on offshore wind power (Dvorak et al. 2010;Archer et al. 2014;Gryning et al. 2014;Hahmann et al. 2014), NWP provides a larger spatial and temporal coverage for wind climate studies compared to in situ offshore measurements. Few long-term measurements of offshore wind are available due to the cost and difficulty in setting up mast towers. NWP can bridge this gap and provide better approximations of the wind climate at a potential offshore site before further commitment into costly long-term measurement campaigns.
Despite the greater accuracy of NWP models in wind prediction, the modeling outcome requires familiarity with both the geographic region of interest and the available model parameterization schemes (Storm et al. 2009;Talbot et al. 2012;Nunalee and Basu 2014;Vanderwende et al. 2015;Berg et al. 2015). No universal model set-up performs optimally in all geographic regions and mesoscale conditions. Hence, it would be useful to be able to objectively quantify the performance of mesoscale simulations and pinpoint the sources of the prediction uncertainties. This is the major motivation for the present investigation.
One of the most common NWP models is the Weather Research and Forecasting model (WRF), which is a nonhydrostatic mesoscale numerical weather prediction model used extensively for both operational and research purposes (Skamarock et al. 2008). WRF allows for either idealized or real-data simulation studies and uses nested domain runs that can be one-way or two-way coupled.
WRF has been shown to simulate LLJs reasonably well for the US Great Plains (Storm et al. 2009;Vanderwende et al. 2015) and for coastal cases (Nunalee and Basu 2014). These studies have shown that the timing of LLJ formation, as well as the height and strength of the jet core, is dependent on the WRF model configurations. In this study, we seek to characterize the variability of the results related to the model configurations. We utilize WRF (version 3.4.1) to simulate a coastal LLJ observed in August 2001 during the Coupled Boundary Layers and Air-Sea Transfer Experiment in Low Winds (CBLAST-LOW) field campaign, which was conducted in the New England coastal region. We employ a diagnostics suite (Koh et al. 2012)

Domain description and simulation set-up
The CBLAST-LOW campaign was conducted over the summer months of 2001-2003 in the North Atlantic, south of Martha's Vineyard Island, MA, USA (Edson et al. 2007). The objectives of the campaign were to understand air-sea interaction and the dynamics of the coupled atmospheric-oceanic boundary layer. To achieve these goals, direct measurements of vertical fluxes of momentum, heat and mass were made across these coupled boundary layers. A combination of radiosondes and remote sensing of the vertical structure of the coastal boundary layer yielded further insights into the dynamics of the coastal LLJ and the influence of synoptic events on its formation and structure (Mahrt et al. 2014;Helmis et al. 2015). A strong LLJ was captured by a Woods Hole Oceanographic Institution (WHOI) sounding at approximately 07 August 2001 0030 UTC (1930 LST) (Fig. 1). The radiosonde data provided temperature, wind speed and wind direction profiles which showed the characteristics associated with a LLJ: a temperature inversion up to 350 m above sea-level, supergeostropic wind just above the temperature inversion, as well as wind veer of ∼20 • . A warm frontal passage occurred during this observational period and is believed to have contributed to the formation of this coastal LLJ (Helmis et al. 2015).

WRF model domain
The domain consists of three nests, centered upon Martha's Vineyard Coastal Observatory (Fig. 2). The horizontal resolutions from the outermost domain, d01 to the innermost domain, d03 are 30 km, 6 km and 1.2 km, respectively. The grid sizes (Nx × Ny) are 100 × 100, 300 × 300 and 300 × 300, respectively. All simulations used the same Noah Land surface model (Dudhia 1989) and RRTM radiation scheme (Mlawer et al. 1997), with the cumulus scheme switched off for d03. All simulations ran from 07 August 2001 0000 UTC (1900 LST) until 09 August 2001 0000 UTC (1900 LST) with the first 12 h discarded.

Model configuration
For the sensitivity studies, we choose to vary the initial and boundary conditions, planetary boundary layer (PBL) schemes, vertical resolution and sea-surface temperature (SST) fields. Table 1 summarizes the model configurations used in this study.

Boundary conditions
Previous studies have shown that the choice of initial and boundary condition datasets heavily influences the characteristics of the modeled LLJ (Nunalee and Basu 2014; Vanderwende et al. 2015). The LLJ formation is dependent on multiple factors that include baroclinicity, topography and synoptic events such as fronts (Ostdiek and Blumen 1997;Banta et al. 2002;Van de Wiel et al. 2010;Vanderwende et al. 2015). Differences exist between various reanalysis datasets in their representation of synoptic events, depending on the methods used to produce them. Hence, it would be prudent to test the sensitivity of the modeled LLJ due to such differences.
We have explored two options for the boundary conditions, ERA-Interim and NARR. ERA-Interim is a thirdgeneration global atmospheric reanalysis dataset from  1 3 1979 to 2019 (Dee et al. 2011). It contains global atmospheric and surface parameters at T255 spectral resolution ( ∼ 80 km) on 60 vertical levels, with 12 levels below 850 hPa. It uses 4D-VAR for data assimilation to produce atmospheric fields on model pressure levels at 6-h intervals and surface fields at 3-h intervals. The North-American Regional Reanalysis (NARR) is a second-generation regional reanalysis product from 1979 to present, and is a high-resolution extension of the NCEP Global Reanalysis, focused over the North American Region (Mesinger et al. 2006). It contains regional atmospheric and surface parameters at 32-km horizontal resolution on 29 vertical levels, with 7 levels below 850 hPa. NARR uses 3D-var to assimilate meteorological variables into the NCEP Eta model to produce atmospheric fields that are available at 3-h intervals. These datasets are summarized in Table 2.

PBL scheme
In mesoscale WRF simulations, the vertical structure of the atmosphere is parameterized by PBL schemes, which are used to represent sub-grid scale turbulent vertical fluxes of momentum, heat and moisture. In contrast to a convectively driven boundary layer, a stable boundary layer exhibits suppressed buoyancy-driven mixing, while the turbulence production due to strong wind shear is increased. Thus, the choice of closure schemes suitable for convective boundary layers might not be optimal for the stable boundary layer Shin and Hong 2011).
In this study, we focus on the different results obtained by local and non-local closure schemes that differ in how the sub-grid fluxes are modeled as a function of mean flow quantities. Local closure schemes model these fluxes at each grid point based on the mean values of their respective gradients at that point. Non-local closure schemes include an additional flux term to account for large-scale vertical transport that is non-local in nature (Troen and Mahrt 1986). As such, we have chosen the Mellor-Yamada-Nakanishi-Niino (MYNN) local closure scheme and the Yonsei-University (YSU) non-local closure scheme (Hong 2010;Hu et al. 2010;Shin and Hong 2011).

SST field
The contrast in surface temperature between land and sea can contribute to the LLJ formation in coastal regions (Mahrt et al. 2014). The land-sea temperature contrast can result in baroclinicity that is conducive to the formation of coastal LLJ, similar to the role of baroclinicity over the sloping Great Plains (Holton 1967). When warm air from the land surface advects offshore over cooler water, a strong stable marine boundary inversion can form (Smedman et al. 1997). The stratification of this marine boundary layer depends on the temperature difference between the warm land and the cooler sea. The stability profile will then influence the characteristics of the resultant LLJ above the inversion layer.
The standard WRF-ARW model configuration lacks a coupled ocean model; hence, the SST field is provided as an initial boundary condition and updated in time when possible. ERA-Interim provides a 6-hourly SST field that can be used to update the SST and, thus, the surface temperature field (Skamarock et al. 2008). For the NARR dataset, there is no explicit SST field available at 3-h intervals. Instead, the SST field is derived from the initial air temperature at the surface over water bodies from NARR fields and the SST field remains constant over the duration of the simulation. To assess the importance of SST on LLJ characteristics, our simulations include a case where the SST field was not updated in an ERA-based run during the simulation period. This simulation also serves as a control against which NARR-based runs can be evaluated.

Vertical resolution
The stable atmospheric boundary layer is characterized by smaller turbulent eddies compared to those of the convective boundary layer. Increasing the vertical resolution within the stable boundary layer would, in theory, increase the accuracy of the PBL parameterizations of the eddy fluxes. The improved accuracy near the surface and the LLJ core should then lead to better estimates of winds within the first 200 m. In a study of high-resolution modeling for wind energy purposes, Bernier and Bélair (2012) found that increased vertical resolution only has modest improvements on wind profile predictions. A study on coastal LLJ (Nunalee and Basu 2014) corroborated this finding. However, Bernier and Bélair (2012) also noted that the prediction of wind shear near the surface was improved with higher resolution. To test the influence of vertical resolution, we executed a simulation with a finer vertical resolution by increasing the total number of vertical levels from 60 to 75 . As a result, the effective vertical resolution in the lowest 1000 m increased from ∼ 35 to ∼ 25 m.

Observation data
To validate our results with observations and to apply the diagnostic metrics, we use surface observations from Automated Surface Observing Systems (ASOS) stations in Massachusetts and Rhode Island for temperature and wind speed (see map in Fig. 3). The ASOS program is the United States' primary surface weather observation network (NOAA 1998) and measures surface wind speed at 10 m, direction, air temperature and sea-level pressure, among other variables. Hourly surface measurements were used in this study from ASOS stations situated in an elevation range between 2 and 52 m above sea-level with the average elevation at 22 m.
The main synoptic condition occurring during the period of study was a warm front originating from the continental land mass to the northwest of the simulation domain d03. The front advanced south-east towards the Atlantic coast. When the warm air mass advects over the cooler sea surface, a temperature inversion is established. The resulting stable boundary layer suppresses turbulence mixing and reduces the effect of surface roughness. The effect of reduced turbulence leads to an acceleration of the wind field aloft, thus forming a LLJ. The LLJ observed was partially a result of this frontal passage. Wind direction shifts associated with a frontal passage were recorded by some ASOS stations (not shown). Observed surface winds generally shifted from southerly to northeasterly direction.
Although this LLJ was captured in a rawinsonde sounding launched from the WHOI ship ( Fig. 1), the single sampling of the atmosphere was not sufficient to evaluate the success of WRF in modeling LLJ characteristics that we are interested in: timing, location, height and speed. Thus, we employ the ASOS data to help fill in the gaps in observations. Even though only surface measurements were available from ASOS stations, the data will be able to capture the frontal passage, which is likely a key modulator of the LLJ characteristics (Helmis et al. 2013). Hence, we can use the surface measurements to evaluate the ability of WRF to simulate the frontal passage by proxy.

Diagnostic suite
To quantify the success of the various model configurations in simulating the LLJ, it is imperative to select a suite of diagnostic metrics that is robust and has the ability to elucidate the underlying biases and errors. We selected the diagnostic suite of Koh et al. (2012), which was designed to evaluate NWP performance. A brief description of the metrics used in our study will be given in this section.
In the following, is the defined as the difference (error) between forecast ( ) and observation ( ). The averaged difference, ⟨ ⟩ , is also known as the bias; while, 2 D is the variance of the difference. In the diagnostic framework of Koh et al. (2012), the root-mean-squared error (RMSE) is normalized by the sum of variances for both observation ( 2 O ) and forecast ( 2 F ). The normalized RMSE (NRMSE), , is, thus, defined as The NRMSE can further be expressed as where and The above relation for NRMSE decomposes the total error into two main components-bias and variance. The variance error is also referred to as pattern error. Normalized bias (NBIAS), , is the bias, ⟨D⟩ , normalized with the standard deviation of the difference. It measures the statistical significance of the bias in the dataset. A larger NBIAS implies a larger statistical bias. Furthermore, arising from the relationship between NBIAS and NRMSE, it can be shown that a NBIAS of less than 0.3 would contribute to less than 5% of the NRMSE, suggesting that the total error is dominated by pattern error rather than bias error. Normalized Pattern Error (NPE), , scores the variance error between forecast and observation. It can be expressed as where The normalized error variance, , does not depend on variability in the observation dataset, which is useful for evaluating forecast errors and is bound between 0 and 2. can be decomposed further into two other metrics, and , which assess the contribution of phase and amplitude errors, respectively: where and The correlation coefficient, , is the familiar Pearson correlation coefficient. For equal to 1, the variances are perfectly correlated; while equal to −1 means that the variances are perfectly anti-correlated; and equal to 0 means that there is no correlation. Variance similarity, , scores how similar the standard deviation of forecast and observations are. The invariant property of the metric penalizes equally the under-or overprediction of variability. Good variance prediction, when = 1, means that both O and F match exactly. However, poor prediction, when = 0, means that O and F match poorly.
For an ideal forecast, for which = 0, the amplitude and phase of forecast variability are predicted perfectly; hence, there is no pattern error. On the other hand, = 1 implies that the model performs as well as a random forecast. Lastly, = 2 indicates maximum pattern error between forecast and observation, when = 1 and = −1 . Although the amplitude of variability is well predicted in such a situation, observations and forecasts fluctuate completely out-of-phase, leading to a negative correlation. The goal of a forecast would, thus, be to minimize the pattern error, . This goal can be achieved in various ways. For any non-zero , improving the phase error between forecast and observation, hence increasing correlation, , will minimize . For a fixed positive score, maximizing by matching the amplitude of forecast variability to observations will lead to minimizing . Conversely, for a negative score, will need to be minimized to minimize . In a scenario when either forecast or observation is nearly constant and the forecast is varying in opposite direction with observation, minimizing (maximizing the amplitude error) would result in least overall disagreement between forecast and observation. Furthermore, that is decomposed into amplitude and phase errors allows model evaluators to distinguish between models with similar scores, enabling them to decide if small phase or amplitude errors are better predictors of their models' performance. Table 3 summarizes the diagnostic variables used.

Results
The frontal passage is captured in all simulations although the timing and the spatial extent of the front differ between simulations. Horizontal cross sections at 200 m above sea level of the temperature contours and wind barbs of ERAMYNN and NARRMYNN at 6-h intervals depict the status of the evolving front (Fig. 4). The simulated frontal passage in ERAMYNN (top panels) appears to extend further offshore compared to NARRMYNN (bottom panels). The acceleration of the south-westerly winds at 2100 UTC (1600 LST) is represented in both simulations. At 1500 UTC (1000 LST), no front is observed in d03 since westerly winds are fairly steady at 10 m s −1 onshore and close to coast with stronger south-westerly winds further offshore at approximately 15 m s −1 . At 2100 UTC (1600 LST), the south-westerly winds have intensified. Westerly winds onshore can be seen in the north-west of the domain. A distinct wind direction shift between the warmer continental air mass and the cooler maritime air mass can be noted in this north-west region. At 0300 UTC (2200 LST), this distinct region of wind direction shift, indicative of a frontal passage passing through the domain from the north-west to the south-west, advects offshore.
Although each of the simulations described in Table 1 produces a LLJ, as seen in Fig. 5, the resulting LLJs differ by their time of onset, duration, maximum wind speed, height of the wind speed maximum, and depth of the acceleration layer. Many of these differences emerge from different representations of the synoptic situation, specifically the timing and shape of the frontal passage discussed above.
We first explore the impact of boundary conditions on the simulations. Comparing ERAMYNN and NARRMYNN (top-left and bottom-left panels of Fig. 4), the warm frontal passage as evidenced by the band of wind direction shifts from the northern region of the domain appears to occur earlier in ERAMYNN and extends much further offshore compared to NARRMYNN. This difference in frontal passage timing is corroborated by the difference in LLJ initiation timing (top-left and bottom-left panels of Fig. 5). The difference in spatial extent of the warm air mass offshore is also shown in Fig. 6, where ERAMYNN and ERAYSU exhibit warmer air mass aloft the marine boundary layer further offshore compared to NARRMYNN and NARRYSU. The intensity of the LLJ is also stronger in ERA-based simulations compared to NARR-based simulations, with a longer period of sustained wind speeds equal to or greater than 20 m s −1 at the jet-core (Fig. 5). Jet-core wind speeds exceeded 20 m s −1 for approximately 6 h in ERA-based simulations, while only for 2 h in NARR-based simulations.
The differences in boundary-layer parameterizations (especially MYNN vs YSU) are constrained to near-surface regions, below the nose of the jet. The wind shear is stronger below 100 m in both MYNN-based simulations compared to both YSU-based simulations (Fig. 5). Wind contours offshore reflect this behaviour as well, with stronger wind speeds close to the surface within the cooler marine boundary layer for the MYNN simulations (Fig. 6). This observation holds true regardless of the choice of boundary condition. The role of the PBL scheme appears to influence the depth of the marine inversion layer and, hence, the height of the LLJ core.
Although increased vertical resolution was expected to improve the representation of the LLJ, this was not evident in the results. Wind shear below 100 m appears to be slightly weaker over the LLJ duration in ERAMYNN75 compared to ERAMYNN (Fig. 5). However, the difference between the two MYNN simulations is not as large as the difference between MYNN and YSU. The depth of the LLJ (jet-core speeds are greater than 20 m s −1 ) is shallower in ERAMYNN75 (from 200 to 350 m above mean sea-level) compared to ERAMYNN (from 150 to 400 m). On the other hand, the heights of the simulated jet-core are at approximately 250 m for both.
Lastly, in the NOSST simulation where the SST was not updated during the period of simulation, the differences in simulated LLJ characteristics are less pronounced. The wind shear observed in Fig. 5 (top right) within the lowest 100 m was more similar to ERAMYNN as compared to ERAYSU and ERAMYNN75. The jet is slightly elevated compared to ERAMYNN but this behaviour is also observed in ERAYSU and ERAMYNN75. In Fig. 6, the NOSST wind contours within the marine boundary layer are also similar to ERAMYNN and the height of the marine boundary is approximately 200 m, in concert with the other ERA-based simulations.
These qualitative observations offer some insight into the variability in the LLJ due to configuration options: the choice of the boundary condition exerted the greatest influence on resulting LLJ variability. However, it is still unclear as to which configuration simulated the actual conditions the best.

Comparison to CBLAST-LOW sounding profile
To decide which configuration options modeled the actual conditions best, we utilize in situ measurements of the atmospheric conditions during the LLJ event. The profiles for wind speed and temperature from the rawinsonde sounding shown in Fig. 1 are compared to all six model simulations. The profiles are extracted from the closest grid point to the location of the rawinsonde launch (41.52 • N, 70.67 • W).
Wind speeds are overestimated by all simulations within the first 500 m (left panel of Fig. 7). Wind speeds in ERAbased simulations are closer in magnitude to the sounding compared to NARR-based simulations, particularly around the height of the jet wind speed maximum. The heights of the wind maxima are approximately similar for all simulations, with the NOSST simulation producing the lowest height of 220 m and the ERAMYNN75 the highest height of 300 m. However, all six simulations overestimate the height of the jet core compared to the sounding.
The NARR-based simulations underestimate the temperature below the inversion layer, while also underestimating the strength of the inversion (right panel of Fig. 7). Conversely, the ERA-based simulations match the sounding to a good degree, with the exception of the ERAYSU simulation. We conclude that the choice of PBL scheme However, the LLJ varies spatially and temporally (Banta et al. 2002;Rife et al. 2010); hence, the single sounding profile comparison presented in this section does

Comparison to ASOS observations
Due to the scarcity of observational soundings during the simulation period, we utilize available ASOS surface observations to validate the results and to objectively select the optimal combination of model schemes.
We first bin the ASOS surface observations for surface temperature and wind speed into the closest 00/30 min for the LLJ duration of 1500 UTC (1000 LST) to 0300 UTC (2200 LST), then we extract the time-series for temperature at 2 m (T2) and wind speed at 10 m from the closest simulation grid cell for each ASOS station in the simulation domain, d03. We perform the diagnostic analysis on these multi-station pairs. NBIAS for most stations lies outside the strip = ±0.3 for T2 (Fig. 8) and wind speed (Fig. 9),  indicating that the contribution of NBIAS to NRMSE is significant and the error is not random.
Changing from ERA to NARR as the boundary condition generally increased NRMSE and NPE for most stations, regardless of PBL scheme chosen (Fig. 8). Stations that exhibit larger NRMSE and NPE are common within ERAand NARR-based simulations. ERA-based simulations also show lower overall NBIAS compared to the NARR-based simulations, which show a strong NBIAS of approximately −2 for multiple stations.
The effect of changing MYNN to YSU scheme is less pronounced compared to the effect of changing boundary conditions. Overall, MYNN-based simulations have smaller NRMSE due to decreased NBIAS and NPE as compared to YSU-based simulations. This holds true regardless of boundary condition used.
For NOSST configuration, the effect of not updating SST in ERAMYNN lead to slight increase in NBIAS for KBID, KOQU and KPVC stations but decreased NPE for the same stations. This shift is expected as these stations are situated in grid points designated as water-bodies in WRF. When vertical resolution is increased in ERAMYNN75, there are marginal changes in error metrics when compared to ERAMYNN.
For all simulations, NPE scores are similar in magnitude for surface temperatures, indicating comparable success in pattern prediction. In addition, all simulations show a cold bias for the surface temperature prediction, while NARRbased simulations exhibit much stronger significant biases compared to ERA-based simulations.  When the simulations are evaluated based on their predictions of surface wind speeds, the contrast between simulations is much more striking (Fig. 9). ERA-based simulations have in general smaller NRMSE and NPE compared to NARR-based simulations, regardless of PBL scheme used . This observation is similar to that of surface temperature comparisons. When the PBL scheme is changed from MYNN to YSU, the average NRMSE and NPE increase for each configuration. However, it is also interesting to note that although average NPE increases, the spread of NPE is reduced for both YSU-based simulations. Comparing the performance of NOSST and ERAMYNN75, the average NRMSE and NPE scores are increased slightly, although it is difficult to determine a systematic error associated with each station.
It is clear for this offshore LLJ that boundary conditions introduce a stronger bias compared to those introduced by different PBL schemes, SST field, and vertical resolution.
Based on the above metrics, we choose ERAMYNN as the best simulation as a baseline case for inter-model comparison diagnostics. ERAMYNN offers the best mix of NRMSE/NPE/NBIAS scores for both T2 and wind speed amongst all the simulations.

Inter-model comparisons
ERAMYNN is chosen as a reference model based on comparisons to ASOS surface observations. The purpose of the analysis in this section is to use the diagnostic tools to objectively quantify the variability in prediction introduced by different model parameters.
NARR-based simulations exhibit a strong cold bias mainly over sea surfaces and strong warm bias over land surfaces during the LLJ event (Fig. 10). The spatial extent and amplitude of NBIAS are larger than in the other model configurations.
Furthermore, NARR-based simulations exhibit strong error variance in the south-eastern portion of the domain, over the sea surface (Fig. 11). For other model configurations, large error variance is confined mainly to regions just off the south-eastern coast. The prominent area of large scores in NARR-based simulations over the sea indicates an anti-correlation of temperature between ERA-based and NARR-based simulations in this region, which implies a phase error. This pattern suggests that the time of frontal passage that contributed to the LLJ formation was lagging in NARR-based simulations compared to ERA-based simulations.
The LLJ itself and its frontal boundary can be seen in the northwest-southeast vertical transect of the domain (Fig. 12). The biases are similar to those in the horizontal domain in that NARR-based simulations exhibit overall stronger biases compared to ERA-based simulations with cold bias over the sea and warm bias over land surfaces. ERAYSU shows stronger warm biases over land compared to other ERA-based simulations.
The score in the vertical slice of NARR-based simulations shows a spatial pattern of strong normalized error variance that resembles a frontal boundary (Fig. 13). The strong score arises from an anti-correlation of temperatures between the NARR and ERA-based simulations and manifests over the sea surface, especially near the marine surface layer. This behaviour corroborates the pattern seen in the horizontal plane in Fig. 11, where an associated phase error contributed to the variability in the simulations. In contrast, for the ERA-based simulations, the error variance is mainly confined to the region of sea surface near the coast, with little variation in the vertical profile.

Discussion
The strong score ( > 1 ) offshore as seen in both Figs. 11 and 13 arises from an anti-correlation of the temperature in that region. This phase error can be explained by the dynamics of the frontal passage. NARR-based simulations produce a frontal passage that passes through the domain later in time and does not have the same spatial extent as in the ERA-based simulations.
Examination of the boundary forcing in ERA and NARR dataset shows slight variation in pressure fields (Fig. 14). A sharper pressure gradient extends across the domain in ERA as compared to NARR, and the low pressure system extends further south in NARR. The resultant geostrophic wind from the sharper pressure gradient in ERA likely increases the frontal advection, thus influencing the formation of the LLJ at the frontal boundary (Ostdiek and Blumen 1997).
In contrast to the role of ERA/NARR boundary conditions, the choice of PBL scheme, vertical resolution and role of SST updates are less significant for model variability. Furthermore, the variability in the ERA-based simulations is more apparent in the vertical profile and is mainly manifested as NBIAS.
The choice of PBL scheme, SST and vertical resolution mainly influence the height of the marine boundary layer and, hence, the height of the LLJ core. The choice of YSU scheme over MYNN results in an overall warmer profile over land due to enhanced mixing from the land surface that is then advected offshore, which is seen in YSU's warm bias above the marine boundary layer. The role of SST is less significant than expected. Of course, SST fields are updated every 6 h in ERA and in 0.75 • × 0.75 • resolution, which may be too coarse in space and time to have a significant impact on the land-sea temperature contrast.

Conclusion and future work
In this study, we simulate a LLJ event observed during CBLAST-LOW 2001 campaign off the New England coast and characterize the variability in the LLJ timing, height and speed of the LLJ core due to different model configurations. The impact from the choice of initial and boundary condition dataset, PBL scheme, SST update frequency and vertical resolution is investigated.
The timing of the LLJ initiation is most dependent on the boundary condition dataset (ERA or NARR) (Fig. 5), whereas the height of the LLJ core responds to the choice of PBL scheme. We evaluate simulations against observations to establish an optimal configuration suitable for this case study. Comparison with rawinsonde sounding demonstrates that wind speeds are overestimated by all models, although ERA-based simulations exhibit the smallest bias. The height of the wind speed maximum is also overestimated by all models. ERA-based simulations are able to model the strength and depth of the temperature inversion layer much better than NARR-based simulations, which exhibit a strong cool bias of -2 • C below the inversion layer. The influence of the PBL schemes on wind speed and temperature profile manifests differently in ERA compared to NARR. However, a single sounding is not suitable for determining the performance in modeling a 4D phenomenon. Thus, we utilize surface observations from ASOS stations for further validation.
To objectively determine model performance, we employ the diagnostic suite described in Koh et al. (2012). This set of diagnostic tools allows us to decompose errors into bias and pattern errors, which provides further insights into the the possible sources of variability between model configurations. From the diagnostics of surface temperature, we find that all simulations tend to exhibit a cold bias error, although the majority of stations within each simulation have small pattern errors. This led us to conclude that models have comparable success in variance prediction but the biases are likely driven by differences in boundary conditions between ERA and NARR. The wind speed diagnostics demonstrate that the NRMSE is overall higher compared to the corresponding error in temperature. While the magnitude of NBIAS changes between model configurations, stations with | | > 0.3 retain their significant NBIAS. Based on the diagnostic metrics derived from ASOS stations, ERAMYNN offers the best mix of NRMSE/NBIAS/NPE performance; hence, we chose ERAMYNN as the reference model for the inter-model comparison study.
In the inter-model study, we compare the performance of all other model configurations to ERAMYNN . The diagnostic suite is able to help distinguish the type of variability between the simulations. The score highlighted the difference between ERA/NARR simulations is due to a phase difference of the frontal passage representation as dictated by the boundary conditions. The variability between different model configurations arises mainly from bias errors which can be explained by the parameterization of the PBL scheme and the changes in vertical resolution.
The diagnostics suite also provides metrics to diagnose wind as vector quantity (see for example the application to radiosonde observations in Tieo et al. (2018)), although it was not performed in this study. The ASOS surface observations for wind speed and direction are not robust enough to decompose the wind into vectors for analysis. In addition, the frontal passage was the major synoptic event that heavily influenced the LLJ formation, hence we diagnose the (scalar) temperature change which serves as a proxy for the LLJ initiation. We conclude by noting that there is a keen interest in nesting large-eddy simulations (LES) within mesoscale models, such as WRF, to improve weather predictions (Talbot et al. 2012), as well as for wind energy applications Aitken et al. 2014;Mirocha et al. 2015). Field campaigns, such as IMPOWR (Colle et al. 2016), XPIA (Lundquist et al. 2017) and WFIP2 (Shaw et al. 2019), funded under the U.S. Department of Energy A2e initiative, provide more comprehensive measurements in various conditions that will allow further validation and improvement of both mesoscale and LES models. The performance of nested LES depends on the success of the mesoscale domain in simulating real atmospheric conditions. This study outlines a useful method for modelers to evaluate different mesoscale model configurations and select a suitable configuration for future LES studies. Fig. 14 Pressure field at mean sea-level for d03 from ERA and NARR reanalysis dataset used as initial and boundary conditions for the simulations Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.