1 Introduction

A low-level jet (LLJ) can be described as accelerated winds aloft, confined within the atmospheric boundary layer. Such events are characterized by strong surface shear and wind speed maximum of more than 10 m \({\text{s}}^{-1}\). LLJs have been frequently documented in many regions of the world (Bonner 1968; Doyle and Warner 1993; Whiteman et al. 1997; Banta et al. 2002; Song et al. 2005; Lundquist and Mirocha 2008; Colle and Novak 2010; Rife et al. 2010; Hu et al. 2013; Berg et al. 2015; Du et al. 2015; Vanderwende et al. 2015; Smith et al. 2019) and have an impact on the transportation of momentum, moisture and pollutants (Angevine et al. 2006). Furthermore, LLJs can be a source of strong winds for wind power production. However, the associated strong wind shear and veer have significant influences on the stresses experienced by a wind turbine (Kelley et al. 2004). A better understanding of LLJ interaction with wind turbines can advance optimal wind turbine design strategies.

The traditional approach for wind resource modeling is based on Jackson–Hunt-type linear flow models such as WAsP, MS3DJH and MS-Micro (Brower 2012). Linear flow models solve steady-state linearized Navier–Stokes equations with simplified assumptions for terrain effects (Jackson and Hunt 1975). They are computationally efficient and are able to provide a suitable first approximation for the wind climate of a chosen site. However, the increased availability of computational resources and power has made it more attractive to consider more complex models to be used for wind resource assessment. Higher-order models are able to better resolve nonlinear effects and hence provide better approximation of the available wind resource (Mahoney et al. 2012; Badger et al. 2015; Beaucage et al. 2014; Gopalan et al. 2014).

Numerical weather prediction models (NWP) are used as operational weather forecast models although the interest in using NWP for wind resource assessment is growing (Storm et al. 2009; Badger et al. 2015; Beaucage et al. 2014; Hahmann et al. 2014; Wilczak et al. 2015). NWP models are typically compressible, non-hydrostatic models that are able to simulate a wider scale of meteorological phenomena driven by atmospheric stability such as low-level jets and barrier jets.

With increased focus on offshore wind power (Dvorak et al. 2010; Archer et al. 2014; Gryning et al. 2014; Hahmann et al. 2014), NWP provides a larger spatial and temporal coverage for wind climate studies compared to in situ offshore measurements. Few long-term measurements of offshore wind are available due to the cost and difficulty in setting up mast towers. NWP can bridge this gap and provide better approximations of the wind climate at a potential offshore site before further commitment into costly long-term measurement campaigns.

Despite the greater accuracy of NWP models in wind prediction, the modeling outcome requires familiarity with both the geographic region of interest and the available model parameterization schemes (Storm et al. 2009; Talbot et al. 2012; Nunalee and Basu 2014; Vanderwende et al. 2015; Berg et al. 2015). No universal model set-up performs optimally in all geographic regions and mesoscale conditions. Hence, it would be useful to be able to objectively quantify the performance of mesoscale simulations and pinpoint the sources of the prediction uncertainties. This is the major motivation for the present investigation.

One of the most common NWP models is the Weather Research and Forecasting model (WRF), which is a non-hydrostatic mesoscale numerical weather prediction model used extensively for both operational and research purposes (Skamarock et al. 2008). WRF allows for either idealized or real-data simulation studies and uses nested domain runs that can be one-way or two-way coupled.

WRF has been shown to simulate LLJs reasonably well for the US Great Plains (Storm et al. 2009; Vanderwende et al. 2015) and for coastal cases (Nunalee and Basu 2014). These studies have shown that the timing of LLJ formation, as well as the height and strength of the jet core, is dependent on the WRF model configurations. In this study, we seek to characterize the variability of the results related to the model configurations. We utilize WRF (version 3.4.1) to simulate a coastal LLJ observed in August 2001 during the Coupled Boundary Layers and Air–Sea Transfer Experiment in Low Winds (CBLAST-LOW) field campaign, which was conducted in the New England coastal region. We employ a diagnostics suite (Koh et al. 2012) to quantify the variability between different model configurations. We address the following questions in our work:

  • How sensitive are the timing of LLJ formation, and height and strength of the LLJ core, to the model parameters such as choice of initial/boundary conditions, planetary boundary layer schemes and vertical resolution?

  • How can we quantify the influence that the model parameters have on the simulated LLJ?

The simulation domain and the different model parameters tested in this case study are presented in Sect. 2 together with a description of the data sets used. In Sect.  3, we introduce the diagnostic suite used to evaluate the models. The results of the simulations and the performance evaluation are presented in Sect. 4. Lastly, a discussion on the implication of the choice of model parameters is offered in Sect. 5, while conclusions are drawn in Sect. 6.

2 Domain description and simulation set-up

The CBLAST-LOW campaign was conducted over the summer months of 2001–2003 in the North Atlantic, south of Martha’s Vineyard Island, MA, USA (Edson et al. 2007). The objectives of the campaign were to understand air–sea interaction and the dynamics of the coupled atmospheric–oceanic boundary layer. To achieve these goals, direct measurements of vertical fluxes of momentum, heat and mass were made across these coupled boundary layers. A combination of radiosondes and remote sensing of the vertical structure of the coastal boundary layer yielded further insights into the dynamics of the coastal LLJ and the influence of synoptic events on its formation and structure (Mahrt et al. 2014; Helmis et al. 2015).

A strong LLJ was captured by a Woods Hole Oceanographic Institution (WHOI) sounding at approximately 07 August 2001 0030 UTC (1930 LST) (Fig. 1). The radiosonde data provided temperature, wind speed and wind direction profiles which showed the characteristics associated with a LLJ: a temperature inversion up to 350 m above sea-level, supergeostropic wind just above the temperature inversion, as well as wind veer of \(\sim\)20\(^\circ\). A warm frontal passage occurred during this observational period and is believed to have contributed to the formation of this coastal LLJ (Helmis et al. 2015).

Fig. 1
figure 1

Sounding profiles for wind speed (red), wind direction (green) and temperature (blue) measured with a rawinsonde launched on 08 August 2001 0030 UTC (1930 LST) at 41.52\(^\circ\) N, 70.67\(^\circ\) W from the WHOI Ship. No wind data was available between 3 and 226 m from the available sounding archive. (Retrieved from ftp://ftp.whoi.edu/pub/users/cblast/2001/rawinsonde/01080720.CAP)

Fig. 2
figure 2

WRF domain set-up showing three nested domains centred on the location of the Martha’s Vineyard Coastal Observatory. d01 is the outermost domain, d02 is the nested intermediate domain and d03 is the innermost domain nested within d02

2.1 WRF model domain

The domain consists of three nests, centered upon Martha’s Vineyard Coastal Observatory (Fig. 2). The horizontal resolutions from the outermost domain, d01 to the innermost domain, d03 are 30 km, 6 km and 1.2 km, respectively. The grid sizes (Nx \(\times\) Ny) are 100 \(\times\) 100, 300 \(\times\) 300 and 300 \(\times\) 300, respectively. All simulations used the same Noah Land surface model (Dudhia 1989) and RRTM radiation scheme (Mlawer et al. 1997), with the cumulus scheme switched off for d03. All simulations ran from 07 August 2001 0000 UTC (1900 LST) until 09 August 2001 0000 UTC (1900 LST) with the first 12 h discarded.

Table 1 Names and summary of all the model configurations in the study

2.2 Model configuration

For the sensitivity studies, we choose to vary the initial and boundary conditions, planetary boundary layer (PBL) schemes, vertical resolution and sea-surface temperature (SST) fields. Table 1 summarizes the model configurations used in this study.

Fig. 3
figure 3

Map of d03 with markers indicating positions of ASOS stations used. Black diagonal line indicates a northwest-southeast transect referenced in later figures

Fig. 4
figure 4

Temperature contours at 200 m above mean sea level during frontal passage in d03. Wind speed and direction indicated by wind barbs in m \({\text{s}}^{-1}\). Gray area in plots indicate the presence of terrain at 200 m above mean sea level. The black pointer marks the center of the simulation domain. Top panels ERAMYNN, Bottom panels NARRMYNN

2.2.1 Boundary conditions

Previous studies have shown that the choice of initial and boundary condition datasets heavily influences the characteristics of the modeled LLJ (Nunalee and Basu 2014; Vanderwende et al. 2015). The LLJ formation is dependent on multiple factors that include baroclinicity, topography and synoptic events such as fronts (Ostdiek and Blumen 1997; Banta et al. 2002; Van de Wiel et al. 2010; Vanderwende et al. 2015). Differences exist between various reanalysis datasets in their representation of synoptic events, depending on the methods used to produce them. Hence, it would be prudent to test the sensitivity of the modeled LLJ due to such differences.

We have explored two options for the boundary conditions, ERA-Interim and NARR. ERA-Interim is a third-generation global atmospheric reanalysis dataset from 1979 to 2019 (Dee et al. 2011). It contains global atmospheric and surface parameters at T255 spectral resolution (\(\sim\)80 km) on 60 vertical levels, with 12 levels below 850 hPa. It uses 4D-VAR for data assimilation to produce atmospheric fields on model pressure levels at 6-h intervals and surface fields at 3-h intervals. The North-American Regional Reanalysis (NARR) is a second-generation regional reanalysis product from 1979 to present, and is a high-resolution extension of the NCEP Global Reanalysis, focused over the North American Region (Mesinger et al. 2006). It contains regional atmospheric and surface parameters at 32-km horizontal resolution on 29 vertical levels, with 7 levels below 850 hPa. NARR uses 3D-var to assimilate meteorological variables into the NCEP Eta model to produce atmospheric fields that are available at 3-h intervals. These datasets are summarized in Table 2.

2.2.2 PBL scheme

In mesoscale WRF simulations, the vertical structure of the atmosphere is parameterized by PBL schemes, which are used to represent sub-grid scale turbulent vertical fluxes of momentum, heat and moisture. In contrast to a convectively driven boundary layer, a stable boundary layer exhibits suppressed buoyancy-driven mixing, while the turbulence production due to strong wind shear is increased. Thus, the choice of closure schemes suitable for convective boundary layers might not be optimal for the stable boundary layer (Nielsen-Gammon et al. 2010; Shin and Hong 2011).

In this study, we focus on the different results obtained by local and non-local closure schemes that differ in how the sub-grid fluxes are modeled as a function of mean flow quantities. Local closure schemes model these fluxes at each grid point based on the mean values of their respective gradients at that point. Non-local closure schemes include an additional flux term to account for large-scale vertical transport that is non-local in nature (Troen and Mahrt 1986). As such, we have chosen the Mellor–Yamada–Nakanishi–Niino (MYNN) local closure scheme and the Yonsei-University (YSU) non-local closure scheme (Hong 2010; Hu et al. 2010; Shin and Hong 2011).

Fig. 5
figure 5

Time–height cross sections of horizontal wind speed at the center of d03 for all six model configurations. Location is marked with a black pointer in Fig. 4. Period is from 07 August 2001 1500 UTC (1000 LST) to 08 August 2001 0300 UTC (2200 LST)

Fig. 6
figure 6

NW-SE transect of d03 for all six model configurations on 08 August 2001 0030 UTC (1930 LST). Colored contours indicate temperature. Line contours indicate wind speed

2.2.3 SST field

The contrast in surface temperature between land and sea can contribute to the LLJ formation in coastal regions (Mahrt et al. 2014). The land–sea temperature contrast can result in baroclinicity that is conducive to the formation of coastal LLJ, similar to the role of baroclinicity over the sloping Great Plains (Holton 1967). When warm air from the land surface advects offshore over cooler water, a strong stable marine boundary inversion can form (Smedman et al. 1997). The stratification of this marine boundary layer depends on the temperature difference between the warm land and the cooler sea. The stability profile will then influence the characteristics of the resultant LLJ above the inversion layer.

The standard WRF-ARW model configuration lacks a coupled ocean model; hence, the SST field is provided as an initial boundary condition and updated in time when possible. ERA-Interim provides a 6-hourly SST field that can be used to update the SST and, thus, the surface temperature field (Skamarock et al. 2008). For the NARR dataset, there is no explicit SST field available at 3-h intervals. Instead, the SST field is derived from the initial air temperature at the surface over water bodies from NARR fields and the SST field remains constant over the duration of the simulation. To assess the importance of SST on LLJ characteristics, our simulations include a case where the SST field was not updated in an ERA-based run during the simulation period. This simulation also serves as a control against which NARR-based runs can be evaluated.

Fig. 7
figure 7

Comparisons between sounding and simulation profiles on 08 August 2001 0030 UTC (1930 LST) at 41.52\(^\circ\) N, 70.67\(^\circ\) W. Left panel horizontal wind speed, Right panel temperature

Fig. 8
figure 8

Error decomposition diagram of surface temperature, T2, for all model configurations. NRMSE, \(\delta\), is the radial distance along the bottom of the plot. NPE, \(\sigma = \sqrt{\alpha }\), is the upward vertical distance from the bottom axis and bound by \(\sqrt{2}\). NBIAS, \(\mu\), is labeled for each radial line. Bold radial lines mark \(\mu =\pm 0.3\). A \(\mu\) between \(-0.3\) and \(+0.3\) indicates that NBIAS contributes to less than 5\(\%\) of NRMSE. Optimal model with \(\delta =0\) is marked with a blue circle

2.2.4 Vertical resolution

The stable atmospheric boundary layer is characterized by smaller turbulent eddies compared to those of the convective boundary layer. Increasing the vertical resolution within the stable boundary layer would, in theory, increase the accuracy of the PBL parameterizations of the eddy fluxes. The improved accuracy near the surface and the LLJ core should then lead to better estimates of winds within the first 200 m. In a study of high-resolution modeling for wind energy purposes, Bernier and Bélair (2012) found that increased vertical resolution only has modest improvements on wind profile predictions. A study on coastal LLJ (Nunalee and Basu 2014) corroborated this finding. However, Bernier and Bélair (2012) also noted that the prediction of wind shear near the surface was improved with higher resolution. To test the influence of vertical resolution, we executed a simulation with a finer vertical resolution by increasing the total number of vertical levels from 60 to 75 . As a result, the effective vertical resolution in the lowest 1000 m increased from \(\sim 35\) to \(\sim 25\) m.

Table 2 Comparison of boundary condition datasets

2.3 Observation data

To validate our results with observations and to apply the diagnostic metrics, we use surface observations from Automated Surface Observing Systems (ASOS) stations in Massachusetts and Rhode Island for temperature and wind speed (see map in Fig. 3).

The ASOS program is the United States’ primary surface weather observation network (NOAA 1998) and measures surface wind speed at 10 m, direction, air temperature and sea-level pressure, among other variables. Hourly surface measurements were used in this study from ASOS stations situated in an elevation range between 2 and 52 m above sea-level with the average elevation at 22 m.

The main synoptic condition occurring during the period of study was a warm front originating from the continental land mass to the northwest of the simulation domain d03. The front advanced south-east towards the Atlantic coast. When the warm air mass advects over the cooler sea surface, a temperature inversion is established. The resulting stable boundary layer suppresses turbulence mixing and reduces the effect of surface roughness. The effect of reduced turbulence leads to an acceleration of the wind field aloft, thus forming a LLJ. The LLJ observed was partially a result of this frontal passage. Wind direction shifts associated with a frontal passage were recorded by some ASOS stations (not shown). Observed surface winds generally shifted from southerly to northeasterly direction.

Although this LLJ was captured in a rawinsonde sounding launched from the WHOI ship (Fig. 1), the single sampling of the atmosphere was not sufficient to evaluate the success of WRF in modeling LLJ characteristics that we are interested in: timing, location, height and speed. Thus, we employ the ASOS data to help fill in the gaps in observations. Even though only surface measurements were available from ASOS stations, the data will be able to capture the frontal passage, which is likely a key modulator of the LLJ characteristics (Helmis et al. 2013). Hence, we can use the surface measurements to evaluate the ability of WRF to simulate the frontal passage by proxy.

Table 3 Summary of acronyms, symbols and their meanings for diagnostic suite variables

3 Diagnostic suite

To quantify the success of the various model configurations in simulating the LLJ, it is imperative to select a suite of diagnostic metrics that is robust and has the ability to elucidate the underlying biases and errors. We selected the diagnostic suite of Koh et al. (2012), which was designed to evaluate NWP performance. A brief description of the metrics used in our study will be given in this section.

In the following, \(\mathbf{D }\) is the defined as the difference (error) between forecast (\(\mathbf{F }\)) and observation (\(\mathbf{O }\)). The averaged difference, \(\langle \mathbf{D } \rangle\), is also known as the bias; while, \(\sigma _D^2\) is the variance of the difference. In the diagnostic framework of Koh et al. (2012), the root-mean-squared error (RMSE) is normalized by the sum of variances for both observation (\(\sigma _O^2\)) and forecast (\(\sigma _F^2\)). The normalized RMSE (NRMSE), \(\delta\), is, thus, defined as

$$\begin{aligned} \delta\, {\mathop {=}\limits ^\mathrm{def}}\, \frac{\sqrt{ \langle \mathbf{D }^2 \rangle}}{\sqrt{\sigma _O^2 + \sigma _F^2}}. \end{aligned}$$
(1)

The NRMSE can further be expressed as

$$\begin{aligned} \delta ^2 = \sigma ^2(1+|\mu |^2), \end{aligned}$$
(2)

where

$$\begin{aligned} \sigma \,{\mathop {=}\limits ^\mathrm{def}}\, \frac{\sigma _D}{\sqrt{\sigma _O^2 + \sigma _F^2}}, \end{aligned}$$
(3)

and

$$\begin{aligned} \mu\, {\mathop {=}\limits ^\mathrm{def}} \,\frac{\langle \mathbf{D } \rangle }{\sigma _D}. \end{aligned}$$
(4)

The above relation for NRMSE decomposes the total error into two main components—bias and variance. The variance error is also referred to as pattern error. Normalized bias (NBIAS), \(\mu\), is the bias, \(\langle D \rangle\), normalized with the standard deviation of the difference. It measures the statistical significance of the bias in the dataset. A larger NBIAS implies a larger statistical bias. Furthermore, arising from the relationship between NBIAS and NRMSE, it can be shown that a NBIAS of less than 0.3 would contribute to less than 5% of the NRMSE, suggesting that the total error is dominated by pattern error rather than bias error. Normalized Pattern Error (NPE), \(\sigma\), scores the variance error between forecast and observation. It can be expressed as

$$\begin{aligned} \sigma = \sqrt{\alpha }, \end{aligned}$$
(5)

where

$$\begin{aligned} \alpha\, {\mathop {=}\limits ^\mathrm{def}}\, \frac{\sigma _D^2}{\sigma _O^2 + \sigma _F^2} , \qquad 0 \le \alpha \le 2. \end{aligned}$$
(6)

The normalized error variance, \(\alpha\), does not depend on variability in the observation dataset, which is useful for evaluating forecast errors and is bound between 0 and 2. \(\alpha\) can be decomposed further into two other metrics, \(\rho\) and \(\eta\), which assess the contribution of phase and amplitude errors, respectively:

$$\begin{aligned} \alpha = 1 - \rho \eta , \end{aligned}$$
(7)

where

$$\begin{aligned} \rho\, {\mathop {=}\limits ^\mathrm{def}} \,\frac{\hbox {Trace}[\hbox {cov}(\mathbf{F },\mathbf{O })]}{\sigma _O \sigma _F}, \qquad -1 \le \rho \le 1, \end{aligned}$$
(8)

and

$$\begin{aligned} \eta\, {\mathop {=}\limits ^\mathrm{def}}\, \frac{\sigma _O \sigma _F}{\frac{1}{2}(\sigma _O^2 + \sigma _F^2)} , \qquad 0 \le \eta \le 1. \end{aligned}$$
(9)

The correlation coefficient, \(\rho\), is the familiar Pearson correlation coefficient. For \(\rho\) equal to 1, the variances are perfectly correlated; while \(\rho\) equal to \(-1\) means that the variances are perfectly anti-correlated; and \(\rho\) equal to 0 means that there is no correlation.

Variance similarity, \(\eta\), scores how similar the standard deviation of forecast and observations are. The invariant property of the metric penalizes equally the under- or over-prediction of variability. Good variance prediction, when \(\eta\) = 1, means that both \(\sigma _O\) and \(\sigma _F\) match exactly. However, poor prediction, when \(\eta\) = 0, means that \(\sigma _O\) and \(\sigma _F\) match poorly.

For an ideal forecast, for which \(\alpha\) = 0, the amplitude and phase of forecast variability are predicted perfectly; hence, there is no pattern error. On the other hand, \(\alpha\) = 1 implies that the model performs as well as a random forecast. Lastly, \(\alpha\) = 2 indicates maximum pattern error between forecast and observation, when \(\eta\) = 1 and \(\rho\) = \(-1\). Although the amplitude of variability is well predicted in such a situation, observations and forecasts fluctuate completely out-of-phase, leading to a negative correlation. The goal of a forecast would, thus, be to minimize the pattern error, \(\alpha\). This goal can be achieved in various ways. For any non-zero \(\eta\), improving the phase error between forecast and observation, hence increasing correlation, \(\rho\), will minimize \(\alpha\). For a fixed positive \(\rho\) score, maximizing \(\eta\) by matching the amplitude of forecast variability to observations will lead to minimizing \(\alpha\). Conversely, for a negative \(\rho\) score, \(\eta\) will need to be minimized to minimize \(\alpha\). In a scenario when either forecast or observation is nearly constant and the forecast is varying in opposite direction with observation, minimizing \(\eta\) (maximizing the amplitude error) would result in least overall disagreement between forecast and observation. Furthermore, that \(\alpha\) is decomposed into amplitude and phase errors allows model evaluators to distinguish between models with similar \(\alpha\) scores, enabling them to decide if small phase or amplitude errors are better predictors of their models’ performance.

Table 3 summarizes the diagnostic variables used.

Fig. 9
figure 9

Error decomposition diagram of surface wind speed at 10 m, for all model configurations. Symbols and lines as in Fig. 8

Fig. 10
figure 10

Contour plot of NBIAS for temperature during LLJ event between 07 August 2001 1500 UTC (1000 LST) and 08 August 2001 0300 UTC (2200 LST) at 200 m cross-section. ERAMYNN is chosen as the reference dataset for comparison. NARRMYNN refers to comparing NARRMYNN to ERAMYNN. Similarly for ERAYSU, NARRYSU, NOSST and ERAMYNN75

Fig. 11
figure 11

Contour plot of \(\alpha\) for temperature during LLJ event between 07 August 2001 1500 UTC (1000 LST) and 08 August 2001 0300 UTC (2200 LST) at 200 m cross-section. ERAMYNN is chosen as the reference dataset for comparison. NARRMYNN refers to comparing NARRMYNN to ERAMYNN. Similarly for ERAYSU, NARRYSU, NOSST and ERAMYNN75

Fig. 12
figure 12

Contour plot of NBIAS for temperature during LLJ event between 07 August 2001 1500 UTC (1000 LST) and 08 August 2001 0300 UTC (220 LST) along the NW-SE vertical transect across domain 3 shown in Fig. 3. ERAMYNN is chosen as the reference dataset for comparison. NARRMYNN refers to comparing NARRMYNN to ERAMYNN. Similarly for ERAYSU, NARRYSU, NOSST and ERAMYNN75

Fig. 13
figure 13

Contour plot of \(\alpha\) for temperature during LLJ event between 07 August 2001 1500 UTC (1000 LST) and 08 August 2001 0300 UTC (2200 LST) along the NW-SE vertical transect across domain 3 shown in Fig. 3. ERAMYNN is chosen as the reference dataset for comparison. NARRMYNN refers to comparing NARRMYNN to ERAMYNN. Similarly for ERAYSU, NARRYSU, NOSST and ERAMYNN75

Fig. 14
figure 14

Pressure field at mean sea-level for d03 from ERA and NARR reanalysis dataset used as initial and boundary conditions for the simulations

4 Results

The frontal passage is captured in all simulations although the timing and the spatial extent of the front differ between simulations. Horizontal cross sections at 200 m above sea level of the temperature contours and wind barbs of ERAMYNN and NARRMYNN at 6-h intervals depict the status of the evolving front (Fig. 4). The simulated frontal passage in ERAMYNN (top panels) appears to extend further offshore compared to NARRMYNN (bottom panels). The acceleration of the south-westerly winds at 2100 UTC (1600 LST) is represented in both simulations.

At 1500 UTC (1000 LST), no front is observed in d03 since westerly winds are fairly steady at 10 m \({\text{s}}^{-1}\) onshore and close to coast with stronger south-westerly winds further offshore at approximately 15 m \({\text{s}}^{-1}\). At 2100 UTC (1600 LST), the south-westerly winds have intensified. Westerly winds onshore can be seen in the north-west of the domain. A distinct wind direction shift between the warmer continental air mass and the cooler maritime air mass can be noted in this north-west region. At 0300 UTC (2200 LST), this distinct region of wind direction shift, indicative of a frontal passage passing through the domain from the north-west to the south-west, advects offshore.

Although each of the simulations described in Table 1 produces a LLJ, as seen in Fig. 5, the resulting LLJs differ by their time of onset, duration, maximum wind speed, height of the wind speed maximum, and depth of the acceleration layer. Many of these differences emerge from different representations of the synoptic situation, specifically the timing and shape of the frontal passage discussed above.

We first explore the impact of boundary conditions on the simulations. Comparing ERAMYNN and NARRMYNN (top-left and bottom-left panels of Fig. 4), the warm frontal passage as evidenced by the band of wind direction shifts from the northern region of the domain appears to occur earlier in ERAMYNN and extends much further offshore compared to NARRMYNN. This difference in frontal passage timing is corroborated by the difference in LLJ initiation timing (top-left and bottom-left panels of Fig. 5). The difference in spatial extent of the warm air mass offshore is also shown in Fig. 6, where ERAMYNN and ERAYSU exhibit warmer air mass aloft the marine boundary layer further offshore compared to NARRMYNN and NARRYSU. The intensity of the LLJ is also stronger in ERA-based simulations compared to NARR-based simulations, with a longer period of sustained wind speeds equal to or greater than 20 m \({\text{s}}^{-1}\) at the jet-core (Fig. 5). Jet-core wind speeds exceeded 20 m \({\text{s}}^{-1}\) for approximately 6 h in ERA-based simulations, while only for 2 h in NARR-based simulations.

The differences in boundary-layer parameterizations (especially MYNN vs YSU) are constrained to near-surface regions, below the nose of the jet. The wind shear is stronger below 100 m in both MYNN-based simulations compared to both YSU-based simulations (Fig. 5). Wind contours offshore reflect this behaviour as well, with stronger wind speeds close to the surface within the cooler marine boundary layer for the MYNN simulations (Fig. 6). This observation holds true regardless of the choice of boundary condition. The role of the PBL scheme appears to influence the depth of the marine inversion layer and, hence, the height of the LLJ core.

Although increased vertical resolution was expected to improve the representation of the LLJ, this was not evident in the results. Wind shear below 100 m appears to be slightly weaker over the LLJ duration in ERAMYNN75 compared to ERAMYNN (Fig. 5). However, the difference between the two MYNN simulations is not as large as the difference between MYNN and YSU. The depth of the LLJ (jet-core speeds are greater than 20 m \({\text{s}}^{-1}\)) is shallower in ERAMYNN75 (from 200 to 350 m above mean sea-level) compared to ERAMYNN (from 150 to 400 m). On the other hand, the heights of the simulated jet-core are at approximately 250 m for both.

Lastly, in the NOSST simulation where the SST was not updated during the period of simulation, the differences in simulated LLJ characteristics are less pronounced. The wind shear observed in Fig. 5 (top right) within the lowest 100 m was more similar to ERAMYNN as compared to ERAYSU and ERAMYNN75. The jet is slightly elevated compared to ERAMYNN but this behaviour is also observed in ERAYSU and ERAMYNN75. In Fig. 6, the NOSST wind contours within the marine boundary layer are also similar to ERAMYNN and the height of the marine boundary is approximately 200 m, in concert with the other ERA-based simulations.

These qualitative observations offer some insight into the variability in the LLJ due to configuration options: the choice of the boundary condition exerted the greatest influence on resulting LLJ variability. However, it is still unclear as to which configuration simulated the actual conditions the best.

4.1 Comparison to CBLAST-LOW sounding profile

To decide which configuration options modeled the actual conditions best, we utilize in situ measurements of the atmospheric conditions during the LLJ event. The profiles for wind speed and temperature from the rawinsonde sounding shown in Fig. 1 are compared to all six model simulations. The profiles are extracted from the closest grid point to the location of the rawinsonde launch (41.52\(^\circ\) N, 70.67\(^\circ\) W).

Wind speeds are overestimated by all simulations within the first 500 m (left panel of Fig. 7). Wind speeds in ERA-based simulations are closer in magnitude to the sounding compared to NARR-based simulations, particularly around the height of the jet wind speed maximum. The heights of the wind maxima are approximately similar for all simulations, with the NOSST simulation producing the lowest height of 220 m and the ERAMYNN75 the highest height of 300 m. However, all six simulations overestimate the height of the jet core compared to the sounding.

The NARR-based simulations underestimate the temperature below the inversion layer, while also underestimating the strength of the inversion (right panel of Fig. 7). Conversely, the ERA-based simulations match the sounding to a good degree, with the exception of the ERAYSU simulation. We conclude that the choice of PBL scheme influences the strength and depth of the marine inversion layer.

However, the LLJ varies spatially and temporally (Banta et al. 2002; Rife et al. 2010); hence, the single sounding profile comparison presented in this section does not provide a robust basis for assessing which simulation performs optimally.

4.2 Comparison to ASOS observations

Due to the scarcity of observational soundings during the simulation period, we utilize available ASOS surface observations to validate the results and to objectively select the optimal combination of model schemes.

We first bin the ASOS surface observations for surface temperature and wind speed into the closest 00/30 min for the LLJ duration of 1500 UTC (1000 LST) to 0300 UTC (2200 LST), then we extract the time-series for temperature at 2 m (T2) and wind speed at 10 m from the closest simulation grid cell for each ASOS station in the simulation domain, d03. We perform the diagnostic analysis on these multi-station pairs. NBIAS for most stations lies outside the strip \(\mu =\pm 0.3\) for T2 (Fig. 8) and wind speed (Fig. 9), indicating that the contribution of NBIAS to NRMSE is significant and the error is not random.

Changing from ERA to NARR as the boundary condition generally increased NRMSE and NPE for most stations, regardless of PBL scheme chosen (Fig. 8). Stations that exhibit larger NRMSE and NPE are common within ERA- and NARR-based simulations. ERA-based simulations also show lower overall NBIAS compared to the NARR-based simulations, which show a strong NBIAS of approximately \(-2\) for multiple stations.

The effect of changing MYNN to YSU scheme is less pronounced compared to the effect of changing boundary conditions. Overall, MYNN-based simulations have smaller NRMSE due to decreased NBIAS and NPE as compared to YSU-based simulations. This holds true regardless of boundary condition used.

For NOSST configuration, the effect of not updating SST in ERAMYNN lead to slight increase in NBIAS for KBID, KOQU and KPVC stations but decreased NPE for the same stations. This shift is expected as these stations are situated in grid points designated as water-bodies in WRF. When vertical resolution is increased in ERAMYNN75, there are marginal changes in error metrics when compared to ERAMYNN.

For all simulations, NPE scores are similar in magnitude for surface temperatures, indicating comparable success in pattern prediction. In addition, all simulations show a cold bias for the surface temperature prediction, while NARR-based simulations exhibit much stronger significant biases compared to ERA-based simulations.

When the simulations are evaluated based on their predictions of surface wind speeds, the contrast between simulations is much more striking (Fig. 9). ERA-based simulations have in general smaller NRMSE and NPE compared to NARR-based simulations, regardless of PBL scheme used . This observation is similar to that of surface temperature comparisons. When the PBL scheme is changed from MYNN to YSU, the average NRMSE and NPE increase for each configuration. However, it is also interesting to note that although average NPE increases, the spread of NPE is reduced for both YSU-based simulations. Comparing the performance of NOSST and ERAMYNN75, the average NRMSE and NPE scores are increased slightly, although it is difficult to determine a systematic error associated with each station.

It is clear for this offshore LLJ that boundary conditions introduce a stronger bias compared to those introduced by different PBL schemes, SST field, and vertical resolution.

Based on the above metrics, we choose ERAMYNN as the best simulation as a baseline case for inter-model comparison diagnostics. ERAMYNN offers the best mix of NRMSE/NPE/NBIAS scores for both T2 and wind speed amongst all the simulations.

4.3 Inter-model comparisons

ERAMYNN is chosen as a reference model based on comparisons to ASOS surface observations. The purpose of the analysis in this section is to use the diagnostic tools to objectively quantify the variability in prediction introduced by different model parameters.

NARR-based simulations exhibit a strong cold bias mainly over sea surfaces and strong warm bias over land surfaces during the LLJ event (Fig. 10). The spatial extent and amplitude of NBIAS are larger than in the other model configurations.

Furthermore, NARR-based simulations exhibit strong error variance in the south-eastern portion of the domain, over the sea surface (Fig. 11). For other model configurations, large error variance is confined mainly to regions just off the south-eastern coast. The prominent area of large \(\alpha\) scores in NARR-based simulations over the sea indicates an anti-correlation of temperature between ERA-based and NARR-based simulations in this region, which implies a phase error. This pattern suggests that the time of frontal passage that contributed to the LLJ formation was lagging in NARR-based simulations compared to ERA-based simulations.

The LLJ itself and its frontal boundary can be seen in the northwest-southeast vertical transect of the domain (Fig. 12). The biases are similar to those in the horizontal domain in that NARR-based simulations exhibit overall stronger biases compared to ERA-based simulations with cold bias over the sea and warm bias over land surfaces. ERAYSU shows stronger warm biases over land compared to other ERA-based simulations.

The \(\alpha\) score in the vertical slice of NARR-based simulations shows a spatial pattern of strong normalized error variance that resembles a frontal boundary (Fig. 13). The strong \(\alpha\) score arises from an anti-correlation of temperatures between the NARR and ERA-based simulations and manifests over the sea surface, especially near the marine surface layer. This behaviour corroborates the \(\alpha\) pattern seen in the horizontal plane in Fig. 11, where an associated phase error contributed to the variability in the simulations. In contrast, for the ERA-based simulations, the error variance is mainly confined to the region of sea surface near the coast, with little variation in the vertical profile.

5 Discussion

The strong \(\alpha\) score (\(>1\)) offshore as seen in both Figs. 11 and 13 arises from an anti-correlation of the temperature in that region. This phase error can be explained by the dynamics of the frontal passage. NARR-based simulations produce a frontal passage that passes through the domain later in time and does not have the same spatial extent as in the ERA-based simulations.

Examination of the boundary forcing in ERA and NARR dataset shows slight variation in pressure fields (Fig. 14). A sharper pressure gradient extends across the domain in ERA as compared to NARR, and the low pressure system extends further south in NARR. The resultant geostrophic wind from the sharper pressure gradient in ERA likely increases the frontal advection, thus influencing the formation of the LLJ at the frontal boundary (Ostdiek and Blumen 1997).

In contrast to the role of ERA/NARR boundary conditions, the choice of PBL scheme, vertical resolution and role of SST updates are less significant for model variability. Furthermore, the variability in the ERA-based simulations is more apparent in the vertical profile and is mainly manifested as NBIAS.

The choice of PBL scheme, SST and vertical resolution mainly influence the height of the marine boundary layer and, hence, the height of the LLJ core. The choice of YSU scheme over MYNN results in an overall warmer profile over land due to enhanced mixing from the land surface that is then advected offshore, which is seen in YSU’s warm bias above the marine boundary layer. The role of SST is less significant than expected. Of course, SST fields are updated every 6 h in ERA and in 0.75\(^\circ\) \(\times\) 0.75\(^\circ\) resolution, which may be too coarse in space and time to have a significant impact on the land–sea temperature contrast.

6 Conclusion and future work

In this study, we simulate a LLJ event observed during CBLAST-LOW 2001 campaign off the New England coast and characterize the variability in the LLJ timing, height and speed of the LLJ core due to different model configurations. The impact from the choice of initial and boundary condition dataset, PBL scheme, SST update frequency and vertical resolution is investigated.

The timing of the LLJ initiation is most dependent on the boundary condition dataset (ERA or NARR) (Fig. 5), whereas the height of the LLJ core responds to the choice of PBL scheme. We evaluate simulations against observations to establish an optimal configuration suitable for this case study. Comparison with rawinsonde sounding demonstrates that wind speeds are overestimated by all models, although ERA-based simulations exhibit the smallest bias. The height of the wind speed maximum is also overestimated by all models. ERA-based simulations are able to model the strength and depth of the temperature inversion layer much better than NARR-based simulations, which exhibit a strong cool bias of -2 \(^\circ\)C below the inversion layer. The influence of the PBL schemes on wind speed and temperature profile manifests differently in ERA compared to NARR. However, a single sounding is not suitable for determining the performance in modeling a 4D phenomenon. Thus, we utilize surface observations from ASOS stations for further validation.

To objectively determine model performance, we employ the diagnostic suite described in Koh et al. (2012). This set of diagnostic tools allows us to decompose errors into bias and pattern errors, which provides further insights into the the possible sources of variability between model configurations.

From the diagnostics of surface temperature, we find that all simulations tend to exhibit a cold bias error, although the majority of stations within each simulation have small pattern errors. This led us to conclude that models have comparable success in variance prediction but the biases are likely driven by differences in boundary conditions between ERA and NARR. The wind speed diagnostics demonstrate that the NRMSE is overall higher compared to the corresponding error in temperature. While the magnitude of NBIAS changes between model configurations, stations with \(|\mu | > 0.3\) retain their significant NBIAS. Based on the diagnostic metrics derived from ASOS stations, ERAMYNN offers the best mix of NRMSE/NBIAS/NPE performance; hence, we chose ERAMYNN as the reference model for the inter-model comparison study.

In the inter-model study, we compare the performance of all other model configurations to ERAMYNN . The diagnostic suite is able to help distinguish the type of variability between the simulations. The \(\alpha\) score highlighted the difference between ERA/NARR simulations is due to a phase difference of the frontal passage representation as dictated by the boundary conditions. The variability between different model configurations arises mainly from bias errors which can be explained by the parameterization of the PBL scheme and the changes in vertical resolution.

The diagnostics suite also provides metrics to diagnose wind as vector quantity (see for example the application to radiosonde observations in Tieo et al. (2018)), although it was not performed in this study. The ASOS surface observations for wind speed and direction are not robust enough to decompose the wind into vectors for analysis. In addition, the frontal passage was the major synoptic event that heavily influenced the LLJ formation, hence we diagnose the (scalar) temperature change which serves as a proxy for the LLJ initiation.

We conclude by noting that there is a keen interest in nesting large-eddy simulations (LES) within mesoscale models, such as WRF, to improve weather predictions (Talbot et al. 2012), as well as for wind energy applications (Mirocha et al. 2014; Aitken et al. 2014; Mirocha et al. 2015). Field campaigns, such as IMPOWR (Colle et al. 2016), XPIA (Lundquist et al. 2017) and WFIP2 (Shaw et al. 2019), funded under the U.S. Department of Energy A2e initiative, provide more comprehensive measurements in various conditions that will allow further validation and improvement of both mesoscale and LES models. The performance of nested LES depends on the success of the mesoscale domain in simulating real atmospheric conditions. This study outlines a useful method for modelers to evaluate different mesoscale model configurations and select a suitable configuration for future LES studies.