Multiscale Shear Forcing of Turbulence in the Nocturnal Boundary Layer: A Statistical Analysis

The lower nocturnal boundary layer is governed by intermittent turbulence which is thought to be triggered by sporadic activity of so-called sub-mesoscale motions in a complex way. We analyze intermittent turbulence based on an assumed relation between the vertical gradients of the sub-mean scales and turbulence kinetic energy. We analyze high-resolution nocturnal eddy-correlation data from 30-m tower collected during the Fluxes over Snow Surfaces II field program. The non-turbulent velocity signal is decomposed using a discrete wavelet transform into three ranges of scales interpreted as the mean, jet and sub-mesoscales. The vertical gradients of the sub-mean scales are estimated using finite differences. The turbulence kinetic energy is modelled as a discrete-time autoregressive process with exogenous variables, where the latter ones are the vertical gradients of the sub-mean scales. The parameters of the discrete model evolve in time depending on the locally-dominant turbulence-production scales. The three regimes with averaged model parameters are estimated using a subspace-clustering algorithm which illustrates a weak bimodal distribution in the energy phase space of turbulence and sub-mesoscale motions for the very stable boundary layer. One mode indicates turbulence modulated by sub-mesoscale motions. Furthermore, intermittent turbulence appears if the sub-mesoscale intensity exceeds 10%\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10 \%$$\end{document} of the mean kinetic energy in strong stratification.


Introduction
The atmospheric boundary layer in conditions of neutral or weak stability is well described using similarity theory (Grachev et al. 2013), but its modelling becomes arduous in increased stratification (Fernando and Weil 2010). In the nocturnal or polar boundary layer, for example in a very stable boundary layer (vSBL), the intermittency of turbulence challenges the existing B Vyacheslav Boyko vyacheslav.boyko@fu-berlin.de Nikki Vercauteren nikki.vercauteren@fu-berlin.de 1 Berlin, Germany parametrizations used in weather or climate models. An intermittent event of the turbulence kinetic energy (TKE) is often identified as an infrequent burst with a sharp peak followed by a rapid decay. A sequence of these bursts during a night can enhance mixing and change the structure of the boundary layer (Acevedo et al. 2006). Usually, in weak stability, a welldefined boundary layer exists in which turbulence is continuous and decreases with height according to Monin-Obukhov similarity theory, therefore, providing a predictable level of mixing. Excursions from this regime into the vSBL with less understood turbulence can occur in a variety of scenarios.
Intermittent turbulence appears to be distinctively pronounced during clear-sky nights, which favour a strong radiative cooling. Buoyant forces hinder the development of turbulence through the partitioning of turbulent energy between TKE and turbulence potential energy (Sun et al. 2016). The importance of radiative heat loss is discussed in the so-called maximum sustainable heat-flux theory (van de Wiel et al. 2012b), according to which continuous turbulence requires the turbulent heat flux to balance the surface-energy demand resulting from radiative cooling. Beyond a maximum sustainable heat flux, turbulent swirls have insufficient energy to act against the buoyancy force and are attenuated (van de Wiel et al. 2012a). Seemingly, if the turbulence is suppressed due to the temperature gradient, it becomes sensitive to localized perturbations of the flow on so-called sub-mesoscales and tends to be intermittent (Vercauteren and Klein 2015). Global intermittency can also occur in the absence of external forcing (Ansorge and Mellado 2016), and can result from the competition between a strong surface cooling and mechanical generation due to shear (van de Wiel et al. 2002).
On the one hand, continuous turbulence is established for energy injection at a significantly large scale, thereby providing the height of the boundary layer as a characteristic length scale. On the other hand, identification of a boundary-layer height in the vSBL should be considered with caution (Zilitinkevich and Baklanov 2002) due to weak and unsteady mean wind-speed profile. For instance, Lan et al. (2018) observes such a behaviour over a flat surface and describes the boundary layer to be in a decoupled state. Accordingly, the turbulence collapses close to the surface but is mainly generated away from the ground. Below a certain windspeed threshold, the upper region of the boundary layer, as found by Sun et al. (2012), is indeed prone to top-down events, localized shear events, and non-turbulent oscillations, such as meandering motions. These oscillations in the horizontal components of the velocity have been identified using Eulerian autocorrelation functions (Mortarini et al. 2016). In a recent study, Mortarini et al. (2019) extracted the meandering periods in the vSBL (which occupy ≈ 1/3 of their considered nocturnal dataset), further showing that these motions are usually found aloft in the boundary layer. Based on these findings, Cava et al. (2019, their Eq. 3) suggested a unique function for the low wind speeds and the inertial subrange to account for the distribution of the flow energy between turbulence and sub-mesoscales. Overall, the turbulence appears to respond intermittently to perturbations above the ground.
Similarly, processes of different scales can be the trigger for external intermittency (Mahrt 2014). Their spatio-temporal scales and origin are not generalized, but a multitude of processes are commonly lumped under the denomination of sub-mesoscale motions. Besides meandering, sub-mesoscale motions can include internal gravity waves (Zaitseva et al. 2018), density currents , drainage flows (Mahrt et al. 2001), or microfronts (Mahrt 2019). While parametrizing these scales is challenging due to partly unknown physics, exploratory data analysis can help to discover new concepts. For example, Kang et al. (2015) used a statistical method to identify types of non-turbulent structures in the nocturnal boundary layer with no a priori assumption on their physical origin. Sharp, step-like temperature structures that correspond to a shallow and deep event in the boundary layer were found to be associated with intermittent turbulence. To improve the transport properties in the vSBL, the discontinuous, inhomogeneous response of the turbulence to perturbations should be treated in numerical models. As the equilibrium state of the boundary layer weakens with stability, the perturbations lead to instabilities. Distinguishable characteristics can become unclear, like the flux-profile relationship, making the classification of the SBL and the use of classical scaling relationships difficult. Therefore, the stochastic description of intermittent turbulence offers a promising alternative.
Statistical-clustering techniques have recently been applied to gain understanding of the vSBL. Indeed, as it is known that the presence of sub-mesoscale motions can trigger nonstationary turbulence, a causal relationship is to be expected. Characterization of the boundary layer by accounting for this hypothesized statistical relationship between sub-mesoscale motions and intermittent turbulence is an ansatz that was suggested in Vercauteren and Klein (2015). In a related statistical framework, Monahan et al. (2015) applied hidden Markov models to classify the SBL in two regimes. The multivariate clustering of the bulk wind shear, mean wind speed, and stratification reveals a consistent regime structure. Vercauteren and Klein (2015) classified the SBL in terms of statistical causality (Granger 1988) between the filtered velocity on sub-mesoscales and the standard deviation of the vertical turbulent fluctuations using a bounded variation, finite-element, vector-autoregressive-factor method (FEM-BV-VARX). In a follow-up study, Vercauteren et al. (2016) investigated interactions between scales of motion within the detected regimes, and characterized turbulent momentum fluxes using an extended multiresolution-flux decomposition. Moreover, these regimes were analyzed in combination with a turbulent-event-detection method (Kang et al. 2014) to investigate the statistical properties of non-turbulent events across two datasets (Vercauteren et al. 2019b). Overall, the post-analysis of the FEM-BV-VARX regimes highlights a physically meaningful classification of the vSBL, but, so far, very little attention has been paid to the role of gradients in the sub-mesoscale motions (Mahrt 2010). Here, we focus on quantifying the TKE input from a different range of scales, in particular based on their vertical gradients.
We separate the non-turbulent velocity components in several frequency bands using a discrete wavelet transform (DWT) and estimate the vertical wind-speed gradients for each scale. Next, the wind-speed gradients are used to model the TKE in the SBL as a discrete-time autoregressive process. Within the clustering framework, we expect the regime of intermittent turbulence to be classified as one with the most substantial contribution from the gradients of the sub-mesoscale velocity, which was shown to correspond to the vSBL in Vercauteren and Klein (2015). Depending on the performance of the model, for each height, we investigate if the vertical wind-speed gradients on sub-mesoscales are essential in the vSBL. Additionally, separation of the flux-profile relationship is investigated under the statistical classifier, thereby suggesting a stochastic extension for the parametrization of turbulence in the vSBL. As the sub-mesoscale motions are an essential part of the vSBL, the ratio between the turbulence intensity and sub-mesoscale intensity is constructed relative to a mean velocity scale of the flow.
We are motivated by the following idea. For a sufficiently long observation time, each point on the earth has an individual activity level of sub-mesoscale motions. These partly stochastic motions should have a mean energy level and produce a certain level of TKE as a response. Although it is intermittent, and it is not clear how the energy is transformed, for the stochastic-model development, it is essential to know what is the amount of the available energy and what portion of it is converted to turbulence. We hypothesize that each site has its characteristic footprint in the phase space of turbulence intensity and sub-mesoscale intensity depending on the stability. We investigate the turbulence-submeso-intensity (TSI) diagram by separating it based on the stability and the FEM-BV-VARX classifier. Following this research hypothesis, the ratio between turbulence intensity and sub-mesoscale intensity reflects the different types of interactions between scales of motion, and we thus expect to observe distinct behaviours of this ratio depending on the flow regime and on the boundarylayer height.
The study extends earlier findings through the following aspects: -The clustering framework models the log transform of the TKE using the gradients of velocity at three different scales. The log transformation of the time series to be modelled most closely reflects the assumptions of the clustering framework. This transformation is especially important in the regime of strong stratification because the probability density function (p.d.f.) of the TKE is not Gaussian and resembles the shape of a log-normal distribution. -A new representation of the interaction between sub-mesoscale motion and turbulence is presented and analyzed. -A connection between Monin-Obukhov similarity theory and the results of the clustering are presented.
Below, the Fluxes over Snow Surfaces II (FLOSSII) dataset and the preprocessing steps are addressed in Sects. 2 and 3 gives a brief introduction to the filtering approaches used to separate scales and the classification based on the statistical modulation of the TKE by the different scales. The results are presented in Sect. 4 and consist of visualized scale activity during three interesting nights, a classified TSI diagram by the stability and clustering method, as well as a classified flux-profile relationship. The results are discussed in Sect. 5 and the conclusions are given in Sect. 6.

Dataset and Preprocessing
The dataset under consideration consists of high-resolution eddy-correlation data collected during the FLOSSII field program, which took place from 20 November 2002 to 4 April 2003 over locally flat grass in the North Park region of Colorado, near Walden (Earth Observing Laboratory data 1999). Seven sonic anemometers (Campbell CSAT3) collected three velocity components and the temperature at 1, 2, 5, 10, 15, 20 and 30 m above ground level and mounted on a rigid truss tower, allowing airflow to pass through. The terrain of the site is flat with a variable brush height from 0.2 m to about 0.5 m (Mahrt and Vickers 2005). The ground was covered by a thin snow layer for approximately 20 days of the field campaign.
The dataset is quality controlled following Vickers and Mahrt (1997). In the following, we describe the preprocessing steps in the sequence in which they are performed, while Fig. 1 displays the corresponding flow chart. The raw data comprise 132 days of continuously recorded measurements. Occasionally, the instruments were not working properly, causing long data gaps (on a scale of several hours). The days with these long periods and those with unrealistically large velocity changes in time (|Δu/Δt| > 117 m s −2 ) are removed, resulting in 102 days of data. The remaining shorter data gaps on a scale of minutes (probably the results of the quality control) are linearly interpolated in time, therefore increasing continuity for the clustering (which is described in Sect. 3.3). Furthermore, we do not remove periods during which the flow is passing through the tower because the disturbances caused by the tower are of the scale of the truss rods ( ≈ 0.1 m), meaning the fluctuation energy induced by the rods is small compared to the 1-min turbulent scale. Excluding periods for flow through the tower as in Vercauteren et al. (2019b) results in a shorter dataset, consisting of 68 days, and excludes a portion of strongly stable nights. We chose to instead keep the 102 nights for the analysis. After the interpolation, the velocity is double-rotated (Lee et al. 2004) into the mean wind direction with a moving window. The natural mean wind direction is calculated from the 30-m sonic anemometer using a moving average on a scale of 1 h.
In the next steps, the velocity components are filtered in order to obtain a multiscale decomposition. First, a running time average is applied to estimate the Reynolds stress and sensible heat flux. Second, the DWT technique is used to separate the larger scales of the velocity signal into different ranges of scales which is detailed in Sect. 3.1. The nocturnal time is then selected based on the average negative heat flux over all nights. The nocturnal time is selected after the wavelet filtering to mitigate edge effects from the wavelet transform, known as the cone of influence of a mother wavelet (Torrence and Compo 1998). The gradients of the filtered velocity components with respect to height are computed using finite differences; a central difference for the heights 2, 5, 10, 15, 20 m, taking into account the non-equidistant grid, and one-sided at 1 m and 30 m.
We did not perform any correction of the sensible heat flux due to lack of moisture measurements. However, this correction only affects the results in the estimation of the fluxprofile relationship, as all other analysis are based solely on velocity measurements.

Definition of the Scale Ranges
At high Reynolds number, turbulence in the inertial subrange has universal properties that express themselves through the well-known energy cascade that follows a power law. Generally, eddies in the energy-production range are not expected to follow a universal description (Pope 2001, p. 247). In the following, we divide the turbulent signal into three frequency bands in order to study the response of the turbulence to different ranges of scales.
In an earlier analysis of the FLOSSII dataset presented in Vercauteren et al. (2019b), the computed momentum and heat-flux cospectrum in the vSBL displayed a scale overlap between turbulent and non-turbulent motions, therefore indicating a lack of scale separation. Nevertheless, the analysis suggests that scales < 1 min correspond mainly to turbulence in the nocturnal FLOSSII dataset, and we similarly use this threshold to define the turbulent scales. Originally, the flow was sampled at 60 Hz. The decomposition of scales is performed in two filtering steps. In the first step, the turbulent fluctuations are separated using a running time average, where 2N = 3600 is the number of samples corresponding to a period of 1 min, thus · denotes time average on a scale of 1 min. The velocity component that is filtered (u, v, w) is denoted by f (t) and G is the convolution kernel being equal to one (i.e. this corresponds to block averages as performed in classical Reynolds averaging). The turbulent fluctuations are then f = f − f . If the boundary layer is in the weak stability regime with typically larger turbulent eddies, the 1-min averaging produces an underestimated TKE, because the filter scale is not adapted to the scale of turbulence. To compute the TKE, a running time variance is used, where 2N = 3600 is the number of samples corresponding to a period of 1 min and the mean is removed according to Eq. 1. Consequently, the TKE is e = 0.5( u u + v v + w w ). One advantage of the running time statistics over the block averaging is the potentially higher sampling frequency for subsequent analyses, which influences the performance of the Fig. 2 A sketch of the time scales for the FLOSSII dataset. The vertical dashed lines from left to right indicate the frequencies of 3 h, 1 h, and 1 min. The median (black line) and the 0.05 with 0.95 quantiles (denoted by the grey area) are computed over 102 nights. The energy spectra is calculated with a continuous wavelet transform using the Morlet wavelet model identification. The highest frequency of the velocity u after the averaging is 0.02 Hz, or in different units 60 cycles per hour (cph) and it is the so-called band limit frequency f c of the signal. According to Billings (2013, p. 476), Shannon's theorem states that to recover all the information in a signal band limited to the frequency f c , the signal should be sampled at the minimum rate of 2 f c . For parameter estimates, a sampling rate of around 5 f c /2 is often sufficient. As a consequence, the variables e, u , w θ are downsampled (Alkin 2016) from f s = 60 Hz = 216, 000 cph to f d = 180 cph, thereby resolving 1-min fluctuations with three points. This sampling rate is important to extract statistical relations between scales of motion when using the clustering approach (see Sect. 3.3) up to a time scale of 1 min.
In the second step, after the Reynolds averaging (u = u + u ) the velocity u is further separated into three ranges of scales u = u + u + u by using the DWT approach (see Sect. 3.2). The sub-mesoscales u = ( u, v, w) occupy the period from 1 min to 1 h, the jet scale u = ( u, v, w) occupies the band from 1 to 3 h, and the mean scale u = (u, v, w) includes scales of duration ≥ 3 h. These ranges are summarized in Fig. 2.
Three hours for the mean velocity scale is chosen because it is sufficiently large to observe a logarithmic wind-speed profile corresponding to traditional similarity theory. The jet scale has been introduced because this scale has enough energy to produce ground-sheared turbulence, but insufficient to contribute to the wind-speed profile up to the height of 30 m. Besides, the jet scale adds inflection points to the mean wind-speed profile, thereby changing the height at which a maximum velocity is observed. The goal of our multiscale decomposition is to find a mean scale for which the highest measurement point (30 m) corresponds to a maximum wind speed according to the traditional concept of a boundary layer. At this mean scale (3 h here), we expect traditional similarity scaling to be valid. The scaling velocity u ∞ (t) in our subsequent analysis is the velocity at the height of 30 m, with a band limit of 3 h (see Fig. 2). By scaling with the wind speed u ∞ (t), unknown deviations (the jet, sub-mesoscale and turbulence) may be studied in relation to the known similarity theory. Similar to the definition of the turbulence intensity, (Pope 2001), we define the sub-mesoscale intensity which denotes the energy content of the sub-mesoscales. The relation between these parameters is investigated in Sect. 4.3. The jet intensity is not analyzed, because it is expected to produce turbulence mainly through the shear at the ground, similarly to the mean scale.
The wind speed to be analyzed for the mean, jet and sub-mesoscale bands are defined , respectively, as U = u 2 + v 2 , U = √ u 2 + v 2 and U = √ u 2 + v 2 . When using the DWT approach, it is important to choose an appropriated basis function. As noted by Kumar and Foufoula-Georgiou (1997), the Haar wavelet provides a good and simple choice for applications where the process has sharp variations such as within the inertial subrange. Outside the turbulent cascade, where sub-mesoscale motions are recorded as smoother signals, a high-order Daubechies wavelet is better. Consequently, as the scale changes from turbulent to sub-mesoscale, to jet and to the mean scale, the basis of the wavelet should change from a Haar wavelet to a Daubechies wavelet. As a more straightforward approach, we use two different filter types instead, and is explained next.

Wavelet Filtering
The aim is to separate wind-speed profiles of different scales and to estimate the corresponding vertical wind-speed gradients for later use in the modelling and clustering. Linear filters that are constructed in the Fourier domain are not able to filter the velocity u correctly due to the unsteadiness. Here, we apply the DWT approach as implemented as a filter bank in the Python software package reported by Lee et al. (2019) to perform this task. A Daubechies wavelet ψ s,c (t) of order 20, which is selected as a mother wavelet and discussed in detail later, forms an orthonormal basis, where δ i j is the Kronecker delta A function f (t) can be, s is the dilation, and c the translation index of the dyadic scale. These functions can be approximated by a linear combination to arbitrary precision with the appropriate choice of the basis. The coefficients of a wavelet transform are where f (t) denotes the analyzed function. The dyadic-scale discretization of a wavelet allows a sparse representation of a signal in one dimension, and provides a time-scale representation of a process (Kumar and Foufoula-Georgiou 1997), The reason for using a dyadic filter bank is efficiency. The basic idea is to construct a cascade of high-and low-pass filters (Mallat 1999, p. 255). After one level of decomposition, the output consists of approximations (low-frequency band) and detailed (high-frequency band) coefficients. Then, the approximation coefficients are downsampled by a factor of two, and the discrete convolutions with low-and high-pass filters are repeated. The filter bank can have multiple levels of decomposition, and by the inverse transform of the cascade, the signal is entirely reconstructed. By setting the wavelet coefficients to zeros at one decomposition level, the user removes unwanted scales, therefore extracting those that are of interest. Considering the Nyquist sampling theorem and previous filtering steps, the highest frequency in the signals of the parameters e, u , w θ is 90 cph. After the first level of decomposition, the detailed coefficients contain frequencies in the range 45-90 cph. The full filter-bank cascade consists of nine levels and is summarized in Table 1. The choice of a wavelet basis is dependent on the analyzed time series and must represent most of the energy content of the signal with the least amount of wavelet coefficients (Kumar and Foufoula-Georgiou 1997). While the Shannon measure of entropy can be used to find a basis from the considered library of wavelets as demonstrated by Katul and Vidakovic (1996), another reasoning is applied here. The velocity on the different scales (u, u, u) must be separated in the physical space because we aim to investigate from which velocity scales the TKE is triggered. To separate scales, a basis with strong localization properties in the frequency domain is needed. In particular, weak localization properties are characterized by a smooth frequency response function, which stretches into the neighbouring bands beyond the predefined cutoff, leading to energy loss across scales. That weakness of the DWT approach is so-called spectral leakage (Qiu et al. 1995), which, according to the sensitivity study of Peng et al. (2009), is mitigated by selecting a mother wavelet of a high order, and thereby achieving a sharper cut-off. Higher-order wavelets require a higher time resolution of the signal, which is fulfilled by the a sampling frequency of 180 cph used here.

Non-stationary Clustering Based on a Linear-Autoregressive-Factor Model
The clustering method (Horenko 2010) is used to classify regimes based on discrete-time statistical linear models for a multivariate variable and has been used, for instance by Franzke et al. (2015) to study the response of Southern-Hemisphere-circulation trends to multiple external forcing; they found that anthropogenic CO 2 is a more relevant driver than ozone depletion. Risbey et al. (2015) investigated mid-tropospheric flows and showed that teleconnection patterns in the Northern Hemisphere exhibit characteristic switching of metastable states. O' Kane et al. (2017) investigated the memory and dimensionality in terms of the clustering method in defining the quasi-stationary states of the troposphere.
In brief, the clustering method expresses the non-stationary time series as a combination of K > 1 stationary submodels that alternate in time, with the difference between each of the models being their parameter values. Each parameter set of the K 'th model is constant for an a priori unknown period of time, which, along with the corresponding model parameters, are determined through a machine-learning process. Here, we construct a one-dimensional autoregressive-factor model with exogenous variables to learn about the non-stationary modulation of TKE by vertical gradients of the wind speed at different scales.
The wind-speed gradients on the filtered scales (which are treated as exogenous variables): if the past history of the gradients F M , F M and F M helps to predict the value of e M at some stage in time (Granger 1988). The variables are temporarily marked with the subscript M to indicate that they are rescaled later in the text, after which "M" is dropped. Concerning the selected exogenous variables, we have the following expectation of the model. The TKE in the neutral and the weak stability regime is predicted by the gradient ∂U /∂z according to similarity theory and, therefore, we expect to find a cluster for which the dynamical evolution of the TKE is mainly modulated by the gradient ∂U /∂z. In the statistical model to be introduced next, this translates into having parameter values of ∂U /∂z of larger magnitude than those of ∂ U /∂z and ∂ U /∂z (see Eq. 10). Respectively, for the strongly stable regime we hypothesize to find that the gradients ∂ U /∂z and ∂ U /∂z are more Granger causal for TKE than for the gradient ∂U /∂z.
Here, we consider rescaling the variables. On the one hand, rescaling is favoured if the clustering aspect of the methodology is of interest. On the other hand, if the modelling aspect is of importance, a physically meaningful scaling should be applied. As nondimensionalizing a system is not always possible in exploratory data analysis as little knowledge on the process may be available, our focus is clustering, which, therefore, favours a statistical scaling (see Eq. 9).
Removing traceable nonlinearity from the data also benefits the clustering. Since the statistical model is linear within each cluster, the algorithm does not need to learn the nonlinearity. Otherwise, the model would need more clusters. According to an earlier similar study (Vercauteren et al. 2019a), the TKE exhibits a heavy tail distribution in the strongly stable regime. Consequently, taking the logarithm of the TKE, e t ≡ ln(e M ), and making the distribution more Gaussian effectively reduces the number of clusters needed to describe the intermittent regime. Otherwise, the clustering was found to separate cases based on the mean TKE levels, rather than on dynamical interactions with the forcing variables (the latter being our scientific interest). In the following, the variables are standardized by The clustering framework is generalized to multiple dimensions and can be used with all seven measurement heights of the FLOSSII dataset. Unfortunately, this leads to a high number of parameters, because cross-correlation terms need to be considered. Therefore, the one-dimensional model is estimated for each height separately to keep the parameter space small. The linear time-lagged model structure is used (Note: TKE is now logarithmically transformed e t ≡ ln(e M )) e t =μ(t) + p n=1 a n (t)e t−n where p and q are memory lags of the statistical process, the subscript t denotes the discrete timestep, and (t) = {μ(t), A(t), B(t), ε(t)} the parameter set. The first parameter of the set (t) is an offset μ(t) that later under assumptions becomes a constant mean value of a cluster. The vector A(t) = [a 1 (t), . . . , a p (t)] consists of the coefficients of the autoregression and consists of the coefficients of the exogenous variables. The last parameter of the set (t) is the residual error ε(t) between TKE (e t ) and the linear model. The model distance function is defined by the least-squares residual norm, The variational problem N t t=1 g(e t , (t)) → min (t) (N t is the length of the time series in samples) to find the parameters is ill-posed. It is assumed that the function g(e t , (t)) can be represented as a linear combination of K > 1 distance functions, is a time-dependent model-affiliation vector fulfilling the convexity condition, Alternatively, the affiliation vector can be interpreted as a probability vector, which indicates at each timestep a probability to observe a set of model parameters that fits the data in the best way. The reason to observe a change to another more probable regime is induced by the gradients of the sub-mean scales as prescribed by the model structure (see Eq. 10). With this assumption, the average clustering function takes the form, and is efficiently solved with the subspace-clustering algorithm (Horenko 2010) for finding the model affiliation function and model parameters.
In summary, we started with a problem, where the parameters describing the modulation of the TKE by the filtered velocity gradients are time-dependent, and relocated the issue of nonstationary parameter estimation to a model-affiliation function by imposing assumptions on the parameter space. The assumption that the data have a finite set of persistent states makes the inverse ill-posed problem solvable. The optimization problem Eq. 14 is solved by a finiteelement approach. Furthermore, the problem is regularized by constraining the persistency of the functions γ k using the class of functions with bounded variation BV ([0, N t where the persistency parameter C defines the maximum number of transition between the cluster state k and all others. Here, the functions γ k , k = 1, . . . , K are discrete vectors (zero or one). For more details, we refer to Metzner et al. (2012). Lastly, the four hyperparameters: the persistency parameter C, number of clusters K , number of autoregression lags p and the number of lags for the exogenous variables q can be determined a posteriori using information criteria as explained by Metzner et al. (2012). However, we limit our analysis to three clusters that approximate the weak, intermediate, and strong stability regime of the boundary layer sufficiently well. Another way to measure the performance of the statistical model is to compare the modelled variance with the variance in the data using the so-called R 2 score, where E[·] is the expected value, and N k the number of samples in the cluster k. The maximum number of lags for the autoregressive terms p and forcing variables q (see Eq. 10) affect the R 2 score. We limit the memory depths in the statistical model to p = 3 and q = 10, since a further increase does not yield a significant increase in the value of R 2 (not shown). The optimal parameter C, is similarly selected based on the value of R 2 (see Sect. 4).

Methods for Boundary-Layer Analysis
The local scaling theory is widely used in single-column models (Rodrigo and Anderson 2013;He et al. 2019) but fails at increased stratification due to the increased scatter. We use the estimated classifier (t) (see Eq. 14) to examine only the flux-profile relationship for momentum in three different clusters. Table 2 shows the relation between the detected clusters and the stability parameter ζ . To avoid the self-correlation in the flux-profile relationship, we use following Grachev et al. (2018), where the friction velocity is replaced by the standard deviation of the vertical velocity component σ w . The quantity φ m φ −1 w is expressed as a function of stability ζ = z/L, where the Obukhov length is defined in terms of the local friction velocity u * = [ v w 2 + u w 2 ] 1/4 , the von Kármán constant κ = 0.4, the acceleration due to gravity g = 9.81 m s −2 and the local kinematic heat flux w θ , with w and θ being fluctuations from their respective 1-min-averaged mean. The reference potential temperature 0 is calculated as an average temperature of all nights and over all heights. Furthermore, we wish to analyze the energy distribution between sub-mesoscales and turbulent scales in the phase space spanned by the parameters T I and S I (see Fig. 11). To better understand the diagram, it is useful to know if some regions can be associated with a stability value or the shape of the mean wind-speed profile. These two properties are inherently linked. For instance, by damping the turbulence, the shape of the mean wind-speed profile is changes from logarithmic towards linear. The type of the boundary layer can be quantified by the shape factor H = δ 1 /δ 2 , where δ 1 [m] is displacement thickness and δ 2 [m] is the momentum thickness (Schlichting and Gersten 2016) defined as respectively, where the theoretical values are H = 2.3 for a laminar profile and H = 1.3-1.4 for a turbulent one. In the atmospheric boundary layer, we are unlikely to observe a laminar flow. Therefore, we use the shape factor as an approximate indicator to know the type of the mean wind-speed profile.

Estimation of the High-Density Regions for a Density Function
Through the TSI diagram, it is demonstrated that the clustering method reveals a stronger link between the sub-mesoscales motions and turbulence in comparison with the classification with the stability parameter ζ (see Fig. 12). In Fig. 12, the high-density regions are estimated according to the method of Hyndman (1996) who defines the regions as follows. For a density function f (x) of a random variable X, the 100(1−α)% high density region is the subset Selecting regions in this way enables the detection of the bimodality of the distribution even in higher-dimensional space.
and sub-mesoscales relate to the turbulence intensity and the local stability. The selected nights support our subsequent statistical analyses with typical case studies of multiple-scale interactions in nocturnal boundary-layer flows. The three nights are chosen from the whole dataset and are arranged according to decreasing mean wind speeds. For the rest of the nights (not shown), there is an irregular mixture of the dynamics present in the nights selected for presentation here (Figs. 3, 4, 5). Here, the height coordinate is normalized with the tower height z * = z/z tower and the time coordinate with the length of the night (15.38 h) t * = t/t night .
In the second step, we cover the performance of the classification method that was introduced in Sect. 3.3. Each height is classified individually. We compare the in-sample prediction of the clustering method with the observed T I values and reveal the statistical link between the velocity gradient on different scales and the turbulence response in terms of the model parameters.
To summarize the scale interactions, the TSI diagram is introduced. To develop a better understanding of this diagram, we consider regions marked by values of ζ and the shape factor H (see definition in Sect. 3.4). The point cloud of the TSI diagram is further investigated by the statistical-clustering method and by a threshold to the value of ζ . To relate these classifiers, Table 2 presents the stability of the detected regimes at each height. The intermittent class is characterized by an intense variability of ζ > 1 (see the interquartile range in Table 2), although the median stays below one.
To finalize the results, we present the impact of the statistical clustering on the stability function (see Sect. 4.4).

Time Evolution of Wind-Speed Profiles for Different Scales
We begin by reviewing the scale activity for a night (see Fig. 3) with a moderate mean wind speed of E(u ∞ ) = 5.7 m s −1 . Within the FLOSSII campaign, nights with u ∞ > 6 m s −1 unmistakably show a well developed logarithmic mean wind-speed profile together with a corresponding profile of the TKE. Moreover, such nights are close to neutral or weakly stable, with little relevance for our investigation. Similarly, a critical wind speed of 5-7 m s −1 above which turbulence is active in a classical way, such as for clear-sky conditions, was found by Van de Wiel et al. (2012).
The onset of intermittency begins already in weak stability, where the local stability starts to demonstrate weak spatio-temporal variations (see Fig. 3e, where t * ∈ (0.3, 0.5) and t * ∈ (0.8, 1.0)). The mean velocity profile throughout this night is well developed and appears relatively steady (see Fig. 3a). The scale fluctuations of the sub-mesoscale motions (see Fig.  3c) indicate an increase of relative magnitude at t * ∈ (0.1, 0.2) and t * ∈ (0.7, 0.9). For the time t * ∈ (0.7, 0.9), the fluctuation of sub-mesoscale band corresponds with the alternating colour bands in the turbulence intensity, while the jet-scale band is less pronounced. In contrast, in the range t * ∈ (0.1, 0.2) the jet scale is decreasing t * ∈ (0.1, 0.15) and then increasing t * ∈ (0.15, 0.2). At that time, the magnitude of T I responds to the temporal evolution of the jet rather than the sub-mesoscale (see Fig. 3d where t * ∈ (0.1, 0.2)). The jet and sub-mesoscale show no change with height either. In conclusion, this night is generally weakly stratified with insignificant deviations. The inferred scale interactions observed are more pronounced during the more stably stratified night. Figure 4 displays a night with an average mean wind speed of E(u ∞ ) = 2.7 m s −1 and a noticeable deviation in the wind-speed profile (see Fig. 4a), which is non-stationary and seems to denote a less turbulent one at the beginning t * ∈ (0.1, 0.3) and at the end of the The complex evolution of the boundary layer can be concluded from the ζ value (see Fig. 4e), which indicates a shallow boundary-layer height at t * ∈ (0.05, 0.1) followed by a sequence of fully to partially neutral states for the period t * ∈ (0.1, 0.3). An irregular spatial switching from neutral to strongly stable conditions is evident for t * ∈ (0.3, 0.5) and is decoupled from the ground. This dynamics of the stability parameter ζ is then accompanied by high-frequency noise for the whole range of stability values varying from unstable to stable (see t * ∈ (0.5, 0.9) in Fig. 4e). In the first half of the night t * ∈ (0.3, 0.5), stability ζ responds more on a larger scale, while in the second part of the night, it follows more the small-scale dynamics. The average relative magnitude (estimated from the figure) of the jet and sub-mesoscale (see Fig. 4b, c) is approximately 0.5 u ∞ . In the latter half of the night, the jet and sub-mesoscales are more noticeable than in the first half because of the lower mean scale energy. Sharp peaks in the turbulence intensity for t * ∈ (0.5, 0.6) and t * ∈ (0.7, 0.8) correspond to increased absolute values of jet and sub-mesoscales together with a relatively comparable shape of the mean wind-speed profile t * ∈ (0.4, 0.8).
The change of the jet and sub-mesoscales with respect to height becomes more apparent for the night with the lower mean wind speed (compare top in Fig. 4b, c with Fig. 5b, c). At both scales, the sporadic accelerations occupy significant portions of the boundary layer. Generally, the T I is intermittent in time and shows signs for pronounced dependence on height (see Fig. 5d; t * ∈ (0.3, 0.4)). As demonstrated next, the mean wind speeds E(u ∞ ) < 2.7 m s −1 should not be considered as the energetically large scale, but rather the jet scale.   Figure 5 displays a night with a wind speed E(u ∞ ) = 1.3 m s −1 . The variability of the submesoscales is of the order of 1 m s −1 . The profile of the mean wind speed is unsteady in time and vertical direction (see Fig. 5a). The value of T I is more sensitive to the sub-mesoscale band than to the jet scale as evident by comparing the magnitude of the alternating coloured bands (see Fig. 5c, d). While it is difficult to infer any structural information from the profiles due to their chaotic behaviour. Some large values in horizontal velocity components at the end of the night are notable t * = (0.9, 1.0), probably the result of the DWT calculations.

Performance of the Statistical Classification Model
Working with the clustering method enables the clustering of the data and provides a model for the quantity of interest, in our case, the TKE. It is interesting to know how the estimated model performs in predicting the training dataset. Unfortunately, predicting outside the training dataset is beyond the scope for this work as described in the discussion section.
The clustering methodology requires several parameters which can be determined a posteriori. The number of clusters is set to three to resolve weak, intermediate, and strong stability regimes (see Table 2). Test runs were performed with two and four clusters. Two clusters did not separate the right branch of the TSI diagram (see Fig. 9), and four clusters resolved the intermediate regime in two. Also, more clusters added complexity to the data analysis and did not yield any new information. Therefore, the number of clusters for the FLOSSII dataset is not statistically optimal.
The relative explained variance is represented with the coefficient of determination R 2 , which serves as the performance indicator for the model, and may be plotted against the persistency threshold to indicate an optimal value C opt (see Fig. 6) by its maximum. Figure 6a indicates that the most variance is explained close to the ground. Rescaling the graphs for each height shows that the optimal value of C is dependent on height, with the averaged value over all heights C = 103. The parameter C opt undergoes a spread if the minimization is repeated, because the global optimum is approached from different, random initial conditions. The variability in the value of C opt is of order 10 (± 10 regime jumps for 102 nights), and hence the location of the regime jumps in the affiliation function differ insignificantly between the different solutions. The balance between the number of minimization trials and the solution accuracy is found to be acceptable for this dataset.
The R 2 score is summarized in Table 3 for each height and regime, with the overall explained variance by the model ranging from 0.8 to 0.93. The model performs consistently in the weak and intermediate clusters with respect to the modelled height as evident in the steady decreasing value of R 2 with increasing height (see Table 3). This is because, in the weak stability regime, the turbulence is well described by the shear generation at the ground. In contrast, the strongly stable regime shows a lower score and no trend with respect to the height.  Figure 7 displays the modelled and measured T I for the intermittent night presented in Fig. 4. The large departure of the model is evident at z * = 0.4. The small-scale details at the top of the boundary layer are captured but do not extend deep enough towards the ground, mainly because the model does not account for spatial correlation. The time evolution of T I is plotted for z * = 1.9 (see Fig. 7c), illustrating that the model captures the main tendency but with lower variance. Figure 8 illustrates the statistical link between the TKE and the vertical velocity gradients of the jet and sub-mesoscales in terms of regression coefficients. The absolute values of the parameters cannot be compared across the regimes, because the averaged value of T I is different in them. However, a comparison between the coefficients of the forcing variables is justified within one cluster at one height. In the Fig. 8, the right column of panels shows the strongly stable regime (intermittent). At the 1-m height (see Fig. 8i), the memory effect (more time lags are important) is more significant than at 15 m (see Fig. 8f) and 30 m (see Fig. 8c). By comparing the colour-coded areas, one can see that the contribution of each scale to the TKE is approximately equal at the ground (see Fig. 8i). With increasing altitude, the impact of the gradients of the jet and sub-mesoscales on the TKE is reduced (see Fig. 8f, c). The expected behaviour can be seen in the detected weak stability regime (see Fig. 8a, d, g). Close to the ground (see Fig. 8g), the dominant factor is the gradient of the mean velocity scale with the smallest relevant lag time. Further away from the ground (see Fig. 8a), the jet scale and the mean is dominant but with a longer memory effect. The detected intermediate regime (see Fig. 8b, e, h) indicates a contribution from the jet scale at all three heights.

Turbulence-Submeso-Intensity Diagram
The TSI diagram illustrates the ratio between the energy of the sub-mesoscales and that of the turbulent scales, and is expressed relative to the mean flow, therefore sharing a common denominator (u ∞ ). By observing only the scatter plot, we may conclude that, at some point, the value of T I is correlated with S I (see Fig. 9a, where S I > 10 −1 ). This dependency is especially suspicious as the slope is almost one. On the one hand, we cannot trust the simple scatter plot due to possible self-correlation (Kim 1999) for S I > 10 −1 . On the other hand, there is a visible change at S I ≈ 10 −1 from no correlation to almost 1 : 1. To investigate the relationship between the values of T I and S I , we rely on the stability ζ threshold, clustering method and the shape factor H of the boundary layer, which groups the scatter plot in different ways based on stability, on Granger causality between TKE and the gradients of the filtered wind speed, and the type of the boundary-layer profile. With these indicators, the TSI diagram is more reliable. Before analyzing the results of the classification, we investigate the indexing of the TSI diagram by the stability and the shape factor H . Figure 9 displays the TSI diagram for three heights: 1, 15, and 30 m (column-wise from left to right), with each point colour-coded with stability (see Fig. 9a, b, c) and the shape factor (see Fig. 9d, e, f). First, observe that the data points are changing the outer shape, thereby indicating a height dependence. Second, the average turbulence level (T I = 0.11; see Fig. 9a, the left corner of the point cloud) within the weak stability condition (ζ < 0.5) is evident for the height of 1 m. The shape factor (see Fig.  9d) in that region has low values and is an additional confirmation that the boundary-layer profile is turbulent.
Periods of strong stability (ζ > 1) are observed more frequently (darker region) above ground level (see Fig. 9c) than at the surface (see Fig. 9a). With increasing height, the left corner of the point cloud (indicator for average T I ) decreases its value and blends with the centre. Detection of the average T I becomes difficult due to increased scatter at the left edge of the point cloud (see Fig. 9c). The shape factor indicates that the left corner of the cloud is mostly in a state of developed turbulence (see Fig. 9f).
The left corner of the TSI diagram is where the mean wind speed and TKE are high. According to the value of ζ , it is a region of neutral conditions (see Fig. 9a, b, c). We know that the upper right corner of the TSI diagram is a region with a low value of the mean wind speed and a low level of turbulence, but this is not obvious from the diagram because it displays the dimensionless TKE. The dimensional TKE (not scaled with the mean wind speed) is low in the upper right corner and high at the left corner of the TSI diagram. The lack of turbulence reduces the transport of momentum and therefore changes the shape of the mean wind-speed profile. Consequently, we expect to find a change in the shape factor, and this is indeed the case (see Fig. 9d, e, f), indicating a less turbulent profile at higher values of S I . Figure 10 displays the classification of the TSI diagram by constraining the value of ζ . The subcritical cluster range 0 < ζ < 1 is presented in the upper row and the supercritical cluster 1 < ζ < 10 in the bottom row for three heights of interest. Furthermore, the panels display the estimated p.d.f. for the constrained sets of points.
For the subcritical stability, the densest region shows a reduction of T I with height, as expected (see Fig. 10a, b, c). A barely noticeable decrease of the sub-mesoscale energy across the boundary-layer height is present (observe the shift of the densest region to the left). For the supercritical stability, the elliptical shape of the densest area changes in location and orientation in the TSI diagram (see Fig. 10d, e, f). The sub-mesoscale intensity decreases across the height in the supercritical regime more than in the subcritical. At the ground, the high-density regions for subcritical (see Fig. 10a) and supercritical (see Fig. 10d) regimes are separated, but away from the ground, they overlap (see Fig. 10c, f). Figure 11 displays the classification of the TSI diagram by the clustering method in weak, intermediate, and strong stability regimes for three heights: 1, 15 and 30 m. Both classifiers show almost an identical p.d.f. in the TSI diagram for the weak stability regime (compare Fig. 10a, b, c with Fig. 11g, d, a), which supported by the evaluated ζ value within the clustering method regimes (see Table 2). The strong stability regime in Fig. 11i, f, c shows more elongated density functions. Across the height, the shape is changed from an angled feather-like structure to an elliptical one. We suspect that, close to the ground, the submesoscale motions produce the turbulence in two different ways, with the first through the ground shear (note that the minimum TKE is not changed across regimes; see Fig. 11i) and the second one by a different mechanism, such as wave breaking or wind turning (see  Fig. 11c), but which is lower in the weak stability regime (see Fig. 11a, d, g). One way to relate a p.d.f in the TSI diagram classified with different methodologies is to investigate the high-density region, by comparing the strong stability regime detected with the clustering method to the regime selected with 1 < ζ < 10 for α = 0.85 (see Fig. 12). This thresholding value α means that each p.d.f. has a volume of 15% and is confined from below with a plane, below which every value is removed. The statistical-clustering method identifies a larger region in the TSI diagram than the ζ -classifier (compare Fig. 12a, b, c with Fig. 12d, e, f). The intermediate cluster (see Fig. 11h, e, b) is also a part of the 1 < ζ < 10 set. Nevertheless, the clustering based on the relationship between turbulence and velocity gradients depicts a more intense relationship between turbulence and sub-mesoscale motion in the TSI diagram. It is interesting to note that at high stability local ζ value can indicate  1 (a, b, c) and supercritical stability 1 < ζ < 10 (d, e, f). A p.d.f. is estimated and plotted for each set of points. Each column of panels represents a height a range of T I (see Fig. 9c), but on average it is highlighting only a reduced low value (see Fig. 12c). Surprisingly, the clustering method finds an additional state of turbulence intensity (compare Fig. 12f with Fig. 12c). At each height, the statistical classifier finds a more elongated relationship, where the ζ classifier shows more concentrated spots and cannot detect the increased T I values at higher stabilities, suggesting that the temporal dependence of the sub-mesoscale motions plays an essential role in the description of the intermittent turbulence. Figure 13 displays the flux-profile relationship φ m φ −1 w as in Grachev et al. (2018) for each cluster as this form of stability function is not affected by self-correlation. We recall that the cluster stabilities have different origins than in Monin-Obukhov similarity theory and are grouped based on the model affiliation function (t) that is determined by solving the minimization problem Eq. 14. Equation 10 prescribes the rule that is used to identify a cluster. For each regime, a p.d.f. is estimated, and the high-density region is constrained with α = 0.32. The three flux-profile relations show substantial overlap with increasing scatter. The strong stability regime (see Fig. 13c) reveals an outbreak towards the horizontal axis, which means more sustained turbulence. The maximum error (the distance between the black curve and the edge of the colour coded p.d.f.) relative to Monin-Obukhov similarity theory can be found in the intermittent cluster approximately at ζ ≈ 2. The error decreases for higher ζ . This observation is not sensitive to the value of the p.d.f. threshold α. Besides calculating the flux-profile relation with the gradient of the mean scale, we examined the flux-profile relation constructed with the gradients of the jet and sub-mesoscale, which reveal substantial scatter regardless of the classified regimes and, therefore, are omitted.

Discussion
Parametrization of SBL dynamics has recently received significant attention (Li et al. 2016;Baas et al. 2018), with the aim of an adequate representation of strong stability with characteristically intermittent turbulence encouraging the exploration of the physics governing the vSBL. Besides the dependence on the topography (Serafin et al. 2018), turbulence modelling in the SBL is complicated by the fact that strong stratification reduces the turbulence scale, thereby giving room for other processes to act. These processes produce additional turbulence compared with mean-shear driven turbulence. As a consequence, in the vSBL, turbulence is triggered at multiple scales and can be localized in space and time.

Fig. 13
Classified flux-profile relationship φ m φ −1 w with the clustering method. a Weak, b intermediate, and c strong stability regimes. A p.d.f is estimated and constrained to 68% of the total volume, and colour-coded according to the colour bar, with grey points corresponding to the remaining 32% volume of the p.d.f. The stability function is fitted based on the points within the constrained p.d.f. Note, ideally one would need a correction for the heat flux due to humidity state cannot be represented by models based on the assumption of turbulence in equilibrium. Nevertheless, missing TKE terms may be incorporated through additional forcing terms in the TKE equation, as suggested by He et al. (2019, see their Eq. A1.1). The nature of the turbulence created through multiscale perturbations is widely ranging, making it challenging to systematize the effect in a universal and deterministic way. Our intention is to understand how the statistical properties of turbulence correlate with those of the sub-mean scales to be able to develop unsteady models. Our study contributes to this goal by isolating and visualizing the sub-mean scales. Using a statistical clustering tool, we investigate the traditional relation between gradients and turbulence at multiple scales.
In Sect. 4.1, we applied nonlinear filtering to separate the mean scale instead of fitting a log profile (Nieuwstadt 1984). We used wavelets for scale separation, because they are flexible in their application and widely studied. Particularly interesting alternatives are the recursive filters used by Zurbenko and Smith (2017) or Falocchi et al. (2018Falocchi et al. ( , 2019. We find that, for the FLOSSII database, filtering the velocity on a scale of 1 h is not long enough to obtain a logarithmic profile for a height of 30 m, and a 3-h-averaging scale is required. In addition, we showed the activity of the sub-mean scales and spatio-temporal interaction with the TKE and stability. For a given observational height of 30 m, there should exist a temporal averaging scale that is long enough to obtain the log profile for the mean wind speed. The fluctuation of the mean wind profile that does not follow a logarithmic shape should be considered as a deviation from the similarity theory. The choice of sub-mean scales is subjective and serves here to unfold the scales acting on turbulence in stable conditions. We investigated the scaledependent profiles for 102 nights and observed that the intermittent turbulence is associated with decoupled boundary layers (Lan et al. 2018). Furthermore, top-down intermittency ) is related at the sub-mesoscale.
The statistical model can reproduce the value of T I (see Fig. 7) within the dataset. While accounting for spatial correlation in the model would improve it further, using this model to forecast TKE in stable conditions is complicated for several reasons. The model needs the gradients of the sub-mean scales and the affiliation function, but these quantities are not given in a forecast by traditional models, and it is not clear how to parametrize the regime-affiliation function obtained here through the clustering procedure. For example, regime-switching may be modelled by a Markov chain, but we did not observe enough statistical regularity to derive such a parametrization. Further studies, which take the Markov chain approach into account, will need to be undertaken. Furthermore, the number of parameters has to be reduced (in this study we have 42 parameters). Moreover, the trained model is not transferable to another dataset for several reasons. The affiliation function, which tells us what set of model parameters is active at the current time, is inferred from the data. To make a prediction, as discussed above, one needs to model it. If one considers modelling the regime jumps stochastically without any dependence on external parameters, then it is expected that the averaged performance over the whole dataset may be acceptable. However, locally on the time scale of several hours, the model may predict a strong stability regime in the presence of a high geostrophic wind speeds, therefore making the prediction worse. Advance modelling of the (t) function (the regime affiliation) is needed to get any reasonable results. One can, for example, consider the regime jumps to be conditional on the geostrophic wind speed or stratification. Besides these difficulties, we used a rescaling of the TKE with the standard deviation to enhance regime separation. During the prediction, even within the same dataset, this scaling factor is not known.
In Sect. 4.3, we introduced the TSI diagram and showed how the sub-mesoscales commute with the turbulence intensity T I using the ζ classifier and the statistical clustering. Major differences between these classifiers are found in the very stable regime (ζ > 1). The clus-tering methodology isolates states in the TSI diagram with the interscale relation because the clustering is based on Granger causality between the vertical gradients and TKE. We have analyzed the probability density in the TSI diagram of the strong stability regime for seven available heights (not shown). The gradient-based classification finds a bimodal distribution (see Fig. 12 d, e, f), but the bimodality is weak, and to separate the modes, one needs to select the high-density region that contains 1% of the total p.d.f. volume. The results demonstrate how the two modes change their location with respect to height. We find a mode corresponding to high T I and S I values, associated with a state transition to a higher TKE level due to sub-mesoscale motions. The TSI diagram shows the relative importance of sub-mesoscale motions in the production of turbulence concerning stability and height (see Fig. 11). Indeed, it is not known if the footprint of the energy distribution between the sub-mesoscale motions and the TKE would be different at another site. The footprint also cannot be investigated with the stability parameter ζ to a full extent, because it does not account for the temporal dependence between sub-mesoscale motions and the TKE as demonstrated in Fig. 12.
The scatter in the flux-profile relationship is reduced by removing the intermittent cluster. Mahrt (2007) came to a similar conclusion when the nonstationarity was isolated based on the logarithm of the gradient Richardson number. The scatter in the intermittent regime is significant and covers a wide range of function values. This isolated regime can be modelled stochastically by dividing the flux-profile relationship into two states with different parametrizations of the stability function. Luhar et al. (2009) investigated the performance of a modified stability function, and implemented a discontinuous transition from the weak to the strong stability regimes. As a result, the turbulence model was able to predict the frequency of the low wind speed correctly.
For stochastic turbulence modelling, one could think of a two-state stability function, and an implementation based on the stochastically-perturbed parameter schemes (Hacker et al. 2011) which mimic the spread in the flux-profile relation. The regime at strong stability would be described by providing a distribution. For example, an asymmetric Gaussian distribution can be fitted to the flux-profile relationship in the intermittent regime (see Fig. 13c). Then, during a simulation of the strong stability regime, the value of φ m (ζ ) would be conditionally sampled based on the stability parameter ζ , where the regime of weak stability follows a deterministic description (see Fig. 13a).
Here, we presented an application of a methodology with which the classification and modelling are done simultaneously. Improving the model structure compared with the earlier approach of Vercauteren and Klein (2015) improves both aspects: clustering and prediction, thereby resulting in a better understanding of the vSBL. The approach can be extended to enable the testing of different model structures. Finding a simpler functional form for parametrizing the intermittent regime and embedding it into the clustering framework is the subject of future work.
The findings on the interaction between turbulence and sub-mesoscale motions are summarized in a concluding sketch of the TSI diagram (see Fig. 14). The shear at the ground is not the only source of turbulence as other scales induce additional turbulence on top of the mean level (see Fig. 14a), but the situation changes with increasing height. The bulk mean wind speed is small; consequently, the gradient of the mean wind speed above ground level is close to zero, resulting in no mechanical production of turbulence. Besides, the stratification is damping the turbulence. Hence, we find a less obvious overlap of the intermittent (red) region with the non-intermittent one (see Fig. 14b). As a result, the turbulence is driven more by the sub-mesoscale than by the gradient of the mean wind speed. The value of SI exceeds the critical value of one more often than the value of T I exceeds its critical value of one. Accordingly, to induce intermittency, the ratio between the mean wind speed and the sub-(b) (a) Fig. 14 A sketch of a TSI diagram illustrating the intermittent turbulence states in the FLOSS2 dataset. The weak and strong stability regimes are marked with blue and red, respectively, T I ave is the averaged value of T I ; T I crit = 1 if the energy of the mean wind speed is equivalent to the TKE and S I crit = 1 if the energy of the mean wind speed is equal the energy of the sub-mesoscale motions mesoscale energy should be almost one at the height of 1 m, before reducing further away from the ground.

Conclusion
Intermittent turbulence in the SBL is a common component that is not well represented in models. The unresolved flow structures contribute to the infrequent triggering of turbulence, causing intermittent bursts of different spatio-temporal scales in the vSBL.
As illustrated by the wavelet filtering, the time variability of the turbulence field develops with decreasing wind speed and becomes closely associated with the sub-mean scale dynamics. The patchiness of the turbulence is pronounced at low wind speeds. For sub-critical stability, the turbulence shows a weak dependence on the sub-mesoscale flow, but for supercritical stability, the sub-mesoscale intensity becomes an essential source of turbulence. The introduced jet scale is found to show patterns of generation and reduction of the turbulence due to the induced shear close to the ground.
For the strong stability regime, the vertical gradients from the sub-mean scales contribute to the TKE in the same amount as the mean scale, whereby above ground level, the contribution from vertical gradients is less. Furthermore, closer to the ground, memory effects are more relevant than away from it. A higher lag time in the discrete model is proportional to a higher-order derivative in the time and indirectly indicates a need for higher-order turbulence closure. However, in the FLOSSII dataset for strong stability, the memory of the vertical gradient is essential at the height of 1 m, but starts to be less important above 15 m.
The TSI diagram classified by the turbulence-scale gradient relation provides evidence that in strong stratification, with sub-mesoscale energy > 10% of the total mean kinetic energy, an intermittent regime appears. The performance of the statistical model demonstrates that it is possible to reproduce the observed value of T I using the vertical gradients of the sub-mean velocity scales within a dataset. Future work will develop a stochastic approach to model the unsteady energy injections based on the data analyses.