A tutorial on reproducing a predefined autocovariance function through AR models: application to stationary homogeneous isotropic turbulence

Sequential methods for synthetic realisation of random processes have a number of advantages compared with spectral methods. In this article, the determination of optimal autoregressive (AR) models for reproducing a predefined target autocovariance function of a random process is addressed. To this end, a novel formulation of the problem is developed. This formulation is linear and generalises the well-known Yule-Walker (Y-W) equations and a recent approach based on restricted AR models (Krenk-Møller approach, K-M). Two main features characterise the introduced formulation: (i) flexibility in the choice of the autocovariance equations employed in the model determination, and (ii) flexibility in the definition of the AR model scheme. Both features were exploited by a genetic algorithm to obtain optimal AR models for the particular case of synthetic generation of homogeneous stationary isotropic turbulence time series. The obtained models improved on those obtained with the Y-W and K-M approaches for the same model parsimony, in terms of the global fitting of the target autocovariance function. Implications for the reproduced spectra are also discussed. The formulation for the multivariate case is also presented, highlighting the causes behind some computational bottlenecks.


Introduction
The synthetic realisation of a random process (or, simply, the numerical generation of a random process) refers to the computational generation of time/space series that accurately simulate the random behaviour of a dynamical system. In this context, accurately means that the obtained time/space series reproduce sufficiently well a number of statistical features defined beforehand, like time/space covariance and cross spectra.
While many of the aforementioned real-life processes can only be rigorously represented through a non-stationary and/or non-homogeneous random process, stationarity and homogeneity are usually assumed in the modelling for convenience. In some cases, the obtained algorithms have served as a basis for the development of strategies oriented to non-stationary random processes, as in the case of evolutionary spectra; see the seminal paper [Deodatis and Shinozuka, 1988].
There are different types of numerical generation approaches. Although there is no agreed classification based on common names for the different models, see [Kleinhans et al., 2009, Liu et al., 2019], two families, referred to as spectral and sequential methods, are usually identified. Spectral methods are based on strategies like harmonic superposition and the inverse Fast Fourier Transform (FFT) [Shinozuka and Deodatis, 1991, Shinozuka and Deodatis, 1996]. These methods require information regarding the spectral characterisation of the random process as an input, for example, in the form of predefined target cross power spectral density (CPSD) functions, coherence functions, etc. Some limitations of spectral methods are related to the fact that the process realisation needs to be synthesised in the whole time/space domain at once, which translates into high computational requirements for long-duration/long-distance multivariate and/or multi-dimensional processes [Kareem, 2008].
Sequential methods, also referred to as digital filters, are usually based on time series linear models, such as autoregressive (AR) and Moving Average (MA) models, or a combination of both (ARMA), and their multivariate versions (VAR, VMA and VARMA, respectively). Compared to spectral methods, sequential methods are less intensive in computational requirements during the synthesis, as only the model coefficients need to be stored. In addition, synthesis is a sequential process that can be stopped at a desired length of the time series and restarted later to lengthen the simulation. However, the determination of the model coefficients may demand high computational memory for multi-dimensional and/or multivariate problems. Model coefficients can be derived from a predefined target autocovariance function defined in time and/or space, though some approaches introduce spectral information as input as well, see [Spanos and Hansen, 1981, Spanos, 1983].
Another advantage of sequential methods is that AR, MA, and ARMA models have theoretical expressions for their autocovariance and PSD functions that can be computed directly from the model coefficients. This fact allows a direct comparison between the target autocovariance function and the model's theoretical autocovariance function to assess the accuracy of the model in reproducing the desired statistical information, while this comparison for spectral methods is based on sample functions estimated from a finite number of finite-length realisations, which are subject to smearing and leakage effects [Stoica and Moses, 2005]. It is remarked that the statistical bias affecting the sample autocovariance and PSD functions estimated from time series has to be taken into account when generating synthetic time series, regardless of the nature of the method employed (spectral/sequential), in order to properly set the parameters of the simulations, like the time series length. [Dimitriadis and Koutsoyiannis, 2015] provides expressions for the bias of estimators of both the PSD and the autocovariance function, and discusses the advantages of the climacogram as an alternative statistical object for characterising random processes.
It is noted that, in the context of stationary random processes, both spectral and sequential methods can be applied regardless of the form of the input information, since the power spectrum and the autocovariance function are Fourier pairs; thus, they are two forms of providing the same statistical information. However, going from a PSD to an autocovariance function (and vice versa), except for particular cases that admit a theoretical formulation, requires a numerical implementation of the corresponding Fourier transform. For this reason, spectral methods are usually applied to problems where the input information is provided in the frequency domain, while time domain descriptions of a random process represent natural inputs for sequential methods.
This tutorial is focused on the determination of AR models that optimally reproduce a predefined target autocovariance function. Indeed, optimal is a notion that needs to be clearly defined, as will be discussed in Section 5. To frame the problem, consider a one-dimensional univariate discrete random process, {z_t(α)}, where t ∈ N is a time index and α ∈ N is the index of the realisation. Thus, z_{t_i}(α) is a random variable associated with time index t_i, and {z_t(α_j)} is a time series corresponding to the α_j-th realisation of the random process. To simplify notation, the realisation index α will be omitted, so that {z_t} will be used to refer to a time series for a generic realisation α, and z_t to refer to the random variable associated with time t.
The random process is assumed to be Gaussian, stationary, and zero-mean. In addition, the existence of the integral time scale (i.e. the integral of the autocorrelation function) is assumed. Consequently, long-term persistence processes (Hurst phenomenon), characterised by an infinite integral scale, are excluded. The reason for this hypothesis is that this research considers stationary AR models, whose integral time scale always exists because the autocorrelation function decays exponentially. Long-memory processes can be handled through different model types, like Fractional Autoregressive-Moving Average (FARMA) models [Hip, 1994].
The general formulation of an AR model of order p, AR(p), is as follows:

    z_t = ϕ_1 z_{t−1} + ϕ_2 z_{t−2} + ... + ϕ_p z_{t−p} + σ ε_t,    (1)

where ϕ_i for i = 1, ..., p are the regression coefficients of the AR model, and σ ε_t represents the random term; ε_t is a sequence of independent and identically distributed (iid) random variables with zero mean and unit variance, and σ, here referred to as the noise coefficient, scales the variance of the random term, given by Var[σ ε_t] = σ². Thus, the AR(p) model has p + 1 parameters, comprising p regression coefficients and one noise coefficient. Figure 1 illustrates the different elements involved in the synthetic generation of a random process with an AR model (also valid for MA and ARMA models). γ^T_l is the target autocovariance function, which depends only on the time lag l under the assumption of stationarity (the formal definition of the autocovariance function is provided in Section 2). In some applications, γ^T_l is derived from theoretical models and admits a mathematical expression, but in real-life problems it usually has to be estimated from observations. Step (1) represents the determination of the AR coefficients from γ^T_l; this step requires a methodology and a choice for the model order, p.
γ^{AR}_l is the theoretical autocovariance function, and it can be computed directly from the AR model coefficients, step (2) in the figure. Step (3) represents the synthesis of the random process through the AR model. By using a sequence of random values as inputs, {ε_t}, the AR model can be employed to generate realisations of the process in the form of time series. γ^α_l denotes the sample autocovariance function computed for realisation α. Averaging N sample autocovariance functions yields the ensemble autocovariance function, γ^E_l, which converges to γ^{AR}_l when N → ∞. Thus, the objective in step (1) is to define a methodology that yields an AR model with a theoretical autocovariance function, γ^{AR}_l, that optimally reproduces the target, γ^T_l; this can be verified without the need for generating a large number of realisations N to compute γ^E_l. Regarding the methodologies to obtain the AR(p) model parameters, the vast majority of works make use of the Yule-Walker (Y-W) equations [Spanos and Hansen, 1981, Spanos, 1983, Reed and Scanlan, 1983, Samaras et al., 1985], which establish relationships between the p + 1 AR parameters and the first p + 1 autocovariance terms, from l = 0 to l = p. These relationships arise simply from applying the definition of the autocovariance function for time lags from 0 to p. In the context of this work, these expressions are referred to as autocovariance equations for time lags from 0 to p. This approach leads to a perfect match of the first p + 1 terms of the target autocovariance function. Consequently, under this approach, there exists no means to improve the matching between the target and the theoretical AR autocovariance functions for lags larger than the model order, p. This fact is problematic for processes with high inertia (i.e., large integral time scale) or situations in which small sampling times of the time series are employed, because large values of the model order p are required to guarantee the matching of the target autocovariance function in a sufficiently wide range of time lags. But large p values reduce model parsimony and increase the computational cost of determining the model coefficients [Spanos, 1983, Dimitriadis and Koutsoyiannis, 2018]. Model parsimony refers to achieving a certain model performance with the lowest number of model parameters, and is considered a key feature in time series modelling [Box et al., 2016]. A proposal to preserve model parsimony is to use ARMA models [Gersch and Liu, 1976, Samaras et al., 1985]. Several methodologies have been developed, typically based on multistage approaches that build the ARMA model by combining a previously defined AR(p) model with an MA(q) component. However, the potential of these approaches may be limited because MA processes show nonzero autocovariance only in the first q + 1 terms, see for example [Madsen, 2007]. Thus, improving the matching between the target and the theoretical autocovariance functions for large time lags could be conditioned on considering high q values. In this article, a proposal for overcoming this limitation is introduced in Section 2.2. It consists of including autocovariance equations for lags larger than p in the procedure.
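As an aside, the synthesis step (3) described above amounts to a simple recursion. The following Python sketch (assuming NumPy; the coefficient values are illustrative) generates one realisation {z_t} of an AR(p) model; an initial transient is discarded so that the retained series is approximately stationary.

```python
import numpy as np

def simulate_ar(phi, sigma, n, burn_in=500, seed=0):
    """Step (3): sequentially synthesise one realisation {z_t} of the AR(p)
    model z_t = phi_1 z_{t-1} + ... + phi_p z_{t-p} + sigma * eps_t."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    z = np.zeros(n + burn_in)
    for t in range(p, n + burn_in):
        # Regression part plus scaled iid Gaussian noise.
        z[t] = sum(ph * z[t - i] for i, ph in enumerate(phi, start=1)) \
               + sigma * rng.standard_normal()
    return z[burn_in:]  # discard the initial transient

# Illustrative AR(2) model: phi1 = 1.2, phi2 = -0.3, sigma = 0.5.
z = simulate_ar([1.2, -0.3], 0.5, 10_000)
```

Because synthesis is sequential, the loop can be stopped at any length and resumed later, which is precisely the practical advantage of sequential methods noted above.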
In a recent paper [Krenk and Møller, 2019], an interesting proposal based on AR models with regression coefficients only at certain time lags was introduced for the synthetic generation of turbulent wind fields. The theoretical formulation is provided for a generic sequence of lags, and the simulations are performed for AR models with an exponential scheme (regression coefficients only for time lags 2^k, k = 0, 1, 2, ...). In econometrics, such models are usually referred to as restricted AR models [Giannini and Mosconi, 1987], because they can be seen as a particular case of an AR(p) model for which some of the regression coefficients are imposed to be zero. Thus, the actual number of regression coefficients is lower than the AR order, and the capacity to match exactly the first p + 1 terms of the target autocovariance function is lost. However, the trade-off between model parsimony and the reproducibility of the target autocovariance function over a wide range of lags was improved. In our opinion, one limitation of that work is that the exponential scheme of the model was assumed to be reasonably good for the considered application, and no further discussion is provided concerning the impact of different model schemes on the results. In what follows, the methodology presented in [Krenk and Møller, 2019] will be referred to as the K-M approach.
Within this context, this article introduces a general formulation to determine the parameters of a restricted AR model from a predefined target autocovariance function. Under this general framework, it will be shown that both the Y-W approach and the K-M approach can be seen as particular cases of the presented formulation. The described approach requires a reduced number of input parameters related to the model scheme and the employed autocovariance equations. An optimisation procedure based on genetic algorithms is applied to obtain AR models that reproduce the target autocovariance function more accurately than the Y-W and K-M approaches for the same model parsimony. The main ideas contained in this tutorial and its research contributions are as follows:
• The Yule-Walker approach to obtain an AR(p) model from a target autocovariance function is described, emphasising the classical result that the first p + 1 terms of the target autocovariance function are matched perfectly as a consequence of selecting the autocovariance equations for time lags from 0 to p.
• We show that using autocovariance equations for time lags larger than p may improve the matching of the target autocovariance function for lags beyond the model order.
• The potential of restricted AR models for improving the matching of a target autocovariance function is revisited by considering the K-M approach introduced in [Krenk and Møller, 2019].
• We introduce a general formulation for the determination of the AR parameters from a target autocovariance function. The formulation is general in the sense that it provides flexibility in the choice of the autocovariance equations and in the definition of the AR model scheme.
• The introduced formulation is exploited by a genetic algorithm to obtain optimal AR models without a predefined model scheme. The considered application is based on a stationary, homogeneous, and isotropic (SHI) turbulence model. Results are compared to those obtained with the Y-W approach and the K-M approach.
• The introduced general formulation is extended to the multivariate case. This leads to some computational bottlenecks that are highlighted.
The article is organised as follows. Section 2 describes the relationships between the parameters of an AR model and its theoretical autocovariance function, emphasising the impact of the selected autocovariance equations on the matching of the target autocovariance function. The case of restricted AR models is addressed in Section 3, where the focus is placed on the role of the model scheme. The general formulation for the determination of a restricted AR model from a predefined target autocovariance function is introduced in Section 4. Section 5 contains the optimisation exercise based on genetic algorithms. The generalisation of the problem to the multivariate case is briefly described in Section 6. The paper ends with the main conclusions gathered in Section 7. The article includes a number of examples and reflections to facilitate comprehension.

The autocovariance equations of an AR model
In time series analysis, the covariance between two random variables z_{t_1} and z_{t_2} is usually denoted by γ_{t_1,t_2}. Under the assumption of stationarity, the autocovariance depends only on the time lag, l = t_1 − t_2, and is referred to as the autocovariance function:

    γ_l = Cov[z_t, z_{t−l}] = E[z_t z_{t−l}],    (2)

where the last equality holds because the process is zero-mean. Note that the autocovariance function is symmetric, γ_{−l} = γ_l. Note also that, since the random term in (1) is independent, there is no dependency between the random term at time t, σ ε_t, and previous values of the process, z_{t−l} for l > 0. Indeed, the following expression can be demonstrated [Madsen, 2007]:

    E[σ ε_t z_{t−l}] = 0 for l > 0,    E[σ ε_t z_t] = σ².    (3)

Given these considerations, Equation (2) together with (1) and (3) provides a means to generate analytical expressions that relate the autocovariance function for different time lags and the AR model parameters. As mentioned above, these expressions are here referred to as autocovariance equations. The autocovariance equation for lag l = 0 is:

    γ_0 = ϕ_1 γ_{−1} + ϕ_2 γ_{−2} + ... + ϕ_p γ_{−p} + σ².    (4)

The autocovariance equation for a generic positive time lag l is:

    γ_l = ϕ_1 γ_{l−1} + ϕ_2 γ_{l−2} + ... + ϕ_p γ_{l−p},  l > 0.    (5)

Note that Equation (4) is the only one among all autocovariance equations that includes the noise coefficient, σ. Actually, this equation defines the relationship between the variance of the AR process, γ_0, and the variance of the random term, σ².
Equations (4) and (5) are the basis for computing the theoretical autocovariance function of a given AR model, addressed in Section 2.1, and for obtaining the AR model parameters from a predefined target autocovariance function, see Section 2.2.

Computing the theoretical autocovariance function of an AR model
The objective of this section is to compute the first n + 1 terms of the theoretical autocovariance function of an AR(p) model, γ^{AR}_0, γ^{AR}_1, ..., γ^{AR}_n, assuming that the AR parameters, ϕ_i for i = 1, ..., p and σ, are known.
Without loss of generality, the case of an AR(2) model is considered. The following expression represents the autocovariance equations for lags from l = 0 to l = n, see Equations (4) and (5), where the autocovariance terms are the unknowns:

    l = 0:  γ_0 = ϕ_1 γ_{−1} + ϕ_2 γ_{−2} + σ²,
    l = 1:  γ_1 = ϕ_1 γ_0 + ϕ_2 γ_{−1},
    l = 2:  γ_2 = ϕ_1 γ_1 + ϕ_2 γ_0,
    ...
    l = n:  γ_n = ϕ_1 γ_{n−1} + ϕ_2 γ_{n−2},    (6)

which can be written as a matrix equation with the autocovariance terms gathered into the unknown vector.
Equation (6) represents a linear system of n + 1 equations with n + 3 unknowns. However, by applying the symmetry of the autocovariance function, γ_{−l} = γ_l, it is possible to remove the terms γ_{−1} and γ_{−2} from the unknown vector. This allows one to express the system of equations (6) with as many equations as unknowns:

    l = 0:  γ_0 − ϕ_1 γ_1 − ϕ_2 γ_2 = σ²,
    l = 1:  −ϕ_1 γ_0 + (1 − ϕ_2) γ_1 = 0,
    l = 2:  −ϕ_2 γ_0 − ϕ_1 γ_1 + γ_2 = 0,
    ...
    l = n:  −ϕ_2 γ_{n−2} − ϕ_1 γ_{n−1} + γ_n = 0,    (7)

that is, A γ = b, with γ = [γ_0, γ_1, ..., γ_n]^T and b = [σ², 0, ..., 0]^T. From that, the following expression for the theoretical autocovariance function of the AR(2) model is readily obtained:

    γ^{AR} = A^{−1} b.    (8)

Equation (8) can be employed to obtain an arbitrary number n of terms of γ^{AR}_l.
Increasing n comes at the expense of increasing the dimension of the matrix to be inverted and, thus, the computational memory requirements. An alternative approach that alleviates the computational effort consists of solving the subsystem given by the first p + 1 autocovariance equations to obtain γ^{AR}_0, ..., γ^{AR}_p, and then computing γ^{AR}_l recursively for l > p through Equation (5). The drawback of this approach is that the accumulation of rounding errors may lead to inaccurate estimations of γ^{AR}_l for large l values. As an example, Figure 2 shows γ^{AR}_l for an AR(2) model given by ϕ_1 = 1.2, ϕ_2 = −0.3 and σ = 0.5, computed for lags up to l = 20.
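The alternative approach just described can be sketched in a few lines of Python (assuming NumPy). The function below first solves the (p + 1) × (p + 1) subsystem for γ^{AR}_0, ..., γ^{AR}_p, with the symmetry γ_{−l} = γ_l already applied, and then extends the sequence recursively with Equation (5); the AR(2) values are those of the Figure 2 example.

```python
import numpy as np

def ar_autocovariance(phi, sigma, n):
    """Theoretical autocovariance gamma_0 .. gamma_n of a stationary AR(p)
    model: solve the first p+1 autocovariance equations, then recurse."""
    p = len(phi)
    # Build the (p+1)x(p+1) system A @ gamma = b, symmetry applied:
    # row l reads  gamma_l - sum_i phi_i * gamma_{|l-i|} = (sigma^2 if l == 0 else 0).
    A = np.zeros((p + 1, p + 1))
    for l in range(p + 1):
        A[l, l] += 1.0
        for i, ph in enumerate(phi, start=1):
            A[l, abs(l - i)] -= ph
    b = np.zeros(p + 1)
    b[0] = sigma**2
    gamma = list(np.linalg.solve(A, b))
    # Recursive extension for lags l > p, Equation (5).
    for l in range(p + 1, n + 1):
        gamma.append(sum(ph * gamma[l - i] for i, ph in enumerate(phi, start=1)))
    return np.array(gamma)

# AR(2) example from the text: phi1 = 1.2, phi2 = -0.3, sigma = 0.5.
g = ar_autocovariance([1.2, -0.3], 0.5, 20)
```

For this model the subsystem gives γ^{AR}_0 = 13/7 ≈ 1.857, and every further term satisfies the recursion γ^{AR}_l = 1.2 γ^{AR}_{l−1} − 0.3 γ^{AR}_{l−2}.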

Computing the AR model parameters from a predefined target autocovariance function
The objective of this section is to compute the parameters of an AR model from a predefined target autocovariance function, γ^T_l, assuming that the target is available for any time lag l. It will be shown that only a limited number of values of γ^T_l are required, depending on the employed autocovariance equations. The p + 1 model parameters, ϕ_1, ϕ_2, ..., ϕ_p and σ, are computed from p + 1 autocovariance equations. The traditional approach to this problem considers the autocovariance equations for lags l = 0, 1, ..., p, which leads to the Yule-Walker (Y-W) equations. For this reason, this approach is here referred to as the Y-W approach. For illustrative purposes, consider the case of an AR(3) model; note that the symmetry of the autocovariance function has already been applied:

    l = 0:  γ_0 = ϕ_1 γ_1 + ϕ_2 γ_2 + ϕ_3 γ_3 + σ²,
    l = 1:  γ_1 = ϕ_1 γ_0 + ϕ_2 γ_1 + ϕ_3 γ_2,
    l = 2:  γ_2 = ϕ_1 γ_1 + ϕ_2 γ_0 + ϕ_3 γ_1,
    l = 3:  γ_3 = ϕ_1 γ_2 + ϕ_2 γ_1 + ϕ_3 γ_0.    (10)

Equation (10) can be written in matrix form as:

    [γ_0, γ_1, γ_2, γ_3]^T = M [σ², ϕ_1, ϕ_2, ϕ_3]^T,  with  M = [ 1  γ_1  γ_2  γ_3
                                                                    0  γ_0  γ_1  γ_2
                                                                    0  γ_1  γ_0  γ_1
                                                                    0  γ_2  γ_1  γ_0 ].    (11)

By replacing in (11) the autocovariance terms γ_i by the corresponding target values, γ^T_i, the four AR model parameters are obtained from γ^T_0, ..., γ^T_3:

    [σ², ϕ_1, ϕ_2, ϕ_3]^T = (M^T)^{−1} [γ^T_0, γ^T_1, γ^T_2, γ^T_3]^T,  with M evaluated at the target values.    (12)

A key consequence of employing the autocovariance equations for lags l = 0, ..., 3 is that the number of model parameters and the number of required γ^T_l values is the same, which leads to an AR(3) model with a theoretical autocovariance function that matches exactly the employed target values. This conclusion extends to an AR(p) and the first p + 1 terms of the target autocovariance function.
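The Y-W solve can be illustrated with a short Python sketch (assuming NumPy). A hypothetical exponentially decaying target is used here, γ^T_l = e^{−0.3 l}, not the Appendix B target; since this happens to be exactly the autocovariance of an AR(1) process, the Y-W approach should return ϕ_1 = e^{−0.3} and ϕ_2 = ϕ_3 = 0.

```python
import numpy as np

# Hypothetical target autocovariance (illustrative, not the Appendix B one).
gt = np.exp(-0.3 * np.arange(6))

# Matrix of the (12)-type system for an AR(3) model;
# unknown vector is [sigma^2, phi1, phi2, phi3].
M = np.array([
    [1.0, gt[1], gt[2], gt[3]],   # l = 0
    [0.0, gt[0], gt[1], gt[2]],   # l = 1
    [0.0, gt[1], gt[0], gt[1]],   # l = 2
    [0.0, gt[2], gt[1], gt[0]],   # l = 3
])
sigma2, phi1, phi2, phi3 = np.linalg.solve(M, gt[:4])
```

Here the higher-order coefficients vanish and σ² = 1 − e^{−0.6}, confirming that the Y-W approach reproduces the first p + 1 target terms exactly.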
As an example, an AR(3) model, Equation (13), has been obtained for the target autocovariance function described in Appendix B. The theoretical autocovariance function of the obtained AR model, γ^{AR}_τ, and the target autocovariance function, γ^T_τ, are shown in Figure 3. The four target values employed during the model determination have been highlighted. Note that the theoretical autocovariance function of the AR(3) model matches exactly the employed target autocovariance values, but increasing differences with the target are observed for larger lags.
Note also the following comments: (i) The determination of the AR model involves a matrix inversion. Care should be taken with issues related to ill-conditioned matrices that may arise for some arbitrarily defined target autocovariance functions. For example, the target values γ^T_0 = 1, γ^T_1 = 0.5 and γ^T_2 = −0.5 lead to a non-invertible matrix in (12), regardless of the value of γ^T_3. (ii) While the autocovariance function of the obtained AR model reproduces γ^T_l exactly for l = 0, ..., p, no constraints have been imposed on the autocovariance function for larger time lags l > p. This implies that there is no means to improve the matching between γ^{AR}_l and γ^T_l for time lags larger than p.
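The non-invertibility claim in comment (i) is easy to check numerically; the sketch below (assuming NumPy) evaluates the determinant of the (12)-type matrix for several arbitrary values of γ^T_3.

```python
import numpy as np

# Target values from the text that make the matrix in (12) singular.
g0, g1, g2 = 1.0, 0.5, -0.5

dets = []
for g3 in (0.0, 0.3, -0.8):   # the value of gamma_T_3 is irrelevant
    M = np.array([
        [1.0, g1, g2, g3],
        [0.0, g0, g1, g2],
        [0.0, g1, g0, g1],
        [0.0, g2, g1, g0],
    ])
    # Expanding det(M) along the first column shows it equals the
    # determinant of the lower-right 3x3 block, which vanishes here.
    dets.append(np.linalg.det(M))
```

Because γ^T_3 only enters the first row, whose column is eliminated in the cofactor expansion, the determinant is zero for any γ^T_3, as stated in the text.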
The last comment leads to an important question: is it possible to introduce information on γ^T_l for time lags larger than the AR model order into the determination of the model parameters? If so, that would provide a means to obtain AR(p) models for which there exists some control over γ^{AR}_l for l > p, potentially improving the trade-off between model parsimony and the matching between the target and theoretical autocovariance functions, compared to the Y-W approach. To address this question, autocovariance equations for time lags larger than p could be considered. Let us define vector l = [l_1, l_2, ..., l_N] with the N positive lags corresponding to the autocovariance equations employed in the model determination. By default, the autocovariance equation for l = 0 is always included, as it is required to determine the noise parameter, σ; for this reason, only positive lags are specified in vector l. Note also that N must be equal to the number of model regression coefficients, p. As an example, consider the set of autocovariance equations obtained with l = [1, 2, 5] for an AR(3) model:

    l = 0:  γ_0 = ϕ_1 γ_1 + ϕ_2 γ_2 + ϕ_3 γ_3 + σ²,
    l = 1:  γ_1 = ϕ_1 γ_0 + ϕ_2 γ_1 + ϕ_3 γ_2,
    l = 2:  γ_2 = ϕ_1 γ_1 + ϕ_2 γ_0 + ϕ_3 γ_1,
    l = 5:  γ_5 = ϕ_1 γ_4 + ϕ_2 γ_3 + ϕ_3 γ_2.    (14)

Now, the AR model parameters are given by:

    [σ², ϕ_1, ϕ_2, ϕ_3]^T = [ 1  γ^T_1  γ^T_2  γ^T_3
                               0  γ^T_0  γ^T_1  γ^T_2
                               0  γ^T_1  γ^T_0  γ^T_1
                               0  γ^T_4  γ^T_3  γ^T_2 ]^{−1} [γ^T_0, γ^T_1, γ^T_2, γ^T_5]^T.    (15)

Equation (15) reveals that, in this case, the four AR model parameters are computed from six terms of the target autocovariance function, from γ^T_0 to γ^T_5. For the particular case of the target autocovariance function described in Appendix B, the obtained AR(3) model, Equation (16), differs notably from the model obtained with the Y-W approach, see (13).
Figure 4 shows the target and the theoretical autocovariance functions, γ^T_l and γ^{AR}_l, respectively. The six target values employed during the model determination have been highlighted. It can be seen that γ^{AR}_l does not match γ^T_l exactly at any time lag. However, a visual comparison with Figure 3 reveals that the global matching is improved. From this analysis, it can be concluded that, in the determination of the parameters of an AR(p) model, using autocovariance equations for lags larger than p leads to a number of required target terms higher than the number of model parameters. Since the autocovariance equations represent constraints between the AR model parameters and certain terms of the autocovariance function, this inequality implies that, unlike in the Y-W approach, γ^{AR}_l does not exactly match the target values employed in the model determination, Equation (15); γ^{AR}_l only fulfils the constraints determined by the selected autocovariance equations. However, given a particular target γ^T_l and a considered AR model order p, optimal decisions on the selected autocovariance equations (defined in vector l) may lead to AR models that reproduce γ^T_l globally better than the models obtained with the Y-W approach, as shown in the previous example. Thus, considering vector l as an input parameter in the determination of an AR(p) model introduces flexibility compared with the assumption l = [1, ..., p] that underlies the Y-W approach, and represents a path for improvement in reproducing a predefined target autocovariance function. To the authors' knowledge, this strategy has not been addressed previously in the literature.
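The effect of the lag-selection vector l can be sketched generically in Python (assuming NumPy). A hypothetical target that is not exactly an AR process is used, γ^T_l = (1 + 0.3 l) e^{−0.3 l}, so that different choices of l yield genuinely different AR(3) models.

```python
import numpy as np

# Hypothetical target (illustrative, not the Appendix B one).
lags = np.arange(6)
gt = (1 + 0.3 * lags) * np.exp(-0.3 * lags)

def solve_ar3(lag_vec):
    """Solve the autocovariance equations for lag 0 plus the lags in
    `lag_vec`; unknowns are [sigma^2, phi1, phi2, phi3] of an AR(3)."""
    rows = [[1.0, gt[1], gt[2], gt[3]]]          # l = 0 equation
    rhs = [gt[0]]
    for lg in lag_vec:
        # Equation (5) at lag lg, with symmetry gamma_{-l} = gamma_l applied.
        rows.append([0.0, gt[abs(lg - 1)], gt[abs(lg - 2)], gt[abs(lg - 3)]])
        rhs.append(gt[lg])
    return np.linalg.solve(np.array(rows), np.array(rhs))

x_yw = solve_ar3([1, 2, 3])    # Yule-Walker choice, l = [1, 2, 3]
x_alt = solve_ar3([1, 2, 5])   # alternative choice, l = [1, 2, 5]
```

Both systems are linear and of the same size; only the selected equations differ, and so do the resulting models, which is the flexibility the text refers to.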

The autocovariance equations of a restricted AR model
In a restricted AR model, not all regression coefficients from ϕ_1 to ϕ_p are considered. Let us define vector j = [j_1, j_2, ..., j_N], with N < p, containing the lags of the regression terms included in the model. The AR order is given by the regression term with the highest lag, p ≡ j_N. Note that N represents the number of regression coefficients of the model. To distinguish between restricted and unrestricted AR models, the following notation will be employed for restricted AR models:

    z_t = a_{j_1} z_{t−j_1} + a_{j_2} z_{t−j_2} + ... + a_{j_N} z_{t−j_N} + b ε_t,    (17)

where a_{j_i} are the regression coefficients and b is the noise coefficient. A restricted AR(p, j) model can be seen as a particular case of an AR(p) model for which:

    ϕ_{j_i} = a_{j_i} for i = 1, ..., N,  and  σ = b,    (18)

and

    ϕ_i = 0 for i ∉ j.    (19)

Note also that an AR(p) can be seen as a particular case of a restricted AR(p, j) model for which:

    j = [1, 2, ..., p],    (20)

provided that the constraint N < p is relaxed.
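The mapping defined by Equations (18) and (19) is mechanical, and a minimal Python sketch makes it concrete (the coefficient values are the illustrative ones used later in the text):

```python
def full_phi(j, a, p=None):
    """Expand restricted-model coefficients (a_{j_i} at the lags in j) into
    the full vector [phi_1, ..., phi_p] of the equivalent unrestricted AR(p);
    all lags not present in j get phi_i = 0."""
    if p is None:
        p = max(j)  # the model order is the highest regression lag, p = j_N
    phi = [0.0] * p
    for lag, coeff in zip(j, a):
        phi[lag - 1] = coeff
    return phi

# AR(5, [1, 2, 5]) example: a1 = 1.2, a2 = -0.5, a5 = 0.1.
print(full_phi([1, 2, 5], [1.2, -0.5, 0.1]))  # [1.2, -0.5, 0.0, 0.0, 0.1]
```

With this expansion, any routine written for unrestricted AR(p) models (e.g. the autocovariance computation of Section 2.1) applies unchanged to restricted models.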
The autocovariance equations of a restricted AR(p, j) process can be readily obtained by combining the autocovariance equations of the corresponding unrestricted AR(p) model, see Section 2, with Equations (18) and (19):

    γ_0 = a_{j_1} γ_{j_1} + a_{j_2} γ_{j_2} + ... + a_{j_N} γ_{j_N} + b²,    (21)

    γ_l = a_{j_1} γ_{l−j_1} + a_{j_2} γ_{l−j_2} + ... + a_{j_N} γ_{l−j_N},  l > 0.    (22)

Computing the theoretical autocovariance function of a restricted AR model
The problem of computing γ^{AR}_l for a restricted AR(p, j) model can be readily addressed by applying the procedure described in Section 2.1 together with Equations (18) and (19). As an example, Figure 5 shows γ^{AR}_l for an AR(5, [1, 2, 5]) model given by a_1 = 1.2, a_2 = −0.5, a_5 = 0.1 and b = 0.5, computed for lags up to l = 20. A peculiarity of restricted AR models is that specific model schemes (i.e., specific values of the components of vector j) lead to theoretical autocovariance functions with alternating zero and nonzero values. For example, let us consider the model AR(3, [3]):

    z_t = a_3 z_{t−3} + b ε_t.    (23)

The values of the autocovariance function γ^{AR}_l up to lag l = 3 are provided by the following autocovariance equations:

    l = 0:  γ_0 = a_3 γ_{−3} + b²,
    l = 1:  γ_1 = a_3 γ_{−2},
    l = 2:  γ_2 = a_3 γ_{−1},
    l = 3:  γ_3 = a_3 γ_0.    (24)

By applying symmetry to the autocovariance function, the system of equations (24) can be expressed as follows:

    l = 0:  γ_0 = a_3 γ_3 + b²,
    l = 1:  γ_1 = a_3 γ_2,
    l = 2:  γ_2 = a_3 γ_1,
    l = 3:  γ_3 = a_3 γ_0.    (25)

Equation (25) can be divided into two independent subsystems: the first one with the equations for lags l = 0 and l = 3, and the second one with the equations for the lags for which there are no regression terms, l = 1 and l = 2. Solving the first subsystem yields:

    γ_0 = b² / (1 − a_3²),  γ_3 = a_3 γ_0.    (26)

Concerning the second subsystem, the only possible solution is γ_1 = γ_2 = 0. By recursively applying Equation (22), one gets nonzero values for γ^{AR}_l only at lags l = 3, 6, 9, .... In a general case, a restricted AR(p) model with a single regression term at lag j = [p] has a theoretical autocovariance function with nonzero values only at lag l = 0 and at multiples of p. This particular structure of γ^{AR}_l is also observed for restricted AR models with j vectors such that j_i is a multiple of j_1 for i > 1. For example, the theoretical autocovariance function of an AR(6, [2, 4, 6]) model will show nonzero values only at even time lags. Such model schemes are likely to be bad candidates for reproducing real-life autocovariance functions, which usually fade out to zero in a continuous fashion.
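The alternating-zeros structure of the AR(3, [3]) model can be reproduced directly from Equations (22) and (26); the Python sketch below (assuming NumPy, with illustrative values a_3 = 0.5, b = 1) makes the pattern explicit.

```python
import numpy as np

def restricted_autocov(a3, b, n):
    """Theoretical autocovariance of the AR(3, [3]) model
    z_t = a3 z_{t-3} + b eps_t, for lags 0 .. n."""
    gamma = np.zeros(n + 1)
    gamma[0] = b**2 / (1 - a3**2)   # first subsystem, Equation (26)
    gamma[3] = a3 * gamma[0]
    # gamma_1 = gamma_2 = 0 (second subsystem); then recurse with
    # Equation (22): gamma_l = a3 * gamma_{l-3}.
    for l in range(4, n + 1):
        gamma[l] = a3 * gamma[l - 3]
    return gamma

g = restricted_autocov(0.5, 1.0, 9)
```

The result is nonzero only at lags 0, 3, 6, 9, each a factor a_3 smaller than the previous one, which illustrates why such schemes poorly represent autocovariance functions that decay continuously.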

Computing the parameters of a restricted AR model from a predefined target autocovariance function
In Section 2.2 it was shown that the p + 1 parameters of an (unrestricted) AR(p) model can be computed to reproduce the first p + 1 values of a predefined target autocovariance function through the Y-W approach. It was also discussed how the exact matching up to lag p could be sacrificed in favour of improving the global matching by employing autocovariance equations for lags larger than p. In this section, restricted AR models are considered as an additional strategy to increase the control on the obtained model autocovariance function for time lags larger than the number of regression coefficients, N. Without loss of generality, consider the problem of computing the parameters of a restricted AR model given by j = [1, 2, 5]. Note that the model parsimony is the same as that of the AR(3) model employed in Section 2.2, i.e. N = 3, but in this case the model order is p = 5. Let us start by considering the autocovariance equations of the corresponding unrestricted AR(5) model, for lags l = 0, ..., 5:

    l = 0:  γ_0 = ϕ_1 γ_1 + ϕ_2 γ_2 + ϕ_3 γ_3 + ϕ_4 γ_4 + ϕ_5 γ_5 + σ²,
    l = 1:  γ_1 = ϕ_1 γ_0 + ϕ_2 γ_1 + ϕ_3 γ_2 + ϕ_4 γ_3 + ϕ_5 γ_4,
    l = 2:  γ_2 = ϕ_1 γ_1 + ϕ_2 γ_0 + ϕ_3 γ_1 + ϕ_4 γ_2 + ϕ_5 γ_3,
    l = 3:  γ_3 = ϕ_1 γ_2 + ϕ_2 γ_1 + ϕ_3 γ_0 + ϕ_4 γ_1 + ϕ_5 γ_2,
    l = 4:  γ_4 = ϕ_1 γ_3 + ϕ_2 γ_2 + ϕ_3 γ_1 + ϕ_4 γ_0 + ϕ_5 γ_1,
    l = 5:  γ_5 = ϕ_1 γ_4 + ϕ_2 γ_3 + ϕ_3 γ_2 + ϕ_4 γ_1 + ϕ_5 γ_0.    (27)
The system of equations (27) contains six equations, six model parameters (σ, ϕ_1, ..., ϕ_5) and six autocovariance terms (γ_0, γ_1, ..., γ_5). Thus, according to the Y-W approach described in Section 2.2, by introducing the six target autocovariance terms, the system of equations is linear and provides the six parameters of an AR model whose theoretical autocovariance function matches exactly the imposed target autocovariance values. Now, consider the restricted model AR(5, [1, 2, 5]), obtained from an AR(5) by imposing ϕ_3 = 0 and ϕ_4 = 0. Since these two model parameters are no longer unknowns in (27), two possible strategies can be followed to obtain the model parameters (ϕ_1, ϕ_2, ϕ_5 and σ) from (27):
1. To select two new unknowns from the set of six autocovariance terms. These two terms will not be replaced by target autocovariance values. This yields a system with six equations and six unknowns (the four model parameters and the two selected autocovariance terms).
2. To discard two autocovariance equations, in order to have a system with four equations and four unknowns (the four model parameters).
If strategy 1 is followed, only four (and not six) values of the target autocovariance function are required. Let us select, with no loss of generality, γ_0, γ_1, γ_3 and γ_5 to be replaced by the corresponding target autocovariance values, and consider γ_2 and γ_4 as additional unknowns. In this case, solving the system of equations (27) yields the six unknowns (σ, ϕ_1, ϕ_2, ϕ_5, γ_2 and γ_4). The drawback of this strategy is that the system of equations becomes non-linear, due to products between unknowns such as ϕ_2 γ_2 and ϕ_5 γ_4 in the first and second equations, respectively. Note that the computational cost may increase dramatically, especially for AR models with large order p, as the system of equations to be solved comprises p + 1 equations, regardless of the number of AR coefficients set to zero. The advantage of this strategy is that the obtained restricted AR model has a theoretical autocovariance function that matches exactly the four provided target autocovariance values.
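The non-linear system of strategy 1 can be sketched and solved numerically in Python (assuming NumPy; a basic Newton iteration with a finite-difference Jacobian stands in for a library root-finder). A hypothetical exponentially decaying target, γ^T_l = q^l with q = e^{−0.3}, is used instead of the Appendix B target; since that target is exactly an AR(1) autocovariance, the expected root is a_1 = q, a_2 = a_5 = 0, b² = 1 − q², γ_2 = q², γ_4 = q⁴.

```python
import numpy as np

q = np.exp(-0.3)
g0, g1, g3, g5 = 1.0, q, q**3, q**5   # imposed (hypothetical) target values

def residuals(x):
    """Strategy 1 for AR(5, [1, 2, 5]): unknowns are the four model
    parameters plus the two unconstrained autocovariance terms."""
    b2, a1, a2, a5, g2, g4 = x
    return np.array([
        g0 - (a1 * g1 + a2 * g2 + a5 * g5) - b2,   # l = 0
        g1 - (a1 * g0 + a2 * g1 + a5 * g4),        # l = 1
        g2 - (a1 * g1 + a2 * g0 + a5 * g3),        # l = 2
        g3 - (a1 * g2 + a2 * g1 + a5 * g2),        # l = 3
        g4 - (a1 * g3 + a2 * g2 + a5 * g1),        # l = 4
        g5 - (a1 * g4 + a2 * g3 + a5 * g0),        # l = 5
    ])

def newton(f, x0, tol=1e-12, max_iter=50, h=1e-8):
    """Newton's method with a forward-difference Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = f(x)
        if np.max(np.abs(r)) < tol:
            break
        J = np.empty((len(r), len(x)))
        for k in range(len(x)):
            xp = x.copy()
            xp[k] += h
            J[:, k] = (f(xp) - r) / h
        x = x - np.linalg.solve(J, r)
    return x

# Initial guess close to a plausible stationary model.
sol = newton(residuals, [0.4, 0.7, 0.1, 0.1, 0.5, 0.3])
```

The products between the a-coefficients and the unknown γ terms are what makes the system non-linear, and the cost of solving it grows with p even when most coefficients are restricted to zero, as noted above.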
To illustrate this, the restricted AR(5, [1, 2, 5]) model was obtained for the target autocovariance function described in Appendix B:

    z_t = 0.649 z_{t−1} + 0.138 z_{t−2} + 0.026 z_{t−5} + 0.634 ε_t.    (28)

Figure 6 shows the corresponding γ^{AR}_l and γ^T_l functions, where the four target values employed during the model determination have been highlighted. The figure shows an exact matching of γ^T_0, γ^T_1, γ^T_3 and γ^T_5, as well as an improved global matching of the target autocovariance function with respect to the Y-W approach for the same model parsimony, see Figure 3. The second strategy, defined in 2, is actually equivalent to selecting as many autocovariance equations as AR parameters by defining a vector l. For the considered example, and without loss of generality, the autocovariance equations for lags l = 0 and l = [1, 4, 5] are selected. The resulting system of equations, using the notation for restricted AR models, is:

    l = 0:  γ_0 = a_1 γ_1 + a_2 γ_2 + a_5 γ_5 + b²,
    l = 1:  γ_1 = a_1 γ_0 + a_2 γ_1 + a_5 γ_4,
    l = 4:  γ_4 = a_1 γ_3 + a_2 γ_2 + a_5 γ_1,
    l = 5:  γ_5 = a_1 γ_4 + a_2 γ_3 + a_5 γ_0.    (29)
The system of equations (29) is linear; it comprises four equations and four unknowns (the model parameters b², a1, a2 and a5), and requires six autocovariance terms. By introducing the target autocovariance terms, the model parameters are directly obtained; the resulting model is given in Equation (31). This situation is similar to that explained in Section 2.1, in the sense that, since the number of required target autocovariance terms is higher than the number of equations (i.e., the number of model parameters), the autocovariance function of the obtained AR model will not exactly match any of the imposed target autocovariance values, but it may show a reasonably good matching for a wide range of time lags. Figure 7 illustrates this idea. Although the obtained restricted AR model given in (31) performs slightly worse than the model given in (16), it still improves on the model obtained with the Y-W approach, Equation (13). The key idea of the presented analysis is that selecting appropriate AR model schemes through vector j has the potential to improve the global matching between γ^AR_l and γ^T_l. To the authors' knowledge, this strategy has been considered only to a limited extent in a previous work [Krenk and Møller, 2019], since in that work an exponential model scheme j = [1, 2, 4, ..., 2^(N−1)] was assumed for the presented simulations, which leaves the possibility of optimising the AR scheme unexplored.

General formulation for the determination of a restricted AR model from a predefined target autocovariance function
This section provides a general formulation for the linear problems addressed in sections 2.2 and 3.2. The non-linear problem also described in Section 3.2 is not covered by the following formulation and will be further analysed in future research. The objective here is to obtain the parameters of a restricted AR(p, j) model given a target autocovariance function, from generic input vectors j = [j1, j2, ..., jN] (regression coefficients considered in the model) and l = [l1, l2, ..., lN] (autocovariance equations considered to obtain the model parameters), by means of a linear system that can be solved with low computational resources.
The problem is divided into two steps: (i) determination of the regression coefficients a_{ji}, for i = 1, ..., N, by means of the set of N autocovariance equations for the lags gathered in l. For convenience, the regression coefficients are encapsulated into a row vector a = [a_{j1}, a_{j2}, ..., a_{jN}].
(ii) Determination of the noise coefficient, σ, by means of the autocovariance equation for lag l = 0.
Note that, while in sections 2.2 and 3.2 a single system of equations including (i) and (ii) was considered, the division of the problem proposed in this section is always possible, since the noise coefficient appears only in the autocovariance equation for lag l = 0. By doing so, a more convenient general formulation is obtained. Note also that the formulation is presented for the case of a restricted AR model, but the case of an unrestricted AR model is included by simply considering Equation (20).

Determination of the model coefficients: a
The autocovariance equations considered for the lags gathered in l, for the case of a restricted AR(p, j) model, can be written as follows:

γ_{lm} = a_{j1} γ_{|lm − j1|} + a_{j2} γ_{|lm − j2|} + · · · + a_{jN} γ_{|lm − jN|}, m = 1, ..., N, (32)

where the symmetry γ_{−m} = γ_m has been applied, or, in a more compact way:

γ_l = a γ_{j,l}, (33)

where γ_l = [γ_{l1}, ..., γ_{lN}] is a row vector with dimension N, and γ_{j,l} is an N × N matrix with entries (γ_{j,l})_{i,m} = γ_{|lm − ji|}, both containing values of the autocovariance function at specific time lags, according to vectors j and l. It is worth noting that, for the particular case j = l = [1, 2, 3, ..., p], the system of equations given in (32) becomes the Yule-Walker equations. If the autocovariance terms in (33) are replaced by the target values, the model coefficients are given by:

a = γ^T_l (γ^T_{j,l})^{−1}. (34)
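The solution of (34) takes only a few lines. The helper below is illustrative: it builds the matrix entry-wise as γ_{|lm − ji|} and solves the row-vector system by transposing.

```python
import numpy as np

def restricted_ar_coefficients(gamma_t, j, l):
    """Solve Equation (34): a = gamma^T_l (gamma^T_{j,l})^{-1}.

    gamma_t : 1-D array of target autocovariances gamma^T_0, gamma^T_1, ...
    j, l    : lag vectors defining the model scheme and the equations used.
    """
    # (gamma_{j,l})_{i,m} = gamma_{|l_m - j_i|}, using the symmetry gamma_{-m} = gamma_m
    G = np.array([[gamma_t[abs(lm - ji)] for lm in l] for ji in j])
    rhs = np.array([gamma_t[lm] for lm in l])
    # row-vector system gamma_l = a @ G  ->  solve G~ a~ = gamma_l~
    return np.linalg.solve(G.T, rhs)
```

With j = l = [1, ..., p] this reduces to the Yule-Walker solve, since G becomes the usual Toeplitz autocovariance matrix.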

Determination of the noise coefficient: b
The autocovariance equation for time lag l = 0 for a restricted AR(p, j) model is as follows:

γ_0 = a γ̃_{−j} + b², with γ_{−j} = [γ_{−j1}, γ_{−j2}, ..., γ_{−jN}], (35)

where the tilde means transposed. Operating and applying symmetry in the autocovariance function, the noise coefficient is given by:

b² = γ_0 − a_{j1} γ_{j1} − a_{j2} γ_{j2} − · · · − a_{jN} γ_{jN}, (36)

or, in a more compact way:

b² = γ^T_0 − a γ̃^T_j, (37)

where γ_j is a row vector containing N values of the target autocovariance function at the time lags gathered in j.
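The lag-0 step of Equation (37) then reduces to a dot product. The helper below is illustrative and assumes the coefficient vector a has already been obtained:

```python
import numpy as np

def noise_coefficient(gamma_t, j, a):
    """Equation (37): b^2 = gamma^T_0 - a . gamma^T_j  (returns b > 0)."""
    b2 = gamma_t[0] - float(np.dot(a, [gamma_t[ji] for ji in j]))
    return np.sqrt(b2)
```

For a valid (positive-definite) target autocovariance and a stable fitted model, b² is positive, so the square root is well defined.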
Finally, note that equations (33) and (37), particularised for j = l = [1, 2, 4, ..., 2^(N−1)], represent the univariate case of the formulation introduced in [Krenk and Møller, 2019]. The formulation presented here is more flexible, as it allows searching for optimal j and l vectors independently, that is, without assuming the constraint j = l.

Optimal AR models for synthetic isotropic turbulence generation
In this section, a methodological proposal is presented to obtain AR models from a predefined target autocovariance function. The particular case of stationary homogeneous isotropic (SHI) turbulence is considered. The methodology combines the general formulation described in Section 4 with the use of genetic algorithms. The aim is to find, for a given number of regression coefficients N, optimal vectors j = [j1, j2, ..., jN] and l = [l1, l2, ..., lN]. In this context, optimal means that the AR model resulting from expressions (34) and (37) provides minimum mean squared error, MSE, between its theoretical autocovariance function, γ^AR_l, and the target autocovariance function, γ^T_l. The MSE is given by:

MSE = (1/M) Σ_{l=0}^{M−1} (γ^AR_l − γ^T_l)², (38)

where γ^T_l is defined as the non-dimensional longitudinal autocovariance function of SHI turbulence, Ru(r̊), as described in Appendix B; r̊ = r/L is a non-dimensional spatial coordinate, and L is the length scale parameter of the three-dimensional energy spectrum. In (38), M represents the maximum lag considered in the computation of the MSE. This parameter has been fixed to M = 41, which corresponds to a maximum non-dimensional spatial distance of r̊max = 40 · ∆r̊ = 4.98. It holds that:

∫_0^{r̊max} Ru(r̊) dr̊ ≈ 0.995 ∫_0^∞ Ru(r̊) dr̊,

meaning that the selected M value accounts for 99.5% of the integral length scale, L^x_u.
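Evaluating the MSE of Equation (38) requires γ^AR_l, which can be obtained from the model parameters by solving the p + 1 autocovariance equations for lags 0, ..., p and then applying the recursion for larger lags (the linear route of Section 2.1). A sketch with illustrative helper names:

```python
import numpy as np

def ar_autocov(phi, sigma2, max_lag):
    """Theoretical autocovariance of an AR(p) model z_t = sum_i phi_i z_{t-i} + sigma eps_t.

    Solves the p+1 autocovariance equations for gamma_0..gamma_p, then applies
    gamma_l = sum_i phi_i gamma_{l-i} for l > p.
    """
    p = len(phi)
    A = np.zeros((p + 1, p + 1))
    b = np.zeros(p + 1)
    A[0, 0] = 1.0
    for i, ph in enumerate(phi, start=1):
        A[0, i] -= ph            # lag-0 equation: gamma_0 - sum phi_i gamma_i = sigma^2
    b[0] = sigma2
    for k in range(1, p + 1):    # lag-k equations, using gamma_{-m} = gamma_m
        A[k, k] += 1.0
        for i, ph in enumerate(phi, start=1):
            A[k, abs(k - i)] -= ph
    g = list(np.linalg.solve(A, b))
    for l in range(p + 1, max_lag + 1):
        g.append(sum(ph * g[l - i] for i, ph in enumerate(phi, start=1)))
    return np.array(g)

def mse(gamma_ar, gamma_t, M=41):
    """Equation (38): mean squared error over lags 0..M-1."""
    return float(np.mean((np.asarray(gamma_ar)[:M] - np.asarray(gamma_t)[:M]) ** 2))
```

A restricted AR(p, j) model is handled by placing its coefficients a at positions j of a length-p vector phi, with zeros elsewhere.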

The optimisation is subject to the following constraints:

(i) j1 < j2 < · · · < jN ;
(ii) l1 < l2 < · · · < lN ;
(iii) |ji − li| ≤ ∆, for i = 1, ..., N.
The first and second constraints derive from the definition of vectors j and l. The third constraint introduces the parameter ∆, which regulates the maximum difference between the regression lags included in the AR model and the corresponding autocovariance equations in the system that provides the model parameters. It has been observed that large differences between elements ji and li may lead to numerical instabilities during the optimisation process. Note that ∆ = 0 means that l = j.
The analysis includes a benchmark exercise for a number of models, which are compared for the same model parsimony. The range N = 1, ..., 10 is considered in what follows. The benchmark comparison includes the following models (italic letters are employed to denote the models):

• Y-W: an unrestricted AR(p) model obtained through the Yule-Walker approach, that is, p = N and j = l = [1, 2, ..., p].

• K-M: a restricted AR model with the exponential scheme of [Krenk and Møller, 2019], that is, j = l = [1, 2, 4, ..., 2^(N−1)].
• GA-0: obtained with a genetic algorithm with ∆ = 0. This model allows exploring the potential improvement due to relaxing the exponential model scheme employed in K-M, while keeping the constraint j = l.
• GA-10: obtained with a genetic algorithm with ∆ = 10. This model allows assessing the additional improvement with respect to GA-0 due to relaxing the constraint j = l. It is noted that, for the GA-10 model, the maximum difference found between i-th elements ji and li was 5, meaning that the choice ∆ = 10 was flexible enough to allow the search for optimal AR models without actually constraining the difference between corresponding elements of the j and l vectors.
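The genetic algorithm itself is not reproduced here. The sketch below replaces it with a plain random search over schemes j (with l = j, i.e. ∆ = 0) on a hypothetical exponential target, merely to illustrate the fitness-evaluation loop; the helper repeats the linear solves of Section 4 so that the block is self-contained.

```python
import numpy as np

def fit_and_score(gamma_t, j, M):
    """Fit a restricted AR(p, j) model with l = j and return its MSE, Eq. (38)."""
    G = np.array([[gamma_t[abs(lm - ji)] for lm in j] for ji in j])
    a = np.linalg.solve(G.T, np.array([gamma_t[lm] for lm in j]))
    sigma2 = gamma_t[0] - float(np.dot(a, [gamma_t[ji] for ji in j]))
    # theoretical autocovariance of the fitted model (lags 0..M-1)
    p = max(j)
    phi = np.zeros(p)
    phi[np.array(j) - 1] = a
    A = np.zeros((p + 1, p + 1))
    rhs = np.zeros(p + 1)
    A[0, 0] = 1.0
    A[0, 1:] -= phi
    rhs[0] = sigma2
    for k in range(1, p + 1):
        A[k, k] += 1.0
        for i, ph in enumerate(phi, start=1):
            A[k, abs(k - i)] -= ph
    g = list(np.linalg.solve(A, rhs))
    for l in range(p + 1, M):
        g.append(sum(ph * g[l - i] for i, ph in enumerate(phi, start=1)))
    return float(np.mean((np.array(g[:M]) - gamma_t[:M]) ** 2))

M = 41
gamma_t = np.exp(-0.3 * np.arange(M + 30))     # hypothetical target, not the SHI one
rng = np.random.default_rng(1)
best_j, best_e = None, np.inf
for _ in range(200):                            # toy stand-in for the GA generations
    j = sorted(rng.choice(np.arange(1, 24), size=3, replace=False).tolist())
    e = fit_and_score(gamma_t, j, M)
    if e < best_e:
        best_j, best_e = j, e
```

A real GA would add crossover, mutation and selection on a population of (j, l) pairs, but the fitness function is exactly this fit-then-score evaluation.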
Results show that, for the most parsimonious models, N = 1, the single regression term considered in the four models is the previous lag, j = [1]. However, for GA-10 the employed autocovariance equation is that for lag l = [2]. For the case N = 2, both GA-0 and GA-10 provided optimal regression lags j = [1,3], different from the Y-W and K-M models, for which j = [1,2]. Concerning the model order p, given by the last term of j, jN, results show that, while for models Y-W and K-M p is determined by N (linearly and exponentially, respectively), the proposed methodology has the ability to reveal an optimal model order for a given model parsimony. It is noted that the obtained optimal model orders depend on the specific target autocovariance function considered. Indeed, for increasing N values, the obtained model order for the most flexible model, GA-10, stagnates around p = 23. This value is coherent with the fact that the target autocovariance function is already very close to zero at this lag, see Figure 18. Thus, when additional regression coefficients are made available by reducing the model parsimony, the model exploits them by rearranging terms in vector j rather than by increasing the model order beyond this value, as is the case for models Y-W and K-M.
Figure 11 shows the MSE obtained with all the models considered in the analysis, as a function of N. As expected, the MSE obtained with the Y-W approach decreases with N, because the obtained models match exactly the target autocovariance function for lags up to the model order. For the K-M approach, the error decreases rapidly for low N values, reflecting the advantages of restricted AR models as compared with the Y-W approach. However, the obtained error stagnates from N = 5 on. This can be explained by the fact that including a regression term for lag 2^(6−1) = 32 (i.e., N = 6) does little to improve the fitting of a target autocovariance function that becomes very close to zero for such lag values. This result clearly shows that an exponential model scheme, such as that of model K-M, has an intrinsic upper limit on the number of regression terms that is worth considering, this limit being related to the range of lags for which γ^T_l takes non-negligible values. This range depends on the ∆r̊ selected to obtain the discrete target values from the continuous non-dimensional target autocovariance function of the underlying random process, see Appendix B. The improvement provided by GA-0 with respect to the K-M model shows the importance of searching for optimal model schemes rather than assuming a predefined model structure. The improvement becomes noticeable for N values larger than the aforementioned upper limit N = 5. Finally, the gap between the results for GA-0 and GA-10 clearly shows the additional improvement associated with an optimal selection of the autocovariance equations employed in the determination of the model parameters. Thus, relaxing the hypothesis l = j, which is usually implicit in the literature, is a clear path for improvement.
To delve further into this analysis, the distribution of the MSE with the lag is analysed. Figure 12 shows e²_l, see Equation (39), for the case N = 3. It can be seen that improving the global fitting of the target autocovariance function by means of restricted AR models may come at the expense of deteriorating the local fitting at some of the first lags, as compared with the results of the unrestricted models employed under the Y-W approach. For example, model GA-10 greatly improves on the other models in terms of global fitting, see Figure 11, but at the same time it shows the highest error for lag l = 2. This trade-off between global fitting and local fitting of the target autocovariance function is relevant when using restricted AR models, and it probably needs to be addressed with a broader concept of optimal model, which may vary from one problem to another. In this regard, additional information from the frequency domain could be included to define the notion of optimal model. This is particularly relevant for some engineering problems, for which constraints could be defined in terms of frequency ranges, like those involving wind loads [Li and Kareem, 1990, Kareem, 2008]. For illustrative purposes, Table 1 shows the model parameters obtained for N = 3, and Figure 13 gathers, on top, the non-dimensional one-sided target spectrum, S̊T(k̊), together with the corresponding theoretical spectra of the AR models obtained with the four approaches, and, at the bottom, the differences between the target spectrum and the AR model spectra. k̊ is the non-dimensional wave number, k̊ = kL. Note that S̊T(k̊) is affected by aliasing due to the discretisation of the target autocovariance function, see details in Appendix B.
The non-dimensional one-sided spectrum of an AR model can be obtained either from the model coefficients or from its theoretical autocovariance function [Box et al., 2016]; in terms of the coefficients,

S̊AR(k̊) = 2 b² ∆r̊ / |1 − Σ_{i=1}^{N} a_{ji} exp(−i 2π k̊ ji ∆r̊)|², 0 ≤ k̊ ≤ k̊max,

where k̊max = 1/(2∆r̊) is the maximum non-dimensional wave number, and ∆r̊ is the non-dimensional length employed to obtain the discrete version of the target autocovariance function, see Appendix B.
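A sketch of the coefficient-based route, under the cyclic-wavenumber convention consistent with k̊max = 1/(2∆r̊); the function name and argument layout are illustrative, not the authors' code:

```python
import numpy as np

def ar_spectrum(a, j, b, dr, k):
    """One-sided spectrum of a restricted AR model from its coefficients,
    for 0 <= k <= 1/(2*dr):  S(k) = 2 b^2 dr / |1 - sum_i a_i e^{-i 2 pi k j_i dr}|^2.
    """
    denom = 1.0 - sum(ai * np.exp(-2j * np.pi * k * ji * dr)
                      for ai, ji in zip(a, j))
    return 2.0 * b ** 2 * dr / np.abs(denom) ** 2
```

A useful sanity check of the normalisation is that integrating the one-sided spectrum from 0 to k̊max recovers the lag-0 autocovariance.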
Table 1: AR models obtained for the different approaches for N = 3.

Figure 13: Top: non-dimensional one-sided target spectrum, S̊T(k̊), together with the corresponding theoretical spectra of the AR models obtained with the four approaches; bottom: differences between the target spectrum and the AR model spectra. Case N = 3.
Figure 13 illustrates that, in terms of fitting the target spectrum, model GA-10 outperforms the rest of the models for low frequencies (up to k̊ = 4), including the frequency at which the maximum of the target spectrum is attained (k̊ ≈ 1.245). For higher frequencies, all model spectra show similar oscillations around the target spectrum. This result suggests that, a priori, reducing the local fitting of the target autocovariance function at the first lags to improve the global fitting does not have a clear negative impact on the spectrum fitting.

Generalisation to multivariate processes
In this section, the analyses described in sections 2, 3 and 4 are extended to a one-dimensional multivariate random process, {zt(α)}, with the column vector z_t = [z_{1,t}, z_{2,t}, ..., z_{k,t}]~ being a k-variate random variable. In particular, we describe the reasons why some bottlenecks in terms of computational cost may arise as a consequence of the multivariate character of the formulation. Note that multi-dimensional multivariate problems in some cases admit a one-dimensional multivariate formulation, as first described in [Mignolet and Spanos, 1991]. In [Krenk and Møller, 2019], a three-dimensional three-variate description of an SHI turbulent wind field was expressed in the form of a one-dimensional 3P-variate random process by stacking the three velocity components at P points of the plane perpendicular to the mean wind into a single random variable.
The general formulation of a zero-mean k-variate VAR(p) model is:

z_t = Φ_1 z_{t−1} + Φ_2 z_{t−2} + · · · + Φ_p z_{t−p} + Σ ε_t,

where z_t = [z_{1,t}, z_{2,t}, ..., z_{k,t}]~ is a k-variate random variable, and ε_t is a sequence of independent and identically distributed (iid) random vectors with zero mean, E[ε_t] = 0_{k×1}, and unit covariance matrix, Var[ε_t] = I_k.
The p + 1 model matrix parameters are given by the regression matrices, Φ_i with i = 1, ..., p, and the noise matrix, Σ; the dimension of all the matrix parameters is k × k. The covariance of the random term is given by Var[Σ ε_t] = Σ Σ̃. Γ_{t1,t2} denotes the covariance between the random vectors z_{t1} and z_{t2}. As in the univariate case, the assumption of a stationary process implies that Γ_{t1,t2} depends only on the time lag, l = t1 − t2, which is referred to as the covariance matrix function: Γ_l = Cov[z_t, z_{t−l}]. The covariance equations that generalise equations (4) and (5) for the multivariate case are:

Γ_0 = Σ_{i=1}^{p} Φ_i Γ̃_i + Σ Σ̃, (44)

Γ_l = Σ_{i=1}^{p} Φ_i Γ_{l−i}, l ≥ 1. (45)

Computing the theoretical covariance matrix function of a VAR model
As in the univariate case, equations (44) and (45) can be employed to compute the theoretical covariance matrix function, Γ^VAR_l, from the model parameters, Φ_i, i = 1, ..., p, and Σ. However, note that in the multivariate case Γ_{−l} = Γ̃_l ≠ Γ_l in general. This fact makes the linear formulation described in Section 2.1 not applicable to the multivariate case, as it is not possible to replicate the step given between equations (6) and (7). There are two approaches for obtaining the theoretical covariance matrix function of a VAR(p) model, both involving an increased computational cost as compared with the univariate case:

1. To obtain Γ^VAR_l from the covariance matrix of the VAR(1) representation of the VAR(p) model. This strategy provides the exact covariance matrix function of the VAR(p) model, but it is computationally very expensive, as the size of the matrices involved scales up to (pk)² × (pk)².

2. To obtain Γ^VAR_l from the covariance matrix function of the equivalent Vector Moving Average (VMA) model. Since this equivalent VMA model contains infinite terms, truncation to a VMA(q) model is required, meaning that the obtained covariance matrix function is an approximation of the exact VAR covariance matrix function. In this case, the increase in computational cost comes from the fact that appropriate q values are related to the integral length scale of the process, which typically leads to q ≫ p.
In what follows, both methodologies are briefly described.

Covariance matrix function of a VAR(p) model through the VAR(1) representation
A k-variate VAR(p) model can be expressed in the form of an extended pk-dimensional VAR(1) model by using an expanded model representation [Tsay, 2013]. For illustrative purposes, the case of a VAR(3) model is analysed:

z_t = Φ_1 z_{t−1} + Φ_2 z_{t−2} + Φ_3 z_{t−3} + Σ ε_t.

Let us define the expanded 3k-variate random variable as:

z*_t = [z̃_t, z̃_{t−1}, z̃_{t−2}]~.

The VAR(1) representation of the original VAR(3) model is given by:

z*_t = Φ*_1 z*_{t−1} + Σ* ε*_t, (48)

where Φ*_1 is called the companion matrix, defined as:

Φ*_1 = [ Φ_1  Φ_2  Φ_3 ;
         I_k  0    0   ;
         0    I_k  0   ],

and the random matrix is given by:

Σ* = [ Σ  0  0 ;
       0  0  0 ;
       0  0  0 ].

The covariance matrix of the extended VAR(1) model of Equation (48) is given by:

Γ*_0 = Φ*_1 Γ*_0 Φ̃*_1 + Σ* Σ̃*. (51)

The solution of Equation (51) is given by [Tsay, 2013]:

vec(Γ*_0) = (I_{(3k)²} − Φ*_1 ⊗ Φ*_1)^{−1} vec(Σ* Σ̃*), (52)

where vec(·) denotes the column-stacking vector of a matrix and ⊗ is the Kronecker product of two matrices.
The covariance matrix function of the original VAR(3) model can be readily obtained from Γ*_0, noting that Γ^VAR_0, Γ^VAR_1 and Γ^VAR_2 are the blocks of its first block-row:

Γ*_0 = [ Γ_0   Γ_1   Γ_2 ;
         Γ̃_1  Γ_0   Γ_1 ;
         Γ̃_2  Γ̃_1  Γ_0 ],

and using recursively Equation (45) to obtain Γ^VAR_l for lags l ≥ 3.
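The vec/Kronecker solution of Equation (51) is compact enough to sketch directly. The helper below builds the companion matrix for a generic VAR(p) and returns the blocks of the first block-row of Γ*_0; the function name and interface are illustrative.

```python
import numpy as np

def var_gamma0_blocks(Phis, B):
    """Exact Gamma_0, ..., Gamma_{p-1} of a VAR(p) via its VAR(1) representation.

    Phis : list of p regression matrices (k x k);  B : noise matrix (k x k).
    Solves vec(Gamma*_0) = (I - Phi* kron Phi*)^{-1} vec(Sigma* Sigma*~).
    """
    k = Phis[0].shape[0]
    p = len(Phis)
    comp = np.zeros((p * k, p * k))          # companion matrix Phi*_1
    comp[:k, :] = np.hstack(Phis)
    if p > 1:
        comp[k:, :-k] = np.eye((p - 1) * k)
    S = np.zeros((p * k, p * k))             # Sigma* Sigma*~: noise in first block only
    S[:k, :k] = B @ B.T
    vecG = np.linalg.solve(np.eye((p * k) ** 2) - np.kron(comp, comp),
                           S.reshape(-1, order="F"))
    Gstar = vecG.reshape(p * k, p * k, order="F")
    return [Gstar[:k, m * k:(m + 1) * k] for m in range(p)]
```

The (pk)² × (pk)² system solved here is exactly the computational bottleneck mentioned above: the cost grows very quickly with p and k.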

Covariance matrix function of a VAR(p) model through VMA(q) representation
It can be demonstrated that any VAR(p) model admits a Vector Moving Average (VMA) representation with infinite terms:

z_t = Σ_{i=0}^{∞} Ψ_i Σ ε_{t−i}, with Ψ_0 = I_k,

where the matrices Ψ_i are known functions of the VAR regression matrices, Φ_i [Tsay, 2013]:

Ψ_i = Σ_{m=1}^{min(i,p)} Φ_m Ψ_{i−m}, i ≥ 1.

The covariance matrix function then follows from the truncated VMA(q) form as Γ^VAR_l ≈ Σ_{i=0}^{q−l} Ψ_{i+l} Σ Σ̃ Ψ̃_i.

Figure 15: γ^VAR_{z1,z1,l} from the covariance matrix function of a VAR(2) process obtained through the VAR(1) representation, and three estimations obtained through the VMA(q) representation.
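The Ψ recursion and the truncated covariance sum can be sketched as follows; the truncation level q is a user choice (the approximation error decays with the process memory), and the function name is illustrative:

```python
import numpy as np

def vma_autocov(Phis, B, lag, q=200):
    """Approximate Gamma_lag of a VAR(p) through its truncated VMA(q) form.

    Psi_0 = I,  Psi_i = sum_{m=1}^{min(i,p)} Phi_m Psi_{i-m},
    Gamma_l ~= sum_{i=0}^{q-l} Psi_{i+l} (B B~) Psi_i~  (unit-covariance noise).
    """
    k = Phis[0].shape[0]
    p = len(Phis)
    Psi = [np.eye(k)]
    for i in range(1, q + 1):
        Psi.append(sum(Phis[m - 1] @ Psi[i - m] for m in range(1, min(i, p) + 1)))
    BBt = B @ B.T
    return sum(Psi[i + lag] @ BBt @ Psi[i].T for i in range(q - lag + 1))
```

Since the Ψ_i decay at the rate of the process memory, q must be large compared with the integral scale, which is the source of the q ≫ p cost noted above.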
Note also that an unrestricted VAR(p) model can be seen as the particular case of a restricted VAR(p) model for which j = [1, 2, ..., p]. In the following, the formulation for obtaining the matrix parameters of a restricted VAR(p) model from a target covariance matrix function is introduced for generic vectors j = [j1, j2, ..., jN] and l = [l1, l2, ..., lN]. As in the univariate case, the problem is divided into two steps, and the regression matrix coefficients are encapsulated into the matrix A = [A_{j1} A_{j2} ... A_{jN}], which has dimension k × kN.

Determination of the model matrix coefficients: A
The multivariate form of Equation (32) is given by:

Γ_{lm} = A_{j1} Γ_{lm−j1} + A_{j2} Γ_{lm−j2} + · · · + A_{jN} Γ_{lm−jN}, m = 1, ..., N,

with Γ_{−l} = Γ̃_l, or, in a more compact way:

Γ_l = A Γ_{j,l}, (60)

where Γ_l = [Γ_{l1} Γ_{l2} ... Γ_{lN}] is a k × kN matrix and Γ_{j,l} is a kN × kN matrix whose (i, m) block is Γ_{lm−ji}. By replacing the covariance terms by the target values, the model matrix coefficients are given by:

A = Γ^T_l (Γ^T_{j,l})^{−1}. (61)

Determination of the noise matrix: B
The multivariate version of Equation (35) is:

Γ_0 = A_{j1} Γ_{−j1} + A_{j2} Γ_{−j2} + · · · + A_{jN} Γ_{−jN} + B B̃.

Operating and applying Γ_{−l} = Γ̃_l, the covariance matrix of the random term, B B̃, is given by:

B B̃ = Γ_0 − A_{j1} Γ̃_{j1} − A_{j2} Γ̃_{j2} − · · · − A_{jN} Γ̃_{jN},

or, in a more compact way:

B B̃ = Γ^T_0 − A Γ̃^T_j, (65)

where Γ_j = [Γ_{j1} Γ_{j2} ... Γ_{jN}] is a k × kN matrix. From (65), the noise matrix B can be obtained in several ways, for example through a Cholesky decomposition. Finally, it is remarked that equations (61) and (65), particularised for j = l, represent the formulation introduced in [Krenk and Møller, 2019] (in particular, Γ_0 = C_uu, Γ_j = Γ_l = C_uw and Γ_{j,l} = C_ww).
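A minimal round-trip check of equations (61) and (65) for the simplest case p = 1 and j = l = [1], with a hypothetical stable VAR(1) playing the role of the target-generating process:

```python
import numpy as np

# Hypothetical stable VAR(1) "target" generator (eigenvalues of Phi inside unit circle)
Phi = np.array([[0.5, 0.1],
                [0.0, 0.3]])
B = np.array([[1.0, 0.0],
              [0.2, 0.8]])
k = 2

# Exact Gamma_0 from the Lyapunov equation Gamma_0 = Phi Gamma_0 Phi~ + B B~
vecG = np.linalg.solve(np.eye(k * k) - np.kron(Phi, Phi),
                       (B @ B.T).reshape(-1, order="F"))
Gamma0 = vecG.reshape(k, k, order="F")
Gamma1 = Phi @ Gamma0                       # Gamma_1 = Phi Gamma_0 for a VAR(1)

# Recover the model from the "target" covariances:
A = Gamma1 @ np.linalg.inv(Gamma0)          # Equation (61) with j = l = [1]
BBt = Gamma0 - A @ Gamma1.T                 # Equation (65): Gamma_0 - A Gamma~_j
B_rec = np.linalg.cholesky(BBt)             # one possible noise matrix
```

As noted in the text, B is determined only up to an orthogonal factor; the Cholesky factor is simply one convenient choice (here it coincides with the lower-triangular B used to build the target).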
As an example, two VAR models with three regression matrices have been computed to reproduce the target covariance matrix function described in Appendix B, for the case of the longitudinal wind component at two spatial locations with a non-dimensional lateral separation of ∆ẙ = ∆y/L = λ = 0.747. The first model is a VAR(3) model, computed with the Y-W approach, i.e. j = l = [1, 2, 3]; the second is a restricted VAR(5, [1,2,5]) model. Figures 16 and 17 show the resulting covariance matrix functions, computed through the VAR(1) representation (i.e., exact values). As expected, the model obtained with the Y-W approach provides exact matching for lags from l = −3 to l = 3. However, an improved global fitting is obtained with the restricted VAR(5, [1,2,5]) model.

Conclusions
Sequential methods for the synthetic realisation of random processes have a number of advantages compared with spectral methods. For instance, the synthesis process is more convenient to handle, as it can be stopped and restarted at any time. In addition, the models obtained through the sequential approach (e.g., autoregressive models) have theoretical expressions for their autocovariance function and power spectral density (PSD), which allows an improved assessment of the accuracy of the models in reproducing the predefined statistical information.
In this article, a methodological proposal for the determination of optimal autoregressive (AR) models from a predefined target autocovariance function was introduced. To this end, a general formulation of the problem was developed. Two main features characterise the introduced formulation: (i) flexibility in the choice of the autocovariance equations employed in the model determination, through the definition of a lag vector l; and (ii) flexibility in the definition of the AR model scheme through a vector j, which defines the regression terms considered in the model. The AR parameters are directly obtained as the solution of a linear system that depends on j and l. The well-known Yule-Walker (Y-W) approach and a recent approach based on restricted AR models (K-M) can be seen as particular cases of the introduced formulation, since j = l = [1, ..., p] for Y-W and j = l for K-M.
The introduced formulation was exploited by a genetic algorithm to obtain optimal AR models for the synthetic generation of stationary homogeneous isotropic (SHI) turbulence time series. The resulting models improved on the Y-W and K-M based models for the same model parsimony in terms of the global fitting of the target autocovariance function. This was achieved at the expense of reducing the local fitting at some lags. The impact of this trade-off on the frequency domain was presented as a path for extending the notion of optimal model to specific problems in which constraints in the frequency domain may exist, as is the case in some engineering problems.
The formulation for the one-dimensional multivariate case was also presented. The reasons behind some computational bottlenecks associated with the multivariate formulation were highlighted.
Finally, a non-linear approach for the univariate case was described, for which preliminary results suggest an improved fitting of the target autocovariance function.
Figure 18 shows Ru(r̊) (continuous line) together with γ^T_l (discrete values). It is worth mentioning that the choice of the parameter ∆r̊ usually represents a trade-off between different criteria [Spanos, 1983]. On the one hand, small ∆r̊ values involve target autocovariance functions with non-negligible values for a large number of lags, increasing the number of AR model coefficients required to reproduce γ^T_l reasonably well. On the other hand, large ∆r̊ values may lead to problems in reproducing the spectrum at high frequencies (aliasing), which is problematic for some engineering applications [Li and Kareem, 1990]. Indeed, by discretising the continuous autocovariance function of a physical process to build γ^T_l (see above), the corresponding target spectrum (i.e., the Fourier transform of γ^T_l) differs from the spectrum of the original continuous process in the high frequency range. To illustrate this, Figure 19 shows the non-dimensional one-sided spectrum of the employed von Karman turbulence model, S̊VK(k̊), together with the target spectra resulting from three different values of ∆r̊, each referred to as S̊T for the specific ∆r̊. S̊VK(k̊) is given by:

S̊VK(k̊) = (2 Γ(5/6) / (√π Γ(1/3))) (1 + k̊²)^{−γ}, (A.12)

where k̊ is the non-dimensional wave number, k̊ = kL, and γ = 5/6, as stated above. Note that the S̊T are obtained only up to a maximum value of k̊ derived from the Nyquist theorem, k̊max = 2π/(2∆r̊).
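The aliasing effect can be illustrated numerically with a hypothetical exponential autocovariance γ(r̊) = e^{−r̊} (a stand-in, not the von Karman model), whose continuous two-sided spectrum is known in closed form; the folded (aliased) discrete spectrum lies above it, and refining ∆r̊ reduces the discrepancy:

```python
import numpy as np

def discrete_spectrum(dr, f, n_terms=4000):
    """Spectrum of the sampled autocovariance gamma(n*dr) = exp(-n*dr):
    S(f) = dr * (gamma_0 + 2 * sum_n gamma_n cos(2 pi f n dr))."""
    n = np.arange(1, n_terms)
    return dr * (1.0 + 2.0 * np.sum(np.exp(-n * dr)[:, None]
                                    * np.cos(2 * np.pi * f * n[:, None] * dr),
                                    axis=0))

f = 0.8                                               # below both Nyquist limits
s_cont = 2.0 / (1.0 + (2.0 * np.pi * f) ** 2)         # continuous two-sided spectrum
s_coarse = discrete_spectrum(0.5, np.array([f]))[0]   # Nyquist frequency 1.0
s_fine = discrete_spectrum(0.05, np.array([f]))[0]    # Nyquist frequency 10.0
```

Because the discrete spectrum equals the sum of the continuous one over all aliases f + m/∆r̊, the coarse discretisation overestimates the spectrum near its Nyquist limit, exactly the high-frequency mismatch discussed above.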

B Results for a target autocovariance function with very low decay rate
As mentioned in Section 1, long-term persistence processes (Hurst phenomenon), characterised by an infinite integral scale, are not considered in this study. However, because many physical processes may show very large integral time scales, it is interesting at least to highlight how the structure of the optimal AR model changes according to the decay rate of the considered target autocovariance function. To this end, the optimisation process performed in Section 5 has been repeated assuming a discretisation of the non-dimensional continuous autocovariance function ten times finer, i.e., ∆r̊ = λ/60 = 0.01245. As a consequence, the target autocovariance function decays ten times more slowly, and the maximum lag considered in the MSE computation to account for 99.5% of the integral length scale becomes M = 401. Figure 20 shows the resulting target autocovariance function. A direct consequence of this modification is an increase in the computational burden associated with the optimisation procedure.
Table 2 shows the obtained AR model order (the last element of vector j) for the different approaches and model parsimonies, N. While the model order for the Y-W and K-M approaches depends solely on the number of model parameters, N, different model orders are obtained for the optimal GA-0 and GA-10 models, according to the different decay rate of the target.
The comparison of the model performances in terms of the MSE is qualitatively similar to that shown in Figure 11, the single difference being that the error of the K-M approach stagnates from N = 8 on, as γ^T_l takes values very close to zero at lag 2^8 = 256. The optimal models obtained for N = 3 are shown in Table 3. A comparison with Table 1 illustrates the importance of not assuming a predefined AR model structure, as the optimal one clearly depends on the decay rate of the target.

Figure 1 :
Figure 1: Scheme with the different elements involved in the synthetic generation of a random process with an AR model.

Figure 3 :
Figure 3: Autocovariance function of an AR(3) model and target autocovariance function. The AR model was obtained by considering the autocovariance equations for lags l = 0, 1, 2, 3. The employed values of the target autocovariance function are highlighted.

Figure 4 :
Figure 4: Autocovariance function of an AR(3) model and target autocovariance function. The AR model was obtained by considering the autocovariance equations for lags l = 0 and l = [1, 2, 5]. The employed values of the target autocovariance function are highlighted.

Figure 6 :
Figure 6: Autocovariance function of an AR(5, j=[1,2,5]) model and target autocovariance function. The restricted AR model was obtained with the non-linear approach, see text for details. The employed values of the target autocovariance function are highlighted.

Figure 9
Figure 9 shows the components of vector j obtained for model GA-0, for different N values. Figure 10 shows the components of vectors j (left) and l (right) obtained for model GA-10, for different N values. Y-W and K-M models are included in both figures for comparison.

Figure 9 :
Figure 9: Components j_i of vector j obtained with the GA-0 model. Y-W and K-M models are included for comparison.

Figure 10 :
Figure 10: Components j_i of vector j (left) and l_i of vector l (right) obtained with the GA-10 model. Y-W and K-M models are included for comparison.

Figure 11 :
Figure 11: MSE obtained with the Y-W, K-M, GA-0 and GA-10 models, as a function of the number of regression terms considered in the model, N (logarithmic scale).

Figure 16: Target and VAR covariance matrix functions of the VAR(3) model obtained with the Y-W approach.
Figure 17: Target and VAR covariance matrix functions of the restricted VAR(5, [1,2,5]) model.

Figure 18 :
Figure 18: Non-dimensional autocovariance function of the longitudinal wind component along the wind direction in isotropic turbulence (continuous line), and target autocovariance function for ∆r̊ = 0.1245 (discrete values).

Figure 19 :
Figure 19: Non-dimensional one-sided von Karman spectrum together with the non-dimensional one-sided target spectrum obtained for different ∆r values.

Figure 20 :
Figure 20: Non-dimensional autocovariance function of the longitudinal wind component along the wind direction in isotropic turbulence (continuous line), and target autocovariance function for ∆r̊ = 0.01245 (discrete values).

Table 2 :
AR model order, p, for the different approaches and model parsimonies, N, for the two decay rates of the target (fast decay: ∆r̊ = 0.1245; slow decay: ∆r̊ = 0.01245).

Table 3 :
AR models obtained for the different approaches for N = 3 and slow decay rate of the target autocovariance function.