Stochastic modeling of stationary scalar Gaussian processes in continuous time from autocorrelation data

We consider the problem of constructing a vector-valued linear Markov process in continuous time, such that its first coordinate is in good agreement with given samples of the scalar autocorrelation function of an otherwise unknown stationary Gaussian process. This problem has intimate connections to the computation of a passive reduced model of a deterministic time-invariant linear system from given output data in the time domain. We construct the stochastic model in two steps. First we employ the AAA algorithm to determine a rational function which inter-polates the z -transform of the discrete data on the unit circle and use this function to assign the poles of the transfer function of the reduced model. Second we choose the associated residues as the minimizers of a linear inequality constrained least squares problem which ensures positivity of the transfer function’s real part for large frequencies. We apply this method to compute extended Markov models for stochastic processes obtained from generalized Langevin dynamics in statistical physics. Numerical examples demonstrate that the algorithm succeeds in determining passive reduced models, and that the associated Markov processes provide an excellent match of the given data.


Introduction
The stochastic realization problem concerns the question whether a given stationary Gaussian process in continuous time has a realization in terms of a Markov process of higher dimension, cf.Lindquist and Picci [21].In this paper we refer to stochastic modeling, when an approximating Markov process is determined numerically from given time samples of the autocorrelation function of the stationary process in question, independent of whether or not a stochastic realization exists.This is also known as the inverse problem of stationary covariance generation, cf.Kalman [20] and Anderson [2].
Our interest in stochastic modeling is driven by applications from statistical physics.Consider, e.g., the classical example of the equilibrium dynamics of a particle in a heat bath (Zwanzig [29]).Assume that the velocity of the particle is being measured, and an effective stochastic model for its dynamics is sought.The full Langevin dynamics of the particle together with the constituents of the heat bath provides a stochastic realization of this process, but this is not tractable analytically and much too large and costly to simulate, anyway.Rather, one is interested in a small extended Markov system with a few handful of auxiliary variables, the simulation of which reproduces the main features of the particle's dynamics.
The stochastic realization problem has strong connections to the theory of deterministic time-invariant linear dynamical systems, and the stochastic modelling problem is related to the corresponding model reduction aspect -without any given information about the underlying system, be it finite dimensional or not.The major difference to the common deterministic problem setting is the quality and amount of data: for applications like the one described above the number of time samples may be less than a hundred, and they are noisy.
As will be explained in Section 2 the stochastic problem comes with the additional catch that it is indispensable that the reduced models be passive in the systems theory terminology.Passivity is a constraint that is easiest to formulate in the frequency domain; it is difficult to achieve this property in the time domain when the use of the Laplace transform is prohibited by the quality of the data.Of course, the need for passive models is also relevant in many deterministic applications, but so far there is no available off-the-shelf algorithm which applies to our setting; see Section 2 for a brief review of the pertinent literature.
In this paper we restrict ourselves to scalar problems and present a method which accounts for passivity, but nevertheless operates in the time domain; this way we are able to control the quality of our approximation of the original process.The method is fairly simple to implement and requires only a small number of linear systems to be solved.It employs in a first step the AAA algorithm for rational interpolation by Nakatsukasa, Sète, and Trefethen [22] to determine the eigenvalues of a reduced system matrix, followed by a constrained linear approximation to settle the remaining free parameters in a second step.The (in)equality constraints make sure that passivity is achieved for high frequencies, which is the troublesome frequency regime in our applications.As we demonstrate by numerical examples with data from a statistical physics application, our algorithm is feasible for the treatment of significantly larger data sets than could have been handled previously.
The outline of the paper is as follows.In Section 2 we specify the stochastic modeling problem that we consider and draw the connections to linear systems theory.In Section 3 we review the AAA algorithm and its potential use for model reduction purposes.Subsequently, in Section 4, we develop the algorithm that we propose to compute approximations of the given autocorrelation function, which have all the requirements that are needed to construct a corresponding Markov model.Numerical examples are presented in Section 5. To be self contained we show in Sections 6 and 7 how to set up the extended Markov system: Although the material in these two sections is mostly general folklore, some of the details are difficult to extract from the literature and the presentation is adapted to our needs.Finally, in an appendix, we discuss the consistency of the constrained approximation problem which is considered in Section 4.

The stochastic modeling problem
For t ∈ R and s ≥ max{0, −t} let be the autocorrelation function of a real-valued centered stationary Gaussian process Y = Y (t), t ≥ 0, with positive variance, and assume that we are given 2n (possibly noisy) samples y ν ≈ y(ντ ) , ν = 0, 1, . . ., 2n − 1 , of y on some equidistant time grid with spacing τ > 0. The stochastic model of Y that we want to construct is formulated in terms of an Ornstein-Uhlenbeck equation with a stable matrix A ∈ R m×m and a scalar Brownian motion W acting in the direction g ∈ R m .The goal is to choose A and g in such a way that the scalar component X of the stationary solution of (2.1) approximates the original process Y in the sense that 2) The m − 1 components of Z ∈ R m−1 are auxiliary (dummy) variables for this purpose and of little interest in practice.We remark that there is no loss of generality in restricting the attention to a scalar Brownian motion in (2.1).This is a consequence of the Positive Real Lemma (see Section 7), as the process Y is itself a scalar one.We refer to Pavliotis [23] for a general reference concerning the stochastic differential equation (2.1), its stationary solution, and stationary processes in general.We will make use of the fact that the covariance matrix Σ ∈ R m×m of the stationary solution of (2.1) is determined by the Lyapunov equation and that the autocorrelation function of X is given by cf. [23,Section 3.7].Here and throughout we denote by e 1 the first Cartesian basis vector in the (real or complex) Euclidean space, the dimension of which is determined by the context.Likewise we write I for the associated identity matrix.
In view of our goal (2.2) we exclude systems (2.1), for which the autocorrelation function of X is identically zero.We may further assume without loss of generality that X and Z are stochastically independent, for this can always be achieved by a coordinate transformation, i.e., by representing the auxiliary variables in a different basis.Then for some σ 2 > 0 and some Hermitian positive semidefinite matrix Σ 0 ∈ R (m−1)×(m−1) , and (2.4) simplifies to The autocorrelation function of any second order stationary stochastic process is a function of positive type, cf., e.g., [23,Section 1.2], and hence, the Fourier transform of φ is nonnegative for every ξ ∈ R by Bochner's theorem.Vice versa, if φ is a nonnegative function then there is a vector g ∈ R m , such that the stochastic differential equation (2.1) has a stationary solution which satisfies (2.6); see Section 7 for details.
If the approximation X ≈ Y were exact, i.e., if φ = y, then y is the output of the deterministic dynamical system ẋ(t) = Ax + σ 2 e 1 u , with homogeneous initial data, where u = u(t) is a delta distribution at t = 0.In control theory, u represents the control (or the input) of the system.Associated with (2.8) is the so-called transfer function In view of (2.7) and Bochner's theorem we observe that κ has a nonnegative real part on the imaginary axis, and since A is stable, κ is analytic in C + , the open right complex half plane.Such functions are termed positive real, and when κ is positive real, then the system (2.8) is called passive.We can thus rephrase our stochastic modelling problem as the task to construct a passive deterministic system such that the output is in good agreement with the given samples y ν .
In contrast to control theory, however, we have no control on u; the input is necessarily given by a delta distribution.Since this particular input is not localized in frequency space, it prohibits, for example, the use of an algorithm suggested by Cherifi, Goyal, and Benner [10] for finding a passive model that is compatible with given time-domain data.Other works which deal with the construction of passive reduced models, e.g., [9,16,25], have proposed fixes of preliminary non-passive models by modifying their coefficients to shift negative values of φ which are present in finite frequency bands; these algorithms, however, are neither guaranteed to succeed, nor can they treat negative values of φ in unbounded frequency bands.Unfortunately, the latter is the main cause of difficulties in our application, because the Fourier transforms of autocorrelation functions which are of interest to us, vanish at infinity.This also rules out a method by Fazzi, Gugliemi, and Lubich [13], which requires such a fix in a preliminary step.
Other papers, e.g., [11,14,15], have suggested nonlinear optimization to modify the system coefficients in order to find nearby systems which satisfy the so-called Lur'e equations (see (7.3) below) that can be used to characterize passivity.Those methods have subtle convergence issues, are quite complicated to formulate, and their impact on the data fit in the time domain is not transparent.The algorithm which we propose here is much easier to implement, and a mean square approximation (2.2) of the given data is its primary objective.

The exponential approximation problem
We will stipulate the assumption that the matrix A in (2.1) is diagonalizable (we will briefly elaborate on this assumption in Remark 3.1).Then the right-hand side of (2.6) is a so-called Prony series, i.e., a real-valued linear combination * φ(t) = m j=1 α j e tλj for t ≥ 0 (3.1) of complex exponentials with some appropriate weights α j ∈ C, j = 1, . . ., m.This function decays exponentially because the λ j are the eigenvalues of A, and since A is stable, those belong to C − , the open left complex half plane.
To achieve the goal (2.2) we want to determine a Prony series φ which satisfies in a mean square sense.To this end we need to select an appropriate number m of terms in (3.1) and choose corresponding exponents λ j and weights α j , j = 1, . . ., m.This is a long-standing delicate numerical problem, cf., e.g., Varah [27], because (i) it is nonlinear in the parameters λ j , and (ii) the outcome is very sensitive to perturbations in the data.In particular, as the exponents have negative real parts, the given data y ν carry less and less additional information, the higher the value of ν.One can therefore expect the necessary number m of terms in (3.1) to drive the error in (3.2) below a given tolerance to be considerably smaller than n; note, however, that for m ≥ n it is generically possible to realize (3.2) with equality, as there are 2n equations to satisfy and 2m parameters to choose.We emphasize that we do not claim or attempt to resolve a presumably valid Prony series representation of the true autocorrelation function, which is a highly illconditioned problem.For our purposes it is sufficient to approximate the data by some Prony series representation, which is less ambitious and much better conditioned.
A common way to approach the exponential approximation problem (3.2) is via the generating function (or z-transform) of the data.Stipulating the very mild restriction that the samples (y ν ) ν∈N are absolutely summable, this function f is analytic in the exterior of the unit disk and extends continuously to the unit circle.Imagine that (3.2) holds with equality.Then the series can be rewritten as These z j lie inside the unit disk (and are nonzero), and hence the domain of convergence of the generating function extends to the exterior of some disk of radius ρ < 1 around the origin.It coincides with a rational function in the complex plane with a zero at infinity and m poles in the points z j .And vice versa: If f is a rational function with m distinct first order poles z j ̸ = 0 in the unit disk and corresponding residues α j , and if f is vanishing at infinity then the associated Prony series φ of (3.1) with interpolates the coefficients y ν of the Laurent series (3.3) at all grid points t ν = ντ , ν ∈ N 0 .We mention that the choice of the complex logarithm in (3.4) has no impact on the values of the Prony series at the grid points, as is obvious from (3.1).However, depending on which branch of the logarithm has been selected, the function φ may oscillate in between the grid points.Assuming that the sampling rate τ has been chosen small enough to capture all relevant frequencies of the target function y, the appropriate branch to choose in (3.4) is the one with imaginary parts in (−π, π).Further, if some pole z j happens to be negative, then this pole should be mapped onto two complex conjugate exponents λ ± j = (log |z j | ± iπ)/τ , with α ± j = α j /2 as corresponding weights, to obtain a real-valued approximation (3.1); we emphasize that the special case of negative poles within the unit disk calls for further attention at a later stage of the algorithm; see Remark 4.2 below.
The task (3.2) can therefore be recast as a problem of approximating the generating function f by a rational one, which vanishes at infinity, and then choosing φ to be the Prony series associated with this rational approximation.For example, one can compute the Padé approximation of f of order (m − 1, m) at infinity; the associated Prony series then interpolates the given data y ν , ν = 0, . . ., 2n − 1.The resulting algorithm is the classical Prony method for exponential interpolation, cf., e.g., Weiss and McDonough [28], or Plonka et al [24].This is the route that we have followed in [8].The resulting algorithm works reasonably well for twenty or so data points, but for larger or noisy data sets the resulting Prony series often fails to be of positive type.† Then there is no associated Markov system (2.1) which satisfies (2.6).We briefly mention that Bai and Freund [5] have suggested an ad hoc modification of their Padé approximation scheme to obtain a Prony series of positive type, but they apply the Padé scheme in the frequency domain and not in the domain of the z-transform.
An alternative way of computing rational approximations of the generating function can be based on the AAA algorithm ‡ .To this end one can adapt a suggestion by Derevianko, Plonka, and Petz [12] and first evaluate the available 2n leading terms of (3.3) on an equidistant angular grid on the unit circle, e.g., by using a fast Fourier transform of order 2n of the samples {y ν }.This gives data where ω = e iπ/n . (3.5) The AAA algorithm now proceeds by choosing an appropriate index set N k ⊂ {0, . . ., 2n − 1} associated with m k of these grid points on the unit circle, defines a corresponding parameterized rational interpolant and optimizes the m k free complex parameters w ν to approximate the remaining data pairs (3.5) by solving the linear least squares problem minimize among all coefficients w ν with ν∈N k |w ν | 2 = 1.It follows from (3.7) that any vector w ∈ C m k with minimizing weights is a singular vector associated with the smallest singular value of the so-called (rectangular) Loewner matrix † By some abuse of denomination we refer to a Prony series of positive type, if the Fourier transform of its even extension to all of R is of positive type.
We like to mention in passing that Claus Schneider has contributed in an early paper [26] to the use of the barycentric representation for rational interpolation, and refer to [22] for a detailed historical account of the ideas behind the AAA algorithm.
The AAA algorithm is a greedy iterative scheme, i.e., in each iteration k = 1, 2, . . ., the set of interpolation points is extended until the overall match (3.7) is below a given tolerance.Presuming that the final rational function has only first order poles z j ̸ = 0 within the unit disk, these poles and their residues yield as before an associated Prony series φ which satisfies (3.2).
In order to ensure the rational approximation to be real, i.e., that r k (z) ∈ R for z ∈ R, we initialize N 1 = {0, n} with the indices ν in (3.5) corresponding to z = ±1, and in each iteration we extend the index set N k by the two indices ν and 2n − ν of one complex conjugate pair of grid points, hence m k = 2k.Then it can be shown that in each iteration we obtain a real rational interpolant (3.6) which minimizes (3.7), provided we choose the weight vector w ∈ C m k appropriately.To be specific, assume that {ω ν } ν∈N k is any set of minimizing parameters for (3.7) computed by a singular value decomposition of the corresponding Loewner matrix.Assume further that ν, ν ′ ∈ N k correspond to two complex conjugate grid points of the grid (3.5).Then it is not too difficult to see by reordering the summation in (3.7) that the minimal value of (3.7) is also attained if the weight w ν is replaced by w ν ′ , and w ν ′ by w ν , respectively.Accordingly, the vector w ′ , which contains all these exchanged weights for all complex conjugate pairs of grid points in N k , must also be a singular vector of the same Loewner matrix associated with the same singular value, and this implies that (w + w ′ )/∥w + w ′ ∥ 2 is yet another vector in this singular subspace; the entries of this latter vector, however, are complex conjugate, when the corresponding data points are, and hence the associated rational function (3.6) is real.Remark 3.1.We have made the assumption that the system matrix A of (2.1) be a diagonalizable matrix.If this fails to be true, then some of the coefficients of the associated autocorrelation function φ may be polynomials α j (t) instead of being constant.This in turn implies that the corresponding generating function f has poles of higher order.And vice versa: If the rational approximation (computed by whatever means) has poles of degree two or more, then this calls for a system matrix A with nontrivial Jordan blocks.

⋄
Although the AAA algorithm performs well in approximating the generating function, there is no clear interpretation of the sense in which the associated Prony series will approximate the data in (3.2), nor is there any guarantee that this Prony series will be a function of positive type.As we have emphasized before, the latter is mandatory to obtain the desired Markov model (2.1).

The new algorithm for the stochastic modeling problem
Recall that it is our objective to approximate the given data by a Prony series in a mean square sense, subject to the constraint that this function be of positive type.The major difficulty in achieving this goal stems from the fact that there is no characterization (nor understanding) of the nonlinear manifold of data which correspond to Prony series of positive type.We therefore relax our aim in two ways: (i) we preassign the number m and the exponents λ j , j = 1, . . ., m, of the exponentials to be used in (3.1); (ii) we optimize the remaining free coefficients α j , j = 1, . . ., m, so as to minimize the mean square fit (3.2), subject to the weaker constraint that the Fourier transform of the Prony series is positive near infinity.
These goals are realized in two independent steps of our algorithm.Concerning item (i) we apply in a first step the variant of the AAA method described in the previous section.We take the poles z j of the resulting rational approximation (which are symmetric with respect to the real axis), ignore the associated residuals -because the final weights of the Prony series φ are subject to item (ii) -and choose the exponents λ j of φ via (3.4).As has been mentioned in Remark 3.1 higher order poles z j lead to non-diagonalizable coefficient matrices A in (2.1); although it is very unlikely that this case will ever show up in numerical computations, we simply ignore the order of the poles for item (i).We further remark that some poles may fail to lie in the open unit disk.Those are called spurious poles, because they typically come with very tiny residuals and have small impact on the quality of the approximation; so we can simply eliminate them.The same is true for a possible pole at z = 0.In fact, in our numerical implementation we eliminate all poles whose residues are negligible relative to y 0 = f (∞).For the moment we also make the assumption that none of the remaining poles is negative (see Remark 4.2 below).
Item (ii) is motivated by the fact that in the applications that we have in mind the true autocorrelation function y belongs to L 1 (R), and hence, y(ξ) converges to zero for |ξ| → ∞.Accordingly, violations of the positivity of φ are most likely to happen near infinity, in particular, as tiny spurious oscillations in the data due to noise will mostly affect high frequency components of φ.Further, the positivity of φ near infinity can be checked with a finite number of linear constraints on the coefficients α j , which simplifies their optimization.In fact, a straightforward computation shows that and expanding the right-hand side for large frequencies ξ we obtain for |ξ| > max |λ j |, with It follows that φ(ξ) ≥ 0 for all sufficiently large frequencies ξ, if and only if the leading term of (4.2) is positive near infinity, i.e., if condition holds true for some k * ∈ N 0 .As mentioned before, these are finitely many linear constraints on the searched for coefficients α j .Thus, to realize item (ii) we specify the weights α j via the linear least squares problem minimize α j e ντ λj 2 subject to the (in)equality constraints (4.4) .
The solution of this problem constitutes the second step of our algorithm.Note that the quadratic objective function of (4.5) is strictly convex as long as we restrict ourselves to m ≤ 2n; compare, for example, [24].Since the constraints also define a convex set Q of admissible parameters, problem (4.5) has a unique minimizer, provided that the constraints are consistent.
The same argument as in the previous section can be used to show that the minimizing weights come in complex conjugate pairs, so that the corresponding Prony series is real-valued.The solution of (4.5) can be found by implementing a loop over the number k * = 0, 1, 2, . . . of active constraints in (4.4).We initialize k * = 0 and consider first the unconstrained least squares problem minimize ψ(a) (4.7) for the given exponents λ j ∈ C − , j = 1, . . ., m.If the solution a † ∈ C m of (4.7) satisfies β 0 = β 0 (a † ) > 0 then we have found the unique solution of problem (4.5).Otherwise the solution vector a of (4.5) must satisfy the additional equality constraint We thus let k * = 1 in this case and minimize ψ(a) subject to the equality constraint (4.8).This problem has a unique solution, which we denote again by a † ∈ C m for simplicity; it can be determined by using the constraint to eliminate one of the free variables and then solving a single linear system of (normal) equations for the remaining ones; compare, e.g., Björck [7, Section 5.1] for stable numerical implementations.If the corresponding value of β 1 = β 1 (a † ) > 0 then the parameter vector a † is the unique solution of (4.5); if not, we again increase k * by one, i.e., add another active constraint via (4.4), repeat the optimization process anew, and so on.This loop over k * terminates whenever the solution of (4.5) has been found, or when the set of constraints becomes inconsistent, i.e., when Due to the Vandermonde type structure of the constraints this happens exactly when k * = m, i.e., when the number of active constraints has reached the number of free parameters.If this happens then the algorithm has failed.
Such a failure indicates that the Prony series approximation (3.1) does not have enough degrees of freedom or that the data are too noisy.To account for the former, one can enforce a smaller tolerance for (3.7) in the AAA algorithm in view of item (i) above, because this will typically give more poles, i.e., more terms for the Ansatz (3.1).To cope with too much noise in the data, one option is to use a larger grid spacing τ , or to come up with some individual sophisticated treatment to reduce the noise in the data.
We also emphasize that even when (4.5) has a solution, then this is no guarantee that the resulting Prony series is of positive type, for the Fourier transform may still be negative in some bounded frequency band.It remains open how to best proceed in such a situation.Remark 4.1.Depending on the application it may be appropriate to append additional constraints to (4.5).For example, in many situations the variance y(0) of the underlying stationary process Y can be measured more accurately than the other autocorrelations, or may be known beforehand from theoretical considerations.In this case it may be reasonable to enforce the approximation φ ≈ y to be exact at the origin, i.e., by adding the constraint In other applications, for example the one in Section 5, it may be known a priori that the true autocorrelation function is differentiable, in which case y ′ (0) must vanish due to the symmetry of y.In this case one may wish to add the constraint which means that β 0 = 0, compare (4.3).Accordingly, (4.4) can only hold for some k * ≥ 1, and in this case we initialize k * = 1 when executing the loop of our algorithm for the constrained least squares problem for the first time.We will show in the appendix that the inhomogeneous constraint (4.9) and the equality constraints in (4.5) are consistent as long as k * < m. ⋄ Remark 4.2.So far we have excluded the case that the rational approximation determined by the AAA algorithm has negative poles.Now let us assume that some pole z j happens to be negative.Then the corresponding two exponents differ by 2iπ/τ , and the exponentials e ντ λ ± j in (4.5) are indistinguishable.Accordingly, the objective function ψ of (4.6) is convex, but not strictly convex.If (4.9) is the only equality constraint in (4.5) then there is an infinite family of Prony series which solves (4.5):If φ is one of them, then φ(t) + γ e µj t sin(πt/τ ) , t ≥ 0 , with γ ∈ R, |γ| small, are further ones.By tuning γ, one can change φ ′ (0) to any value one might like -provided the positivity of φ is not being violated.Which of them is "best" (and in what sense), however, is not clear.
In our implementation we enforce this derivative to be zero in this case, i.e., we add the constraint (4.10).If the rational function has only one negative pole then this heals the problem: the objective ψ is strictly convex over the set Q of admissible parameters defined by (4.4) with k * ≥ 1.If the rational function has ℓ ∈ N negative poles, then a similar fix is possibly by constraining k * ≥ ℓ in (4.4); see Theorem A.4 in the appendix.

⋄ 5 Numerical example: Motion of a colloid within a fluid
For our numerical results we focus on applications where the underlying stationary process Y : R → R satisfies a generalized Langevin equation with an even integral kernel γ ∈ L 1 (R) of positive type.This sort of equation is used, for example, in statistical physics as a coarse-grained model for the velocity Y of a macroparticle, a colloid say, dispersed in a fluid, where the so-called memory friction γ represents the interactions of the macroparticle with the constituents of the fluid.In (5.1) m is the mass of the macroparticle and the external forcing F is a random process, whose autocorrelation function is given by for a system in thermodynamical equilibrium, where β > 0 is the inverse temperature, cf., e.g., Jung and Schmid [19].Moreover, Y is a Gaussian process whenever the random forcing is.The numerical implementation of (5.1) is involved and expensive, and therefore approximate Markov models (2.1) are often sought for simulations of the coarsegrained system.Typically autocorrelation data of Y are used to find such Markov approximations.Different techniques have been proposed for this in the literature, cf., e.g., [8], and the references given there.Most of these work well for a handful of auxiliary variables only and a corresponding limited number of data points.As will be demonstrated below, the new algorithm of Section 4 can handle much larger data sets, while the size of the resulting Markov models remains manageable.
For the generalized Langevin equation (5.1) it is not difficult to see that the autocorrelation function y of the stationary process Y satisfies the delay differential equation cf., e.g., [17].Note that the variance 1/(βm) of the process corresponds to the average kinetic energy of a particle of mass m in thermodynamical equilibrium, and for physical consistency it suggests itself to prescribe φ(0) to be exactly this value.Further, it follows readily from (5.2) that ẏ(t) → 0 as t → 0, and hence, the even autocorrelation function y is differentiable with ẏ(0) = 0. We therefore include the two additional constraints (4.9) and (4.10) of Remark 4.1 in our algorithm for the determination of the weights α j of the Prony series (3.1).The data for the numerical examples have been borrowed from our previous work [8,18], where more details about the physical setting and the particular application can be found, as well as the technical details concerning the generation of these data.For the detection of spurious poles of the rational approximations in the first step of our algorithm we have used a tolerance of 10 −3 for the absolute values of the residues of the poles, relative to the value of y 0 .Example 5.1.As a first test case we consider the autocorrelation function from [18] shown as black solid line in the left panel of Figure 1.We assume that we are given 2n = 50 samples of this autocorrelation function with grid spacing τ = 0.05.These data points correspond to the black circles in the plot on the left hand side.
For this example the Padé approximation which is computed in the Prony method mentioned in Section 3 would have 25 poles, and if the associated Prony series is of positive type then the corresponding Markov system would have 24 auxiliary variables.Of course, some of these poles may have negligible residues.With a relative tolerance of 10 −3 for the data fit (3.7) the AAA algorithm determines a rational approximation of the generating function with only five poles.The optimization of the corresponding weights α j of the Prony series succeeds in the very first execution of the loop (with counter k * = 1) to find the global solution of (4.5).One can check that the Fourier transform of the associated Prony series has no real zeros, so this Prony series approximation of the autocorrelation function is of positive type.
With the smaller tolerance 0.5 • 10 −3 for (3.7) the resulting AAA approximation has seven poles.This time the optimization of the corresponding weights requires more than one execution of the loop, because β 1 turns out to be negative initially.The loop terminates for k * = 2 with the global solution of (4.5).Again the resulting Prony series is of positive type.
The graphs of the two approximations are displayed as dashed lines in Figure 1.In the panel on the left they can hardly be distinguished from the given autocorrelation function.Note that they correspond to autocorrelation functions of the first component X of a Markov system (2.1) with four, respectively six, auxiliary variables only.The panel on the right of Figure 1 reveals that the autocorrelation function of the approximation with four auxiliary variables is somewhat less accurate for small times.
We mention that with even smaller tolerances for (3.7) the AAA approximations develop spurious poles and the final mean square fit does not improve any further.⋄ Example 5.2.To compare the outcome of the new method with the numerical results in [8, Section 3.2], we also consider the corresponding smaller data set with 2n = 20 samples and slightly larger grid spacing τ = 0.06 of the same autocorrelation function.
Again, for a relative tolerance of 10 −3 the rational function computed with the AAA algorithm has m = 5 poles, but this time one of these poles is negative and leads to two associated terms of the Prony series, see Remark 4.2.Since we restrict ourselves to k * ≥ 1 anyway, the constrained least squares problem (4.5) has a unique solution, which is, indeed, obtained for k * = 1 as in Example 5.1.When the tolerance for (3.7) is 10 −4 then the associated rational function has seven poles.Again, one of these poles is negative and the optimal weights are obtained for k * = 1.
As in Example 5.1 both Prony series can be checked to be of positive type; their graphs are included in Figure 1 as dotted lines.Note that they use other interpolation data than the approximations in Example 5.1, with samples restricted to t ≤ 1.2.As can be seen from the plot on the right the quality of the approximations of the corresponding Prony series deteriorate for larger times.
Due to the fact that one of the poles of the rational function has been negative, the corresponding two Markov systems have one auxiliary variable more than the ones of Example 5.1, namely five and seven, respectively.To compare the results with the ones of [8] we point out that the autocorrelation function constructed in [8] with the Prony type algorithm is of similar quality, but requires nine auxiliary variables.is taken from [8, Section 3.3], where it is termed "medium noise setting"; as the name suggests the corresponding data are somewhat noisy.With the same grid spacing τ = 0.06 as in Example 5.2 the method of [8] failed to produce an approximation of positive type when the first 2n = 50 data points were used, and also for 2n = 48 data points.It was successful for 2n = 46 data points, though, but despite the m = 13 terms of the approximating Prony series, the fit of the autocorrelation function wasn't too good in that case, either; compare the middle panel of Figure 6 in [8].
With our new method and each of the three data sets the default tolerance of 10 −3 for the fit (3.7) of the AAA algorithm is sufficient to obtain Markov models for which the autocorrelation functions in question are in good agreement with the given data.The graphs of two of them, namely the ones for 2n = 46 and 2n = 50 data points are displayed in Figure 2, together with the underlying autocorrelation function (black line) and the data samples (black circles).The associated Prony series have five terms (2n = 46) and nine terms (2n = 50), respectively; in both cases the rational approximations had two further poles, which were decided to be spurious.The two Prony series approximate the data equally well, both in terms of the mean square and absolute error.The fact that more poles are necessary for 50 data points may reflect the aforementioned difficulties of the Prony method for that situation.
The approximation shown in Figure 2 for 2n = 48 grid points has been obtained for the slightly smaller tolerance 0.5 • 10 −3 for (3.7).Its data fit is about a factor of two better than for the other two models.In this case the AAA rational approximation had 13 poles, four of which were considered spurious, while two poles in the exterior of the unit disk just barely failed the criterion of being spurious poles.One may suspect that poles outside the unit circle may be more common when the data are noisy.Of course these two poles had to be eliminated to make the algorithm work, much to the success of the corresponding approximation, which thus had seven terms, eventually.
The plot on the right-hand side of Figure 2 zooms into a detail of the left-hand plot, in which the noise in the autocorrelation data is striking.Take note that the given data samples correspond to times t < 3.As can be seen from this plot the Prony series approximations provide a smoother and more reliable extrapolation of these data to times t > 3 as the measured values of the autocorrelation function do.This is probably due to the fact that the corresponding functions are of positive type, i.e., are "more physical".

⋄
These examples show very well that the new method has a great potential to provide excellent Prony series approximations of positive type.It is also fascinating to see the approximation power of the parameterized Prony series, despite the fact that there is only a handful of free parameters to optimize.In other words, the AAA algorithm does a very good job in selecting appropriate sets of exponents for their parameterizations.

Computation of the system matrix
Once the Prony series φ of (3.1) has been computed, the next step is to set up an associated system matrix A which satisfies the second equation in (2.6).
We want this matrix to be real because the simulation of the Markov system (2.1) should give real paths.To this end we define the real Jordan canonical matrix associated with the exponents λ j ∈ C − of (3.1).In (6.1) the negative exponents λ 1 , . . ., λ r make up the leading part of the diagonal, and two-by-two rotation matrices follow on the remaining part of this block diagonal matrix, one for each pair of complex conjugate exponents, like λ r+1 and λ r+2 = λ r+1 , and so on.Next we introduce the two (real) m-dimensional with entries v j = 1 and w j = α j for the indices j = 1, . . ., r corresponding to the real exponents, and and so on, for the complex conjugate ones.Then one readily checks that In particular, for t = 0 this gives by virtue of (2.6).Now one can construct a nonsingular matrix V ∈ R m×m which satisfies (See, for example, Bai and Freund [4, Lemma 1] for an explicit construction.)With the help of (6.2) and ( 6.3) we thus arrive at the representation (2.6), namely with the stable matrix 7 Determination of the driving noise direction The final step of our construction of the Markov system (2.1) consists in finding the direction g ∈ R m of the driving Brownian motion, so that the autocorrelation function of the stationary solution of (2.1) satisfies (2.6).For the ease of presentation we will make the stronger assumption that the computed Prony series φ is of strict positive type, i.e., that φ(ξ) > 0 for all ξ ∈ R ; we refer to Anderson and Vongpanitlerd [3] for a treatment of the general case.We further assume that the coefficients α j of (3.1) are all different from zero, and that φ can be written in the form (2.6) for some (stable) matrix A ∈ R m×m , e.g., by choosing A as in Section 6.Then, in the systems theory terminology, the representation (2.9) of κ is a minimal realization of this transfer function.From (7.1) and (2.7) it follows that κ is analytic in a neighborhood of the closed right half complex plane with Re κ(iξ) > 0 for all ξ ∈ R .
Given these properties the Positive Real Lemma states that the singular Lur'e equations AΣ + ΣA T = −gg T , Σe have a solution -consisting of a symmetric positive definite matrix Σ ∈ R m×m and a vector g ∈ R m ; cf.[3].If the driving force of the Ornstein-Uhlenbeck equation (2.1) operates in the direction g then Σ is the covariance matrix of its stationary solution, compare (2.3); the second equation in (7.3) implies that Σ can be written in the form (2.5), and hence, the autocorrelation function of the stationary solution satisfies (2.6) as desired.
Note that we not only need g if we want to simulate the process X, but also Σ for the initial value X(0) Z(0) ∼ N (0, Σ) (7.4) of the stationary solution of (2.1).To compute Σ and g, i.e., to solve the singular Lur'e equations numerically, we reduce them to a set of Lur'e equations of lower dimension, cf.Anderson [1].Since this is an essential step of our algorithm we provide some more details, as those are rarely found in the pertinent literature.We represent A in the block form , and a 0 ∈ R. It follows from (2.6) that because functions of positive type attain their maximum at the origin.Inserting (7.5) and (2.5) into (7.3),we see that the driving force vector g has to have the form where g 0 ∈ R m−1 and Σ 0 ∈ R (m−1)×(m−1) satisfy the system of equations This is the general form of the Lur'e equations, with the singular case corresponding to a 0 = 0. Again, the Positive Real Lemma states that there is a vector g 0 and a symmetric positive definite matrix Σ 0 , which solve (7.7), if A 0 is stable and the associated transfer function where we have used (2.9) for the final step.Since κ is analytic in C \ C − , its real part is harmonic, and it follows from (7.2) and the maximum principle that it is positive in the closed right half plane.This implies, that (i) κ 0 is analytic in C \ C − , and (ii) its real part also satisfies (7.2).Accordingly, κ 0 is, indeed, positive real.
To see that the lower right block A 0 of A in (7.5) is stable, we assume to the contrary that µ is an eigenvalue of A 0 , which does not belong to C − .Let x r and x l be associated right and left eigenvectors of A 0 , respectively.Then it follows from (7.5) that neither c T 0 x r nor x * l b 0 may vanish, for otherwise [0; x r ] or [0; x l ] would be right/left eigenvectors of A for the same eigenvalue; but this is not possible because A is stable.Therefore, since neither c T 0 x r nor x * l b 0 vanish, it follows from (7.8) that κ 0 has a pole at ζ = µ, but this is not possible as we have discussed above.Accordingly, all eigenvalues of A 0 belong to C − .We thus have verified that (7.7) has a solution.
If a 0 > 0 then a solution of the Lur'e system (7.7) can be obtained numerically by first solving the Riccati matrix equation for Σ 0 , and then computing g 0 from the second equation in (7.7), cf.[3].This is a well-studied numerical problem; compare, e.g., Bini, Iannazzo, and Meini [6].
On the other hand, when a 0 = 0, i.e., in the case of a singular Lur'e equation (7.7), then we first note that σ 2 0 = c T 0 b 0 > 0 .(7.10)In fact, it follows from (7.7) that in the singular case and equality can only occur when Σ 0 c 0 = 0.But this means that b 0 = 0, and then A is singular according to (7.5), which would contradict the stability of A. Therefore, (7.10) is valid, and we can choose (as in Section 6) a similarity transformation V ∈ R (m−1)×(m−1) with so that we can rewrite Replacing A by (the stable matrix) V A 0 V −1 , we can then proceed as above to reduce the solution of the singular Lur'e system (7.7) of dimension m − 1 to a Lur'e system of dimension m − 2, and so forth.Take note that Re κ(iξ) = −σ 2 e T 1 A(A + ξ 2 I) −1 e 1 is an even function of ξ ∈ R, and it has an asymptotic expansion of the form Re κ for some k ∈ N and some γ k ̸ = 0; as usual, we denote by o(1) a term which converges to zero in the respective limiting process.It thus follows from (7.9) and (2.9) that 1) .that p( • ; π 0 ) has at least k − 1 zeros (counting multiplicities) in C + , and the same argument applies to C − .This concludes the proof.
Proof.We rewrite the constraints in matrix form Ba = y 0 e 1 , (A.7) where a = [α 1 , . . ., α m ] T ∈ C m as before and is a complex matrix with k * +1 ≤ m rows and m columns.Accordingly, the constraints are consistent, if B has full row rank, i.e., if the null space of B * is trivial.Note that we can add and subtract complex conjugate columns of B to transform B into a real matrix of the same rank.This means that if B should fail to have full row rank then there is a nontrivial real vector π = [π 0 , π 1 , π 3 , . . ., π 2k * −1 ] T ∈ R k * +1 , such that B T π = 0.The latter is the case, if and only if the associated real polynomial p of degree 2k * − 1 of (A.2) is vanishing for z = λ 1 , . . ., λ m .This means that p has (at least) m roots in C − .Now we distinguish two cases.If π 0 = 0 then p is odd, and for each zero in C − there is a corresponding one in C + .This would imply that p has 2m ≥ 2k * + 2 zeros (at least).On the other hand, if π 0 ̸ = 0 then Theorem A.1 states that p has k * − 1 zeros (at least) in C + ; accordingly, p has at least k * − 1 + m ≥ 2k * zeros in this case.Since p has degree 2k * − 1 this shows that p must be the zero polynomial in either case, and hence, all its coefficients are zero, i.e., π = 0.This is a contradiction, which shows that the null space of B * is trivial, and hence, B has full row rank.
Remark A.3.Note that we have made the assumption that y 0 = E Y 2 > 0. In this case the system (A.1) and (4.9) of constraints becomes inconsistent, whenever k * ≥ m.
To see this it remains to establish the inconsistency of the first m + 1 constraints.We use the arguments and the notation from the proof of Corollary A.2.For the system of the first m + 1 constraints the matrix B of (A.8) has one more row than columns, and its first m rows have full rank according to Corollary A.2.It follows that the null space of B T is one-dimensional.Let π ∈ R m+1 be a nontrivial vector from this null space and p of (A.2) be the corresponding polynomial of degree 2m − 1.Then p has m zeros λ 1 , . . ., λ m ∈ C − .Assume now that π 0 = 0. Then p is odd, and hence, it has m further zeros in C + .This makes 2m zeros, which exceed the degree of p. Hence, if π 0 = 0 then p must be the zero polynomial which gives a contradiction.
Accordingly, B * π = B T π = 0, i.e., the null space of B * contains a vector π, whose first entry is nonzero.But then the right-hand side of (A.7) cannot belong to the range space of B, because e 1 is not orthogonal to the null space of B * .This shows that the system (A.7) is inconsistent, whenever the number of constraints exceeds the number of free variables.

⋄
The final result of this appendix deals with Remark 4.2 and considers the case that the rational function determined by the AAA algorithm has ℓ ∈ N (distinct) negative poles.Theorem A.4.Let Q be the set of parameter vectors a = [α 1 , . . ., α m ] T ∈ C m which satisfy (4.4) with the additional constraint that k * ≥ ℓ.Then ψ of (4.6) is strictly convex over Q. .
The matrix on the right-hand side is the square Vandermonde matrix of µ 2 1 , . . ., µ 2 ℓ , which is nonsingular because the µ 2 l are pairwise different, since all µ l are negative.Therefore the null space of the matrix on the left-hand side is trivial, and hence, the coefficients γ 1 , . . ., γ ℓ are all equal to zero, showing that a = a ′ in (A.10).Therefore, ψ is strictly convex over Q.