Toward a certified greedy Loewner framework with minimal sampling

We propose a strategy for greedy sampling in the context of non-intrusive interpolation-based surrogate modeling for frequency-domain problems. We rely on a non-intrusive and cheap error indicator to drive the adaptive selection of the high-fidelity samples on which the surrogate is based. We develop a theoretical framework to support our proposed indicator. We also present several practical approaches for the termination criterion that is used to end the greedy sampling iterations. To showcase our greedy strategy, we numerically test it in combination with the well-known Loewner framework. To this effect, we consider several benchmarks, highlighting the effectiveness of our adaptive approach in approximating the transfer function of complex systems from a few samples.


Introduction
Rational approximation techniques have been applied with great success in the field of surrogate modeling for frequency responses of dynamical systems, with notable applications in the synthesis of electric circuits, and in the structural response analysis for mechanical systems.In the field of model order reduction (MOR), a plethora of methods and algorithms have been developed with this aim in the last decades, including vector fitting (VF) [1,2], the Loewner framework (LF) [3,4], RKFIT [5], AAA [6], and minimal rational interpolation Toward a certified greedy Loewner framework with minimal sampling (MRI) [7].In such methods, one leverages a priori knowledge on the structure of the problem, namely, the rational dependence of the transfer function H(z) on the frequency z ∈ C. For this reason, a rational function is introduced as ansatz for the surrogate model.Its coefficients (the "degrees of freedom" of the surrogate) are then optimized by fitting the surrogate to the available data, usually in the form of transfer function samples: {(z j , H(z j ))} S j=1 .Alternative approaches like reduced basis and balanced truncation have also been shown to be very effective [8][9][10][11].However, they have the disadvantage of being intrusive, i.e., they require access to the matrices defining the dynamical system.This drastically reduces their range of application.For instance, the system matrices are unavailable when closed-source software is used to model the target transfer function.Similarly, intrusive methods cannot be applied when the transfer function is obtained via lab measurements, since, in this case, the system matrices are not even defined.In these settings, nonintrusive rational-approximation-based methods are paramount for obtaining surrogate models for the frequency response.
In order to guarantee an effective and accurate approximation, it is crucial to judiciously choose the number and locations of the sample points {z j } S j=1 used to build the surrogate model.To this aim, it is useful to consider the questions: how many sample points are necessary to attain a desired approximation accuracy?And where should they be placed?
In this work, we look at these questions from a practical "algorithmic" viewpoint (thus ignoring "optimality" concerns that are more pertinent to information theory), following the well-explored idea of greedy MOR [11].Specifically, assume that a starting set of S sampled frequencies {z j } S j=1 is given.We want to devise a constructive strategy for identifying the location of one new point z S+1 , in such a way that the surrogate built from samples at {z j } S+1 j=1 is (a priori ) the "best" among all possible surrogates built in this way.Otherwise said, given only the available samples, we wish to identify the unexplored frequency that will provide the most information on the not-yetunderstood features of the target H.We summarize this idea in Algorithm 1, where, for simplicity, we initialize the sample set as a single point.

3:
build the surrogate H from the available S samples find the next point z S+1 using a greedy criterion 8: compute H(z S+1 ) and add z S+1 to the sampled frequencies 9: end for In intrusive MOR for frequency-domain problems, the greedy sampling is commonly carried out by placing the next sample point at the location where the current approximation residual is largest.However, evaluating residuals is generally impossible in non-intrusive settings, and an alternative criterion for the greedy selection of sample points must be found.In this work, we derive one such criterion in Section 3. Specifically, with our proposed indicator, we aim at approximating the relative approximation error, a quantity that is notoriously difficult to pinpoint.This is especially the case in frequencydomain applications, since, in general, the inf-sup constant of the problem is not uniformly lower-bounded due to resonant behavior.
We note that, throughout this paper, we use the term "adaptivity" in a way that differs slightly from the customary one in rational approximation.Indeed, methods like RKFIT and AAA (whose first "A" actually stands for "adaptive") are able to adapt the so-called rational type of the surrogate, i.e., essentially, the degree of the approximation.VF has also often been endowed with strategies for adaptive selection of the rational type, which can be summarized as finding a balance between "interpolation" and "generalization" errors [12].However, all the above-mentioned methods for non-intrusive surrogate modeling assume that a sample set is provided in advance, over a grid of sampled frequencies {z j } S j=1 that is fine enough for the identification of the relevant features of the transfer function H. Building this initial database of S samples can be quite expensive, since it requires many queries of the high-fidelity transfer function H. Also, practitioners usually have to guess a large enough value S. In this work, we endeavor to remove this initial "fine sampling" phase.Instead, as exemplified by Algorithm 1 (and more in line with intrusive greedy MOR), we seek a strategy where the sample set is parsimoniously enriched as H is built.
The only prior works (that we could find) on comparable approaches for non-intrusive frequency-domain MOR are [13,14].Also, [15,16] explore similar ideas in intrusive settings.In the above references, the sampling is incremental and constructive.However, the greedy criteria driving the respective algorithms are only heuristic.In contrast, here we endeavor to provide more rigorous strategies, generalizing those developed for MRI in [17,18].
Before proceeding, we believe it important to mention the issue of noise: the samples H(z j ) might be inexact.This happens whenever the samples come from lab experiments, since they might be affected by measurement error.Also, samples are noisy when approximations (e.g., finite-element discretizations) are introduced in describing the system of interest.This step is necessary when modeling, for instance, distributed electrical devices.In such cases, the noise arises as a "discretization" error, which cannot always be neglected [19].In all these settings, interpolatory approaches might incur overfitting, resulting in wildly inaccurate surrogate models, e.g., due to Runge oscillations.More robust least-squares approaches are usually preferred in such cases.However, in any such framework (which includes oversampling as a measure against noise), a "truly parsimonious" adaptivity is usually impossible.We refer to [20] for a discussion on this issue.

Basics of rational MOR
We target a frequency-domain linear time-invariant descriptor system where E, A ∈ C n×n , B ∈ C n×m , and C ∈ C p×n .The frequency z is a complex scalar number, upon which the input U (z) ∈ C m , the state X(z) ∈ C n , and the output Y (z) ∈ C p depend.For all z ∈ C, we define the associated transfer functions for state and output, respectively, as (For conciseness, we refer to H as just "transfer function".)The frequency values for which H is unbounded (due to zE − A being singular) are the poles or resonating frequencies of the system.In this paper, we make use of the symbol ∥ • ∥ to denote the Frobenius norm for matrices.
As outlined above, we are interested in building an approximation of H using samples of it at some frequency values: {(z j , H(z j ))} S j=1 .Throughout the paper, S denotes the total number of available samples.In applications, the system dimension n is often large, and computing samples of the transfer function H is expensive due to the matrix inversion in Φ.For this reason, we try to keep S as low as possible.
We note that approximations of the state transfer function G are also possible.This case can be simply recovered by setting C = I n , the identity matrix of size n.Such setting is sometimes of interest when (1) is obtained by discretizing a PDE, e.g., the Helmholtz equation.In this case, the state X corresponds to the (discretized) "solution field" of the PDE.
In non-intrusive data-driven MOR methods, a rational approximation of the frequency response is built by fitting the sampled values of H.Both interpolation-based (e.g., LF [3] and MRI [7]) and least-squares-based (e.g., VF [1]) approaches are available in the literature.
All these approaches start by choosing a suitable format for rational functions.The classical choice involves expanding numerator and denominator in terms of a polynomial basis, e.g., monomials or Legendre polynomials: for some yet-to-be-determined sets of coefficients { P i } M i=0 ⊂ C p×m and { q i } N i=0 ⊂ C.More recently, this format has been (mostly) set aside due to its limited stability properties, in favor of the barycentric format [6,21]: The (distinct) support points {ζ j } j ⊂ C can be chosen arbitrarily.However, one should be careful about defining the value of the surrogate at such points, since both numerator and denominator are unbounded there.To solve this issue, one needs to eliminate a "removable discontinuity": This leads to a rational format that is equivalent to the usual one in (3).However, the barycentric format is (usually) much more reliable and convenient, since many operations that are numerically unstable can be carried out more safely.For instance, given S samples {(z j , H(z j ))} S j=1 , the interpolatory barycentric expansion characterizes all rational functions of type1 [S − 1/S − 1] that interpolate the data exactly, cf.(4).Notably, interpolation is achieved without involving any ill-conditioned Vandermonde matrix.Instead, to achieve interpolation in the barycentric form, one just needs to choose the sample points as support points!We refer to [21] for more details on the barycentric format.We restrict our focus to interpolation-based approaches, where all samples are reconstructed exactly: H(z j ) = H(z j ) for all j = 1, . . ., S. Without loss of generality, we can assume that the surrogate H is represented in interpolatory barycentric form (5). The coefficients q 1 , . . ., q S are found by imposing additional conditions, which depend on the specific MOR method.We mention here some options: • LF for SISO systems (p = m = 1).The barycentric coefficients are found by imposing interpolation conditions at additional frequencies z ′ 1 , . . ., z ′ S−1 .Note that only S − 1 additional points are necessary, since one of the degrees of freedom of the barycentric coefficients is fixed by prescribing a normalization constraint, usually (Note that the above-mentioned SISO LF is just a special case of this more general setting.)For MIMO LF, alternative strategies based on tangential interpolation of H are also available [3]. • MRI.The barycentric coefficients are found by solving a minimization problem involving only the original S sampled frequencies.Specifically, minimize MRI does not require any extra samples in addition to the initial S ones.However, spurious solutions might arise if the size of H, namely, pm, is too small, since S j=1 q j H(z j ) ≡ 0 can be trivially satisfied.For this reason, MRI is mostly recommended only for state approximation [7].

An ideal greedy strategy
When approximating the transfer function H, an "ideal" greedy algorithm is driven by the δ-adjusted relative error Above, δ is a small number (e.g., δ = 10 −8 ) that prevents division by 0 whenever the relative error is computed at a zero of the transfer function.Our discussion mostly generalizes to δ = 0 (relative error) and δ → ∞ (absolute error).See Section 6.
In this setting, the next sample point is z S+1 := arg max z ε( H, z), and the termination criterion is with tol being a user-specified tolerance.Note that this is a very idealized strategy: in general, the maximum relative error ε ∞ is infinite, since the surrogate H is unbounded near the surrogate poles.
In practice, the continuous maximization over z is replaced by its discrete version over a fine grid of S ′ test frequencies.In this context, the next sample point is chosen among the S ′ test frequencies, and the termination condition reads: all the S ′ test values of the error must be below the tolerance.This approach is extremely expensive, since evaluating the error at all test frequencies requires S ′ high-fidelity samples of the transfer function.For this reason, in projection-based approaches like reduced basis, it is common to employ residual-based error estimators instead of the actual error.
Such estimators cannot be developed for the transfer function, since no "residual" can be defined for H. Instead, one commonly considers the state X or its transfer function G. Specifically, in this latter case, we define the relative residual norm as with G being the current surrogate model for G.Note that ρ(G, •) ≡ 0. By relying on the so-called (in MOR) affine form of the original system (1), such residual can be efficiently computed at any frequency z, with a cost that is independent of the original system size n.See [11] for more details on this.
In a non-intrusive framework, residual-based estimators like ( 8) are unavailable: since it might be impossible to gather samples of the state X or of the state transfer function G, which are necessary to build G and to evaluate ρ.However, we can still use (8) to our advantage, as we do in the next section.

Non-intrusive error maximization
As showcased in Algorithm 1, the greedy algorithm requires two main ingredients: the identification of the next sample point and the termination criterion check.In this section, we focus on the former.In this context, we have described above how the next sample point should be selected as the location where the approximation error or residual norm is maximal.To make this process nonintrusive, we need to express such error measures only in terms of available quantities.To this effect, we show the following property, which generalizes a similar result presented in [7] for state approximation.
Proposition 1 Consider an interpolatory rational approximation of G, namely, There exists a z-independent constant γ such that the relative residual norm satisfies Proof Let Q(z) = S j=1 q j /(z − z j ) be the surrogate barycentric denominator, and r( G, z) = (zE − A) G(z) − B the residual.By multiplying r by Q(z), we obtain Now we add and subtract z j EG(z j ) from the numerator above and exploit the definition of G(z j ), cf.(2): The claim follows by taking the Frobenius norm and dividing by |Q(z)|∥B∥.Specifically, γ is the norm of the right-hand-side of (11), divided by ∥B∥.□ The above property states that the approximation residual for the state transfer function G equals, up to a z-independent proportionality constant, the inverse of the magnitude of the surrogate barycentric denominator Q(z) = S j=1 q j /(z − z j ).Specifically, the residual norm is large if and only if the surrogate denominator Q is small.
Note that, for Proposition 1 to hold, we just require the function G to be a rational function of type [S − 1/S − 1], interpolating the exact G at all the S support points.We can leverage this to obtain some useful insights on the approximation quality of non-intrusive methods.
Indeed, consider the surrogate H obtained by a generic interpolatory approximation-based method like LF or AAA, and assume that q 1 , . . ., q S are its barycentric coefficients, cf.(5).In the barycentric expansion of H, namely, (5), we can formally replace all occurrences of the transfer function H with the state transfer function G, leading to a new surrogate G as in (9), which is an approximation of G. (Note that such operations might be impossible in practice, since samples of G are generally unavailable!However, we are only proceeding formally here.)Proposition 1 reveals the exact form of the residual norm (8) of such new surrogate.
Specifically, assume that the residual norm ( 8) is used to drive Algorithm 1. Then (10) suggests that the approximation quality is best improved by adding new sample points at frequencies that minimize the magnitude of the denominator Q(z) appearing in the surrogate G.Such denominator is computable even in a non-intrusive setting.In fact, it is precisely the same denominator appearing in the surrogate H!
The resulting strategy is summarized by the rule: We note that, in order to apply (12), it is not necessary to compute the scaling constant γ.This is crucial for non-intrusive methods, since γ cannot be estimated non-intrusively.Before proceeding, we mention that we can employ the standard link between error and residual to obtain a relative error bound for G. Specifically, let K(•) denote the (Euclidean) condition number of a matrix, i.e., the ratio of its largest and smallest singular values.Then, , with γ as in Proposition 1.This might be of some interest for stateapproximation methods like MRI and reduced basis.

From state residual to output error estimation
A naturally arising question is: does it make sense to drive a greedy algorithm for the output transfer function H with a residual estimator for the state transfer function G? To answer this, we derive an H-oriented version of Proposition 1.
Proposition 2 Let H be an interpolatory barycentric function as in (5).The relative approximation error satisfies Above, the factor ∆ is with B ∈ C n×m and γ > 0 depending only on q 1 , . . ., q S , on z 1 , . . ., z S , and on the system matrices.Also, ∥•∥ ⋆ denotes the spectral norm of a matrix, i.e., its largest singular value.Additionally, assume that all eigenvalues of the pencil (A, E) are poles of H, and that their multiplicities as eigenvalues equal their orders as poles, i.e., that B and C are in general position with respect to the generalized eigenspaces of (A, E).Then, ∆ is uniformly bounded from above by a global (z-independent) constant ∆, depending only on the system matrices and on γ: ∆(z) ≤ ∆ for all z.
Proof First, we define G as in (9), by reusing the barycentric coefficients of H. Since H = CG and H = C G, we have Toward a certified greedy Loewner framework with minimal sampling with the residual r defined as in the proof of Proposition 1.The first claim (13) follows by applying (11).Specifically, expression ( 14) can be derived by setting q j G(z j ) and γ = ∥C∥ ∥ B∥.
Now it remains to bound ∆ from above.For simplicity, we avoid a rigorous derivation, opting instead to only sketch the remainder of the proof.
We rely on the upper bound in the right-hand-side of (14).First, we assume that z is far enough away from the poles of H (i.e., the spectrum of (A, E)).Then, (i) the numerator is bounded from above and (ii) the denominator is bounded from below by δ.
Otherwise, assume that z is close to a pole λ of H, whose order is ν.Then, both numerator and denominator are large, since both have poles at λ. Specifically, by our assumption, the pole λ has order ν for both numerator and denominator of ∆.As such, both numerator and denominator have a leading term of order |z − λ| −ν : for some α, β > 0, which are related to the norms of the residues of Φ (cf.( 2)) and of H at λ.As such, ∆ is uniformly bounded even on neighborhoods of the poles of H. □ According to Proposition 2, the error for the output transfer function behaves, with respect to z, almost like the residual for the state transfer function.The only additional z-dependent factor is ∆, which plays the role of "condition number" in moving from residual to error.Unfortunately, such term depends on the spectral properties of the system matrices, and cannot be evaluated non-intrusively.
Also, ∆ depends on z.As such, in general, the point with the largest error, namely, z ⋆ S+1 := arg max z ε( H, z), will be different from the point with the largest residual, namely, z S+1 in (12).However, Proposition 2 also states that ∆ is uniformly bounded.As such, we can hope that it does not vary too wildly.Intuitively, this "hope" is justified by the "balanced" structure of ∆, whose numerator and denominator both contain the resolvent (zE − A) −1 , as shown in (14).
Specifically, if ∆ is well-behaved (namely, if it only varies slightly over the considered frequency range), then we can expect the greedily built sets {z ⋆ j+1 } S j=1 and {z j+1 } S j=1 to be similar.Otherwise stated, we would like the (asymptotic) distribution of greedy sample points to be approximately the same with both criteria.We rely on this requirement (stated only informally here) to justify the use of our criterion (12) to drive the greedy sampling even when the target is minimizing the output approximation error ε.

Practical considerations for error maximization
Locating the next frequency to be sampled (denoted by z S+1 in Algorithm 1) is extremely efficient when using (12).Indeed, one can easily apply the "standard" greedy technique based on a set of test frequencies, already described in Section 2.1.To this aim, one chooses a fine set of S ′ test frequencies a priori.Then, at each greedy iteration, one evaluates the reciprocal of the indicator |Q| = | S j=1 q j /(• − z j )| at all test frequencies and finds the smallest entry of the corresponding size-S ′ vector.This is extremely efficient, since evaluating |Q| only involves scalar operations.
Alternative approaches are also possible, based on an explicit continuous minimization of |Q| on the whole frequency range, as opposed to just on S ′ discrete test frequencies.For this, it is crucial to observe that the barycentric denominator Q is the ratio of two polynomials, one in Lagrangian form and one in nodal form: Accordingly, |Q| attains its (global) minima at the roots of the numerator above, as long as they are distinct from the support points.Such roots λ can be efficiently computed by solving a generalized eigenvalue problem of size S + 1: See [21] for more details on this.

Greedy termination criteria
In choosing |Q| −1 as indicator, we are neglecting the additional scaling factors γ and ∆ in ( 10) and ( 13), respectively.The reason for doing this is that such constants are unavailable within the non-intrusive paradigm, as they depend on the high-fidelity system matrices.Consequently, so far, we have the ability to perform adaptive sampling, but we are unable to decide whether a given surrogate H is accurate enough, since we cannot estimate the magnitude of the error.As such, we do not yet have a termination criterion for Algorithm 1.
In the next sections, we discuss some options for doing this.

Maximum number or density of samples
The most straightforward solution to the above-mentioned problem is simply not having an error-based stopping criterion at all.In this case, one has to stop the greedy iterations only based on the number and location of the sampled frequencies.For instance, one could simply set a maximum number of samples S max .An alternative slightly more refined solution relies on setting a bound for the density of samples: the algorithm ends when too many samples are taken too close to each other.

Look-ahead error estimation
As already mentioned in Section 2.1, the ideal termination criterion ( 7) is unfeasible, since it requires evaluating the high-fidelity response H at all frequencies.In the same section, we have also described how the continuous maximization can be replaced by a discrete one, where the maximum is computed over a finite (but large) set of test frequencies.
As an even cheaper alternative, we can replace the continuous maximization in (7) with a single point evaluation of the relative error: Above, z ′ is some unexplored frequency, which should be carefully chosen so as to provide an insight on the largest approximation error: ideally, ε 1 ≈ max z ε( H, z).To this aim, we propose to select z ′ as the location that maximizes the error indicator |Q| −1 , i.e., z ′ := z S+1 .By Propositions 1 and 2, this value of z ′ is located where the residual is largest, and where the error is heuristically expected to be largest.Note that, since Q changes with H, such test point z ′ is different at each greedy iteration.
A second purpose is also served by choosing z ′ = z S+1 : evaluating the relative error at that point requires the sample H(z S+1 ).This is exactly the sample that is computed anyway, whenever the termination condition is not satisfied!This means that, by applying the 1-point criterion (16), only one expensive sample is wasted, namely, the final one used for testing.In effect, this means that the total "error estimation" cost in the greedy algorithm is just 1 high-fidelity sample, independently of the final number S of samples.

Look-ahead error estimation with memory
The main drawback of the previous idea is its lack of robustness.Indeed, we are approximating a max-norm by a single sample, albeit carefully (but heuristically) chosen.Consequently, by using ( 16), one incurs the risk of underestimating the maximum approximation error.A simple way of increasing the robustness of this approach is introducing a "memory" term: we terminate the greedy sampling loop only if the termination criterion ( 16) is satisfied by N memory greedy iterations in a row, with N memory ≥ 2 fixed in advance.We summarize this idea in Algorithm 2, which generalizes Algorithm 1.
As we will show numerically, this strategy, despite its simplicity, can be surprisingly effective.Essentially, this approach relies on the idea that, although the indicator |Q| −1 might occasionally poorly identify the location of the Algorithm 2 Surrogate modeling with greedy sampling and memory find the next point z S+1 using a greedy criterion compute H(z S+1 ) and add z S+1 to the sampled frequencies 15: end for largest error, it does not do it persistently: one or two "bad" estimations of the error maximum are usually followed by a "good" one.

Batch look-ahead error estimation
Another simple way of increasing the robustness of the look-ahead approach is adding more than one sample point to estimate the max-norm of the error: Again, the test frequencies z ′ 1 , . . ., z ′ N should be carefully placed so as to well identify the largest approximation error.To this aim, one can choose local maxima of the error indicator |Q| −1 .As already mentioned, these maxima can be found by solving the low-dimensional root-finding problem (15).Note that, once again, such test points change from one greedy iteration to the next.
At each greedy iteration, this "batch" approach for error estimation requires N extra expensive computations of the high-fidelity transfer function H.However, we observe that: • thanks to Proposition 2, there are reasons to believe that a small or modest value of N should be enough for a robust error estimation, since |Q| −1 should be fairly reliable at estimating the error; • the extra expensive computations are relatively few (∼ SN ), and they affect only the "offline" training phase of the surrogate.
Moreover, we note that, if the termination criterion (17) is not satisfied (i.e., if the surrogate is still too inaccurate), Algorithm 1 adds only a single new sample point at z S+1 .However, a modification of Algorithm 1 is possible, where all N test points z ′ 1 , . . ., z ′ N (among which z S+1 can also be found) are added to the training set.This comes at no additional computational cost, since their corresponding transfer function values are already available from the testing step.This can potentially remove the extra factor "N " in the cost of the offline training.For simplicity, we ignore this option here.

Randomized error estimation
The previous approach can be made even more robust, by choosing the test points z ′ 1 , . . ., z ′ N in ( 17) independently of the surrogate.Specifically, one can simply select the test points randomly, according to some (quasi-)random distribution.This has the advantage of being a (probabilistically) rigorous estimator, with theoretical bounds in probability [22].However, for a reliable error estimation, the number N of test samples will generally be much larger than the one from the Q-based "batch" method from the previous section: we need enough samples to approximate the max-norm well.This can significantly increase the computational cost of the training procedure.To mitigate this effect, one should fix the N test points once and for all, at the very beginning of the greedy sampling.In this way, the number of expensive test samples is a fixed overhead, which does not scale with S.
On top of the additional offline cost, the resulting approach has another drawback: the number N must be chosen a priori.At a fundamental level, this goes against the "adaptivity" brought forward by the present paper: how can N be chosen well, without (intrusive) knowledge of the underlying system?

Numerical examples
In this section, we showcase our approach by applying it to three different MIMO examples from the SLICOT library 2 .Our code is open-source, publicly available at https://github.com/pradovera/greedy-loewner.
In all our experiments, MIMO LF is used to build the surrogate transfer function H from the given data {(z j , H(z j ))} S j=1 .Specifically, we interpolate full p×m blocks of H as in (6), instead of applying the more "classical" MIMO LF based on tangential interpolation [4].
In all cases, we use our proposed greedy method from Section 3: at each greedy iteration, to find z S+1 , the indicator |Q| −1 is maximized over a fine grid of 10 4 geometrically spaced test frequencies.We always set tol = 10 −3 and δ = 10 −8 .

Example #1: MNA 4
We consider the impedance formulation of a 4-port electrical circuit, obtained by modified nodal analysis.This results in a descriptor system whose sizes are n = 980 and p = m = 4.We look at a range of imaginary frequencies z ∈ [3•10 4 , 3•10 9 ]ı, with ı the imaginary unit.The system response is shown in Fig. 1 (top), and, to approximate it, we apply our proposed greedy method.As   termination condition, we test two options: look-ahead error estimation and randomized error estimation.For the latter, we take N = 100 test frequencies z ′ 1 , . . ., z ′ N , drawn from a log-uniform distribution on the frequency range.As already mentioned, the random estimator comes with one main computational drawback: the additional overhead of computing the N test samples.
Both approaches converge in 9 greedy iterations.The sampled frequencies are chosen according to the same indicator, so that the surrogates obtained with the two methods are actually identical.Only the error estimates used for the termination condition are different.We do not display the surrogate transfer functions H in Fig. 1 (top), since they are indistinguishable from the exact H.We show the sampled frequencies in Fig. 1 (middle).In Fig. 1 (bottom), we also show the approximation error, which is uniformly below the tolerance.
Moreover, we test how accurate the two methods are at estimating the error.To this aim, we consider the following quantities: for the look-ahead method, cf. ( 16), and , where z ′ j ⋆ is the argmax in ( 17), (18) for the randomized method.We can see that both of them are simply rescaled versions of the indicator |Q| −1 .The scaling factors are chosen so that they interpolate the error estimate that each method employs: η LA (z S+1 ) = ε 1 for the look-ahead method, and η R (z ′ j ⋆ ) = ε N for the randomized method.Effectively, η LA and η R are the error estimators that the two methods use.(We call them so because, as opposed to indicators, they also include information on the error magnitude.) We compare the estimators with the exact error in Fig. 1 (bottom).We can observe that both approximate the error quite nicely.Specifically, we know from Proposition 2 that any discrepancy between error and estimators is due to the factor ∆. We note that η LA underestimates the error for large frequencies (z > 10 9 ).In comparison, η R is more robust, and never underestimates the error.
As a final remark, we note that the "look-ahead" approach was able to properly identify the "simplicity" of the system: less than 10 high-fidelity samples suffice in identifying H to the desired accuracy.The same can be said about the "randomized" approach, albeit to a lesser degree, due to the extra overhead related to the N testing samples.In comparison, had a non-adaptive approach like VF or AAA been chosen, one would have probably elected to take (much) more than 10 samples of H, incurring a higher training cost.

Example #2: tline
We consider the impedance formulation of a 2-port electrical circuit modeling a transmission line.The resulting descriptor system has sizes n = 256 and p = m = 2.We look at a range of imaginary frequencies z ∈ [10 7 , 10 15 ]ı.The system response is shown in Fig. 2 (top).We can observe strong resonating dynamics in the frequency band [10 10 , 10 12 ]ı.To approximate H, we apply our greedy method, using the simple look-ahead criterion as termination condition.
The greedy algorithm converges in 20 iterations.We display the corresponding surrogate H in Fig. 2 (top) and the adaptively chosen sample points in Fig. 2 (middle).We can see that the approximation quality is rather poor for some frequencies: H fails to identify and approximate several poles of H.We can understand the reason for this in Fig. 2 (bottom), where we display both the exact error and its estimate η LA : the error is grossly underestimated over a substantial portion of the frequency range.From a theoretical viewpoint, this appears to be due to the high concentration of resonating frequencies around z = 3 • 10 11 ı, which makes the factor ∆ in (13) vary across several orders of magnitude.In our specific experiment, the error estimator fails to identify the magnitude of such factor.For this reason, the maximum of the error estimator is not close to the maximum of the actual error, resulting in an early termination of the greedy sampling.
In Section 4, we presented several ways of counteracting this issue.Here, we choose the simplest one: introducing a memory term.Specifically, we set N memory = 3 in Algorithm 2. With this modification, the algorithm stops after 38 iterations instead of just 20.
The corresponding results are in Fig. 3.In Fig. 3 (bottom) we see that, again, the error is underestimated over much of the frequency range.However, the addition of the memory term artificially increased the number of greedy iterations.Thanks to this effect, the approximation error is now uniformly below the tolerance.Also, in Fig. 4 we display the "truth value" of the termination criterion (i.e., whether ( 16) is true or false) as the greedy iterations proceed.We can see that, despite a few short "false-negative" streaks, the estimator seems to be able to recognize and correct its own mistakes.
In this benchmark, we note that adaptivity is very effective at identifying the sub-range of frequencies where the response varies the most.If we had employed uniform or log-uniform sampling, which are the golden standards for VF and AAA, we estimate that at least 500 samples would have been necessary to attain the prescribed accuracy.

Example #3: iss
Lastly, we test our algorithm on a discrete frequency-domain structural response model of a module of the international space station.The resulting system has sizes n = 270 and p = m = 3.We look at a range of imaginary frequencies z ∈ [0.1, 50]ı.The system response is shown in Fig. 5 (top).We can observe that this is essentially a more complicated version of the tline example: the resonating frequencies are more, and the resonating dynamics have less uniform strengths.
We apply our method with look-ahead error estimation, with "memory depth" N memory = 3.The algorithm carries out 100 greedy iterations before terminating.We show the corresponding results in Fig. 5 (bottom).As in the previous case, the error is underestimated at many frequencies.
As a way to improve the quality of the estimator, we test the "batch error estimation" idea from Section 4: at each greedy iteration, we compute the exact error at the locations z ′ 1 , . . ., z ′ N of the N = 5 largest local maxima of the error indicator |Q| −1 .Then we take the maximum of such 5 errors and we compare it with tol to determine whether to continue or not.
This criterion is more strict than the "1-point" one used before.As a result, 12 extra greedy iterations are carried The corresponding error estimator η BLA , defined as in (18), is shown in Fig. 6.We can observe that the error bound is much more reliable.The resulting approximation error is almost uniformly below the tolerance.Note, however, that, compared to the simple "1point" method, the "batch" method had to compute approximately (N − 1)S extra high-fidelity samples for testing purposes, effectively increasing the offline training cost by a factor N .
As a third alternative, we also consider the "randomized error estimation" idea, using N = 100 test frequencies, log-uniformly spaced over the frequency range.Similarly to the original 1-point indicator, the greedy algorithm terminates too early (this time after 87 iterations), with the error being underestimated by the estimator η R , cf. (18).A larger value of N might improve the reliability of the randomized estimator, but, of course, this information would not have been available a priori.Finally, we compare the three error estimation strategies, by looking at the behavior of the estimators η LA , η BLA , and η R as the greedy iterations proceed.The results are shown in Fig. 7.We can observe that the look-ahead estimators η LA and η BLA often coincide, since the single test frequency used for η LA in (16) belongs to the batch of test frequencies used for η BLA in (17).Notably, both look-ahead estimators appear to oscillate as the iterations proceed.Ultimately, the oscillations are due to the fact that the indicators rely on evaluations of the error at test frequencies that vary at each iteration.The oscillations are particularly severe for the 1-point estimator, which also has several "downward spikes" below the tolerance level, symptoms of the underestimation of the error magnitude.
On the other hand, the randomized estimator η R seems to be more stable and less oscillatory.This different behavior is justified by the fact that η R relies on relative errors computed at test frequencies fixed in advance, which do not change throughout the greedy iterations.From this point of view, we can expect the randomized estimator to achieve a higher robustness, compared to the look-ahead ones.Still, this increased robustness of η R does not necessarily imply reliability, as evidenced by the greedy algorithm terminating too early due to underestimated errors.

Discussion and conclusions
In this work, we have proposed a non-intrusive error indicator for the rational approximation of frequency responses.The indicator is based on the denominator of the rational surrogate in barycentric form, and allows building rational approximations of transfer functions in an incremental greedy fashion, even in non-intrusive settings.The issue of choosing the greedy termination criterion was also discussed, and several approaches were presented for this purpose.
On one hand, our contribution can be compared to other "adaptive" approaches in the standard sense, like AAA.With respect to these methods, ours has the great advantage of not requiring an initial fine sampling.This leads to a lower cost in the training phase, making our method superior especially when evaluating the frequency response is expensive.On the other hand, our error indicator can be supported by a partial theory.This is an improvement over comparable state-of-the-art heuristic error indicators, like those proposed in [13][14][15][16].
Note that we do not claim our adaptive method to build a rational surrogate having minimal order, e.g., in the sense of [23], nor optimality properties, e.g., in the sense of [24].In particular, the final number of sample points may be larger than strictly necessary to achieve the target tolerance.This is due to two effects: (i) our error indicator is not an estimator, so that each new sample point may be placed sub-optimally, compared to the actual maximum of the error, which is out of reach in a non-intrusive setting (and also in most intrusive ones); (ii) greedy approaches, even when using intrusive error estimators, are only guaranteed to yield a sub-optimal approximation [25].In our view, thanks to its generality, our approach has significance despite its sub-optimality, since optimality is often incompatible with useful algorithmic features like non-intrusiveness and sampling efficiency.
Concerning the value of δ in the definition of the adjusted relative error ε, we mention that our discussion generalizes to δ = 0, which corresponds to the usual relative approximation error.However, in this case, ∆ may be unbounded near the zeros of H.Moreover, δ → ∞ is also allowed, which, by scaling tol → δ • tol, corresponds to a greedy algorithm based on the absolute approximation error as opposed to the relative one.In this case, ∆ may be unbounded near the poles of H.
In the scope of improving the stability and reliability of our error estimator, future research directions include: (i) a more careful study and/or estimation of the z-dependent ∆ factor in the error bound, (ii) the development of more reliable termination criteria, possibly improving the "batch" approach, and (iii) a comparison of our proposed method with the above-cited state-of-the-art alternatives, in terms of performance, stability, and effectiveness.
Finally, we mention the possibility of applying the proposed greedy method within the context of parametric problems, for instance by following the nonintrusive parametric MOR framework introduced in [26].

S j=1 |q j | 2
= 1.•LF for MIMO systems (pm > 1), least-squares LF, and AAA.The barycentric coefficients are found by solving a (linearized) least-squares fitting problem on a set of test frequencies:

Fig. 1
Fig.1Results for the MNA 4 example.Top plot: some entries of the exact transfer function H. Middle plot: adaptively sampled frequencies with their index number (in order of addition).Bottom plot: relative approximation error and error estimators.

Fig. 2 Fig. 3 Fig. 4
Fig. 2 Results for the tline example.Top row: exact transfer function H. Middle row: adaptively sampled frequencies.Bottom row: relative approximation error and error estimator.The same results are shown in the two columns of plots: the right figures are zooms on the "most interesting" frequencies.

Fig. 6
Fig. 6 Results for the iss example with batch error estimation.Top plot: adaptively sampled frequencies.Bottom plot: relative approximation error and error estimator.

Fig. 7
Fig.7Evolution of the error estimator for the iss example.

1 :
choose the first sampled frequency z 1 and compute H(z 1 ) 2: choose the memory depth N memory ≥ 1 and set n memory = 0 ▷ new! 3: for S = 1, 2, . . .do Results for the iss example.Top plot: some entries of the exact transfer function H. Middle plot: adaptively sampled frequencies.Bottom plot: relative approximation error and error estimator.