Data-Driven Modeling of Linear Dynamical Systems with Quadratic Output in the AAA Framework

We extend the Adaptive Antoulas-Anderson (AAA) algorithm to develop a data-driven modeling framework for linear systems with quadratic output (LQO). Such systems are characterized by two transfer functions: one corresponding to the linear part of the output and another to the quadratic part. We first establish the joint barycentric representations and the interpolation theory for the two transfer functions of LQO systems. This analysis leads to the proposed AAA-LQO algorithm. We show that interpolating the transfer function values on a subset of the samples, while imposing a least-squares fit on the rest, yields reliable data-driven LQO models. Two numerical test cases illustrate the efficiency of the proposed method.


Introduction
Model order reduction (MOR) is used to approximate large-scale dynamical systems with smaller ones that ideally have similar response characteristics to the original. This has been an active research area and many approaches to MOR have been proposed. We refer the reader to [1,3,6,9,37,39] and the references therein for an overview of MOR methods for both linear and nonlinear dynamical systems.
MOR, as the name implies, assumes access to a full-order model to be reduced, in most cases in the form of a state-space formulation obtained via, e.g., a spatial discretization of the underlying partial differential equations. The reduced-order quantities are then computed via an explicit projection of the full-order quantities. However, in some cases, access to the original (full-order) dynamics is not available. Instead, one has access to an abundant amount of data, such as input/output measurements, snapshots of the state variable in the time domain, or evaluations of the transfer function(s) in the frequency domain. In this case, the goal is to construct an approximant (surrogate model) directly from data, which we refer to as data-driven modeling. This is the framework we consider in this paper. Such scenarios arise frequently in many applications, such as circuit modeling, where distributed/integrated circuits characterized by many components are modeled from frequency-domain data using, e.g., the S-parameters [27]. Structural dynamics is another example. Even when a mathematical model of a highly complex physical structure is not available, the structural (displacement and velocity) time- and frequency-domain responses can be measured accurately at specific locations on the structure, thanks to advances in testing capabilities and the near-ubiquitous deployment of high-bandwidth sensing [30]. We refer the reader to [3,5,8,12,16,26,36] and the references therein for more details on data-driven modeling.
Specifically, we focus on data-driven modeling of linear dynamical systems with quadratic output (LQO). In our formulation, data correspond to frequency-domain samples of the input/output mapping of the underlying LQO system, in the form of samples of its two transfer functions: the first transfer function is a single-variable function and the second a bivariate one. For this data set, the proposed framework first develops the barycentric rational interpolation theory for LQO systems to interpolate a subset of the data and then extends the AAA algorithm [32] to this setting by minimizing a least-squares measure over the remaining data. We note that system identification of general nonlinear systems has been a popular topic. In particular, we mention the special case of identifying linear systems with nonlinear output or input functions, e.g., the so-called Wiener [43] and Hammerstein models, respectively. Significant effort has been devoted to the identification of such models; see, for example, [17,23] and the references therein. Nevertheless, the methods mentioned above operate in the time domain, while in this paper we focus on frequency-domain data. We point out that the frequency-data-based Loewner framework was recently extended to identifying Hammerstein models in [24].
The rest of the paper is organized as follows: We discuss LQO systems and their transfer functions in Sect. 2, followed by a review of barycentric rational approximation for linear systems and the AAA algorithm in Sect. 3. Next, we develop the theory for barycentric representation and multivariate interpolation for LQO systems in Sect. 4. Based on this analysis, in Sect. 5, we present the proposed algorithm, AAA-LQO, for data-driven modeling of LQO systems. The numerical experiments are given in Sect. 6 followed by the conclusions in Sect. 7.

Linear Systems with Quadratic Output
In state-space form, linear dynamical systems with quadratic output (LQO systems) are described as

x'(t) = A x(t) + b u(t),  y(t) = c^T x(t) + K (x(t) ⊗ x(t)),   (2.1)

where A ∈ R^{N×N}, b, c ∈ R^N, K ∈ R^{1×N^2}, and the symbol ⊗ denotes the Kronecker product, i.e., for the vector x = [x_1 x_2 ··· x_N]^T ∈ R^N, we have x ⊗ x = [x_1^2 x_1x_2 x_1x_3 ··· x_1x_N ··· x_N^2]^T ∈ R^{N^2}. In (2.1), x(t) ∈ R^N, u(t) ∈ R, and y(t) ∈ R are, respectively, the states, input, and output of the LQO system. The quadratic part of the output in (2.1), K(x(t) ⊗ x(t)), can be rewritten as x(t)^T M x(t), where M ∈ R^{N×N} satisfies K = (vec(M))^T and vec(·) denotes the vectorization operation. In some cases, c = 0 in (2.1), and thus the output has only the quadratic term.
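As a small illustration of the quadratic output map, the sketch below evaluates y = c^T x + K(x ⊗ x) for a given state x; the 2-state vectors c and K are made up purely for illustration.

```python
import numpy as np

def lqo_output(c, K, x):
    """Output map of an LQO system at state x: y = c^T x + K (x kron x)."""
    return c @ x + K @ np.kron(x, x)

# Hypothetical 2-state example: K encodes x1^2 + x2^2 (i.e., M = I_2)
c = np.array([1.0, 0.0])
K = np.array([1.0, 0.0, 0.0, 1.0])

x = np.array([2.0, 3.0])
y = lqo_output(c, K, x)   # 2 + (4 + 9) = 15
```

Note that K is the row-vector form (vec of the matrix M) acting on the Kronecker square of the state, exactly as in (2.1).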
The class of dynamical systems (2.1) considered in this paper is particularly useful when the observed quantity of interest is given by the variance or deviation of the state variables from a reference point [7]. Particular examples are random vibrations analysis [29] and applications in which the observed output is expressed as an energy or power quantity [7].
Several projection-based MOR methodologies have been already proposed for LQO systems. More precisely, balanced truncation-type methods were considered in [7,35,41], while interpolation-based methods were used in [19,42]. All these methods explicitly work with the state-space matrices A, b, c and K in (2.1). The main goal of this work is to develop a data-driven modeling framework for LQO systems where only input-output measurements, in the form of transfer function evaluations, are needed as opposed to a state-space representation. Therefore, we first discuss the transfer functions of this special class of dynamical systems.

Transfer Functions of LQO Systems
Many classes of nonlinear systems can be represented in the time domain by generalized kernels, as in the classical Wiener or Volterra series representations. Generically, an infinite number of kernels appears in such a series, one for each homogeneous subsystem. For more details we refer the reader to [38,43]. For the LQO system (2.1), the nonlinearity is present in the state-to-output equation only, and one can write the input-output mapping of the system in the frequency domain using two transfer functions: (i) one corresponding to the linear part of the output, i.e., y_1(t) = c^T x(t), and (ii) one corresponding to the quadratic part of the output, i.e., y_2(t) = K(x(t) ⊗ x(t)). These transfer functions were recently derived in [19] using their time-domain representations. In the next result, we introduce and re-derive them for the completeness of the paper and to illustrate how they naturally appear.
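For reference, the two transfer functions of (2.1) take the forms H_1(s) = c^T(sI − A)^{−1}b and H_2(s, z) = K[(sI − A)^{−1}b ⊗ (zI − A)^{−1}b] (cf. [19]). A minimal numerical sketch for evaluating them follows; the diagonal matrices below are illustrative stand-ins chosen so the values are easy to check by hand.

```python
import numpy as np

def eval_H1(s, A, b, c):
    """H1(s) = c^T (s I - A)^{-1} b, the linear-part transfer function."""
    return c @ np.linalg.solve(s * np.eye(A.shape[0]) - A, b)

def eval_H2(s, z, A, b, K):
    """H2(s, z) = K [(s I - A)^{-1} b  kron  (z I - A)^{-1} b]."""
    xs = np.linalg.solve(s * np.eye(A.shape[0]) - A, b)
    xz = np.linalg.solve(z * np.eye(A.shape[0]) - A, b)
    return K @ np.kron(xs, xz)

# Illustrative diagonal example
A = np.diag([-1.0, -2.0])
b = np.ones(2)
c = np.array([1.0, 0.0])
K = np.array([1.0, 0.0, 0.0, 1.0])   # corresponds to M = I_2
```

For this example, H_1(s) = 1/(s + 1) and H_2(s, z) = 1/((s+1)(z+1)) + 1/((s+2)(z+2)).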

Barycentric Rational Approximation for Linear Systems and the AAA Algorithm
For an underlying function H(·) : C → C, e.g., the transfer function of a single-input/single-output (SISO) linear dynamical system, assume the following set of measurements:

{H(s_1), H(s_2), ..., H(s_{N_s})} at the distinct sampling points {s_1, s_2, ..., s_{N_s}}.   (3.1)

Partition the sampling points into two disjoint sets:

{s_1, ..., s_{N_s}} = {ξ_1, ..., ξ_n} ∪ {ξ̂_1, ..., ξ̂_{N_s−n}}.   (3.2)

We will clarify later how this partitioning is chosen. Based on (3.2), define the sampled values and the corresponding data sets

h_k = H(ξ_k) for k = 1, ..., n,  and  ĥ_i = H(ξ̂_i) for i = 1, ..., N_s − n.   (3.3)

Define the rational function r(s) in the barycentric form [11], a numerically stable representation of rational functions:

r(s) = ( Σ_{k=1}^{n} w_k h_k / (s − ξ_k) ) / ( Σ_{k=1}^{n} w_k / (s − ξ_k) ),   (3.4)

where ξ_k ∈ C are the sampling (support) points and the weights w_k ∈ C are to be determined. By construction, the degree-n rational function r(s) in (3.4) is a rational interpolant at the support point set {ξ_k}, i.e.,

r(ξ_k) = h_k for k = 1, 2, ..., n,   (3.5)

assuming w_k ≠ 0. Then, the freedom in choosing the weights {w_k} can be used to match the remaining data {ĥ_i} in an appropriate measure. Assuming enough degrees of freedom, [2] chooses the weights {w_k} to enforce interpolation of the {ĥ_i} as well, by computing the null space of the corresponding divided difference matrix, thus obtaining a degree-n rational function interpolating the full data (3.1). We skip the details of the conditions that guarantee the existence and uniqueness of such a rational interpolant and refer the reader to [2,3] for details.
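The interpolation property (3.5) of the barycentric form (3.4) holds for any choice of nonzero weights; this is easy to verify numerically. The support points, values, and weights below are arbitrary illustrations.

```python
import numpy as np

def bary(s, xi, h, w):
    """Evaluate r(s) = (sum_k w_k h_k/(s - xi_k)) / (sum_k w_k/(s - xi_k))."""
    d = s - xi
    if np.any(d == 0):   # at a support point, r interpolates by construction
        return h[int(np.argmin(np.abs(d)))]
    t = w / d
    return (t @ h) / t.sum()

xi = np.array([1.0, 2.0, 3.0])       # support points
h  = np.array([4.0, 5.0, 6.0])       # sampled values h_k = H(xi_k)
w  = np.array([0.3, -0.7, 1.1])      # arbitrary nonzero weights
```

Whatever nonzero weights are chosen, bary(xi[k], ...) returns h[k]; the weights only change the values away from the support points.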
The Adaptive Antoulas-Anderson (AAA) algorithm [32], on the other hand, elegantly combines interpolation and least-squares (LS) fitting. In the barycentric form (3.4), which interpolates the data {h_k} by construction, AAA chooses the weights {w_k} to minimize an LS error over the data {ĥ_i}. Note that the LS problem over {ĥ_i} is nonlinear in the weights {w_k}, since these weights appear in the denominator of r(s) as well. AAA solves a relaxed, linearized LS problem instead. Writing r(s) = p(s)/q(s) with p(s) = Σ_k w_k h_k/(s − ξ_k) and q(s) = Σ_k w_k/(s − ξ_k), for a sampling point ξ̂_i, AAA uses the linearization

ĥ_i − r(ξ̂_i) ⇝ ĥ_i q(ξ̂_i) − p(ξ̂_i) = Σ_{k=1}^{n} w_k (ĥ_i − h_k)/(ξ̂_i − ξ_k),   (3.6)

leading to the linearized LS problem

min_{‖w‖_2 = 1} Σ_{i=1}^{N_s − n} | ĥ_i q(ξ̂_i) − p(ξ̂_i) |^2.   (3.7)

AAA is an iterative algorithm and builds the partitioning (3.2) using a greedy search. Assume that at step n, AAA has the rational approximant r(s) as in (3.4) corresponding to the partitioning (3.2), where the weights {w_k} are selected by solving (3.7). AAA updates (3.2) via a greedy search by finding the point ξ̂_i for which the error |r(ξ̂_i) − ĥ_i| is the largest. This sampling point is then added to the interpolation set, the barycentric rational approximant r(s) in (3.4) is updated accordingly (it now has degree one higher), and the new weights are computed, as before, by solving a linearized LS problem. The procedure is repeated until either a desired order or an error tolerance is reached. For further details, we refer the reader to the original source [32]. The AAA algorithm has proved very flexible and effective, and has been employed in various applications such as rational approximation over disconnected domains [32], solving nonlinear eigenvalue problems [28], modeling of parametrized dynamics [13], and approximation of matrix-valued functions [20].
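The greedy loop just described can be condensed into a short sketch for the scalar case, following [32]: the next support point maximizes the current error, and the weights come from the smallest right singular vector of the Loewner matrix of the linearized problem. The test function and data are illustrative only.

```python
import numpy as np

def aaa(Z, F, nmax=10, tol=1e-12):
    """Minimal AAA sketch: greedy support selection + linearized LS weights
    via the smallest singular vector of the Loewner matrix (cf. [32])."""
    Z, F = np.asarray(Z, complex), np.asarray(F, complex)
    mask = np.ones(len(Z), bool)                 # True = still in the LS set
    R = np.full(len(Z), F.mean())                # current approximant values
    xi, h, w = None, None, None
    for _ in range(nmax):
        j = np.argmax(np.where(mask, np.abs(F - R), -np.inf))  # greedy pick
        mask[j] = False
        xi, h = Z[~mask], F[~mask]
        C = 1.0 / (Z[mask, None] - xi[None, :])                # Cauchy matrix
        L = F[mask, None] * C - C * h[None, :]                 # Loewner matrix
        _, _, Vh = np.linalg.svd(L)
        w = Vh[-1].conj()                                      # min. singular vector
        R[mask] = (C @ (w * h)) / (C @ w)
        R[~mask] = h
        if np.max(np.abs(F - R)) <= tol * np.max(np.abs(F)):
            break
    return xi, h, w
```

For data sampled from a low-order rational function, the loop recovers it after a few greedy steps, since the Loewner matrix then has a nontrivial null space.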

Barycentric Representations for LQO Systems
To develop interpolating barycentric forms for H_1(s) and H_2(s, z), we first need to specify the data corresponding to the underlying LQO system. The first transfer function H_1(s) is a single-variable rational function and, as in Sect. 3, we sample H_1(s) at the distinct points {s_1, ..., s_{N_s}} to obtain the data set

{H_1(s_1), H_1(s_2), ..., H_1(s_{N_s})}.   (4.1)

The second transfer function H_2(s, z), on the other hand, is a function of two variables. Therefore, in agreement with the data (4.1), we sample H_2(s, z) on the corresponding rectangular grid:

{H_2(s_i, s_j)} for i, j = 1, 2, ..., N_s.   (4.2)

Partition the full set of sampling points into two disjoint sets

{s_1, ..., s_{N_s}} = {ξ_1, ..., ξ_n} ∪ {ξ̂_1, ..., ξ̂_{N_s−n}},   (4.3)

and define the sampled values (measurements):

h_k = H_1(ξ_k) for k = 1, ..., n, and ĥ_i = H_1(ξ̂_i) for i = 1, ..., N_s − n,   (4.4)

and

h^{(2)}_{k,l} = H_2(ξ_k, ξ_l) for k, l = 1, ..., n.   (4.5)

Then, the goal is to construct a data-driven LQO system directly from these samples, without access to the internal dynamics of the underlying system. The partition (4.3) and the error measure used in approximating the data will be clarified later. First we show how the data in (4.1) and (4.2) can be used to develop barycentric-like representations corresponding to an LQO system. We will use the notation r_1(s) for the rational approximation to H_1(s) and r_2(s, z) for that to H_2(s, z).
Proposition 4.1 Given the data (4.4), define the barycentric form

r_1(s) = ( Σ_{k=1}^{n} w_k h_k / (s − ξ_k) ) / ( 1 + Σ_{k=1}^{n} w_k / (s − ξ_k) ) =: p_1(s) / q_1(s),   (4.6)

which, for w_k ≠ 0, interpolates the data in (4.4). Let e ∈ C^n denote the vector of ones. Define

Λ = diag(ξ_1, ..., ξ_n),  b̂ = [w_1 ··· w_n]^T,  ĉ = [h_1 ··· h_n]^T,  Â = Λ − b̂ e^T.   (4.7)

Then, r_1(s) has the state-space form

r_1(s) = ĉ^T (s Î_n − Â)^{−1} b̂,   (4.8)

where Î_n is the identity matrix of dimension n × n.
Proof The fact that r_1(s) is an interpolating rational function for the data (4.4) is just a restatement of (3.5) for completeness. To prove (4.8), we will use the Sherman-Morrison formula [18]: Let M ∈ C^{n×n} be invertible and u, v ∈ C^n be such that 1 + v^* M^{−1} u ≠ 0, where (·)^* denotes the conjugate transpose. Then,

(M + u v^*)^{−1} = M^{−1} − (M^{−1} u v^* M^{−1}) / (1 + v^* M^{−1} u).   (4.9)

From (4.7) and (4.8), we have

ĉ^T (s Î_n − Â)^{−1} b̂ = ĉ^T (s Î_n − Λ + b̂ e^T)^{−1} b̂.   (4.10)

To simplify the notation, let Φ̂_s = s Î_n − Λ. Then, applying the Sherman-Morrison formula to the middle term in (4.10) with M = Φ̂_s, u = b̂, and v = e, we obtain

(Φ̂_s + b̂ e^T)^{−1} = Φ̂_s^{−1} − (Φ̂_s^{−1} b̂ e^T Φ̂_s^{−1}) / (1 + e^T Φ̂_s^{−1} b̂).   (4.11)

Since Φ̂_s is diagonal,

e^T Φ̂_s^{−1} b̂ = Σ_{k=1}^{n} w_k / (s − ξ_k).

Then, using the definitions of b̂ and ĉ in (4.7), we obtain

ĉ^T Φ̂_s^{−1} b̂ = Σ_{k=1}^{n} w_k h_k / (s − ξ_k) = p_1(s).

Substituting these last two equalities into (4.11) yields (4.8).
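The Sherman-Morrison formula used in the proof is easy to sanity-check numerically; the matrices below are an arbitrary example.

```python
import numpy as np

M = np.diag([1.0, 2.0, 3.0, 4.0])   # invertible (diagonal) M
u = np.ones(4)
v = np.ones(4)                      # 1 + v^T M^{-1} u = 1 + 25/12 != 0

Minv = np.linalg.inv(M)
lhs = np.linalg.inv(M + np.outer(u, v))
rhs = Minv - (Minv @ np.outer(u, v) @ Minv) / (1.0 + v @ Minv @ u)
```

The two sides agree to machine precision, confirming the rank-one update formula.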
We note that state-space realizations of rational functions are unique up to a similarity transformation. For other equivalent state-space representations of a barycentric form, we refer the reader to, e.g., [5,28]. Given the samples of H_1(s) (the data in (4.4)) of the LQO system (2.1), Proposition 4.1 constructs the linear part of the data-driven LQO model directly from these samples. What we need to achieve next is to use the H_2(s, z) samples (the data in (4.5)) to construct a two-variable rational function r_2(s, z) in a barycentric-like form corresponding to the quadratic part of the data-driven LQO model. However, r_2(s, z) cannot be constructed independently from r_1(s). Once r_2(s, z) is constructed, we should be able to interpret r_1(s) and r_2(s, z) together as the linear and quadratic transfer functions of a single LQO system. This is the precise reason why we cannot simply view r_2(s, z) as an independent two-variable rational function and use the classical multivariate barycentric form [3,4]. Therefore, r_2(s, z) needs to have the form

r_2(s, z) = K̂ [ (s Î_n − Â)^{−1} b̂ ⊗ (z Î_n − Â)^{−1} b̂ ],   (4.12)

where Â and b̂ are the same matrices from (4.7) used in modeling r_1(s), and K̂ ∈ C^{1×n^2} is the (quadratic) free variable that will be used to model the new data (4.5). The next result achieves this goal.

Theorem 4.1 Given the data (4.5) and the barycentric form r_1(s) = p_1(s)/q_1(s) in (4.6), define the two-variable barycentric-like form

r_2(s, z) = ( Σ_{k=1}^{n} Σ_{l=1}^{n} w_k w_l h^{(2)}_{k,l} / ((s − ξ_k)(z − ξ_l)) ) / ( q_1(s) q_1(z) ) =: p_2(s, z) / q_2(s, z).   (4.13)

Then, r_2(s, z) interpolates the data (4.5), i.e.,

r_2(ξ_i, ξ_j) = h^{(2)}_{i,j} for i, j = 1, ..., n.   (4.14)

Define M̂ ∈ C^{n×n} and K̂ ∈ C^{1×n^2} using

(M̂)_{k,l} = h^{(2)}_{k,l} and K̂ (u ⊗ v) = u^T M̂ v for all u, v ∈ C^n.   (4.15)

Then, r_2(s, z) has the state-space form

r_2(s, z) = K̂ [ (s Î_n − Â)^{−1} b̂ ⊗ (z Î_n − Â)^{−1} b̂ ].   (4.16)

Proof To prove the interpolation property (4.14) of the barycentric representation (4.13), evaluate r_2(s, z) at s = ξ_i and z = ξ_j: as s → ξ_i and z → ξ_j, the numerator of (4.13) is dominated by the term w_i w_j h^{(2)}_{i,j} / ((s − ξ_i)(z − ξ_j)), while the denominator q_1(s) q_1(z) is dominated by (w_i/(s − ξ_i))(w_j/(z − ξ_j)); taking the limit yields r_2(ξ_i, ξ_j) = h^{(2)}_{i,j}. To prove (4.16), we first note that

(s Î_n − Â)^{−1} b̂ = Φ̂_s^{−1} b̂ / q_1(s),

which follows from (4.11), as shown in deriving the realization of r_1(s). Since Φ̂_s is diagonal, we have

(Φ̂_s^{−1} b̂)_k = w_k / (s − ξ_k).

Then, substituting the definition of K̂ from (4.15) into (4.12), we obtain

K̂ [ (s Î_n − Â)^{−1} b̂ ⊗ (z Î_n − Â)^{−1} b̂ ] = ( Σ_{k=1}^{n} Σ_{l=1}^{n} w_k w_l h^{(2)}_{k,l} / ((s − ξ_k)(z − ξ_l)) ) / ( q_1(s) q_1(z) ) = r_2(s, z),   (4.21)

which concludes the proof.
Corollary 4.1 Let r_1(s) and r_2(s, z) be constructed as in Proposition 4.1 and Theorem 4.1. Then, r_1(s) and r_2(s, z) are, respectively, the first and second transfer functions of the n-dimensional LQO system

x'(t) = Â x(t) + b̂ u(t),  y(t) = ĉ^T x(t) + K̂ (x(t) ⊗ x(t)).   (4.22)

In other words, the first (linear) transfer function of this LQO system is r_1(s) and its second transfer function is r_2(s, z).
Recall the partitioning of the sampling points in (4.3). Theorem 4.1 has shown that r_2(s, z) interpolates H_2(s, z) over the sampling set ξ × ξ. What is the value of r_2(s, z) over the mixed sampling sets ξ × ξ̂ and ξ̂ × ξ? Even though we do not enforce interpolation over these sets, in Sect. 5 we will need closed-form expressions for the value of r_2(s, z) over ξ × ξ̂ and ξ̂ × ξ. The next lemma establishes these results.

Lemma 4.1 Let r_2(s, z) be as defined in (4.13), corresponding to the sampling points in (4.3) and the data in (4.2). Then,

r_2(ξ_i, ξ̂_j) = ( Σ_{l=1}^{n} w_l h^{(2)}_{i,l} / (ξ̂_j − ξ_l) ) / ( 1 + Σ_{l=1}^{n} w_l / (ξ̂_j − ξ_l) ),
r_2(ξ̂_j, ξ_i) = ( Σ_{k=1}^{n} w_k h^{(2)}_{k,i} / (ξ̂_j − ξ_k) ) / ( 1 + Σ_{k=1}^{n} w_k / (ξ̂_j − ξ_k) ).   (4.23)

Proof The proof is given at the end of the paper.
It is important to note that the numerators and denominators of r_2(ξ_i, ξ̂_j) and r_2(ξ̂_j, ξ_i) in (4.23) are linear in the weights w_k. This is in contrast to the general form of r_2(s, z) in (4.21), where both the numerator and denominator are quadratic in the weights when evaluated over ξ̂ × ξ̂.

Proposed Framework for Data-Driven Modeling of LQO Systems
Section 4 established the necessary ingredients to extend AAA to LQO systems. Given the measurements (4.1) and (4.2), Proposition 4.1 and Theorem 4.1 show how to construct the barycentric forms r 1 (s) and r 2 (s, z) interpolating this data in accordance with the partitioning (4.3). Furthermore, Corollary 4.1 states that r 1 (s) and r 2 (s, z) together correspond to an interpolatory LQO system. Based on these results, we will now fully develop the AAA framework for LQO systems. The resulting algorithm will be denoted by AAA-LQO.
AAA-LQO will be an iterative algorithm, adding one degree of freedom to the current data-driven LQO model in every iteration step. To emphasize the iterative nature of the algorithm, in the nth step of AAA-LQO, we will use the notation r^(n)_1(s) and r^(n)_2(s, z) for the barycentric forms in (4.6) and (4.13). In Sect. 5.1, we first show how the barycentric weights are computed for a given partitioning. Then, in Sect. 5.2 we establish a greedy search procedure for updating the partitioning (4.3). The algorithm then continues with the LS minimization for the updated partitioning at the (n + 1)th step to construct r^(n+1)_1(s) and r^(n+1)_2(s, z). AAA-LQO terminates after a desired error criterion is met or a maximum allowed order is reached, as explained in Sect. 5.3.

A Combined LS Measure for Computing the Barycentric Weights for the Current Partition
Even though this section introduces and investigates the LS problem in the nth step of AAA-LQO, to simplify the notation for the complicated expressions appearing in the analysis, we will drop the superscript n and use r_1(s) and r_2(s, z) instead (as we did in Sect. 4). However, they should be understood as the approximants in the nth step. We will reintroduce the superscripts in Sect. 5.2. For the full LQO data (4.1) and (4.2), we recall (and repeat) the partitioning of the sampling points as in (4.3):

{s_1, ..., s_{N_s}} = {ξ_1, ..., ξ_n} ∪ {ξ̂_1, ..., ξ̂_{N_s−n}}.   (5.1)

Then, r_1(s) interpolates H_1(s) over ξ (i.e., it interpolates the data (4.4)) and r_2(s, z) interpolates H_2(s, z) over ξ × ξ (i.e., it interpolates the data (4.5)). Moreover, taken together, r_1(s) and r_2(s, z) correspond to an LQO system. The only remaining degrees of freedom in defining r_1(s) and r_2(s, z), and thus the corresponding LQO system, are the barycentric weights {w_1, ..., w_n}.
We will choose those weights to minimize an appropriate error measure over the uninterpolated data corresponding to the sampling points ξ̂. We first introduce the notation for these uninterpolated values:

ĥ_i = H_1(ξ̂_i),   (5.2)
ĥ^{(1,2)}_{i,j} = H_2(ξ_i, ξ̂_j),   (5.3)
ĥ^{(2,1)}_{j,i} = H_2(ξ̂_j, ξ_i),   (5.4)
ĥ^{(2,2)}_{i,j} = H_2(ξ̂_i, ξ̂_j).   (5.5)

Let w ∈ C^n denote the vector of weights to be determined: w = [w_1 ··· w_n]^T. A reasonable error measure to minimize is the LS distance in the uninterpolated data, leading to the minimization problem

min_{w ≠ 0} J, with J = J_1 + J_2 + J_3 + J_4,   (5.6)

where

J_1 = (1/(N_s − n)) Σ_i | r_1(ξ̂_i) − ĥ_i |^2,   (5.7)
J_2 = (1/(n(N_s − n))) Σ_{i,j} | r_2(ξ_i, ξ̂_j) − ĥ^{(1,2)}_{i,j} |^2,   (5.8)
J_3 = (1/(n(N_s − n))) Σ_{j,i} | r_2(ξ̂_j, ξ_i) − ĥ^{(2,1)}_{j,i} |^2,   (5.9)
J_4 = (1/((N_s − n)^2)) Σ_{i,j} | r_2(ξ̂_i, ξ̂_j) − ĥ^{(2,2)}_{i,j} |^2.   (5.10)

As in the original AAA for linear dynamical systems, the LS problem (5.6) is nonlinear in w for LQO systems. The formulation is more complicated here due to the additional r_2(s, z) term.
To resolve this numerical difficulty, we will employ a strategy similar to the linearization step in (3.6) and solve a relaxed optimization problem. However, the resulting LS problem in our case will still be nonlinear, yet much easier to solve than (5.6). In the end, we will tackle the original nonlinear LS problem (5.6) by solving a sequence of quadratic LS problems. We note that in (5.7)-(5.10), we scale every error term J_i by the number of data points in it.

Quadraticized LS Problem in Step n
We show how to relax each J_i term in the nonlinear LS problem (5.6). The resulting problem will play a crucial role in the proposed iterative algorithm (Sect. 5.3).
Linearizing J_1. The ith term of J_1 in (5.7), namely r_1(ξ̂_i) − ĥ_i, is of the same form as in (3.6). This is natural since r_1(s) corresponds to the linear part of the LQO system. Thus, we linearize J_1 similarly to (3.6). Write r_1(s) as r_1(s) = p_1(s)/q_1(s), as defined in (4.6). Then, the ith term in (5.7) is linearized as

r_1(ξ̂_i) − ĥ_i ⇝ p_1(ξ̂_i) − ĥ_i q_1(ξ̂_i).   (5.11)

Substituting p_1(s) and q_1(s) from the definition of r_1(s) in (4.6) into (5.11), one obtains

p_1(ξ̂_i) − ĥ_i q_1(ξ̂_i) = Σ_{k=1}^{n} w_k (h_k − ĥ_i)/(ξ̂_i − ξ_k) − ĥ_i.   (5.12)

For a matrix X, let (X)_{ij} denote its (i, j)th entry. Similarly, for a vector x, let (x)_i denote its ith entry. Define the Loewner matrix L ∈ C^{(N_s−n)×n} with

(L)_{ik} = (h_k − ĥ_i)/(ξ̂_i − ξ_k),   (5.13)

and the vector ĥ ∈ C^{N_s−n} with (ĥ)_i = ĥ_i. Then, p_1(ξ̂_i) − ĥ_i q_1(ξ̂_i) = (L w)_i − (ĥ)_i. Therefore, the J_1 term in (5.7) will be relaxed to

J̃_1 = (1/(N_s − n)) ‖ L w − ĥ ‖^2.   (5.14)

Linearizing J_2 and J_3. Now we extend the linearization strategy used for J_1, which only involved the single-variable function r_1(s), to the error terms J_2 and J_3, which involve r_2(s, z). The closed-form expressions for r_2(ξ_i, ξ̂_j) and r_2(ξ̂_j, ξ_i) derived in Lemma 4.1 will prove fundamental in achieving this goal. We start with J_2. Write r_2(s, z) = p_2(s, z)/q_2(s, z) as in (4.13). Then, linearizing the (i, j)th term in (5.8) means

r_2(ξ_i, ξ̂_j) − ĥ^{(1,2)}_{i,j} ⇝ p_2(ξ_i, ξ̂_j) − ĥ^{(1,2)}_{i,j} q_2(ξ_i, ξ̂_j).   (5.15)

We substitute p_2(ξ_i, ξ̂_j) and q_2(ξ_i, ξ̂_j) from (4.23) into (5.15) to obtain an expression linear in the weights. Define the indexing variable α_{ij} = (i − 1)(N_s − n) + j and let ĥ^{(1,2)} ∈ C^{n(N_s−n)} be the vector defined as

(ĥ^{(1,2)})_{α_{ij}} = ĥ^{(1,2)}_{i,j}.

Define the Loewner matrix L^{(1,2)} ∈ C^{n(N_s−n)×n} with entries

(L^{(1,2)})_{α_{ij}, l} = (h^{(2)}_{i,l} − ĥ^{(1,2)}_{i,j})/(ξ̂_j − ξ_l),

yielding the linearization of J_2:

J̃_2 = (1/(n(N_s − n))) ‖ L^{(1,2)} w − ĥ^{(1,2)} ‖^2.   (5.19)

Using similar arguments and the explicit formula for r_2(ξ̂_j, ξ_i) from (4.23), the J_3 term in (5.9) is linearized to

J̃_3 = (1/(n(N_s − n))) ‖ L^{(2,1)} w − ĥ^{(2,1)} ‖^2,   (5.20)

where the Loewner matrix L^{(2,1)} ∈ C^{n(N_s−n)×n} and the vector ĥ^{(2,1)} ∈ C^{n(N_s−n)} are defined analogously.

Quadraticizing the J_4 term. In this section we show how to relax the remaining term, J_4, in the minimization problem (5.6). Note that this term includes r_2(ξ̂_i, ξ̂_j), i.e., r_2(s, z) evaluated over ξ̂ × ξ̂.
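The identity behind the relaxation of J_1 — that the linearized residual p_1(ξ̂_i) − ĥ_i q_1(ξ̂_i) is a linear function of the weights given by a Loewner-type matrix — can be verified numerically for the barycentric form (4.6). All numbers below are illustrative, and the sign convention of the Loewner matrix is the one stated in the code comments (conventions may differ elsewhere).

```python
import numpy as np

xi  = np.array([1.0, 3.0])               # support points
h   = np.array([2.0, -1.0])              # values at the support points
xih = np.array([0.0, 2.0, 4.0])          # uninterpolated sampling points
hh  = np.array([5.0, 1.0, 0.5])          # values there
w   = np.array([0.4, 0.6])               # some barycentric weights

# Loewner-type matrix with entries (h_k - hh_i) / (xih_i - xi_k)
L = (h[None, :] - hh[:, None]) / (xih[:, None] - xi[None, :])

# p1 and q1 of the barycentric form with the constant 1 in the denominator
C  = 1.0 / (xih[:, None] - xi[None, :])  # Cauchy matrix
p1 = C @ (w * h)
q1 = 1.0 + C @ w
```

The vector L @ w − hh then coincides with p1 − hh * q1 entrywise, which is exactly the linearized residual.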
As we stated earlier, unlike r_2(ξ_i, ξ̂_j) (r_2(s, z) over ξ × ξ̂) or r_2(ξ̂_j, ξ_i) (r_2(s, z) over ξ̂ × ξ), the numerator and denominator of the quantity r_2(ξ̂_i, ξ̂_j) are quadratic in the weights w_k. Therefore, relaxing the (i, j)th term in J_4 by multiplying it out with its denominator will not yield a linear term, but rather a quadratic one. Hence, even the relaxed problem cannot be solved as a linear LS problem. This is what we establish next.

Solving the LS Problem in Step n
Now we are ready to describe the resulting LS problem to solve in Step n of AAA-LQO.
Combining the relaxations J̃_1, J̃_2, J̃_3, and J̃_4 as given in (5.14), (5.19), (5.20), and (5.31), in the nth step of the algorithm we need to solve the quadraticized minimization problem

min_w ( J̃_1 + J̃_2 + J̃_3 + J̃_4 ),   (5.32)

where J̃_4 depends quadratically on w through the term L^{(2,2)}(w ⊗ w) in (5.31). Note that due to this last term, the optimization problem (5.32) is no longer a linear LS problem; nevertheless, it can be solved efficiently. One can explicitly compute the gradient (and Hessian) of the cost function and apply a well-established (quasi-)Newton formulation [33]. If we had a one-step algorithm whose solution were determined by (5.32), we would employ these techniques. However, solving (5.32) is only one step of our proposed iterative algorithm. As the iteration proceeds (and as n increases), the vector w (and the data partition) is updated and a new optimization problem of larger dimension needs to be solved. Therefore, we will only approximately solve (5.32) in every step.
One can obtain an approximate solution to (5.32) in various ways. In our formulation, we will first solve the part of the problem (5.32) that can be written as a linear least-squares problem in w, namely

min_w ( J̃_1 + J̃_2 + J̃_3 ).   (5.34)

The optimization problem (5.34) is a classical linear least-squares problem; denote its solution by w̃. With w̃, we further relax the last term in (5.32) as

L^{(2,2)}(w ⊗ w) ≈ L^{(2,2)}(w̃ ⊗ w).   (5.36)

Using the equality L^{(2,2)}(w̃ ⊗ w) = L^{(2,2)}(w̃ ⊗ Î_n) w, we rewrite (5.36) as

L^{(2,2)}(w̃ ⊗ w) = T w,   (5.37)

where the matrix T ∈ C^{(N_s−n)^2 × n} is given by

T = L^{(2,2)}(w̃ ⊗ Î_n).   (5.38)

Then, finally using (5.37) in place of the last term in (5.32), we obtain a minimization problem that is now a linear LS problem. Thus, the solution to our final approximation of (5.32) is given by

w = [ L ; L^{(1,2)} ; L^{(2,1)} ; T ]^† [ ĥ ; ĥ^{(1,2)} ; ĥ^{(2,1)} ; ĥ^{(2,2)} ],   (5.39)

where (·)^† denotes the Moore-Penrose pseudoinverse. Therefore, in the nth step of AAA-LQO, the optimization problem (5.6) is relaxed and the solution of this relaxed problem (the weights) is given by (5.39). The algorithm proceeds with the updated weights as we discuss next. We also note that the second relaxation step in (5.36) can be replaced by a few steps of a (quasi-)Newton method.
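The final step above is a standard pseudoinverse solve of a stacked linear LS system, in the spirit of (5.39). A minimal sketch follows; the block heights, width, and entries are random stand-ins, not the actual Loewner blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = (5, 6, 6, 9)                     # stand-ins for the four block heights
blocks = [rng.standard_normal((m, 3)) for m in sizes]
rhs    = [rng.standard_normal(m) for m in sizes]

S = np.vstack(blocks)                    # stacked LS matrix
g = np.concatenate(rhs)                  # stacked data vector
w, *_ = np.linalg.lstsq(S, g, rcond=None)   # w = S^+ g (pseudoinverse solve)
```

np.linalg.lstsq computes the minimum-norm least-squares solution, which is exactly the pseudoinverse applied to the stacked right-hand side.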

Partition Update via the Greedy Selection
Given the partition (5.1) at step n of the algorithm, Sect. 5.1 showed how to choose the barycentric weights w to minimize a joint LS measure over the uninterpolated data set. The only remaining component of the proposed approach is, then, to choose the next support point ξ_{n+1} and to update the data partition (5.1) (so that we can repeat Sect. 5.1 for the updated partition until a desired tolerance is achieved). In other words, we will move one sampling point from the LS set ξ̂ to the interpolation set ξ. Which point to move from ξ̂ to ξ will be decided in a greedy manner. To re-emphasize the iterative nature of the overall algorithm, at step n we will denote by r^(n)_1(s) and r^(n)_2(s, z) the two transfer functions of the current LQO approximant. (Note that we dropped the superscripts in Sect. 5.1 to simplify the notation there.) We start by defining two constants based on the data:

M_1 = max_{s_i ∈ S} |H_1(s_i)| and M_2 = max_{s_i, s_j ∈ S} |H_2(s_i, s_j)|,   (5.40)

where S denotes the full sampling set S = {s_1, s_2, ..., s_{N_s}}. For the current approximant at step n, introduce the two absolute error measures, namely the deviations in the linear and quadratic parts:

ε^(n)_1 = max_i |H_1(s_i) − r^(n)_1(s_i)| and ε^(n)_2 = max_{i,j} |H_2(s_i, s_j) − r^(n)_2(s_i, s_j)|.   (5.41)

The next support point ξ_{n+1} is chosen by means of a greedy search over the set S \ {ξ_1, ..., ξ_n} using the error measures in (5.41); this search produces two candidate points, s^(n+1) and z^(n+1), forming the sampling-point pair at which the deviation is largest. Now the question is whether to choose s^(n+1) or z^(n+1) as ξ_{n+1}. If exactly one of them is already a support point, then we choose the other one as ξ_{n+1}. If neither s^(n+1) nor z^(n+1) was previously chosen as a support point, then we compare |H_1(s^(n+1)) − r^(n)_1(s^(n+1))| and |H_1(z^(n+1)) − r^(n)_1(z^(n+1))|, and choose as ξ_{n+1} the one that yields the higher deviation in the first transfer function. Clearly, both cannot already be support points, due to the interpolation property.
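A simplified sketch of this greedy rule follows; the tie-breaking is a plain reading of the rule above rather than a verbatim transcription, and all data are illustrative.

```python
import numpy as np

def next_support_point(samples, err1, err2, support):
    """Pick the next support point: err1 holds |H1 - r1| per sample, err2 the
    |H2 - r2| errors per sample pair, and `support` is the current support set."""
    i1 = int(np.argmax(err1))                              # worst H1 deviation
    i2, _ = np.unravel_index(int(np.argmax(err2)), err2.shape)
    s_cand, z_cand = samples[i1], samples[i2]              # the two candidates
    if s_cand in support:
        return z_cand
    if z_cand in support:
        return s_cand
    # neither was selected before: keep the one with the larger H1 deviation
    return s_cand if err1[i1] >= err1[i2] else z_cand
```

The chosen point then moves from the LS set to the interpolation set, and the weights are recomputed.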

Remark 5.1
Instead of considering the full grid of pairs of sampling points (s, z) and the associated measurements, we could consider a sparser grid for the H_2(s, z) samples. More precisely, instead of using the full grid in (4.2), which contains N_s^2 pairs, we could use the sample set {H_2(s_i, s_j)} ⊂ C with s_i, s_j ∈ C for i ∈ I, j ∈ J, where I, J ⊂ Z_+ are index sets with cardinalities smaller than N_s. However, this modification would require changing the greedy selection scheme accordingly, to make sure that all possible combinations of selected points appear in the sparser grid. We skip this aspect in our examples and work with the full data set.

The Proposed Algorithm: AAA-LQO
Now, we have all the pieces to describe the algorithmic framework for the proposed method AAA-LQO, the AAA algorithm for LQO systems.
Given the full LQO data (4.1) and (4.2), we initialize the approximant (n = 0) by choosing r^(0)_1(s) and r^(0)_2(s, z) as the means of the H_1(s) and H_2(s, z) samples, respectively. Then, using the greedy selection strategy of Sect. 5.2, we update the partition (5.1) and solve for the barycentric weights as in Sect. 5.1, more specifically by solving (5.39). Let n_max denote the largest dimension permitted for the data-driven LQO system and let τ denote the relative error tolerance. Then, AAA-LQO terminates either when the prescribed dimension n_max is reached, or when the prescribed error tolerance is achieved, namely max(ε^(n)_1/M_1, ε^(n)_2/M_2) ≤ τ.

Remark 5.2 Note that, by choosing complex-conjugate sampling points and sampled values, one can enforce the fitted models to be real-valued. This means that if a particular point ξ_i is selected, its complex conjugate is also automatically selected (ξ_{i+1} = ξ̄_i), hence increasing the degree of the fitted functions. This is done in both examples presented in Sect. 6.

Numerical Examples
We test AAA-LQO, as given in Algorithm 1, on two LQO systems. We also apply the original AAA algorithm (from the linear case) to the data corresponding to the first (linear) transfer function only. Therefore, we construct two approximants: (1) A data-driven LQO approximant using AAA-LQO and (2) A data-driven linear approximant using AAA. Both approximants are real-valued, enforced by using a data set that is closed under complex conjugation.

Example 1
First, we use a single-input/single-output version of the ISS 1R model from the SLICOT MOR benchmark collection [14]. We construct an LQO system from this linear model by adding a quadratic output with the choice of M = 0.6 I_270 + 0.3 I^(k)_270, which scales products of the state variables in the output equation. Here, I^(k)_270 denotes a quasi-diagonal matrix whose entries of one are shifted from the main diagonal according to the integer k (k > 0 stands for shifting upward, while k < 0 is used for shifting downward; note that I^(0)_270 = I_270). The sampled data are shown in Fig. 1, where we display the measurements evaluated only on the "positive side" of the imaginary axis and skip plotting the conjugate data. We apply Algorithm 1 with n_max = 30 and τ = 10^{-2} (a relative tolerance corresponding to a 1% approximation error on the data). With these parameters, AAA-LQO yields a data-driven LQO model of order n = 18.
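Shifted quasi-diagonal matrices I^(k)_n can be generated directly with np.eye(n, k=k). The sketch below is illustrative only: the dimension, coefficients, and shift index are not the exact ones of the benchmark.

```python
import numpy as np

def shifted_eye(n, k):
    """I_n^(k): ones on the k-th diagonal (k > 0 above, k < 0 below the main
    diagonal), zeros elsewhere."""
    return np.eye(n, k=k)

# Illustrative combination in the spirit of M = 0.6 I + 0.3 I^(k), with k = 1
n = 6
M = 0.6 * shifted_eye(n, 0) + 0.3 * shifted_eye(n, 1)
```

With this M, the quadratic output x^T M x mixes each state with itself (weight 0.6) and with its neighbor (weight 0.3).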
Using only the {H 1 (s i )} samples (corresponding to the linear observation map), we apply AAA and obtain a data-driven linear approximant of order n = 18. The AAA approximant is constructed to simply illustrate that a linear dynamical system approximation is not sufficient to accurately represent the underlying LQO system.
In the top plot of Fig. 2, we show the magnitude of the first transfer function H_1(s) of the original system together with that of the linear AAA model and the first transfer function r^(n)_1(s) of the AAA-LQO model. In the bottom plot of Fig. 2, we depict the magnitude of the approximation errors in H_1(s). The plot reveals that for this specific choice of τ, the AAA-LQO model has a smaller error for most of the frequency values, even in approximating H_1(s). This happens despite the fact that it targets both H_1(s) and H_2(s, z), unlike the AAA model, which only tries to approximate H_1(s). However, we do not claim this to be the general case. We have observed that for some lower values of τ, e.g., τ = 10^{-4}, the AAA model has outperformed the AAA-LQO model in approximating H_1(s). In Fig. 3 we depict the selected support points (interpolation points) for both the AAA and AAA-LQO algorithms (without the complex-conjugate pairs), as well as the poles of the learned models (i.e., the eigenvalues of Â in both cases). Note that there are 9 complex-conjugate pairs of support points for each method. Even though some of the support points of AAA and AAA-LQO overlap, two of the pairs are different. This difference causes a large deviation in the pole pattern, as shown in the bottom plot, illustrating that even the linear part of the AAA-LQO approximant, i.e., r^(n)_1(s), is fundamentally different from the linear AAA model.
To show the overall performance of AAA-LQO in accurately approximating not only H_1(s) but also H_2(s, z) (the full LQO behavior), we perform a time-domain simulation of the original LQO system, the data-driven AAA-LQO model, and the linear AAA model using u(t) = 0.5 cos(4πt) as the control input. During the simulation of the original system, we also compute only the linear part of the output, which the AAA model should approximate well. The results are given in the top plot of Fig. 4. The first observation is that the output of the AAA-LQO model accurately replicates the output of the original LQO system. On the other hand, the linear AAA model completely misses the quadratic output and is only able to approximate the linear component of the output, as expected. The approximation error in the output of the AAA-LQO model is depicted in the bottom plot of Fig. 4.
In Fig. 5 we show the convergence behavior of AAA-LQO by plotting the evolution of the relative approximation errors ε^(n)_1/M_1 and ε^(n)_2/M_2. The stagnation of the ε^(n)_1/M_1 error curve from n = 2 to n = 12 results from the fact that during those steps the greedy selection was based on the ε^(n)_2 term, which was the dominant absolute error term. One can observe that during these steps, ε^(n)_2/M_2 continues to decay slightly. A more detailed illustration is given in Fig. 6, where n is varied from 2 to 62.
To investigate how the order of the AAA-LQO model varies based on the stopping tolerance, we set n_max = 100 and run AAA-LQO for four tolerance values: τ = 10^{-2}, τ = 10^{-3}, τ = 10^{-4}, and τ = 10^{-5}. The results are displayed in Table 1. For the case of τ = 10^{-5}, in Fig. 6 we depict the convergence behavior of AAA-LQO by plotting the evolution of the relative approximation errors.

Example 2
This model, taken from [35], corresponds to an LQO system whose output measures a variance in the state variable. A linear mass-spring-damper SISO dynamical system was modified in [34] by means of stochastic modeling, replacing the physical parameters by independent random variables and yielding a linear dynamical system with multiple outputs. Based on this multiple-output system, a SISO LQO system was derived in [35], where the output corresponds to the variance of the original output (and thus is quadratic in nature). We refer the reader to [35] for further details. We obtain the measurements from a version of this model corresponding to an underlying LQO system of order N = 960. The main difference from the previous example is that, in this model, the observed output does not have a linear component and depends on the state variable solely quadratically, i.e., c = 0 in (2.1). This means that H_1(s) = 0.
As the sampling points {s i }, we choose 60 logarithmically spaced points on the imaginary axis over the interval i [10 −1 , 10 1 ], together with their complex conjugates, leading to N s = 120 samples. Since H 1 (s) = 0, we only need to sample H 2 (s i , s j ) for i, j = 1, 2, . . . , N s . The corresponding data for the second transfer function are depicted in Fig. 7.
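A sketch of this sampling grid, under the stated choices (60 logarithmically spaced frequencies in [10 −1 , 10 1 ] placed on the imaginary axis and closed under complex conjugation):

```python
import numpy as np

# 60 logarithmically spaced frequencies between 1e-1 and 1e1 ...
freqs = np.logspace(-1, 1, 60)
# ... placed on the imaginary axis, plus their complex conjugates,
# so the sample set is closed under conjugation (Ns = 120 points).
s = 1j * freqs
samples = np.concatenate([s, np.conj(s)])
```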
We apply AAA-LQO with n max = 50 and τ = 10 −3 (relative stopping criterion), obtaining an LQO model of order n = 30. To show the accuracy of the approximant, we perform time-domain simulations of the full model and the approximant with the input u(t) = sin(0.2t); the corresponding output error is plotted in the bottom plot of Fig. 8. Finally, in Fig. 9 we show the convergence behavior of AAA-LQO by plotting the evolution of the relative approximation error ε_2^(n)/M_2.

Remark 6.1
Since AAA-LQO uses a greedy selection scheme and is not a descent algorithm, there is no theoretical guarantee that the maximum approximation error decreases monotonically. This can be seen in Figs. 5, 6 and 9. The same behavior was observed for the original AAA algorithm; see, e.g., Application 6.3 in [32]. In practice, however, the error does decrease monotonically with n in most cases.

The Case of Noisy Data
In practice, the frequency-domain data used in rational approximation algorithms are often corrupted by noise. The recent works [15,20,21] have studied the effects of noisy data on frequency-domain rational approximation methods for linear dynamics with linear output, such as the AAA [32], Loewner [31], RKFIT [10], and Vector Fitting (VF) [22] frameworks. It was illustrated in [20,21] that methods such as RKFIT, VF, and AAA with a (partial) least-squares formulation are more robust to noise in the measurements than the purely interpolatory Loewner framework.
In what follows, using the model in Example 2, we present a simple numerical test case to study the effect of noise on AAA-LQO. We artificially corrupt the measurements of the second transfer function with uniformly distributed numbers in the interval (0, ζ), where ζ is the "noise level". We use moderate noise levels, i.e., ζ on the order of 10 −3 M 2 , with M 2 defined as in (5.40). We only perturb H 2 (s, z) in this example since H 1 (s) is identically zero. We use the same data as in Sect. 6.2, to which we add the uniformly distributed noise.
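The noise model just described can be sketched as follows. Here H2 is a random placeholder standing in for the measured H 2 (s i , s j ) samples (illustrative only); the actual data from Sect. 6.2 would be substituted in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder for the Ns-by-Ns array of measured H2(s_i, s_j) values.
H2 = rng.standard_normal((120, 120)) + 1j * rng.standard_normal((120, 120))
M2 = np.max(np.abs(H2))                   # scaling constant, as in (5.40)
zeta = 3e-3 * M2                          # one of the noise levels in Table 2
# Additive uniform noise in (0, zeta), applied entrywise to the samples.
H2_noisy = H2 + rng.uniform(0.0, zeta, H2.shape)
```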
We apply AAA-LQO with n max = 40 and τ = 10 −2 for various noise levels, and thus obtain LQO models of various orders n, as reported in Table 2. In this experiment, increasing the noise level also increases the order of the LQO model fitted by AAA-LQO, as is to be expected. Note that for the highest noise levels considered here, namely ζ = 3 · 10 −3 M 2 and ζ = 4 · 10 −3 M 2 , the target tolerance τ = 10 −2 is not reached.
In the top plot of Fig. 10, we show the time-domain responses of the full model and the AAA-LQO model for the noise level ζ = 3 · 10 −3 M 2 , illustrating that the data-driven model accurately recovers the original model response. We repeat the same experiment with the noise level ζ = 4 · 10 −3 M 2 and depict the result in the bottom plot of Fig. 10; here the data-driven model starts to visibly deviate from the true model response. As expected, the approximation quality degrades as the noise level increases.
Even though in this simple experiment AAA-LQO performs well for low to moderate noise levels, a more in-depth theoretical analysis of the robustness of AAA-LQO to noisy data, together with algorithmic considerations (e.g., a stopping criterion based on the noise level, regularization), is necessary and will be considered in future work.

Conclusions
We have proposed a novel data-driven modeling method, called AAA-LQO, for linear systems with quadratic output (LQO). AAA-LQO extends the AAA algorithm to this new setting by first developing the barycentric representation theory for the two transfer functions arising in the analysis of LQO systems and then formulating a least-squares (LS) minimization framework to efficiently solve for the barycentric coefficients. Two numerical examples illustrate that AAA-LQO provides high-fidelity data-driven approximants to the original model.
The barycentric form we developed here for LQO systems offers promising research directions for modeling systems with general polynomial observation maps, as well as for nonlinearities appearing in the dynamical equations, such as bilinear or quadratic-bilinear systems. These topics are the focus of on-going research.

Appendix

... and

q 2 (ξ i , ξ j ) = M(ξ i , ξ j ) + ∑_{k=1}^{n} w_k m_k(ξ i ) m(ξ j ) + ∑_{ℓ=1}^{n} w_ℓ m_ℓ(ξ j ) m(ξ i ),

where M, m, m k , and M k,ℓ are as defined in (4.17). Write

r 2 (ξ i , ξ j ) = p 2 (ξ i , ξ j ) / q 2 (ξ i , ξ j ).   (A.1)

Simplifying the common term w i M L i (ξ i , ξ j ) from both the numerator and the denominator in this expression proves the first desired result in (4.23). The proof for r 2 (ξ j , ξ i ) follows similarly.