Model reduction of linear multi-agent systems by clustering with H 2 and H ∞ error bounds

In the recent paper (Monshizadeh et al. in IEEE Trans Control Netw Syst 1(2):145–154, 2014. https://doi.org/10.1109/TCNS.2014.2311883), model reduction of leader–follower multi-agent networks by clustering was studied. For such multi-agent networks, a reduced order network is obtained by partitioning the set of nodes in the graph into disjoint sets, called clusters, and associating with each cluster a single, new, node in a reduced network graph. In Monshizadeh et al. (2014), this method was studied for the special case that the agents have single integrator dynamics. For a special class of graph partitions, called almost equitable partitions, an explicit formula was derived for the H 2 model reduction error. In the present paper, we will extend and generalize the results from Monshizadeh et al. (2014) in a number of directions.


Introduction
In the last few decades, the world has become increasingly connected. This has brought a significant interest to complex networks, smart-grids, distributed systems, transportation networks, biological networks, and networked multi-agent systems, see, e.g., [2,10,28]. Widely studied topics in networked systems have been the problems of consensus and synchronization, see [19,20,27,30]. Other important subjects in the theory of networked systems are flocking, formation control, sensor placement, and controllability of networks, see, e.g., [8,9,11,12,24,29,34].
Analysis and controller design for large-scale complex networks can become very expensive from a computational point of view, especially for problems where the complexity of the network scales as a power of the number of nodes it contains. In order to tackle this problem, there is a need for methods and procedures to approximate the original networks by smaller, less complex ones.
Direct application of established model reduction techniques, such as balanced truncation, Hankel-norm approximation, and Krylov subspace methods, see, e.g., [1,3], to the dynamical models of networked systems generally leads to a collapse of the network structure, as well as the loss of important properties such as consensus or synchrony.
Model reduction techniques specifically for networked multi-agent systems with first-order agents have been proposed in [6,15,16,22]. Extensions to second-order agents have been considered in [7,14] and to more general higher-order agents in [4,17,23,25]. Some of these methods are based on clustering nodes in the network. With clustering, the idea is to partition the set of nodes in the network graph into disjoint sets called clusters, and to associate with each cluster a single, new, node in the reduced network, thus reducing the number of nodes and connections and the complexity of the network topology. For a review on clustering in data mining see, e.g., [18].
In [26], model reduction by clustering was put in the context of model order reduction by Petrov-Galerkin projection. The results in [26] provide explicit expressions for the H 2 model reduction error if a leader-follower network with single integrator agent dynamics is clustered using an almost equitable partition of the graph. In the present paper, our aim is to generalize and extend the results in [26] to networks where the agent dynamics is given by an arbitrary multivariable input-state-output system. We also aim at finding explicit formulas and a priori upper bounds for the model reduction error measured in the H ∞ -norm. Finally, we will consider the problem of clustering a network according to arbitrary, not necessarily almost equitable, graph partitions. The main contributions of this paper are the following: 1. We derive an a priori upper bound for the H 2 model reduction error for the case that the agents are represented by an arbitrary input-state-output system. 2. We extend the results in [26] for single integrator dynamics by giving an explicit expression for the H ∞ model reduction error in terms of properties of the given graph partition. 3. We establish an a priori upper bound for the H ∞ model reduction error for the case that the agents are represented by an arbitrary but symmetric input-state-output system. 4. We establish some preliminary results on the model reduction error in case of clustering using an arbitrary, possibly non almost equitable, partition.
The outline of this paper is as follows. In Sect. 2, we introduce some notation and discuss some elementary facts about computing the H 2 -and H ∞ -norm of stable transfer functions needed later on in this paper. In Sect. 3, we formulate our problem of model reduction of leader-follower multi-agent networks. Section 4 reviews some theory on graph partitions and model reduction by clustering and relates this method to Petrov-Galerkin projection of the original network. Also preservation of synchronization is discussed here. In Sect. 5, we provide a priori error bounds on the H 2 model reduction error for networks with arbitrary agent dynamics, clustered using almost equitable partitions. In Sect. 6, we complement these results by providing upper bounds on the H ∞ model reduction error. In Sect. 7, the problem of clustering networks according to general partitions is considered and the first steps toward a priori error bounds on both the H 2 and H ∞ model reduction errors are made. Numerical examples for which we compare the actual errors with the a priori bounds established in this paper are presented in Sect. 8. Finally, Sect. 9 provides some conclusions. To enhance readability, some of the more technical proofs in this paper have been put to "Appendix."

Preliminaries
In this section we briefly introduce some notation and discuss some basic facts on finite-dimensional linear systems. The trace of a square matrix A is denoted by tr(A). The largest singular value of a matrix A is denoted by σ 1 (A). For given real numbers α 1 , α 2 , . . . , α k , we denote by diag(α 1 , α 2 , . . . , α k ) the k × k diagonal matrix with the α i 's on the diagonal. For square matrices A 1 , A 2 , . . . , A k , we use diag(A 1 , A 2 , . . . , A k ) to denote the block diagonal matrix with the A i 's as diagonal blocks. For a given matrix A, let A + denote its Moore-Penrose pseudoinverse.
Consider the input-state-output systeṁ with x ∈ R n , u ∈ R m , y ∈ R p , and transfer function S(s) = C(s I − A) −1 B. If S has all its poles in the open left half complex plane, then its H 2 -norm is defined by If A is Hurwitz, then the H 2 -norm can be computed as where X is the unique positive semi-definite solution of the Lyapunov equation For the purposes of this paper, we also need to deal with the situation when A is not Hurwitz. Let X + (A) denote the unstable subspace of A, i.e., the direct sum of the generalized eigenspaces of A corresponding to its eigenvalues in the closed right half plane. We state the following proposition: A proof of this result can be found in "Appendix A". If S has all its poles in the open left half plane, then its H ∞ -norm is defined by We will now deal with computing the H ∞ -norm. The result is a generalization of Lemma 4 in [16]. For a proof, we refer to "Appendix B." Lemma 1 Consider the system (1). Assume that its transfer function S has all its poles in the open left half plane. If there exists X ∈ R p× p such that X = X T and C A = XC, then S H ∞ = σ 1 (S(0)).
Continuing our effort to compute the H ∞ -norm, we now formulate a lemma that will be instrumental in evaluating a transfer function at the origin. Recall that for a given matrix A, its Moore-Penrose inverse is denoted by A + . Lemma 2 Consider the system (1). If A is symmetric and ker A ⊂ ker C, then 0 is not a pole of the transfer function S and we have S(0) = −C A + B.
This result is proven in "Appendix C." To conclude this section, we briefly review the model reduction technique known as Petrov-Galerkin projection (see also [1]). Definition 1 Consider the system (1). Let W, V ∈ R n×r , with r < n, such that W T V = I . The matrix V W T is then a projector, called a Petrov-Galerkin projector. The reduced order systemẋ withx ∈ R r is called the Petrov-Galerkin projection of the original system (1).

Problem formulation
We consider networks of diffusively coupled linear subsystems. These subsystems, called agents, have identical dynamics; however, a selected subset of the agents, called the leaders, also receives an input from outside the network. The remaining agents are called followers. The network consists of N agents, indexed by i, so i ∈ V := {1, 2, . . . , N }. The subset V L ⊂ V is the index set of the leaders, more explicitly The followers are indexed by V F := V\V L . More specifically, the leaders are represented by the finite-dimensional linear systeṁ whereas the followers have dynamicṡ The weights a i j ≥ 0 represent the coupling strengths of the diffusive coupling between the agents. In this paper, we assume that a i j = a ji for all i, j ∈ V. Also, a ii = 0 for all i ∈ V. Furthermore, x i ∈ R n is the state of agent i, and u ∈ R r is the external input to the leader v . Finally, A ∈ R n×n , B ∈ R n×n , and E ∈ R n×r are real matrices. It is customary to represent the interaction between the agents by the graph G with node set V = {1, 2, . . . , N } and adjacency matrix A = (a i j ). In the setup of this paper, this graph is undirected, reflecting the assumption that A is symmetric. The Laplacian matrix L ∈ R N ×N of the graph G is defined as Recall that the set of leader nodes is V L = {v 1 , v 2 , . . . , v m }, and define the matrix M ∈ R N ×m as . , x N ) and u = col(u 1 , u 2 , . . . , u m ). The total network is then represented byẋ The goal of this paper is to find a reduced order networked system, whose dynamics is a good approximation of the networked system (3). Following [26], the idea to obtain such an approximation is to cluster groups of agents in the network, and to treat each of the resulting clusters as a node in a new, reduced order, network. The reduced order network will again be a leader-follower network, and by the clustering procedure, essential interconnection features of the network will be preserved. We will also require that the synchronization properties of the network are preserved after reduction. We assume that the original network is synchronized, meaning that if the external inputs satisfy u = 0 for = 1, 2, . . . , m, then for all i, j ∈ V, we have as t → ∞. We impose that the reduction procedure preserves this property. In this paper, a standing assumption will be that the graph G of the original network is connected. This is equivalent to the condition that 0 is a simple eigenvalue of the Laplacian L, see [21,Theorem 2.8]. In this case, the network reaches synchronization if and only if (L ⊗ I n )x(t) → 0 as t → ∞.
In order to be able to compare the original network (3) with its reduced order approximation and to make statements about the approximation error, we need a notion of distance between the networks. One way to obtain such notion is to introduce an output associated with the network (3). By doing this, both the original network and its approximation become input-output systems, and we can compare them by looking at the difference of their transfer functions. Being a measure for the disagreement between the states of the agents in (3), we choose y = (L ⊗ I n )x as the output of the original network. Indeed, this output y can be considered a measure of the disagreement in the network, in the sense that y(t) is small if and only if the network is close to being synchronized. Thus, with the original system (3) we now identify the input-state-output system: The state space dimension of (4) is equal to n N , its number of inputs equals to mr, and the number of outputs is n N . In this paper, we will use clustering to obtain a reduced order network, i.e., a network with a reduced number of agents, as an approximation of the original network (4).

Graph partitions and reduction by clustering
We consider networks whose interaction topologies are represented by weighted graphs G with node set V. The graph of the original network (3) is undirected; how-ever, our reduction procedure will lead to networks on directed graphs. As before, the adjacency matrix of the graph G is the matrix A = (a i j ), where a i j ≥ 0 is the weight of the arc from node j to node i. As noted before, the graph is undirected if and only if A is symmetric.
A nonempty subset C ⊂ V is called a cell or cluster of V. A partition of a graph is defined as follows.
When we say that π is a partition of G, we mean that π is a partition of the vertex set V of G. Nodes i and j are called cellmates in π if they belong to the same cell of π . The characteristic vector of a cell C ⊂ V is the N -dimensional column vector p(C) defined as For a given partition π = {C 1 , C 2 , . . . , C k }, consider the cells C p and C q with p = q. For any given node j ∈ C q , we define its degree with respect to C p as the sum of the weights of all arcs from j to i ∈ C p , i.e., the number Next, we will construct a reduced order approximation of (4) by clustering the agents in the network using a partition of G. Let π be a partition of G, and let P := P(π ) be its characteristic matrix. Extending the main idea in [26], we take as reduced order system the Petrov-Galerkin projection of the original system (4), with the following choice for the matrices V and W : The dynamics of the resulting reduced order model is then given bẏ whereL = P T P −1 P T L P ∈ R k×k , It can be seen by inspection that the matrixL is the Laplacian of a weighted directed graph with node set {1, 2, . . . , k}, with k equal to the number of clusters in the partition π , and adjacency matrixÂ = (â pq ), witĥ where d pq ( j) is the degree of j ∈ C q with respect to C p , and |C p | the cardinality of C p . In other words: in the reduced graph, the edge from node q to node p is obtained by summing over all j ∈ C q the weights of all edges to i ∈ C p and dividing this sum by the cardinality of C p . The row sums ofL are indeed equal to zero sinceL1 k = 0. The matrixM ∈ R k×m satisfieŝ where v 1 , v 2 , . . . , v m are the leader nodes, p = 1, 2, . . . , k, and j = 1, 2, . . . , m. Clearly, the state space dimension of the reduced order network (5) is equal to nk, whereas the dimensions mr and n N of the input and output have remained unchanged. Thus, we can investigate the error between the original and reduced order network by looking at the difference of their transfer functions. In the sequel, we will investigate both the H 2 -norm as well as the H ∞ -norm of this difference.
Before doing this, we will now first study the question whether our reduction procedure preserves synchronization. It is important to note that since, by assumption, the original undirected graph is connected, it has a directed spanning tree. It is easily verified that this property is preserved by our clustering procedure. Then, since the property of having a directed spanning tree is equivalent with 0 being a simple eigenvalue of the Laplacian (see [21,Proposition 3.8]), the reduced order LaplacianL has again 0 as a simple eigenvalue. Now assume that the original network (4) is synchronized. It is well known, see, e.g., [33], that this is equivalent with the condition that for each nonzero eigenvalue λ of the Laplacian L the matrix A − λB is Hurwitz. Thus, synchronization is preserved if and only if for each nonzero eigenvalueλ of the reduced order LaplacianL the matrix A −λB is Hurwitz.
Unfortunately, in general A − λB Hurwitz for all nonzero λ ∈ σ (L) does not imply that A −λB Hurwitz for all nonzero λ ∈ σ (L). An exception is the "single integrator" case A = 0 and B = 1, where this condition is trivially satisfied, so in this special case synchronization is preserved. Also if we restrict ourselves to a special type of graph partitions, namely almost equitable partitions, then synchronization turns out to be preserved. We will review this type of partition now.
Again, let G be a weighted, undirected graph, and let π = {C 1 , C 2 , . . . , C k } be a partition of G. Given two clusters C p and C q with p = q, and a given node j ∈ C q , recall that d pq ( j) denotes its degree with respect to C p . We call the partition π an almost equitable partition (in short: an AEP) if for each p, q with p = q, the degree We refer to Fig. 1 for an example of a graph with an AEP. It is a well-known fact (see [5]) that π is an AEP if and only if the image of its characteristic matrix is invariant under the Laplacian.

Lemma 3
Consider the weighed undirected graph G with Laplacian matrix L. Let π be a partition of G with characteristic matrix P := P(π ). Then, π is an AEP if and only if L im P ⊂ im P.
As an immediate consequence, the reduced LaplacianL resulting from an AEP satisfies L P = PL. Indeed, since im P is L-invariant we have L P = P X for some matrix X . Obviously, we must then have X = P T P −1 P T L P =L. From this, it follows that It then readily follows that synchronization is preserved if we cluster according to an AEP: Theorem 1 Assume that the network (4) is synchronized. Let π be an AEP. Then, the reduced order network (5) obtained by clustering according to π is synchronized.
To the best of our knowledge, there is no known polynomial-time algorithm for finding nontrivial AEPs of a given graph, where by "trivial AEPs" we mean the coarsest and the finest partitions ({V} and {{i} : i ∈ V}). There is a polynomial-time algorithm for finding the coarsest AEP which is finer than a given partition (see [35]), but there is no guarantee that it will find a nontrivial AEP. Furthermore, it is not clear whether a given graph has any nontrivial AEPs at all. On the other hand, a graph can have many AEPs, e.g., every partition of a complete unweighted graph is an AEP. Because of this, in Sect. 7 we consider extensions of our results in Sects. 5 and 6, which are based on AEPs, to arbitrary partitions.

H 2 -error bounds
In this section, we will formulate the first main theorem of this paper. The theorem gives an a priori upper bound for the H 2 -norm of the approximation error in the case that we cluster according to an AEP. After formulating the theorem, in the remainder of this section we will establish a proof. The proof will use a sequence of separate lemmas, whose proofs can be found in "Appendix." Before stating the theorem, we will now first discuss some important ingredients. Let S andŜ denote the transfer functions of the original (4) and reduced order network (5), respectively. We will measure the approximation error by the H 2 -norm S −Ŝ H 2 of these transfer functions. An important role will be played by the N − 1 auxiliary input-state-output systemsẋ where λ ranges over the N − 1 nonzero eigenvalues of the Laplacian L. Let S λ (s) = λ(s I − A + λB) −1 E be the transfer matrices of these systems. We assume that the original network (4) is synchronized, so that all of the A−λB are Hurwitz.
Node v i will be called leader i. This leader is an element of cluster C k i for some k i ∈ {1, 2, . . . , k}. We now have the following theorem: Theorem 2 Assume that the network (4) is synchronized. Let π be an AEP of the graph G. The absolute approximation error when clustering G according to π then satisfies where C k i is the set of cellmates of leader i, and Furthermore, the relative approximation error satisfies Remark 1 We see that, with fixed number of agents and fixed number of leaders, the approximation error is equal to 0 if in each cluster that contains a leader, the leader is the only node in that cluster. In general, the upper bound increases if the numbers of cellmates of the leaders increase. The upper bound also depends multiplicatively on the maximal H 2 -norm of the auxiliary systems (6) over all Laplacian eigenvalues in the complement of the spectrum of the reduced LaplacianL. The relative error in addition depends on the minimal H 2 -norm of the auxiliary systems (6) over all nonzero eigenvalues of the Laplacian L.
Remark 2 For the special case that the agents are single integrators (so n = 1, A = 0, B = 1, and E = 1) it is easily seen that S max,H 2 = 1 2 max{λ | λ ∈ σ (L)\σ (L)} and S min,H 2 = 1 2 min{λ | λ ∈ σ (L), λ = 0}. Thus, in the single integrator case the corresponding a priori upper bounds explicitly involve the Laplacian eigenvalues. As already noted in Sect. 1, the single integrator case was also studied in [26] for the slightly different setup that the output equation in the original network (4) is taken as Here, R is the incidence matrix of the graph and W the diagonal matrix with the edge weights on the diagonal (in other words, L = RW R T ). It was shown in [26] that in that case the absolute and relative approximation errors even admit the explicit formulas In the remainder of this section, we will establish a proof of Theorem 2. Being rather technical, most of the proofs will the deferred to "Appendix." As a first step, we establish the following lemma (see also [26], where only the single integrator case was treated): Lemma 4 Let π be an AEP of the graph G. The approximation error when clustering G according to π then satisfies Proof See "Appendix D." Recall that, since π is an AEP, we have σ (L) ⊂ σ (L). Label the eigenvalues of L as 0, λ 2 , λ 3 , . . . , λ N in such a way that 0, λ 2 , λ 3 , . . . , λ k are the eigenvalues ofL. Also, without loss of generality, we assume that π is regularly formed, i.e., all ones in each of the columns of P(π ) are consecutive. One can always relabel the agents in the graph in such a way that this is achieved. For simplicity, we again denote P(π ) by P. Consider now the symmetric matrix Note that the eigenvalues ofL andL coincide. LetÛ be an orthogonal matrix that diagonalizesL. We then havê Next, take U 1 = P P T P − 1 2Û . The columns of U 1 form an orthonormal set: Furthermore, we have that It is easily verified that the first column of U 1 , and thus the first column of U , is given by 1 s, a fact that we will use in the remainder of this paper.
Using the above, we will now first establish explicit formulas for the H 2 -norms of S andŜ separately. The following lemma gives a formula for the H 2 -norm of the original transfer function S: Lemma 5 Let U be as in (9). For i = 2, . . . , N , let X i be the observability Gramian of Proof See "Appendix E." We proceed with finding a formula for the H 2 -norm for the reduced system. This will be dealt with in the following lemma: Then, the H 2 -norm ofŜ is given by: Proof See "Appendix F." We will now combine the previous lemmas and give a proof of Theorem 2.
Proof of Theorem 2 Using Lemma 4, and formulas (10) and (11), we compute where the second equality follows from the fact that Next, observe that (12) can be rewritten as where S λ j for j = k + 1, . . . , N is the transfer function of the auxiliary system (6). An upper bound for this expression is given by Since, by assumption, the partition π is regularly formed, P P T P −1 P T is a block diagonal matrix of the form P P T P −1 P T = diag(P 1 , P 2 , . . . , P k ).
It is easily verified that each P i is a |C i | × |C i | matrix whose elements are all equal to 1 |C i | . The matrix M M T is a diagonal matrix whose diagonal entries are either 0 or 1. We then have that the ith column of P P T P −1 P T M M T is either equal to the ith column of P P T P −1 P T if agent i is a leader, or zero otherwise. It then follows that the diagonal elements of P P T P −1 P T M M T are either zero or 1 |C k i | if i is part of the leader set, where C k i is the cell containing agent i. Hence, we have and consequently, In conclusion, we have which completes the proof of the first part of the theorem. We now prove the statement about the relative error. For this, we will establish a lower bound for S 2 H 2 . By (10), we have The first column of U spans the eigenspace corresponding to the eigenvalue 0 of L and hence must be equal to u 1 = 1 √ N 1 N . LetŪ be such that U = u 1Ū . It is then easily verified using (13) that Finally, since This then yields the upper bound for the relative error as claimed.
Remark 3 Note that by our labeling of the eigenvalues of L, in the formulation of Theorem 2, we have that σ (L)\σ (L) is equal to {λ k+1 , . . . , λ N } used in the proof. We stress that this should not be confused with the notation often used in the literature, where the λ i s are labeled in increasing order.

H ∞ -error bounds
Whereas in the previous section we studied a priori upper bounds for the approximation error in terms of the H 2 -norm, the present section aims at expressing the approximation error in terms of the H ∞ -norm. This section consists of two subsections. In the first subsection, we consider the special case that the agent dynamics is a single integrator system. Here, we obtain an explicit formula for the H ∞ -norm of the error. In the second subsection, we find an upper bound for the H ∞ -error for symmetric systems.

The single integrator case
Here, we consider the special case that the agent dynamics is a single integrator system. In this case, we have A = 0, B = 1, and E = 1 and the original system (4) reduces tȯ The state space dimension of (14) is then simply N , the number of agents. For a given partition π = {C 1 , C 2 , . . . , C k }, the reduced system (5) is now given bẏ where P = P(π ) is again the characteristic matrix of π andx ∈ R k . The transfer functions S andŜ, of the original and reduced system, respectively, are given by The first main result of this section is the following explicit formula for the H ∞ -model reduction error. It complements the formula for the H 2 -error obtained in [26] (see also Remark 2): Theorem 3 Let π be an AEP of the graph G. If the network with single integrator agent dynamics is clustered according to π , then the H ∞ -error is given by We now first show that, since π is an AEP, we have First note thatŜ(s) = L P P T P , where the symmetric matrixL is given by (7). Thus, a state space representation for the error system is given byẋ Next, we show that (15) holds by applying Lemma 1 to system (16). Indeed, with X = −L, we have and from Lemma 1 it then immediately follows that Δ H ∞ = σ 1 (Δ(0)). To compute σ 1 (Δ(0)), we apply Lemma 2 to system (16). First, it is easily verified that By applying Lemma 2 we then obtain Recall thatÛ in (8) is an orthogonal matrix that diagonalizesL and that U 1 = P P T P − 1 2Û . Then,L + =ÛΛ +Û T . Thus, we have Next, we compute where the last equality follows from the fact that the first column of U is 1 Combining (18) and (19) with (17), we obtain From (15) then, we have that the H ∞ -error is given by All that is left is to compute the minimal eigenvalue of M T P P T P −1 P T M. Again, let {v 1 , v 2 , . . . , v m } be the set of leaders and note that M satisfies Again, without loss of generality, assume that π is regularly formed. Then, the matrix P P T P −1 P T is block diagonal where each diagonal block P i is a |C i | × |C i | matrix whose entries are all 1 |C i | . Let k i ∈ {1, 2, . . . , k} be such that v i ∈ C k i . If all the leaders are in different cells, then and so λ min M T P P T P −1 P T M = min Now suppose that two leaders v i and v j are cellmates. Then, we have From (20), (21), and (22), we find the absolute H ∞ -error. To find the relative H ∞error, we compute S H ∞ by applying Lemmas 1 and 2 to the original system (14). Combined with (18), this results in the H ∞ -norm of the original system: This completes the proof.

The general case with symmetric agent dynamics
In this subsection, we return to the general case that the agent dynamics is given by an arbitrary multivariable input-state-output system. Thus, the original and reduced networks are again given by (4) and (5), respectively. As in the proof of Theorem 3, we will rely heavily on Lemma 2 to compute the H ∞ -error. Since Lemma 2 relies on a symmetry argument, we will need to assume that the matrices A and B are both symmetric, which will be a standing assumption in the remainder of this section. We will now establish an a priori upper bound for the H ∞ -norm of the approximation error in the case that we cluster according to an AEP. Again, an important role is played by the N − 1 auxiliary systems (6) with λ ranging over the nonzero eigenvalues of the Laplacian L. Again, let S λ (s) = λ(s I − A + λB) −1 E be their transfer functions. We assume that the original network (4) is synchronized, so that all of the A − λB are Hurwitz. We again use S,Ŝ, and Δ to denote the relevant transfer functions.
The following is the second main theorem of this section: and S min,H ∞ := min with S λ the transfer functions of the auxiliary systems (6). Proof of Theorem 4 First note that the transfer functionŜ of the reduced network (5) is equal tô with the symmetric matrixL given by (7). Analogous to the proof of Theorem 3, we first apply Lemma 1 to the error systeṁ with transfer function Δ. Take X = I N ⊗ A − L ⊗ B. We then have From Lemma 1, we thus obtain that In the proof of Lemma 4, it was shown that By applying Lemma 2 to system (4), we obtain where S λ is again given by (6). Recall thatM = P T P −1 P T M and U 1 = P P T P − 1 2Û . Now apply Lemma 2 to the transfer function (25) of the system (5): Combining the two expressions above, it immediately follows that By taking S max,H ∞ as defined by (23) it then follows that Continuing as in the proof of Theorem 3, we find an upper bound for the H ∞ -error: To compute an upper bound for the relative H ∞ -error, we bound the H ∞ -norm of system (4) from below. Again, letŪ be such that U = u 1Ū and let S min,H ∞ be as defined by (24). From (26) it now follows that Again using Lemma 2, we find a lower bound to the H ∞ -norm of S: which concludes the proof of the theorem.

Toward a priori error bounds for general graph partitions
Up to now, we have only dealt with establishing error bounds for network reduction by clustering using almost equitable partitions of the network graph. Of course, we would also like to obtain error bounds for arbitrary, possibly non almost equitable, partitions. In this section, we present some ideas to address this more general problem. We will first study the single integrator case. Subsequently, we will look at the general case.

The single integrator case
Consider the multi-agent networkẋ As before, assume that the underlying graph G is connected. The network is then synchronized. Let π = {C 1 , C 2 , . . . , C k } be a graph partition, not necessarily an AEP, and let P = P(π ) ∈ R N ×k be its characteristic matrix. As before, the reduced order network is taken to be the Petrov-Galerkin projection of (27) and is represented by Again, let S andŜ be the transfer functions of (27) and (28), respectively. We will address the problem of obtaining a priori upper bounds for S −Ŝ H 2 and S − S H ∞ . We will pursue the following idea: as a first step we will approximate the original Laplacian matrix L (of the original network graph G) by a new Laplacian matrix, denoted by L AEP (corresponding to a "nearby" graph G AEP ) such that the given partition π is an AEP for this new graph G AEP . This new graph G AEP defines a new multi-agent system with transfer function S AEP (s) = L AEP (s I N + L AEP ) −1 M. The reduced order network of S AEP (using the AEP π ) has transfer functionŜ AEP (s) = L AEP P s I k +L AEP −1M . Then, using the triangle inequality, both for p = 2 and p = ∞, we have The idea is to obtain a priori upper bounds for all three terms in (29). We first propose an approximating Laplacian matrix L AEP , and subsequently study the problems of establishing upper bounds for the three terms in (29) separately.
In other words, we want to compute a positive semi-definite matrix L AEP with row sums equal to zero, and with the property that im P is invariant under L AEP (equivalently, the given partition π is an AEP for the new graph). We will show that such an L AEP may correspond to an undirected graph with negative weights. However, it is constrained to be positive semi-definite, so the results of Sects. 4, 5, and 6 in this paper will remain valid.

Theorem 5
The matrix L AEP := P LP + (I N − P)L(I N − P) is the unique solution to the convex optimization problem (30). If L corresponds to a connected graph, then, in fact, ker L AEP = im 1 N .
Proof Clearly, L AEP is symmetric and positive semi-definite since L is. Also, (I N − P)L AEP P = 0 since (I N − P)P = 0. It is also obvious that L AEP 1 N = 0 since P1 N = 1 N . We now show that L AEP uniquely minimizes the distance to L. Let X satisfy the constraints and define Δ = L AEP − X . Then, we have It Thus, we obtain To prove the second statement, let x ∈ ker L AEP , so x T L AEP x = 0. Then, both x T P LP x = 0 and x T (I N − P)L(I N − P)x = 0. This clearly implies LP x = 0 and L(I N − P)x = 0. Since L corresponds to a connected graph, we must have P x ∈ im 1 N and (I N − P)x ∈ im 1 N . We conclude that x ∈ im 1 N , as desired.
As announced above, L AEP may have positive off-diagonal elements, corresponding to a graph with some of its edge weights being negative. For example, for we have so the edge between nodes 3 and 5 has a negative weight. Figure 2 shows the graphs corresponding to L and L AEP . Although L AEP is not necessarily a Laplacian matrix with only nonpositive off-diagonal elements, it has all the properties we associate with a Laplacian matrix. Specifically, it can be checked that all results in this paper remain valid, since they only depend on the symmetric positive semi-definiteness of the Laplacian matrix.
Using the approximating Laplacian L AEP = P LP + (I N − P)L(I N − P) as above, we will now deal with establishing upper bounds for the three terms in (29). We start off with the middle term S AEP −Ŝ AEP H p in (29). According to Remark 2, for p = 2 this term has an upper bound depending on the maximal λ ∈ σ (L AEP )\σ (L AEP ), and on the number of cellmates of the leaders with respect to the partitioning π . For p = ∞, in Theorem 3 this term was expressed in terms of the maximal number of cellmates with respect to the partitioning π (noting that it is equal to 1 in case two or more leaders share the same cell).
Next, we will take a look at the first and third term in (29) It is also easily seen thatL AEP = P T P −1 P T L AEP P = P T P −1 P T L P =L and L AEP P = P P T P −1 P T L P = PL. Therefore, Since, finally, (L P − PL) T (L P − PL) = P T (ΔL) 2 P, for p = 2 and p = ∞, we

S(s)
Thus, both in (31) and (32) the upper bound involves the difference ΔL = L − L AEP between the original Laplacian and its optimal approximation in the set of Laplacian matrices for which the given partition π is an AEP. In a sense, the difference ΔL measures how far π is away from being an AEP for the original graph G. Obviously, ΔL = 0 if and only if π is an AEP for G. In that case only the middle term in (29) is present.

The general case
In this final subsection, we will put forward some ideas to deal with the case that the agent dynamics is a general linear input-state-output system and the given graph partition π , with characteristic matrix P, is not almost equitable. In this case, the original network is given by (4) and the reduced network by (5). Their transfer functions are S andŜ, respectively. Let L AEP andL AEP as in the previous subsection and let As before, we assume that (4) is synchronized, so S is stable. However, since the partition π is no longer assumed to be an AEP, the reduced transfer functionŜ need not be stable anymore. Also, S AEP andŜ AEP need not be stable. We will now first study under what conditions these are stable. First note thatŜ is stable if and only if A −λB is Hurwitz for all nonzero eigenvaluesλ ofL. Moreover, S AEP andŜ AEP are stable if and only if A − λB is Hurwitz for all nonzero eigenvalues λ of L AEP . In the following, let λ min (L) and λ max (L) denote the smallest nonzero and largest eigenvalue of L, respectively. We have the following lemma about the location of the nonzero eigenvalues ofL and L AEP : Proof The claim about the eigenvalues ofL follows from the interlacing property (see, e.g., [13]). Next, note that P = Q 1 Q T 1 , with Q 1 = P(P T P) − 1 2 . Since the columns of Q 1 are orthonormal, there exists a matrix Q 2 ∈ R N ×(N −r ) such that Q 1 Q 2 is an orthogonal matrix. Then, we have I N − P = Q 2 Q T 2 and we find . By the interlacing property, both the eigenvalues of Q T 1 L Q 1 and Q T 2 L Q 2 are interlaced with the eigenvalues of L, so in particular we have that all eigenvalues λ of L AEP satisfy λ ≤ λ max (L). In order to prove the lower bound, note that Q T 1 L Q 1 is similar toL, for which we know that its nonzero eigenvalues are between the nonzero eigenvalues of L. As for the eigenvalues of Q T 2 L Q 2 , note that 1 T Q 2 = 0 and Q 2 x 2 = x 2 for all x. Thus, we find min Therefore, the smallest eigenvalue of Q T 2 L Q 2 is larger than the smallest positive eigenvalue of L. We conclude that indeed λ ≥ λ min (L) for all nonzero eigenvalues λ of L AEP .
Using this lemma, we see that a sufficient condition forŜ, S AEP , andŜ AEP to be stable is that for each λ ∈ [λ min (L), λ max (L)], the strict Lyapunov inequality has a positive definite solution X . This sufficient condition can be checked by verifying solvability of a single linear matrix inequality, whose size does not depend on the number of agents, see [31]. After having checked this, it would then remain to establish upper bounds for the first and third term in (29). This can be done in an analogous way as in the previous subsection. Specifically, it can be shown that for p = 2 and p = ∞ we have Fig. 3 Ratios of H 2 (left) and H ∞ (right) upper bounds and corresponding true errors, for a fixed almost equitable partition and all possible sets of leaders. In both figures, the sets of leaders are sorted such that the ratio is increasing (in particular, the ordering of the sets of leaders is not the same)

Numerical examples
To illustrate the error bounds we have established in this paper, consider the graph with 10 nodes taken from [26], as shown in Fig. 1. Its Laplacian matrix is Note that, indeed, π is an AEP. Also, in order to satisfy the assumptions of Theorem 4, we have taken A and B symmetric. Note that A − λB is Hurwitz for all nonzero eigenvalues λ of the Laplacian matrix L. Therefore, the multi-agent system is synchronized. It remains to choose the set of leaders V L . For demonstration, we compute the H 2 and H ∞ upper bounds and the true errors for all possible choices of V L . Since the sets of leaders are nonempty subsets of V, it follows that there are 2 10 − 1 = 1023 possible sets of leaders. Figure 3 shows all the ratios of upper bounds and corresponding true errors, where we define 0 0 := 1. We see that in this example, all true errors and upper bounds are within one order of magnitude, and that in most cases the ratio is below 2.
Next, we compare the true errors with the triangle inequality-based error bounds from (29) for a fixed set of leaders and all possible partitions consisting of five cells.  For the set of leaders, we take V L = {6, 7}, as was also used in [26]. With this choice of leaders, the systems norms are S H 2 ≈ 6.4 and S H ∞ ≈ 1.03 (rounded to three significant digits). Figure 4 shows true errors and upper bounds for all partitions of V with five cells (there are 42,525 such partitions). We observe that the upper bounds vary significantly as the true error increases, but the ratio is still less than one order of magnitude. Additionally, we notice that partitions giving small H 2 errors give smaller upper bounds, as seen more clearly in the left subfigure of Fig. 5. Furthermore, we observe a jump after the 966th partition. In fact, the 966 partitions giving the smallest H 2 error are all those partitions where the leaders are the only members in their cell. For the H ∞ error this is not the case, i.e., there are partitions with leaders sharing a cell with more agents that give a smaller H ∞ error then a partition with leaders not sharing a cell. On the other hand, partitions with the smallest H 2 or H ∞ upper bound are close to the optimal true error.
In the following, we also compute the errors L − L AEP F for all partitions with five cells. Figure 6 shows the relative approximation errors L−L AEP F L F . We see that only a few (six, to be precise) partitions give a relative error less than 0.1. Irrespective of this, a small triangle inequality-based error bound (29) seems to indicate good partitions.
Finally, we compare the bound (29) with those from Ishizaki et al. [15][16][17]. There are also error bounds developed in [4,6], but they depend on the proposed model reduction methods and cannot be evaluated for an arbitrary partition. The H 2 and H ∞ error bounds from Ishizaki et al. are based on the decomposition (see equation (31) in [16], (20) in [14], or (17) in [17]) −1 , and Q is such that P Q is orthogonal. The error bounds are then for p = 2 and p = ∞. Figure 7 shows the comparison between these bounds, the triangle inequality-based bound (29), and the true errors. In this example, our bounds are, for most partitions, lower than those from Ishizaki et al. Yet, they do share some qualitative properties: both vary significantly as the true error increases and those partitions with the small bounds are close to the optimal.

Conclusions
In this paper, we have extended results on model reduction of leader-follower networks with single integrator agent dynamics from [26] to leader-follower networks with arbitrary linear multivariable agent dynamics. We have also extended these results to the case that the approximation error is measured in the H ∞ -norm. The proposed model reduction technique reduces the complexity of the network topology by clustering the agents. We have shown that clustering amounts to applying a specific Petrov-Galerkin projection associated with the graph partition. The resulting reduced order model can be interpreted as a networked multi-agent system with a weighted, directed network graph. If the original network is clustered using an almost equitable graph partition, then its consensus properties are preserved. We have provided a priori upper bounds on the H 2 and H ∞ model reduction errors in this case. These error bounds depend on Fig. 7 Comparison with error bounds from Ishizaki et al. [15][16][17]. The first column shows the H 2 errors and bounds, the second column the H ∞ errors and bounds. The first row contains values for all partitions with five cells, the second row only the first 1000 best ones an auxiliary system related to the agent dynamics, the eigenvalues of the Laplacian matrices of the original and the reduced network, and on the number of cellmates of the leaders in the network. Finally, we have provided some insight into the general case of clustering according to arbitrary, not necessarily almost equitable, partitions.
Here, direct computation of a priori upper bounds on the error is not as straightforward as in the case of almost equitable partitions. We have shown that in this more general case, one can bound the model reduction errors by first optimally approximating the original network by a new network for which the chosen partition is almost equitable, and then bounding the H 2 and H ∞ errors using the triangle inequality.
where A − is Hurwitz, and A + has all its eigenvalues in the closed right half plane. Let X − be the unique solution to the reduced Lyapunov equation Then Obviously then, X = diag(X − , 0) is a positive semi-definite solution of (2). Now let X be a positive semi-definite solution to (2) with the property that X + (A) ⊂ ker X . Then, X must be of the form X = diag(X 1 , 0), and X 1 must satisfy the reduced Lyapunov equation (33). Thus, X = diag(X − , 0). Finally, S is stable since X + (A) ⊂ ker C. Moreover,

Appendix B Proof of Lemma 1
Proof For the first part of the proof, let us assume that (A, B, C) is minimal. Then, in particular, A is Hurwitz and (A, B) is controllable. Clearly, the inequality S H ∞ ≥ σ 1 (S(0)) is always satisfied. We will prove that S H ∞ ≤ σ 1 (S(0)) using the bounded real lemma [32], which states that S H ∞ ≤ γ if and only if there exists P ∈ R n×n such that P = P T and A T P + P A + C T C + 1 γ 2 P B B T P ≤ 0.
Let us take γ = σ 1 (S(0)) = σ 1 (C A −1 B). This implies that Defining P := −A −T C T XC A −1 and using (34) yields A T P + P A + C T C + From the bounded real lemma, we conclude that S H ∞ ≤ σ 1 (S(0)). For a non-minimal representation (A, B, C), applying the Kalman decomposition, let T be a nonsingular matrix such that where (A 1 , B 1 , C 1 ) is a minimal representation of (A, B, C) with A 1 Hurwitz. From, we find that C 1 A 1 = XC 1 . Therefore, the minimal representation satisfies the sufficient condition and using the result obtained above the proof is completed.

Appendix C Proof of Lemma 2
Proof If A is nonsingular, then the conclusion follows immediately. Otherwise, let A = U ΛU T be an eigenvalue decomposition with orthogonal U and Λ = diag(0, Λ 2 ), where Λ 2 ∈ R r ×r and r is the rank of A. We denote U = U 1 U 2 , with U 2 ∈ R n×r . Then, Note that CU 1 = 0. We have

Appendix D Proof of Lemma 4
Proof First, note that the columns of P(π ) are orthogonal. We construct a matrix T = P Q , where P := P(π ), and where the N × (N − k) matrix Q is chosen such that the columns of T form an orthogonal basis for R N . In this case, we have P T Q = 0. Next, we apply the state space transformation x = Tx to system (4). We obtain ẋ 1 where the matrices A e , B e , and C e are given by Obviously, in (35) the transfer function from u to y is equal to S. Furthermore, if the state componentx 2 is truncated from (35), what we are left with is the reduced order model (5). Since π is an AEP of G, by Lemma 3, im P is invariant under L. From this, it follows that not only Q T P = 0, but also Q T L P = 0 and Q T L 2 P = 0.