1 Introduction

Many control problems for systems of ordinary differential equations can be posed as convex optimization problems with matrix inequality constraints that must hold on a prescribed portion of the state space [8, 14, 20, 43]. For differential equations with polynomial right-hand side, these problems often take the generic form

$$\begin{aligned} B^* := \inf _{\lambda \in \mathbb {R}^\ell } \quad b(\lambda ) \quad \text {s.t.} \quad P(x,\lambda ) := P_0(x) - \sum _{i=1}^\ell P_i(x)\lambda _i \succeq 0 \quad \forall x \in \mathcal {K}, \end{aligned}$$
(1.1)

where \(b:\mathbb {R}^\ell \rightarrow \mathbb {R}\) is a convex cost function, \(P_0,\ldots ,P_\ell \) are \(m \times m\) symmetric polynomial matrices depending on the system state \(x \in \mathbb {R}^n\), and

$$\begin{aligned} \mathcal {K} = \left\{ x \in \mathbb {R}^n:\; g_1(x)\ge 0,\, \ldots ,\, g_q(x) \ge 0 \right\} \end{aligned}$$
(1.2)

is a basic semialgebraic set defined by inequalities on fixed polynomials \(g_1,\,\ldots ,\,g_q\). There is no loss of generality in considering only inequality constraints because any equality \(g(x)=0\) can be replaced by the two inequalities \(g(x)\ge 0\) and \(-g(x)\ge 0\).

Verifying polynomial matrix inequalities is generally an NP-hard problem [29], which makes (1.1) intractable. Nevertheless, feasible vectors \(\lambda \) can be found via semidefinite programming if one imposes the stronger condition that

$$\begin{aligned} P(x,\lambda ) = S_0(x) + g_1(x)S_1(x) + \cdots + g_q(x) S_q(x) \end{aligned}$$
(1.3)

for some \(m \times m\) sum-of-squares (SOS) polynomial matrices \(S_0,\,\ldots ,\,S_q\). A polynomial matrix S(x) is SOS if \(S(x)=H(x)^{{\mathsf T}}H(x)\) for some polynomial matrix H(x), and it is well known [12, 18, 34, 42] that linear optimization problems with SOS matrix variables can be reformulated as semidefinite programs (SDPs). However, the size of these SDPs grows very rapidly with the size of P, its polynomial degree, and the number \(n\) of independent variables. Thus, even though in theory SDPs can be solved using algorithms with polynomial-time complexity [7, 31, 32, 47], in practice reformulations of (1.1) based on (1.3) remain intractable because they require prohibitively large computational resources.
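To see where this growth comes from, recall the standard Gram-matrix characterization underlying these reformulations [12, 18, 34, 42]: an \(m \times m\) polynomial matrix S(x) of degree 2d is SOS if and only if

$$\begin{aligned} S(x) = \left( I_m \otimes z_d(x)\right) ^{{\mathsf T}}Q \left( I_m \otimes z_d(x)\right) \end{aligned}$$

for some constant positive semidefinite matrix Q, where \(z_d(x)\) is the vector of all \(N = \left( {\begin{array}{c}n+d\\ d\end{array}}\right) \) monomials of degree at most d. Matching polynomial coefficients on both sides imposes linear equality constraints on Q, so verifying the SOS property amounts to an SDP with a PSD matrix variable of side mN, which grows quickly with m, d, and n.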

This work introduces new sparsity-exploiting SOS decompositions that can be used to efficiently certify the nonnegativity of large but sparse polynomial matrices, where “sparse” means that many of their off-diagonal entries are identically zero. Specifically, let P(x) be an \(m \times m\) polynomial matrix and describe its sparsity using an undirected graph \(\mathcal {G}\) with vertices \(\mathcal {V}=\{1,\ldots ,m\}\) and edges \(\mathcal {E} \subseteq \mathcal {V} \times \mathcal {V}\) such that \(P_{ij}(x)=P_{ji}(x)\equiv 0\) when \(i \ne j\) and \((i,j) \notin \mathcal {E}\). Motivated by chordal decomposition techniques for semidefinite programming [11, 30, 45, 46, 55], we ask whether the computational complexity of (1.3) can be lowered by decomposing the matrices \(S_0,\ldots ,S_q\) into sums of sparse SOS matrices, each with nonzero entries only on the principal submatrix indexed by one of the maximal cliques of the sparsity graph \(\mathcal {G}\) of P. We prove that this clique-based decomposition exists if \(\mathcal {G}\) is a chordal graph (meaning that, for every cycle of length larger than three, there is at least one edge in \(\mathcal {E}\) connecting nonconsecutive vertices in the cycle), \(\mathcal {K}\) is a compact set satisfying the so-called Archimedean condition, and P(x) is strictly positive definite on \(\mathcal {K}\) (cf. Theorem 2.4). This result is a sparsity-exploiting version of Putinar’s Positivstellensatz [36] for polynomial matrices. We also give a sparse-matrix version of the Putinar–Vasilescu Positivstellensatz [37], stating that \((x_1^2 + \cdots + x_n^2)^\nu P\) admits a clique-based SOS decomposition for some integer \(\nu \ge 0\) if P is homogeneous, has even degree, and is positive definite on a semialgebraic set \(\mathcal {K}\) defined by homogeneous polynomials \(g_1, \ldots , g_q\) of even degree (cf. Theorem 2.5). This result applies even if \(\mathcal {K}\) is noncompact. For the particular case of global nonnegativity, \(\mathcal {K}\equiv \mathbb {R}^n\), we immediately recover a sparse-matrix version of Reznick’s Positivstellensatz [39] (cf. Theorem 2.3), and further prove a version of the Hilbert–Artin theorem [5] where the strict positivity of P is weakened to positive semidefiniteness upon replacing the factor \((x_1^2 + \cdots + x_n^2)^\nu \) with a generic SOS polynomial (cf. Theorem 2.2). Table 1 summarizes our results and gives references to their counterparts for polynomials and general (dense) polynomial matrices.

Table 1 Summary of Positivstellensätze for polynomials, polynomial matrices, and polynomial matrices with structural sparsity

These chordal SOS decomposition theorems for polynomial matrices extend a classical chordal decomposition result for constant (i.e., independent of x) positive semidefinite (PSD) sparse matrices [1]. The latter allows for significant computational gains when applied to large-scale sparse SDPs [45, 55], analysis and control of structured systems [4, 56], and optimal power flow for large grids [3, 27]. Similarly, our decomposition results can be used to construct convergent hierarchies of sparsity-exploiting SOS reformulations of problem (1.1) (cf. Theorems 3.1, 3.2 and 3.3), which produce a minimizing sequence of feasible vectors \(\lambda \) and often have a significantly lower computational complexity compared to traditional approaches based on the “dense” weighted SOS representation (1.3).

Finally, when the polynomial matrix P in (1.1) is not only sparse, but also depends only on a small set of n-variate monomials, our chordal SOS decompositions can be combined with known methods to exploit term sparsity. These methods include facial reduction [24, 35, 38], symmetry reduction [12, 40], the exploitation of so-called correlative sparsity in the couplings between the independent variables [13, 15, 17, 19, 48], and the recent TSSOS, chordal-TSSOS and CS-TSSOS approaches to polynomial optimization [49,50,51,52]. Even though all of these methods have been developed for polynomial inequalities, rather than polynomial matrix inequalities, they can be applied directly upon reformulating the matrix inequality \(P(x,\lambda ) \succeq 0\) on \(\mathcal {K}\) as the polynomial inequality \(p(x,y)= y^{{\mathsf T}}P(x,\lambda ) y \ge 0\) for all \(x \in \mathcal {K}\) and \(y \in \mathbb {R}^m\) with \(\Vert y\Vert _\infty \le 1\). In particular, if P is structurally sparse, then p(x, y) is correlatively sparse with respect to y, and the techniques of [13, 15, 19, 48, 50, 54] can be used to check if it is nonnegative for all x and y of interest. This connection does not make our matrix decomposition theorems redundant: on the contrary, they reveal that correlatively sparse SOS decompositions for p(x, y) depend only quadratically on y (Corollaries 4.1, 4.2 and 4.3), which cannot be concluded from the available SOS decomposition theorems for scalar polynomials.

The rest of this work is structured as follows. Section 2 states our main chordal SOS decomposition results, while Sect. 3 explains how they can be used to formulate convergent hierarchies of sparsity-exploiting SOS reformulations of problem (1.1). Section 4 relates our decomposition results for polynomial matrices to the classical SOS techniques for correlatively sparse polynomials [13, 19, 48]. Computational examples are presented in Sect. 5. Our matrix decomposition results are proven in Sect. 6, and conclusions are offered in Sect. 7. Appendices contain details of calculations and proofs of auxiliary results.

2 Chordal decomposition of polynomial matrices

The main contributions of this work are chordal decomposition theorems for n-variate PSD polynomial matrices P(x) whose sparsity is described by a chordal graph \(\mathcal {G}\). After reviewing the connection between sparse matrices and graphs, as well as the standard chordal decomposition theorem for constant matrices, we present decomposition theorems that apply globally (Sect. 2.2) and on basic semialgebraic sets (Sect. 2.3).

2.1 Sparse matrices and chordal graphs

A graph \(\mathcal {G}\) is a set of vertices \(\mathcal {V}=\{1,\dots , m\}\) connected by a set of edges \(\mathcal {E} \subseteq \mathcal {V} \times \mathcal {V}\). We call \(\mathcal {G}\) undirected if the edge (j, i) is identified with the edge (i, j), so edges are unordered pairs; complete if \(\mathcal {E} = \mathcal {V}\times \mathcal {V}\); and connected if there exists a path \((i,v_1),\,(v_1,v_2),\,\ldots ,\,(v_k,j)\) between any two distinct vertices i and j. We consider only undirected graphs, and focus mainly on graphs that are connected but not complete.

A vertex \(i \in \mathcal {V}\) of an undirected graph is called simplicial if the subgraph induced by its neighbours is complete. A subset of vertices \(\mathcal {C} \subseteq \mathcal {V}\) that are fully connected, meaning that \((i,j) \in \mathcal {E}\) for all pairs of (distinct) vertices \(i,j \in \mathcal {C}\), is called a clique. A clique is maximal if it is not contained in any other clique. Finally, a sequence of vertices \(\{v_1, v_2, \ldots , v_k\} \subseteq \mathcal {V}\) with \(k \ge 3\) is called a cycle of length k if \((v_i, v_{i+1}) \in \mathcal {E}\) for all \(i = 1, \ldots , k-1\) and \((v_k, v_{1}) \in \mathcal {E}\). Any edge \((v_i,v_j)\) between nonconsecutive vertices in a cycle is known as a chord, and a graph is said to be chordal if all cycles of length \(k\ge 4\) have at least one chord. Complete graphs, chain graphs, and trees are all chordal; other particular examples are illustrated in Fig. 1. Any non-chordal graph can be made chordal by adding appropriate edges to it; the process is known as a chordal extension [46].

Fig. 1

Connected, non-complete, chordal undirected graphs, and the sparse matrices they describe. a Star graph with vertices \(\mathcal {V}=\{1,2,3,4\}\) and edges \(\mathcal {E}=\{(1,2),(1,3),(1,4)\}\). b Triangulated graph with vertices \(\mathcal {V}=\{1,2,3,4\}\) and edges \(\mathcal {E}=\{(1,2),(2,3),(3,4),(1,4),(2,4)\}\)

The sparsity pattern of any \(m \times m\) symmetric matrix P can be described using an undirected graph \(\mathcal {G}\) with vertices \(\mathcal {V}=\{1,\ldots ,m\}\) and an edge set \(\mathcal {E}\) such that \((i,j)\notin \mathcal {E}\) if and only if \(i\ne j\) and \(P_{ij}=0\); see Fig. 1 for two examples. We call \(\mathcal {G}\) the sparsity graph of P. Dense principal submatrices of P are indexed by cliques of \(\mathcal {G}\), and maximal dense principal submatrices are indexed by maximal cliques.
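These notions are easy to experiment with computationally. As an illustration, the following minimal Python sketch (assuming the networkx package; the graph is the star graph of Fig. 1a) checks chordality and lists the maximal cliques:

```python
import networkx as nx

# Sparsity graph of Fig. 1a: the star graph on vertices {1, 2, 3, 4}.
G = nx.Graph()
G.add_nodes_from([1, 2, 3, 4])
G.add_edges_from([(1, 2), (1, 3), (1, 4)])

print(nx.is_chordal(G))  # True: a star graph has no cycles at all
print(sorted(sorted(c) for c in nx.find_cliques(G)))
# [[1, 2], [1, 3], [1, 4]], the maximal cliques
```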

For each maximal clique \(\mathcal {C}_k\) of \(\mathcal {G}\), define a matrix \(E_{\mathcal {C}_k} \in \mathbb {R}^{|\mathcal {C}_k| \times m}\) as

$$\begin{aligned} (E_{\mathcal {C}_k})_{ij} := {\left\{ \begin{array}{ll} 1, &{}\text {if } \mathcal {C}_k(i) = j, \\ 0, &{}\text {otherwise}, \end{array}\right. } \end{aligned}$$
(2.1)

where \(|\mathcal {C}_k|\) is the cardinality of \(\mathcal {C}_k\) and \(\mathcal {C}_k(i)\) is the i-th vertex in \(\mathcal {C}_k\). This definition ensures that the operation \(E_{\mathcal {C}_k}^{{\mathsf T}}X_k E_{\mathcal {C}_k}\) “inflates” a \(\vert \mathcal {C}_k\vert \times \vert \mathcal {C}_k\vert \) matrix \(X_k\) into a sparse \(m\times m\) matrix with nonzero entries only in the submatrix indexed by \(\mathcal {C}_k\); for example, if \(m=3\), \(\mathcal {C}_k=\{1,3\}\), and \(S = \left[ {\begin{matrix}\alpha &{}\quad \beta \\ \beta &{}\quad \gamma \end{matrix}}\right] \) we have

$$\begin{aligned} E_{\mathcal {C}_k} = \begin{bmatrix} 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 \end{bmatrix} \qquad \text {and} \qquad E_{\mathcal {C}_k}^{{\mathsf T}}S E_{\mathcal {C}_k} = \begin{bmatrix} \alpha &{}\quad 0 &{}\quad \beta \\ 0 &{}\quad 0 &{}\quad 0\\ \beta &{}\quad 0 &{}\quad \gamma \end{bmatrix}. \end{aligned}$$
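As a concrete illustration of (2.1), here is a minimal Python sketch (the helper name clique_matrix is ours) that builds \(E_{\mathcal {C}_k}\) and performs the inflation above:

```python
import numpy as np

def clique_matrix(clique, m):
    """Build E_C as in (2.1): row i has a single 1 in column C(i) (1-based)."""
    E = np.zeros((len(clique), m))
    for i, j in enumerate(sorted(clique)):
        E[i, j - 1] = 1.0
    return E

E = clique_matrix({1, 3}, 3)
S = np.array([[1.0, 2.0],   # plays the role of [[alpha, beta],
              [2.0, 5.0]])  #                    [beta, gamma]]
print(E)            # [[1. 0. 0.], [0. 0. 1.]]
print(E.T @ S @ E)  # nonzero entries only in rows/columns 1 and 3
```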

The following classical result states that PSD matrices with a chordal sparsity graph admit a clique-based PSD decomposition.

Theorem 2.1

(Agler et al. [1]) A matrix P whose sparsity graph is chordal and has maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\) is positive semidefinite if and only if there exist positive semidefinite matrices \(S_k\) of size \(\left| \mathcal {C}_k \right| \times \left| \mathcal {C}_k \right| \) such that

$$P = \sum _{k=1}^{t} E_{\mathcal {C}_k}^{{\mathsf T}}S_k E_{\mathcal {C}_k}.$$
Fig. 2

Chordal sparsity graph of the \(3 \times 3\) matrices used in Examples 2.1, 2.2, 2.4 and 2.5

Example 2.1

The PSD matrix \(P = \left[ {\begin{matrix} 4 &{}\quad 2 &{}\quad 0 \\ 2 &{}\quad 2 &{}\quad 2 \\ 0 &{}\quad 2 &{}\quad 4 \end{matrix}}\right] \) has the sparsity graph illustrated in Fig. 2, which is chordal because it has no cycles. This graph has maximal cliques \(\mathcal {C}_1 = \{1,2\}\) and \(\mathcal {C}_2 = \{2,3\}\). The decomposition guaranteed by Theorem 2.1 reads \(P = E_{\mathcal {C}_1}^{{\mathsf T}}S_1 E_{\mathcal {C}_1} + E_{\mathcal {C}_2}^{{\mathsf T}}S_2 E_{\mathcal {C}_2}\) with \(S_1 = \left[ {\begin{matrix} 4 &{}\quad 2\\ 2 &{}\quad 1 \end{matrix}}\right] \) and \(S_2 = \left[ {\begin{matrix} 1 &{}\quad 2 \\ 2 &{}\quad 4 \end{matrix}}\right] \). \(\blacksquare \)
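A decomposition like the one in Example 2.1 can also be computed numerically by posing Theorem 2.1 as a semidefinite feasibility problem. A minimal Python sketch with cvxpy follows (it assumes an installed SDP solver such as SCS; clique decompositions are not unique, so the solver may return a different valid pair \(S_1\), \(S_2\)):

```python
import cvxpy as cp
import numpy as np

P = np.array([[4.0, 2.0, 0.0],
              [2.0, 2.0, 2.0],
              [0.0, 2.0, 4.0]])
E1 = np.array([[1.0, 0.0, 0.0],   # clique C1 = {1, 2}
               [0.0, 1.0, 0.0]])
E2 = np.array([[0.0, 1.0, 0.0],   # clique C2 = {2, 3}
               [0.0, 0.0, 1.0]])

# Find PSD matrices S1, S2 with E1' S1 E1 + E2' S2 E2 = P (feasibility SDP).
S1 = cp.Variable((2, 2), PSD=True)
S2 = cp.Variable((2, 2), PSD=True)
problem = cp.Problem(cp.Minimize(0),
                     [E1.T @ S1 @ E1 + E2.T @ S2 @ E2 == P])
problem.solve()
print(S1.value, S2.value, sep="\n")
```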

Our goal is to derive versions of Theorem 2.1 for sparse polynomial matrices that are positive semidefinite, either globally or on a basic semialgebraic set, where the matrices \(S_k\) are polynomial and SOS. This allows us to build convergent hierarchies of sparsity-exploiting SOS reformulations for the optimization problem (1.1), which have a considerably lower computational complexity compared to standard (dense) ones. Throughout the paper, we assume without loss of generality that the sparsity graph \(\mathcal {G}\) of P(x) is connected and not complete. Complete sparsity graphs correspond to dense matrices, while disconnected ones correspond to matrices that have a block-diagonalizing permutation. Each irreducible diagonal block can be analyzed individually and has a connected (but possibly complete) sparsity graph by construction.

2.2 Polynomial matrix decomposition on \(\mathbb {R}^n\)

Let the polynomial matrix P(x) be positive semidefinite for all \(x \in \mathbb {R}^n\) and have a chordal sparsity graph with maximal cliques \(\mathcal {C}_1, \ldots , \mathcal {C}_t\). Applying Theorem 2.1 for each \(x\in \mathbb {R}^n\) yields PSD matrices \(S_1(x),\,\ldots ,\,S_t(x)\) such that

$$\begin{aligned} P(x) = \sum _{k=1}^{t} E_{\mathcal {C}_k}^{{\mathsf T}}S_k(x) E_{\mathcal {C}_k}. \end{aligned}$$
(2.2)

Are these matrices always polynomial in x? Our first result gives a negative answer to this question for all matrix sizes \(m \ge 3\), irrespective of the number n of independent variables and of the sparsity graph of P.

Proposition 2.1

Let \(\mathcal {G}\) be a connected and not complete chordal graph with \(m\ge 3\) vertices and maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\). Fix any positive integer n. There exists an n-variate \(m \times m\) polynomial matrix P(x) with sparsity graph \(\mathcal {G}\) that is strictly positive definite for all \(x \in \mathbb {R}^n\), but cannot be written in the form (2.2) with positive semidefinite polynomial matrices \(S_k(x)\).

The proof of this proposition, given in Sect. 6.1, relies on the following example.

Example 2.2

The \(3\times 3\) univariate polynomial matrix

$$\begin{aligned} P(x) = \begin{bmatrix} k+1+x^2&{}\quad x+x^2 &{}\quad 0 \\ x+x^2 &{}\quad k+2x^2 &{}\quad x-x^2 \\ 0 &{}\quad x-x^2 &{}\quad k+1+x^2 \end{bmatrix} = \begin{bmatrix} x &{}\quad 1\\ x &{}\quad x\\ 1 &{}\quad -x \end{bmatrix} \begin{bmatrix} x &{}\quad x &{}\quad 1\\ 1 &{}\quad x &{}\quad -x \end{bmatrix} + k I_3 \end{aligned}$$
(2.3)

is globally positive semidefinite and SOS for all \(k\ge 0\), and it is strictly positive definite if \(k >0\). Let us attempt to find a decomposition of the form (2.2). We need to find two \(2 \times 2\) positive semidefinite polynomial matrices \(S_1\) and \(S_2\) such that \(P(x) = E_{\mathcal {C}_1}^{{\mathsf T}}S_1(x) E_{\mathcal {C}_1} + E_{\mathcal {C}_2}^{{\mathsf T}}S_2(x) E_{\mathcal {C}_2}\). Equivalently, we need to find polynomials a, b, c, d, e and f such that

$$\begin{aligned} P(x) = \begin{bmatrix} a(x)&{}\quad b(x) &{}\quad 0 \\ b(x) &{}\quad c(x) &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 \end{bmatrix} + \begin{bmatrix} 0&{}\quad 0 &{}\quad 0 \\ 0 &{}\quad d(x) &{}\quad e(x) \\ 0 &{}\quad e(x) &{}\quad f(x) \end{bmatrix}, \end{aligned}$$
(2.4)

and such that the two matrices on the right-hand side are positive semidefinite. Fixing \(a(x)=k+1+x^2\), \(b(x) = x+x^2\), \(e(x)=x-x^2\), \(f(x) = k+1+x^2\) and \(d(x) = k + 2x^2 - c(x)\) to ensure the equality, positive semidefiniteness requires the diagonal entries and determinants of the \(2\times 2\) nonzero blocks to be nonnegative; since \(a(x)\) and \(f(x)\) are manifestly positive, these requirements reduce to

$$\begin{aligned}&c(x) \ge 0, \end{aligned}$$
(2.5a)
$$\begin{aligned}&k + 2x^2 - c(x) \ge 0, \end{aligned}$$
(2.5b)
$$\begin{aligned}&(k+1+x^2)c(x) - (x^4 + 2x^3+x^2)\ge 0, \end{aligned}$$
(2.5c)
$$\begin{aligned}&x^4 + 2x^3 + (3k+1)x^2 + k^2 + k - (k+1+x^2)c(x) \ge 0. \end{aligned}$$
(2.5d)

If c(x) is to be nonnegative, it must have even degree; condition (2.5b) then forces it to be at most quadratic, while (2.5c) forces it to be at least quadratic, so c must be exactly quadratic. In particular, we must have \(c(x) = \alpha + 2x + x^2\) for some scalar \(\alpha \) to ensure that the coefficients of \(x^4\) and \(x^3\) in (2.5c) and (2.5d) vanish, otherwise at least one of these conditions cannot hold for all \(x \in \mathbb {R}\). Then, (2.5a) and (2.5b) become \(x^2 + 2 x + \alpha \ge 0\) and \(x^2 - 2 x - \alpha + k \ge 0\), and hold if and only if \(1\le \alpha \le k-1\). A suitable \(\alpha \) therefore exists when \(k\ge 2\), while the decomposition (2.4) fails to exist if \(0\le k <2\) even though P(x) is PSD for all such values of k (and, in fact, positive definite if \(k> 0\)). \(\blacksquare \)

Clique-based decompositions similar to (2.2) with polynomial matrices \(S_k(x)\), however, do exist after multiplying P(x) by a suitable SOS polynomial \(\sigma (x)\). The next result generalizes the Hilbert–Artin theorem on the representation of nonnegative polynomials as sums of squares of rational functions [5]. Importantly, it establishes that each \(S_k(x)\) is not just positive semidefinite, but SOS.

Theorem 2.2

Let P(x) be an \(m \times m\) positive semidefinite polynomial matrix whose sparsity graph is chordal and has maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\). There exist an SOS polynomial \(\sigma (x)\) and SOS matrices \(S_k(x)\) of size \(|\mathcal {C}_k| \times |\mathcal {C}_k|\) such that

$$\begin{aligned} \sigma (x) P(x)= \sum _{k=1}^{t} E_{\mathcal {C}_k}^{{\mathsf T}}S_k(x) E_{\mathcal {C}_k}. \end{aligned}$$
(2.6)

The proof, given in Sect. 6.2, extends a constructive proof of Theorem 2.1 for standard PSD matrices with chordal sparsity [16] using Schmüdgen’s diagonalization procedure for polynomial matrices [44] and the Hilbert–Artin theorem [5].

Example 2.3

Consider once again the polynomial matrix P(x) from Example 2.2. Inequalities (2.5a–d) hold for the rational function \(c(x) = (1+x)^2x^2(k+1+x^2)^{-1}\). We can therefore decompose

$$\begin{aligned} P(x) = (1+k+x^2)^{-1} \left[ E_{\mathcal {C}_1}^{{\mathsf T}}S_1(x) E_{\mathcal {C}_1} + E_{\mathcal {C}_2}^{{\mathsf T}}S_2(x) E_{\mathcal {C}_2} \right] \end{aligned}$$
(2.7)

where, by construction, the polynomial matrices

$$\begin{aligned} S_1(x)&:= \begin{bmatrix} (k+1+x^2)^2 &{}\quad (k+1+x^2)(x+x^2)\\ (k+1+x^2)(x+x^2) &{}\quad (1+x)^2 x^2 \end{bmatrix} \\ S_2(x)&:= \begin{bmatrix} k^2 + k + 3k x^2 + (1-x)^2x^2 &{}\quad (k+1+x^2)(x-x^2) \\ (k+1+x^2)(x-x^2) &{}\quad (k+1+x^2)^2 \end{bmatrix} \end{aligned}$$

are PSD for all \(k \ge 0\). They are also SOS because the two concepts are equivalent for univariate polynomial matrices [6]. Rearranging (2.7) yields the decomposition of P guaranteed by Theorem 2.2 with \(\sigma (x) = k+1 + x^2\). \(\blacksquare \)
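The identity (2.7) can also be confirmed symbolically. A minimal sympy sketch, transcribing P, \(S_1\) and \(S_2\) from above:

```python
import sympy as sp

x, k = sp.symbols('x k')
sigma = k + 1 + x**2  # the SOS multiplier sigma(x) from Theorem 2.2

P = sp.Matrix([[sigma,    x + x**2,   0],
               [x + x**2, k + 2*x**2, x - x**2],
               [0,        x - x**2,   sigma]])
S1 = sp.Matrix([[sigma**2,         sigma*(x + x**2)],
                [sigma*(x + x**2), (1 + x)**2 * x**2]])
S2 = sp.Matrix([[k**2 + k + 3*k*x**2 + (1 - x)**2 * x**2, sigma*(x - x**2)],
                [sigma*(x - x**2),                        sigma**2]])
E1 = sp.Matrix([[1, 0, 0], [0, 1, 0]])  # clique {1, 2}
E2 = sp.Matrix([[0, 1, 0], [0, 0, 1]])  # clique {2, 3}

residual = sigma * P - (E1.T * S1 * E1 + E2.T * S2 * E2)
print(residual.applyfunc(sp.expand))  # the zero matrix, confirming (2.7)
```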

If P(x) and its highest-degree homogeneous part are strictly positive definite on \(\mathbb {R}^n\) and \(\mathbb {R}^n{\setminus }\{0\}\), respectively, one can fix either \(\sigma (x)=\Vert x\Vert ^{2\nu }\) or \(\sigma (x)=(1 +\Vert x\Vert ^2)^{\nu }\) for a sufficiently large integer \(\nu \ge 0\), where \(\Vert x\Vert ^2 := x_1^2 + \cdots + x_n^2\). Precisely, we have the following versions of Reznick’s Positivstellensatz [39] for sparse polynomial matrices, which follow from more general SOS chordal decomposition results on semialgebraic sets stated in the next section (cf. Theorem 2.5 and Corollary 2.2).

Theorem 2.3

Let P(x) be an \(m \times m\) homogeneous polynomial matrix whose sparsity graph is chordal and has maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\). If P is strictly positive definite on \(\mathbb {R}^n {\setminus }\{0\}\), there exist an integer \(\nu \ge 0\) and homogeneous SOS matrices \(S_{k}(x)\) of size \(|\mathcal {C}_k| \times |\mathcal {C}_k|\) such that

$$\begin{aligned} \Vert x\Vert ^{2\nu } P(x) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}S_{k}(x) E_{\mathcal {C}_k}. \end{aligned}$$
(2.8)

Corollary 2.1

Let \(P(x) = \sum _{\left| \alpha \right| \le 2d} P_\alpha x^\alpha \) be an inhomogeneous \(m \times m\) polynomial matrix of even degree 2d whose sparsity graph is chordal and has maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\). If P is strictly positive definite on \(\mathbb {R}^n\) and its highest-degree homogeneous part \(\sum _{\left| \alpha \right| =2d}P_\alpha x^\alpha \) is strictly positive definite on \(\mathbb {R}^n{\setminus } \{0\}\), there exist an integer \(\nu \ge 0\) and SOS matrices \(S_{k}(x)\) of size \(|\mathcal {C}_k| \times |\mathcal {C}_k|\) such that

$$\begin{aligned} (1+\Vert x\Vert ^2)^\nu P(x) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}S_{k}(x) E_{\mathcal {C}_k}. \end{aligned}$$
(2.9)

Example 2.4

Let \(q(x) = x_1^2 x_2^4 + x_1^4 x_2^2 - 3 x_1^2 x_2^2 + 1\) be the Motzkin polynomial [28], which is nonnegative but not SOS [22, Example 3.7]. The polynomial matrix

$$\begin{aligned} P(x) = \begin{bmatrix} 0.01(1+x_1^6+x_2^6)+q(x) &{}\quad -0.01x_1 &{}\quad 0 \\ -0.01x_1 &{}\quad x_1^6+x_2^6+1 &{}\quad -x_2 \\ 0&{}\quad -x_2 &{}\quad x_1^6 + x_2^6 + 1 \end{bmatrix} \end{aligned}$$
(2.10)

is strictly positive definite on \(\mathbb {R}^2\) (see Appendix A), but is not SOS since \(\varepsilon (1+x_1^6+x_2^6)+q(x)\) is not SOS unless \(\varepsilon \gtrsim 0.01006\) [22, Example 6.25]. Nevertheless, since the highest-degree homogeneous part of P is also positive definite on \(\mathbb {R}^2{\setminus }\{0\}\), Corollary 2.1 guarantees that P can be decomposed as in (2.9) for a large enough exponent \(\nu \). Here \(\nu =1\) suffices, and \((1+ \Vert x\Vert ^2) P(x) = E_{\mathcal {C}_1}^{{\mathsf T}}S_1(x) E_{\mathcal {C}_1} + E_{\mathcal {C}_2}^{{\mathsf T}}S_2(x) E_{\mathcal {C}_2}\) with

$$\begin{aligned} S_1(x) = \begin{bmatrix} (1+\Vert x\Vert ^2)q(x) &{}\quad 0 \\ 0 &{}\quad 0 \end{bmatrix} + \frac{1+\Vert x\Vert ^2}{100}\begin{bmatrix} 1+x_1^6+x_2^6 &{}\quad -x_1 \\ -x_1 &{}\quad 100 x_1^2 \end{bmatrix} \end{aligned}$$
(2.11a)

and

$$\begin{aligned} S_2(x) = (1+\Vert x\Vert ^2)\begin{bmatrix} 1 - x_1^2 + x_1^6 + x_2^6 &{}\quad -x_2 \\ -x_2 &{}\quad 1+x_1^6+x_2^6 \end{bmatrix}. \end{aligned}$$
(2.11b)

To see that these two matrices are SOS, observe that the first addend on the right-hand side of (2.11a) is SOS because \((1+\Vert x\Vert ^2)q(x) = (1-x_1^2x_2^2)^2 + x_2^2(1-x_1^2)^2 + x_1^2(1-x_2^2)^2 + \tfrac{1}{4}(x_1^3x_2 - x_1x_2^3)^2 + \tfrac{3}{4} (x_1^3x_2 + x_1x_2^3 - 2x_1x_2)^2\); that the second addend on the right-hand side of (2.11a) is SOS because

$$\begin{aligned} \begin{bmatrix}1+x_1^6+x_2^6 &{}\quad -x_1 \\ -x_1 &{}\quad 100 x_1^2\end{bmatrix} = H(x)H(x)^{{\mathsf T}}\quad \text {with}\quad H(x) = \begin{bmatrix}1 &{}\quad x_1^3 &{}\quad 0 &{}\quad x_2^3 \\ -x_1 &{}\quad 0 &{}\quad \sqrt{99}x_1 &{}\quad 0\end{bmatrix}, \end{aligned}$$

and the matrix on the right-hand side of (2.11b) is the sum of two univariate PSD (hence, SOS) matrices: setting \(k=2/(3\sqrt{3})\), we have

$$\begin{aligned} \begin{bmatrix} 1 - x_1^2 + x_1^6 + x_2^6 &{}\quad -x_2 \\ -x_2 &{}\quad 1+x_1^6+x_2^6 \end{bmatrix} \!\!=\!\! \begin{bmatrix} k - x_1^2 + x_1^6 &{}\quad 0 \\ 0 &{}\quad x_1^6 \end{bmatrix} \!+\! \begin{bmatrix} 1 - k + x_2^6 &{}\quad -x_2 \\ -x_2 &{}\quad 1+x_2^6 \end{bmatrix}. \end{aligned}$$

\(\blacksquare \)

2.3 Polynomial matrix decomposition on semialgebraic sets

We now turn our attention to SOS chordal decompositions on basic semialgebraic sets \(\mathcal {K}\) defined as in (1.2). We say that \(\mathcal {K}\) satisfies the Archimedean condition if there exist SOS polynomials \(\sigma _0(x),\,\ldots ,\,\sigma _q(x)\) and a scalar r such that

$$\begin{aligned} \sigma _0(x) + g_1(x) \sigma _1(x) + \cdots + g_q(x) \sigma _q(x) = r^2 - \Vert x\Vert ^2. \end{aligned}$$
(2.12)

This condition implies that \(\mathcal {K}\) is compact because \(r^2 - \Vert x\Vert ^2\) is nonnegative on \(\mathcal {K}\), so \(\mathcal {K}\) is contained in a ball of radius r. The converse implication does not always hold [20], but the Archimedean condition can always be enforced for a compact \(\mathcal {K}\) by adding the redundant inequality \(r^2 - \Vert x\Vert ^2 \ge 0\) to the definition (1.2) of \(\mathcal {K}\) for a sufficiently large r.

Theorem 2.4 below guarantees that if a polynomial matrix is strictly positive definite on a compact \(\mathcal {K}\) satisfying the Archimedean condition, then it admits a chordal decomposition in terms of weighted sums of SOS matrices supported on the cliques of the sparsity graph, where the weights are exactly the polynomials \(g_1,\,\ldots ,\,g_q\) used in the semialgebraic definition (1.2) of \(\mathcal {K}\). This result extends Putinar’s Positivstellensatz [36] to sparse polynomial matrices, and can be considered a sparsity-exploiting version of a Positivstellensatz for general (dense) polynomial matrices (see [21, Theorem 2.19] and [42, Theorem 2]).

Theorem 2.4

Let \(\mathcal {K}\) be a compact semialgebraic set defined as in (1.2) that satisfies the Archimedean condition (2.12), and let P(x) be a polynomial matrix whose sparsity graph is chordal and has maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\). If P is strictly positive definite on \(\mathcal {K}\), there exist SOS matrices \(S_{j,k}(x)\) of size \(|\mathcal {C}_k| \times |\mathcal {C}_k|\) such that

$$\begin{aligned} P(x) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg ( S_{0,k}(x) + \sum _{j=1}^q g_j(x)S_{j,k}(x) \bigg ) E_{\mathcal {C}_k}. \end{aligned}$$
(2.13)

The proof, given in Sect. 6.3, exploits the Cholesky algorithm for matrices with chordal sparsity, the Weierstrass polynomial approximation theorem, and the aforementioned Positivstellensatz for general polynomial matrices [42, Theorem 2].

Example 2.5

The bivariate polynomial matrix

$$\begin{aligned} P(x) := \begin{bmatrix} 1+2x_1^2-x_1^4 &{}\quad x_1+x_1x_2-x_1^3 &{}\quad 0\\ x_1+x_1x_2-x_1^3 &{}\quad 3+4x_1^2-3x_2^2 &{}\quad 2x_1^2x_2-x_1x_2-2x_2^3\\ 0 &{}\quad 2x_1^2x_2-x_1x_2-2x_2^3 &{}\quad 1+x_2^2+x_1^2x_2^2-x_2^4 \end{bmatrix} \end{aligned}$$
(2.14)

is not positive semidefinite globally (the first diagonal element is negative if \(x_1\) is sufficiently large) but is strictly positive definite on the compact semialgebraic set \(\mathcal {K}=\{x \in \mathbb {R}^2:\; g_1(x) := 1 - x_1^2 \ge 0,\, g_2(x) := x_1^2 - x_2^2 \ge 0\}\). This can be verified numerically by approximating the region of \(\mathbb {R}^2\) where P is positive definite (see Fig. 3), and an analytical certificate will be given below.
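Such a numerical check is easy to set up. A minimal Python sketch that samples \(\mathcal {K}\) on a grid and records the smallest eigenvalue of P at feasible points (the grid resolution is our choice, not part of the original computation):

```python
import numpy as np

def P(x1, x2):
    """The polynomial matrix (2.14), evaluated numerically."""
    return np.array([
        [1 + 2*x1**2 - x1**4, x1 + x1*x2 - x1**3, 0.0],
        [x1 + x1*x2 - x1**3, 3 + 4*x1**2 - 3*x2**2,
         2*x1**2*x2 - x1*x2 - 2*x2**3],
        [0.0, 2*x1**2*x2 - x1*x2 - 2*x2**3,
         1 + x2**2 + x1**2*x2**2 - x2**4]])

# Sample K = {1 - x1^2 >= 0, x1^2 - x2^2 >= 0} and take the smallest
# eigenvalue of P over all feasible grid points.
grid = np.linspace(-1.0, 1.0, 201)
min_eig = min(np.linalg.eigvalsh(P(a, b))[0]
              for a in grid for b in grid if b**2 <= a**2)
print(min_eig)  # strictly positive, consistent with Fig. 3
```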

Fig. 3

The semialgebraic set \(\mathcal {K}\) considered in Example 2.5 (red shading, solid boundary), compared to the region of \(\mathbb {R}^2\) where the matrix P(x) in (2.14) is positive definite (grey shading, dashed boundary). On the boundary of this region, P(x) is PSD but not positive definite

The semialgebraic set \(\mathcal {K}\) satisfies the Archimedean condition (2.12) with \(\sigma _0(x) = 0\), \(\sigma _1(x)=2\), \(\sigma _2(x)=1\) and \(r=\sqrt{2}\): indeed, \(2g_1(x) + g_2(x) = 2 - x_1^2 - x_2^2 = r^2 - \Vert x\Vert ^2\). Therefore, Theorem 2.4 guarantees that

$$\begin{aligned} P(x) = \sum _{k=1}^2 E_{\mathcal {C}_k}^{{\mathsf T}}\left[ S_{0,k}(x) + g_1(x) S_{1,k}(x) + g_2(x) S_{2,k}(x) \right] E_{\mathcal {C}_k} \end{aligned}$$
(2.15)

for some \(2 \times 2\) SOS matrices \(S_{0,1}\), \(S_{1,1}\), \(S_{2,1}\), \(S_{0,2}\), \(S_{1,2}\) and \(S_{2,2}\). Possible choices for these matrices are \(S_{2,1}=0\), \(S_{1,2}=0\) and

$$\begin{aligned} S_{0,1}(x)&= I_2 +\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \begin{bmatrix} x_1 &{}\quad x_2\end{bmatrix},&S_{1,1}(x)&= \begin{bmatrix} x_1 \\ 1 \end{bmatrix} \begin{bmatrix} x_1 &{}\quad 1 \end{bmatrix}, \\ S_{0,2}(x)&= I_2 +\begin{bmatrix} x_1 \\ -x_2 \end{bmatrix} \begin{bmatrix} x_1 &{}\quad -x_2\end{bmatrix},&S_{2,2}(x)&=\begin{bmatrix} 2 \\ x_2 \end{bmatrix} \begin{bmatrix} 2 &{}\quad x_2 \end{bmatrix}. \end{aligned}$$

Since \(S_{0,1}\) and \(S_{0,2}\) are positive definite and all other addends in (2.15) are PSD on \(\mathcal {K}\), we conclude in particular that P(x) is positive definite on that set, as claimed initially. \(\blacksquare \)
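The identity (2.15) with these choices is also easy to verify symbolically; a minimal sympy sketch:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
g1, g2 = 1 - x1**2, x1**2 - x2**2

P = sp.Matrix([
    [1 + 2*x1**2 - x1**4, x1 + x1*x2 - x1**3, 0],
    [x1 + x1*x2 - x1**3, 3 + 4*x1**2 - 3*x2**2, 2*x1**2*x2 - x1*x2 - 2*x2**3],
    [0, 2*x1**2*x2 - x1*x2 - 2*x2**3, 1 + x2**2 + x1**2*x2**2 - x2**4]])

S01 = sp.eye(2) + sp.Matrix([x1, x2]) * sp.Matrix([[x1, x2]])
S11 = sp.Matrix([x1, 1]) * sp.Matrix([[x1, 1]])
S02 = sp.eye(2) + sp.Matrix([x1, -x2]) * sp.Matrix([[x1, -x2]])
S22 = sp.Matrix([2, x2]) * sp.Matrix([[2, x2]])

E1 = sp.Matrix([[1, 0, 0], [0, 1, 0]])  # clique {1, 2}
E2 = sp.Matrix([[0, 1, 0], [0, 0, 1]])  # clique {2, 3}

rhs = E1.T * (S01 + g1*S11) * E1 + E2.T * (S02 + g2*S22) * E2
print((P - rhs).applyfunc(sp.expand))  # the zero matrix, confirming (2.15)
```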

If \(\mathcal {K}\) is not compact or does not satisfy the Archimedean condition, Theorem 2.4 can be used to prove a similar decomposition result that applies to \((1+\Vert x\Vert ^2)^\nu P\) with large enough exponent \(\nu \), as long as P has even degree and the behaviour of its leading term can be controlled. We start with the case in which P is homogeneous and \(\mathcal {K}\) is defined using homogeneous polynomial inequalities of even degree.

Theorem 2.5

Let \(\mathcal {K}\) be a semialgebraic set defined as in (1.2) with homogeneous polynomials \(g_1,\ldots ,g_q\) of even degree, and such that \(\mathcal {K} {\setminus }\{0\}\) is nonempty. Let P(x) be a homogeneous polynomial matrix of even degree whose sparsity graph is chordal and has maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\). If P is strictly positive definite on \(\mathcal {K} {\setminus }\{0\}\), there exist an integer \(\nu \ge 0\) and homogeneous SOS matrices \(S_{j,k}(x)\) of size \(|\mathcal {C}_k| \times |\mathcal {C}_k|\) such that

$$\begin{aligned} \Vert x\Vert ^{2\nu } P(x) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg ( S_{0,k}(x) + \sum _{j=1}^q g_j(x)S_{j,k}(x) \bigg ) E_{\mathcal {C}_k}. \end{aligned}$$
(2.16)

This result, proven in Sect. 6.4, recovers Theorem 4 in [9] when P is dense. If P is not homogeneous, we find the following version of the Putinar–Vasilescu Positivstellensatz [37] for sparse polynomial matrices, which is a sparsity-exploiting formulation of a recent result for general (dense) matrices [9, Corollary 3].

Corollary 2.2

Let \(\mathcal {K}\) be a semialgebraic set defined as in (1.2), and let \(P(x)= \sum _{|\alpha |\le 2d_0} P_{\alpha } x^{\alpha }\) be an inhomogeneous polynomial matrix of even degree \(2d_0\) whose sparsity graph is chordal and has maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\). If P is strictly positive definite on \(\mathcal {K}\) and its highest-degree homogeneous part \(\sum _{|\alpha | = 2d_0} P_{\alpha } x^{\alpha }\) is strictly positive definite on \(\mathbb {R}^n {\setminus } \{0\}\), there exist an integer \(\nu \ge 0\) and SOS matrices \(S_{j,k}(x)\) of size \(|\mathcal {C}_k| \times |\mathcal {C}_k|\) such that

$$\begin{aligned} (1+\Vert x\Vert ^2)^\nu P(x) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg ( S_{0,k}(x) + \sum _{j=1}^q g_j(x)S_{j,k}(x) \bigg ) E_{\mathcal {C}_k}. \end{aligned}$$
(2.17)

Proof

Set \(Q(x,y) = y^{2d_0}P(x/y)\), \(d_j = \lceil \frac{1}{2} \deg (g_j) \rceil \), \(h_j(x,y) = y^{2d_j} g_j(x/y)\) for all \(j=1,\ldots ,q\), and \(\mathcal {K}' = \{(x,y): \,h_j(x,y) \ge 0,\, j=1,\ldots ,q\}\). The polynomial matrix Q and the polynomials \(h_j\) are homogeneous of even degree, and satisfy \(Q(x,1) = P(x)\) and \(h_j(x,1) = g_j(x)\) for all \(j = 1, \ldots , q\). Furthermore, Q is positive definite on \(\mathcal {K}'{\setminus } \{(0,0)\}\) because \(Q(x,0)=\sum _{|\alpha | = 2d_0} P_{\alpha } x^{\alpha }\) is positive definite by assumption, while if \((x,y) \in \mathcal {K}'\) with \(y\ne 0\), then \(x/y \in \mathcal {K}\) and Q(xy) is positive definite because so is P(x/y). Applying Theorem 2.5 to Q and \(\mathcal {K}'\), noting that \(\Vert (x,y)\Vert ^2 = y^2 + \Vert x\Vert ^2\), setting \(y=1\), and recalling that \(Q(x,1) = P(x)\) yields (2.17). \(\square \)

Remark 2.1

Setting \(g_1 = \cdots = g_q \equiv 0\) in Theorem 2.5 and Corollary 2.2 immediately yields Theorem 2.3 and Corollary 2.1 for the global case \(\mathcal {K}=\mathbb {R}^n\) (observe that a globally PSD homogeneous polynomial matrix must have even degree).

3 Convex optimization with sparse polynomial matrix inequalities

The decomposition results in Sects. 2.2 and 2.3 can be used to construct hierarchies of sparsity-exploiting SOS reformulations for the optimization problem (1.1) that produce feasible vectors \(\lambda \) and upper bounds on its optimal value \(B^*\).

Specifically, fix any two integers \(\nu \) and d satisfying \(\nu \ge 0\) and \(2d \ge 2\nu + \max \{\deg (P), \deg (g_1), \ldots , \deg (g_q)\},\) and consider the SOS optimization problem

$$\begin{aligned} B_{d,\nu }^* :=&\inf _{\lambda ,\, S_{j,k}} \; b(\lambda ) \nonumber \\&\quad \text {s.t.}\; \sigma (x)^{\nu } P(x,\lambda ) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg ( S_{0,k}(x) + \sum _{j=1}^q g_j(x)S_{j,k}(x) \bigg ) E_{\mathcal {C}_k}, \nonumber \\&\quad S_{j,k} \in \Sigma _{2d_j}^{\left| \mathcal {C}_k \right| } \quad \forall j = 0, \ldots , q, \; \forall k = 1, \ldots , t. \end{aligned}$$
(3.1)

Here \(\Sigma _{2\omega }^m\) denotes the cone of n-variate \(m \times m\) SOS matrices of degree \(2\omega \), \(d_0 := d\), \(d_j := d - \lceil \frac{1}{2} \deg (g_j) \rceil \) for each \(j = 1, \ldots , q\), and either \(\sigma (x) = \Vert x\Vert ^2\) or \(\sigma (x) = 1+ \Vert x\Vert ^2\) depending on whether P is homogeneous in x or not. For each choice of \(\nu \) and d, problem (3.1) can be recast as an SDP [12, 18, 34, 42] and solved using a wide range of algorithms. The optimal \(\lambda \) is clearly feasible for (1.1), so \(B_{d,\nu }^* \ge B^*\).

The nontrivial and far-reaching implication of the decomposition theorems presented in Sects. 2.2 and 2.3 is that the SOS problem (3.1) is asymptotically exact as d or \(\nu \) are increased, provided that the original problem (1.1) satisfies suitable technical conditions and is strictly feasible. For instance, the sparsity-exploiting version of Putinar’s Positivstellensatz in Theorem 2.4 leads to the following result.

Theorem 3.1

Let \(\mathcal {K}\) be a compact basic semialgebraic set defined as in (1.2) that satisfies the Archimedean condition (2.12), and let \(B^*\) and \(B_{d,\nu }^*\) be as in (1.1) and (3.1). If there exists \(\lambda _0 \in \mathbb {R}^\ell \) such that \(P(x,\lambda _0)\) is strictly positive definite on \(\mathcal {K}\), then \(B_{d,0}^* \rightarrow B^*\) from above as \(d \rightarrow \infty \).

Proof

It suffices to show that, for any \(\varepsilon >0\), there exists d such that \(B^* \le B_{d,0}^* \le B^* + 2\varepsilon \). If \(\lambda _0\) is optimal for (1.1), Theorem 2.4 guarantees that \(\lambda _0\) is feasible for (3.1) with \(\nu =0\) (observe that \([\sigma (x)]^0 \equiv 1\)) and some sufficiently large d. Since \(b(\lambda _0) = B^* \le B_{d,0}^* \le b(\lambda _0)\), we obtain \(B_{d,0}^*=B^*\). In particular, if the minimizer of (1.1) is strictly feasible, then the convergence \(B_{d,0}^* \rightarrow B^*\) occurs at a finite value of d.

If \(\lambda _0\) is not optimal, fix \(\varepsilon >0\) and let \(\lambda _\varepsilon \) be an \(\varepsilon \)-suboptimal feasible point for (1.1) such that \(b(\lambda _\varepsilon ) \le B^* + \varepsilon < b(\lambda _0)\). Fix \(\lambda = (1-\gamma ) \lambda _\varepsilon + \gamma \lambda _0\) for some \(\gamma \in (0,1)\) to be determined. Since \(P(x,\lambda _0)\) is strictly positive definite on \(\mathcal {K}\) and \(P(x,\lambda _\varepsilon )\) is PSD on the same set, the matrix \(P(x,\lambda ) = (1-\gamma ) P(x,\lambda _\varepsilon ) + \gamma P(x,\lambda _0)\) is strictly positive definite on \(\mathcal {K}\) and Theorem 2.4 guarantees that \(\lambda \) is feasible for (3.1) when d is sufficiently large. Given such d, we can use the inequality \(B^*\le B_{d,0}^*\) and the convexity of the cost function b to estimate

$$\begin{aligned} B^*\le & {} B_{d,0}^* \le b(\lambda ) = b\left( (1-\gamma ) \lambda _\varepsilon + \gamma \lambda _0 \right) \le (1-\gamma ) b(\lambda _\varepsilon ) + \gamma b(\lambda _0)\\\le & {} (1-\gamma ) B^* + (1-\gamma ) \varepsilon + \gamma b(\lambda _0) = B^* + \varepsilon + \gamma \big [ b(\lambda _0) - B^* - \varepsilon \big ]. \end{aligned}$$

The term in square brackets is strictly positive by construction, so we can fix \(\gamma = \varepsilon /[b(\lambda _0) - B^* - \varepsilon ]\) and conclude that \(B^* \le B_{d,0}^* \le B^* + 2\varepsilon \), as required. \(\square \)

If \(\mathcal {K}\) is not compact or does not satisfy the Archimedean condition, similar arguments that use Theorem 2.5 and Corollary 2.2 instead of Theorem 2.4 (omitted for brevity) give asymptotic convergence results provided that P satisfies additional conditions. For homogeneous problems of even degree, strict feasibility suffices.

Theorem 3.2

Let \(\mathcal {K}\) be a basic semialgebraic set defined as in (1.2), and let \(B^*\) and \(B_{d,\nu }^*\) be as in (1.1) and (3.1). Suppose that \(P(x,\lambda )\) and the polynomials \(g_1,\ldots ,g_q\) defining \(\mathcal {K}\) are homogeneous of even degree in x for all \(\lambda \). If there exists \(\lambda _0 \in \mathbb {R}^\ell \) such that \(P(x,\lambda _0)\) is strictly positive definite on \(\mathcal {K}{\setminus } \{0\}\), then \(B_{d,\nu }^* \rightarrow B^*\) from above as \(\nu \rightarrow \infty \) with \(d = \nu + \frac{1}{2} \max \{\deg (P), \deg (g_1), \ldots , \deg (g_q)\}\) and \(\sigma (x) = \Vert x\Vert ^2\).

For inhomogeneous problems, instead, we require additional control on the leading homogeneous part of \(P(x,\lambda )\) for all \(\lambda \).

Theorem 3.3

Let \(\mathcal {K}\) be a basic semialgebraic set defined as in (1.2), and let \(B^*\) and \(B_{d,\nu }^*\) be as in (1.1) and (3.1). Suppose that \(P(x,\lambda ) = \sum _{\left| \alpha \right| \le 2d_0} P_\alpha (\lambda ) x^\alpha \) is an inhomogeneous polynomial matrix of even degree \(2d_0\) such that \(\sum _{\left| \alpha \right| = 2d_0} P_\alpha (\lambda ) x^\alpha \) is positive semidefinite on \(\mathbb {R}^n\) for all \(\lambda \in \mathbb {R}^\ell \). If there exists \(\lambda _0 \in \mathbb {R}^\ell \) such that \(P(x,\lambda _0)\) is strictly positive definite on \(\mathcal {K}\) and such that \(\sum _{\left| \alpha \right| = 2d_0} P_\alpha (\lambda _0) x^\alpha \) is strictly positive definite on \(\mathbb {R}^n {\setminus }\{0\}\), then \(B_{d,\nu }^* \rightarrow B^*\) from above as \(\nu \rightarrow \infty \) with \(d = \nu + \lceil \frac{1}{2} \max \{\deg (P), \deg (g_1), \ldots , \deg (g_q)\}\rceil \) and \(\sigma (x) = 1+\Vert x\Vert ^2\).

Remark 3.1

Theorems 3.2 and 3.3 apply also when \(\mathcal {K} \equiv \mathbb {R}^n\), in which case they can be deduced from Theorem 2.3 and Corollary 2.1. Thus, when \(\mathcal {K} \equiv \mathbb {R}^n\) the SOS multipliers \(S_{j,k}(x)\) for \(j = 1, \ldots , q\) and \(k = 1, \ldots , t\) in (3.1) can be set to zero.

4 Relation to correlatively sparse SOS decompositions of polynomials

The SOS chordal decomposition theorems stated in Sect. 2 can be used to derive new existence results for sparsity-exploiting SOS decompositions of certain families of correlatively sparse polynomials [13, 19, 48]. A polynomial

$$\begin{aligned} p(x,y) = \sum _{\alpha ,\beta } c_{\alpha ,\beta } \,x^\alpha y^\beta , \end{aligned}$$

with independent variables \(x=(x_1,\ldots ,x_n)\) and \(y=(y_1,\ldots ,y_m)\) and coefficients \(c_{\alpha ,\beta } \in \mathbb {R}\), is correlatively sparse with respect to y if the variables \(y_1,\ldots ,y_m\) are sparsely coupled, meaning that the \(m \times m\) coupling matrix \(\mathrm{CSP}_y(p)\) with entries

$$\begin{aligned}{}[\mathrm{CSP}_y(p)]_{ij} = {\left\{ \begin{array}{ll} 1 &{}\text {if } i = j \text { or } \exists \alpha ,\beta : \beta _i \beta _j \ne 0 \text { and } c_{\alpha ,\beta } \ne 0, \\ 0 &{}\text {otherwise,} \end{array}\right. } \end{aligned}$$
(4.1)

is sparse. For example, the polynomial \(p(x,y) = x_1^2x_2 y_1^2 +y_1y_2- x_2 y_2y_3 + y_4^4\) with \(n=2\) and \(m=4\) is correlatively sparse with respect to y and

$$\begin{aligned} \mathrm{CSP}_y(x_1^2x_2 y_1^2 + y_1y_2 - x_2 y_2y_3 + y_4^4) = \begin{bmatrix} 1 &{}\quad 1 &{}\quad 0 &{}\quad 0\\ 1 &{}\quad 1 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 1 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1\\ \end{bmatrix}. \end{aligned}$$
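The coupling matrix (4.1) can be computed directly from the support of p. A minimal Python sketch (assuming sympy and numpy) that reproduces the matrix above:

```python
import numpy as np
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
y = sp.symbols('y1:5')  # y1, ..., y4
p = x1**2*x2*y[0]**2 + y[0]*y[1] - x2*y[1]*y[2] + y[3]**4

# Mark [CSP_y(p)]_ij = 1 whenever y_i and y_j appear in a common monomial.
m = len(y)
csp = np.eye(m, dtype=int)
for beta in sp.Poly(p, *y).monoms():  # exponents of y in each term of p
    support = [i for i, e in enumerate(beta) if e > 0]
    for i in support:
        for j in support:
            csp[i, j] = 1
print(csp)  # reproduces the 4x4 coupling matrix displayed above
```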

The sparsity graph of the coupling matrix \(\mathrm{CSP}_y(p)\) is known as the correlative sparsity graph of p, and we say that p(x, y) has chordal correlative sparsity with respect to y if its correlative sparsity graph is chordal.

To exploit correlative sparsity when attempting to verify the nonnegativity of p(x, y), one looks for an SOS decomposition of the form [19, 48]

$$\begin{aligned} p(x,y) = \sum _{k=1}^t \sigma _k\!\left( x,y_{\mathcal {C}_k}\right) , \end{aligned}$$
(4.2)

where \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\) are the maximal cliques of the correlative sparsity graph and each \(\sigma _k\) is an SOS polynomial that depends on x and on the subset \(y_{\mathcal {C}_k} = E_{\mathcal {C}_k} y\) of y indexed by \(\mathcal {C}_k\). For instance, with \(m=3\) and two cliques \(\mathcal {C}_1 = \{1,2\}\) and \(\mathcal {C}_2 = \{2,3\}\) we have \(y_{\mathcal {C}_1} = (y_1, y_2)\) and \(y_{\mathcal {C}_2} = (y_2, y_3)\).

In general, the existence of the sparse SOS representation (4.2) is only sufficient to conclude that p(x, y) is nonnegative: Example 3.8 in [33] gives a nonnegative (in fact, SOS) correlatively sparse polynomial that cannot be decomposed as in (4.2). Nevertheless, our SOS chordal decomposition theorems from Sect. 2 imply that sparsity-exploiting SOS decompositions do exist for polynomials p(x, y) that are quadratic and correlatively sparse with respect to y. This is because any polynomial p(x, y) that is correlatively sparse, quadratic, and (without loss of generality) homogeneous with respect to y can be expressed as \(p(x,y)=y^{{\mathsf T}}P(x) y\) for some polynomial matrix P(x) whose sparsity graph coincides with the correlative sparsity graph of p(x, y). Using this observation, we can “scalarize” Theorems 2.2, 2.3, 2.4 and 2.5 to obtain the following statements.

Corollary 4.1

Let \(p(x,y)=\sum _{\alpha , |\beta |\le 2} c_{\alpha ,\beta }x^\alpha y^\beta \) be nonnegative on \(\mathbb {R}^n \times \mathbb {R}^m\), quadratic and correlatively sparse in y, and such that \(\sum _{\alpha , |\beta | = 2} c_{\alpha ,\beta }x^\alpha y^\beta \) is nonnegative globally. If the correlative sparsity graph is chordal with maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\), there exist an SOS polynomial \(\sigma _0(x)\) and SOS polynomials \(\sigma _k(x,y_{\mathcal {C}_k})\) quadratic in the second argument such that \(\sigma _0(x) p(x,y) = \sum _{k=1}^t \sigma _k\!\left( x, y_{\mathcal {C}_k}\right) .\)

Proof

Assume first that p is homogeneous in y and write \(p(x,y)=y^{{\mathsf T}}P(x) y\), where P(x) is positive semidefinite globally and has the same sparsity pattern as the correlative sparsity matrix \(\mathrm{CSP}_y(p)\). Theorem 2.2 guarantees that

$$\begin{aligned} \sigma _0(x) p(x,y) = y^{{\mathsf T}}\left[ \sigma _0(x) P(x) \right] y = y^{{\mathsf T}}\bigg (\sum _{k=1}^{t} E_{\mathcal {C}_k}^{{\mathsf T}}S_k(x) E_{\mathcal {C}_k}\bigg ) y = \sum _{k=1}^t y_{\mathcal {C}_k}^{{\mathsf T}}S_k(x) y_{\mathcal {C}_k} \end{aligned}$$

for some SOS polynomial \(\sigma _0(x)\) and SOS polynomial matrices \(S_k(x)\). Setting \(\sigma _k(x,y_{\mathcal {C}_k}) := y_{\mathcal {C}_k}^{{\mathsf T}}S_k(x) y_{\mathcal {C}_k}\) gives the desired decomposition. When p is not homogeneous, the result follows from a relatively straightforward homogenization argument described in Appendix B. \(\square \)

Corollary 4.2

Let \(p(x,y)=\sum _{|\alpha | = 2d, |\beta | \le 2} c_{\alpha ,\beta }x^\alpha y^\beta \) be homogeneous with degree 2d in x, and both quadratic and correlatively sparse in y. Suppose that

(1) The correlative sparsity graph is chordal with maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\);

(2) \(\sum _{|\alpha |=2d, |\beta | = 2} c_{\alpha ,\beta }x^\alpha y^\beta > 0\) for all \((x,y)\ne (0,0)\);

(3) If p is not homogeneous in y, then \(p(x,y)>0\) for all \(x \ne 0\) and \(y\in \mathbb {R}^m\).

Then, there exist an integer \(\nu \ge 0\) and SOS polynomials \(\sigma _k(x,y_{\mathcal {C}_k})\) quadratic in the second argument such that \(\Vert x\Vert ^{2\nu } p(x,y) = \sum _{k=1}^t \sigma _k\!\left( x,y_{\mathcal {C}_k}\right) .\)

Proof

If p is homogeneous in y, write \(p(x,y)=y^{{\mathsf T}}P(x) y\), observe that P is strictly positive definite for all \(x \in \mathbb {R}^n {\setminus } \{0\}\), apply Theorem 2.3 to P, and proceed as in the proof of Corollary 4.1. If p is not homogeneous, use a homogenization argument similar to that in Appendix B. \(\square \)

Corollary 4.3

Let \(p(x,y)=\sum _{|\alpha | \le d, |\beta | \le 2} c_{\alpha ,\beta }x^\alpha y^\beta \) be quadratic and correlatively sparse in y. Further, let \(\mathcal {K}\) be a semialgebraic set defined as in (1.2). Suppose that

(1) The correlative sparsity graph is chordal with maximal cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\);

(2) \(\sum _{|\alpha |\le d, |\beta | = 2} c_{\alpha ,\beta }x^\alpha y^\beta > 0\) for all \(x\in \mathcal {K}\) and \(y \in \mathbb {R}^m{\setminus } \{0\}\);

(3) If p is not homogeneous in y, then \(p(x,y)>0\) for all \(x\in \mathcal {K}\) and \(y \in \mathbb {R}^m\).

Then:

(i) If \(\mathcal {K}\) is compact and satisfies the Archimedean condition (2.12), there exist SOS polynomials \(\sigma _{j,k}(x,y_{\mathcal {C}_k})\), quadratic in the second argument, such that

$$\begin{aligned} p(x,y) = \sum _{k=1}^t \bigg [ \sigma _{0,k}\!\left( x, y_{\mathcal {C}_k}\right) + \sum _{j=1}^q g_{j}(x)\sigma _{j,k}\!\left( x, y_{\mathcal {C}_k}\right) \bigg ]. \end{aligned}$$

(ii) If p and the polynomials \(g_1,\ldots ,g_q\) defining \(\mathcal {K}\) are homogeneous of even degree in x, the set \(\mathcal {K} {\setminus } \{0\}\) is nonempty, and conditions (2) and (3) above hold for \(x\in \mathcal {K} {\setminus } \{0\}\), there exist an integer \(\nu \ge 0\) and SOS polynomials \(\sigma _{j,k}(x,y_{\mathcal {C}_k})\), quadratic in the second argument, such that

$$\begin{aligned} \Vert x\Vert ^{2\nu } p(x,y) = \sum _{k=1}^t \bigg [ \sigma _{0,k}\!\left( x, y_{\mathcal {C}_k}\right) + \sum _{j=1}^q g_{j}(x)\sigma _{j,k}\!\left( x, y_{\mathcal {C}_k}\right) \bigg ]. \end{aligned}$$

Proof

If p is homogeneous in y, write \(p(x,y)=y^{{\mathsf T}}P(x)y\) for a polynomial matrix P(x) with chordal sparsity graph. The strict positivity of p for all nonzero y implies that P is strictly positive definite on \(\mathcal {K}\). Therefore, we can apply Theorem 2.4 for statement (i) and Theorem 2.5 for statement (ii), and proceed as in the proof of Corollary 4.1 to conclude the proof. If p is not homogeneous in y, one can use a homogenization argument similar to that in Appendix B. \(\square \)

Corollary 4.3 specializes, but does not appear to be a particular case of, an SOS representation result for correlatively sparse polynomials proved by Lasserre [19, Theorem 3.1]. Similarly, Corollaries 4.1 and 4.2 specialize recent results in [25]. In particular, although our statements apply only to polynomials p(x, y) that are quadratic and correlatively sparse with respect to y, rather than to general ones, they provide explicit and tight degree bounds on the quadratic variables that cannot be deduced directly from the (more general) results in those references. For example, let \(\mathcal {K}\) be as in (1.2), suppose that the Archimedean condition (2.12) holds, and suppose that p(x, y) is quadratic, homogeneous, and correlatively sparse in y with a chordal correlative sparsity graph. If p is strictly positive for all \(x \in \mathcal {K}\) and all \(y \in \mathbb {R}^m{\setminus }\{0\}\), then in particular it is so on the basic semialgebraic set \(\mathcal {K}':=\{(x,y) \in \mathcal {K} \times \mathbb {R}^m: \pm (1-y_1^2) \ge 0, \ldots , \pm (1-y_m^2) \ge 0\}\). This set also satisfies the Archimedean condition, so one can use Theorem 3.1 in [19] to represent p as

$$\begin{aligned} p(x,y) = \sum _{k=1}^t \bigg [ \sigma _{0k}\!\left( x, y_{\mathcal {C}_k}\right) + \sum _{j=1}^q g_{j}(x)\sigma _{jk}\!\left( x, y_{\mathcal {C}_k}\right) + \sum _{\ell \in \mathcal {C}_k} \rho _{k\ell }(x,y_{\mathcal {C}_k}) (1 - y_\ell ^2) \bigg ] \end{aligned}$$
(4.3)

for some SOS polynomials \(\sigma _{jk}\) and some polynomials \(\rho _{k\ell }\), not necessarily SOS. Corollary 4.3 enables one to go further and conclude that one may take \(\rho _{k\ell }\equiv 0\) and \(\sigma _{jk}\!\left( x, y_{\mathcal {C}_k}\right) = y_{\mathcal {C}_k}^{{\mathsf T}}S_{jk}(x) y_{\mathcal {C}_k}\) for some SOS matrices \(S_{jk}\). These restrictions could probably be deduced starting from (4.3), but our approach based on the SOS chordal decomposition of sparse polynomial matrices makes them almost immediate.

5 Numerical experiments

We now give numerical examples demonstrating the practical performance of the sparsity-exploiting SOS reformulations of the optimization problem (1.1) introduced in Sect. 3. All examples were run on a PC with a 2.2 GHz Intel Core i5 CPU and 12 GB of RAM, using the SDP solver MOSEK [2] and a customized version of the MATLAB optimization toolbox YALMIP [23, 24]. The toolbox and all scripts used to generate the results presented below are available from https://github.com/aeroimperial-optimization/aeroimperial-yalmip and https://github.com/aeroimperial-optimization/sos-chordal-decomposition-pmi.

5.1 Approximation of global polynomial matrix inequalities

Our first numerical experiment illustrates the computational advantage of our sparsity-exploiting SOS reformulation for a problem with a global polynomial matrix inequality. Fix an integer \(\omega \ge 1\) and consider the \(3\omega \times 3\omega \) tridiagonal polynomial matrix \(P_\omega =P_\omega (x,\lambda )\), parameterized by \(\lambda \in \mathbb {R}^2\), given by

$$\begin{aligned} P_\omega = \begin{bmatrix} \lambda _2 x_1^4+x_2^4 &{} \lambda _1 x_1^2 x_2^2\\ \lambda _1 x_1^2 x_2^2 &{} \lambda _2 x_2^4+x_3^4 &{} \lambda _2 x_2^2 x_3^2\\ &{} \lambda _2 x_2^2 x_3^2 &{} \lambda _2 x_3^4 + x_1^4 &{} \lambda _1 x_1^2 x_3^2\\ &{} &{} \lambda _1 x_1^2 x_3^2 &{} \lambda _2 x_1^4+x_2^4 &{} \lambda _2 x_1^2 x_2^2\\ &{} &{} &{} \lambda _2 x_1^2 x_2^2 &{} \lambda _2 x_2^4+x_3^4 &{} \ddots \\ &{} &{} &{} &{} \ddots &{} \ddots &{} \lambda _i x_2^2 x_3^2\\ &{} &{} &{} &{} &{} \lambda _i x_2^2 x_3^2 &{} \lambda _2 x_3^4 + x_1^4 \end{bmatrix}\!, \end{aligned}$$

where \(i=1\) if \(3\omega \) is even and \(i = 2\) otherwise. Its sparsity graph is chordal with vertices \(\mathcal {V}=\{1,\,\ldots ,\,3\omega \}\), edges \(\mathcal {E} = \{(1,2),\,(2,3),\,\ldots ,\,(3\omega -1,3\omega )\}\), and maximal cliques \(\mathcal {C}_1 =\{1, 2\}\), \(\mathcal {C}_2 =\{2, 3\}\), \(\ldots \) , \(\mathcal {C}_{3\omega -1} =\{3\omega -1, 3\omega \}\). Observe that \(P_\omega (x,\lambda )\) is homogeneous in x for all \(\lambda \), and it is positive definite on \(\mathbb {R}^3{\setminus }\{0\}\) when \(\lambda =(0,0)\).

First, we illustrate how Theorem 2.3 enables one to approximate the set of vectors \(\lambda \) for which \(P_\omega \) is PSD globally,

$$\begin{aligned} \mathcal {F}_\omega =\{\lambda \in \mathbb {R}^2 :\; P_\omega (x,\lambda ) \succeq 0 \quad \forall x \in \mathbb {R}^3\}. \end{aligned}$$

Define two hierarchies of subsets of \(\mathcal {F}_\omega \), indexed by a nonnegative integer \(\nu \), as

$$\begin{aligned}&\mathcal {D}_{\omega ,\nu } := \left\{ \lambda \in \mathbb {R}^2:\; \Vert x\Vert ^{2\nu } P_\omega (x,\lambda ) \text { is SOS}\right\} , \end{aligned}$$
(5.1a)
$$\begin{aligned}&\mathcal {S}_{\omega ,\nu } := \bigg \{\lambda \in \mathbb {R}^2:\; \Vert x\Vert ^{2\nu } P_\omega (x,\lambda ) = \sum _{k=1}^{3\omega - 1} E_{\mathcal {C}_k}^{{\mathsf T}}S_k(x) E_{\mathcal {C}_k},\; S_k(x)\text { is SOS}\bigg \}. \end{aligned}$$
(5.1b)

The sets \(\mathcal {D}_{\omega ,\nu }\) are defined using the standard (dense) SOS constraint (1.3), while the sets \(\mathcal {S}_{\omega ,\nu }\) use the sparsity-exploiting nonnegativity certificate in Theorem 2.3. For each \(\nu \) we have \({\mathcal {S}}_{\omega ,\nu } \subseteq {\mathcal {D}}_{\omega ,\nu } \subseteq \mathcal {F}_\omega \), and the inclusions are generally strict. This is confirmed by the (approximations to the) first few sets \(\mathcal {D}_{2,\nu }\) and \(\mathcal {S}_{2,\nu }\) shown in Fig. 4, which were obtained by maximizing the linear cost function \(\lambda _1\,\cos \theta + \lambda _2\,\sin \theta \) for 1000 equispaced values of \(\theta \) in the interval \([0,\pi /2]\) and exploiting the \(\lambda _1 \mapsto - \lambda _1\) symmetry of \(\mathcal {D}_{2,\nu }\) and \(\mathcal {S}_{2,\nu }\). (Computations for \(\mathcal {S}_{2,1}\) were ill-conditioned, so the results are not reported.) On the other hand, for any choice of \(\omega \), Theorem 2.3 guarantees that any \(\lambda \) for which \(P_\omega \) is positive definite belongs to \(\mathcal {S}_{\omega ,\nu }\) for sufficiently large \(\nu \). Thus, the sets \(\mathcal {S}_{\omega ,\nu }\) can approximate \(\mathcal {F}_\omega \) arbitrarily accurately in the sense that any compact subset of the interior of \(\mathcal {F}_\omega \) is included in \(\mathcal {S}_{\omega ,\nu }\) for some sufficiently large integer \(\nu \). The same is true for the sets \(\mathcal {D}_{\omega ,\nu }\) since \(\mathcal {S}_{\omega ,\nu } \subseteq \mathcal {D}_{\omega ,\nu }\). Once again, this is confirmed by our numerical results for \(\omega =2\) in Fig. 4, which suggest that \(\mathcal {S}_{2,3} = \mathcal {D}_{2,2} = \mathcal {F}_2\).

Fig. 4

Inner approximations of the set \(\mathcal {F}_2\) obtained with SOS optimization. a Sets \(\mathcal {D}_{2,\nu }\) obtained using the standard SOS constraint (5.1a); b Sets \(\mathcal {S}_{2,\nu }\) obtained using the sparse SOS constraint (5.1b). Theorem 3.2 guarantees that the sequences of sets \(\{\mathcal {D}_{2,\nu }\}_{\nu \in \mathbb {N}}\) and \(\{\mathcal {S}_{2,\nu }\}_{\nu \in \mathbb {N}}\) are asymptotically exact as \(\nu \rightarrow \infty \). The numerical results suggest \(\mathcal {S}_{2,3} = \mathcal {D}_{2,2} = \mathcal {F}_2\)

Next, to illustrate the computational advantages of our sparsity-exploiting SOS methods compared to the standard ones, we use both approaches to bound

$$\begin{aligned} B^* := \inf _{\lambda \in \mathcal {F}_\omega } \lambda _2 - 10\lambda _1 \end{aligned}$$
(5.2)

from above by replacing \(\mathcal {F}_\omega \) with its inner approximations \(\mathcal {D}_{\omega ,\nu }\) and \(\mathcal {S}_{\omega ,\nu }\) in (5.1a) and (5.1b). Optimizing over \(\mathcal {D}_{\omega ,\nu }\) requires one SOS constraint on a \(3\omega \times 3\omega \) polynomial matrix of degree \(d=2\nu +4\), while optimizing over \(\mathcal {S}_{\omega ,\nu }\) requires \(3\omega -1\) SOS constraints on \(2\times 2\) polynomial matrices of the same degree. Theorem 3.2 and the inclusion \(\mathcal {S}_{\omega ,\nu } \subseteq \mathcal {D}_{\omega ,\nu }\) guarantee that the upper bounds \(B_{d,\nu }\) on \(B^*\) obtained with either SOS formulation converge to the latter as \(\nu \rightarrow \infty \). (Here, as in Sect. 3, \(B_{d,\nu }\) denotes the upper bound on \(B^*\) obtained from SOS reformulations of (5.2) with SOS matrices of degree d and exponent \(\nu \).)

Table 2 Upper bounds \(B_{d,\nu }\) on the optimal value \(B^*\) of (5.2) for increasing values of matrix sizes \(\omega \), obtained using the standard SOS constraint (1.3) and the sparsity-exploiting SOS condition (3.1) with SOS matrices of degree \(d=4+2\nu \)

Table 2 lists upper bounds \(B_{d,\nu }\) computed with MOSEK using both SOS formulations, degree \(d=4+2\nu \), and different values of \(\omega \) and \(\nu \); the CPU time is also listed. Bounds for our sparse SOS formulation with \(\nu =1\) are not reported because MOSEK encountered severe numerical problems irrespective of the matrix size \(\omega \). It is evident that our sparsity-exploiting SOS method scales significantly better than the standard approach as \(\omega \) and \(\nu \) increase. For \(\omega =10\), for example, the bound obtained with our sparsity-exploiting approach and \(\nu =3\) agrees to two decimal places with the bounds calculated using traditional methods with \(\nu =2\) and 3, but the computation is three orders of magnitude faster. More generally, our sparsity-exploiting computations took less than 10 seconds for all tested values of \(\omega \) and \(\nu \), while traditional ones required more RAM than was available for all but the smallest values. We expect similarly large efficiency gains for any optimization problem with sparse polynomial matrix inequalities whenever the size of the largest maximal clique of the sparsity graph is much smaller than the matrix size.

5.2 Approximation of polynomial matrix inequalities on compact sets

As our second example, we consider the problem of constructing inner approximations for compact sets on which a polynomial matrix is positive semidefinite. This problem arises, for instance, when approximating the robust stability region of linear dynamical systems [43], and was studied in [14] using standard SOS methods. Here, we show that our sparse-matrix version of Putinar's Positivstellensatz in Theorem 2.4 allows for significant reductions in computational complexity without sacrificing the rigorous convergence guarantees established in [14].

Let \(\mathcal {K}\subset \mathbb {R}^n\) be a compact semialgebraic set defined as in (1.2) that satisfies the Archimedean condition, and let P(x) be an \(m\times m\) symmetric polynomial matrix. We seek to construct a sequence \(\{\mathcal {S}_{2d}\}_{d \in \mathbb {N}}\) of subsets of the (compact) set \(\mathcal {P} = \{x \in \mathcal {K} \mid P(x) \succeq 0\}\), such that \(\mathcal {S}_{2d}\) converges to \(\mathcal {P}\) in volume. Following [14], this can be done by letting \(\mathcal {S}_{2d} = \{x \in \mathcal {K} \mid s_{2d}(x) \ge 0\}\) be the superlevel set of the degree-2d polynomial \(s_{2d}(x)\) that solves the convex optimization problem

$$\begin{aligned} \begin{aligned} B_{m,d}^* := \max _{s_{2d}(x)} \int _{\mathcal {K}} s_{2d}(x) \,\mathrm{d}x \quad \text {s.t.}\quad P(x) - s_{2d}(x)I \succeq 0 \quad \forall x \in \mathcal {K}. \end{aligned} \end{aligned}$$
(5.3)

This problem is in the form (1.1), and the optimization variable \(\lambda \) is the vector of \(\binom{n+2d}{n}\) coefficients of \(s_{2d}\) (with respect to any chosen basis). The polynomial \(s_{2d}\) is a pointwise lower bound for the minimum eigenvalue function of P(x) on \(\mathcal {K}\). Using this observation, the compactness of \(\mathcal {K}\), the continuity of eigenvalues, and the Weierstrass polynomial approximation theorem, one can show that, as \(d \rightarrow \infty \), \(\mathcal {S}_{2d}\) converges to \(\mathcal {P}\) in volume, \(s_{2d}\) converges pointwise almost everywhere to the minimum eigenvalue function, and \(B_{m,d}^*\) tends to the integral of the latter on \(\mathcal {K}\).

Theorem 1 in [14] shows that convergence is maintained if the intractable matrix inequality constraint is replaced with a weighted SOS representation for \(P(x)-s_{2d}(x)I\) in the form (1.3), where the SOS matrices \(S_k\) are chosen such that the degree of \(S_0 + g_1 S_1 + \cdots + g_q S_q\) does not exceed 2d. By Theorem 2.4, the same is true for the sparsity-exploiting reformulation (3.1) with \(\nu =0\), SOS matrices \(S_{0,k}\) of degree \(d_0 = d\), and SOS matrices \(S_{j,k}\) of degree \(d_j = d - \lceil \frac{1}{2} \deg (g_j) \rceil \).

Fig. 5 Chordal sparsity patterns for the polynomial matrix P(x) in (5.4)

To illustrate the computational advantages gained by exploiting sparsity, we consider a relatively simple (but still nontrivial) bivariate problem with \(\mathcal {K}=\{x \in \mathbb {R}^2: 1-x_1^2 - x_2^2 \ge 0\}\) being the unit disk and

$$\begin{aligned} P(x) = (1 - x_1^2 - x_2^2)I_m + (x_1 + x_1x_2 - x_1^3)A + (2x_1^2x_2-x_1x_2-2x_2^3)B, \end{aligned}$$
(5.4)

where A and B are \(m \times m\) symmetric matrices with chordal sparsity graphs, zero diagonal elements, and other entries drawn randomly from the uniform distribution on (0, 1). The sparsity graphs of A and B were generated randomly whilst ensuring that their maximal cliques contain no more than five vertices [26], and the corresponding structure of P for \(m = 15\), 20, 25, 30, 35 and 40 is shown in Fig. 5. The exact data matrices used in our calculations are available at https://github.com/aeroimperial-optimization/sos-chordal-decomposition-pmi.
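One simple way to generate such patterns, sketched below under our own assumptions (it need not match the generator used in [26]), is to chain cliques of random size that overlap in exactly one vertex: the resulting clique tree is a path, so the graph is chordal, and every maximal clique has at most five vertices.

```python
import numpy as np

def random_chordal_pattern(m, max_clique=5, seed=0):
    """Sketch of a chordal sparsity pattern with bounded clique size:
    chain cliques of random size that overlap in exactly one vertex."""
    rng = np.random.default_rng(seed)
    pattern = np.eye(m, dtype=bool)
    start = 0
    while start < m - 1:
        size = int(rng.integers(2, max_clique + 1))
        clique = np.arange(start, min(start + size, m))
        pattern[np.ix_(clique, clique)] = True
        start = int(clique[-1])  # overlap consecutive cliques in one vertex
    return pattern
```

A symmetric matrix with this pattern, zero diagonal, and off-diagonal entries drawn from the uniform distribution on (0, 1) can then be assembled with numpy in the usual way.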

Fig. 6 Inner approximations \(\mathcal {S}_{2d}\) of the subset \(\mathcal {P}\) of the unit disk (black dots) where the sparse \(m \times m\) polynomial matrix P(x) in (5.4) is PSD. The boundary of \(\mathcal {P}\) is plotted as a solid black line. Approximating sets are computed using the standard SOS constraint (1.3) (blue solid boundary and blue shading; shown where available) and the sparsity-exploiting SOS problem (3.1) with \(\nu =0\) (red solid boundary, no shading)

Table 3 Lower bounds on the optimal value of (5.3) with P(x) as in (5.4) and \(\mathcal {K}\) the unit disk, obtained using the standard SOS constraint (1.3) and the sparsity-exploiting SOS problem (3.1) for increasing values of m and d

Figure 6 illustrates the inner approximations \(\mathcal {S}_{2d}\) of \(\mathcal {P}\) computed using both the standard SOS constraint (1.3) and our sparsity-exploiting formulation (3.1). Table 3 lists the corresponding lower bounds \(B^\mathrm{sos}_{m,d}\) on \(B_{m,d}^*\), the CPU time required to solve the SOS programs with MOSEK, and the limit \(B_{m,\infty }^*\) obtained by numerically integrating the minimum eigenvalue function of P on the unit disk \(\mathcal {K}\). As in Sect. 5.1, for fixed d the dense SOS constraints give better bounds than the sparse ones. As expected, however, the sparsity-exploiting formulation requires significantly less time for large m, and all problem instances were solved within 10 seconds. In addition, the approximating sets \(\mathcal {S}_{2d}\) in Fig. 6 produced by the two SOS formulations are almost indistinguishable for every combination of d and m. For a given matrix size m, therefore, our sparse SOS formulation enables the construction of much better approximations to \(\mathcal {P}\) through large values of d that are beyond the reach of standard SOS formulations. This is important because, as shown in Fig. 7 and Table 4 for \(m=15\), convergence to the set \(\mathcal {P}\) and to the limit \(B_{m,\infty }^*\) is slow as d increases.
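As a guide to reproducing the reference value \(B_{m,\infty }^*\), here is a minimal quadrature sketch (ours); `P_eval` is a placeholder callback that evaluates the matrix (5.4) at a point for fixed data A and B.

```python
import numpy as np

def min_eig_integral(P_eval, n_grid=400):
    """Approximate the integral over the unit disk of the minimum
    eigenvalue of the matrix-valued function `P_eval(x1, x2)` using
    midpoint quadrature on a uniform Cartesian grid."""
    s = np.linspace(-1.0, 1.0, n_grid, endpoint=False) + 1.0 / n_grid
    cell = (2.0 / n_grid) ** 2
    total = 0.0
    for x1 in s:
        for x2 in s:
            if x1 * x1 + x2 * x2 <= 1.0:
                # eigvalsh returns eigenvalues in ascending order
                total += np.linalg.eigvalsh(P_eval(x1, x2))[0] * cell
    return total
```

For the matrix in (5.4), `P_eval` would return `(1 - x1**2 - x2**2) * np.eye(m) + (x1 + x1*x2 - x1**3) * A + (2*x1**2*x2 - x1*x2 - 2*x2**3) * B` for given arrays `A` and `B`.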

Fig. 7 Top: boundaries of the set \(\mathcal {P}\) (black lines) and of the inner approximations \(\mathcal {S}_{2d}\) (red lines) for the matrix P(x) in (5.4) with \(m=15\), obtained with the sparse SOS formulation for \(d=2\), 4, 6, 10, 12 and 14 (left to right). Bottom: absolute difference between the optimal polynomial \(s_{2d}\) and the minimum eigenvalue function of P on the unit disk \(\mathcal {K}\)

Table 4 Lower bounds \(B^\mathrm{sos}_{15,d}\) on the asymptotic value \(B_{15,\infty }^* = -1.153\) of (5.3) for \(m=15\), calculated using the sparsity-exploiting SOS problem (3.1) with \(\nu =0\) and the standard SOS constraint (1.3)

6 Proofs

6.1 Proof of Proposition 2.1

We construct polynomial matrices that cannot be decomposed according to (2.2) with polynomial \(S_k\). To do so, we may assume that \(n=1\) without loss of generality because univariate polynomial matrices are particular cases of multivariate ones.

First, fix \(m=3\) and let \(\mathcal {G}\) be the sparsity graph of the \(3\times 3\) positive definite polynomial matrix considered in Example 2.2 for \(k=1\),

$$\begin{aligned} P(x) = I_3 + \begin{bmatrix} 1+x^2 &{}\quad x+x^2 &{}\quad 0\\ x+x^2 &{}\quad 2x^2 &{}\quad x-x^2\\ 0 &{}\quad x-x^2 &{}\quad x^2 \end{bmatrix}. \end{aligned}$$

Observe that \(\mathcal {G}\) is essentially the only connected but not complete graph on \(m=3\) vertices: any other such graph can be reduced to \(\mathcal {G}\) by reordering its vertices, which corresponds to a symmetric permutation of the polynomial matrix it describes. We have already shown in Example 2.2 that P has no decomposition of the form (2.2) with polynomial \(S_k\), so Proposition 2.1 holds for \(m=3\).

The same \(3\times 3\) matrix can be used to generate counterexamples for a general connected but not complete sparsity graph \(\mathcal {G}\) with \(m > 3\). Non-completeness implies that \(\mathcal {G}\) must have at least two maximal cliques, while connectedness implies that every maximal clique \(\mathcal {C}_i\) must contain at least two elements and intersect at least one other clique \(\mathcal {C}_j\). Whenever \(\mathcal {C}_i\cap \mathcal {C}_j \ne \emptyset \), therefore, there exist vertices \(v_i \in \mathcal {C}_i {\setminus } \mathcal {C}_j\), \(v_j \in \mathcal {C}_j {\setminus } \mathcal {C}_i\) and \(v_k \in \mathcal {C}_i\cap \mathcal {C}_j\). Moreover, since \(\mathcal {G}\) is chordal, Theorem 3.3 in [46] guarantees that it contains at least one simplicial vertex (cf. Sect. 2.1 for a definition), which must belong to one and only one maximal clique. Upon reordering the vertices and the maximal cliques if necessary, we may therefore assume without loss of generality that: (i) \(\mathcal {C}_1 = \{1,\ldots ,r\}\) for some r; (ii) vertex 1 is simplicial, so it belongs only to clique \(\mathcal {C}_1\); (iii) vertex 2 is in \(\mathcal {C}_1 \cap \mathcal {C}_2\) and vertex \(r+1\) is in \(\mathcal {C}_2 {\setminus } \mathcal {C}_1\).

Now, consider the positive definite \(m \times m\) matrix

$$\begin{aligned} P(x) = I_m + E_{\{1,2,r+1\}}^{{\mathsf T}}\begin{bmatrix} 1+x^2 &{}\quad x+x^2 &{}\quad 0\\ x+x^2 &{}\quad 2x^2 &{}\quad x-x^2\\ 0 &{}\quad x-x^2 &{}\quad x^2 \end{bmatrix} E_{\{1,2,r+1\}}, \end{aligned}$$

whose nonzero entries are on the diagonal or in the principal submatrix with rows and columns indexed by \(\{1,2,r+1\}\). Note that the sparsity pattern of P is compatible with the sparsity graph \(\mathcal {G}\). We claim that no decomposition of the form (2.2) exists where each \(S_k\) is a PSD polynomial matrix.

For the sake of contradiction, assume that such a decomposition exists, so

$$\begin{aligned} P(x) = E_{\mathcal {C}_1}^{{\mathsf T}}S_1(x) E_{\mathcal {C}_1} + \sum _{k=2}^t E_{\mathcal {C}_k}^{{\mathsf T}}S_k(x) E_{\mathcal {C}_k} =: E_{\mathcal {C}_1}^{{\mathsf T}}S_1(x) E_{\mathcal {C}_1} + Q(x), \end{aligned}$$

where \(S_1(x)\) and Q(x) are \(r\times r\) and \(m \times m\) PSD polynomial matrices, respectively. Since vertex 1 is contained only in clique \(\mathcal {C}_1\), the matrix \(S_1\) must have the form

$$\begin{aligned} S_1(x) = \begin{bmatrix} 2+x^2 &{}\quad (x+x^2,\, 0,\,\ldots ,\,0)\\ (x+x^2,\, 0,\,\ldots ,\,0)^{{\mathsf T}}&{}\quad T(x) \end{bmatrix} \end{aligned}$$

for some \((r-1)\times (r-1)\) polynomial matrix T to be determined. For the same reason, the matrix Q(x) can be partitioned as

$$\begin{aligned} Q(x) = \begin{bmatrix} 0 &{}\quad 0_{1 \times (r-1)} &{}\quad 0_{1\times (m-r)}\\ [0.5ex] 0_{(r-1) \times 1} &{}\quad A(x) &{}\quad B(x) \\ 0_{(m-r) \times 1} &{}\quad B(x)^{{\mathsf T}}&{}\quad C(x) \end{bmatrix}, \end{aligned}$$

where \(0_{p\times q}\) is a \(p\times q\) matrix of zeros, A is an \((r-1)\times (r-1)\) polynomial matrix to be determined, and the \((r-1) \times (m-r)\) block B and the \((m-r) \times (m-r)\) block C are given by

$$\begin{aligned} B(x)&= \left[ \begin{array}{cl} x-x^2 &{}\quad 0_{1 \times (m-r-1)}\\ 0_{(r-2) \times 1} &{}\quad 0_{(r-2) \times (m-r-1)} \end{array}\right] ,&C(x)&= \begin{bmatrix} x^2+2 &{}\quad 0_{1 \times (m-r-1)}\\ 0_{(m-r-1) \times 1} &{}\quad I_{m-r-1} \end{bmatrix}. \end{aligned}$$

The block T of \(S_1\) and the block A of Q correspond to elements of clique \(\mathcal {C}_1\) that may belong also to other cliques. These blocks cannot be determined uniquely, but their sum must be equal to the principal submatrix of P with rows and columns indexed by \(\{2,\ldots ,r\}\). In particular, we must have \(A_{11}(x) = 2x^2+1 - T_{11}(x)\). Moreover, since \(S_1\) and Q are PSD by assumption, we may take appropriate Schur complements to find

$$\begin{aligned} T(x)&\succeq \left[ \begin{array}{cl} \frac{x^2(1+x)^2}{x^2+2} &{} 0_{1 \times (r-2)}\\ 0_{(r-2) \times 1} &{} 0_{(r-2) \times (r-2)} \end{array}\right] ,&A(x)&\succeq \left[ \begin{array}{cl} \frac{x^2(1-x)^2}{x^2+2} &{} 0_{1 \times (r-2)}\\ 0_{(r-2) \times 1} &{} 0_{(r-2) \times (r-2)} \end{array}\right] . \end{aligned}$$

Using the identity \(A_{11}(x) = 2x^2+1 - T_{11}(x)\), these conditions require

$$\begin{aligned} T_{11}(x) \ge \frac{x^2(1+x)^2}{x^2+2}, \qquad 2 x^2+1 - T_{11}(x) \ge \frac{x^2(1-x)^2}{x^2+2}. \end{aligned}$$

However, just as in Example 2.2, no polynomial \(T_{11}(x)\) can satisfy both inequalities; a short self-contained argument is sketched below. We conclude that P cannot admit a decomposition of the form (2.2) with PSD polynomial matrices \(S_k\), which proves Proposition 2.1 in the general case.
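The sketch is as follows (our argument, derived directly from the two bounds above; Example 2.2 may proceed differently). Polynomial division gives

$$\begin{aligned} \frac{x^2(1+x)^2}{x^2+2} = x^2 + 2x - 1 + \frac{2-4x}{x^2+2}, \qquad \frac{x^2(1-x)^2}{x^2+2} = x^2 - 2x - 1 + \frac{2+4x}{x^2+2}, \end{aligned}$$

so the two inequalities force the polynomial \(p(x) := T_{11}(x) - x^2 - 2x\) to satisfy

$$\begin{aligned} -1 + \frac{2-4x}{x^2+2} \le p(x) \le 2 - \frac{2+4x}{x^2+2} \qquad \forall x \in \mathbb {R}. \end{aligned}$$

Both bounds are bounded functions of x, so p must be a bounded polynomial and hence a constant c. Evaluating the lower bound at \(x=-1\) yields \(c \ge 1\), while evaluating the upper bound at \(x=1\) yields \(c \le 0\), a contradiction.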

6.2 Proof of Theorem 2.2

To establish Theorem 2.2 we adapt ideas by Kakimura [16], who proved the chordal decomposition theorem for constant PSD matrices (cf. Theorem 2.1) using the fact that symmetric matrices with chordal sparsity patterns admit an \(LDL^{{\mathsf T}}\) factorization with no fill-in [41]. In Appendix C, we use Schmüdgen’s diagonalization procedure [44] to prove the following analogous statement for polynomial matrices.

Proposition 6.1

If P(x) is an \(m\times m\) symmetric polynomial matrix with chordal sparsity graph, there exist an \(m \times m\) permutation matrix T, an invertible \(m \times m\) lower-triangular polynomial matrix L(x), and polynomials b(x), \(d_1(x),\,\ldots ,\,d_m(x)\) such that

$$\begin{aligned} b^4(x)\, T P(x) T^{{\mathsf T}}= L(x) {{\,\mathrm{Diag}\,}}\left( d_1(x),\,\ldots ,\,d_m(x) \right) L(x)^{{\mathsf T}}. \end{aligned}$$
(6.1)

Moreover, L has no fill-in in the sense that \(L + L^{{\mathsf T}}\) has the same sparsity as \(TPT^{{\mathsf T}}\).

Now, let P(x) be a PSD polynomial matrix with chordal sparsity graph, and apply Proposition 6.1 to diagonalize it. We will assume first that the permutation matrix T is the identity, and remove this assumption at the end.

Since P is PSD, the polynomials \(d_1(x),\,\ldots ,\,d_m(x)\) in (6.1) must be nonnegative globally and, by the Hilbert–Artin theorem [5], can be written as sums of squares of rational functions. In particular, there exist SOS polynomials \(f_1,\,\ldots ,\,f_m\) and \(g_1,\,\ldots ,\,g_m\) such that \(f_i(x) d_i(x) = g_i(x)\) for all \(i=1,\,\ldots ,\,m\). Therefore, we can write (omitting the argument x for notational simplicity)

$$\begin{aligned} \prod _{j=1}^m f_j b^4 \, P = L {{\,\mathrm{Diag}\,}}\bigg ( g_1\prod _{j\ne 1} f_j ,\,\ldots ,\,g_i\prod _{j\ne i} f_j,\,\ldots ,\, g_m\prod _{j\ne m} f_j \bigg ) L^{{\mathsf T}}. \end{aligned}$$

Next, define the polynomial \(\sigma := \prod _{j} f_j b^4\) and observe that it is SOS because it is the product of SOS polynomials. For the same reason, the products \( g_i\prod _{j\ne i} f_j \) appearing on the right-hand side of the last equation are SOS polynomials. Thus, we can find an integer s and polynomials \(q_{11},\,\ldots ,\,q_{m1},\,\ldots ,\,q_{1s},\,\ldots ,\,q_{ms}\) such that

$$\begin{aligned} \sigma P = \sum _{i=1}^s L {{\,\mathrm{Diag}\,}}\left( q_{1i}^2,\,\ldots ,\,q_{mi}^2\right) L^{{\mathsf T}}=: \sum _{i=1}^s H_i H_i^{{\mathsf T}}, \end{aligned}$$
(6.2)

where, for notational simplicity, we have introduced the lower-triangular matrices

$$H_i := L {{\,\mathrm{Diag}\,}}\left( q_{1i},\,\ldots ,\,q_{mi}\right) .$$

Under our additional assumption that Proposition 6.1 can be applied with \(T=I\), Theorem 2.2 follows if we can show that

$$\begin{aligned} H_i H_i^{{\mathsf T}}= \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}S_{ik} E_{\mathcal {C}_k} \end{aligned}$$
(6.3)

for some SOS matrices \(S_{ik}\) and each \(i=1,\,\ldots ,\,s\). Indeed, combining (6.3) with (6.2) and setting \(S_k = \sum _{i=1}^s S_{ik}\) yields the desired decomposition (2.6) for P.

To establish (6.3), denote the columns of \(H_i\) by \(h_{i1},\ldots ,h_{im}\) and write

$$\begin{aligned} H_iH_i^{{\mathsf T}}= \sum _{j =1}^m h_{ij}h_{ij}^{{\mathsf T}}. \end{aligned}$$
(6.4)

Since \(H_i\) has the same sparsity pattern as L, the nonzero elements of each column vector \(h_{ij}\) must be indexed by a clique \(\mathcal {C}_{\ell _j}\) for some \(\ell _j \in \{1, \ldots , t\}\). Thus, the nonzero elements of \(h_{ij}\) can be extracted through multiplication by the matrix \(E_{\mathcal {C}_{\ell _j}}\) and \(h_{ij} = E_{\mathcal {C}_{\ell _j}}^{{\mathsf T}}E_{\mathcal {C}_{\ell _j}}h_{ij}\). Consequently,

$$\begin{aligned} h_{ij}h_{ij}^{{\mathsf T}}= E_{\mathcal {C}_{\ell _j}}^{{\mathsf T}}\underbrace{\left( E_{\mathcal {C}_{\ell _j}}h_{ij}h_{ij}^{{\mathsf T}}E_{\mathcal {C}_{\ell _j}}^{{\mathsf T}}\right) }_{=:Q_{ij}} E_{\mathcal {C}_{\ell _j}} \end{aligned}$$
(6.5)

where \(Q_{ij}\) is an SOS matrix by construction. Now, let \(J_{ik} = \{j: \ell _j = k\}\) be the set of column indices j such that column \(h_{ij}\) is indexed by clique \(\mathcal {C}_k\). These index sets are disjoint and \(\cup _k J_{ik} = \{1,\ldots ,m\}\), so substituting (6.5) into (6.4) we obtain

$$\begin{aligned} H_iH_i^{{\mathsf T}}= \sum _{j =1}^m E_{\mathcal {C}_{\ell _j}}^{{\mathsf T}}Q_{ij} E_{\mathcal {C}_{\ell _j}} = \sum _{k=1}^t \sum _{j \in J_{ik}} E_{\mathcal {C}_k}^{{\mathsf T}}Q_{ij} E_{\mathcal {C}_k} = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg ( \sum _{j \in J_{ik}} Q_{ij} \bigg ) E_{\mathcal {C}_k}. \end{aligned}$$

This is exactly (6.3) with matrices \(S_{ik} = \sum _{j \in J_{ik}} Q_{ij}\), which are SOS because they are sums of SOS matrices. Thus, we have proved Theorem 2.2 for polynomial matrices P to which Proposition 6.1 can be applied with \(T=I\).
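To make the column-splitting argument in (6.3)–(6.5) concrete, here is a small numeric illustration with hypothetical data (constant matrices, cliques \(\{1,2\}\) and \(\{2,3\}\), written in 0-based indexing):

```python
import numpy as np

cliques = [np.array([0, 1]), np.array([1, 2])]  # C1 = {1,2}, C2 = {2,3}
H = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0]])   # lower triangular, no fill-in
col_clique = [0, 1, 1]            # column j is supported on clique col_clique[j]

def E(clique, m):
    # 0-1 selector matrix: E(C, m) @ v extracts the entries of v indexed by C
    sel = np.zeros((len(clique), m))
    sel[np.arange(len(clique)), clique] = 1.0
    return sel

m = H.shape[0]
terms = []
for k, C in enumerate(cliques):
    Ek = E(C, m)
    # Q_k sums the rank-one matrices from all columns supported on clique k
    Qk = sum(Ek @ np.outer(H[:, j], H[:, j]) @ Ek.T
             for j in range(m) if col_clique[j] == k)
    terms.append(Ek.T @ Qk @ Ek)

assert np.allclose(sum(terms), H @ H.T)  # clique-based splitting of H @ H.T
```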

The general case follows from a relatively straightforward permutation argument. First, apply the argument above to decompose the permuted matrix \(T P T^{{\mathsf T}}\), whose sparsity graph \(\mathcal {G}'\) is obtained by reordering the vertices of the sparsity graph \(\mathcal {G}\) of P according to the permutation T. Second, observe that the cliques \(\mathcal {C}_1,\ldots ,\mathcal {C}_t\) of \(\mathcal {G}\) are related to the cliques \(\mathcal {C}'_1,\ldots ,\mathcal {C}'_t\) of \(\mathcal {G}'\) by the permutation T, so the matrices \(E_{\mathcal {C}_k}\) and \(E_{\mathcal {C}'_k}\) satisfy \(E_{\mathcal {C}_k} = E_{\mathcal {C}'_k}T\). As required, therefore,

$$\begin{aligned} \sigma (x) P(x) = T^{{\mathsf T}}\!\left[ \sigma (x) T P(x) T^{{\mathsf T}}\right] \! T = T^{{\mathsf T}}\!\left[ \sum _{k=1}^{t}\! E_{\mathcal {C}'_k}^{{\mathsf T}}S_k(x) E_{\mathcal {C}'_k} \!\right] \! T = \sum _{k=1}^{t} \!E_{\mathcal {C}_k}^{{\mathsf T}}S_k(x) E_{\mathcal {C}_k}. \end{aligned}$$

6.3 Proof of Theorem 2.4

Our proof of Theorem 2.4 follows the same steps used by Kakimura [16] to prove the chordal decomposition theorem for constant PSD matrices (Theorem 2.1). Borrowing ideas from [13], this can be done with the help of the Weierstrass polynomial approximation theorem and the following version of Putinar's Positivstellensatz for polynomial matrices, due to Scherer and Hol [42, Theorem 2].

Theorem 6.1

(Scherer and Hol [42]) Let \(\mathcal {K}\) be a compact semialgebraic set defined as in (1.2) that satisfies the Archimedean condition (2.12). If an \(m\times m\) symmetric polynomial matrix P(x) is strictly positive definite on \(\mathcal {K}\), there exist \(m \times m\) SOS matrices \(S_0,\,\ldots ,\,S_q\) such that \(P(x) = S_0(x) + \sum _{i=1}^q S_i(x)g_i(x)\).

Remark 6.1

It is also possible to establish Theorem 2.4 by modifying the proof of Theorem 6.1 with the help of Theorem 2.1. This alternative approach is technically more involved, but might be extended more easily to obtain sparsity-exploiting versions of the general result in [42, Corollary 1], rather than of its particular version in Theorem 6.1. We leave this generalization to future research.

Let P(x) be an \(m\times m\) polynomial matrix with chordal sparsity graph \(\mathcal {G}\). If \(m=1\) or 2, Theorem 2.4 is a direct consequence of Theorem 6.1. For \(m\ge 3\), we proceed by induction assuming that Theorem 2.4 holds for matrices of size \(m-1\) or less. Without loss of generality, we assume that the sparsity graph \(\mathcal {G}\) is not complete (otherwise, P is dense and Theorem 2.4 reduces to Theorem 6.1) and connected (otherwise, P and \(\mathcal {G}\) can be replaced by their connected components).

Since \(\mathcal {G}\) is chordal, it has at least one simplicial vertex [46, Theorem 3.3]. Relabelling vertices if necessary, which is equivalent to permuting P, we may assume that vertex 1 is simplicial and that the first maximal clique of \(\mathcal {G}\) is \(\mathcal {C}_1 = \{1,\ldots ,r\}\) with \(1< r < m\). Thus, P(x) has the block structure

$$\begin{aligned} P(x) = \begin{bmatrix} a(x) &{}\quad {b}(x)^{{\mathsf T}}&{}\quad 0 \\ {b}(x) &{}\quad U(x) &{}\quad V(x)\\ 0 &{}\quad V(x)^{{\mathsf T}}&{}\quad W(x) \end{bmatrix} \end{aligned}$$

for some polynomial a, polynomial vector \(b=(b_1,\,\ldots ,\,b_{r-1})\), and polynomial matrices U of dimension \((r-1) \times (r-1)\), V of dimension \((r-1)\times (m-r)\), and W of dimension \((m-r) \times (m-r)\).

The polynomial a must be strictly positive on \(\mathcal {K}\) because P is positive definite on that set, so we can apply one step of the Cholesky factorization algorithm to write

$$\begin{aligned} L(x) P(x) L(x)^{{\mathsf T}}= \begin{bmatrix} a(x) &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad U(x) - a(x)^{-1}{b(x)}{b(x)}^{{\mathsf T}}&{}\quad V(x)\\ 0 &{}\quad V(x)^{{\mathsf T}}&{}\quad W(x) \end{bmatrix}, \end{aligned}$$
(6.6)

where

$$\begin{aligned} L(x) := \begin{bmatrix} 1 &{}\quad 0 &{}\quad 0\\ -a(x)^{-1}{b(x)} &{}\quad I &{}\quad 0\\ 0 &{}\quad 0 &{}\quad I \end{bmatrix}. \end{aligned}$$
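The elimination step (6.6) can be checked symbolically. Below is a minimal SymPy sketch (ours) for \(r=2\) and \(m=3\), with scalar symbols standing in for the blocks a(x), b(x), U(x), V(x) and W(x):

```python
import sympy as sp

a, b, u, v, w = sp.symbols('a b u v w')  # scalar stand-ins for the blocks
P = sp.Matrix([[a, b, 0],
               [b, u, v],
               [0, v, w]])
L = sp.Matrix([[1, 0, 0],
               [-b / a, 1, 0],
               [0, 0, 1]])
target = sp.Matrix([[a, 0, 0],
                    [0, u - b**2 / a, v],
                    [0, v, w]])
# One step of Cholesky-style elimination reproduces the right-hand side of (6.6)
assert (L * P * L.T - target).applyfunc(sp.simplify) == sp.zeros(3, 3)
```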

The matrix on the right-hand side of (6.6) is positive definite on the compact set \(\mathcal {K}\) because P is positive definite there and L(x) is invertible. Therefore, there exists \(\varepsilon >0\) such that

$$\begin{aligned} \begin{bmatrix} U(x) - a(x)^{-1}{b(x)}{b(x)}^{{\mathsf T}}&{}\quad V(x)\\ V(x)^{{\mathsf T}}&{}\quad W(x) \end{bmatrix} \succ 4 \varepsilon I \quad \forall x \in \mathcal {K}. \end{aligned}$$
(6.7)

Moreover, the rational entries of the matrix \(a^{-1} b b^{{\mathsf T}}\) are continuous on \(\mathcal {K}\) because a is strictly positive on that set, so we may apply the Weierstrass approximation theorem to choose a polynomial matrix H(x) that satisfies

$$\begin{aligned} -\varepsilon I \preceq H(x) - a(x)^{-1}{b(x)}{b(x)}^{{\mathsf T}}\preceq \varepsilon I \quad \forall x \in \mathcal {K}. \end{aligned}$$
(6.8)

Next, consider the decomposition

$$\begin{aligned} P(x) = \begin{bmatrix} a(x) &{}\quad {b}(x)^{{\mathsf T}}&{}\quad 0 \\ {b}(x) &{}\quad H(x) + 2\varepsilon I &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0 \end{bmatrix} + \begin{bmatrix} 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad U(x) - H(x)-2\varepsilon I &{}\quad V(x)\\ 0 &{}\quad V(x)^{{\mathsf T}}&{}\quad W(x) \end{bmatrix}. \end{aligned}$$
(6.9)

Combining (6.8) with the strict positivity of a(x) on \(\mathcal {K}\) we obtain

$$\begin{aligned} Q(x) := \begin{bmatrix}a(x) &{}\quad {b(x)}^{{\mathsf T}}\\ {b(x)} &{}\quad H(x)+2\varepsilon I\end{bmatrix} \succeq \begin{bmatrix}a(x) &{}\quad {b(x)}^{{\mathsf T}}\\ {b(x)} &{}\quad a(x)^{-1}{b(x)}{b(x)}^{{\mathsf T}}+ \varepsilon I\end{bmatrix} \succ 0 \qquad \forall x \in \mathcal {K}, \end{aligned}$$

where the last strict matrix inequality follows from the strict positivity of a and Schur’s complement conditions. Since Q is positive definite on \(\mathcal {K}\), we may apply Theorem 6.1 to find SOS matrices \(T_{0},\,\ldots ,\,T_{q}\) such that

$$\begin{aligned} Q(x) = T_{0}(x) + \sum _{i=1}^q g_i(x) T_{i}(x). \end{aligned}$$
(6.10)

Moreover, for all \(x \in \mathcal {K}\), inequalities (6.7) and (6.8) yield

$$\begin{aligned} R(x)&:= \begin{bmatrix}U(x) - H(x)- 2\varepsilon I &{}\quad V(x)\\ V(x)^{{\mathsf T}}&{}\quad W(x)\end{bmatrix} \\&\quad \succeq \begin{bmatrix}U(x) - a(x)^{-1}{b(x)}{b(x)}^{{\mathsf T}}- 3\varepsilon I &{}\quad V(x)\\ V(x)^{{\mathsf T}}&{}\quad W(x)\end{bmatrix} \succeq \varepsilon I. \end{aligned}$$

The sparsity of R(x) is described by the subgraph \(\tilde{\mathcal {G}}\) of \(\mathcal {G}\) obtained by removing the simplicial vertex 1 and its corresponding edges. This subgraph is chordal [46, Section 4.2] and has either t maximal cliques \(\tilde{\mathcal {C}}_1 = \mathcal {C}_1{\setminus }\{1\},\,\tilde{\mathcal {C}}_2 = \mathcal {C}_2,\,\ldots ,\,\tilde{\mathcal {C}}_t=\mathcal {C}_t\), or \(t-1\) maximal cliques \(\tilde{\mathcal {C}}_2=\mathcal {C}_2,\,\ldots ,\,\tilde{\mathcal {C}}_{t}=\mathcal {C}_t\) (in the latter case, we set \(\tilde{\mathcal {C}}_1=\emptyset \) for notational convenience). In either case, by the induction hypothesis, we can find SOS matrices \(Y_i\) and \(\tilde{S}_{ik}\) such that (omitting the argument x from all polynomials and polynomial matrices for notational simplicity)

$$\begin{aligned} R = E_{\tilde{\mathcal {C}}_1}^{{\mathsf T}}\bigg ( Y_{0} + \sum _{i=1}^q g_i Y_{i} \bigg ) E_{\tilde{\mathcal {C}}_1} + \sum _{k=2}^t E_{\tilde{\mathcal {C}}_k}^{{\mathsf T}}\bigg (\tilde{S}_{0k} + \sum _{i=1}^q g_i \tilde{S}_{ik}\bigg ) E_{\tilde{\mathcal {C}}_k}. \end{aligned}$$
(6.11)

The SOS decomposition (6.10) and (6.11) can now be combined with (6.9) to derive the desired SOS decomposition for P(x). The process is straightforward but cumbersome in notation, because we need to handle matrices of different dimensions. For each \(i \in \{0,\ldots ,q\}\) and \(k \in \{1,\ldots ,t\}\) define the matrices

$$\begin{aligned} Z_i(x)&:= \begin{bmatrix} 0 &{}\quad 0 \\ 0 &{}\quad Y_i(x)\end{bmatrix},&S_{ik}(x)&:= \begin{bmatrix} 0 &{}\quad 0 \\ 0 &{}\quad \tilde{S}_{ik}(x)\end{bmatrix}, \end{aligned}$$

and note that

$$\begin{aligned} \begin{bmatrix} 0 &{}\quad 0 \\ 0 &{}\quad E_{\tilde{\mathcal {C}}_1}^{{\mathsf T}}Y_i(x)E_{\tilde{\mathcal {C}}_1} \end{bmatrix}&= E_{\mathcal {C}_1}^{{\mathsf T}}Z_i(x) E_{\mathcal {C}_1},&\begin{bmatrix} 0 &{}\quad 0 \\ 0 &{}\quad E_{\tilde{\mathcal {C}}_k}^{{\mathsf T}}\tilde{S}_{ik}(x)E_{\tilde{\mathcal {C}}_k} \end{bmatrix}&= E_{\mathcal {C}_k}^{{\mathsf T}}{S}_{ik}(x) E_{\mathcal {C}_k}. \end{aligned}$$

We therefore obtain

$$\begin{aligned} \begin{bmatrix} 0 &{}\quad 0 \\ 0 &{}\quad R \end{bmatrix} = E_{\mathcal {C}_1}^{{\mathsf T}}\bigg ( Z_{0} + \sum _{i=1}^q g_i Z_{i} \bigg ) E_{\mathcal {C}_1} + \sum _{k=2}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg (S_{0k} + \sum _{i=1}^q g_i S_{ik}\bigg ) E_{\mathcal {C}_k} \end{aligned}$$

and can rewrite the decomposition (6.9) as

$$\begin{aligned} P = E_{\mathcal {C}_1}^{{\mathsf T}}\bigg ( Q + Z_{0} + \sum _{i=1}^q g_i Z_{i} \bigg ) E_{\mathcal {C}_1} + \sum _{k=2}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg (S_{0k} + \sum _{i=1}^q g_i S_{ik}\bigg ) E_{\mathcal {C}_k}. \end{aligned}$$

Substituting the decomposition of Q from (6.10), letting \(S_{i1}(x) := T_i(x) + Z_i(x)\), and reintroducing the x-dependence of various terms we arrive at

$$\begin{aligned} P(x) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg (S_{0k}(x) + \sum _{i=1}^q g_i(x) S_{ik}(x)\bigg ) E_{\mathcal {C}_k}, \end{aligned}$$

which is the desired SOS decomposition of P(x).

6.4 Proof of Theorem 2.5

We combine the argument given in [9] for general (dense) polynomial matrices with Theorem 2.4 and the following auxiliary result, proven in Appendix D.

Lemma 6.1

Let S(x) be an SOS polynomial matrix satisfying \(S(x)=S(-x)\). For any real number \(r\ge 0\) and any integer \(\omega \) such that \(2\omega \ge \deg (S)\), the matrix \(\Vert x\Vert ^{2\omega } S(r\Vert x\Vert ^{-1}x)\) is a homogeneous SOS polynomial matrix of degree \(2\omega \).
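As a concrete scalar instance of Lemma 6.1 (a \(1\times 1\) example of ours, not taken from the lemma's proof), let \(S(x)=x_1^4+x_2^2\) on \(\mathbb {R}^2\), which is SOS and satisfies \(S(x)=S(-x)\). Taking \(r=1\) and \(2\omega =4\ge \deg (S)\) gives

$$\begin{aligned} \Vert x\Vert ^{4}\, S\!\left( \frac{x}{\Vert x\Vert } \right) = \Vert x\Vert ^{4} \left( \frac{x_1^4}{\Vert x\Vert ^{4}} + \frac{x_2^2}{\Vert x\Vert ^{2}} \right) = x_1^4 + x_2^2\left( x_1^2+x_2^2\right) , \end{aligned}$$

which is homogeneous of degree 4 and SOS since \(x_1^4 + x_1^2x_2^2 + x_2^4 = (x_1^2)^2 + (x_1x_2)^2 + (x_2^2)^2\); the evenness of S ensures that no odd powers of \(\Vert x\Vert \) survive.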

Choose any nonzero \(x_0 \in \mathcal {K}\), let \(r = \Vert x_0\Vert \ne 0\), and observe that the (nonempty) semialgebraic set \(\mathcal {K}' := \mathcal {K} \cap \{x \in \mathbb {R}^n: \pm (r^2 - \Vert x\Vert ^2) \ge 0 \}\) satisfies the Archimedean condition (2.12). Set \(g_{q+1}(x) = r^2 - \Vert x\Vert ^2\) and \(g_{q+2}(x) = \Vert x\Vert ^2 -r^2\) for notational convenience. Since the homogeneous polynomial matrix \(P(x')\) is strictly positive definite for all \(x' \in \mathcal {K}' \subseteq \mathcal {K}{\setminus } \{0\}\), we can apply Theorem 2.4 to find SOS matrices \(\hat{S}_{j,k}\) (not necessarily homogeneous) such that

$$\begin{aligned} P(x') = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg ( \hat{S}_{0,k}(x') + \sum _{j=1}^{q+2} g_j(x')\hat{S}_{j,k}(x') \bigg ) E_{\mathcal {C}_k}. \end{aligned}$$
(6.12)

Moreover, standard symmetry arguments (see, e.g., [24, 40]) reveal that we may take \(\hat{S}_{j,k}(-x')=\hat{S}_{j,k}(x')\) for all j and k because the matrix P and the polynomials \(g_1,\ldots ,g_{q+2}\) are invariant under the transformation \(x \mapsto -x\). The latter assertion is true because P and \(g_1,\ldots ,g_{q}\) are homogeneous and have even degree by assumption, while \(g_{q+1}(-x')=g_{q+1}(x')\) and \(g_{q+2}(-x')=g_{q+2}(x')\) by construction.

Next, set \(2d_0 = \deg (P)\) and \(2d_j = \deg (g_j)\) for all \(j=1,\ldots ,q\). Given any nonzero \(x \in \mathbb {R}^n\), evaluating (6.12) at the point \(x' = r x \Vert x\Vert ^{-1}\) yields

$$\begin{aligned} \frac{r^{2d_0} }{\Vert x\Vert ^{2d_0}} P(x) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg [ \hat{S}_{0,k}\!\left( \frac{r x}{\Vert x\Vert } \right) + \sum _{j=1}^{q} \frac{r^{2d_j}}{\Vert x\Vert ^{2d_j}} \,g_j(x)\, \hat{S}_{j,k}\!\left( \frac{r x}{\Vert x\Vert } \right) \bigg ] E_{\mathcal {C}_k}, \end{aligned}$$
(6.13)

where we have used the fact that \(g_{q+1}(r x \Vert x\Vert ^{-1}) = g_{q+2}(r x \Vert x\Vert ^{-1}) = 0\). Let \(\omega \) be the smallest integer such that

$$\begin{aligned} 2\omega \ge 2d_0 + \sum _{j} 2d_j + \sum _{j,k} \deg (\hat{S}_{j,k}) \end{aligned}$$

and set \(\nu := \omega - d_0\). Multiplying (6.13) by \(\Vert x\Vert ^{2\omega }\) and rearranging, we obtain

$$\begin{aligned} \Vert x\Vert ^{2\nu } P(x) = \sum _{k=1}^t E_{\mathcal {C}_k}^{{\mathsf T}}\bigg ( {S}_{0,k}\!\left( x\right) + \sum _{j=1}^{q} g_j(x)\, {S}_{j,k}\!\left( x\right) \bigg ) E_{\mathcal {C}_k} \end{aligned}$$
(6.14)

with

$$\begin{aligned} S_{0,k}(x)&:= \frac{\Vert x\Vert ^{2\omega }}{r^{2d_0}} \,\hat{S}_{0,k}\!\left( \frac{r x}{\Vert x\Vert } \right) ,&S_{j,k}(x)&:= \frac{\Vert x\Vert ^{2\omega - 2d_j}}{r^{2d_0-2d_j} } \, \hat{S}_{j,k}\!\left( \frac{r x}{\Vert x\Vert } \right) . \end{aligned}$$

Lemma 6.1 guarantees that these matrices are homogeneous and SOS. Since (6.14) clearly holds also for \(x = 0\), it is the desired chordal SOS decomposition of P.

7 Conclusion

We have proven SOS decomposition theorems for positive semidefinite polynomial matrices with chordal sparsity (Theorems 2.2, 2.3, 2.4, 2.5 and Corollaries 2.2, 2.4), which can be viewed as sparsity-exploiting versions of the Hilbert–Artin, Reznick, Putinar, and Putinar–Vasilescu Positivstellensätze for polynomial matrices. Our theorems extend in a nontrivial way a classical chordal decomposition result for sparse numeric matrices [1], and we have shown that a naïve adaptation of this classical result to sparse polynomial matrices fails (Proposition 2.1).

In addition to being interesting in their own right, our SOS chordal decompositions have two important consequences. First, they can be combined with a straightforward scalarization argument to deduce new SOS representation results for nonnegative polynomials that are quadratic and correlatively sparse with respect to a subset of independent variables (Corollaries 4.1, 4.2 and 4.3). These statements specialize a sparse version of Putinar's Positivstellensatz proven in [19], as well as recent sparsity-exploiting extensions of Reznick's Positivstellensatz [25]. Second, Theorems 2.3, 2.4, 2.5 and Corollaries 2.2, 2.4 enable us to build new sparsity-exploiting hierarchies of SOS reformulations for convex optimization problems subject to large-scale but sparse polynomial matrix inequalities. These hierarchies are asymptotically exact for problems that have strictly feasible points and whose matrix inequalities are either imposed on a compact set satisfying the Archimedean condition (Theorem 3.1), or satisfy additional homogeneity and strict positivity conditions (Theorems 3.2 and 3.3). Moreover, and perhaps most importantly, our SOS hierarchies have significantly lower computational complexity than traditional ones when the maximal cliques of the sparsity graph associated with the polynomial matrix inequality are much smaller than the matrix. As demonstrated by the numerical examples in Sect. 5, this makes it possible to solve optimization problems with polynomial matrix inequalities that are well beyond the reach of standard SOS methods, without sacrificing their asymptotic convergence.

It would be interesting to explore whether the results presented in this work can be extended in various directions. For example, it may be possible to adapt the analysis in [42] to derive a more general version of Theorem 2.4. It should also be possible to deduce explicit degree bounds for the SOS matrices that appear in all of our decomposition results. Stronger decomposition results for inhomogeneous polynomial matrix inequalities imposed on semialgebraic sets that are noncompact or do not satisfy the Archimedean condition would also be of interest. For instance, Corollaries 2.2 and 2.1 place restrictive assumptions on the behaviour of the leading homogeneous part of a polynomial matrix. These assumptions are often not met and, in such cases, SOS reformulations of convex optimization problems with polynomial matrix inequalities cannot be guaranteed to converge using Corollaries 2.2 and 2.1. Finally, the chordal decomposition problem for semidefinite matrices has a dual formulation that considers positive semidefinite completion of partially specified matrices; see, e.g., [46, Chapter 10]. Building on a notion of SOS matrix completion introduced in [53], it may be possible to establish SOS completion results for polynomial matrices. All of these extensions would contribute to a comprehensive theory for SOS decomposition and completion of polynomial matrices, enabling the application of SOS programming to large-scale optimization problems with semidefinite constraints on sparse polynomial matrices.