1 Introduction

The sum of squares (SOS) condition is commonly used as a tractable restriction of polynomial nonnegativity. While SOS programs have traditionally been formulated and solved using semidefinite programming (SDP), Papp and Yildiz [16] recently demonstrated the effectiveness of a nonsymmetric interior point algorithm that solves SOS programs without SDP formulations. In this note, we focus on structured SOS constraints that can be modeled using more specialized cones. We describe and give barrier functions for three related cones for modeling functions of dense polynomials, which we hope will become useful modeling primitives.

The first is the cone of SOS matrices, which was described by Coey et al. [4, Section 5.7] without derivation. Characterizations of univariate SOS matrix cones in the context of optimization algorithms have previously been given by Genin et al. [6, Section 6]. However, their use of monomial or Chebyshev bases complicates computations of oracles in an interior point algorithm [16, Section 3.1] and prevents effective generalizations to the multivariate case. The second is an SOS \(\ell _2\)-norm (SOS-L2) cone, which can be used to certify pointwise membership in the second order cone for a vector with polynomial components. The third is an SOS \(\ell _1\)-norm (SOS-L1) cone, which can be used to certify pointwise membership in the epigraph set of the \(\ell _1\)-norm function. Although it is straightforward to use SOS representations to approximate these sets, such formulations introduce cones of higher dimension than the constrained polynomial vector. We suggest new barriers, with lower barrier parameters than SOS formulations allow.

In what follows, we use \(\mathbb {S}^m\), \(\mathbb {S}_{+}^m\), and \(\mathbb {S}_{++}^m\) to represent the symmetric, positive semidefinite, and positive definite matrices of side dimension m, respectively. For sets, \({{\,\mathrm{cl}\,}}\) denotes the closure and \({{\,\mathrm{int}\,}}\) denotes the interior. \(\llbracket a..b \rrbracket \) denotes the integers in the interval [a, b]. \(\vert A \vert \) denotes the dimension of a set A, and \({{\,\mathrm{sd}\,}}(m) = \vert \mathbb {S}^m \vert = {m (m+1)}/{2}\). We use \(\langle \cdot , \cdot \rangle _{A}\) for the inner product on A. For a linear operator \(M: A \rightarrow B\), the adjoint \(M^*: B \rightarrow A\) is the unique operator satisfying \(\langle x, M y \rangle _B = \langle y, M^*x \rangle _A\) for all \(x \in B\) and \(y \in A\). \(\mathbf {I}_m\) is the identity in \(\mathbb {R}^{m \times m}\). \(\otimes _K: \mathbb {R}^{a_1 \times a_2} \times \mathbb {R}^{b_1 \times b_2} \rightarrow \mathbb {R}^{a_1 b_1 \times a_2 b_2}\) is the usual Kronecker product. \({{\,\mathrm{diag}\,}}\) returns the diagonal elements of a matrix and \({{\,\mathrm{Diag}\,}}\) maps a vector to a matrix with the vector on the diagonal. All vectors, matrices, and higher order tensors are written in bold font. \(s_i\) is the ith element of a vector \(\mathbf {s}\) and \(\mathbf {s}_{i \in \llbracket 1..N \rrbracket }\) is the set \(\{ \mathbf {s}_1, \ldots , \mathbf {s}_N \}\). If A is a vector space then \(A^n\) is the Cartesian product of n spaces A. \(\mathbb {R}[\mathbf {x}]_{n,d}\) is the ring of polynomials in the variables \(\mathbf {x}= (x_1, \ldots , x_n)\) with maximum degree d. Following Papp and Yildiz [16], we use \(L = \left( {\begin{array}{c}n+d\\ n\end{array}}\right) \) and \(U = \left( {\begin{array}{c}n+2d\\ n\end{array}}\right) \) to denote the dimensions of \(\mathbb {R}[\mathbf {x}]_{n,d}\) and \(\mathbb {R}[\mathbf {x}]_{n,2d}\) respectively, when n and d are given in the surrounding context.
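
For concreteness, L and U are binomial coefficients and simple to compute; a minimal Python sketch (the variable names are ours, for illustration only):

```python
from math import comb

# Dimensions of R[x]_{n,d} and R[x]_{n,2d} for an example instance n = 2, d = 3.
n, d = 2, 3
L = comb(n + d, n)      # dimension of R[x]_{n,d}: here 10
U = comb(n + 2 * d, n)  # dimension of R[x]_{n,2d}: here 28
```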

1.1 The SOS polynomials cone and generic interior point algorithms

A polynomial \(p(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}\) is SOS if it can be expressed in the form \(p(\mathbf {x}) = \sum _{i \in \llbracket 1..N \rrbracket } q_i(\mathbf {x})^2\) for some \(N \in \mathbb {N}\) and \(q_{i \in \llbracket 1..N \rrbracket }(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,d}\). We denote the set of SOS polynomials in \(\mathbb {R}[\mathbf {x}]_{n,2d}\) by \({K}_{{{\,\mathrm{SOS}\,}}}\), which is a proper cone in \(\mathbb {R}[\mathbf {x}]_{n,2d}\) [13].

We also say that \(\mathbf {s}\in {K}_{{{\,\mathrm{SOS}\,}}}\) for \(\mathbf {s}\in \mathbb {R}^U\) if \(\mathbf {s}\) represents a vector of coefficients of an SOS polynomial under a given basis. We use such vectorized definitions interchangeably with functional definitions of polynomial cones. To construct a vectorized definition for \({K}_{{{\,\mathrm{SOS}\,}}}\), suppose we have a fixed basis for \(\mathbb {R}[\mathbf {x}]_{n,2d}\), and let \(p_{i \in \llbracket 1..L \rrbracket }(\mathbf {x})\) be basis polynomials for \(\mathbb {R}[\mathbf {x}]_{n,d}\). Let \(\lambda : \llbracket 1..L \rrbracket ^2 \rightarrow \mathbb {R}^U\) be a function such that \(\lambda (i,j)\) returns the vector of coefficients of the polynomial \(p_i(\mathbf {x}) p_j(\mathbf {x})\) using the fixed basis for \(\mathbb {R}[\mathbf {x}]_{n,2d}\). Define the lifting operator \(\Lambda : \mathbb {R}^U \rightarrow \mathbb {S}^L\), introduced by Nesterov [13], as:

$$\begin{aligned} \Lambda (\mathbf {s}) = [ \langle \lambda (i,j), \mathbf {s}\rangle ]_{i, j \in \llbracket 1..L \rrbracket } , \end{aligned}$$
(1.1)

where \(\langle \lambda (i,j), \mathbf {s}\rangle \) is the component in row i and column j. Now the cones \({K}_{{{\,\mathrm{SOS}\,}}}\) and \({K}_{{{\,\mathrm{SOS}\,}}}^*\) admit the characterization [13, Theorem 7.1]:

$$\begin{aligned} {K}_{{{\,\mathrm{SOS}\,}}} = \{ \Lambda ^*(\mathbf {S}) : \mathbf {S}\in \mathbb {S}_{+}^L \} , \qquad {K}_{{{\,\mathrm{SOS}\,}}}^* = \{ \mathbf {s}\in \mathbb {R}^U : \Lambda (\mathbf {s}) \in \mathbb {S}_{+}^L \} . \end{aligned}$$
(1.2)

Equation (1.2) shows that the dual cone \({K}_{{{\,\mathrm{SOS}\,}}}^{*}\) is an inverse linear image of the positive semidefinite (PSD) cone, and therefore has an efficiently computable logarithmically homogeneous self-concordant barrier (LHSCB) (see [14, Definitions 2.3.1, 2.3.2]). In particular, by linearity of \(\Lambda \), the function \(\mathbf {s}\mapsto -{{\,\mathrm{logdet}\,}}(\Lambda (\mathbf {s}))\) is an LHSCB for \({K}_{{{\,\mathrm{SOS}\,}}}^*\) [14, Proposition 5.1.1] with parameter L (an L-LHSCB for short). This makes it possible to solve optimization problems over \({K}_{{{\,\mathrm{SOS}\,}}}\) or \({K}_{{{\,\mathrm{SOS}\,}}}^{*}\) with a generic primal-dual interior point algorithm in polynomial time [17].

In a generic primal-dual interior point algorithm, very few oracles are needed for each cone in the optimization problem. For example, the algorithm used by Coey et al. [4] only requires a membership check, an initial interior point, and evaluations of derivatives of an LHSCB for each cone or its dual. Therefore, there is no particular advantage to favoring either \({K}_{{{\,\mathrm{SOS}\,}}}\) or \({K}_{{{\,\mathrm{SOS}\,}}}^{*}\) formulations. Optimizing over \({K}_{{{\,\mathrm{SOS}\,}}}\) (or \({K}_{{{\,\mathrm{SOS}\,}}}^*\)) directly instead of building SDP formulations is appealing because the dimension of \({K}_{{{\,\mathrm{SOS}\,}}}\) is generally much smaller than the cone dimension in SDP formulations, and these cones are amenable to more specialized algorithms [4, 16]. In later sections we describe efficient LHSCBs and membership checks for each cone we introduce.

The output of the lifting operator \(\Lambda \) depends on the polynomial basis chosen for \(\mathbb {R}[\mathbf {x}]_{n,d}\) as well as the basis for \(\mathbb {R}[\mathbf {x}]_{n,2d}\). Following Papp and Yildiz [16], we use a set of Lagrange polynomials interpolating at some points \(\mathbf {t}_{i \in \llbracket 1..U \rrbracket }\) as the basis for \(\mathbb {R}[\mathbf {x}]_{n,2d}\) and the multivariate Chebyshev polynomials [9] as the basis for \(\mathbb {R}[\mathbf {x}]_{n,d}\). These choices give the particular lifting operator \(\Lambda : \mathbb {R}^U \rightarrow \mathbb {S}^L\):

$$\begin{aligned} \Lambda (\mathbf {s})_{i,j} = {\textstyle \sum _{u \in \llbracket 1..U \rrbracket }} s_u p_i(\mathbf {t}_u) p_j(\mathbf {t}_u) \quad \forall i, j \in \llbracket 1..L \rrbracket . \end{aligned}$$
(1.3)

Equivalently, \(\Lambda (\mathbf {s}) = \mathbf {P}^\top {{\,\mathrm{Diag}\,}}(\mathbf {s}) \mathbf {P}\), where \(P_{u, \ell } = p_\ell (\mathbf {t}_u)\) for all \(u \in \llbracket 1..U \rrbracket , \ell \in \llbracket 1..L \rrbracket \). The adjoint is given by \(\Lambda ^*(\mathbf {S}) = {{\,\mathrm{diag}\,}}(\mathbf {P}\mathbf {S}\mathbf {P}^\top )\). Papp and Yildiz [16] show that the Lagrange basis gives rise to expressions for the gradient and Hessian of the barrier for \({K}_{{{\,\mathrm{SOS}\,}}}^*\) that are computable in \(\mathcal {O}(LU^2)\) time for any \(d, n \ge 1\).
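
To make these oracles concrete, the following univariate (n = 1) NumPy sketch builds \(\mathbf {P}\) from Chebyshev basis polynomials evaluated at Chebyshev points, then implements \(\Lambda \), its adjoint, and the \({K}_{{{\,\mathrm{SOS}\,}}}^*\) membership check; it is a toy illustration under these assumptions, not Hypatia's implementation:

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

# Univariate sketch of Eq. (1.3): Lambda(s) = P' Diag(s) P, where
# P[u, l] = p_l(t_u) for Chebyshev basis polynomials p_l and points t_u.
d = 2
L, U = d + 1, 2 * d + 1
t = np.cos(np.pi * np.arange(U) / (U - 1))  # U Chebyshev points in [-1, 1]
P = np.stack([cheb.chebval(t, np.eye(L)[l]) for l in range(L)], axis=1)

def Lam(s):
    return P.T @ np.diag(s) @ P  # Lambda(s), a matrix in S^L

def Lam_adj(S):
    return np.diag(P @ S @ P.T)  # adjoint Lambda^*(S), a vector in R^U

# s lies in K_SOS* iff Lambda(s) is PSD, and -logdet(Lambda(s)) is the L-LHSCB:
s = np.ones(U)  # a strictly feasible dual point, since Lambda(1) = P'P > 0
print(np.linalg.eigvalsh(Lam(s)).min() > 0)  # True
```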

Although we assume for simplicity that \(p_{i \in \llbracket 1..L \rrbracket }\) is a dense basis for \(\mathbb {R}[\mathbf {x}]_{n,d}\), this is without loss of generality. A modeler with access to a suitable sparse basis of \(\bar{L} < L\) polynomials in \(\mathbb {R}[\mathbf {x}]_{n,d}\) and \(\bar{U} < U\) interpolation points could use Eq. (1.3) and obtain a barrier with parameter \(\bar{L}\).

2 Polynomial generalizations for three conic sets

The first set we consider consists of the polynomial matrices \(\mathbf {Q}(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}^{m \times m}\) (i.e. \(m \times m\) matrices whose components are polynomials in n variables of maximum degree 2d) satisfying the constraint:

$$\begin{aligned} \mathbf {Q}(\mathbf {x}) \succeq 0 \quad \forall \mathbf {x}. \end{aligned}$$
(2.1)

One of the first applications of matrix SOS constraints was by Henrion and Lasserre [8]. The moment-SOS hierarchy was extended from the scalar case to the matrix case using a suitable extension of Putinar's Positivstellensatz, studied by Hol and Scherer [10] and Kojima [11].

This constraint has various applications in statistics, control, and engineering (e.g. [2, 7]). A tractable restriction for Eq. (2.1) is given by the SOS formulation:

$$\begin{aligned} \mathbf {y}^\top \mathbf {Q}(\mathbf {x}) \mathbf {y}\in {K}_{{{\,\mathrm{SOS}\,}}}\quad \forall {\mathbf {y}} \in \mathbb {R}^m. \end{aligned}$$
(2.2)

This formulation is sometimes implemented in practice and requires an SOS cone of dimension \(U{{\,\mathrm{sd}\,}}(m)\) (by exploiting the fact that all terms are bilinear in the \(\mathbf {y}\) variables). It is well known that Eq. (2.2) is equivalent to restricting \(\mathbf {Q}(\mathbf {x})\) to be an SOS matrix of the form \(\mathbf {Q}(\mathbf {x}) = \mathbf {M}(\mathbf {x})^\top \mathbf {M}(\mathbf {x})\) for some \(N \in \mathbb {N}\) and \(\mathbf {M}(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,d}^{N \times m}\) [3, Definition 3.76]. To be consistent in terminology with the other cones we introduce, we refer to SOS matrices as SOS-PSD matrices, or belonging to \({K}_{{{\,\mathrm{SOSPSD}\,}}}\). We show how to characterize \({K}_{{{\,\mathrm{SOSPSD}\,}}}\) and use it directly in an interior point algorithm in Sect. 3.
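
As a quick numeric illustration of the definition (a hypothetical random instance), any \(\mathbf {Q}(\mathbf {x}) = \mathbf {M}(\mathbf {x})^\top \mathbf {M}(\mathbf {x})\) is PSD at every point, as Eq. (2.1) requires:

```python
import numpy as np

# A random univariate SOS-PSD matrix: Q(x) = M(x)' M(x), with N = m = 2.
rng = np.random.default_rng(0)
Mc = rng.standard_normal((3, 2, 2))  # M(x) = Mc[0] + Mc[1] x + Mc[2] x^2

def Q(x):
    Mx = Mc[0] + Mc[1] * x + Mc[2] * x**2
    return Mx.T @ Mx  # symmetric and PSD by construction

print(all(np.linalg.eigvalsh(Q(x)).min() >= -1e-12 for x in np.linspace(-3, 3, 7)))
```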

The second set we consider consists of the polynomial vectors \(\mathbf {q}(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}^m\) satisfying:

$$\begin{aligned} {q}_1(\mathbf {x}) \ge \sqrt{{\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} ( q_i(\mathbf {x}) )^2 } \quad \forall \mathbf {x}, \end{aligned}$$
(2.3)

and hence requiring \(\mathbf {q}(\mathbf {x})\) to be in the epigraph set of the \(\ell _2\)-norm function (the second order cone) pointwise (cf. Eq. (2.1), which requires the polynomial matrix to be in the PSD cone pointwise). A tractable restriction for this constraint is given by the SOS formulation:

$$\begin{aligned} \mathbf {y}^\top {{\,\mathrm{{Arw}}\,}}(\mathbf {q}(\mathbf {x})) \mathbf {y}\in {K}_{{{\,\mathrm{SOS}\,}}}\quad \forall {\mathbf {y}} \in \mathbb {R}^m, \end{aligned}$$
(2.4)

where \({{\,\mathrm{{Arw}}\,}}: \mathbb {R}[\mathbf {x}]_{n,2d}^m \rightarrow \mathbb {R}[\mathbf {x}]_{n,2d}^{m \times m}\) is defined by:

$$\begin{aligned} {{\,\mathrm{{Arw}}\,}}(\mathbf {p}(\mathbf {x})) = \begin{bmatrix} p_1(\mathbf {x}) & \bar{\mathbf {p}}(\mathbf {x})^\top \\ \bar{\mathbf {p}}(\mathbf {x}) & p_1(\mathbf {x}) \mathbf {I}_{m-1} \end{bmatrix} , \qquad \mathbf {p}(\mathbf {x}) = ( p_1(\mathbf {x}), \bar{\mathbf {p}}(\mathbf {x}) ) \in \mathbb {R}[\mathbf {x}]_{n,2d} \times \mathbb {R}[\mathbf {x}]_{n,2d}^{m-1} . \end{aligned}$$
(2.5)
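
Numerically the arrow matrix is easy to form; a small sketch for constant vectors (for polynomial vectors, Eq. (2.5) applies the same pattern componentwise):

```python
import numpy as np

# Arrow matrix of Eq. (2.5) for a constant vector p in R^m.
def arw(p):
    A = p[0] * np.eye(len(p))  # p_1 on the diagonal
    A[0, 1:] = p[1:]           # pbar in the first row and column
    A[1:, 0] = p[1:]
    return A

# p is in the second order cone iff arw(p) is PSD:
print(np.linalg.eigvalsh(arw(np.array([2.0, 1.0, 1.0]))).min() >= 0)  # True
```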

Due to the equivalence between Eq. (2.2) and membership in \({K}_{{{\,\mathrm{SOSPSD}\,}}}\), Equation (2.4) is equivalent to requiring that \(\mathbf {q}(\mathbf {x})\) belongs to the cone we denote \({K}_{{{\,\mathrm{{Arw}}\,}}{{\,\mathrm{SOSPSD}\,}}}\) defined by:

$$\begin{aligned} {K}_{{{\,\mathrm{{Arw}}\,}}{{\,\mathrm{SOSPSD}\,}}} = \{ \mathbf {q}(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}^m: {{\,\mathrm{{Arw}}\,}}({\mathbf {q}(\mathbf {x})}) \in {K}_{{{\,\mathrm{SOSPSD}\,}}}\} . \end{aligned}$$
(2.6)

Membership in \({K}_{{{\,\mathrm{{Arw}}\,}}{{\,\mathrm{SOSPSD}\,}}}\) ensures Eq. (2.3) holds due to the SDP representation of the second order cone [1], and the fact that the SOS-PSD condition certifies pointwise positive semidefiniteness. An alternative restriction of Eq. (2.3) is described by the set we denote \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\), which is not representable by the usual scalar polynomial SOS cone in general:

$$\begin{aligned} {K}_{{{\,\mathrm{SOS}\,}}\ell _2}= \left\{ \begin{aligned} \mathbf {q}(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}^m : \exists N \in \mathbb {N}, \mathbf {p}_{i \in \llbracket 1..N \rrbracket }(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,d}^m, \\ \mathbf {q}(\mathbf {x}) = {\textstyle \sum _{i \in \llbracket 1..N \rrbracket }} \mathbf {p}_i(\mathbf {x}) \circ \mathbf {p}_i(\mathbf {x}) \end{aligned} \right\} , \end{aligned}$$
(2.7)

where \(\circ : \mathbb {R}^m \times \mathbb {R}^m \rightarrow \mathbb {R}^m\) is defined by:

$$\begin{aligned} \mathbf {x}\circ \mathbf {y} = \begin{bmatrix} \mathbf {x}^\top \mathbf {y} \\ x_1 \bar{\mathbf {y}} + y_1 \bar{\mathbf {x}} \end{bmatrix} , \quad \mathbf {x} = (x_1, \bar{\mathbf {x}}) , \quad \mathbf {y} = (y_1, \bar{\mathbf {y}}) \in \mathbb {R}\times \mathbb {R}^{m-1} , \end{aligned}$$
(2.8)

and \(\circ : \mathbb {R}[\mathbf {x}]^m_{n,d} \times \mathbb {R}[\mathbf {x}]^m_{n,d} \rightarrow \mathbb {R}[\mathbf {x}]^m_{n,2d}\) on polynomial vectors is defined analogously. This set was also studied by Kojima and Muramatsu, with a focus on extending Positivstellensatz results [12]. The validity of \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) as a restriction of Eq. (2.3) follows from the characterization of the second order cone as a cone of squares [1, Section 4]. For this reason we refer to the elements of \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) as the SOS-L2 polynomials. For a polynomial vector in \(\mathbb {R}[\mathbf {x}]_{n,2d}^m\), the dimension of \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) is Um, which compares favorably with the dimension \(U {{\,\mathrm{sd}\,}}(m)\) of \({K}_{{{\,\mathrm{SOS}\,}}}\) required for Eq. (2.4) or of \({K}_{{{\,\mathrm{SOSPSD}\,}}}\) in Eq. (2.6). In addition, we show in Sect. 4.1 that \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) admits an LHSCB with smaller parameter than \({K}_{{{\,\mathrm{{Arw}}\,}}{{\,\mathrm{SOSPSD}\,}}}\). However, we conjecture that for general n and d, \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\subsetneq {K}_{{{\,\mathrm{{Arw}}\,}}{{\,\mathrm{SOSPSD}\,}}}\) (for example, we believe the vector \([1 + x^2, 1 - x^2, 2 x]\) belongs to \({K}_{{{\,\mathrm{{Arw}}\,}}{{\,\mathrm{SOSPSD}\,}}}\) but not \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\)). A third formulation could be obtained by modifying the SDP formulation for \({K}_{{{\,\mathrm{{Arw}}\,}}{{\,\mathrm{SOSPSD}\,}}}\) to account for all sparsity in the \(\mathbf {y}\) monomials (by introducing a specialized cone for the Gram matrix of \(\mathbf {y}^\top {{\,\mathrm{{Arw}}\,}}(\mathbf {q}(\mathbf {x})) \mathbf {y}\)). However, this approach requires \(\mathcal {O}(L^2)\) conic variables for each polynomial in \(\mathbf {q}(\mathbf {x})\).
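
To see the cone-of-squares fact at work numerically, here is a small sketch with hypothetical random data, checking that sums of squares under \(\circ \) land in the second order cone:

```python
import numpy as np

# Jordan product of Eq. (2.8) on R^m.
def circ(x, y):
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

# A sum of squares p o p always lands in the second order cone:
rng = np.random.default_rng(1)
q = sum(circ(p, p) for p in rng.standard_normal((4, 3)))
print(q[0] >= np.linalg.norm(q[1:]))  # True
```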

The third and final set we consider is also described through a constraint on a polynomial vector \(\mathbf {q}(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}^m\). This constraint is given by:

$$\begin{aligned} {q}_1(\mathbf {x}) \ge {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} \vert {q}_i(\mathbf {x}) \vert \quad \forall \mathbf {x}, \end{aligned}$$
(2.9)

and hence requires the polynomial vector to be in the epigraph set of the \(\ell _1\)-norm function (\(\ell _1\)-norm cone) pointwise. A tractable restriction for this constraint is given by the SOS formulation:

$$\begin{aligned} q_1(\mathbf {x}) - {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} ( p_{i}(\mathbf {x})^{+} + p_{i}(\mathbf {x})^{-} )&\in {K}_{{{\,\mathrm{SOS}\,}}}, \end{aligned}$$
(2.10a)
$$\begin{aligned} q_i(\mathbf {x}) = p_{i}(\mathbf {x})^{+} - p_{i}(\mathbf {x})^{-}, p_{i}(\mathbf {x})^{+}, p_{i}(\mathbf {x})^{-}&\in {K}_{{{\,\mathrm{SOS}\,}}}&\forall i \in \llbracket 2..m \rrbracket , \end{aligned}$$
(2.10b)

which uses auxiliary polynomial variables \(p_{i \in \llbracket 2..m \rrbracket }^+(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}\) and \(p_{i \in \llbracket 2..m \rrbracket }^-(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}\). We refer to the projection of Eq. (2.10) onto \(\mathbf {q}(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,2d}^m\) as \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}\) and to its elements as the SOS-L1 polynomials. Note that the dimension of \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}\) is Um, while Eq. (2.10) requires \(2m-1\) SOS cones of dimension U and \(U(m - 1)\) additional equality constraints. In Sect. 4.2 we derive an Lm-LHSCB that allows us to optimize over \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}\) directly, while Eq. (2.10) would require an LHSCB with parameter \(L (2 m - 1)\).
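
For a sense of the sizes involved, consider a hypothetical instance with n = 2, d = 3, m = 5:

```python
from math import comb

# Cone dimension and barrier parameter: K_SOS-l1 directly vs. Eq. (2.10).
n, d, m = 2, 3, 5
L, U = comb(n + d, n), comb(n + 2 * d, n)  # L = 10, U = 28
print(U * m, U * (2 * m - 1))  # cone dimension: 140 vs. 252
print(L * m, L * (2 * m - 1))  # barrier parameter: 50 vs. 90
```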

We summarize some key properties of the new cones and SOS formulations in Table 1: the total dimension of cones involved, the parameter of an LHSCB for the conic sets, the time complexity to calculate the Hessian of the LHSCB, the level of conservatism of each new conic set compared to its alternative SOS formulation, and the number of auxiliary equality constraints and variables that need to be added in an optimization problem. A computational comparison of each pair of formulations using an example problem can be found at https://github.com/chriscoey/Hypatia.jl/wiki. The algorithmic advantages from the new cones usually translate to faster solve times in practice, and our experiments agree with the conjecture \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\subsetneq {K}_{{{\,\mathrm{{Arw}}\,}}{{\,\mathrm{SOSPSD}\,}}}\).

Table 1 Properties of new cones compared to SOS formulations

3 SOS-PSD and SOS-L2 cones from general algebras

The ideas introduced by Papp and Alizadeh [15] relating to SOS cones in general algebras allow us to characterize \({K}_{{{\,\mathrm{SOSPSD}\,}}}\) and \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) without auxiliary SOS polynomial constraints. As in Papp and Alizadeh [15], let us define \((A,B,\diamond )\) as a general algebra if A and B are vector spaces and \(\diamond : A \times A \rightarrow B\) is a bilinear product that satisfies the distributive property. For a general algebra \((A,B,\diamond )\), Papp and Alizadeh [15] define the SOS cone \({K}_\diamond \):

$$\begin{aligned} {K}_\diamond = \{ b \in B: \exists N \in \mathbb {N}, a_{i \in \llbracket 1..N \rrbracket } \in A, b = {\textstyle \sum _{i \in \llbracket 1..N \rrbracket }} a_i \diamond a_i \}. \end{aligned}$$
(3.1)

For instance, \(\mathbb {S}_{+}^m\) is equal to the SOS cone of \((\mathbb {R}^m, \mathbb {S}^m, \overline{\diamond })\) for \(\overline{\diamond }\) given by \(\mathbf {x} \overline{\diamond } \mathbf {y} = \tfrac{1}{2} (\mathbf {x} \mathbf {y}^\top + \mathbf {y} \mathbf {x}^\top )\). The second order cone is equal to the SOS cone of \((\mathbb {R}^m, \mathbb {R}^m, \circ )\). \({K}_{{{\,\mathrm{SOS}\,}}}\) is equal to the SOS cone of \((\mathbb {R}[\mathbf {x}]_{n,d}, \mathbb {R}[\mathbf {x}]_{n,2d}, \cdot )\) where \(\cdot \) is the product of polynomials. To obtain our vectorized representation of \({K}_{{{\,\mathrm{SOS}\,}}}\) we can redefine the function \(\lambda : \mathbb {R}^L \times \mathbb {R}^L \rightarrow \mathbb {R}^U\) so that for \(\mathbf {p}_i, \mathbf {p}_j \in \mathbb {R}^L\) representing coefficients of any polynomials in \(\mathbb {R}[\mathbf {x}]_{n,d}\), \(\lambda (\mathbf {p}_i, \mathbf {p}_j)\) returns the vector of coefficients of the product of the polynomials. Then \({K}_{{{\,\mathrm{SOS}\,}}}\) is equal to the SOS cone of \((\mathbb {R}^L, \mathbb {R}^U, \lambda )\).

As we describe in Sect. 3.1, Papp and Alizadeh [15] also show how to build lifting operators for general algebras. This allows us to construct membership checks and easily computable LHSCBs for \({K}_{{{\,\mathrm{SOSPSD}\,}}}^*\) and \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\) once we represent them as SOS cones of tensor products of algebras.

The tensor product of two algebras \((A_1, B_1, \diamond _1)\) and \((A_2, B_2, \diamond _2)\) is a new algebra \((A_1 \otimes A_2, B_1 \otimes B_2, \diamond _1 \otimes \diamond _2)\), where \(\diamond _1 \otimes \diamond _2\) is defined via its action on elementary tensors. For \(\mathbf {u}_1, \mathbf {v}_1 \in A_1\) and \(\mathbf {u}_2, \mathbf {v}_2 \in A_2\):

$$\begin{aligned} (\mathbf {u}_1 \otimes \mathbf {u}_2) \ (\diamond _1 \otimes \diamond _2) \ (\mathbf {v}_1 \otimes \mathbf {v}_2) = (\mathbf {u}_1 \diamond _1 \mathbf {v}_1) \otimes (\mathbf {u}_2 \diamond _2 \mathbf {v}_2). \end{aligned}$$
(3.2)

The algebra we are interested in for a functional representation of \({K}_{{{\,\mathrm{SOSPSD}\,}}}\) is the tensor product of \((\mathbb {R}[\mathbf {x}]_{n,d}, \mathbb {R}[\mathbf {x}]_{n,2d}, \cdot )\) with \((\mathbb {R}^m, \mathbb {S}^m, \bar{\diamond })\). We can think of elements in \(\mathbb {R}[\mathbf {x}]_{n,d} \otimes \mathbb {R}^m\) as polynomial vectors in \(\mathbb {R}[\mathbf {x}]_{n,d}^m\), and \(\mathbb {R}[\mathbf {x}]_{n,2d} \otimes \mathbb {S}^m\) as the symmetric polynomial matrices in \(\mathbb {R}[\mathbf {x}]_{n,2d}^{m \times m}\). The SOS cone of \((\mathbb {R}[\mathbf {x}]_{n,d} \otimes \mathbb {R}^m, \mathbb {R}[\mathbf {x}]_{n,2d} \otimes \mathbb {S}^m, \cdot \otimes \bar{\diamond })\) corresponds to the polynomial matrices that can be written as \({\textstyle \sum _{i \in \llbracket 1..N \rrbracket }} \mathbf {m}_i(\mathbf {x}) \mathbf {m}_i(\mathbf {x})^\top \) with \(\mathbf {m}_i(\mathbf {x}) \in \mathbb {R}[\mathbf {x}]_{n,d}^m\) for all \(i \in \llbracket 1..N \rrbracket \) [15, Section 4.3], which is exactly \({K}_{{{\,\mathrm{SOSPSD}\,}}}\). Equivalently, a vectorized representation of \({K}_{{{\,\mathrm{SOSPSD}\,}}}\) can be characterized as the SOS cone of \((\mathbb {R}^L \otimes \mathbb {R}^m, \mathbb {R}^U \otimes \mathbb {S}^m, \lambda \otimes \bar{\diamond })\). We can think of \(\mathbb {R}^L \otimes \mathbb {R}^m\) as \(\mathbb {R}^{L \times m}\) and we can think of \(\mathbb {R}^U \otimes \mathbb {S}^m\) as a subspace of \(\mathbb {R}^{U \times m \times m}\) that represents the coefficients of symmetric polynomial matrices.

Likewise, the algebra we are interested in for a functional representation of \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) is the tensor product of \((\mathbb {R}[\mathbf {x}]_{n,d}, \mathbb {R}[\mathbf {x}]_{n,2d}, \cdot )\) with \((\mathbb {R}^m, \mathbb {R}^m, \circ )\). We can think of \(\mathbb {R}[\mathbf {x}]_{n,d} \otimes \mathbb {R}^m\) and \(\mathbb {R}[\mathbf {x}]_{n,2d} \otimes \mathbb {R}^m\) as \(\mathbb {R}[\mathbf {x}]_{n,d}^m\) and \(\mathbb {R}[\mathbf {x}]_{n,2d}^m\) respectively. The SOS cone of the tensor product of these algebras then corresponds to \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) due to Eq. (2.7). A vectorized representation of \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) may be characterized as the SOS cone of \((\mathbb {R}^L \otimes \mathbb {R}^m, \mathbb {R}^U \otimes \mathbb {R}^m, \lambda \otimes \circ )\). We can think of \(\mathbb {R}^U \otimes \mathbb {R}^m\) as the coefficients of polynomial vectors, represented in \(\mathbb {R}^{U \times m}\).

3.1 Lifting operators for SOS-PSD and SOS-L2

The lifting operator of \((A, B, \diamond )\), when A and B are finite dimensional, is defined by Papp and Alizadeh [15] as the function \(\Lambda _\diamond : B \rightarrow \mathbb {S}^{\vert A \vert }\) satisfying \(\langle \Lambda _\diamond (b) a_1, a_2 \rangle _A = \langle b, a_1 \diamond a_2 \rangle _B\) for all \(a_1, a_2 \in A\), \(b \in B\). This leads to the following descriptions of \({K}_\diamond \) and \({K}_\diamond ^*\) [15, Theorem 3.2]:

$$\begin{aligned} {K}_\diamond = \{ \Lambda _\diamond ^*(\mathbf {S}) : \mathbf {S}\in \mathbb {S}_{+}^{\vert A \vert } \} , \qquad {K}_\diamond ^* = \{ b \in B : \Lambda _\diamond (b) \in \mathbb {S}_{+}^{\vert A \vert } \} . \end{aligned}$$
(3.3)

Recall that in order to use either \({K}_\diamond \) or \({K}_\diamond ^*\) in a generic interior point algorithm, we require efficient oracles for a membership check and derivatives of an LHSCB of \({K}_\diamond \) or \({K}_\diamond ^*\). If \(\Lambda _\diamond \) is efficiently computable, Equation (3.3) provides a membership check for \({K}_\diamond ^*\). Furthermore, an LHSCB for \({K}_\diamond ^*\) is given by \(b \mapsto -{{\,\mathrm{logdet}\,}}(\Lambda _\diamond (b))\) with barrier parameter \(\vert A \vert \), due to the linearity of \(\Lambda _\diamond \) [14, Proposition 5.1.1]. The following lemma describes how to compute \(\Lambda _{\diamond _1 \otimes \diamond _2}\) for a tensor product algebra.

Lemma 3.1

[15, Lemma 4.1]: If \(\mathbf {w}_1 \in B_1\) and \(\mathbf {w}_2 \in B_2\), then:

$$\begin{aligned} \Lambda _{\diamond _1 \otimes \diamond _2}(\mathbf {w}_1 \otimes \mathbf {w}_2) = \Lambda _{\diamond _1}(\mathbf {w}_1) \otimes _K \Lambda _{\diamond _2}(\mathbf {w}_2) . \end{aligned}$$
(3.4)

Let us define \(\otimes : \mathbb {R}^U \times \mathbb {S}^m \rightarrow \mathbb {R}^{U \times m \times m}\) such that \((\mathbf {u}\otimes \mathbf {V})_{i,j,k} = u_i V_{j,k}\) and let us represent the coefficients of a polynomial matrix by a tensor \(\mathbf {S}\in \mathbb {R}^{U \times m \times m}\). Then we may write \(\mathbf {S} = {\textstyle \sum _{i \in \llbracket 1..m \rrbracket , j \in \llbracket 1..i \rrbracket }} \mathbf {S}_{i,j} \otimes \mathbf {E}_{i,j}\), where \(\mathbf {E}_{i,j} \in \mathbb {R}^{m \times m}\) is a matrix of zeros and ones with \(E_{i,j} = E_{j,i} = 1\) and \(\mathbf {S}_{i,j} \in \mathbb {R}^U\) are the coefficients of the polynomial in row i and column j. Applying Lemma 3.1, the lifting operator \(\Lambda _{{{\,\mathrm{SOSPSD}\,}}}: \mathbb {R}^{U \times m \times m} \rightarrow \mathbb {S}^{Lm}\) for \({K}_{{{\,\mathrm{SOSPSD}\,}}}\) is:

$$\begin{aligned} \Lambda _{{{\,\mathrm{SOSPSD}\,}}}(\mathbf {S})&= {\textstyle \sum _{i \in \llbracket 1..m \rrbracket , j \in \llbracket 1..i \rrbracket }} \Lambda _{\bar{\diamond }}(\mathbf {E}_{i,j}) \otimes _K \Lambda (\mathbf {S}_{i,j}) \end{aligned}$$
(3.5a)
$$\begin{aligned}&= {\textstyle \sum _{i \in \llbracket 1..m \rrbracket , j \in \llbracket 1..i \rrbracket }} \mathbf {E}_{i,j} \otimes _K \Lambda (\mathbf {S}_{i,j}) . \end{aligned}$$
(3.5b)

The output is a block matrix, where the \(L \times L\) submatrix in the ith group of rows and jth group of columns is \(\Lambda (\mathbf {S}_{i,j})\) for all \(i, j \in \llbracket 1..m \rrbracket \). The adjoint operator may also be defined blockwise, \(\Lambda _{{{\,\mathrm{SOSPSD}\,}}}^*(\mathbf {S})_{i,j} = \Lambda ^*(\mathbf {S}_{i,j})\) for all \(i, j \in \llbracket 1..m \rrbracket \), where \(\mathbf {S}_{i,j} \in \mathbb {R}^{L \times L}\) is the (ij)th submatrix in \(\mathbf {S}\). Note that the Hessian of the barrier \(-{{\,\mathrm{logdet}\,}}(\Lambda _{{{\,\mathrm{SOSPSD}\,}}}(\cdot ))\) can be evaluated in \(\mathcal {O}(L U^2 m^3)\) time, which is implemented in the Hypatia solver [4].
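
The blockwise definition translates directly into code; a sketch reusing the illustrative Lam and L from the univariate sketch in Sect. 1.1:

```python
import numpy as np

# Assemble Lambda_SOSPSD(S) as in Eq. (3.5): the (i, j)th L x L block is
# Lam(S[:, i, j]), where S in R^{U x m x m} holds the coefficients of a
# symmetric polynomial matrix. Assumes Lam and L from the earlier sketch.
def lam_sospsd(S):
    m = S.shape[1]
    out = np.zeros((L * m, L * m))
    for i in range(m):
        for j in range(m):
            out[i * L:(i + 1) * L, j * L:(j + 1) * L] = Lam(S[:, i, j])
    return out  # by Eq. (3.3), a dual point is in the cone iff this is PSD
```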

Likewise, we use a tensor \(\mathbf {s} \in \mathbb {R}^{U \times m}\) to describe the coefficients of a polynomial vector, and write \(\mathbf {s}_i \in \mathbb {R}^U\) to denote the vector of coefficients of the polynomial in component i. Applying Lemma 3.1 again, we obtain the (blockwise) definition of the lifting operator \(\Lambda _{\ell _2}: \mathbb {R}^{U \times m} \rightarrow \mathbb {S}^{Lm}\) for \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\):

$$\begin{aligned} \Lambda _{\ell _2}(\mathbf {s})_{i,j} = {\left\{ \begin{array}{ll} \Lambda (\mathbf {s}_1) &{} i = j \\ \Lambda (\mathbf {s}_j) &{} i = 1 < j \\ \Lambda (\mathbf {s}_i) &{} j = 1 < i \\ \mathbf {0} &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(3.6)

where \(\Lambda _{\ell _2}(\mathbf {s})_{i,j}\) is the (ij)th \(L \times L\) submatrix of \(\Lambda _{\ell _2}(\mathbf {s})\). Thus \(\Lambda _{\ell _2}(\mathbf {s})\) has a block arrowhead structure. The output of the adjoint operator \(\Lambda _{\ell _2}^*: \mathbb {S}^{Lm} \rightarrow \mathbb {R}^{U \times m}\) may be defined as:

$$\begin{aligned} \Lambda _{\ell _2}^*(\mathbf {S})_i = {\left\{ \begin{array}{ll} {\textstyle \sum _{j \in \llbracket 1..m \rrbracket }} \Lambda ^*(\mathbf {S}_{j,j}) &{} i = 1 \\ \Lambda ^*(\mathbf {S}_{1,i} + \mathbf {S}_{i,1}) &{} i \in \llbracket 2..m \rrbracket , \end{array}\right. } \end{aligned}$$
(3.7)

where \(\Lambda _{\ell _2}^*(\mathbf {S})_i \in \mathbb {R}^U\) is the ith slice of \(\Lambda _{\ell _2}^*(\mathbf {S})\) and \(\mathbf {S}_{i,j} \in \mathbb {R}^{L \times L}\) is the (ij)th block in \(\mathbf {S}\) for all \(i, j \in \llbracket 1..m \rrbracket \).
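
Similarly, the block arrowhead matrix can be assembled directly (again with the illustrative Lam and L from Sect. 1.1):

```python
import numpy as np

# Assemble the block arrowhead Lambda_l2(s) of Eq. (3.6), where the columns
# of s in R^{U x m} hold the coefficients of the polynomial components.
def lam_l2(s):
    m = s.shape[1]
    out = np.zeros((L * m, L * m))
    blk1 = Lam(s[:, 0])
    for i in range(m):
        out[i * L:(i + 1) * L, i * L:(i + 1) * L] = blk1  # diagonal blocks
    for i in range(1, m):
        blk = Lam(s[:, i])  # symmetric, so no transpose is needed
        out[:L, i * L:(i + 1) * L] = blk  # first block row and column
        out[i * L:(i + 1) * L, :L] = blk
    return out
```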

4 Efficient barriers for SOS-L2 and SOS-L1

As for \({K}_{{{\,\mathrm{SOSPSD}\,}}}^*\) and \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\), we show that a barrier for \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}^*\) can be obtained by composing a linear lifting operator with the \({{\,\mathrm{logdet}\,}}\) barrier. This is sufficient to optimize over \({K}_{{{\,\mathrm{SOSPSD}\,}}}\), \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}\) and \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}\) without high dimensional SDP formulations. However, for \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\) and \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}^*\) we can derive improved barriers by composing nonlinear functions with the \({{\,\mathrm{logdet}\,}}\) barrier instead. We show that these compositions are indeed LHSCBs.

4.1 SOS-L2

Recall Eq. (3.3) suggests that checking membership in \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\) amounts to checking positive definiteness of \(\Lambda _{\ell _2}(\mathbf {s})\), which has side dimension Lm. This membership check corresponds to a straightforward LHSCB with parameter Lm given by \(-{{\,\mathrm{logdet}\,}}(\Lambda _{\ell _2}(\mathbf {s}))\). We now show that by working with a Schur complement of \(\Lambda _{\ell _2}(\mathbf {s})\), we obtain a membership check for \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\) that requires factorizations of only two matrices with side dimension L and implies an LHSCB with parameter 2L.

Let \(\mathbf {R}: \mathbb {R}^{U \times m} \rightarrow \mathbb {S}^L\) return the Schur complement of the trailing block diagonal in \(\Lambda _{\ell _2}(\mathbf {s})\):

$$\begin{aligned} \mathbf {R}(\mathbf {s}) = \Lambda (\mathbf {s}_1) - {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} \Lambda (\mathbf {s}_i) \Lambda (\mathbf {s}_1)^{-1} \Lambda (\mathbf {s}_i) . \end{aligned}$$
(4.1)

By Eqs. (3.3) and (4.1):

$$\begin{aligned} {K}_{{{\,\mathrm{SOS}\,}}\ell _2}^* = {{\,\mathrm{cl}\,}}\{ \mathbf {s}\in \mathbb {R}^{U \times m} : \Lambda (\mathbf {s}_1) \succ 0 , \ \mathbf {R}(\mathbf {s}) \succ 0 \} . \end{aligned}$$
(4.2)

Equation (4.2) describes a simple membership check. Furthermore, the function \(F: \mathbb {R}^{U \times m} \rightarrow \mathbb {R}\) defined by:

$$\begin{aligned} F(\mathbf {s})&= -{{\,\mathrm{logdet}\,}}(\mathbf {R}(\mathbf {s})) - {{\,\mathrm{logdet}\,}}(\Lambda (\mathbf {s}_1)) \end{aligned}$$
(4.3a)
$$\begin{aligned}&= -{{\,\mathrm{logdet}\,}}\big ( \Lambda (\mathbf {s}_1) - {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} \Lambda (\mathbf {s}_i) \Lambda (\mathbf {s}_1)^{-1} \Lambda (\mathbf {s}_i) \big ) - {{\,\mathrm{logdet}\,}}(\Lambda (\mathbf {s}_1)) \end{aligned}$$
(4.3b)

is a 2L-LHSCB for \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\).

Theorem 4.1

The function F defined by Equation (4.3) is a 2L-LHSCB for \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\).

Proof

It is easy to verify that F is a logarithmically homogeneous barrier, so we show it is a 2L-self-concordant barrier for \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\). We first show that \(\hat{F}: \mathbb {S}_{++}^L \times (\mathbb {R}^{L \times L})^{m-1} \rightarrow \mathbb {R}\) defined as \(\hat{F}(\mathbf {X}_1, \ldots , \mathbf {X}_m) = -{{\,\mathrm{logdet}\,}}(\mathbf {X}_1 - {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} \mathbf {X}_i \mathbf {X}_{1}^{-1} \mathbf {X}_i^{\top }) - {{\,\mathrm{logdet}\,}}(\mathbf {X}_1)\), is a 2L-self-concordant barrier for the cone:

$$\begin{aligned} {K}_{\ell _2}^m = {{\,\mathrm{cl}\,}}\big \{ (\mathbf {X}_1, \ldots , \mathbf {X}_m) \in \mathbb {S}_{++}^L \times (\mathbb {R}^{L \times L})^{m-1} : \mathbf {X}_1 - {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} \mathbf {X}_i \mathbf {X}_1^{-1} \mathbf {X}_i^\top \succ 0 \big \} . \end{aligned}$$
(4.4)

We then argue that F is a composition of \(\hat{F}\) with the linear map \(\mathbf {s}\mapsto (\Lambda (\mathbf {s}_1), \ldots , \Lambda (\mathbf {s}_m))\) and that \({K}_{{{\,\mathrm{SOS}\,}}\ell _2}^*\) is an inverse image of \({K}_{\ell _2}^m\) under the same map. Then by Nesterov and Nemirovskii [14, Proposition 5.1.1] F is self-concordant.

Let \(\Gamma = \mathbb {S}^L_{+} \times (\mathbb {R}^{L \times L})^{m-1}\) and \(\mathbf {G}: {{\,\mathrm{int}\,}}(\Gamma ) \rightarrow \mathbb {S}^L\) be defined as:

$$\begin{aligned} \mathbf {G}(\mathbf {X}_1, \ldots , \mathbf {X}_m) = \mathbf {X}_1 - {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} \mathbf {X}_i \mathbf {X}_{1}^{-1} \mathbf {X}_i^\top . \end{aligned}$$
(4.5)

Let us check that \(\mathbf {G}\) is \((\mathbb {S}_{+}^L, 1)\)-compatible with the domain \(\Gamma \) in the sense of [14, Definition 5.1.1]. This requires that \(\mathbf {G}\) is \(C^3\)-smooth on \({{\,\mathrm{int}\,}}(\Gamma )\), \(\mathbf {G}\) is concave with respect to \(\mathbb {S}_{+}^L\), and at each point \(\mathbf {X}= (\mathbf {X}_1, \ldots , \mathbf {X}_m) \in {{\,\mathrm{int}\,}}(\Gamma )\) and any direction \(\mathbf {V}= (\mathbf {V}_1, \ldots , \mathbf {V}_m) \in \mathbb {S}^L \times (\mathbb {R}^{L \times L})^{m-1}\) such that \(-\mathbf {X}_1 \preceq \mathbf {V}_1 \preceq \mathbf {X}_1\), the directional derivatives of \(\mathbf {G}\) satisfy:

$$\begin{aligned} \tfrac{d^3 \mathbf {G}}{d \mathbf {X}^3}[\mathbf {V}, \mathbf {V}, \mathbf {V}] \preceq -3 \tfrac{d^2 \mathbf {G}}{d \mathbf {X}^2}[\mathbf {V}, \mathbf {V}] . \end{aligned}$$
(4.6)

Let \(\mathbf {V}\in \mathbb {S}^L \times (\mathbb {R}^{L \times L})^{m-1}\). It can be checked that \(\tfrac{d^3 \mathbf {G}}{d \mathbf {X}^3}\) is continuous on the domain of \(\mathbf {G}\) and we have the directional derivatives:

$$\begin{aligned} \tfrac{d^2 \mathbf {G}}{d \mathbf {X}^2}[\mathbf {V}, \mathbf {V}]&= -2 {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} (\mathbf {X}_i \mathbf {X}_{1}^{-1} \mathbf {V}_1 - \mathbf {V}_i)\mathbf {X}_{1}^{-1}(\mathbf {X}_i \mathbf {X}_{1}^{-1} \mathbf {V}_1 - \mathbf {V}_i)^\top , \end{aligned}$$
(4.7)
$$\begin{aligned} \tfrac{d^3 \mathbf {G}}{d \mathbf {X}^3}[\mathbf {V}, \mathbf {V}, \mathbf {V}]&= 6 {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} (\mathbf {X}_i \mathbf {X}_{1}^{-1} \mathbf {V}_1 - \mathbf {V}_i)\mathbf {X}_{1}^{-1} \mathbf {V}_1 \mathbf {X}_{1}^{-1}(\mathbf {X}_i \mathbf {X}_{1}^{-1} \mathbf {V}_1 - \mathbf {V}_i)^\top . \end{aligned}$$
(4.8)

Since \(\mathbf {X}_1 \succ 0\) in \({{\,\mathrm{int}\,}}(\Gamma )\), we have \(-\tfrac{d^2 \mathbf {G}}{d \mathbf {X}^2}[\mathbf {V}, \mathbf {V}] \succeq 0\), and so by Nesterov and Nemirovskii [14, Lemma 5.1.2], \(\mathbf {G}\) is concave with respect to \(\mathbb {S}_{+}^L\). It remains to show that (4.6) is satisfied. Since the directional derivatives decouple by each index i in the sum, it is sufficient to show that the inequality is satisfied for each \(i \in \llbracket 2..m \rrbracket \). For this, it is sufficient that \(6 \mathbf {X}_{1}^{-1} \mathbf {V}_1 \mathbf {X}_{1}^{-1} \preceq 6 \mathbf {X}_{1}^{-1}\) for all \(-\mathbf {X}_1 \preceq \mathbf {V}_1 \preceq \mathbf {X}_1\), which follows by applying the congruence \(\mathbf {W}\mapsto \mathbf {X}_1^{-1} \mathbf {W}\mathbf {X}_1^{-1}\) to \(\mathbf {V}_1 \preceq \mathbf {X}_1\), since \(\mathbf {X}_1\) is positive definite on \({{\,\mathrm{int}\,}}(\Gamma )\). Now by [14, Proposition 5.1.7], \(\hat{F}\) is a 2L-LHSCB. The same is true for F by composing \(\hat{F}\) with a linear map. \(\square \)

The Hessian of F can be evaluated in \(\mathcal {O}(L U^2 m^2)\) time, which is implemented in Hypatia.
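
For illustration, a NumPy sketch of this membership check and barrier, reusing the illustrative Lam from Sect. 1.1; only two \(L \times L\) Cholesky factorizations are needed:

```python
import numpy as np

# Barrier of Eq. (4.3) for K_SOS-l2*, via the Schur complement of Eq. (4.1).
def sosl2_dual_barrier(s):
    m = s.shape[1]
    X1 = Lam(s[:, 0])
    R = X1.copy()
    for i in range(1, m):
        Xi = Lam(s[:, i])
        R -= Xi @ np.linalg.solve(X1, Xi)  # R(s) of Eq. (4.1)
    # Cholesky raises LinAlgError unless X1 and R(s) are positive definite,
    # which is exactly the interior membership check of Eq. (4.2).
    F = -2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(R))))
    F -= 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(X1))))
    return F
```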

4.2 SOS-L1

By combining Eqs. (1.2) and  (2.10), the \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}\) cone admits the semidefinite representation:

$$\begin{aligned} {K}_{{{\,\mathrm{SOS}\,}}\ell _1} = \left\{ \mathbf {s}\in \mathbb {R}^{U \times m} : \begin{aligned}&\exists \mathbf {S}_1, \mathbf {S}_{i \in \llbracket 2..m \rrbracket }^{+}, \mathbf {S}_{i \in \llbracket 2..m \rrbracket }^{-} \in \mathbb {S}_{+}^L , \\&\mathbf {s}_1 = \Lambda ^*(\mathbf {S}_1) + {\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} \Lambda ^*(\mathbf {S}_i^{+} + \mathbf {S}_i^{-}) , \\&\mathbf {s}_i = \Lambda ^*(\mathbf {S}_i^{+} - \mathbf {S}_i^{-}) \quad \forall i \in \llbracket 2..m \rrbracket \end{aligned} \right\} . \end{aligned}$$
(4.9)

Its dual cone is:

$$\begin{aligned} {K}_{{{\,\mathrm{SOS}\,}}\ell _1}^* = \{ \mathbf {s}\in \mathbb {R}^{U \times m} : \Lambda (\mathbf {s}_1 + \mathbf {s}_i) \succeq 0 , \ \Lambda (\mathbf {s}_1 - \mathbf {s}_i) \succeq 0 \quad \forall i \in \llbracket 2..m \rrbracket \} . \end{aligned}$$
(4.10)

Equation (4.10) suggests that checking membership in \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}^*\) amounts to checking positive definiteness of \(2 (m - 1)\) matrices of side dimension L. This membership check corresponds to a straightforward LHSCB with parameter \(2 L (m - 1)\) that is given by \(-{\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} ( {{\,\mathrm{logdet}\,}}(\Lambda (\mathbf {s}_1 + \mathbf {s}_i)) + {{\,\mathrm{logdet}\,}}(\Lambda (\mathbf {s}_1 - \mathbf {s}_i)) )\). We now describe a membership check for \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}^*\) that requires factorizations of only m matrices, and corresponds to an LHSCB with parameter Lm.

Lemma 4.2

The set \(\{ (\mathbf {X}, \mathbf {Y}) \in \mathbb {S}_{+}^L \times \mathbb {S}^L : -\mathbf {X}\preceq \mathbf {Y}\preceq \mathbf {X}\}\) is equal to \({K}_{\ell _2}^2 = {{\,\mathrm{cl}\,}}\{ (\mathbf {X}, \mathbf {Y}) \in \mathbb {S}_{++}^L \times \mathbb {S}^L : \mathbf {X}- \mathbf {Y}\mathbf {X}^{-1} \mathbf {Y}\succ 0 \}\).

Proof

For inclusion in one direction:

$$\begin{aligned} \mathbf {X}\succ 0 , \quad \mathbf {X}- \mathbf {Y}\mathbf {X}^{-1} \mathbf {Y}\succ 0 \end{aligned}$$
(4.11a)
$$\begin{aligned} \Rightarrow \quad \begin{bmatrix} \mathbf {X}& \mathbf {Y}\\ \mathbf {Y}& \mathbf {X}\end{bmatrix} \succ 0 \end{aligned}$$
(4.11b)
$$\begin{aligned} \Rightarrow \quad \mathbf {X}+ \mathbf {Y}\succ 0 , \quad \mathbf {X}- \mathbf {Y}\succ 0 \end{aligned}$$
(4.11c)
$$\begin{aligned} \Rightarrow \quad -\mathbf {X}\prec \mathbf {Y}\prec \mathbf {X}, \end{aligned}$$
(4.11d)

where (4.11b) uses the Schur complement condition for positive definiteness, and (4.11c) follows by applying the block matrix to directions \((\mathbf {z}, \mathbf {z})\) and \((\mathbf {z}, -\mathbf {z})\). Taking closures gives the inclusion \({K}_{\ell _2}^2 \subseteq \{ (\mathbf {X}, \mathbf {Y}) : -\mathbf {X}\preceq \mathbf {Y}\preceq \mathbf {X}\}\).

For the other direction, suppose \(-\mathbf {X}\prec \mathbf {Y}\prec \mathbf {X}\). Then \(\mathbf {X}\succ 0\), \(\mathbf {Y}+ \mathbf {X}\succ 0\), and \(\mathbf {X}- \mathbf {Y}\succ 0\). Note that \((\mathbf {Y}+ \mathbf {X}) \mathbf {X}^{-1} (\mathbf {X}- \mathbf {Y}) = \mathbf {X}- \mathbf {Y}\mathbf {X}^{-1} \mathbf {Y}\) is symmetric. Due to Subramanian and Bhagwat [18, Corollary 1], this product of three matrices also has nonnegative eigenvalues. We conclude that \(-\mathbf {X}\prec \mathbf {Y}\prec \mathbf {X}\) implies \(\mathbf {X}\succ 0\) and \(\mathbf {X}- \mathbf {Y}\mathbf {X}^{-1} \mathbf {Y}\succeq 0\). Since \(\{ (\mathbf {X}, \mathbf {Y}) : -\mathbf {X}\preceq \mathbf {Y}\preceq \mathbf {X}\} = {{\,\mathrm{cl}\,}}\{ (\mathbf {X}, \mathbf {Y}) : -\mathbf {X}\prec \mathbf {Y}\prec \mathbf {X}\}\), taking closures gives the result. \(\square \)
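
A quick numeric sanity check of the lemma on a random symmetric instance (illustrative only, not part of the proof):

```python
import numpy as np

# Lemma 4.2: -X <= Y <= X (PSD order) iff X - Y X^{-1} Y >= 0, for X PD.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)  # X positive definite
B = rng.standard_normal((4, 4))
Y = (B + B.T) / 2            # Y symmetric
lhs = min(np.linalg.eigvalsh(X - Y).min(), np.linalg.eigvalsh(X + Y).min()) >= 0
rhs = np.linalg.eigvalsh(X - Y @ np.linalg.solve(X, Y)).min() >= -1e-9
print(lhs == rhs)  # True
```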

By Lemma 4.2 we can write the dual cone as:

$$\begin{aligned} {K}_{{{\,\mathrm{SOS}\,}}\ell _1}^* = {{\,\mathrm{cl}\,}}\{ \mathbf {s}\in \mathbb {R}^{U \times m} : \Lambda (\mathbf {s}_1) \succ 0 , \ \Lambda (\mathbf {s}_1) - \Lambda (\mathbf {s}_i) \Lambda (\mathbf {s}_1)^{-1} \Lambda (\mathbf {s}_i) \succ 0 \quad \forall i \in \llbracket 2..m \rrbracket \} . \end{aligned}$$
(4.12)
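
Per Eq. (4.12), interior membership can be checked with m factorizations; a sketch (again with the illustrative Lam from Sect. 1.1):

```python
import numpy as np

# Interior membership check for K_SOS-l1* via Eq. (4.12): one Cholesky
# factorization of Lam(s_1) plus one per Schur complement, m in total.
def in_sosl1_dual_interior(s):
    m = s.shape[1]
    X1 = Lam(s[:, 0])
    try:
        np.linalg.cholesky(X1)
        for i in range(1, m):
            Yi = Lam(s[:, i])
            np.linalg.cholesky(X1 - Yi @ np.linalg.solve(X1, Yi))
    except np.linalg.LinAlgError:
        return False
    return True
```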

Theorem 4.3

The function \(F: \mathbb {R}^{U \times m} \rightarrow \mathbb {R}\) given by:

$$\begin{aligned} F(\mathbf {s}) = -{\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} {{\,\mathrm{logdet}\,}}\big ( \Lambda (\mathbf {s}_1) - \Lambda (\mathbf {s}_i) \Lambda (\mathbf {s}_1)^{-1} \Lambda (\mathbf {s}_i) \big ) - {{\,\mathrm{logdet}\,}}(\Lambda (\mathbf {s}_1)) \end{aligned}$$
(4.13)

is an Lm-LHSCB for \({K}_{{{\,\mathrm{SOS}\,}}\ell _1}^*\).

Proof

It is easy to verify that F is a logarithmically homogeneous barrier, and we show it is an Lm-self-concordant barrier. As in Theorem 4.1, we define an auxiliary cone:

$$\begin{aligned} {K}_{\ell _\infty }^m&= \{ (\mathbf {X}_1, \ldots , \mathbf {X}_m) \in \mathbb {S}_{+}^L \times (\mathbb {R}^{L \times L})^{m-1} : (\mathbf {X}_1, \mathbf {X}_i) \in {K}_{\ell _2}^2 \forall i \in \llbracket 2..m \rrbracket \}. \end{aligned}$$
(4.14)

Let \(\hat{F}: \mathbb {S}_{++}^L \times (\mathbb {R}^{L \times L})^{m-1} \rightarrow \mathbb {R}\) be defined as \(\hat{F}(\mathbf {X}_1, \ldots , \mathbf {X}_m) = -{\textstyle \sum _{i \in \llbracket 2..m \rrbracket }} {{\,\mathrm{logdet}\,}}(\mathbf {X}_1 - \mathbf {X}_i \mathbf {X}_1^{-1} \mathbf {X}_i^\top ) - {{\,\mathrm{logdet}\,}}(\mathbf {X}_1)\). We argue that \(\hat{F}\) is an Lm-self-concordant barrier for \({K}_{\ell _\infty }^m\). F is a composition of \(\hat{F}\) with the same linear map used in Theorem 4.1 and self-concordance of F then follows by the same reasoning.

Let \(\Gamma = \mathbb {S}_{+}^L \times (\mathbb {R}^{L \times L})^{m-1}\) and \(\mathbf {H}: {{\,\mathrm{int}\,}}(\Gamma ) \rightarrow (\mathbb {S}_{+}^L)^{m-1}\) be defined by:

$$\begin{aligned} \mathbf {H}(\mathbf {X}_1, \ldots , \mathbf {X}_m) = \Big ( \mathbf {X}_1 - \mathbf {X}_2 \mathbf {X}_{1}^{-1} \mathbf {X}_2^\top , \ldots , \mathbf {X}_1 - \mathbf {X}_m \mathbf {X}_{1}^{-1} \mathbf {X}_m^\top \Big ). \end{aligned}$$
(4.15)

We claim that \(\mathbf {H}\) is \(((\mathbb {S}_+^L)^{m-1}, 1)\)-compatible with the domain \(\Gamma \). This amounts to showing that for all \(i \in \llbracket 2..m \rrbracket \), the mapping \(\mathbf {H}_i: \mathbb {S}_{++}^L \times \mathbb {R}^{L \times L} \rightarrow \mathbb {S}^L\), \(\mathbf {H}_i(\mathbf {X}) = \mathbf {X}_1 - \mathbf {X}_i \mathbf {X}_{1}^{-1} \mathbf {X}_i^\top \) is \((\mathbb {S}_+^L, 1)\)-compatible with the domain \(\mathbb {S}_{+}^L \times \mathbb {R}^{L \times L}\) (the requirements for compatibility decouple for each i). The latter holds since \(\mathbf {H}_i\) is equivalent to the function \(\mathbf {G}\) from Theorem 4.1 with \(m=2\). Then by Nesterov and Nemirovskii [14, Proposition 5.1.7], \(\hat{F}\) is an Lm-self-concordant barrier. \(\square \)

The Hessian of F can be evaluated in \(\mathcal {O}(L U^2 m)\) time, which is implemented in Hypatia. Note that in Eq. (4.12) we rely on an analogue of a representation of the \(\ell _\infty \)-norm cone (see [4, Section 5.1]). From this we derive an LHSCB that is analogous to the \(\ell _\infty \)-norm cone LHSCB. On the other hand, we are not aware of an efficient LHSCB for its dual, the \(\ell _1\)-norm cone, so we cannot use the same technique to derive an LHSCB for the dual of a polynomial analogue of the \(\ell _\infty \)-norm cone.