Abstract
A new primal-dual interior-point algorithm applicable to nonsymmetric conic optimization is proposed. It is a generalization of the famous algorithm suggested by Nesterov and Todd for the symmetric conic case, and uses primal-dual scalings for nonsymmetric cones proposed by Tunçel. We specialize Tunçel's primal-dual scalings to the important case of the three-dimensional exponential cone, resulting in a practical algorithm with good numerical performance, on par with standard symmetric-cone (e.g., quadratic-cone) algorithms. A significant contribution of the paper is a novel higher-order search direction, similar in spirit to a Mehrotra corrector for symmetric-cone algorithms. To a large extent, the efficiency of our proposed algorithm can be attributed to this new corrector.
Introduction
In 1984 Karmarkar [11] presented an interior-point algorithm for linear optimization with polynomial complexity. This triggered the interior-point revolution, which gave rise to a vast amount of research on interior-point methods. A particularly important result was the analysis of so-called self-concordant barrier functions, which led to polynomial-time algorithms for linear optimization over a convex domain with a self-concordant barrier, provided that the barrier function can be evaluated in polynomial time. This was proved by Nesterov and Nemirovski [19], and as a consequence convex optimization problems with such barriers can be solved efficiently by interior-point methods, at least in theory.
However, numerical studies for linear optimization quickly demonstrated that primal-dual interior-point methods were superior in practice, which led researchers to generalize the primal-dual algorithm to general smooth convex problems. A major breakthrough in that direction is the seminal work by Nesterov and Todd (NT) [20, 21], who generalized the primal-dual algorithm for linear optimization to self-scaled cones, with the same complexity bound as in the linear case. Güler [8] later showed that the self-scaled cones correspond to the class of symmetric cones, which has since become the most commonly used term. The good theoretical performance of primal-dual methods for symmetric cones has since been confirmed computationally, e.g., by [2, 28].
The class of symmetric cones has been completely characterized and includes five different cones, of which the three most interesting are the nonnegative orthant, the quadratic cone, and the cone of symmetric positive semidefinite matrices, as well as products of those three cones. Although working exclusively with symmetric cones is a limitation, they cover a great number of important applications; see, e.g., Nemirovski [16] for an excellent survey of conic optimization over symmetric cones.
Some convex sets with a symmetric-cone representation are more naturally characterized using nonsymmetric cones, e.g., semidefinite matrices with chordal sparsity patterns; see [32] for an extensive survey. Thus algorithmic advances for handling nonsymmetric cones directly could lead to both simpler modeling and reductions in computational complexity. Many other important convex sets cannot readily be modeled using symmetric cones, e.g., convex sets involving exponentials or logarithms, for which no symmetric-cone representation is known. In terms of practical importance, the three-dimensional power and exponential cones are perhaps the most important nonsymmetric cones; Lubin et al. [12] showed how all instances in a large benchmark library can be modeled using the three symmetric cones as well as the three-dimensional power and exponential cones.
Generalizing methods from symmetric to nonsymmetric cones is not straightforward, however. In [22] Nesterov et al. suggest a long-step algorithm using both the primal and dual barriers, effectively doubling the size of the linear system solved at each iteration. For small-dimensional cones (such as the exponential cone) this overhead might be acceptable.
More recently Nesterov [18] proved the existence of an NT-like primal-dual scaling in the vicinity of the central path, leading to an algorithm that uses only a single barrier but is restricted to following the central path closely. Compared to [22], this method has the main advantage of reducing the size of the linear system to that of the algorithms for symmetric cones; a similar advantage is shared by algorithms using explicit primal-dual scalings, e.g., the scalings by Tunçel [30] considered later. At each iteration Nesterov's method [18] has a centering phase, which brings the current iterate close to the central path, followed by one affine step, which brings the iterate closer to the optimum. From a practical perspective this is a significant drawback of the method, since the centering phase is computationally costly.
Hence, the centering and affine steps should be combined as in the symmetric-cone case. Also, both algorithms [18, 22] are feasible methods, i.e., they require either a known strictly feasible starting point or some sort of phase-I method to obtain one. Skajaa and Ye [27] extended Nesterov's method [18] with a homogeneous model, which also simplifies infeasibility detection. In practice, however, the method of Skajaa and Ye is not competitive with other methods [3] due to the centering steps.
In more recent work Serrano and Ye [26] improve the algorithm in [27] so that explicit centering steps are not needed; instead, iterates are restricted to a vicinity of the central path. This method has been implemented as part of the ECOS solver [6] and will be used for comparison in Sect. 8.
A different approach to nonsymmetric conic optimization was proposed by Tunçel [30], who extends the concepts of the NT algorithm in a more direct fashion. Nesterov and Todd [20, 21] showed existence of a primal-dual scaling defined by a single scaling point w satisfying the two secant equations \(s=F''(w)x\) and \(F'(x) = F''(w)F_*'(s)\), where \(F'\) and \(F''\) denote the first- and second-order derivatives of the barrier. Furthermore, the scaling \(F''(w)\) is shown to be bounded, which is a key property for establishing polynomial-time complexity of the NT algorithm. Tunçel later showed in [30] that such a scaling point w exists for any convex cone K, but satisfying only one secant equation \(s = F''(w)x\); if the barrier has negative curvature then w is unique [23]. To enforce both secant equations in the nonsymmetric case, Tunçel [30] considered a sequence of low-rank quasi-Newton updates to a general positive definite matrix.
These ideas were further explored by Myklebust and Tunçel [15] and also in [14], and they form the basis of our proposed algorithm. Following this line of work, the essential difference from a symmetric NT algorithm is the computation of a general positive definite scaling matrix satisfying the same secant equations, but without relying on a given scaling point w. Furthermore, such scaling matrices should be bounded to ensure polynomial-time complexity.
For three-dimensional cones these scaling matrices are particularly simple and characterized by a single scalar, as shown in Sect. 5. Uniform boundedness of the scaling matrices is not established in this work, but we comment on an efficient method for computing the most bounded scaling matrix using the formulations developed in Sect. 5.
It is also possible to develop algorithms for convex optimization specified in functional form. This has been done by several authors, for example by [3, 4, 7, 9], who all solve the KKT optimality conditions of the problem in functional form. These algorithms all require some sort of merit or penalty function, which often requires problem-specific parameter tuning to work well in practice. Another strand of research is the work of Nemirovski and Tunçel [17] and, very recently, Karimi and Tunçel [10], who advocate a nonlinear convex formulation instead of a conic formulation, explicitly using self-concordant barriers for the convex domains.
From a theoretical point of view, the different algorithms for nonsymmetric cones (including the functional formulations) all share the same best-known complexity bounds as their symmetric counterparts. Whether these methods are competitive in practice with algorithms for symmetric cones is still an open question, though.
The algorithm we consider herein uses the scaling matrices by Tunçel [30], resulting in an algorithm that is similar to its symmetric counterpart; the linear system solved at each iteration is very similar, and both the residuals and the complementarity gap decrease at the same rate. The algorithm is a natural extension of the Nesterov-Todd algorithm implemented for symmetric cones in, e.g., SeDuMi [28] and MOSEK [3].
It is well known that the Mehrotra predictor-corrector idea [13] leads to vastly improved computational performance in the symmetric-cone case. One of our main contributions is a new corrector for nonsymmetric cones, closely tied to Tunçel's primal-dual scalings. It is derived from a second-order approximation of the centrality condition \(s_\mu = -\mu F'(x_\mu )\) and thus involves third-order directional derivatives. The proposed corrector loosely follows the central path, without restricting the size of the neighborhood, thereby allowing the algorithm to take longer steps, and it shares similarities with the standard Mehrotra corrector; in particular, they coincide if \(F(x) = -\sum _i \log x_i\) is the standard barrier for linear optimization.
We demonstrate numerically that the proposed corrector offers a substantial and consistent reduction in the number of iterations required to solve the problems in all our numerical studies. Skajaa and Ye [27] suggest a different corrector, obtained by characterizing the full central path as a differential equation solved by a Runge-Kutta method. An immediate drawback of this Runge-Kutta method is that it requires additional factorizations of the full KKT system, which significantly adds to the overall solution time.
The remainder of the paper is structured as follows. We define basic properties of the exponential cone in Sect. 2, and we discuss the homogeneous model, the central path and related metrics in Sect. 3. In Sect. 4 we discuss search directions assuming a primal-dual scaling is known, and we derive our new corrector and provide a simple numerical example that illustrates how the corrector algorithm makes more steady progress. In Sect. 5 we discuss new characterizations of the primal-dual scalings from [15, 30], which reduce to univariate characterizations for three-dimensional cones.
In Sect. 6 we give a collected overview of the suggested path-following algorithm. This is followed by a discussion of some implementation details. Next, in Sect. 8 we present numerical results on a moderately large collection of exponential-cone problems. We conclude in Sect. 9, and in the appendix we give details of the first-, second- and third-order derivatives of the barrier for the exponential cone.
Preliminaries
In this section we list well-known properties of self-concordant and self-scaled barriers, which are used in the remainder of the paper. The proofs can be found in references such as [20, 21]. We consider a pair of primal and dual linear conic problems
and
where \(y\in \mathbf{{R}}^m\), \(s\in \mathbf{{R}}^n\) and \(K\subset \mathbf{{R}}^n\) is a proper cone, i.e., a pointed, closed, convex cone with nonempty interior. We assume throughout the paper that A has full rank, i.e., the rows of A are linearly independent. The dual cone \(K^*\) is
If K is proper then \(K^*\) is also proper. A cone K is called self-dual if there is a positive definite map between \({K}\) and \({K}^*\), i.e., if \(T {K}= {K}^*\) for some \(T\succ 0\). A function \(F:\mathbf{{int}}(K)\mapsto \mathbf{{R}}\), \(F\in C^3\), is a \(\vartheta \)-logarithmically homogeneous self-concordant barrier (\(\vartheta \)-LHSCB) for \(\mathbf{{int}}(K)\) if
and
holds for all \(x\in \mathbf{{int}}(K)\) and for all \(u\in \mathbf{{R}}^n\). For a pointed cone \(\vartheta \ge 1\). We will refer to an LHSCB simply as a self-concordant barrier. If \(F_1\) and \(F_2\) are \(\vartheta _1\)- and \(\vartheta _2\)-self-concordant barriers for \(K_1\) and \(K_2\), respectively, then \(F_1(x_1) + F_2(x_2)\) is a \((\vartheta _1+\vartheta _2)\)-self-concordant barrier for \(K_1 \times K_2\). Some straightforward consequences of the homogeneity property include
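Concretely, the standard identities obtained by differentiating the logarithmic-homogeneity relation \(F(tx)=F(x)-\vartheta \log t\) with respect to t and x are worth recording, since homogeneity is invoked repeatedly below:

```latex
F'(tx) = t^{-1}F'(x), \qquad
F''(tx) = t^{-2}F''(x), \qquad
F''(x)x = -F'(x), \qquad
{\langle F'(x), x \rangle} = -\vartheta .
```

The third identity follows by differentiating the first with respect to t at \(t=1\), and the last by differentiating \(F(tx)\) with respect to t at \(t=1\).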
If F is a \(\vartheta \)-self-concordant barrier for K, then the Fenchel conjugate
is a \(\vartheta \)-self-concordant barrier for \(K^*\). Furthermore, if \((x,s)\in \mathbf{{int}}(K)\times \mathbf{{int}}(K^*)\) then \((-F'(x),-F'_*(s)) \in \mathbf{{int}}(K^*)\times \mathbf{{int}}(K)\).
For a \(\vartheta \)-self-concordant barrier, the so-called Dikin ellipsoid
is included in the cone, i.e., \(E(x,r)\subset \mathbf{{int}}(K)\) for \(r<1\), and F is almost quadratic inside this ellipsoid,
for all \(z\in E(x,r)\).
A cone is called self-scaled if it has a \(\vartheta \)-self-concordant barrier F such that for all \(w, x\in \mathbf{{int}}(K)\),
and
Self-scaled cones are equivalent to symmetric cones, and they satisfy the stronger long-step Hessian estimation property
for any \(\alpha \in [0; \sigma _x(p)^{-1})\) where
denotes the distance to the boundary. Many properties of symmetric cones follow from the fact that the barriers have negative curvature, \(F'''(x)[u]\preceq 0\) for all \(x\in \mathbf{{int}}(K)\) and all \(u\in K\). An interesting property proven in [23] is that if both the primal and dual barriers have negative curvature then the cone is symmetric.
In addition to the three symmetric cones (i.e., the nonnegative orthant, the quadratic cone and the cone of symmetric positive semidefinite matrices), in the present work we mainly consider the nonsymmetric exponential cone studied by Chares [5].
with a 3-self-concordant barrier,
The dual exponential cone is
The exponential cone is not self-dual, but \(T {K}_\text {exp}={K}_\text {exp}^*\) for
For the exponential cone the conjugate barrier \(F_*(s)\) and its derivatives cannot be evaluated in closed form, but they can be evaluated numerically to high accuracy (e.g., with a damped Newton method) using the definition (1), i.e., if
then
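To illustrate, the following Python sketch evaluates the minimizer defining the conjugate of the exponential-cone barrier by a damped Newton method. The barrier expression \(F(x)=-\log (x_2\log (x_1/x_2)-x_3)-\log x_1-\log x_2\), the finite-difference Hessian and all tolerances are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def F_grad(x):
    # Gradient of the exponential-cone barrier
    # F(x) = -log(x2*log(x1/x2) - x3) - log(x1) - log(x2).
    x1, x2, x3 = x
    psi = x2 * np.log(x1 / x2) - x3
    return np.array([-x2 / (x1 * psi) - 1.0 / x1,
                     -(np.log(x1 / x2) - 1.0) / psi - 1.0 / x2,
                     1.0 / psi])

def in_interior(x):
    x1, x2, x3 = x
    return x1 > 0 and x2 > 0 and x2 * np.log(x1 / x2) - x3 > 0

def conjugate_argmin(s, x0, tol=1e-10, iters=100):
    # Damped Newton on the optimality condition s + F'(x) = 0,
    # with a finite-difference Hessian and interior backtracking.
    x = x0.copy()
    for _ in range(iters):
        g = s + F_grad(x)
        gn = np.linalg.norm(g)
        if gn < tol:
            break
        h = 1e-6
        H = np.column_stack([(F_grad(x + h * e) - F_grad(x - h * e)) / (2 * h)
                             for e in np.eye(3)])
        dx = np.linalg.solve(0.5 * (H + H.T), -g)
        t = 1.0
        while t > 1e-12 and (not in_interior(x + t * dx) or
                             np.linalg.norm(s + F_grad(x + t * dx)) > gn):
            t *= 0.5  # damp: stay in the interior and decrease the residual
        x = x + t * dx
    return x
```

By conjugacy, choosing \(s=-F'(x)\) makes x itself the minimizer, which provides a convenient correctness check.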
We conclude this survey of introductory material by listing some of the many convex sets that can be represented using the exponential cone, or a combination of exponential cones and symmetric cones. The epigraph \(t\ge e^x\) can be modelled as \((t,1,x) \in {K}_\text {exp}\), and similarly for the hypograph of the logarithm, \(t\le \log x \Leftrightarrow (x,1,t) \in {K}_\text {exp}\). The hypograph of the entropy function, \(t\le -x \log x\), is equivalent to \((1,x,t) \in {K}_\text {exp}\), and similarly for relative entropy, \(t\ge x \log (x/y) \Leftrightarrow (y,x,-t) \in {K}_\text {exp}\). The softplus function \(\log (1 + e^x)\) can be thought of as a smooth approximation of \(\max \{0, x\}\). Its epigraph can be modelled as \(t\ge \log (1 + e^x) \Leftrightarrow u+v = 1, \, (u,1,x-t),(v,1,-t) \in {K}_\text {exp}\). The epigraph of the logarithm of a sum of exponentials can be modelled as
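These representations are easy to sanity-check numerically. The sketch below uses the characterization \(x_1 \ge x_2 e^{x_3/x_2}\), \(x_2>0\) of (the interior of) \({K}_\text {exp}\), consistent with the modeling rules above, and verifies the softplus representation at its tight point; the helper names are ours.

```python
import math

def in_Kexp(z, tol=1e-9):
    # Interior-style membership test: z1 >= z2 * exp(z3/z2), z2 > 0.
    z1, z2, z3 = z
    return z2 > 0 and z1 >= z2 * math.exp(z3 / z2) - tol

def softplus_cert(x, t):
    # t >= log(1 + e^x)  <=>  u + v = 1 with (u,1,x-t), (v,1,-t) in K_exp.
    # The tight split u = e^(x-t), v = e^(-t) is feasible exactly when
    # e^(-t) * (e^x + 1) <= 1, i.e., t >= log(1 + e^x).
    u, v = math.exp(x - t), math.exp(-t)
    return u + v <= 1 + 1e-9 and in_Kexp((u, 1.0, x - t)) and in_Kexp((v, 1.0, -t))
```

For example, \(t=\log (1+e^{x})\) is accepted while any smaller t is rejected, matching the epigraph description.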
These examples all involve auxiliary variables and constraints in their conic representations, which might suggest that an algorithm working directly with a barrier of the convex domain (e.g., [10]) is more efficient. However, a conic formulation has the advantage of cleaner conic duality, and it is easy to exploit the special (sparse) structure of the additional constraints and variables in the linear algebra implementation, thereby eliminating the overhead of the conic formulation.
The homogeneous model and central path
In the simplified homogeneous model we embed the KKT conditions for (P) and (D) into the homogeneous self-dual model
where \({\hat{x}} = (x_1,\dots ,x_k)\) is a concatenation of conic variables, \(x_i\in {K}_i\). We denote the primal-dual variables by \(({\hat{x}}, {\hat{s}})\) to distinguish them from the augmented variables defined next. Let
We then have
where \({K}:= K_1\times \cdots \times K_{k+1}\) has a barrier
with complexity \(\vartheta = \sum _{i=1}^{k+1} \vartheta _i\). Let \(z:=(x, s, y)\) and define
The KKT conditions can then be expressed succinctly as
where \({{\mathcal {D}}} := {K}\times {K}^* \times \mathbf{{R}}^m \). Given an initial \(z^0\in \mathbf{{int}}({{\mathcal {D}}})\) we consider a central path \(z_\mu \) as the solution to
parametrized by \(\mu \in (0,1]\), and on the central path we have
The following lemma gives an equivalent variational characterization of the central path.
Lemma 1
Given \(z^0\in \mathbf{{int}}({{\mathcal {D}}})\). Let
Then
We omit the proof, which follows from the optimality conditions for minimizing \(\varPsi (z)\). In [22] the central path is defined from the variational characterization in Lemma 1, and the authors prove that the definition in (4)–(5) is equivalent. From the variational characterization \(z_\mu \) is well-defined, and \(\lim _{\mu \rightarrow 0} z_\mu = z^\star \) satisfies (see, e.g., [22]):

1. \({\langle x^\star , s^\star \rangle } = 0\).
2. If \(\tau ^\star > 0\) then \({\hat{x}}^\star /\tau ^\star \) is an optimal solution for (P) and \((y^\star ,{\hat{s}}^\star )/\tau ^\star \) is an optimal solution for (D).
3. If \(\kappa ^\star > 0\) then \({\langle b, y^\star \rangle }>0\) and (P) is infeasible, or \({\langle c, {\hat{x}}^\star \rangle } < 0 \) and (D) is infeasible, or both.
In the following neighborhood definitions and subsequent algorithms we consider iterates (x, s, y) that are generally not on the central path, and we define
A neighborhood used by Skajaa and Ye [27] is then
which characterizes the central path for \(\beta =0\). We can think of (6) as a generalization of the standard two-norm neighborhood
from linear optimization [33].
A different neighborhood is due to Nesterov and Todd [21]. We define shadow iterates (following [30])
and
for an iterate \((x, s, y)\in {{\mathcal {D}}}\). Nesterov and Todd [21] then showed that \(\mu {{\tilde{\mu }}}\ge 1\) with equality only on the central path. This leads to a different neighborhood \(\beta \mu {{\tilde{\mu }}} \le 1\) for \(\beta \in (0;1]\), or equivalently
This is satisfied if
leading to another neighborhood definition
which (in contrast to (6)) characterizes the central path for \(\beta =1\). We use the neighborhood \({{\mathcal {N}}}(\beta )\), which can be seen as a generalization of the one-sided \(\infty \)-norm neighborhood.
Both the central path and the neighborhood depend on the initial values. A simple choice is \(y^0=0\) and
which are optimality conditions for minimizing
If \({K}_i={K}_\text {exp}\) this can be solved offline using a backtracking Newton method to get
For the symmetric cones and the three-dimensional power cone such a central starting point can be found analytically. Then \({\langle x^0, s^0 \rangle }/\vartheta = 1\) and \((x^0, s^0) \in {{\mathcal {N}}}(1)\).
Search directions using a primal-dual scaling
In this section we define search directions, assuming a primal-dual scaling is known; how to compute such scalings is discussed in Sect. 5. We consider an iterate \((x,s,y)\in \mathbf{{int}}({{\mathcal {D}}})\) and nonsingular primal-dual scalings \(W_i\) satisfying the double secant equations
where \(\tilde{x}_i\) and \(\tilde{s}_i\) are the shadow iterates defined in (7). Let
We can then express the primal-dual scaling succinctly as
where \(W_{k+1} := \sqrt{\kappa /\tau }\) and
Whenever we encounter a scaling W in the remainder of this paper, it is assumed that W satisfies the double secant equations (8) for the current iterate.
To linearize the centrality condition we consider
with a linearization given by
On the central path we can express (9) as
with \(W := [\mu F''(x)]^{1/2}\). If (x, s) is not on the central path then (10) is an approximate linearization of the symmetric centrality condition
with a quality determined by the distance \(\Vert W^T W - \mu F''(x)\Vert \). Thus, the assumption that \(W^T W \approx \mu F''(x)\) is important for our proposed algorithm, including the corrector studied later in this section.
We next define the different search directions used in the proposed algorithm. The affine search direction is the solution to
and is characterized by the following lemma.
Lemma 2
The solution to (11) satisfies
and for all \(\alpha \in \mathbf{{R}}\),
Proof
It follows from (11) that
and skew-symmetry implies that
which combined shows that \({\langle \varDelta x^\text {a}, \varDelta s^\text {a} \rangle }=0\). The last part follows directly from (12). \(\square \)
Lemma 2 shows that a full affine step \((z+\varDelta z^\text {a})\) satisfies both \(G(z+\varDelta z^\text {a})=0\) and \({\langle x+\varDelta x^\text {a}, s+\varDelta s^\text {a} \rangle }=0\). Thus, if \((z+\varDelta z^\text {a})\in {{\mathcal {D}}}\) then \((z+\varDelta z^\text {a})\) is optimal (i.e., a solution to (3)).
We next consider a higher-order corrector for nonsymmetric cones, with some similarities to the Mehrotra corrector for symmetric cones. We consider the first- and second-order derivatives of \(s_\mu = -\mu F'(x_\mu )\) with respect to \(\mu \),
From (13) we have
resulting in an expression for the third-order directional derivative,
where (15) follows from the homogeneity property \(F'''(x)[x]=-2F''(x)\). This results in an alternative expression for the second-order derivative of the centrality condition, i.e.,
Assuming that \(W^T W \approx \mu F''(x)\) and comparing (11) and (13) we interpret \(\varDelta s^\text {a}= \dot{s}_\mu /\mu \) and \(\varDelta x^\text {a}= \dot{x}_\mu /\mu \), leading to the definition of our corrector term
Thus a pure corrector search direction can be defined as the solution to
and satisfies properties given in the following lemma.
Lemma 3
The solution to (17) satisfies
Proof
From (17) we have that
using the homogeneity property \(F'''(x)[x] = -2F''(x)\), and skew-symmetry implies that \({\langle \varDelta x^\text {c}, \varDelta s^\text {c} \rangle }=0\). \(\square \)
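The homogeneity identity used in this proof is easy to verify numerically. Assuming the exponential-cone barrier \(F(x)=-\log (x_2\log (x_1/x_2)-x_3)-\log x_1-\log x_2\) (the paper's analytic derivatives are deferred to the appendix; finite differences are our simplification), the following sketch checks \(F''(x)x=-F'(x)\) and \(F'''(x)[x]=-2F''(x)\) at an interior point:

```python
import numpy as np

def F_grad(x):
    # Gradient of F(x) = -log(x2*log(x1/x2) - x3) - log(x1) - log(x2).
    x1, x2, x3 = x
    psi = x2 * np.log(x1 / x2) - x3
    return np.array([-x2 / (x1 * psi) - 1.0 / x1,
                     -(np.log(x1 / x2) - 1.0) / psi - 1.0 / x2,
                     1.0 / psi])

def F_hess(x, h=1e-6):
    # Symmetrized finite-difference Hessian.
    H = np.column_stack([(F_grad(x + h * e) - F_grad(x - h * e)) / (2 * h)
                         for e in np.eye(3)])
    return 0.5 * (H + H.T)

def F_third(x, u, h=1e-4):
    # Directional third derivative F'''(x)[u] as a 3x3 matrix.
    return (F_hess(x + h * u) - F_hess(x - h * u)) / (2 * h)
```

At any interior x, \(F''(x)x + F'(x) \approx 0\) and \(F'''(x)[x] + 2F''(x) \approx 0\) up to finite-difference error.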
We note that
reduces to the familiar expression \(\kappa \varDelta \tau ^\text {c}+ \tau \varDelta \kappa ^\text {c}= -\varDelta \tau ^\text {a}\varDelta \kappa ^\text {a}\).
For a given centering parameter \(\gamma > 0\) we define a combined search direction as the solution to
with properties given in the following lemma.
Lemma 4
The solution to (18) satisfies
and for all \(\alpha \in \mathbf{{R}}\),
Proof
From (18) and Lemma 3 we have that
and skew-symmetry implies that
i.e., \({\langle \varDelta x, \varDelta s \rangle }=0\). The last part now follows. \(\square \)
The search direction (18) forms the basis of our algorithm. For a given step size \(\alpha \in (0, 1]\) the residuals and the complementarity gap decrease at the same rate. More explicitly, for \(\mu ^k := {\langle x^k, s^k \rangle }/\vartheta \) the residuals at the kth iteration are
which should be compared with central path definition in Lemma 1. Similarly, the complementarity gap at the kth iteration is
This is in contrast to other methods [18, 26, 27], which do not decrease the complementarity gap at the same rate. Also, no explicit merit function as in [9] is required to ensure a balanced decrease of the residuals and the complementarity gap.
Tunçel [30] showed polynomial complexity of an infeasible-start method (without a corrector) assuming boundedness of the scaling matrices. These results were further extended by Myklebust and Tunçel [15] to include an analysis of scaling matrices obtained by a BFGS update (such scalings are considered in the next section). Although we use slightly different scaling matrices, their analysis could be applied in a small neighborhood around the central path. Thus a short-step algorithm using the search directions herein would likely inherit good theoretical performance. It is also possible that future studies will prove the conjecture that the scaling matrices considered herein are bounded, which would simplify a complexity analysis.
To gain some insight into the corrector \(\eta \) we note that in the case of the nonnegative orthant we have the familiar expression
and similarly for the semidefinite cone we have
using the generalized product associated with the Euclidean Jordan algebra; see, e.g., [31] for a discussion of Euclidean Jordan algebras in a context similar to ours. For the Lorentz cone we have
with \(Q=\mathbf {diag}(1,-1,\dots ,-1)\). Let \(e_k\) denote the kth standard basis vector (i.e., the vector with value 1 in position k and 0 elsewhere). Then
again using the notation of the generalized product [31]. We defer the derivation and implementation-specific details for the exponential cone to the appendix.
As an illustration of the proposed corrector we consider a simple example,
In Table 1 we list the complementarity gap \({\langle x^{k}, s^{k} \rangle }\) for the iterates produced using the search direction (18) with and without the proposed corrector, using the scaling matrices defined in Sect. 5.
In Fig. 1 we plot the same iterates \(x^{k}/\tau ^\star \) projected onto the hyperplane \(x_1 + x_2 + x_3 = 1\). Although some iterates appear close to the boundary, they are all within the defined neighborhood, i.e., no cutback is performed to stay within the neighborhood. We see how the corrector algorithm makes significantly more progress and thereby reduces the required number of iterations. A similar observation is made for a wide selection of test problems in Sect. 8.
Primal-dual scalings
In this section we review key results on primal-dual scalings by Tunçel [30], make connections to the theory of multiple secant-equation updates by Schnabel [25], and present new formulations, which were used in [24] to compute optimally bounded scaling matrices for three-dimensional cones.
Tunçel [30] defines a set of scalings
i.e., positive definite scalings satisfying double secant equations. Tunçel further defines a set of bounded scalings parametrized by \(\xi \),
where \(\delta _F := [\vartheta (\mu {{\tilde{\mu }}} - 1) + 1]/\mu \). Let
In the case where \(\xi ^\star \in {{\mathcal {O}}}(1)\) over all \((x,s)\in \mathbf{{int}}(K)\times \mathbf{{int}}(K^*)\), Tunçel proved an iteration complexity bound of \({{\mathcal {O}}}(\sqrt{\vartheta }\log (1/\epsilon ))\) for an infeasible-start primal-dual interior-point algorithm (coinciding with the best-known complexity bound for interior-point methods). The parameter \(\delta _F\) and the set \({{\mathcal {T}}}_2 (x,s,\xi )\) are further addressed towards the end of this section in the context of finding optimally bounded scaling matrices.
A self-scaled cone has a unique Nesterov-Todd scaling point w satisfying the double secant equations,
Furthermore, \(F''(w)\) is bounded (see [20, 21, 30]) in the sense that
A barrier F(x) is said to have negative curvature if for all \(x\in \mathbf{{int}}(K)\) and all \(u\in {K}\) we have
and F(x) is self-scaled if and only if F(x) and \(F_*(s)\) both have negative curvature [23]. Barriers with negative curvature (but which are not self-scaled) still have a unique scaling point w satisfying exactly one secant equation
but not the other. The exponential-cone barrier does not have negative curvature, as can be seen by considering \({\hat{x}} := (1,e^{-2},0)\in {K}_\text {exp}\) and \({\hat{u}}:=(1,0,0)\in {K}_\text {exp}\setminus \mathbf{{int}}({K}_\text {exp})\). Then it can be verified using the expressions in the appendix that
which is indefinite, for example
for \({\hat{v}}:=(1, 8e^{-2}, 4e^{-2})\).
Thus for general nonsymmetric cones the essential question is how to define bounded scaling matrices satisfying both secant equations, without relying on a scaling point w. Tunçel [30] partly answers that question by deriving scalings \(T\in {{\mathcal {T}}}_1(x,s)\) using BFGS update equations, resulting in a rank-4 update to a given positive definite matrix. It is still instructive, however, to review the work by Schnabel on quasi-Newton methods with multiple secant equations. In particular, the following theorem from [25] is used repeatedly in the following discussion.
Theorem 1
Let \(S,Y\in \mathbf{{R}}^{n\times p}\) have full rank p. Then there exists \(H\succ 0\) such that \(HS=Y\) if and only if \(Y^TS\succ 0\).
As a consequence we can write any such \(H\succ 0\) as
or in factored form \(H = W^T W\) with
One may verify that, given any \(\varOmega \succ 0\) satisfying \(\varOmega S=Y\), the following identity holds
and therefore
where \(R R^T = \varOmega ^{-1}Z(Z^T \varOmega ^{-1} Z)^{-2} Z^T \varOmega ^{-1}\). One of the most popular quasi-Newton update rules is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update in (23), where \(H\succ 0\) denotes an approximation of the Hessian.
It is well known (see, e.g., [25]) that the update (23) is the solution to
for any \(\varOmega \succ 0\) satisfying \(\varOmega S=Y\). In the following theorem we show how (23) can be computed using a sequence of updates, similar to quasiNewton updates with a single secant equation.
Theorem 2
Given \(Y_0,S_0\in \mathbf{{R}}^{n\times p}\) with \(Y_0^T S_0\succ 0\). Then
where \(V:=\begin{pmatrix}v_1&\cdots&v_p\end{pmatrix}\), \(U:=\begin{pmatrix}u_1&\cdots&u_p\end{pmatrix}\) and
for \(k=1,\dots ,p\).
Proof
The proof is constructive and follows from a Cholesky factorization of \(Y_0^T S_0\). Let
In the proof we first show that \(Y^T_0 S_0 = L L^T\), and we then show that \(LV^T=Y_0^T\), which in turn implies (25).
We start by showing that the recursion (25), (26) is well-defined. Let \(\varPsi _k\) be the principal submatrix obtained from the last \(p-k\) rows and columns of \(Y_k^T S_k\), where \(\varPsi _0 = Y_0^T S_0\succ 0\) by assumption. Expanding \(Y_k^T S_k\) we have
i.e., \(\varPsi _k \) is the Schur complement of the first element of \(\varPsi _{k-1}\) and therefore positive definite.
We next make a simplifying observation, namely that the first k columns of \(Y_k\) and \(S_k\) are zero. From (25), (26) we immediately have that \(Y_ke_k = S_ke_k = 0\), \(k=1,\dots ,p\), and that this sparsity propagates to subsequent steps, i.e., \(Y_j e_k = S_j e_k = 0\), \(j>k\).
We can now prove that \(LL^T = Y_0^T S_0\). We have
Repeated use of (27) in (28) then shows that
We finally show that \(LV^T = Y_0^T\). From repeated use of (25) it follows that
Since \(LV^T = Y_0^T\) we have that \(Y_0 (L L^T)^{-1} Y_0^T = Y_0(Y_0^T S_0)^{-1} Y_0^T = V V^T\). Equation (26) follows similarly. \(\square \)
Primal-dual scalings can then be derived similarly to Tunçel [30] and Myklebust and Tunçel [15]. In our context we derive nonsingular (factored) scalings W satisfying
i.e., \((W^T W)^{-1} \in {\mathcal {T}}_1(x,s)\) defined in (20). We define
where we remind the reader that \({\tilde{x}} := -F'_*(s)\) and \({\tilde{s}} := -F'(x)\).
The condition \(Y^T S \succ 0\) is equivalent to assuming that (x, s) is not on the central path. We next define
also used in [15]. To compute (23) we first use Theorem 2 with \(S_0:=S\), \(Y_0:=Y\) resulting in
We next define
If we use Theorem 2 again, this time with \(S_0:=S\) and \(Y_0:=HS\) we get
leading to an expression for the BFGS update as a rank-4 update to \(H\succ 0\) (for a general H),
Considering (24), we see that the BFGS update to \(H:=\mu F''(x)\) has the desirable property of minimizing
measured in a weighted norm; for simplicity we can assume that \(\varOmega = W^T W\). With this choice of H we have
which curiously reduces to a rank-3 update,
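As a numerical illustration of these secant-preserving updates, the following sketch implements the standard block (multiple-secant) BFGS formula \(H_+ = H - HS(S^THS)^{-1}S^TH + Y(Y^TS)^{-1}Y^T\) from the quasi-Newton literature (our restatement; it is not claimed to be identical to (23)), and verifies the secant equations and positive definiteness. The synthetic data and the seed matrix \(H=I\) are hypothetical.

```python
import numpy as np

def block_bfgs(H, S, Y):
    # Block BFGS with p secant pairs; requires Y^T S symmetric positive
    # definite, in which case H+ is positive definite and H+ S = Y.
    M = np.linalg.solve(S.T @ H @ S, S.T @ H)  # (S^T H S)^{-1} S^T H
    N = np.linalg.solve(Y.T @ S, Y.T)          # (Y^T S)^{-1} Y^T
    return H - H @ S @ M + Y @ N

# Synthetic data for a three-dimensional cone (p = 2 secant pairs).
rng = np.random.default_rng(0)
n, p = 3, 2
S = rng.standard_normal((n, p))
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)   # hypothetical positive definite "target"
Y = P @ S                     # then Y^T S = S^T P S is symmetric PD
Hp = block_bfgs(np.eye(n), S, Y)
```

For three-dimensional cones this is at most a rank-4 correction of a 3x3 matrix; the specializations above reduce it further.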
We conclude this section by considering the three-dimensional case, for which the expressions simplify significantly. It follows from Theorem 1 that any scaling (29) has the form
where \(S^Tz = 0\), \(Y^Tr = 0\) and \({\langle r, z \rangle }=1\), i.e., the scaling is essentially characterized by a single scalar \(t>0\). For simplicity, we assume \(\Vert z\Vert =1\). We compute z and r using cross products,
In the three-dimensional case we can devise a simple algorithm for finding scalings achieving the bound (22). For notational convenience we introduce \(Q:=\left[ \begin{array}{cc}r,&S\end{array}\right] \), which is nonsingular. We can then solve
using a simple bisection algorithm. Consider the monotonically decreasing function,
and the monotonically increasing function
Given upper and lower bounds on t, the solution to (31) can then be found using bisection on t to solve \(\xi ^l(t) = \xi ^u(t)\). Such a bisection method was considered in [24], where it was conjectured that \(\xi ^\star \approx 1.253\) for the exponential cone.
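Since \(\xi ^l\) is decreasing and \(\xi ^u\) is increasing, their difference changes sign exactly once on a bracketing interval, so the bisection is entirely straightforward. The sketch below (Python; the function name and the stand-in monotone test functions are ours for illustration, not the actual \(\xi ^l, \xi ^u\) of (31), which depend on the current iterate) shows the scheme:

```python
def solve_scaling_weight(xi_lower, xi_upper, t_lo, t_hi, tol=1e-10):
    """Bisection for the weight t solving xi_lower(t) = xi_upper(t).

    xi_lower is assumed monotonically decreasing and xi_upper
    monotonically increasing, so f(t) = xi_lower(t) - xi_upper(t)
    is decreasing and has a single sign change on [t_lo, t_hi].
    """
    f = lambda t: xi_lower(t) - xi_upper(t)
    assert f(t_lo) > 0 and f(t_hi) < 0, "root not bracketed"
    while t_hi - t_lo > tol:
        t = 0.5 * (t_lo + t_hi)
        if f(t) > 0:
            t_lo = t       # root lies to the right
        else:
            t_hi = t       # root lies to the left
    return 0.5 * (t_lo + t_hi)

# Illustration with simple stand-in monotone functions:
# xi_lower(t) = 2/t (decreasing), xi_upper(t) = t (increasing),
# so the crossing point is t = sqrt(2).
t_star = solve_scaling_weight(lambda t: 2.0 / t, lambda t: t, 0.1, 10.0)
```

Each iteration halves the bracket, so roughly 40 iterations suffice for a tolerance of \(10^{-10}\), and each iteration costs only one evaluation of each \(\xi \)-function.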
The BFGS scaling corresponds to
We have tried both the optimal scaling and the BFGS scaling for all numerical test problems in Sect. 8, without noticing a significant difference in required number of iterations or quality of the solution. For all the problems the largest observed bound on \(\xi \) was 1.72 for the BFGS scaling; for simplicity we only report results for the simpler BFGS scaling in the following. In preliminary experiments we have observed similarly encouraging numerical results for higher dimensional nonsymmetric cones using the BFGS scaling. The bisection algorithm is still valuable as a reference, however, and we hope that future studies will prove the conjecture, and possibly derive similar bounds for other three dimensional cones (including a tighter bound than 4/3 for quadratic cones).
A primal-dual algorithm for exponential-cone optimization
In this section we give a collected overview of the suggested path-following primal-dual algorithm. The algorithm is specialized for three-dimensional cones (in particular, the exponential cone) by using cross-products for computing the BFGS scalings. By computing the scaling matrices as a general rank-3 update (30) the algorithm is readily adapted to other nonsymmetric cones.
We fix \(\beta \) to a constant low value, for example \(\beta =10^{-6}\). The essential parts of the method are i) finding a starting point, ii) computing a search direction and step size, and iii) checking the stopping criteria for termination.

Starting point. Find a starting point on the central path
$$\begin{aligned} x = s = -F'(x) \end{aligned}$$and \(y=0\), \(\tau =\kappa =1\). Then \(z:=({\hat{x}}, {\hat{s}},y,\tau ,\kappa )\in {{\mathcal {N}}}(1)\).
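For the exponential-cone barrier the fixed point \(x = -F'(x)\) can be computed once and hard-coded. A minimal sketch (Python; the function names, the starting guess and the finite-difference Newton scheme are our illustrative choices, using the barrier from the appendix):

```python
import numpy as np

def grad_F(x):
    """Gradient of the exponential-cone barrier
    F(x) = -log(psi(x)) - log x1 - log x2, psi(x) = x2*log(x1/x2) - x3."""
    x1, x2, x3 = x
    psi = x2 * np.log(x1 / x2) - x3
    dpsi = np.array([x2 / x1, np.log(x1 / x2) - 1.0, -1.0])
    return -dpsi / psi + np.array([-1.0 / x1, -1.0 / x2, 0.0])

def in_domain(x):
    # strict interior of the cone: x1, x2 > 0 and psi(x) > 0
    return x[0] > 0 and x[1] > 0 and x[1] * np.log(x[0] / x[1]) - x[2] > 0

def central_start(x0=(1.0, 1.0, -1.0), tol=1e-10, h=1e-7):
    """Solve G(x) = x + F'(x) = 0 by damped Newton with a
    finite-difference Jacobian (an illustrative, not optimized, scheme)."""
    x = np.asarray(x0, dtype=float)
    G = lambda y: y + grad_F(y)
    for _ in range(100):
        g = G(x)
        if np.max(np.abs(g)) < tol:
            break
        J = np.empty((3, 3))
        for j in range(3):
            e = np.zeros(3); e[j] = h
            J[:, j] = (G(x + e) - g) / h    # forward differences
        d = np.linalg.solve(J, -g)
        a = 1.0
        while not in_domain(x + a * d):     # damp to stay interior
            a *= 0.5
        x = x + a * d
    return x

x_c = central_start()   # approx (1.2909, 0.8051, -0.8278)
```

The resulting point agrees with the exponential-cone initialization values reported in the literature, approximately \((1.290928, 0.805102, -0.827838)\).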

Scaling matrices. Compute BFGS scaling matrices
$$\begin{aligned} W_k = \left[ \begin{array}{ccc} \frac{s_k}{\sqrt{{\langle x_k, s_k \rangle }}},&\frac{\delta _{s_k}}{\sqrt{{\langle \delta _{x_k}, \delta _{s_k} \rangle }}},&\sqrt{t_k}\cdot z_k \end{array} \right] ^T, \quad W_k^{-1} = \left[ \begin{array}{ccc} \frac{x_k}{\sqrt{{\langle x_k, s_k \rangle }}},&\frac{\delta _{x_k}}{\sqrt{{\langle \delta _{x_k}, \delta _{s_k} \rangle }}},&\frac{r_k}{\sqrt{t_k}} \end{array} \right] ^T \end{aligned}$$where \(z_k = (x_k \otimes {\tilde{x}}_k)/\Vert x_k \otimes {\tilde{x}}_k\Vert \), \(r_k = (s_k \otimes {\tilde{s}}_k)/{\langle s_k \otimes {\tilde{s}}_k, z_k \rangle }\) and \(t_k\) is chosen from (32). Note that \(\{z_k\}\) and z are unrelated; the latter denotes the aggregation of all primal and dual variables.
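The biorthogonality underlying these scalings is easy to verify numerically. The sketch below (Python with NumPy) is our own sanity check, not code from the paper: it assumes \(\delta _x := x - \mu {\tilde{x}}\), \(\delta _s := s - \mu {\tilde{s}}\) with \(\mu := {\langle x,s \rangle }/3\) as in Myklebust and Tunçel, builds \(W\) with the three vectors above as rows and its inverse with the companion vectors as columns (one consistent reading of the display), and checks \(WW^{-1} = I\) and the scaling-point property \(Wx = W^{-T}s\). Choosing \(s := -F'(x_a)\) for a second interior point \(x_a\) makes \({\tilde{x}} = x_a\) exactly, so no conjugate-barrier computation is needed; \(x\), \(x_a\) and \(t\) are arbitrary illustrative choices.

```python
import numpy as np

def grad_F(x):  # gradient of the exponential-cone barrier (see appendix)
    x1, x2, x3 = x
    psi = x2 * np.log(x1 / x2) - x3
    dpsi = np.array([x2 / x1, np.log(x1 / x2) - 1.0, -1.0])
    return -dpsi / psi + np.array([-1.0 / x1, -1.0 / x2, 0.0])

# Example data: interior x, and s = -F'(xa) for another interior point xa,
# so that x_tilde = -F'_*(s) equals xa by the gradient bijection.
x  = np.array([1.0, 1.0, -1.0])
xa = np.array([1.5, 0.8, -0.5])
s  = -grad_F(xa)
x_t, s_t = xa, -grad_F(x)          # \tilde{x} and \tilde{s}

mu = x @ s / 3.0                   # barrier parameter nu = 3
dx, ds = x - mu * x_t, s - mu * s_t

z = np.cross(x, x_t); z = z / np.linalg.norm(z)   # z ⟂ x, z ⟂ x_tilde
r = np.cross(s, s_t); r = r / (r @ z)             # r ⟂ s, r ⟂ s_tilde, <r,z>=1
t = 2.0                                           # any admissible weight t > 0

W = np.vstack([s / np.sqrt(x @ s),
               ds / np.sqrt(dx @ ds),
               np.sqrt(t) * z])
Winv = np.column_stack([x / np.sqrt(x @ s),
                        dx / np.sqrt(dx @ ds),
                        r / np.sqrt(t)])
```

All nine inner products behind \(WW^{-1} = I\) reduce to the orthogonality of the cross-products together with \({\langle s, \delta _x \rangle } = {\langle x, \delta _s \rangle } = 0\), which follows from \({\langle x, s \rangle } = 3\mu \).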

Search direction and step size. Compute an affine direction \(\varDelta z^\text {a}\) as the solution to (11),
$$\begin{aligned} G(\varDelta z^\text {a}) = -G(z), \quad W\varDelta x^\text {a}+ W^{-T}\varDelta s^\text {a}= -v. \end{aligned}$$From \(\varDelta z^\text {a}\) we compute a corrector (16),
$$\begin{aligned} \eta := -\frac{1}{2} F'''(x) [\varDelta x^\text {a}, (F''(x))^{-1} \varDelta s^\text {a}], \end{aligned}$$similar to Mehrotra [13], where details on evaluating the derivatives are given in the appendix, see (34). We define a centering parameter \(\gamma \) as
$$\begin{aligned} \gamma := (1-\alpha _\text {a}) \min \{ (1-\alpha _\text {a})^2, 1/4 \}, \end{aligned}$$where \(\alpha _\text {a}\) is the step size to the boundary, i.e.,
$$\begin{aligned} \alpha _\text {a} = \sup \{ \alpha \, \mid \, (x+\alpha \varDelta x^\text {a})\in K, \, (s+\alpha \varDelta s^\text {a})\in K^*, \, \alpha \in [0; 1]\} \end{aligned}$$which we approximate using a bisection procedure. We then compute a combined centeringcorrector search direction \(\varDelta z\) as the solution to (18),
$$\begin{aligned} G(\varDelta z) = -(1-\gamma )G(z), \quad W\varDelta x+ W^{-T}\varDelta s= -v + \gamma \mu {\tilde{v}} - W^{-T}\eta \end{aligned}$$and we update \(z:=z+\alpha \varDelta z\) with the largest step \(\alpha \in [0;1)\) inside a neighborhood \({{\mathcal {N}}}(\beta )\) of the central path.
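The bisection used to approximate the step to the boundary can be sketched as follows (Python; names are ours, and only the primal-cone membership test is shown — the dual-cone test is handled analogously with its own membership predicate):

```python
import math

def in_exp_cone(x):
    """Strict interior of K = cl{x : x2*log(x1/x2) > x3, x1, x2 > 0}."""
    return x[0] > 0 and x[1] > 0 and x[1] * math.log(x[0] / x[1]) - x[2] > 0

def step_to_boundary(x, dx, inside, alpha_max=1.0, tol=1e-9):
    """Approximate sup{a in [0, alpha_max] : x + a*dx in the cone}
    by bisection, given a strict membership predicate `inside`."""
    trial = lambda a: [x[i] + a * dx[i] for i in range(3)]
    if inside(trial(alpha_max)):          # full step stays interior
        return alpha_max
    lo, hi = 0.0, alpha_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inside(trial(mid)):
            lo = mid
        else:
            hi = mid
    return lo

# Example: x interior with psi(x) = 1; along dx the boundary is hit
# where psi(x + a*dx) = 1 - 2a = 0, i.e., at a = 1/2.
x, dx = (1.0, 1.0, -1.0), (0.0, 0.0, 2.0)
alpha = step_to_boundary(x, dx, in_exp_cone)
```

In the algorithm the same routine is applied jointly to \((x, \varDelta x^\text {a})\) over \(K\) and \((s, \varDelta s^\text {a})\) over \(K^*\), taking the smaller of the two step sizes.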

Checking termination. Terminate if the updated iterate satisfies the termination criteria (given in Sect. 7.3) or else take a new step.
Implementation
MOSEK is a software package for solving large-scale linear and conic optimization problems. It can solve problems with a mixture of linear, quadratic and semidefinite cones, and the implementation is based on the homogeneous model, the NT search direction and a Mehrotra-like predictor-corrector algorithm [2].
Our implementation has been extended to handle the three-dimensional exponential cone using the algorithm above. We use the usual NT scaling for the symmetric cones and the Tunçel scalings for the nonsymmetric cones. Except for small differences in the linearization of the complementarity conditions, the symmetric and nonsymmetric cones are handled completely analogously. Our extension for nonsymmetric cones also includes the three-dimensional power cone, but this is not discussed further here.
Dualization, presolve and scaling
Occasionally it is worthwhile to dualize the problem before solving it, since that may make the linear algebra more efficient. Whether the primal or the dual formulation is more efficient is not easily determined in advance. MOSEK makes a heuristic choice between the two forms, and the dualization is transparent to the user.
Furthermore, a presolve step is applied to the problem, which often leads to a significant reduction in computational complexity [1]. The presolve step removes obviously redundant constraints, tries to remove linear dependencies, etc. Finally, many optimization problems are badly scaled, so MOSEK rescales the problem before solving it. The rescaling is very simple, essentially normalizing the rows and columns of the constraint matrix A.
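One simple possibility for such a normalization is iterative max-norm equilibration; the sketch below (Python with NumPy; the routine and its parameters are our illustration, not MOSEK's actual rescaling) alternates row and column scaling a few times:

```python
import numpy as np

def equilibrate(A, passes=3):
    """Alternating row/column max-norm equilibration.
    Returns diagonal scalings (as vectors dr, dc) and B = diag(dr) A diag(dc)."""
    m, n = A.shape
    dr, dc = np.ones(m), np.ones(n)
    B = A.astype(float)
    for _ in range(passes):
        r = np.max(np.abs(B), axis=1); r[r == 0] = 1.0   # row max-norms
        B = B / r[:, None]; dr = dr / r
        c = np.max(np.abs(B), axis=0); c[c == 0] = 1.0   # column max-norms
        B = B / c[None, :]; dc = dc / c
    return dr, dc, B

# Badly scaled example: entries ranging over ten orders of magnitude.
A = np.array([[1e4, 2.0], [3.0, 4e-6]])
dr, dc, B = equilibrate(A)
```

After the final column pass every nonzero column of B has unit max-norm, and the row norms are close to one as well; the scalings dr, dc are applied to the constraints and variables, respectively, and undone when reporting the solution.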
Computing the search direction
Usually the most expensive operation in each iteration of the primaldual algorithm is to compute the search direction, i.e., solving the linear system
where W is a block-diagonal scaling matrix for a product of cones. Eliminating \(\varDelta s\) and \(\varDelta \kappa \) from the linearized centrality conditions results in the reduced bordered system
which can be solved in different ways. Given a (sparse) \(LDL^T\) factorization of the symmetric matrix
it is computationally cheap to compute the search direction. In the case that the factorization breaks down due to numerical issues we add regularization to the system, i.e., we modify the diagonal, which is common in interior-point methods. If the resulting search direction is inaccurate (i.e., the residuals are not decreased sufficiently) we use iterative refinement, which in most cases improves the accuracy of the search direction. We omit details of computing the \(LDL^T\) factorization, since it is fairly conventional and close to the approach discussed in [2].
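The effect of iterative refinement is easy to demonstrate in isolation. In the sketch below (Python with NumPy; the test matrix and the single-precision "inexact solve" are stand-ins for a factorization of the actual KKT system) the residual is recomputed in double precision and fed back through the cheap, inaccurate solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Symmetric positive definite test matrix with condition number ~1e4,
# standing in for the (already factorized) reduced system.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
K = Q @ np.diag(np.logspace(0, 4, n)) @ Q.T
b = rng.standard_normal(n)

# Emulate an inexact factorization/solve by working in single precision.
K32 = K.astype(np.float32)
def solve_inexact(r):
    return np.linalg.solve(K32, r.astype(np.float32)).astype(np.float64)

d = solve_inexact(b)
res0 = np.linalg.norm(b - K @ d) / np.linalg.norm(b)
for _ in range(5):                  # iterative refinement
    r = b - K @ d                   # residual in double precision
    d = d + solve_inexact(r)        # cheap correction step
res = np.linalg.norm(b - K @ d) / np.linalg.norm(b)
```

Each refinement step reuses the same inexact solver, so the extra cost per step is one residual evaluation and one back-substitution, while the residual shrinks geometrically until it stalls near working precision.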
The algorithm is implemented in the C programming language, and the Intel MKL BLAS library is used for small dense matrix operations; the remaining portions of the code are developed internally, and the most computationally expensive parts have been parallelized.
The termination criteria
We next discuss the termination criteria employed in MOSEK. Let \((\varepsilon _p, \varepsilon _d, \varepsilon _g, \varepsilon _i)>0\) be given tolerance levels for the algorithm, and denote by \(({\hat{x}}^k,{\hat{s}}^k,y^k,\tau ^k,\kappa ^k)\in \mathbf{{int}}({{\mathcal {D}}})\) the kth interior-point iterate. Consider the metrics
and
If
then
and hence \(({\hat{x}}^k,y^k,{\hat{s}}^k)/\tau ^k\) is an almost primal and dual feasible solution with small duality gap. Clearly, the quality of the approximation depends on the problem and the specified tolerances \((\varepsilon _p, \varepsilon _d, \varepsilon _g, \varepsilon _i)\). Therefore, \(\rho _p^k\) and \(\rho _d^k\) measure how far the kth iterate is from being approximately primal and dual feasible, respectively. Furthermore, \(\rho _g^k\) measures how far the kth iterate is from having a zero duality gap.
Similarly, define infeasibility metrics
and
If \(\rho _{pi} \le 1\) then
Thus, for
we have
i.e., \(({\bar{y}},{\bar{s}})\) is an approximate certificate of primal infeasibility. Similarly, if \(\rho _{di} \le 1\) we have an approximate certificate of dual infeasibility. Finally, assume that \(\rho _{ip} \le 1\). Then
is an approximate certificate of ill-posedness. For example, if \(\Vert y^k \Vert _\infty \gg 0\) then a tiny perturbation in b will make the problem infeasible. Hence, the problem is by definition unstable.
Numerical results
We investigate the numerical performance of our implementation on a selection of exponential cone problems from the Conic Benchmark Library (CBLIB) [29], as well as a selection of customer-provided problems. Some of those problems have integer variables, in which case we solve their continuous relaxations, i.e., we ignore the integrality constraints. In the study we compare the performance of MOSEK 9.2, both with and without the proposed corrector. In the case without a corrector we also disable the standard Mehrotra corrector affecting linear and quadratic cones; otherwise the residuals will not decrease at the same rate. We also compare our implementation with the open-source solver ECOS [6], which implements the algorithm by Serrano [26]. Since ECOS only supports linear, quadratic and exponential cones we limit the test set to examples with combinations of those cones; CBLIB also contains instances with both exponential and semidefinite cones, which are therefore excluded.
Fig. 2 shows histograms of the number of iterations required to solve the problems for ECOS and MOSEK (with and without the proposed corrector). The figure shows a substantial advantage of the proposed corrector over a wide selection of test problems, both in terms of stability and required number of iterations.
Conclusions
Based on previous work by Tunçel we have presented a generalization of the Nesterov–Todd algorithm for symmetric conic optimization to handle the nonsymmetric exponential cone. Our main contribution is a new Mehrotra-like corrector search direction for the nonsymmetric case, which improves practical performance significantly. Moreover, we presented a practical implementation with extensive computational results documenting the efficiency of the proposed algorithm. Indeed, the suggested algorithm is significantly more robust and faster than ECOS, the current state-of-the-art software for nonsymmetric conic optimization.
Possible future work includes establishing the complexity of the algorithm and applying it to other nonsymmetric cone types, possibly of larger dimensions. One such example is the nonsymmetric cone of semidefinite matrices with sparse chordal structure [32], which could extend primal-dual solvers like MOSEK with the ability to solve large sparse semidefinite programs.
Notes
 1.
If (x, s) is on the central path, then \(W^TW = \mu F''(x)\) is a scaling with \((W^T W)^{-1}\in {\mathcal {T}}_2(x,s,1)\), see (21).
References
 1.
Andersen, E.D., Andersen, K.D.: Presolving in linear programming. Math. Program. 71(2), 221–245 (1995)
 2.
Andersen, E.D., Roos, C., Terlaky, T.: On implementing a primal-dual interior-point method for conic quadratic optimization. Math. Program. 95(2), 249–277 (2003)
 3.
Andersen, E.D., Ye, Y.: On a homogeneous algorithm for the monotone complementarity problem. Math. Program. 84(2), 375–399 (1999)
 4.
Anstreicher, K., Vial, J.P.: On the convergence of an infeasible primal-dual interior-point method for convex programming. Optim. Methods Softw. 3, 273–283 (1994)
 5.
Chares, P.R.: Cones and interior-point algorithms for structured convex optimization involving powers and exponentials. Ph.D. thesis, Université Catholique de Louvain, Louvain-la-Neuve (2009)
 6.
Domahidi, A., Chu, E., Boyd, S.: ECOS: An SOCP solver for embedded systems. In: European Control Conference (ECC), pp. 3071–3076 (2013)
 7.
El-Bakry, A.S., Tapia, R.A., Tsuchiya, T., Zhang, Y.: On the formulation and theory of the primal-dual Newton interior-point method for nonlinear programming. J. Optim. Theory Appl. 89(3), 507–541 (1996)
 8.
Güler, O.: Barrier functions in interior point methods. Math. Oper. Res. 21(4), 860–885 (1996)
 9.
Huang, K.L., Mehrotra, S.: A Modified Potential Reduction Algorithm for Monotone Complementarity and Convex Programming Problems and Its Performance. Technical report, Northwestern University (2012)
 10.
Karimi, M., Tunçel, L.: Primal-dual interior-point methods for domain-driven formulations: algorithms. arXiv preprint arXiv:1804.06925 (2018)
 11.
Karmarkar, N.: A polynomial-time algorithm for linear programming. Combinatorica 4, 373–395 (1984)
 12.
Lubin, M., Yamangil, E., Bent, R., Vielma, J.P.: Extended formulations in mixed-integer convex programming. In: Louveaux, Q., Skutella, M. (eds.) Integer Programming and Combinatorial Optimization. IPCO 2016. Lecture Notes in Computer Science, vol. 9682, pp. 102–113. Springer, Cham (2016)
 13.
Mehrotra, S.: On the implementation of a primal-dual interior point method. SIAM J. Optim. 2(4), 575–601 (1992)
 14.
Myklebust, T.G.J.: On primal-dual interior-point algorithms for convex optimisation. Ph.D. thesis, University of Waterloo (2015)
 15.
Myklebust, T.G.J., Tunçel, L.: Interior-point algorithms for convex optimization based on primal-dual metrics. arXiv preprint arXiv:1411.2129 (2014)
 16.
Nemirovski, A.: Advances in convex optimization: conic programming. Int. Congress Math. 1, 413–444 (2007)
 17.
Nemirovski, A., Tunçel, L.: “Cone-free” primal-dual path-following and potential-reduction polynomial time interior-point methods. Math. Program. 102(2), 261–294 (2005)
 18.
Nesterov, Y.: Towards nonsymmetric conic optimization. Optim. Methods Softw. 27(4–5), 893–917 (2012)
 19.
Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming, 1st edn. SIAM, Philadelphia (1994)
 20.
Nesterov, Y., Todd, M.J.: Self-scaled barriers and interior-point methods for convex programming. Math. Oper. Res. 22(1), 1–42 (1997)
 21.
Nesterov, Y., Todd, M.J.: Primal-dual interior-point methods for self-scaled cones. SIAM J. Optim. 8(2), 324–364 (1998)
 22.
Nesterov, Y., Todd, M.J., Ye, Y.: Infeasible-start primal-dual methods and infeasibility detectors for nonlinear programming problems. Math. Program. 84(2), 227–267 (1999)
 23.
Nesterov, Y., Tunçel, L.: Local superlinear convergence of polynomial-time interior-point methods for hyperbolicity cone optimization problems. Tech. rep., CORE, Louvain-la-Neuve (2009). Revised August 2015
 24.
Øbro, M.: Conic optimization with exponential cones. Master’s thesis, Technical University of Denmark (2019)
 25.
Schnabel, R.B.: Quasi-Newton methods using multiple secant equations. Technical report, University of Colorado at Boulder (1983)
 26.
Serrano, S.A.: Algorithms for unsymmetric cone optimization and an implementation for problems with the exponential cone. Ph.D. thesis, Stanford University (2015)
 27.
Skajaa, A., Ye, Y.: A homogeneous interior-point algorithm for nonsymmetric convex conic optimization. Math. Program. 150(2), 391–422 (2015)
 28.
Sturm, J.F.: SeDuMi 1.02, a MATLAB toolbox for optimizing over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)
 29.
The Conic Benchmark Library. http://cblib.zib.de/ (2018). [Online; accessed 01-December-2018]
 30.
Tunçel, L.: Generalization of primal-dual interior-point methods to convex optimization problems in conic form. Found. Comput. Math. 1(3), 229–254 (2001)
 31.
Vandenberghe, L.: The CVXOPT linear and quadratic cone program solvers. Online: http://cvxopt.org/documentation/coneprog.pdf (2010)
 32.
Vandenberghe, L., Andersen, M.S.: Chordal graphs and semidefinite optimization. Found. Trends Optim. 1(4), 241–433 (2015)
 33.
Wright, S.J.: Primal-Dual Interior-Point Methods, vol. 54. SIAM, Philadelphia (1997)
The material in this manuscript has not been published or submitted elsewhere.
Barrier function and derivatives
We consider derivatives up to third order of the exponential-cone barrier (2).
First-order derivatives
Let \(\psi (x) = x_2 \log (x_1 / x_2) - x_3 \), \(g(x)=-\log (\psi (x))\) and \(h(x)=-\log x_1 -\log x_2\), i.e., \(F(x) = g(x) + h(x)\). Then \(F'(x) = g'(x) + h'(x)\) with
and \(\psi '(x) = (x_2/x_1, \log (x_1/x_2)-1,-1)\), \(h'(x) = (-1/x_1, -1/x_2, 0)\). We sometimes omit the arguments of these functions and their derivatives when they are clear from the context.
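These expressions are easily checked against central differences of the barrier itself. A minimal sketch (Python; the evaluation point and step size are our illustrative choices):

```python
import math

def F(x):
    """Exponential-cone barrier F(x) = -log(psi(x)) - log x1 - log x2."""
    x1, x2, x3 = x
    psi = x2 * math.log(x1 / x2) - x3
    return -math.log(psi) - math.log(x1) - math.log(x2)

def grad_F(x):
    """F'(x) = -psi'(x)/psi(x) + h'(x), h'(x) = (-1/x1, -1/x2, 0)."""
    x1, x2, x3 = x
    psi = x2 * math.log(x1 / x2) - x3
    dpsi = (x2 / x1, math.log(x1 / x2) - 1.0, -1.0)
    hp = (-1.0 / x1, -1.0 / x2, 0.0)
    return tuple(-dpsi[i] / psi + hp[i] for i in range(3))

# Central-difference check of the gradient at an interior point:
x, h = (1.3, 0.9, -0.4), 1e-6
num = []
for i in range(3):
    xp, xm = list(x), list(x)
    xp[i] += h; xm[i] -= h
    num.append((F(xp) - F(xm)) / (2 * h))
```

The central-difference error is \(O(h^2)\), so agreement to roughly eight digits is expected at this step size.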
Second-order derivatives
with \(h''(x) = \mathbf {diag}(1/x_1^2, 1/x_2^2, 0)\) and
Let \({{\hat{\psi }}}'(x)\) and \({{\hat{\psi }}}''(x)\) denote the leading parts of \(\psi '(x)\) and \(\psi ''(x)\), respectively, i.e.,
and similarly for \({\hat{h}}'\) and \({\hat{h}}''\). We can then write \(F''(x)\) as
where
We can factor \(A(x)=V(x)V(x)^T\) with
which gives a factored expression of \(F''(x)=R(x)R(x)^T\) where
Third-order directional derivatives
We have \(F'''(x)[u]=\frac{d}{dt} F''(x + tu)\bigr |_{t=0}\) with \(F''(x)=g''(x)+h''(x)\). Then
with
To evaluate the corrector (16) we compute
using (33) where \(u:=\varDelta x^\text {a}\) and \(F''(x)v=\varDelta s^\text {a}\). For stability we solve for v using the factored expression of \(F''(x)\).
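A useful correctness check for third-order code is the homogeneity identity \(F'''(x)[x] = -2F''(x)\), which follows from logarithmic homogeneity of the barrier (\(F''(tx) = t^{-2}F''(x)\)). The sketch below verifies it purely by numerical differentiation of the analytic gradient (Python with NumPy; step sizes and the test point are our illustrative choices, not the analytic expressions (33)–(34)):

```python
import numpy as np

def grad_F(x):
    """Analytic gradient of the exponential-cone barrier."""
    x1, x2, x3 = x
    psi = x2 * np.log(x1 / x2) - x3
    dpsi = np.array([x2 / x1, np.log(x1 / x2) - 1.0, -1.0])
    return -dpsi / psi + np.array([-1.0 / x1, -1.0 / x2, 0.0])

def hess_F(x, h=1e-6):
    """Numerical Hessian via central differences of the gradient."""
    H = np.empty((3, 3))
    for j in range(3):
        e = np.zeros(3); e[j] = h
        H[:, j] = (grad_F(x + e) - grad_F(x - e)) / (2 * h)
    return H

def third_F(x, u, h=1e-4):
    """Directional third derivative F'''(x)[u] = d/dt F''(x + t*u) at t = 0."""
    return (hess_F(x + h * u) - hess_F(x - h * u)) / (2 * h)

x = np.array([1.3, 0.9, -0.4])     # interior point: psi(x) > 0
lhs = third_F(x, x)                # F'''(x)[x]
rhs = -2.0 * hess_F(x)             # homogeneity identity
```

The same finite-difference machinery can serve as a reference implementation when debugging the analytic corrector term \(\eta \).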
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Dahl, J., Andersen, E.D.: A primal-dual interior-point algorithm for nonsymmetric exponential-cone optimization. Math. Program. (2021). https://doi.org/10.1007/s10107-021-01631-4
Keywords
 Nonsymmetric cone optimization
 Exponential-cone optimization
 Interior-point methods
Mathematics Subject Classification
 90C99