1 Introduction

In 1984 Karmarkar [11] presented an interior-point algorithm for linear optimization with polynomial complexity. This triggered the interior-point revolution which gave rise to a vast amount of research on interior-point methods. A particularly important result was the analysis of so-called self-concordant barrier functions, which led to polynomial-time algorithms for linear optimization over a convex domain with a self-concordant barrier, provided that the barrier function can be evaluated in polynomial time. This was proved by Nesterov and Nemirovski [19], and as a consequence convex optimization problems with such barriers can be solved efficiently by interior-point methods, at least in theory.

However, numerical studies for linear optimization quickly demonstrated that primal-dual interior-point methods were superior in practice, which led researchers to generalize the primal-dual algorithm to general smooth convex problems. A major breakthrough in that direction is the seminal work by Nesterov and Todd (NT) [20, 21], who generalized the primal-dual algorithm for linear optimization to self-scaled cones, with the same complexity bound as in the linear case. Güler [8] later showed that the self-scaled cones correspond to the class of symmetric cones, which has since become the most commonly used term. The good theoretical performance of primal-dual methods for symmetric cones has since been confirmed computationally, e.g., by [2, 28].

The class of symmetric cones has been completely characterized and includes 5 different cones, where the three most interesting ones are the nonnegative orthant, the quadratic cone, and the cone of symmetric positive semidefinite matrices, as well as products of those three cones. Although working exclusively with the symmetric cones is a limitation, they cover a great number of important applications, see, e.g., Nemirovski [16] for an excellent survey of conic optimization over symmetric cones.

Some convex sets with a symmetric cone representation are more naturally characterized using nonsymmetric cones, e.g., semidefinite matrices with chordal sparsity patterns; see [32] for an extensive survey. Thus algorithmic advancements for handling nonsymmetric cones directly could hopefully lead to both simpler modeling and reductions in computational complexity. Many other important convex sets cannot readily be modeled using symmetric cones, e.g., convex sets involving exponentials or logarithms, for which no representation using symmetric cones is known. In terms of practical importance, the three-dimensional power and exponential cones are perhaps the most important nonsymmetric cones; Lubin et al. [12] showed how all instances in a large benchmark library can be modeled using the three symmetric cones as well as the three-dimensional power and exponential cones.

Generalizing methods from symmetric to nonsymmetric cones is not straightforward, however. In [22] Nesterov et al. suggest a long-step algorithm using both the primal and dual barriers, effectively doubling the size of the linear system solved at each iteration. For small-dimensional cones (such as the exponential cone) this overhead might be acceptable.

More recently Nesterov [18] proved the existence of an NT-like primal-dual scaling in the vicinity of the central path, leading to an algorithm that uses only a single barrier, but is restricted to following the central path closely. Compared to [22] the main advantage of this method is that it reduces the size of the linear system to that of algorithms for symmetric cones; a similar advantage is shared by algorithms using explicit primal-dual scalings, e.g., the scalings by Tunçel [30] considered later. At each iteration Nesterov’s method [18] has a centering phase, which brings the current iterate close to the central path, followed by one affine step which brings the iterate closer to the optimum. From a practical perspective this is a significant drawback of the method since the centering phase is computationally costly.

Hence, the centering and affine steps should be combined as in the symmetric cone case. Also, both algorithms [18, 22] are feasible methods, i.e., they require either a known strictly feasible starting point or some sort of phase-I method to obtain a feasible starting point. Skajaa and Ye [27] extended Nesterov’s method [18] with a homogeneous model, which also simplifies infeasibility detection. In practice, however, the method of Skajaa and Ye is not competitive with other methods [3] due to the centering steps.

In more recent work Serrano and Ye [26] improve the algorithm in [27] such that explicit centering steps are not needed; instead, iterates are restricted to a vicinity of the central path. This method has been implemented as part of the ECOS solver [6] and will be used for comparison in Sect. 8.

A different approach to nonsymmetric conic optimization was proposed by Tunçel [30], who extends the concepts of the NT algorithm in a more direct fashion. Nesterov and Todd [20, 21] showed existence of a primal-dual scaling defined by a single scaling point w satisfying two secant equations \(s=F''(w)x\) and \(F'(x) = F''(w)F_*'(s)\), where \(F'\) and \(F''\) denote the first- and second-order derivatives of the barrier. Furthermore, the scaling \(F''(w)\) is shown to be bounded, which is a key property in establishing polynomial-time complexity of the NT algorithm. Tunçel later showed in [30] that such a scaling point w exists for any convex cone K, but satisfying only one secant equation \(s = F''(w)x\), and if the barrier has negative curvature then w is unique [23]. To enforce both secant equations in the nonsymmetric case, Tunçel [30] considered a sequence of low-rank quasi-Newton updates to a general positive definite matrix.

These ideas were further explored by Myklebust and Tunçel [15] and also in [14], and they form the basis of our proposed algorithm. Following this line of work, the essential difference from a symmetric NT algorithm is the computation of a general positive definite scaling matrix, satisfying the same secant equations, but without relying on a given scaling point w. Furthermore, such scaling matrices should be bounded to ensure polynomial-time complexity.

For three-dimensional cones these scaling matrices are particularly simple and characterized by a single scalar, as shown in Sect. 5. Uniform boundedness of the scaling matrices is not established in this paper, but we comment on an efficient method for computing the most bounded scaling matrix using the formulations developed in Sect. 5.

It is also possible to develop algorithms for convex optimization specified in functional form. This has been done by several authors, for example by [3, 4, 7, 9], who all solve the KKT optimality conditions of the problem in functional form. These algorithms all require some sort of merit or penalty function, which often requires problem-specific parameter tuning to work well in practice. Another line of research is the work of Nemirovski and Tunçel [17] and very recently Karimi and Tunçel [10], who advocate a non-linear convex formulation instead of a conic formulation, explicitly using self-concordant barriers for the convex domains.

From a theoretical point of view the different algorithms for nonsymmetric cones (including the functional formulations) all share the same best-known complexity bounds as their symmetric counterparts. Whether these methods are competitive in practice with algorithms for symmetric cones is still an unanswered question, though.

The algorithm we consider herein uses the scaling matrices by Tunçel [30], resulting in an algorithm that is similar to the symmetric counterpart; the linear system solved at each iteration is very similar, and both the residuals and the complementarity gap decrease at the same rate. The algorithm is a natural extension of the Nesterov-Todd algorithm implemented for symmetric cones in, e.g., SeDuMi [28] and MOSEK [3].

It is well known that the Mehrotra predictor-corrector idea [13] leads to vastly improved computational performance in the symmetric cone case. One of our main contributions is a new corrector for nonsymmetric cones, closely tied to Tunçel’s primal-dual scalings. It is derived from a second-order approximation of the centrality condition \(s_\mu = -\mu F'(x_\mu )\) and thus involves third-order directional derivatives. The proposed corrector loosely follows the central path, without restricting the size of the neighborhood, thereby allowing the algorithm to take longer steps, and it shares similarities with the standard Mehrotra corrector; in particular they coincide if \(F(x) = -\sum _i \log x_i\) denotes the standard barrier for linear optimization.

We demonstrate numerically that the proposed corrector offers a substantial and consistent reduction in the number of iterations required to solve the problems in all our numerical studies. Skajaa and Ye [27] suggest a different corrector by characterizing the full central path as a differential equation, solved by a Runge-Kutta method. An immediate drawback of Skajaa’s Runge-Kutta method is that it requires additional factorizations of the full KKT system, which significantly adds to the overall solution time.

The remaining paper is structured as follows. We define basic properties for the exponential cone in Sect. 2, and we discuss the homogeneous model, the central path and related metrics in Sect. 3. In Sect. 4 we discuss search-directions assuming a primal-dual scaling is known, and we derive our new corrector and provide a simple numerical example that illustrates how the corrector algorithm makes more steady progress. In Sect. 5 we discuss new characterizations of the primal-dual scalings from [15, 30], which reduce to univariate characterizations for three-dimensional cones.

In Sect. 6 we give a collected overview of the suggested path-following algorithm. This is followed by a discussion of some implementation details. Next in Sect. 8 we present numerical results on a moderately large collection of exponential-cone problems. We conclude in Sect. 9, and in the appendix we give details of the first-, second- and third-order derivatives of the barrier for the exponential cone.

2 Preliminaries

In this section we list well-known properties of self-concordant and self-scaled barriers, which are used in the remainder of the paper. The proofs can be found in references such as [20, 21]. We consider a pair of primal and dual linear conic problems

$$\begin{aligned} \begin{array}{ll} \text{ minimize }&{} {\langle c, x \rangle } \\ \text{ subject } \text{ to }&{} Ax = b\\ &{} x \in {K}, \end{array} \end{aligned}$$
(P)

and

$$\begin{aligned} \begin{array}{ll} \text{ maximize }&{} {\langle b, y \rangle } \\ \text{ subject } \text{ to }&{} c - A^Ty = s\\ &{} s \in {K}^*, \end{array} \end{aligned}$$
(D)

where \(y\in \mathbf{{R}}^m\), \(s\in \mathbf{{R}}^n\) and \(K\subset \mathbf{{R}}^n\) is a proper cone, i.e., a pointed, closed, convex cone with non-empty interior. We assume throughout the paper that A has full rank, i.e., the rows of A are linearly independent. The dual cone \(K^*\) is

$$\begin{aligned} {K}^* = \{ z\in \mathbf{{R}}^n \mid {\langle x, z \rangle } \ge 0, \forall x \in {K}\}. \end{aligned}$$

If K is proper then \(K^*\) is also proper. A cone K is called self-dual if there is a positive definite map between \({K}\) and \({K}^*\), i.e., if \(T {K}= {K}^*\), \(T\succ 0\). A function \(F:\mathbf{{int}}(K)\mapsto \mathbf{{R}}\), \(F\in C^3\) is a \(\vartheta \)-logarithmically homogeneous self-concordant barrier (\(\vartheta \)-LHSCB) for \(\mathbf{{int}}(K)\) if

$$\begin{aligned} |F'''(x)[u,u,u]| \le 2(F''(x)[u,u])^{3/2} \end{aligned}$$

and

$$\begin{aligned} F(\tau x) = F(x) - \vartheta \log \tau \end{aligned}$$

holds for all \(x\in \mathbf{{int}}(K)\) and for all \(u\in \mathbf{{R}}^n\). For a pointed cone \(\vartheta \ge 1\). We will refer to a LHSCB simply as a self-concordant barrier. If \(F_1\) and \(F_2\) are \(\vartheta _1\)- and \(\vartheta _2\)-self-concordant barriers for \(K_1\) and \(K_2\), respectively, then \(F_1(x_1) + F_2(x_2)\) is a \((\vartheta _1+\vartheta _2)\)-self-concordant barrier for \(K_1 \times K_2\). Some straightforward consequences of the homogeneity property include

$$\begin{aligned} F'(\tau x)= & {} \frac{1}{\tau } F'(x), \quad F''(\tau x) = \frac{1}{\tau ^2} F''(x),\\ F''(x) x= & {} -F'(x), \quad F'''(x) x = -2 F''(x),\\ {\langle F'(x), x \rangle }= & {} -\vartheta . \end{aligned}$$
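These identities are easy to check numerically. As a minimal sketch (ours, not from the paper), the following verifies a few of them for the standard log barrier \(F(x) = -\sum _i \log x_i\), for which \(\vartheta = n\), \(F'(x) = -1/x\) and \(F''(x) = \mathbf {diag}(1/x^2)\):

```python
import numpy as np

# Numeric check (ours) of the homogeneity identities for the standard
# log barrier F(x) = -sum(log x_i), with barrier parameter vartheta = n.
x = np.array([0.7, 1.4, 2.2])
tau = 3.0
g = -1.0 / x                        # F'(x)
H = np.diag(1.0 / x**2)             # F''(x)
assert np.allclose(H @ x, -g)       # F''(x) x = -F'(x)
assert np.isclose(g @ x, -len(x))   # <F'(x), x> = -vartheta
assert np.allclose(-1.0 / (tau * x), g / tau)   # F'(tau x) = F'(x)/tau
```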

If F is a \(\vartheta \)-self-concordant barrier for K, then the Fenchel conjugate

$$\begin{aligned} F_*(s) = \sup _{x\in \mathbf{{int}}(K)} \{ -{\langle s, x \rangle }-F(x) \} \end{aligned}$$
(1)

is a \(\vartheta \)-self-concordant barrier for \(K^*\). Furthermore, if \((x,s)\in \mathbf{{int}}(K)\times \mathbf{{int}}(K^*)\) then \((-F'(x),-F'_*(s)) \in \mathbf{{int}}(K^*)\times \mathbf{{int}}(K)\).

For a \(\vartheta \)-self-concordant barrier, the so-called Dikin ellipsoid

$$\begin{aligned} E(x;r) = \{ z \in \mathbf{{R}}^n \, \mid \, {\langle F''(x)(z-x), z-x \rangle } \le r \} \end{aligned}$$

is included in the cone for \(r<1\), i.e., \(E(x;r)\subset \mathbf{{int}}(K)\) for \(r<1\), and F is almost quadratic inside this ellipsoid,

$$\begin{aligned} (1-r)^2 F''(x) \preceq F''(z) \preceq \frac{1}{(1-r)^2}F''(x) \end{aligned}$$

for all \(z\in E(x;r)\).

A cone is called self-scaled if it has a \(\vartheta \)-self-concordant barrier F such that for all \(w, x\in \mathbf{{int}}(K)\),

$$\begin{aligned} F''(w) x \in \mathbf{{int}}(K^*) \end{aligned}$$

and

$$\begin{aligned} F_*(F''(w)x) = F(x) - 2 F(w) - \vartheta . \end{aligned}$$

Self-scaled cones are equivalent to symmetric cones and they satisfy the stronger long-step Hessian estimation property

$$\begin{aligned} \frac{1}{(1+\alpha \sigma _x(-p))^2} F''(x) \preceq F''(x - \alpha p) \preceq \frac{1}{(1-\alpha \sigma _x(p))^2} F''(x) \end{aligned}$$

for any \(\alpha \in [0; \sigma _x(p)^{-1})\) where

$$\begin{aligned} \sigma _x(p):=\frac{1}{\sup \{\alpha \, : \, x - \alpha p \in K\}} \end{aligned}$$

denotes the distance to the boundary. Many properties of symmetric cones follow from the fact that the barriers have negative curvature \(F'''(x)[u]\preceq 0\) for all \(x\in \mathbf{{int}}(K)\) and all \(u\in K\). An interesting property proven in [23] is that if both the primal and dual barriers have negative curvature then the cone is symmetric.

In addition to the three symmetric cones (i.e., the nonnegative orthant, the quadratic cone and the cone of symmetric positive semidefinite matrices), we mainly consider in the present work the nonsymmetric exponential cone studied by Chares [5],

$$\begin{aligned} {K}_\text {exp}= \text {cl}\{ x\in \mathbf{{R}}^3 \mid x_1 \ge x_2 \exp (x_3/x_2), \, x_2 > 0 \}, \end{aligned}$$

with a 3-self-concordant barrier,

$$\begin{aligned} F(x) = -\log (x_2\log (x_1/x_2) - x_3 ) - \log x_1 - \log x_2. \end{aligned}$$
(2)

The dual exponential cone is

$$\begin{aligned} {K}_\text {exp}^* = \text {cl}\{ z\in \mathbf{{R}}^3 \mid e \cdot z_1 \ge -z_3 \exp (z_2/z_3), \, z_1 > 0, z_3 < 0 \}. \end{aligned}$$

The exponential cone is not self-dual, but \(T {K}_\text {exp}={K}_\text {exp}^*\) for

$$\begin{aligned} T = \left[ \begin{array}{ccc}e &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad -1\\ 0 &{}\quad -1 &{}\quad 0\end{array}\right] \not \succ 0. \end{aligned}$$

For the exponential cone neither the conjugate barrier \(F_*(s)\) nor its derivatives can be evaluated in closed form, but they can be evaluated numerically to high accuracy (e.g., with a damped Newton’s method) using the definition (1), i.e., if

$$\begin{aligned} x_s = \arg \max \{ -{\langle s, x \rangle } - F(x) \, : \, x\in \mathbf{{int}}(K) \} \end{aligned}$$

then

$$\begin{aligned} F_*'(s) = -x_s, \quad F_*''(s) = \left[ F''(x_s) \right] ^{-1}. \end{aligned}$$
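As an illustration of this procedure, the following is a minimal sketch (ours, not MOSEK's implementation) that evaluates \(F_*'(s)\) for the exponential cone by applying a damped Newton method to \({\langle s, x \rangle } + F(x)\), whose minimizer over \(\mathbf{{int}}(K)\) is the maximizer \(x_s\) in (1); the Hessian is approximated by finite differences for brevity, and the test point s is an assumption for illustration.

```python
import numpy as np

def grad(x):
    # gradient of the barrier (2) for the exponential cone
    x1, x2, x3 = x
    u = x2 * np.log(x1 / x2) - x3
    return np.array([-x2 / (u * x1) - 1.0 / x1,
                     -(np.log(x1 / x2) - 1.0) / u - 1.0 / x2,
                     1.0 / u])

def hess(x, h=1e-6):
    # finite-difference Hessian of F; adequate for a sketch
    H = np.array([(grad(x + h * e) - grad(x - h * e)) / (2 * h)
                  for e in np.eye(3)]).T
    return 0.5 * (H + H.T)

def fstar_grad(s, x0=(1.0, 0.5, -0.5), tol=1e-12):
    # damped Newton for x_s = argmax{-<s,x> - F(x)}; then F_*'(s) = -x_s
    x = np.array(x0)
    for _ in range(100):
        g = s + grad(x)                      # gradient of <s,x> + F(x)
        H = hess(x)
        dx = np.linalg.solve(H, -g)
        lam = np.sqrt(dx @ H @ dx)           # Newton decrement
        x = x + dx / (1.0 + lam)             # damped step keeps x interior
        if lam < tol:
            break
    return -x                                # F_*'(s) = -x_s

s = np.array([1.0, 0.2, -0.5])               # a point in int(K_exp^*)
print(fstar_grad(s))
```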

We conclude this survey of introductory material by listing some of the many convex sets that can be represented using the exponential cone, or a combination of exponential cones and symmetric cones. The epigraph \(t\ge e^x\) can be modelled as \((t,1,x) \in {K}_\text {exp}\) and similarly for the hypograph of the logarithm \(t\le \log x \Leftrightarrow (x,1,t) \in {K}_\text {exp}\). The hypograph of the entropy function, \(t\le -x \log x\), is equivalent to \((1,x,t) \in {K}_\text {exp}\), and similarly for relative entropy \(t\ge x \log (x/y) \Leftrightarrow (y,x,-t) \in {K}_\text {exp}\). The soft-plus function \(\log (1 + e^x)\) can be thought of as a smooth approximation of \(\max \{0, x\}\). Its epigraph can be modelled as \(t\ge \log (1 + e^x) \Leftrightarrow u+v = 1, \, (u,1,x-t),(v,1,-t) \in {K}_\text {exp}\). The epigraph of the logarithm of a sum of exponentials can be modelled as

$$\begin{aligned} t \ge \log (e^{x_1}+\cdots +e^{x_n}) \quad \Longleftrightarrow \quad \sum _{i=1}^n u_i = 1, \, (u_i,1,x_i-t) \in {K}_\text {exp}, i=1,\dots ,n. \end{aligned}$$

These examples all have auxiliary variables and constraints in their conic representations, which might suggest that an algorithm working directly with a barrier of the convex domain (e.g., [10]) is more efficient. However, a conic formulation has the advantage of nicer conic duality, and it is easy to exploit the special (sparse) structure from the additional constraints and variables in the linear algebra implementation, thereby eliminating the overhead in a conic formulation.
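As a concrete check of one of these representations (ours), for the tight epigraph value \(t = \log \sum _i e^{x_i}\) the choice \(u_i = e^{x_i - t}\) sums to one and places each triple \((u_i, 1, x_i - t)\) on the boundary of \({K}_\text {exp}\):

```python
import numpy as np

def in_Kexp(p, tol=1e-9):
    # membership test for cl{ x : x1 >= x2*exp(x3/x2), x2 > 0 }
    x1, x2, x3 = p
    if x2 > tol:
        return x1 >= x2 * np.exp(x3 / x2) - tol
    return x1 >= -tol and x3 <= tol     # boundary rays with x2 = 0

x = np.array([0.3, -1.2, 0.8])
t = np.log(np.sum(np.exp(x)))           # tight value of the epigraph
u = np.exp(x - t)                       # natural multipliers
assert abs(u.sum() - 1.0) < 1e-12
assert all(in_Kexp((u[i], 1.0, x[i] - t)) for i in range(len(x)))
```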

3 The homogeneous model and central path

In the simplified homogeneous model we embed the KKT conditions for (P) and (D) into the homogeneous self-dual model

$$\begin{aligned} \begin{array}{c} \left[ \begin{array}{ccc} 0 &{}\quad A &{}\quad -b \\ -A^T &{}\quad 0 &{}\quad c \\ b^T &{}\quad -c^T &{}\quad 0 \end{array} \right] \left[ \begin{array}{c} y \\ {\hat{x}} \\ \tau \end{array} \right] - \left[ \begin{array}{c} 0 \\ {\hat{s}} \\ \kappa \end{array} \right] = 0\\ {\hat{x}}\in {K}_1\times \cdots \times K_k, \quad {\hat{s}}\in {K}_1^* \times \cdots \times K_k^*, \quad y\in \mathbf{{R}}^m, \quad \tau ,\kappa \ge 0, \end{array} \end{aligned}$$
(3)

where \({\hat{x}} = (x_1,\dots ,x_k)\) is a concatenation of conic variables, \(x_i\in {K}_i\). We denote the primal-dual variables by \(({\hat{x}}, {\hat{s}})\) to distinguish them from augmented variables defined next. Let

$$\begin{aligned} x_{k+1} := \tau , \quad s_{k+1} := \kappa , \quad K_{k+1} := \mathbf{{R}}_+, \quad F_{k+1}(x_{k+1}) := -\log (x_{k+1}), \quad \vartheta _{k+1} = 1. \end{aligned}$$

We then have

$$\begin{aligned} x := (x_1,\dots ,x_{k+1})\in {K}, \qquad s := (s_1,\dots ,s_{k+1})\in {K}^*. \end{aligned}$$

where \({K}:= K_1\times \cdots \times K_{k+1}\) has a barrier

$$\begin{aligned} F(x) = \sum _{i=1}^{k+1} F_i(x_i), \end{aligned}$$

with complexity \(\vartheta = \sum _{i=1}^{k+1} \vartheta _i\). Let \(z:=(x, s, y)\) and define

$$\begin{aligned} G(z) := \left[ \begin{array}{ccc} 0 &{}\quad A &{}\quad -b \\ -A^T &{}\quad 0 &{}\quad c \\ b^T &{}\quad -c^T &{}\quad 0 \end{array} \right] \left[ \begin{array}{c} y \\ {\hat{x}} \\ \tau \end{array} \right] - \left[ \begin{array}{c} 0 \\ {\hat{s}} \\ \kappa \end{array} \right] . \end{aligned}$$

The KKT conditions can then be expressed succinctly as

$$\begin{aligned} G(z) = 0, \quad z \in {{\mathcal {D}}}, \end{aligned}$$

where \({{\mathcal {D}}} := {K}\times {K}^* \times \mathbf{{R}}^m \). Given an initial \(z^0\in \mathbf{{int}}({{\mathcal {D}}})\) we consider a central path \(z_\mu \) as the solution to

$$\begin{aligned} G(z_\mu )= & {} \mu G(z^0) \end{aligned}$$
(4)
$$\begin{aligned} s_\mu= & {} -\mu F'(x_\mu ), \quad x_\mu = -\mu F'_*(s_\mu ), \end{aligned}$$
(5)

parametrized by \(\mu \in (0,1]\), and on the central path we have

$$\begin{aligned} {\langle x_\mu , s_\mu \rangle }/\vartheta = \mu . \end{aligned}$$

The following lemma gives an equivalent variational characterization of the central path.

Lemma 1

Given \(z^0\in \mathbf{{int}}({{\mathcal {D}}})\). Let

$$\begin{aligned} \varPsi (z) := {\langle x^0, s \rangle } + {\langle x, s^0 \rangle } + F(x) + F_*(s). \end{aligned}$$

Then

$$\begin{aligned} z_\mu := \arg \min _{z\in {\mathcal {D}}} \{ \varPsi (z) \, : \, G(z) = \mu G(z^0)\}. \end{aligned}$$

We omit the proof, which follows from the optimality conditions for minimizing \(\varPsi (z)\). In [22] the central path is defined from the variational characterization in Lemma 1, and they prove that the definition in (4)–(5) is equivalent. From the variational characterization \(z_\mu \) is well-defined, and \(\lim _{\mu \rightarrow 0} z_\mu = z^\star \) satisfies (see, e.g., [22]),

  1. \({\langle x^\star , s^\star \rangle } = 0\).

  2. If \(\tau ^\star > 0\) then \({\hat{x}}^\star /\tau ^\star \) is an optimal solution for (P) and \((y^\star ,{\hat{s}}^\star )/\tau ^\star \) is an optimal solution for (D).

  3. If \(\kappa ^\star > 0\) then \({\langle b, y^\star \rangle }>0\) and (P) is infeasible, or \({\langle c, {\hat{x}}^\star \rangle } < 0 \) and (D) is infeasible, or both.

In the following neighborhood definitions and subsequent algorithms we consider iterates \((x,s,y)\) that are generally not on the central path, and we define

$$\begin{aligned} \mu := {\langle x, s \rangle }/{\vartheta }. \end{aligned}$$

A neighborhood used by Skajaa and Ye [27] is then

$$\begin{aligned} \Vert s + \mu F'(x) \Vert ^*_{x} = {\langle s + \mu F'(x), F''(x)^{-1}(s + \mu F'(x)) \rangle }^{1/2} \le \beta \mu , \end{aligned}$$
(6)

which characterizes the central path for \(\beta =0\). We can think of (6) as a generalization of the standard two-norm neighborhood

$$\begin{aligned} \Vert XSe - \mu e \Vert _2 \le \beta \mu \end{aligned}$$

from linear optimization [33].

A different neighborhood is due to Nesterov and Todd [21]. We define shadow iterates (following [30])

$$\begin{aligned} {\tilde{x}} := -F'_*(s), \quad {\tilde{s}} := -F'(x) \end{aligned}$$
(7)

and

$$\begin{aligned} {{\tilde{\mu }}} := {\langle {\tilde{x}}, {\tilde{s}} \rangle }/\vartheta \end{aligned}$$

for an iterate \((x, s, y)\in {{\mathcal {D}}}\). Nesterov and Todd [21] then showed that \(\mu {{\tilde{\mu }}}\ge 1\) with equality only on the central path. This leads to a different neighborhood \(\beta \mu {{\tilde{\mu }}} \le 1\) for \(\beta \in (0;1]\), or equivalently

$$\begin{aligned} \beta \mu {\langle {\tilde{x}}, {\tilde{s}} \rangle } \le \vartheta . \end{aligned}$$

This is satisfied if

$$\begin{aligned} \beta \mu {\langle {\tilde{x}}_i, {\tilde{s}}_i \rangle } \le \vartheta _i, \, i=1,\dots ,{k+1}, \end{aligned}$$

leading to another neighborhood definition

$$\begin{aligned} {{\mathcal {N}}}(\beta ) = \left\{ (x,s)\in K \times K^* \, \mid \, \vartheta _i {\langle F'(x_i), F'_*(s_i) \rangle }^{-1} \ge \beta \mu , \, i=1,\dots ,{k+1} \right\} , \end{aligned}$$

which (in contrast to (6)) characterizes the central path for \(\beta =1\). We use the neighborhood \({{\mathcal {N}}}(\beta )\), which can be seen as a generalization of the one-sided \(\infty \)-norm neighborhood.
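For a product of exponential cones this test is cheap to state in code. The sketch below (ours) reuses grad and fstar_grad from the sketch in Sect. 2 and ignores the homogenizing block \((\tau ,\kappa )\) for brevity; each exponential-cone block has \(\vartheta _i = 3\).

```python
def in_neighborhood(x_blocks, s_blocks, beta):
    # one-sided test (x,s) in N(beta): beta * mu * <x~_i, s~_i> <= vartheta_i
    theta_i = 3.0                               # per exponential-cone block
    theta = theta_i * len(x_blocks)
    mu = sum(x @ s for x, s in zip(x_blocks, s_blocks)) / theta
    for x, s in zip(x_blocks, s_blocks):
        xt, st = -fstar_grad(s), -grad(x)       # shadow iterates (7)
        if beta * mu * (xt @ st) > theta_i:
            return False
    return True
```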

Both the central path and the neighborhood depend on the initial values. A simple choice is \(y^0=0\) and

$$\begin{aligned} x^0 = s^0 = -F'(x^0), \end{aligned}$$

which are optimality conditions for minimizing

$$\begin{aligned} f(x) := (1/2)x^T x + F(x). \end{aligned}$$

If \({K}_i={K}_\text {exp}\) this can be solved off-line using a backtracking Newton’s method to get

$$\begin{aligned}{}[x^0]_i = [s^0]_i \approx (1.290928, 0.805102, -0.827838). \end{aligned}$$

For the symmetric cones and the three-dimensional power-cone such a central starting point can be found analytically. Then \({\langle x^0, s^0 \rangle }/\vartheta = 1\) and \((x^0, s^0) \in {{\mathcal {N}}}(1)\).
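For completeness, a small sketch (ours) of the off-line computation: a damped Newton method on \(f(x) = (1/2)x^Tx + F(x)\), reusing grad and hess from the sketch in Sect. 2, reproduces the starting point quoted above.

```python
import numpy as np

# Damped Newton on f(x) = 0.5*<x,x> + F(x); the optimality condition
# x = -F'(x) yields the central starting point for the exponential cone.
x = np.array([1.0, 0.5, -0.5])                  # interior initial guess
for _ in range(50):
    g = x + grad(x)                             # f'(x)
    H = np.eye(3) + hess(x)                     # f''(x)
    dx = np.linalg.solve(H, -g)
    lam = np.sqrt(dx @ H @ dx)                  # Newton decrement
    x = x + dx / (1.0 + lam)                    # damped step stays interior
    if lam < 1e-12:
        break
print(x)   # approx (1.290928, 0.805102, -0.827838), matching the text
```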

4 Search-directions using a primal-dual scaling

In this section we define search directions, assuming a primal-dual scaling is known; how to compute such scalings is discussed in Sect. 5. We consider an iterate \((x,s,y)\in \mathbf{{int}}({{\mathcal {D}}})\) and nonsingular primal-dual scalings \(W_i\) satisfying double secant equations

$$\begin{aligned} W_i x_i = W_i^{-T}s_i, \qquad W_i \tilde{x_i} = W_i^{-T} \tilde{s_i}, \quad i=1,\dots ,{k+1} \end{aligned}$$

where \(\tilde{x_i}\) and \(\tilde{s_i}\) are the shadow iterates defined in (7). Let

$$\begin{aligned} W := \left[ \begin{array}{ccc} W_1\\ &{} \ddots \\ &{} &{} W_{k+1}\end{array} \right] \end{aligned}$$

We can then express the primal-dual scaling succinctly as

$$\begin{aligned} v = W x = W^{-T} s, \quad \tilde{v} = W \tilde{x} = W^{-T} \tilde{s}. \end{aligned}$$
(8)

where \(W_{k+1} := \sqrt{\kappa /\tau }\) and

$$\begin{aligned} v_{k+1} = \sqrt{\tau \kappa }, \qquad {\tilde{v}}_{k+1} = \frac{1}{\sqrt{\tau \kappa }}. \end{aligned}$$

Whenever we encounter a scaling W in the remainder of this paper, it is assumed that W satisfies the double secant equation (8) for the current iterate.

To linearize the centrality condition we consider

$$\begin{aligned} s+\varDelta s= -\mu F'(x + \varDelta x), \end{aligned}$$

with a linearization given by

$$\begin{aligned} s + \varDelta s= -\mu F'(x) - \mu F''(x)\varDelta x. \end{aligned}$$
(9)

On the central path we can express (9) as

$$\begin{aligned} W \varDelta x+ W^{-T}\varDelta s= \mu {\tilde{v}} - v \end{aligned}$$
(10)

with \(W := [\mu F''(x)]^{-1/2}\). If \((x,s)\) is not on the central path then (10) is an approximate linearization of the symmetric centrality condition

$$\begin{aligned} v = \mu {\tilde{v}} \end{aligned}$$

with a quality determined by the distance \(\Vert W^T W - \mu F''(x)\Vert \). Thus, the assumption that \(W^T W \approx \mu F''(x)\) is important for our proposed algorithm, including the corrector studied later in this section.

We next define different search-directions used in the proposed algorithm. The affine search-direction is the solution to

$$\begin{aligned} G(\varDelta z^\text {a}) = -G(z), \qquad W\varDelta x^\text {a}+ W^{-T}\varDelta s^\text {a}= -v, \end{aligned}$$
(11)

and is characterized by the following lemma.

Lemma 2

The solution to (11) satisfies

$$\begin{aligned} {\langle s, \varDelta x^\text {a} \rangle } + {\langle x, \varDelta s^\text {a} \rangle } = -{\langle x, s \rangle }, \qquad {\langle \varDelta x^\text {a}, \varDelta s^\text {a} \rangle } = 0, \end{aligned}$$
(12)

and for all \(\alpha \in \mathbf{{R}}\),

$$\begin{aligned} {\langle x + \alpha \varDelta x^\text {a}, s + \alpha \varDelta s^\text {a} \rangle } = (1-\alpha ){\langle x, s \rangle }. \end{aligned}$$

Proof

It follows from (11) that

$$\begin{aligned} {\langle s, \varDelta x^\text {a} \rangle } + {\langle x, \varDelta s^\text {a} \rangle } = -{\langle v, v \rangle } = -{\langle x, s \rangle }, \end{aligned}$$

and skew-symmetry implies that

$$\begin{aligned} -{\langle [y+\varDelta y^\text {a},\, x+\varDelta x^\text {a}], G(z+\varDelta z^\text {a}) \rangle } = {\langle x + \varDelta x^\text {a}, s + \varDelta s^\text {a} \rangle } = 0, \end{aligned}$$

which combined shows that \({\langle \varDelta x^\text {a}, \varDelta s^\text {a} \rangle }=0\). The last part follows directly from (12). \(\square \)

Lemma 2 shows that a full affine step \((z+\varDelta z^\text {a})\) satisfies both \(G(z+\varDelta z^\text {a})=0\) and \({\langle x+\varDelta x^\text {a}, s+\varDelta s^\text {a} \rangle }=0\). Thus, if \((z+\varDelta z^\text {a})\in {{\mathcal {D}}}\) then \((z+\varDelta z^\text {a})\) is optimal (i.e., a solution to (3)).

We next consider a higher-order corrector for nonsymmetric cones, with some similarities to a Mehrotra corrector for symmetric cones. We consider the first- and second-order derivatives of \(s_\mu = -\mu F'(x_\mu )\) with respect to \(\mu \),

$$\begin{aligned} \dot{s}_\mu + \mu F''(x_\mu ) \dot{x}_\mu&= -F'(x_\mu ),\nonumber \\ \ddot{s}_\mu + \mu F''(x_\mu )\ddot{x}_\mu&= -2 F''(x_\mu ) \dot{x}_\mu - \mu F'''(x_\mu )[ \dot{x}_\mu , \dot{x}_\mu ] \end{aligned}$$
(13)

From (13) we have

$$\begin{aligned} \mu \dot{x}_\mu = - [ F''(x_\mu ) ]^{-1}(F'(x_\mu ) + \dot{s}_\mu ) = x_\mu - [ F''(x_\mu ) ]^{-1} \dot{s}_\mu , \end{aligned}$$

resulting in an expression for the third-order directional derivative,

$$\begin{aligned} \mu F'''(x_\mu )[ \dot{x}_\mu , \dot{x}_\mu ]&= F'''(x_\mu )[\dot{x}_\mu , x_\mu ] - F'''(x_\mu )[\dot{x}_\mu , (F''(x_\mu ))^{-1}\dot{s}_\mu ] \end{aligned}$$
(14)
$$\begin{aligned}&= -2 F''(x_\mu )\dot{x}_\mu - F'''(x_\mu )[\dot{x}_\mu , (F''(x_\mu ))^{-1}\dot{s}_\mu ] \end{aligned}$$
(15)

where (15) follows from the homogeneity property \(F'''(x)[x]=-2F''(x)\). This results in an alternative expression for the second-order derivative of the centrality condition, i.e.,

$$\begin{aligned} \ddot{s}_\mu + \mu F''(x_\mu )\ddot{x}_\mu = F'''(x_\mu )[\dot{x}_\mu , (F''(x_\mu ))^{-1}\dot{s}_\mu ]. \end{aligned}$$

Assuming that \(W^T W \approx \mu F''(x)\) and comparing (11) and (13) we interpret \(\varDelta s^\text {a}= -\dot{s}_\mu /\mu \) and \(\varDelta x^\text {a}= -\dot{x}_\mu /\mu \), leading to the definition of our corrector term

$$\begin{aligned} \eta := -\frac{1}{2} F'''(x) [\varDelta x^\text {a}, (F''(x))^{-1} \varDelta s^\text {a}]. \end{aligned}$$
(16)

Thus a pure corrector search-direction can be defined as the solution to

$$\begin{aligned} G(\varDelta z^\text {c}) = 0, \qquad W\varDelta x^\text {c}+ W^{-T}\varDelta s^\text {c}= -W^{-T}\eta , \end{aligned}$$
(17)

and satisfies properties given in the following lemma.

Lemma 3

The solution to (17) satisfies

$$\begin{aligned} {\langle s, \varDelta x^\text {c} \rangle } + {\langle x, \varDelta s^\text {c} \rangle } = 0, \qquad {\langle \varDelta x^\text {c}, \varDelta s^\text {c} \rangle } = 0. \end{aligned}$$

Proof

From (17) we have that

$$\begin{aligned} {\langle s, \varDelta x^\text {c} \rangle } + {\langle x, \varDelta s^\text {c} \rangle } = (1/2){\langle x, F'''(x) [\varDelta x^\text {a}, (F''(x))^{-1} \varDelta s^\text {a}] \rangle } = -{\langle \varDelta x^\text {a}, \varDelta s^\text {a} \rangle } = 0, \end{aligned}$$

using the homogeneity property \(F'''(x)[x] = -2F''(x)\), and skew-symmetry implies that \({\langle \varDelta x^\text {c}, \varDelta s^\text {c} \rangle }=0\). \(\square \)

We note that

$$\begin{aligned} W_{k+1}^2 \varDelta x^\text {c}_{k+1} + \varDelta s^\text {c}_{k+1} = -\eta _{k+1} \end{aligned}$$

reduces to the familiar expression \(\kappa \varDelta \tau ^\text {c}+ \tau \varDelta \kappa ^\text {c}= -\varDelta \tau ^\text {a}\varDelta \kappa ^\text {a}\).

For a given centering parameter \(\gamma > 0\) we define a combined search-direction as the solution to

$$\begin{aligned} G(\varDelta z) = -(1-\gamma )G(z), \qquad W\varDelta x+ W^{-T}\varDelta s= -v + \gamma \mu {\tilde{v}} -W^{-T}\eta , \end{aligned}$$
(18)

with properties given in the following lemma.

Lemma 4

The solution to (18) satisfies

$$\begin{aligned} {\langle s, \varDelta x \rangle } + {\langle x, \varDelta s \rangle } = 0, \qquad {\langle \varDelta x, \varDelta s \rangle } = 0 \end{aligned}$$

and for all \(\alpha \in \mathbf{{R}}\),

$$\begin{aligned} G(z + \alpha \varDelta z)&= (1-\alpha (1-\gamma ))G(z),\\ {\langle x + \alpha \varDelta x, s + \alpha \varDelta s \rangle }&= (1-\alpha (1-\gamma )) {\langle x, s \rangle }. \end{aligned}$$

Proof

From (18) and Lemma 3 we have that

$$\begin{aligned} {\langle s, \varDelta x \rangle } + {\langle x, \varDelta s \rangle }= & {} -(1-\gamma ){\langle x, s \rangle }-(1/2){\langle x, F'''(x) [\varDelta x^\text {a}, (F''(x))^{-1} \varDelta s^\text {a}] \rangle } \\= & {} -(1-\gamma ){\langle x, s \rangle }, \end{aligned}$$

and skew-symmetry implies that

$$\begin{aligned} {\langle (1-\gamma )x + \varDelta x, (1-\gamma )s + \varDelta s \rangle }= & {} (1-\gamma )^2{\langle x, s \rangle } + (1-\gamma )[{\langle s, \varDelta x \rangle } + {\langle x, \varDelta s \rangle }] \\&+ {\langle \varDelta x, \varDelta s \rangle } = 0, \end{aligned}$$

i.e., \({\langle \varDelta x, \varDelta s \rangle }=0\). The last part now follows. \(\square \)

The search-direction (18) forms the basis of our algorithm. For a given step-size \(\alpha \in (0, 1]\) the residuals and the complementarity gap decrease at the same rate. More explicitly, for \(\mu ^k := {\langle x^k, s^k \rangle }/\vartheta \) the residuals at the kth iteration are

$$\begin{aligned} G(z^k) = \mu ^k G(z^0), \end{aligned}$$

which should be compared with central path definition in Lemma 1. Similarly, the complementarity gap at the kth iteration is

$$\begin{aligned} {\langle x^k, s^k \rangle } = \mu ^k \vartheta . \end{aligned}$$

This is in contrast with other methods [18, 26, 27], which do not decrease the complementarity gap at the same rate. Also, no explicit merit function as in [9] is required to ensure a balanced decrease of the residuals and the complementarity gap.

Tunçel [30] showed polynomial complexity of an infeasible method (without a corrector) assuming boundedness of the scaling matrices. These results were further extended by Myklebust and Tunçel [15] to include an analysis for scaling matrices obtained by a BFGS scaling (such scalings are considered in the next section). Although we use slightly different scaling matrices, their analysis could be applied in a small neighborhood around the central path. Thus a short-step algorithm using the search directions herein would likely inherit good theoretical performance. It is also possible that future studies will prove a conjecture that the scaling matrices considered herein are bounded, which would simplify a complexity analysis.

To gain some insight into the corrector \(\eta \) we note that in the case of the nonnegative orthant we have the familiar expression

$$\begin{aligned} \frac{1}{2}F'''(x) [\varDelta x^\text {a}, (F''(x))^{-1} \varDelta s^\text {a}] = -\mathbf {diag}(x)^{-1}\mathbf {diag}(\varDelta x^\text {a})\varDelta s^\text {a}, \end{aligned}$$

and similarly for the semidefinite cone we have

$$\begin{aligned} \frac{1}{2}F'''(x) [\varDelta x^\text {a}, (F''(x))^{-1} \varDelta s^\text {a}]= & {} -\frac{1}{2} x^{-1} \varDelta x^\text {a}\varDelta s^\text {a}- \frac{1}{2} \varDelta s^\text {a}\varDelta x^\text {a}x^{-1}\\= & {} -(x^{-1}) \circ (\varDelta x^\text {a}\varDelta s^\text {a}), \end{aligned}$$

using the generalized product associated with the Euclidean Jordan algebra; see, e.g., [31] for a discussion of Euclidean Jordan algebras in a similar context as ours. For the Lorentz cone we have

$$\begin{aligned} F'''(x)[ (F''(x))^{-1}u] = -\frac{2}{x^TQx}(ux^TQ + Qxu^T - (x^T u) Q) \end{aligned}$$

with \(Q=\mathbf {diag}(1,-1,\dots ,-1)\). Let \(e_k\) denote the kth standard basis-vector (i.e., the vector with value 1 in position k and 0 elsewhere). Then

$$\begin{aligned} F'''(x)[ (F''(x))^{-1}u]e_1 = -2 (x^{-1}) \circ u, \end{aligned}$$

again using the notation of the generalized product [31]. We defer the derivation and implementation-specific details for the exponential cone to the appendix.
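The nonnegative-orthant case is easy to verify numerically; the following sketch (ours) checks that the corrector term reduces componentwise to Mehrotra's \(\varDelta x^\text {a}\varDelta s^\text {a}/x\):

```python
import numpy as np

# For F(x) = -sum(log x_i): F'''(x)[u,v]_i = -2 u_i v_i / x_i^3 and
# (F''(x))^{-1} = diag(x^2), so the identity above follows entrywise.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=4)
dxa, dsa = rng.normal(size=4), rng.normal(size=4)
lhs = 0.5 * (-2.0 * dxa * (x**2 * dsa) / x**3)
rhs = -dxa * dsa / x
assert np.allclose(lhs, rhs)        # so eta in (16) equals dxa*dsa/x
```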

As an illustration of the proposed corrector we consider a simple example,

$$\begin{aligned} \begin{array}{ll} \text{ minimize }&{} x_1 + x_2\\ \text{ subject } \text{ to }&{} x_1 + x_2 + x_3 = 1\\ &{} (x_1,x_2,x_3) \in {K}_\text {exp}. \end{array} \end{aligned}$$
(19)

In Table 1 we list the complementarity gap \({\langle x^{k}, s^{k} \rangle }\) for the iterates produced using the search direction (18) with and without the proposed corrector, using the scaling matrices defined in Sect. 5.

Table 1 Complementarity gap for the iterates in example (19), with and without corrector

In Fig. 1 we plot the same iterates \(x^{k}/\tau ^\star \) projected onto the hyperplane \(x_1 + x_2 + x_3 = 1\). Although some iterates appear close to the boundary they are all within the defined neighborhood, i.e., no cut-back is performed to stay within the neighborhood. We see how the corrector algorithm makes significantly more progress and thereby reduces the required number of iterations. A similar observation is made for a wide selection of test-problems in Sect. 8.

Fig. 1

Central path, boundary, and iterates for example (19), projected onto the hyperplane \(x_1 + x_2 + x_3 = 1\). The iterates using the corrector direction are plotted in the darker solid line, and iterates without the corrector are plotted in the lighter solid line

5 Primal-dual scalings

In this section we review key results for primal-dual scalings by Tunçel [30], we make connections to the theory of multiple secant-equation updates by Schnabel [25], and we present new formulations, which were used in [24] to compute optimally bounded scaling matrices for three-dimensional cones.

Tunçel [30] defines a set of scalings

$$\begin{aligned} {{\mathcal {T}}}_1 (x,s) := \{T \, : \, T\succ 0, \, T^2s = x, \, T^2 F'(x) = F'_*(s) \}, \end{aligned}$$
(20)

i.e., positive definite scalings satisfying double secant equations. Tunçel further defines a set of bounded scalings parametrized by \(\xi \),

$$\begin{aligned} {{\mathcal {T}}}_2 (x,s,\xi ) := \left\{ T\in {{\mathcal {T}}}_1 (x,s) \, : \, (\xi \delta _F)^{-1} F_*''(s) \preceq T^2 \preceq \xi \delta _F \left[ F''(x)\right] ^{-1} \right\} , \end{aligned}$$
(21)

where \(\delta _F := [\vartheta (\mu {{\tilde{\mu }}} - 1) + 1]/\mu \). Let

$$\begin{aligned} \xi ^\star := \inf \{ \xi \, : \, {{\mathcal {T}}}_2 (x,s,\xi ) \ne \emptyset \}. \end{aligned}$$
(22)

In the case where \(\xi ^\star \in {{\mathcal {O}}}(1)\) over all \((x,s)\in \mathbf{{int}}(K)\times \mathbf{{int}}(K^*)\), Tunçel proved an iteration complexity bound of \({{\mathcal {O}}}(\sqrt{\vartheta }\log (1/\epsilon ))\) for an infeasible-start primal-dual interior point algorithm (coinciding with the best known complexity bound for interior-point methods). The parameter \(\delta _F\) and set \({{\mathcal {T}}}_2 (x,s,\xi )\) are further addressed towards the end of this section in the context of finding optimally bounded scaling matrices.

A self-scaled cone has a unique Nesterov-Todd scaling point w satisfying double secant equations,

$$\begin{aligned} s = F''(w) x, \qquad F'(x) = F''(w) F'_*(s). \end{aligned}$$

Furthermore, \(F''(w)\) is bounded (see [20, 21, 30]) in the sense that

$$\begin{aligned}{}[F''(w)]^{-1} \in {\mathcal {T}}_2(x,s,4/3). \end{aligned}$$

A barrier F(x) is said to have negative curvature if for all \(x\in \mathbf{{int}}(K)\) and all \(u\in {K}\) we have

$$\begin{aligned} F'''(x)[u]\preceq 0 \end{aligned}$$

and F(x) is self-scaled if and only if F(x) and \(F_*(s)\) both have negative curvature [23]. Barriers with negative curvature (but which are not self-scaled) still have a unique scaling point w satisfying exactly one secant equation

$$\begin{aligned} s = F''(w) x \end{aligned}$$

but not the other. The exponential-cone barrier does not have negative curvature, as can be seen by considering \({\hat{x}} := (1,e^{-2},0)\in {K}_\text {exp}\) and \({\hat{u}}:=(1,0,0)\in {K}_\text {exp}\setminus \mathbf{{int}}({K}_\text {exp})\). Then it can be verified using the expressions in the appendix that

$$\begin{aligned} F'''({\hat{x}})[{\hat{u}}] = \left[ \begin{array}{ccc} -4 &{}\quad \frac{e^2}{2} &{}\quad \frac{e^2}{2}\\ \frac{e^2}{2} &{}\quad 0 &{}\quad 0 \\ \frac{e^2}{2} &{}\quad 0 &{}\quad -\frac{e^4}{4} \end{array} \right] , \end{aligned}$$

which is indefinite, for example

$$\begin{aligned} F'''({\hat{x}})[{\hat{u}}, {\hat{v}}, {\hat{v}}] = 4 \end{aligned}$$

for \({\hat{v}}:=(1, 8e^{-2}, 4e^{-2})\).

Thus for general nonsymmetric cones the essential question is how to define bounded scaling matrices satisfying both secant equations, without relying on a scaling point w. Tunçel [30] partly answers that question by deriving scalings \(T\in {{\mathcal {T}}}_1(x,s)\) using BFGS update equations, resulting in a rank-4 update to a given positive definite matrix. It is still instructive, however, to review work by Schnabel on quasi-Newton methods with multiple secant equations. In particular, the following theorem from [25] is used repeatedly in the following discussion.

Theorem 1

Let \(S,Y\in \mathbf{{R}}^{n\times p}\) have full rank p. Then there exists \(H\succ 0\) such that \(HS=Y\) if and only if \(Y^TS\succ 0\).

As a consequence we can write any such \(H\succ 0\) as

$$\begin{aligned} H = Y(Y^T S)^{-1}Y^T + Z Z^T, \quad S^T Z = 0, \quad \mathbf{{rank}}(Z)=n-p \end{aligned}$$

or in factored form \(H = W^T W\) with

$$\begin{aligned} W=\left[ \begin{array}{cc}Y (Y^TS)^{-1/2},&Z\end{array}\right] ^T. \end{aligned}$$

One may verify that, given any \(\varOmega \succ 0\) satisfying \(\varOmega S=Y\), the following identity holds

$$\begin{aligned} W^{-1} = \left[ \begin{array}{cc}S (Y^TS)^{-1/2},&\varOmega ^{-1}Z(Z^T \varOmega ^{-1} Z)^{-1}\end{array}\right] , \end{aligned}$$

and therefore

$$\begin{aligned} H^{-1} = S(Y^T S)^{-1}S^T + R R^T, \quad Y^T R = 0, \quad \mathbf{{rank}}(R)=n-p, \end{aligned}$$

where \(R R^T = \varOmega ^{-1}Z(Z^T \varOmega ^{-1} Z)^{-2} Z^T \varOmega ^{-1}\). One of the most popular quasi-Newton update rules is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) step in (23), where \(H\succ 0\) denotes an approximation of the Hessian.

$$\begin{aligned} H_\text {BFGS} := Y(Y^T S)^{-1}Y^T + H - HS(S^T H S)^{-1}S^TH \end{aligned}$$
(23)

It is well known (see, e.g., [25]) that the update (23) is the solution to

$$\begin{aligned} \begin{array}{ll} \text{ minimize }&{} \Vert \varOmega ^{1/2} (H_+^{-1} - H^{-1}) \varOmega ^{1/2} \Vert _F\\ \text{ subject } \text{ to }&{} H_+^{-1} Y = S, \\ &{} H_+^{-1} \succ 0 \end{array} \end{aligned}$$
(24)

for any \(\varOmega \succ 0\) satisfying \(\varOmega S=Y\). In the following theorem we show how (23) can be computed using a sequence of updates, similar to quasi-Newton updates with a single secant equation.

Theorem 2

Given \(Y_0,S_0\in \mathbf{{R}}^{n\times p}\) with \(Y_0^T S_0\succ 0\). Then

$$\begin{aligned} Y_0(Y_0^T S_0)^{-1} Y_0^T&= V V^T \\ S_0(Y_0^T S_0)^{-1} S_0^T&= U U^T, \end{aligned}$$

where \(V:=\begin{pmatrix}v_1&\cdots&v_p\end{pmatrix}\), \(U:=\begin{pmatrix}u_1&\cdots&u_p\end{pmatrix}\) and

$$\begin{aligned} v_k&:= \frac{Y_{k-1} e_k}{{\langle Y_{k-1}e_k, S_{k-1} e_k \rangle }^{1/2}} , \qquad Y_k := Y_{k-1} - v_k v_k^T S_{k-1} \end{aligned}$$
(25)
$$\begin{aligned} u_k&:= \frac{S_{k-1} e_k}{{\langle Y_{k-1}e_k, S_{k-1} e_k \rangle }^{1/2}} , \qquad S_k := S_{k-1} - u_k u_k^T Y_{k-1} \end{aligned}$$
(26)

for \(k=1,\dots ,p\).

Proof

The proof is constructive and follows from a Cholesky factorization of \(Y_0^T S_0\). Let

$$\begin{aligned} L := \left( \begin{array}{ccc} \frac{Y_0^T S_0 e_1}{{\langle Y_0e_1, S_0e_1 \rangle }^{1/2}},&\cdots ,&\frac{Y_{p-1}^T S_{p-1} e_p}{{\langle Y_{p-1} e_p, S_{p-1} e_p \rangle }^{1/2}} \end{array} \right) . \end{aligned}$$

In the proof we first show that \(Y^T_0 S_0 = L L^T\), and we then show that \(LV^T=Y_0^T\), which in turn implies (25).

We start by showing that the recursion (25), (26) is well-defined. Let \(\varPsi _k\) be the principal submatrix obtained from the last \(p-k\) rows and columns of \(Y_k^T S_k\), where \(\varPsi _0 = Y_0^T S_0\succ 0\) by assumption. Expanding \(Y_k^T S_k\) we have

$$\begin{aligned} Y_k^T S_k = Y_{k-1}^T S_{k-1} - \frac{(Y_{k-1}^T S_{k-1} e_k) (Y_{k-1}^T S_{k-1} e_k)^T}{{\langle Y_{k-1}e_k, S_{k-1} e_k \rangle }} , \end{aligned}$$
(27)

i.e., \(\varPsi _k \) is the Schur-complement of the first element of \(\varPsi _{k-1}\) and therefore positive definite.

We next make a simplifying observation, namely that the first k columns of \(Y_k\) and \(S_k\) are zero. From (25), (26) we immediately have that \(Y_ke_k = S_ke_k = 0\), \(k=1,\dots ,p\), and that sparsity propagates to subsequent steps, i.e., \(Y_j e_k = S_j e_k = 0\), \(j>k\).

We can now prove that \(LL^T = Y_0^T S_0\). We have

$$\begin{aligned} LL^T e_k= & {} \sum _{i=0}^{k-1}\frac{Y_i^T S_i e_{i+1}(Y_i^T S_i e_{i+1})^Te_k}{{\langle Y_i e_{i+1}, S_i e_{i+1} \rangle }}\nonumber \\= & {} \sum _{i=0}^{k-2}\frac{Y_i^T S_i e_{i+1}(Y_i^T S_i e_{i+1})^Te_k}{{\langle Y_i e_{i+1}, S_i e_{i+1} \rangle }} + Y_{k-1}^T S_{k-1}e_k. \end{aligned}$$
(28)

Repeated use of (27) in (28) then shows that

$$\begin{aligned} LL^T e_k&= \sum _{i=0}^{k-2}\frac{Y_i^T S_i e_{i+1}(Y_i^T S_i e_{i+1})^Te_k}{{\langle Y_i e_{i+1}, S_i e_{i+1} \rangle }} + Y_{k-1}^T S_{k-1}e_k \\&= \sum _{i=0}^{k-3}\frac{Y_i^T S_i e_{i+1}(Y_i^T S_i e_{i+1})^Te_k}{{\langle Y_i e_{i+1}, S_i e_{i+1} \rangle }} + Y_{k-2}^T S_{k-2}e_k = \cdots = Y_0^T S_0 e_k. \end{aligned}$$

We finally show that \(LV^T = Y_0^T\). From repeated use of (25) it follows that

$$\begin{aligned} L V^T&= Y_0^T - Y_0^T + \frac{S_0^T Y_0 e_1 (Y_0 e_1)^T}{{\langle Y_0 e_1, S_0 e_1 \rangle }} + \frac{S_1^T Y_1 e_2 (Y_1 e_2)^T}{{\langle Y_1e_2, S_1 e_2 \rangle }} + \dots \\&\quad + \frac{S_{p-1}^T Y_{p-1} e_p (Y_{p-1} e_p)^T}{{\langle Y_{p-1}e_p, S_{p-1} e_p \rangle }} \\&= Y_0^T - Y_1^T + \frac{S_1^T Y_1 e_2 (Y_1 e_2)^T}{{\langle Y_1e_2, S_1 e_2 \rangle }} + \dots + \frac{S_{p-1}^T Y_{p-1} e_p (Y_{p-1} e_p)^T}{{\langle Y_{p-1}e_p, S_{p-1} e_p \rangle }} \\&= \cdots = Y_0^T - Y_p^T = Y_0^T. \end{aligned}$$

Since \(LV^T = Y_0^T\) we have that \(Y_0 (L L^T)^{-1} Y_0^T = Y_0(Y_0^T S_0)^{-1} Y_0^T = V V^T\). Equation (26) follows similarly. \(\square \)
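The recursion is a direct transcription into code; the sketch below (ours) verifies the two identities of Theorem 2 on random data, where constructing \(Y_0 = H_0 S_0\) from a random positive definite \(H_0\) guarantees \(Y_0^T S_0 \succ 0\).

```python
import numpy as np

def rank_p_factors(Y0, S0):
    # sequential rank-1 updates (25)-(26); returns V, U with
    # V V^T = Y0 (Y0^T S0)^{-1} Y0^T and U U^T = S0 (Y0^T S0)^{-1} S0^T
    Y, S = Y0.astype(float).copy(), S0.astype(float).copy()
    n, p = Y.shape
    V, U = np.zeros((n, p)), np.zeros((n, p))
    for k in range(p):
        d = np.sqrt(Y[:, k] @ S[:, k])
        V[:, k], U[:, k] = Y[:, k] / d, S[:, k] / d
        # both updates use Y_{k-1}, S_{k-1}; tuple assignment ensures that
        Y, S = (Y - np.outer(V[:, k], V[:, k] @ S),
                S - np.outer(U[:, k], U[:, k] @ Y))
    return V, U

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5)); H0 = A @ A.T + 5 * np.eye(5)   # H0 > 0
S0 = rng.normal(size=(5, 2)); Y0 = H0 @ S0                  # Y0^T S0 > 0
V, U = rank_p_factors(Y0, S0)
M = np.linalg.inv(Y0.T @ S0)
assert np.allclose(V @ V.T, Y0 @ M @ Y0.T)
assert np.allclose(U @ U.T, S0 @ M @ S0.T)
```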

Primal-dual scalings can then be derived similarly to Tunçel [30] and Myklebust and Tunçel [15]. In our context we derive non-singular (factored) scalings W satisfying

$$\begin{aligned} W^{-T} s = W x, \qquad -W^{-T} F'(x) = -W F_*'(s), \end{aligned}$$
(29)

i.e., \((W^T W)^{-1} \in {\mathcal {T}}_1(x,s)\) defined in (20). We define

$$\begin{aligned} H := \mu F''(x), \qquad S := \left[ \begin{array}{cc} x,&{\tilde{x}}\end{array}\right] , \qquad Y := \left[ \begin{array}{cc} s,&{\tilde{s}}\end{array}\right] , \end{aligned}$$

where we remind the reader that \({\tilde{x}} := -F'_*(s)\) and \({\tilde{s}} := -F'(x)\).

The condition \(Y^T S \succ 0\) is equivalent to assuming that \((x,s)\) is not on the central path. We next define

$$\begin{aligned} \delta _x := x - \mu {\tilde{x}}, \qquad \delta _s := s - \mu {\tilde{s}} \end{aligned}$$

also used in [15]. To compute (23) we first use Theorem 2 with \(S_0:=S\), \(Y_0:=Y\) resulting in

$$\begin{aligned} Y(S^T Y)^{-1} Y^T = \frac{s s^T}{{\langle x, s \rangle }} + \frac{\delta _s \delta _s^T}{{\langle \delta _x, \delta _s \rangle }} . \end{aligned}$$

We next define

$$\begin{aligned} \rho _x := {\tilde{x}} - \frac{{\langle x, H {\tilde{x}} \rangle }}{{\langle x, H x \rangle }} x. \end{aligned}$$

If we use Theorem 2 again, this time with \(S_0:=S\) and \(Y_0:=HS\) we get

$$\begin{aligned} H S (S^T H S)^{-1} S^T H = \frac{H x (H x)^T}{{\langle x, H x \rangle }} + \frac{H \rho _x (H \rho _x)^T}{{\langle \rho _x, H \rho _x \rangle }} , \end{aligned}$$

leading to an expression for the BFGS update as a rank 4 update to \(H\succ 0\) (for a general H),

$$\begin{aligned} H_\text {BFGS} = H + \frac{ss^T}{{\langle x, s \rangle }} + \frac{\delta _s \delta _s^T}{{\langle \delta _x, \delta _s \rangle }} - \frac{Hx (Hx)^T}{{\langle x, H x \rangle }} - \frac{H \rho _x (H \rho _x)^T}{{\langle \rho _x, H \rho _x \rangle }} . \end{aligned}$$

Considering (24), we see that the BFGS update to \(H:=\mu F''(x)\) has the desirable property of minimizing

$$\begin{aligned} \Vert W^{-1} W^{-T} - (\mu F''(x))^{-1} \Vert _\varOmega , \end{aligned}$$

measured in a weighted norm; for simplicity we can assume that \(\varOmega = W^T W\). With this choice of H we have

$$\begin{aligned} H_\text {BFGS} = \mu F''(x) + \frac{ss^T}{{\langle x, s \rangle }} + \frac{\delta _s \delta _s^T}{{\langle \delta _x, \delta _s \rangle }} - \frac{\mu }{\vartheta } {\tilde{s}} {\tilde{s}}^T - \mu \frac{(F''(x){\tilde{x}} - {{\tilde{\mu }}}{\tilde{s}})(F''(x){\tilde{x}} - {{\tilde{\mu }}}{\tilde{s}})^T}{{\langle {\tilde{x}}, F''(x){\tilde{x}} \rangle }-\vartheta {{\tilde{\mu }}}^2}, \end{aligned}$$

which curiously reduces to a rank 3 update,

$$\begin{aligned} H_\text {BFGS}= & {} \mu F''(x) + \frac{1}{2\mu \vartheta } \delta _s(s + \mu {\tilde{s}} + \frac{1}{\mu {{\tilde{\mu }}} - 1}\delta _s)^T + \frac{1}{2\mu \vartheta } (s + \mu {\tilde{s}} + \frac{1}{\mu {{\tilde{\mu }}} - 1}\delta _s)\delta _s^T\nonumber \\&- \mu \frac{(F''(x){\tilde{x}} - {{\tilde{\mu }}}{\tilde{s}})(F''(x){\tilde{x}} - {{\tilde{\mu }}}{\tilde{s}})^T}{{\langle {\tilde{x}}, F''(x){\tilde{x}} \rangle }-\vartheta {{\tilde{\mu }}}^2}. \end{aligned}$$
(30)
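As a sanity check (ours, reusing grad, hess and fstar_grad from the sketch in Sect. 2), the update (23) with \(H = \mu F''(x)\), \(S = [x, {\tilde{x}}]\) and \(Y = [s, {\tilde{s}}]\) satisfies both secant equations and remains positive definite, i.e., \((W^TW)^{-1}\in {\mathcal {T}}_1(x,s)\); the test points are assumptions for illustration and lie off the central path.

```python
import numpy as np

x = np.array([1.3, 0.8, -0.8])                  # point in int(K_exp)
s = np.array([1.0, 0.2, -0.5])                  # point in int(K_exp^*)
st, xt = -grad(x), -fstar_grad(s)               # shadow iterates (7)
mu = (x @ s) / 3.0                              # vartheta = 3
H = mu * hess(x)
S = np.column_stack([x, xt])
Y = np.column_stack([s, st])
Hb = (Y @ np.linalg.solve(Y.T @ S, Y.T)
      + H - H @ S @ np.linalg.solve(S.T @ H @ S, S.T @ H))
assert np.allclose(Hb @ S, Y, atol=1e-8)        # both secant equations
assert np.all(np.linalg.eigvalsh(Hb) > 0)       # positive definite
```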

We conclude this section by considering the three-dimensional case, for which the expressions simplify significantly. It follows from Theorem 1 that any scaling (29) has the form

$$\begin{aligned} W^T W = Y (Y^TS)^{-1} Y^T + t z z^T,\qquad W^{-1}W^{-T} = S (Y^TS)^{-1} S^T + t^{-1} r r^T \end{aligned}$$

where \(S^Tz = 0\), \(Y^Tr = 0\) and \({\langle r, z \rangle }=1\), i.e., the scaling is essentially characterized by \(t>0\). For simplicity, we assume \(\Vert z\Vert =1\). We compute z and r using cross-products,

$$\begin{aligned} z = \frac{x \otimes {\tilde{x}}}{\Vert x \otimes {\tilde{x}}\Vert }, \qquad r = \frac{s \otimes {\tilde{s}}}{{\langle s \otimes {\tilde{s}}, z \rangle }}. \end{aligned}$$

In the three-dimensional case we can devise a simple algorithm for finding scalings achieving the bound (22). For notational convenience we introduce \(Q:=\left[ \begin{array}{cc}r,&S\end{array}\right] \), which is nonsingular. We can then solve

$$\begin{aligned} \inf _{\xi } \left\{ \xi \, : \, (\xi \delta _F)^{-1} Q F''(x) Q^T \preceq \left[ \begin{array}{c@{\quad }c} t &{} 0\\ 0 &{} Y^T S \end{array}\right] \preceq (\xi \delta _F) Q [F''_*(s)]^{-1} Q^T \right\} \end{aligned}$$
(31)

using a simple bisection algorithm. Consider the monotonically decreasing function,

$$\begin{aligned} \xi ^l( t) := \inf _{\xi } \left\{ \xi \, : \, (\xi \delta _F)^{-1} Q F''(x) Q^T \preceq \left[ \begin{array}{c@{\quad }c} t &{} 0\\ 0 &{} Y^T S \end{array}\right] \right\} , \end{aligned}$$

and the monotonically increasing function

$$\begin{aligned} \xi ^u( t) := \inf _{\xi } \left\{ \xi \, : \, (\xi \delta _F) Q [F''_*(s)]^{-1} Q^T \succeq \left[ \begin{array}{c@{\quad }c} t &{} 0\\ 0 &{} Y^T S \end{array}\right] \right\} . \end{aligned}$$

Given upper and lower bounds on t, the solution to (31) can then be found using bisection on t to solve \(\xi ^l(t) = \xi ^u(t)\). Such a bisection method was considered in [24], where it was conjectured that \(\xi ^\star \approx 1.253\) for the exponential cone.
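A sketch (ours) of that bisection, continuing the variables from the sketches above: \(\xi ^l(t)\) and \(\xi ^u(t)\) are largest generalized eigenvalues, and \(\delta _F\) follows its definition after (21). This is a schematic transcription, not the implementation of [24].

```python
from scipy.linalg import eigh
import numpy as np

theta = 3.0
mt = (xt @ st) / theta                          # mu-tilde
dF = (theta * (mu * mt - 1.0) + 1.0) / mu       # delta_F
Q = np.column_stack([r, S])                     # Q = [r, S], nonsingular
Fxx = hess(x)
Fss_inv = hess(xt)                # [F''_*(s)]^{-1} = F''(x_s) with x_s = x~

def xi_bounds(t):
    D = np.zeros((3, 3)); D[0, 0] = t; D[1:, 1:] = Y.T @ S
    xl = eigh(Q @ Fxx @ Q.T, D, eigvals_only=True)[-1] / dF      # xi^l(t)
    xu = eigh(D, Q @ Fss_inv @ Q.T, eigvals_only=True)[-1] / dF  # xi^u(t)
    return xl, xu

lo, hi = 1e-6, 1e6                              # bracket for t
for _ in range(100):
    t = np.sqrt(lo * hi)                        # bisection in log scale
    xl, xu = xi_bounds(t)
    lo, hi = (t, hi) if xl > xu else (lo, t)    # xi^l dec., xi^u inc. in t
print(max(xi_bounds(t)))                        # approximates xi* here
```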

The BFGS scaling corresponds to

$$\begin{aligned} t = \mu \left\| F''(x) - \frac{{\tilde{s}} {\tilde{s}}^T}{\vartheta } - \frac{(F''(x){\tilde{x}} - {{\tilde{\mu }}}{\tilde{s}})(F''(x){\tilde{x}} - {{\tilde{\mu }}}{\tilde{s}})^T}{{\langle {\tilde{x}}, F''(x){\tilde{x}} \rangle }-\vartheta {{\tilde{\mu }}}^2} \right\| _F. \end{aligned}$$
(32)

We have tried both the optimal scaling and the BFGS scaling for all numerical test problems in Sect. 8, without noticing a significant difference in the required number of iterations or the quality of the solution. For all the problems the largest observed bound on \(\xi \) was 1.72 for the BFGS scaling; for simplicity we only report results for the simpler BFGS scaling in the following. In preliminary experiments we have observed similarly encouraging numerical results for higher-dimensional nonsymmetric cones using the BFGS scaling. The bisection algorithm is still valuable as a reference, however, and we hope that future studies will prove the conjecture, and possibly derive similar bounds for other three-dimensional cones (including a tighter bound than 4/3 for quadratic cones).

6 A primal-dual algorithm for exponential cone optimization

In this section we give a collected overview of the suggested path-following primal-dual algorithm. The algorithm is specialized for three-dimensional cones (in particular, the exponential cone) by using cross-products for computing the BFGS scalings. By computing the scaling matrices as a general rank 3 update (30) the algorithm is readily adapted to other non-symmetric cones.

We fix \(\beta \) to a constant low value, for example \(\beta =10^{-6}\). The essential parts of the method are i) finding a starting point, ii) computing a search-direction and step-size, and iii) checking the stopping criteria for termination.

  • Starting point. Find a starting point on the central path

    $$\begin{aligned} x = s = -F'(x) \end{aligned}$$

    and \(y=0\), \(\tau =\kappa =1\). Then \(z:=({\hat{x}}, {\hat{s}},y,\tau ,\kappa )\in {{\mathcal {N}}}(1)\).

  • Scaling matrices. Compute BFGS scaling matrices

    $$\begin{aligned} W_k = \left[ \begin{array}{ccc} \frac{s_k}{\sqrt{{\langle x_k, s_k \rangle }}},&\frac{\delta _{s_k}}{\sqrt{{\langle \delta _{x_k}, \delta _{s_k} \rangle }}},&\sqrt{t_k}\cdot z_k \end{array} \right] ^T, \quad W_k^{-1} = \left[ \begin{array}{ccc} \frac{x_k}{\sqrt{{\langle x_k, s_k \rangle }}},&\frac{\delta _{x_k}}{\sqrt{{\langle \delta _{x_k}, \delta _{s_k} \rangle }}},&\frac{r_k}{\sqrt{t_k}} \end{array} \right] ^T \end{aligned}$$

    where \(z_k = (x_k \otimes {\tilde{x}}_k)/\Vert x_k \otimes {\tilde{x}}_k\Vert \), \(r_k = (s_k \otimes {\tilde{s}}_k)/{\langle s_k \otimes {\tilde{s}}_k, z_k \rangle }\) and \(t_k\) is chosen from (32). Note that \(\{z_k\}\) and z are unrelated; the latter denotes the aggregation of all primal and dual variables.

  • Search-direction and step-size. Compute an affine direction \(\varDelta z^\text {a}\) as the solution to (11),

    $$\begin{aligned} G(\varDelta z^\text {a}) = -G(z), \quad W\varDelta x^\text {a}+ W^{-T}\varDelta s^\text {a}= -v. \end{aligned}$$

    From \(\varDelta z^\text {a}\) we compute a corrector (16),

    $$\begin{aligned} \eta := -\frac{1}{2} F'''(x) [\varDelta x^\text {a}, (F''(x))^{-1} \varDelta s^\text {a}], \end{aligned}$$

    similar to Mehrotra [13], where details on evaluating the derivatives are given in the appendix, see (34). We define a centering parameter \(\gamma \) as

    $$\begin{aligned} \gamma := (1-\alpha _\text {a}) \min \{ (1-\alpha _\text {a})^2, 1/4 \}, \end{aligned}$$

    where \(\alpha _\text {a}\) is the stepsize to the boundary, i.e.,

    $$\begin{aligned} \alpha _\text {a} = \sup \{ \alpha \, \mid \, (x+\alpha \varDelta x^\text {a})\in K, \, (s+\alpha \varDelta s^\text {a})\in K^*, \, \alpha \in [0; 1]\} \end{aligned}$$

    which we approximate using a bisection procedure. We then compute a combined centering-corrector search direction \(\varDelta z\) as the solution to (18),

    $$\begin{aligned} G(\varDelta z) = -(1-\gamma )G(z), \quad W\varDelta x+ W^{-T}\varDelta s= -v + \gamma \mu {\tilde{v}} - W^{-T}\eta \end{aligned}$$

    and we update \(z:=z+\alpha \varDelta z\) with the largest step \(\alpha \in [0;1)\) inside a neighborhood \({{\mathcal {N}}}(\beta )\) of the central path.

  • Checking termination. Terminate if the updated iterate satisfies the termination criteria (given in Sect. 7.3) or else take a new step.

7 Implementation

MOSEK is a software package for solving large-scale linear and conic optimization problems. It can solve problems with a mixture of linear, quadratic and semidefinite cones, and the implementation is based on the homogeneous model, the NT search direction and a Mehrotra-like predictor-corrector algorithm [2].

Our implementation has been extended to handle the three-dimensional exponential cone using the algorithm above. We use the usual NT scaling for the symmetric cones and the Tunçel scalings for the nonsymmetric cones. Except for small differences in the linearization of the complementarity conditions, the symmetric and nonsymmetric cones are handled completely analogously. Our extension for nonsymmetric cones also includes the three-dimensional power cone, but this is not discussed further here.

7.1 Dualization, presolve and scaling

Occasionally it is worthwhile to dualize the problem before solving it, since it can make the linear algebra more efficient. Whether the primal or dual formulation is more efficient is not easily determined in advance. MOSEK makes a heuristic choice between the two forms, and the dualization is transparent to the user.

Furthermore, a presolve step is applied to the problem, which often leads to a significant reduction in computational complexity [1]. The presolve step removes obviously redundant constraints, tries to remove linear dependencies, etc. Finally, many optimization problems are badly scaled, so MOSEK rescales the problem before solving it. The rescaling is very simple, essentially normalizing the rows and columns of A.

7.2 Computing the search direction

Usually the most expensive operation in each iteration of the primal-dual algorithm is to compute the search direction, i.e., solving the linear system

$$\begin{aligned} \left[ \begin{array}{ccc} 0 &{}\quad A &{}\quad -b\\ -A^T &{}\quad 0 &{}\quad c\\ b^T &{}\quad -c^T &{}\quad 0 \end{array} \right] \left[ \begin{array}{c} \varDelta y\\ \varDelta x\\ \varDelta \tau \end{array} \right] - \left[ \begin{array}{c} 0 \\ \varDelta s\\ \varDelta \kappa \end{array} \right] = \left[ \begin{array}{c} r_p \\ r_d \\ r_g \end{array} \right] \\ W\varDelta x+ W^{-T}\varDelta s= r_{xs}, \qquad \tau \varDelta \kappa + \kappa \varDelta \tau = r_{\tau \kappa }, \end{aligned}$$

where W is a block-diagonal scaling matrix for a product of cones. Eliminating \(\varDelta s= W^T(r_{xs} - W\varDelta x)\) and \(\varDelta \kappa = \tau ^{-1}(r_{\tau \kappa } - \kappa \varDelta \tau )\) from the linearized centrality conditions results in the reduced bordered system

$$\begin{aligned} \left[ \begin{array}{ccc} W^T W &{}\quad -A^T &{}\quad c \\ A &{}\quad 0 &{}\quad -b\\ -c^T &{}\quad b^T &{}\quad \tau ^{-1}\kappa \end{array} \right] \left[ \begin{array}{c} \varDelta x\\ \varDelta y\\ \varDelta \tau \end{array} \right] = \left[ \begin{array}{c} r_d + W^T r_{xs} \\ r_p \\ r_g + \tau ^{-1} r_{\tau \kappa } \end{array} \right] , \end{aligned}$$

which can be solved in different ways. Given a (sparse) \(LDL^T\) factorization of the symmetric matrix

$$\begin{aligned} \left[ \begin{array}{cc} -W^T W &{}\quad A^T \\ A &{}\quad 0 \end{array} \right] \end{aligned}$$

it is computationally cheap to compute the search direction (a sketch follows below). In the case that the factorization breaks down due to numerical issues we add regularization to the system, i.e., we modify the diagonal, which is common practice in interior-point methods. If the resulting search direction is inaccurate (i.e., the residuals are not decreased sufficiently) we use iterative refinement, which in most cases improves the accuracy of the search direction. We omit details of computing the \(LDL^T\) factorization, since it is fairly conventional and close to the approach discussed in [2].
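
As a sketch of why the factorization suffices, the bordered system can be solved with two back-substitutions: negating the first block row, the first two rows read \(M [\varDelta x; \varDelta y] = [-(r_d + W^T r_{xs}); r_p] + \varDelta \tau [c; b]\) with \(M\) the factorized matrix above, after which the last row determines \(\varDelta \tau \) from a scalar equation. In the Python sketch below, solve_M is a hypothetical stand-in for back-substitution with the computed \(LDL^T\) factors.

```python
import numpy as np

def bordered_solve(solve_M, c, b, tau, kappa, rd_hat, rp, rg_hat):
    """Recover (dx, dy, dtau) from the reduced bordered system using two
    solves with M = [[-W^T W, A^T], [A, 0]]. Following the notation of
    the text, rd_hat = r_d + W^T r_xs and rg_hat = r_g + r_taukappa/tau."""
    n = len(c)
    u = solve_M(np.concatenate([-rd_hat, rp]))   # M u = [-rd_hat; rp]
    v = solve_M(np.concatenate([c, b]))          # M v = [c; b]
    ux, uy = u[:n], u[n:]
    vx, vy = v[:n], v[n:]
    # [dx; dy] = u + dtau * v; the last row
    # -c'dx + b'dy + (kappa/tau)*dtau = rg_hat then fixes dtau:
    dtau = (rg_hat - (-c @ ux + b @ uy)) / (kappa / tau + (-c @ vx + b @ vy))
    return ux + dtau * vx, uy + dtau * vy, dtau
```

Since both right-hand sides reuse the same factorization, the extra cost over a single solve is modest.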

The algorithm is implemented in the C programming language, and the Intel MKL BLAS library is used for small dense matrix operations; the remaining portions of the code are developed internally, and the most computationally expensive parts have been parallelized.

7.3 The termination criteria

We next discuss the termination criteria employed in MOSEK. Let \((\varepsilon _p, \varepsilon _d, \varepsilon _g, \varepsilon _i)>0\) be given tolerance levels for the algorithm, and denote by \(({\hat{x}}^k,{\hat{s}}^k,y^k,\tau ^k,\kappa ^k)\in \mathbf{{int}}({{\mathcal {D}}})\) the kth interior-point iterate. Consider the metrics

$$\begin{aligned} \rho _p^k&:= \min \left\{ \rho \mid \, \left\| A \frac{{\hat{x}}^k}{\tau ^k} - b \right\| _\infty \le \rho \varepsilon _p (1+\left\| b \right\| _\infty ) \right\} , \\ \rho _d^k&:= \min \left\{ \rho \mid \, \left\| A^T \frac{y^k}{\tau ^k} + \frac{{\hat{s}}^k}{\tau ^k}- c \right\| _\infty \le \rho \varepsilon _d (1+\left\| c \right\| _\infty ) \right\} , \end{aligned}$$

and

$$\begin{aligned} \rho _g^k&:= \min \left\{ \rho \mid \, \min \left( \frac{{\langle {\hat{x}}^k, {\hat{s}}^k \rangle }}{ (\tau ^k)^2 }, \left| \frac{{\langle c, {\hat{x}}^k \rangle }}{\tau ^k} - \frac{ {\langle b, y^k \rangle }}{\tau ^k} \right| \right) \right. \\&\left. \le \rho \varepsilon _g \max \left( 1, \frac{ \min \left( \left| {\langle c, {\hat{x}}^k \rangle } \right| , \left| {\langle b, y^k \rangle } \right| \right) }{\tau ^k} \right) \right\} . \end{aligned}$$

If

$$\begin{aligned} \max (\rho _p^k, \rho _d^k, \rho _g^k) \le 1 \end{aligned}$$

then

$$\begin{aligned} \begin{array}{rcl} \left\| A \frac{{\hat{x}}^k}{\tau ^k} - b \right\| _\infty &{} \le &{} \varepsilon _p (1+\left\| b \right\| _\infty ), \\ \left\| A^T \frac{y^k}{\tau ^k} + \frac{{\hat{s}}^k}{\tau ^k}- c \right\| _\infty &{} \le &{} \varepsilon _d (1+\left\| c \right\| _\infty ), \\ \min \left( \frac{{\langle {\hat{x}}^k, {\hat{s}}^k \rangle }}{ (\tau ^k)^2 }, \left| \frac{{\langle c, {\hat{x}}^k \rangle }}{\tau ^k} - \frac{{\langle b, y^k \rangle }}{\tau ^k} \right| \right) &{} \le &{} \varepsilon _g \max \left( 1, \frac{ \min \left( \left| {\langle c, {\hat{x}}^k \rangle } \right| , \left| {\langle b, y^k \rangle } \right| \right) }{\tau ^k} \right) , \\ \end{array} \end{aligned}$$

and hence \(({\hat{x}}^k,y^k,{\hat{s}}^k)/\tau ^k\) is an approximately primal and dual feasible solution with small duality gap. Clearly, the quality of the approximation depends on the problem and the specified tolerances \((\varepsilon _p, \varepsilon _d, \varepsilon _g, \varepsilon _i)\). Thus \(\rho _p^k\) and \(\rho _d^k\) measure how far the kth iterate is from being approximately primal and dual feasible, respectively, and \(\rho _g^k\) measures how far the kth iterate is from having a zero duality gap.
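
The metrics translate directly into code. The following sketch (a direct transcription of the definitions above, not MOSEK's implementation) evaluates them for a dense iterate.

```python
import numpy as np

def termination_metrics(A, b, c, x_hat, s_hat, y, tau, eps_p, eps_d, eps_g):
    """Evaluate rho_p, rho_d and rho_g for the kth iterate, following the
    definitions in the text."""
    def inf_norm(v):
        return np.linalg.norm(v, np.inf)

    rho_p = inf_norm(A @ (x_hat / tau) - b) / (eps_p * (1 + inf_norm(b)))
    rho_d = inf_norm(A.T @ (y / tau) + s_hat / tau - c) / (eps_d * (1 + inf_norm(c)))
    gap = min((x_hat @ s_hat) / tau**2, abs((c @ x_hat) / tau - (b @ y) / tau))
    scale = max(1.0, min(abs(c @ x_hat), abs(b @ y)) / tau)
    rho_g = gap / (eps_g * scale)
    return rho_p, rho_d, rho_g

# The iterate is accepted as (approximately) optimal when
# max(rho_p, rho_d, rho_g) <= 1.
```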

Similarly, define infeasibility metrics

$$\begin{aligned} \rho _{pi}^k&:= \min \left\{ \rho \mid \, \left\| A^T y^k + {\hat{s}}^k \right\| _\infty \le \rho \varepsilon _i {\langle b, y^k \rangle },\, {\langle b, y^k \rangle } > 0 \right\} , \\ \rho _{di}^k&:= \min \left\{ \rho \mid \, \left\| A {\hat{x}}^k \right\| _\infty \le - \rho \varepsilon _i {\langle c, {\hat{x}}^k \rangle },\, {\langle c, {\hat{x}}^k \rangle } < 0 \right\} , \end{aligned}$$

and

$$\begin{aligned} \rho _{ip}^k := \min \left\{ \rho \mid \, \left\| \begin{array}{c} A^T y^k + {\hat{s}}^k \\ A {\hat{x}}^k \end{array} \right\| _\infty \le \rho \varepsilon _i \left\| \begin{array}{c} y^k \\ {\hat{s}}^k \\ {\hat{x}}^k \end{array} \right\| _\infty ,\, \left\| \begin{array}{c} y^k \\ {\hat{s}}^k \\ {\hat{x}}^k \end{array} \right\| _\infty >0 \right\} . \end{aligned}$$

If \(\rho _{pi}^k \le 1\) then

$$\begin{aligned} \left\| A^T y^k + {\hat{s}}^k \right\| _\infty \le \varepsilon _i {\langle b, y^k \rangle },\quad {\langle b, y^k \rangle } > 0. \end{aligned}$$

Thus, for

$$\begin{aligned} {\bar{y}} := \frac{y^k}{ {\langle b, y^k \rangle } }, \quad {\bar{s}} := \frac{{\hat{s}}^k}{ {\langle b, y^k \rangle } } \end{aligned}$$

we have

$$\begin{aligned} b^T {\bar{y}} \ge 1, \quad \left\| A^T {\bar{y}} + {\bar{s}} \right\| \le \varepsilon _i, \quad {\bar{s}} \in K^* \end{aligned}$$

i.e., \(({\bar{y}},{\bar{s}})\) is an approximate certificate of primal infeasibility. Similarly, if \(\rho _{di}^k \le 1\) we have an approximate certificate of dual infeasibility. Finally, assume that \(\rho _{ip}^k \le 1\). Then

$$\begin{aligned} \left\| \begin{array}{c} A^T y^k + {\hat{s}}^k \\ A {\hat{x}}^k \end{array} \right\| _\infty \le \varepsilon _i \left\| \begin{array}{c} y^k \\ {\hat{s}}^k \\ {\hat{x}}^k \end{array} \right\| _\infty ,\, \left\| \begin{array}{c} y^k \\ {\hat{s}}^k \\ {\hat{x}}^k \end{array} \right\| _\infty >0 \end{aligned}$$

is an approximate certificate of ill-posedness. For example, if \(\left\| y^k \right\| _\infty \gg 0\) then a tiny perturbation in b will make the problem infeasible. Hence, the problem is by definition unstable.
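
The normalization argument for primal infeasibility amounts to a simple check and rescaling, sketched below for illustration; membership \({\hat{s}}^k \in K^*\) is assumed to be maintained by the interior-point iterates.

```python
import numpy as np

def primal_infeasibility_certificate(A, b, y, s_hat, eps_i):
    """Return the normalized certificate (y_bar, s_bar) when the test
    rho_pi <= 1 passes, i.e., <b, y> > 0 and
    ||A^T y + s_hat||_inf <= eps_i * <b, y>; otherwise None."""
    by = b @ y
    if by > 0 and np.linalg.norm(A.T @ y + s_hat, np.inf) <= eps_i * by:
        # b^T y_bar >= 1 and ||A^T y_bar + s_bar||_inf <= eps_i by construction
        return y / by, s_hat / by
    return None
```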

8 Numerical results

We investigate the numerical performance of our implementation on a selection of exponential-cone problems from the Conic Benchmark Library (CBLIB) [29], as well as a selection of customer-provided problems. Some of these problems have integer variables, in which case we solve their continuous relaxations, i.e., we ignore the integrality constraints. In the study we compare the performance of MOSEK 9.2 with and without the proposed corrector. In the case without a corrector we also disable the standard Mehrotra corrector affecting linear and quadratic cones; otherwise the residuals would not decrease at the same rate. We also compare our implementation with the open-source solver ECOS [6], which implements the algorithm by Serrano [26]. Since ECOS supports only linear, quadratic and exponential cones, we limit the test set to instances with combinations of those cones, excluding the CBLIB instances that combine exponential and semidefinite cones.

Fig. 2 shows histograms of the number of iterations required to solve the problems for ECOS and MOSEK (with and without the proposed corrector). The figure shows a substantial advantage of the proposed corrector over a wide selection of test problems, both in terms of stability and the required number of iterations.

Fig. 2

Histograms of solver iterations for 326 test problems with exponential cones. MOSEK (including corrector) successfully solved all instances and generally required the fewest iterations. MOSEK N/C (no corrector) solved 300 instances, and ECOS solved 201 instances. The number of iterations was limited to 400 in all solvers

9 Conclusions

Based on previous work by Tunçel we have presented a generalization of the Nesterov-Todd algorithm for symmetric conic optimization to handle the nonsymmetric exponential cone. Our main contribution is a new Mehrotra-like corrector search direction for the nonsymmetric case, which improves practical performance significantly. Moreover, we presented a practical implementation with extensive computational results documenting the efficiency of the proposed algorithm. Indeed, the suggested algorithm is significantly faster and more robust than ECOS, the current state of the art in software for nonsymmetric conic optimization.

Possible future work includes establishing the complexity of the algorithm and applying it to other nonsymmetric cones, possibly of larger dimension. One such example is the nonsymmetric cone of semidefinite matrices with a chordal sparsity structure [32], which could extend primal-dual solvers like MOSEK with the ability to solve large sparse semidefinite programs.