A primal-dual interior-point algorithm for nonsymmetric exponential-cone optimization

A new primal-dual interior-point algorithm applicable to nonsymmetric conic optimization is proposed. It is a generalization of the famous algorithm suggested by Nesterov and Todd for the symmetric conic case, and uses primal-dual scalings for nonsymmetric cones proposed by Tunçel. We specialize Tunçel's primal-dual scalings for the important case of three-dimensional exponential cones, resulting in a practical algorithm with good numerical performance, on level with standard symmetric-cone (e.g., quadratic-cone) algorithms. A significant contribution of the paper is a novel higher-order search direction, similar in spirit to a Mehrotra corrector for symmetric-cone algorithms. To a large extent, the efficiency of our proposed algorithm can be attributed to this new corrector.


Introduction
In 1984 Karmarkar [11] presented an interior-point algorithm for linear optimization with polynomial complexity. This triggered the interior-point revolution, which gave rise to a vast amount of research on interior-point methods. A particularly important result was the analysis of so-called self-concordant barrier functions, which led to polynomial-time algorithms for linear optimization over a convex domain with a self-concordant barrier, provided that the barrier function can be evaluated in polynomial time. This was proved by Nesterov and Nemirovski [19], and as a consequence convex optimization problems with such barriers can be solved efficiently by interior-point methods, at least in theory.

Joachim Dahl, MOSEK ApS, Copenhagen, Denmark. joachim.dahl@mosek.com
However, numerical studies for linear optimization quickly demonstrated that primal-dual interior-point methods were superior in practice, which led researchers to generalize the primal-dual algorithm to general smooth convex problems. A major breakthrough in that direction is the seminal work by Nesterov and Todd (NT) [20,21], who generalized the primal-dual algorithm for linear optimization to self-scaled cones, with the same complexity bound as in the linear case. Güler [8] later showed that the self-scaled cones correspond to the class of symmetric cones, which has since become the most commonly used term. The good theoretical performance of primal-dual methods for symmetric cones has since been confirmed computationally, e.g., by [2,28].
The class of symmetric cones has been completely characterized and includes five different cones, of which the three most interesting are the nonnegative orthant, the quadratic cone, and the cone of symmetric positive semidefinite matrices, together with products of those three cones. Although working exclusively with the symmetric cones is a limitation, they cover a great number of important applications; see, e.g., Nemirovski [16] for an excellent survey of conic optimization over symmetric cones.
Some convex sets with a symmetric cone representation are more naturally characterized using nonsymmetric cones, e.g., semidefinite matrices with chordal sparsity patterns; see [32] for an extensive survey. Thus algorithmic advancements for handling nonsymmetric cones directly could hopefully lead to both simpler modeling and reductions in computational complexity. Many other important convex sets cannot readily be modeled using symmetric cones, e.g., convex sets involving exponentials or logarithms, for which no representation using symmetric cones is known. In terms of practical importance, the three-dimensional power and exponential cones are perhaps the most important nonsymmetric cones; Lubin et al. [12] showed how all instances in a large benchmark library can be modeled using the three symmetric cones as well as the three-dimensional power and exponential cones.
Generalizing methods from symmetric to nonsymmetric cones is not straightforward, however. In [22] Nesterov et al. suggest a long-step algorithm using both the primal and dual barriers, effectively doubling the size of the linear system solved at each iteration. For small-dimensional cones (such as the exponential cone) this overhead might be acceptable.
More recently Nesterov [18] proved the existence of an NT-like primal-dual scaling in the vicinity of the central path, leading to an algorithm that uses only a single barrier but is restricted to following the central path closely. Compared to [22] this method has the main advantage of reducing the size of the linear system to that of the algorithms for symmetric cones; a similar advantage is shared by algorithms using explicit primal-dual scalings, e.g., the scalings by Tunçel [30] considered later. At each iteration Nesterov's method [18] has a centering phase, which brings the current iterate close to the central path, followed by one affine step which brings the iterate closer to the optimum. From a practical perspective this is a significant drawback of the method, since the centering phase is computationally costly.
Hence, the centering and affine steps should be combined as in the symmetric cone case. Also, both algorithms [18,22] are feasible methods, i.e., they require either a known strictly feasible starting point or some sort of phase-I method to obtain one. Skajaa and Ye [27] extended Nesterov's method [18] with a homogeneous model, which also simplifies infeasibility detection. In practice, however, the method of Skajaa and Ye is not competitive with other methods [3] due to the centering steps.
In more recent work Serrano and Ye [26] improve the algorithm in [27] so that explicit centering steps are not needed; instead, the iterates are restricted to a vicinity of the central path. This method has been implemented as part of the ECOS solver [6] and will be used for comparison in Sect. 8.
A different approach to nonsymmetric conic optimization was proposed by Tunçel [30], who extends the concepts of the NT algorithm in a more direct fashion. Nesterov and Todd [20,21] showed existence of a primal-dual scaling defined by a single scaling point w satisfying the two secant equations s = F''(w)x and F'(x) = F''(w)F*'(s), where F', F'' denote the first- and second-order derivatives of the barrier and F*' the derivative of the conjugate barrier. Furthermore, the scaling F''(w) is shown to be bounded, which is a key property for establishing polynomial-time complexity of the NT algorithm. Tunçel later showed in [30] that such a scaling point w exists for any convex cone K, but satisfying only one secant equation s = F''(w)x, and if the barrier has negative curvature then w is unique [23]. To enforce both secant equations in the nonsymmetric case, Tunçel [30] considered a sequence of low-rank quasi-Newton updates to a general positive definite matrix.
These ideas were further explored by Myklebust and Tunçel [15] and also in [14], and they form the basis of our proposed algorithm. Following this line of work, the essential difference from a symmetric NT algorithm is the computation of a general positive definite scaling matrix satisfying the same secant equations, but without relying on a given scaling point w. Furthermore, such scaling matrices should be bounded to ensure polynomial-time complexity.
For three-dimensional cones these scaling matrices are particularly simple and characterized by a single scalar, as shown in Sect. 5. Uniform boundedness of the scaling matrices is not established in this work, but we comment on an efficient method for computing the most bounded scaling matrix using the formulations developed in Sect. 5.
It is also possible to develop algorithms for convex optimization specified in functional form. This has been done by several authors, for example by [3,4,7,9], who all solve the KKT optimality conditions of the problem in functional form. These algorithms all require a sort of merit or penalty function, which often requires problem-specific parameter tuning to work well in practice. Another strain of research is the work of Nemirovski and Tunçel [17] and very recently Karimi and Tunçel [10], who advocate a nonlinear convex formulation instead of a conic formulation, explicitly using self-concordant barriers for the convex domains.
From a theoretical point of view the different algorithms for nonsymmetric cones (including the functional formulations) all share the same best-known complexity bounds as their symmetric counterparts. Whether these methods are competitive in practice with algorithms for symmetric cones is still an unanswered question, though.
The algorithm we consider herein uses the scaling matrices by Tunçel [30], resulting in an algorithm that is similar to the symmetric counterpart; the linear system solved at each iteration is very similar, and both the residuals and the complementarity gap decrease at the same rate. The algorithm is a natural extension of the Nesterov-Todd algorithm implemented for symmetric cones in, e.g., SeDuMi [28] and MOSEK [3].
It is well known that the Mehrotra predictor-corrector idea [13] leads to vastly improved computational performance in the symmetric cone case. One of our main contributions is a new corrector for nonsymmetric cones, closely tied to Tunçel's primal-dual scalings. It is derived from a second-order approximation of the centrality condition s_μ = −μF'(x_μ) and thus involves third-order directional derivatives. The proposed corrector loosely follows the central path, without restricting the size of the neighborhood, thereby allowing the algorithm to take longer steps, and it shares similarities with the standard Mehrotra corrector; in particular they coincide if F(x) = −Σ_i log x_i is the standard barrier for linear optimization.
We demonstrate numerically that the proposed corrector offers a substantial and consistent reduction in the number of iterations required to solve the problems in all our numerical studies. Skajaa and Ye [27] suggest a different corrector obtained by characterizing the full central path as a differential equation, solved by a Runge-Kutta method. An immediate drawback of this Runge-Kutta method is that it requires additional factorizations of the full KKT system, which significantly adds to the overall solution time.
The remaining paper is structured as follows. We define basic properties of the exponential cone in Sect. 2, and we discuss the homogeneous model, the central path and related metrics in Sect. 3. In Sect. 4 we discuss search-directions assuming a primal-dual scaling is known, and we derive our new corrector and provide a simple numerical example that illustrates how the corrector algorithm makes more steady progress. In Sect. 5 we discuss new characterizations of the primal-dual scalings from [15,30], which reduce to univariate characterizations for three-dimensional cones.
In Sect. 6 we give a collected overview of the suggested path-following algorithm. This is followed by a discussion of some implementation details in Sect. 7. Next, in Sect. 8 we present numerical results on a moderately large collection of exponential-cone problems. We conclude in Sect. 9, and in the appendix we give details of the first-, second- and third-order derivatives of the barrier for the exponential cone.

Preliminaries
In this section we list well-known properties of self-concordant and logarithmically homogeneous barriers, which are used in the remainder of the paper. The proofs can be found in references such as [20,21]. We consider a pair of primal and dual linear conic problems

(P) minimize ⟨c, x⟩ subject to Ax = b, x ∈ K,
(D) maximize ⟨b, y⟩ subject to A^T y + s = c, s ∈ K*,

where y ∈ R^m, s ∈ R^n and K ⊂ R^n is a proper cone, i.e., a pointed, closed, convex cone with non-empty interior. We assume throughout the paper that A has full rank, i.e., the rows of A are linearly independent. The dual cone is

K* := {s ∈ R^n : ⟨s, x⟩ ≥ 0, ∀x ∈ K}.

If K is proper then K* is also proper. A cone K is called self-dual if there is a positive definite map between K and K*. A function F : int(K) → R is a ϑ-logarithmically homogeneous self-concordant barrier (LHSCB) for K if

|F'''(x)[u, u, u]| ≤ 2 (F''(x)[u, u])^{3/2} and F(tx) = F(x) − ϑ log t

hold for all x ∈ int(K), all u ∈ R^n and all t > 0. For a pointed cone ϑ ≥ 1. We will refer to an LHSCB simply as a self-concordant barrier. If F1 and F2 are ϑ1- and ϑ2-self-concordant barriers for K1 and K2, respectively, then F1(x1) + F2(x2) is a (ϑ1 + ϑ2)-self-concordant barrier for K1 × K2. Some straightforward consequences of the homogeneity property include

F'(tx) = t^{-1} F'(x), F''(tx) = t^{-2} F''(x), F''(x)x = −F'(x), ⟨F'(x), x⟩ = −ϑ.

If F is a ϑ-self-concordant barrier for K, then the Fenchel conjugate

F*(s) := sup_x {−⟨s, x⟩ − F(x)} (1)

is a ϑ-self-concordant barrier for K*. For a ϑ-self-concordant barrier, the so-called Dikin ellipsoid E(x, r) := {u : ⟨F''(x)(u − x), u − x⟩ ≤ r^2} is included in the cone for r < 1, i.e., E(x, r) ⊂ int(K) for r < 1, and F is almost quadratic inside this ellipsoid,

(1 − r)^2 F''(x) ⪯ F''(u) ⪯ (1 − r)^{-2} F''(x), ∀u ∈ E(x, r).

Self-scaled cones are equivalent to symmetric cones and they satisfy the stronger long-step Hessian estimation property

F''(x − αu) ⪯ (1 − ασ_x(u))^{-2} F''(x), 0 ≤ α < σ_x(u)^{-1},

where σ_x(u) := (sup{α : x − αu ∈ K})^{-1} denotes the reciprocal distance to the boundary in the direction −u. Many properties of symmetric cones follow from the fact that the barriers have negative curvature, F'''(x)[u] ⪯ 0 for all x ∈ int(K) and all u ∈ K. An interesting property proven in [23] is that if both the primal and dual barriers have negative curvature then the cone is symmetric.
In addition to the three symmetric cones (i.e., the nonnegative orthant, the quadratic cone and the cone of symmetric positive semidefinite matrices) we mainly consider in the present work the nonsymmetric exponential cone studied by Chares [5],

K_exp := cl{x ∈ R^3 : x1 ≥ x2 e^{x3/x2}, x2 > 0},

with a 3-self-concordant barrier,

F(x) = −log(x2 log(x1/x2) − x3) − log x1 − log x2. (2)

The dual exponential cone is

K*_exp = cl{s ∈ R^3 : s1 ≥ −s3 e^{s2/s3 − 1}, s3 < 0}.

For the exponential cone the conjugate barrier F*(s) or its derivatives cannot be evaluated in closed form, but F*(s) can be evaluated numerically to high accuracy (e.g., with a damped Newton's method) using the definition (1), i.e., if x := argmax_x {−⟨s, x⟩ − F(x)} then

F*(s) = −⟨s, x⟩ − F(x), F*'(s) = −x, F*''(s) = F''(x)^{-1}.

We conclude this survey of introductory material by listing some of the many convex sets that can be represented using the exponential cone, or a combination of exponential cones and symmetric cones. The epigraph t ≥ e^x can be modeled as (t, 1, x) ∈ K_exp, and similarly for the hypograph of the logarithm, t ≤ log x ⟺ (x, 1, t) ∈ K_exp. The hypograph of the entropy function, t ≤ −x log x, is equivalent to (1, x, t) ∈ K_exp, and similarly for relative entropy, t ≥ x log(x/y) ⟺ (y, x, −t) ∈ K_exp. The softplus function log(1 + e^x) can be thought of as a smooth approximation of max{0, x}. Its epigraph t ≥ log(1 + e^x) can be modeled as

(u1, 1, x − t) ∈ K_exp, (u2, 1, −t) ∈ K_exp, u1 + u2 ≤ 1.

The epigraph of the logarithm of a sum of exponentials, t ≥ log(Σ_i e^{x_i}), can be modeled as

(u_i, 1, x_i − t) ∈ K_exp, i = 1, ..., n, Σ_i u_i ≤ 1.

These examples all have auxiliary variables and constraints in their conic representations, which might suggest that an algorithm working directly with a barrier of the convex domain (e.g., [10]) is more efficient. However, a conic formulation has the advantage of nicer conic duality, and it is easy to exploit the special (sparse) structure from the additional constraints and variables in the linear algebra implementation, thereby eliminating the overhead of a conic formulation.
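As a sanity check on the representations above, the following short sketch verifies a few of the memberships numerically. The helper name `in_Kexp` and the tolerance handling are our own; boundary rays with x2 = 0 are ignored for simplicity.

```python
import math

def in_Kexp(x1, x2, x3, tol=1e-9):
    # Membership test for K_exp = cl{x : x1 >= x2*exp(x3/x2), x2 > 0},
    # ignoring the closure rays with x2 = 0.
    return x2 > 0 and x1 >= x2 * math.exp(x3 / x2) - tol

# epigraph of exp: t >= e^x  <=>  (t, 1, x) in K_exp
assert in_Kexp(math.e, 1.0, 1.0)

# hypograph of entropy: t <= -x log x  <=>  (1, x, t) in K_exp
x = 0.3
assert in_Kexp(1.0, x, -x * math.log(x))

# softplus: t >= log(1 + e^x) via u1 + u2 <= 1 with
# (u1, 1, x - t) in K_exp and (u2, 1, -t) in K_exp
xv = 0.7
t = math.log(1.0 + math.exp(xv))
u1, u2 = math.exp(xv - t), math.exp(-t)
assert u1 + u2 <= 1.0 + 1e-9
assert in_Kexp(u1, 1.0, xv - t) and in_Kexp(u2, 1.0, -t)
```

Note that all three checks hit the cone boundary exactly, which is why a small tolerance is needed in the membership test.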

The homogeneous model and central path
In the simplified homogeneous model we embed the KKT conditions for (P) and (D) into the homogeneous self-dual model

A x̂ − b τ = 0,
−A^T y + c τ − ŝ = 0,
⟨b, y⟩ − ⟨c, x̂⟩ − κ = 0,
x̂ ∈ K, ŝ ∈ K*, τ ≥ 0, κ ≥ 0.

We denote the primal-dual variables by (x̂, ŝ) to distinguish them from the augmented variables defined next. Let

x := (x̂, τ), s := (ŝ, κ), z := (x, s, y).

We then have ⟨x, s⟩ = ⟨x̂, ŝ⟩ + τκ, where K := K_1 × · · · × K_{k+1}, with K_{k+1} := R_+ accounting for τ, has a barrier

F(x) := Σ_{i=1}^{k} F_i(x̂_i) − log τ

with barrier parameter ϑ := Σ_i ϑ_i + 1. The KKT conditions can then be expressed succinctly as G(z) = 0 with z ∈ D, where G collects the three linear residuals above and D := K × K* × R^m. Given an initial z_0 ∈ int(D) we consider a central path z_μ as the solution to

A x̂_μ − b τ_μ = μ (A x̂_0 − b τ_0),
−A^T y_μ + c τ_μ − ŝ_μ = μ (−A^T y_0 + c τ_0 − ŝ_0),
⟨b, y_μ⟩ − ⟨c, x̂_μ⟩ − κ_μ = μ (⟨b, y_0⟩ − ⟨c, x̂_0⟩ − κ_0), (4)
s_μ = −μ F'(x_μ), (5)

parametrized by μ ∈ (0, 1], and on the central path we have ⟨x_μ, s_μ⟩ = −μ⟨F'(x_μ), x_μ⟩ = μϑ. The following lemma gives an equivalent variational characterization of the central path. We omit the proof, which follows from the optimality conditions for minimizing Ψ(z). In [22] the central path is defined from the variational characterization in Lemma 1, and they prove that the definition in (4)-(5) is equivalent. From the variational characterization z_μ is well-defined, and lim_{μ→0} z_μ = z* satisfies (see, e.g., [22]):

1. ⟨x*, s*⟩ = 0.
2. If τ* > 0 then x̂*/τ* is an optimal solution for (P) and (y*, ŝ*)/τ* is an optimal solution for (D).
In the following neighborhood definitions and subsequent algorithms we consider iterates (x, s, y) that are generally not on the central path, and we define

μ := μ(x, s) := ⟨x, s⟩/ϑ.

A neighborhood used by Skajaa and Ye [27] is then

N_SY(β) := {z ∈ int(D) : ‖s + μF'(x)‖_{F''(x)^{-1}} ≤ βμ}, (6)

which characterizes the central path for β = 0. We can think of (6) as a generalization of the standard two-norm neighborhood ‖XSe − μe‖_2 ≤ βμ from linear optimization [33]. A different neighborhood is due to Nesterov and Todd [21]. We define shadow iterates (following [30])

x̃ := −F*'(s), s̃ := −F'(x), (7)

and μ̃ := ⟨x̃, s̃⟩/ϑ for an iterate (x, s, y) ∈ D. Nesterov and Todd [21] then showed that μμ̃ ≥ 1, with equality only on the central path. This leads to a different neighborhood βμμ̃ ≤ 1 for β ∈ (0; 1], or equivalently μμ̃ ≤ 1/β. This is satisfied if βμμ̃_i ≤ 1 holds for each cone, where μ̃_i := ⟨x̃_i, s̃_i⟩/ϑ_i, leading to another neighborhood definition

N(β) := {z ∈ int(D) : βμ(x, s) μ̃_i(x, s) ≤ 1, i = 1, ..., k + 1},

which (in contrast to (6)) characterizes the central path for β = 1. We use the neighborhood N(β), which can be seen as a generalization of the one-sided ∞-norm neighborhood x_i s_i ≥ βμ from linear optimization.
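For the nonnegative orthant the inequality μμ̃ ≥ 1 is simply the arithmetic-harmonic mean inequality applied to the products x_i s_i. The small sketch below (our own illustration, with μ and μ̃ specialized to the log barrier) verifies this numerically:

```python
import random

def mu_and_shadow_mu(x, s):
    # For K = R^n_+ with barrier F(x) = -sum(log x_i): shadow iterates are
    # xtilde = 1/s, stilde = 1/x (elementwise) and theta = n, so
    # mu = <x,s>/n and mutilde = (1/n) * sum(1/(x_i s_i)).
    n = len(x)
    mu = sum(xi * si for xi, si in zip(x, s)) / n
    mut = sum(1.0 / (xi * si) for xi, si in zip(x, s)) / n
    return mu, mut

random.seed(0)
for _ in range(100):
    x = [random.uniform(0.1, 10) for _ in range(5)]
    s = [random.uniform(0.1, 10) for _ in range(5)]
    mu, mut = mu_and_shadow_mu(x, s)
    # AM-HM inequality; equality holds iff all x_i*s_i are equal,
    # i.e., exactly on the central path
    assert mu * mut >= 1.0 - 1e-12
```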
Both the central path and the neighborhood depend on the initial values. A simple choice is y_0 = 0 and

x_0 = s_0 = −F'(x_0),

which are the optimality conditions for minimizing F(x) + (1/2)‖x‖^2. If K_i = K_exp this can be solved off-line using a backtracking Newton's method. For the symmetric cones and the three-dimensional power cone such a central starting point can be found analytically. Then ⟨x_0, s_0⟩ = −⟨F'(x_0), x_0⟩ = ϑ, i.e., μ(z_0) = 1.
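A minimal sketch of the off-line computation, assuming Chares' barrier for K_exp, is given below. The helper names and the finite-difference Jacobian are our own choices; the damping only safeguards feasibility with respect to the cone interior.

```python
import numpy as np

def grad_F(x):
    # Gradient of Chares' barrier F(x) = -log(x2*log(x1/x2) - x3) - log x1 - log x2
    x1, x2, x3 = x
    psi = x2 * np.log(x1 / x2) - x3
    return np.array([-(x2 / x1) / psi - 1 / x1,
                     -(np.log(x1 / x2) - 1) / psi - 1 / x2,
                     1 / psi])

def interior(x):
    return x[0] > 0 and x[1] > 0 and x[1] * np.log(x[0] / x[1]) - x[2] > 0

def central_point(x=np.array([1.0, 0.5, -0.5]), tol=1e-10):
    # Damped Newton for g(x) = F'(x) + x = 0 (finite-difference Jacobian),
    # i.e. the central starting point with x0 = s0 = -F'(x0).
    for _ in range(50):
        g = grad_F(x) + x
        if np.max(np.abs(g)) < tol:
            break
        J = np.empty((3, 3))
        for j in range(3):
            e = np.zeros(3); e[j] = 1e-7
            J[:, j] = (grad_F(x + e) + (x + e) - g) / 1e-7
        d = np.linalg.solve(J, -g)
        a = 1.0
        while not interior(x + a * d):   # backtrack into the cone interior
            a *= 0.5
        x = x + a * d
    return x

x0 = central_point()
assert np.max(np.abs(grad_F(x0) + x0)) < 1e-8
```

The minimized function F(x) + ‖x‖²/2 is strongly convex on the cone interior, so the fixed point is unique and Newton's method converges quickly from a reasonable interior starting guess.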

Search-directions using a primal-dual scaling
In this section we define search directions, assuming a primal-dual scaling is known; how to compute such scalings is discussed in Sect. 5. We consider an iterate (x, s, y) ∈ int(D) and nonsingular primal-dual scalings W_i satisfying the double secant equations

W_i^T W_i x_i = s_i, W_i^T W_i x̃_i = s̃_i, (8)

where x̃_i and s̃_i are the shadow iterates defined in (7). Let W := diag(W_1, ..., W_{k+1}). We can then express the primal-dual scaling succinctly as

W^T W x = s, W^T W x̃ = s̃.

Whenever we encounter a scaling W in the remainder of this paper, it is assumed that W satisfies the double secant equations (8) for the current iterate.
To linearize the centrality condition s = −μF'(x) we consider the scaled variable

v := Wx = W^{-T}s, (9)

with a linearization given by

WΔx + W^{-T}Δs. (10)

On the central path, where s = μs̃, we can express (9) as v = μṽ with ṽ := Wx̃ = W^{-T}s̃. If (x, s) is not on the central path then (10) is an approximate linearization of the symmetric centrality condition v = μṽ, with a quality determined by the distance ‖W^TW − μF''(x)‖. Thus, the assumption that W^TW ≈ μF''(x) is important for our proposed algorithm, including the corrector studied later in this section. We next define the different search-directions used in the proposed algorithm. The affine search-direction is the solution to

A Δx̂_a − b Δτ_a = −(A x̂ − b τ),
−A^T Δy_a + c Δτ_a − Δŝ_a = −(−A^T y + c τ − ŝ),
⟨b, Δy_a⟩ − ⟨c, Δx̂_a⟩ − Δκ_a = −(⟨b, y⟩ − ⟨c, x̂⟩ − κ),
W^TW Δx_a + Δs_a = −s, (11)

and is characterized by the following lemma.
We next consider a higher-order corrector for nonsymmetric cones, with some similarities to a Mehrotra corrector for symmetric cones. We consider the first- and second-order derivatives of s_μ = −μF'(x_μ) with respect to μ,

ṡ_μ = −F'(x_μ) − μF''(x_μ)ẋ_μ, (13)
s̈_μ = −2F''(x_μ)ẋ_μ − μF'''(x_μ)[ẋ_μ]ẋ_μ − μF''(x_μ)ẍ_μ.

From (13) we have μF''(x_μ)ẋ_μ = −ṡ_μ − F'(x_μ), resulting in an expression for the third-order directional derivative,

F'''(x_μ)[ẋ_μ]x_μ = −2F''(x_μ)ẋ_μ, (15)

where (15) follows from the homogeneity property F''(tx) = t^{-2}F''(x), differentiated with respect to t. This results in an alternative expression for the second-order derivative of the centrality condition, i.e., s̈_μ expressed in terms of first-order information and a single third-order directional derivative. Assuming that W^TW ≈ μF''(x) and comparing (11) and (13) we interpret Δs_a = −μṡ_μ and Δx_a = −μẋ_μ, leading to the definition of our corrector term

η := −(1/2) F'''(x)[Δx_a]v, where F''(x)v = Δs_a. (16)

Thus a pure corrector search-direction Δz_c can be defined as the solution to the linear system with zero residual right-hand sides and

W^TW Δx_c + Δs_c = −η, (17)

and it satisfies properties given in the following lemma. We note that the τκ-component of (17) reduces to the familiar expression κΔτ_c + τΔκ_c = −Δτ_aΔκ_a. For a given centering parameter γ > 0 we define a combined search-direction as the solution to

A Δx̂ − b Δτ = −(1 − γ)(A x̂ − b τ),
−A^T Δy + c Δτ − Δŝ = −(1 − γ)(−A^T y + c τ − ŝ),
⟨b, Δy⟩ − ⟨c, Δx̂⟩ − Δκ = −(1 − γ)(⟨b, y⟩ − ⟨c, x̂⟩ − κ),
W^TW Δx + Δs = −s + γμs̃ − η, (18)

with properties given in the following lemma.
Proof From (18) and Lemma 3 we have that, by skew-symmetry, ⟨Δx, Δs⟩ = 0. The last part now follows.
The search-direction (18) forms the basis of our algorithm. For a given step-size α ∈ (0, 1] the residuals and the complementarity gap decrease at the same rate 1 − α(1 − γ). More explicitly, for μ_k := ⟨x_k, s_k⟩/ϑ, the residuals at the kth iteration are reduced by the factor ∏_{j<k} (1 − α_j(1 − γ_j)) relative to the initial residuals, which should be compared with the central path definition in Lemma 1. Similarly, the complementarity gap at the kth iteration is reduced by the same factor. This is in contrast with other methods [18,26,27], which do not decrease the complementarity gap at the same rate. Also, no explicit merit function as in [9] is required to ensure a balanced decrease of the residuals and the complementarity gap. Tunçel [30] showed polynomial complexity of an infeasible method (without a corrector) assuming boundedness of the scaling matrices. These results were further extended by Myklebust and Tunçel [15] to include an analysis for scaling matrices obtained by a BFGS scaling (such scalings are considered in the next section). Although we use slightly different scaling matrices, their analysis could be applied in a small neighborhood around the central path. Thus a short-step algorithm using the search directions herein would likely inherit good theoretical performance. It is also possible that future studies will prove the conjecture that the scaling matrices considered herein are bounded, which would simplify a complexity analysis.
To gain some insight into the corrector η we note that in the case of the nonnegative orthant it reduces to the familiar elementwise Mehrotra term, η_i = (Δx_a)_i (Δs_a)_i / x_i, and similarly for the semidefinite cone η can be written using the generalized product associated with the Euclidean Jordan algebra; see, e.g., [31] for a discussion of Euclidean Jordan algebras in a similar context as ours. For the Lorentz cone a similar closed-form expression holds. Let e_k denote the kth standard basis-vector (i.e., the vector with value 1 in position k and 0 elsewhere); then η can again be written using the notation of the generalized product [31]. We defer the derivation and implementation specific details for the exponential cone to the appendix.
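Assuming the corrector has the form η = −(1/2)F'''(x)[Δx_a]v with F''(x)v = Δs_a (sign and scaling conventions here are our own assumption), the reduction for the nonnegative orthant can be checked directly for the log barrier:

```python
def corrector_log_barrier(x, dx, ds):
    # For F(x) = -sum(log x_i):
    #   F''(x)  = diag(1/x_i^2)          =>  v = x^2 * ds (elementwise)
    #   F'''(x)[u] = diag(-2 u_i/x_i^3)
    # so eta = -1/2 * F'''(x)[dx] v reduces elementwise to dx_i*ds_i/x_i.
    v = [xi * xi * dsi for xi, dsi in zip(x, ds)]
    return [-0.5 * (-2.0 * ui / xi ** 3) * vi
            for xi, ui, vi in zip(x, dx, v)]

x, dx, ds = [2.0, 0.5, 1.0], [0.1, -0.2, 0.3], [0.4, 0.1, -0.6]
eta = corrector_log_barrier(x, dx, ds)
expected = [ui * dsi / xi for xi, ui, dsi in zip(x, dx, ds)]
assert all(abs(a - b) < 1e-12 for a, b in zip(eta, expected))
```

Scaled by x, this is exactly the classical Mehrotra term ΔX_aΔS_a e from linear optimization.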
As an illustration of the proposed corrector we consider a simple example, (19). In Table 1 we list the complementarity gap ⟨x_k, s_k⟩ for the iterates produced using the search direction (18) with and without the proposed corrector, using the scaling matrices defined in Sect. 5. In Fig. 1 we plot the same iterates x_k/τ projected onto the hyperplane x_1 + x_2 + x_3 = 1. Although some iterates appear close to the boundary they are all within the defined neighborhood, i.e., no cut-back is performed to stay within the neighborhood. We see how the corrector algorithm makes significantly more progress and thereby reduces the required number of iterations. A similar observation is made for a wide selection of test-problems in Sect. 8.

Primal-dual scalings
In this section we review key results for primal-dual scalings by Tunçel [30], we make connections to the theory of multiple secant-equation updates by Schnabel [25], and we present new formulations, which were used in [24] to compute optimally bounded scaling matrices specifically for three-dimensional cones.

Fig. 1 Central path, boundary, and iterates for example (19), projected onto the hyperplane x_1 + x_2 + x_3 = 1. The iterates using the corrector direction are plotted in the darker solid line, and iterates without the corrector are plotted in the lighter solid line.

Tunçel [30] defines a set of scalings T_1(x, s), i.e., positive definite scalings satisfying the double secant equations. Tunçel further defines a set of bounded scalings T_2(x, s, ξ) parametrized by ξ, where δ_F := [ϑ(μμ̃ − 1) + 1]/μ. In the case where ξ ∈ O(1) over all (x, s) ∈ int(K) × int(K*), Tunçel proved an iteration complexity bound of O(√ϑ log(1/ε)) for an infeasible-start primal-dual interior-point algorithm (coinciding with the best known complexity bound for interior-point methods). The parameter δ_F and the set T_2(x, s, ξ) are further addressed towards the end of this section in the context of finding optimally bounded scaling matrices.
A self-scaled cone has a unique Nesterov-Todd scaling point w satisfying the double secant equations

s = F''(w)x, F'(x) = F''(w)F*'(s).

Furthermore, F''(w) is bounded in a sense made precise in [20,21,30]. A barrier F(x) is said to have negative curvature if for all x ∈ int(K) and all u ∈ K we have F'''(x)[u] ⪯ 0. Thus for general nonsymmetric cones the essential question is how to define bounded scaling matrices satisfying both secant equations, without relying on a scaling point w. Tunçel [30] partly answers that question by deriving scalings T ∈ T_1(x, s) using BFGS update equations, resulting in a rank-4 update to a given positive definite matrix. It is still instructive, however, to review work by Schnabel on quasi-Newton methods with multiple secant equations. In particular, the following theorem from [25] is used repeatedly in the following discussion. As a consequence of the theorem we can write any such H_0 in factored form. One may verify that, given any Ω ≻ 0 satisfying ΩS = Y, the identity in (23) holds. One of the most popular quasi-Newton update rules is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) step in (23), where H_0 denotes an approximation of the Hessian.
It is well known (see, e.g., [25]) that the update (23) is the solution to for any Ω 0 satisfying Ω S = Y . In the following theorem we show how (23) can be computed using a sequence of updates, similar to quasi-Newton updates with a single secant equation.
Proof The proof is constructive and follows from a Cholesky factorization of Y_0^T S_0. Let L and V be given by the recursion (25), (26). In the proof we first show that Y_0^T S_0 = LL^T, and then that LV^T = Y_0^T, which in turn implies (25). We start by showing that the recursion (25), (26) is well-defined. Let Ψ_k be the principal submatrix obtained from the last p − k rows and columns of Y_k^T S_k, i.e., Ψ_k is the Schur complement of the first element of Ψ_{k−1} and is therefore positive definite. We next make a simplifying observation, namely that the first k columns of Y_k and S_k are zero. From (25), (26) we immediately have that Y_k e_k = S_k e_k = 0 for k = 1, ..., p, and this sparsity propagates to subsequent steps, i.e., Y_j e_k = S_j e_k = 0 for j > k.
We can now prove that LL^T = Y_0^T S_0; repeated use of (27) in (28) shows the claim. We finally show that LV^T = Y_0^T, which follows from repeated use of (25), leading to an expression for the BFGS update as a rank-4 update to H_0 (for a general H). Considering (24), we see that the BFGS update to H := μF''(x) has the desirable property of minimizing the deviation from the current scaling measured in a weighted norm; for simplicity we can assume that Ω = W^TW. With this choice of H the update curiously reduces to a rank-3 update. We conclude this section by considering the three-dimensional case, for which the expressions simplify significantly. It follows from Theorem 1 that any scaling (29) has the form (30), where S^T z = 0, Y^T r = 0 and ⟨r, z⟩ = 1, i.e., the scaling is essentially characterized by a single scalar t > 0. For simplicity, we assume ‖z‖ = 1. We compute z and r using cross-products,

z = (x ⊗ x̃)/‖x ⊗ x̃‖, r = (s ⊗ s̃)/⟨s ⊗ s̃, z⟩.

In the three-dimensional case we can devise a simple algorithm for finding scalings achieving the bound (22). For notational convenience we introduce Q := [r, S], which is nonsingular. We can then solve (31) using a simple bisection algorithm. Consider the monotonically decreasing function ξ_l(t) and the monotonically increasing function ξ_u(t). Given upper and lower bounds on t, the solution to (31) can then be found using bisection on t to solve ξ_l(t) = ξ_u(t). Such a bisection method was considered in [24], where it was conjectured that ξ ≈ 1.253 for the exponential cone. The BFGS scaling corresponds to a particular choice of t, given in (32). We have tried both the optimal scaling and the BFGS scaling for all numerical test problems in Sect. 8, without noticing a significant difference in the required number of iterations or the quality of the solution. For all the problems the largest observed bound on ξ was 1.72 for the BFGS scaling; for simplicity we only report results for the simpler BFGS scaling in the following.
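For reference, the classic single-secant BFGS update mentioned above can be sketched as follows; this is the textbook formula, not the paper's rank-4 double-secant construction:

```python
import random

def bfgs_update(H, s, y):
    # Classic BFGS update of a symmetric positive definite H so that the
    # updated matrix H+ satisfies the single secant equation H+ s = y:
    #   H+ = H - (Hs)(Hs)^T/(s^T H s) + y y^T/(y^T s)
    n = len(s)
    Hs = [sum(H[i][j] * s[j] for j in range(n)) for i in range(n)]
    sHs = sum(s[i] * Hs[i] for i in range(n))
    ys = sum(y[i] * s[i] for i in range(n))
    return [[H[i][j] - Hs[i] * Hs[j] / sHs + y[i] * y[j] / ys
             for j in range(n)] for i in range(n)]

random.seed(1)
n = 4
H = [[3.0 if i == j else 0.5 for j in range(n)] for i in range(n)]  # SPD
s = [random.uniform(0.5, 1.5) for _ in range(n)]
y = [random.uniform(0.5, 1.5) for _ in range(n)]  # here y.s > 0 holds
Hp = bfgs_update(H, s, y)
Hps = [sum(Hp[i][j] * s[j] for j in range(n)) for i in range(n)]
assert all(abs(a - b) < 1e-10 for a, b in zip(Hps, y))  # secant equation
```

The double-secant scalings of this section play the same role, but impose two secant equations simultaneously.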
In preliminary experiments we have observed similarly encouraging numerical results for higher-dimensional nonsymmetric cones using the BFGS scaling. The bisection algorithm is still valuable as a reference, however, and we hope that future studies will prove the conjecture, and possibly derive similar bounds for other three-dimensional cones (including a tighter bound than 4/3 for quadratic cones).

A primal-dual algorithm for exponential cone optimization
In this section we give a collected overview of the suggested path-following primal-dual algorithm. The algorithm is specialized for three-dimensional cones (in particular, the exponential cone) by using cross-products for computing the BFGS scalings. By computing the scaling matrices as a general rank-3 update (30) the algorithm is readily adapted to other nonsymmetric cones.
We fix β to a constant low value, for example β = 10^{-6}. The essential parts of the method are i) finding a starting point, ii) computing a search-direction and step-size, and iii) checking the stopping criteria for termination.
- Scaling matrices. Compute BFGS scaling matrices W_k, where z_k = (x_k ⊗ x̃_k)/‖x_k ⊗ x̃_k‖, r_k = (s_k ⊗ s̃_k)/⟨s_k ⊗ s̃_k, z_k⟩, and t_k is chosen from (32). Note that {z_k} and z are unrelated; the latter denotes the aggregation of all primal and dual variables.
- Search-direction and step-size. Compute an affine direction Δz_a as the solution to (11). From Δz_a we compute a corrector (16), similar to Mehrotra [13], where details on evaluating the derivatives are given in the appendix, see (34). We define a centering parameter γ as a function of α_a, where α_a is the step-size to the boundary, i.e., α_a := sup{α : z + αΔz_a ∈ D}, which we approximate using a bisection procedure. We then compute a combined centering-corrector search direction Δz as the solution to (18), and we update z := z + αΔz with the largest step α ∈ [0; 1) inside a neighborhood N(β) of the central path.
- Checking termination. Terminate if the updated iterate satisfies the termination criteria (given in Sect. 7.3), or else take a new step.
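The step-size bisection can be sketched for a single exponential cone as follows; the interior test and iteration count are our own choices:

```python
import math

def in_int_Kexp(x):
    x1, x2, x3 = x
    return x1 > 0 and x2 > 0 and x2 * math.log(x1 / x2) - x3 > 0

def step_to_boundary(x, dx, iters=60):
    # Bisection for the largest alpha in [0, 1] with x + alpha*dx in
    # int(K_exp); a sketch of the step-size procedure, assuming x is interior.
    if in_int_Kexp([xi + di for xi, di in zip(x, dx)]):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_int_Kexp([xi + mid * di for xi, di in zip(x, dx)]):
            lo = mid
        else:
            hi = mid
    return lo

x = [2.0, 1.0, 0.0]       # interior: 1*log(2) - 0 > 0
dx = [-3.0, 0.0, 0.0]     # leaves the interior at alpha = 1/3, where log(x1) = 0
alpha = step_to_boundary(x, dx)
assert in_int_Kexp([xi + alpha * di for xi, di in zip(x, dx)])
```

In the full algorithm the same bisection is applied over the product of all cones and the τ, κ variables, and the accepted step is additionally restricted to the neighborhood N(β).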

Implementation
MOSEK is a software package for solving large-scale linear and conic optimization problems. It can solve problems with a mixture of linear, quadratic and semidefinite cones, and the implementation is based on the homogeneous model, the NT search direction and a Mehrotra-like predictor-corrector algorithm [2]. Our implementation has been extended to handle the three-dimensional exponential cone using the algorithm above. We use the usual NT scaling for the symmetric cones and the Tunçel scalings for the nonsymmetric cones. Except for small differences in the linearization of the complementarity conditions, the symmetric and nonsymmetric cones are handled completely analogously. Our extension for nonsymmetric cones also includes the three-dimensional power cone, but this is not discussed further here.

Dualization, presolve and scaling
Occasionally it is worthwhile to dualize the problem before solving it, since it will make the linear algebra more efficient. Whether the primal or dual formulation is more efficient is not easily determined in advance. MOSEK makes a heuristic choice between the two forms, and the dualization is transparent to the user.
Furthermore, a presolve step is applied to the problem, which often leads to a significant reduction in computational complexity [1]. The presolve step removes obviously redundant constraints, tries to remove linear dependencies, etc. Finally, many optimization problems are badly scaled, so MOSEK rescales the problem before solving it. The rescaling is very simple, essentially normalizing the rows and columns of A.
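A toy version of such row/column equilibration might look as follows; this is only an illustration of the idea, not MOSEK's actual rescaling procedure:

```python
def equilibrate(A, passes=3):
    # Alternately scale the rows and columns of A toward unit infinity-norm.
    m, n = len(A), len(A[0])
    A = [row[:] for row in A]
    for _ in range(passes):
        for i in range(m):
            r = max(abs(v) for v in A[i]) or 1.0
            A[i] = [v / r for v in A[i]]
        for j in range(n):
            c = max(abs(A[i][j]) for i in range(m)) or 1.0
            for i in range(m):
                A[i][j] /= c
    return A

B = equilibrate([[1e6, 2.0], [3.0, 4e-6]])
assert all(max(abs(v) for v in row) <= 1.0 + 1e-12 for row in B)
```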

Computing the search direction
Usually the most expensive operation in each iteration of the primal-dual algorithm is to compute the search direction, i.e., solving the linear system (18), where W is a block-diagonal scaling matrix for a product of cones. Eliminating Δs and Δκ from the linearized centrality conditions results in a reduced bordered system, which can be solved in different ways. Given a (sparse) LDL^T factorization of the symmetric matrix, it is computationally cheap to compute the search direction. In the case that the factorization breaks down due to numerical issues we add regularization to the system, i.e., we modify the diagonal, which is common in interior-point methods. If the resulting search direction is inaccurate (i.e., the residuals are not decreased sufficiently) we use iterative refinement, which in most cases improves the accuracy of the search direction.
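The elimination can be illustrated on a dense toy system, dropping the (τ, κ) border for clarity and using a generic positive definite stand-in for W^TW:

```python
import numpy as np

# Linearized system:  A dx = r1,  A^T dy + ds = r2,  H dx + ds = r3,
# with H := W^T W.  Eliminating ds gives the symmetric indefinite system
#   [ -H    A^T ] [dx]   [ r2 - r3 ]
#   [  A     0  ] [dy] = [   r1    ]
rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
W = rng.standard_normal((n, n)) + 3 * np.eye(n)   # stand-in scaling, H is SPD
H = W.T @ W
r1, r2, r3 = rng.standard_normal(m), rng.standard_normal(n), rng.standard_normal(n)

K = np.block([[-H, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([r2 - r3, r1]))
dx, dy = sol[:n], sol[n:]
ds = r3 - H @ dx                                  # recover the eliminated ds

assert np.allclose(A @ dx, r1)
assert np.allclose(A.T @ dy + ds, r2)
assert np.allclose(H @ dx + ds, r3)
```

In the actual implementation the analogous sparse system is factorized with a sparse LDL^T factorization rather than a dense solve.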
We omit details of computing the LDL^T factorization, since it is fairly conventional and close to the approach discussed in [2]. The algorithm is implemented in the C programming language, and the Intel MKL BLAS library is used for small dense matrix operations.

Termination

Let ρ_pi, ρ_di and ρ_ip denote suitably normalized measures of primal infeasibility, dual infeasibility and ill-posedness for the kth iterate. If ρ_pi ≤ 1 then, for a suitable normalization (ȳ, s̄) of the current iterate, (ȳ, s̄) is an approximate certificate of primal infeasibility. Similarly, if ρ_di ≤ 1 we have an approximate certificate of dual infeasibility. Finally, if ρ_ip ≤ 1 then the iterate gives an approximate certificate of ill-posedness. For example, if ‖y_k‖_∞ ≫ 0 then a tiny perturbation of b will make the problem infeasible; hence the problem is by definition unstable.

Numerical results
We investigate the numerical performance of our implementation on a selection of exponential-cone problems from the Conic Benchmark Library (CBLIB) [29], as well as a selection of customer-provided problems. Some of those problems have integer variables, in which case we solve their continuous relaxations, i.e., we ignore the integrality constraints. In the study we compare the performance of MOSEK 9.2, both with and without the proposed corrector. In the case without a corrector we also disable the standard Mehrotra corrector affecting linear and quadratic cones; otherwise the residuals would not decrease at the same rate. We also compare our implementation with the open-source solver ECOS [6], which implements the algorithm by Serrano [26]. Since ECOS only supports linear, quadratic and exponential cones we limit the test-set to examples with combinations of those cones; there are also instances in CBLIB with both exponential and semidefinite cones, which are therefore excluded.

Conclusions
Based on previous work by Tunçel we have presented a generalization of the Nesterov-Todd algorithm for symmetric conic optimization to handle the nonsymmetric exponential cone. Our main contribution is a new Mehrotra-like corrector search direction for the nonsymmetric case, which improves practical performance significantly. Moreover, we presented a practical implementation with extensive computational results documenting the efficiency of the proposed algorithm. Indeed, the suggested algorithm is significantly more robust and faster than ECOS, the current state-of-the-art software for nonsymmetric conic optimization.
Possible future work includes establishing the complexity of the algorithm and applying it to other nonsymmetric cone types, possibly of larger dimensions. One such example is the nonsymmetric cone of semidefinite matrices with sparse chordal structure [32], which could extend primal-dual solvers like MOSEK with the ability to solve large sparse semidefinite programs.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A Barrier function and derivatives
We consider derivatives up to third order of the exponential-cone barrier (2).

A.3 Third-order directional derivatives
To evaluate the corrector (16) we compute η using (33), where u := Δx_a and F''(x)v = Δs_a. For stability we solve for v using the factored expression of F''(x).
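The derivatives are easy to sanity-check numerically; the sketch below verifies logarithmic homogeneity and the identity ⟨F'(x), x⟩ = −ϑ = −3 for the barrier (2) using finite differences:

```python
import math

def F(x):
    # Chares' 3-self-concordant barrier for K_exp, matching (2):
    # F(x) = -log(x2*log(x1/x2) - x3) - log(x1) - log(x2)
    x1, x2, x3 = x
    return -math.log(x2 * math.log(x1 / x2) - x3) - math.log(x1) - math.log(x2)

def grad_F(x, h=1e-6):
    # central finite differences, good enough for a sanity check
    g = []
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h; xm[j] -= h
        g.append((F(xp) - F(xm)) / (2 * h))
    return g

x = [2.0, 1.0, -0.5]
# logarithmic homogeneity: F(t*x) = F(x) - 3*log(t)
for t in (0.5, 2.0, 7.5):
    assert abs(F([t * xi for xi in x]) - (F(x) - 3 * math.log(t))) < 1e-9
# Euler identity for 3-logarithmically homogeneous barriers: <F'(x), x> = -3
g = grad_F(x)
assert abs(sum(gi * xi for gi, xi in zip(g, x)) + 3.0) < 1e-4
```

The same kind of finite-difference comparison is useful for validating hand-coded second- and third-order derivatives.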