Hessian barrier algorithms for non-convex conic optimization

We consider the minimization of a continuous function over the intersection of a regular cone with an affine set via a new class of adaptive first- and second-order optimization methods, building on the Hessian-barrier techniques introduced in [Bomze, Mertikopoulos, Schachinger, and Staudigl, Hessian barrier algorithms for linearly constrained optimization problems, SIAM Journal on Optimization, 2019]. Our approach is based on a potential-reduction mechanism and attains a suitably defined class of approximate first- or second-order KKT points with the optimal worst-case iteration complexity $O(\varepsilon^{-2})$ (first-order) and $O(\varepsilon^{-3/2})$ (second-order), respectively. A key feature of our methodology is the use of self-concordant barrier functions to construct strictly feasible iterates via a disciplined decomposition approach, without sacrificing the iteration complexity of the method. To the best of our knowledge, this work is the first to achieve these worst-case complexity bounds under such weak conditions for general conic constrained optimization problems.


Introduction
Let E be a finite-dimensional vector space with inner product ⟨·, ·⟩ and norm ∥·∥. In this paper we are concerned with solving constrained conic optimization problems of the form

min_x f(x)  s.t.  Ax = b,  x ∈ K̄.   (Opt)

The main working assumption underlying our developments is as follows:

Assumption 1.
1. K̄ ⊂ E is a regular convex cone with nonempty interior K := int(K̄): K̄ is closed, convex, solid, and pointed (i.e., it contains no lines);
2. A : E → R^m is a linear operator assigning to each element x ∈ E a vector in R^m and having full rank, i.e., im(A) = R^m; b ∈ R^m;
3. The feasible set X = K̄ ∩ L, where L = {x ∈ E | Ax = b}, has nonempty relative interior X̊ = K ∩ L;
4. f : E → R is possibly non-convex, continuous on X, and continuously differentiable on X̊;
5. Problem (Opt) admits a global solution. We let f_min(X) = min{f(x) | x ∈ X}.
Example 1.1 (NLP with non-negativity constraints). For E = R^n and K̄ ≡ K_NN = R^n_+ we recover non-linear programming problems with linear equality constraints and non-negativity constraints: X = {x ∈ R^n | Ax = b and x_i ≥ 0 for all i = 1, …, n}.
Example 1.3 (Semi-definite programming). If E = S^n is the space of real symmetric n × n matrices, endowed with the standard inner product ⟨a, b⟩ = tr(ab), and K̄ ≡ K_SDP = S^n_+ is the cone of positive semi-definite matrices, we obtain a nonlinear semi-definite programming problem. In this case, the linear operator A maps a matrix x ∈ S^n to the vector Ax = [⟨a_1, x⟩, …, ⟨a_m, x⟩]^⊤. Such mathematical programs have received enormous attention due to the large number of applications in control theory, combinatorial optimization, and engineering [9, 31, 54].
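To make the SDP setting concrete, the following sketch applies the linear operator A to a symmetric matrix; the data matrices a_i are chosen arbitrarily for illustration:

```python
import numpy as np

def sdp_constraint_operator(a_list, x):
    """Apply the linear operator A : S^n -> R^m, (Ax)_i = <a_i, x> = tr(a_i x)."""
    return np.array([np.trace(ai @ x) for ai in a_list])

# Illustration with n = 2, m = 2 and arbitrary symmetric data matrices.
a1 = np.array([[1.0, 0.0], [0.0, 1.0]])   # the trace functional
a2 = np.array([[0.0, 1.0], [1.0, 0.0]])   # picks out twice the off-diagonal entry
x = np.array([[2.0, 0.5], [0.5, 1.0]])    # a positive definite point in int(K_SDP)

Ax = sdp_constraint_operator([a1, a2], x)  # -> [3.0, 1.0]
```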

Motivating applications
Statistical estimation with non-convex regularization. An important instance of (Opt) is the composite optimization problem

min_x ℓ(x) + λ ∑_{i=1}^n ϕ(x_i^p)  s.t.  Ax = b,  x ∈ K_NN,   (1.1)

where ℓ : R^n → R is a smooth data-fidelity function, ϕ : R → R is a convex function, p ∈ (0, 1), and λ > 0 is a regularization parameter. A common use of this problem formulation is the regularized empirical risk-minimization problem in high-dimensional statistics, or the variational regularization technique in inverse problems. Common specifications for the regularizing function are ϕ(s) = s or ϕ(s) = s^{2/p}. In the first case, we obtain ∑_{i=1}^n ϕ(x_i^p) = ∑_{i=1}^n x_i^p = ∥x∥_p^p on K_NN, whereas in the second case, we get ∑_{i=1}^n ϕ(x_i^p) = ∑_{i=1}^n x_i^2 = ∥x∥_2^2. Note that the first case yields an objective f which is non-convex and non-differentiable at the boundary of the feasible set. It has been reported in imaging sciences that the use of such non-convex and non-differentiable regularizers has advantages in the restoration of piecewise-constant images; [11] contains a nice survey of studies supporting this observation. Moreover, in variable selection, the L_p penalty function with p ∈ (0, 1) owns the oracle property [34] in statistics, while L_1 (the LASSO) does not; problem (1.1) with p ∈ (0, 1) can be used for variable selection at the group and individual-variable levels simultaneously, while the very same problem with p = 1 only works for individual variable selection [50]. See [27, 40] for a complexity-theoretic analysis of this problem.
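The composite model can be made concrete with a small sketch evaluating an L2–Lp objective of the form ℓ(x) + λ∑_i ϕ(x_i^p) with ϕ(s) = s on the positive orthant; the data matrix N and target vector y below are arbitrary illustrative choices, not from the paper:

```python
import numpy as np

def l2_lp_objective(x, N, y, lam, p):
    """f(x) = 0.5*||N x - y||^2 + lam * sum_i x_i^p  for x in int(K_NN), p in (0,1).

    The regularizer sum_i x_i^p = ||x||_p^p is concave on the orthant and
    non-differentiable at the boundary x_i = 0 -- the regime the paper targets.
    """
    assert np.all(x > 0), "evaluated on the interior of K_NN"
    return 0.5 * np.sum((N @ x - y) ** 2) + lam * np.sum(x ** p)

N = np.array([[1.0, 0.0], [0.0, 2.0]])
y = np.array([1.0, 2.0])
x = np.array([1.0, 1.0])
val = l2_lp_objective(x, N, y, lam=0.1, p=0.5)   # zero residual, so val = 0.1 * 2 = 0.2
```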
Low-rank matrix recovery. Similar to the composite minimization problem (1.1), there are many relevant optimization problems defined on matrix domains E = S^n of the same form, but now over a feasible set X = {x ∈ E | Ax = b, x ∈ K_SDP}. In particular, consider the composite model f(x) = ℓ(x) + r(x), with smooth loss function ℓ : E → R and regularizer given by the matrix function r(x) = ∑_i σ_i(x)^p on K_SDP, where p ∈ (0, 1) and σ_i(x) is the i-th singular value of the matrix x. The resulting optimization problem is a matrix version of the non-convex regularized problem (1.1). The non-convex Schatten regularizer has received considerable attention because of its favorable sparsity-promoting properties. In particular, [51] used this approach to solve large-scale network localization problems with a potential-reduction method based on a trust-region approach. Another application fitting into the above framework is the task of recovering a low-rank matrix x ∈ K_SDP from measurements P(x) = d ∈ R^m. An attractive formulation is to minimize f(x) = ∥P(x) − d∥^2 + r(x), with r(x) a p-Schatten norm for p ∈ (0, 1). See [70] for a recent survey.

Challenges and contribution.
One of the challenges in approaching problem (Opt) algorithmically is dealing with the feasible set L ∩ K̄. A projection-based approach faces the computational bottleneck of projecting onto the intersection of a cone with an affine set, which makes the majority of the existing first-order [2, 17, 24, 41, 45, 63] and second-order [13, 20, 22-24, 28, 29, 32, 62] methods practically less attractive, as they are either designed for unconstrained problems or use proximal steps in the updates. When primal feasibility is not a major concern, augmented Lagrangian algorithms [6, 14, 42] are an alternative, though they do not always come with complexity guarantees. These observations motivate us to focus on primal barrier-penalty methods that decompose the feasible set and treat K̄ and L separately. Barrier methods are classical and powerful for convex optimization in the form of interior-point methods, but the results in the non-convex setting are in a sense fragmentary, with many different algorithms existing for different particular instantiations of (Opt). In particular, the main focus of barrier methods for non-convex optimization has been on special cases, such as non-negativity constraints [12, 16, 46, 65, 69, 71] and quadratic programming [37, 57, 71]. In this paper we develop a flexible and unifying algorithmic framework providing first- and second-order interior-point algorithms for (Opt) with potentially non-convex objective functions, potentially non-differentiable on the boundary, and general conic constraints. To the best of our knowledge, our method is the first to provide complexity results for first- and second-order algorithms reaching approximate first- and second-order KKT points, respectively, under such weak assumptions.
Our approach. At the core of our approach is the assumption that the cone K̄ admits a logarithmically homogeneous self-concordant barrier (LHSCB) h(x) ([59], cf. Definition 2.1), for which we can retrieve the function value h(x), the gradient ∇h(x), and the Hessian H(x) = ∇²h(x) with relative ease. This is not a very restrictive assumption, since all standard conic constraints in optimization (i.e., K_NN, K_SOC and K_SDP) have this property. Using this barrier, our algorithms are designed to reduce the potential function

F_µ(x) = f(x) + µ h(x),   (1.2)

where µ > 0 is a (typically) small penalty parameter. By definition, the domain of the potential function F_µ is the interior of the cone K̄. Therefore, any algorithm designed to reduce the potential will automatically respect the conic constraints, and the satisfaction of the linear constraints can be ensured by choosing search directions from the nullspace of the linear operator A. Our target is to identify points satisfying approximate necessary first- and second-order optimality conditions for problem (Opt), expressed in terms of ε-KKT and (ε₁, ε₂)-2KKT points, respectively (cf. Section 3 for precise definitions).
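For K̄ = K_NN with the log-barrier h(x) = −∑_i ln x_i, the potential and its gradient take the following concrete form (a minimal sketch following the definition F_µ = f + µh above; the quadratic f is an arbitrary illustration):

```python
import numpy as np

def potential(f, grad_f, x, mu):
    """Hessian-barrier potential F_mu(x) = f(x) + mu*h(x), h(x) = -sum(log x_i).

    F_mu is finite only on int(K_NN), so any potential-reduction step
    automatically keeps the iterate strictly feasible w.r.t. the cone.
    """
    if np.any(x <= 0):
        return np.inf, None
    F = f(x) - mu * np.sum(np.log(x))
    gF = grad_f(x) - mu / x          # grad h(x) = -1/x (componentwise)
    return F, gF

f = lambda z: 0.5 * float(z @ z)
grad_f = lambda z: z
F, gF = potential(f, grad_f, np.array([1.0, 1.0]), mu=0.1)
# F = 1.0 (the barrier vanishes at the all-ones point), gF = [0.9, 0.9]
```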
Approaching first-order stationary points. To produce a first-order stationary point, we construct a novel gradient-based method, which we call the adaptive Hessian barrier algorithm (AHBA, Algorithm 1). The main computational steps in AHBA are the identification of a search direction and a step-size policy guaranteeing feasibility and sufficient decrease of the potential function value. To find a step direction, we employ a linear model for F_µ regularized by the squared local norm induced by the Hessian of h, which is then minimized over the tangent space of the affine set L. The step-size is chosen adaptively to ensure feasibility and sufficient decrease of the objective function value f. For a judiciously chosen value of µ, we prove that this gradient-based method enjoys the iteration complexity bound O(ε^{-2}) for reaching an ε-KKT point whenever a "descent lemma" holds relative to the local norm induced by the Hessian of h (cf. Assumption 3 and Theorem 4.2 in Section 4). We then embed AHBA into a path-following scheme that iteratively reduces the value of µ, making the algorithm parameter-free and any-time convergent with the same O(ε^{-2}) complexity.
Approaching second-order stationary points. We next derive a second-order method called the second-order adaptive Hessian barrier algorithm (SAHBA, Algorithm 3). Here the step direction is again determined by a minimization subproblem over the same tangent space, but in this case the minimized model is composed of the linear model for F_µ, augmented by a second-order term for f, and regularized by the cube of the local norm induced by the Hessian of h. The regularization parameter is chosen adaptively to allow for potentially larger steps in areas of small curvature. For a judiciously chosen value of µ, we establish (see Theorem 5.3) the worst-case bound O(max{ε₁^{-3/2}, ε₂^{-3}}) on the number of iterations for reaching an (ε₁, ε₂)-2KKT point, under the weaker assumption that the Hessian of f is Lipschitz relative to the local norm induced by the Hessian of h (see Assumption 4 in Section 5 for a precise definition). We then propose a path-following version of SAHBA that iteratively reduces the value of µ, making the algorithm parameter-free and any-time convergent with the same O(max{ε₁^{-3/2}, ε₂^{-3}}) complexity.

Related work
To the best of our knowledge, AHBA and SAHBA are the first interior-point algorithms that achieve such complexity bounds universally for the general non-convex problem template (Opt). Our closest algorithmic and complexity-theoretic competitors are [46, 65].
Both papers focus on the special case of non-negativity constraints, as in Example 1.1, and fix µ before the start of the algorithm based on the desired accuracy ε, which may require some hyperparameter tuning in practice and may not work if the desired accuracy is not known in advance. Interestingly, for the special case K̄ = K_NN, our general algorithms provide stronger results under weaker assumptions compared to the first- and second-order methods in [46] and the first-order implementation of the second-order method in [65] (cf. Sections 4.4 and 5.3).
First-order methods. In the unconstrained setting, when the gradient is Lipschitz continuous, standard gradient descent achieves the lower iteration complexity bound O(ε^{-2}) for finding a first-order ε-stationary point x̄ such that ∥∇f(x̄)∥ ≤ ε [18, 19, 61]. Notably, even though problem (Opt) has non-trivial constraints, our bound for AHBA matches this bound. The original motivation for our work comes from the paper [16] on Hessian barrier algorithms, which in turn was strongly influenced by the continuous-time techniques of [4, 15]. Our results include a second-order method and general conic constraints and hold far beyond the realm of [16], where a complexity result is proved only for a first-order method in the setting of non-negativity constraints and a quadratic objective.
Approximate optimality conditions. [12] consider box-constrained minimization of the same objective as in (1.1) and propose a notion of ε-scaled KKT points. Their definition is tailored to the geometry of the optimization problem, mimicking the complementarity slackness condition of the classical KKT theorem for the non-negative orthant. In particular, their first-order condition consists of feasibility of x along with a scaled gradient condition. [46, 65] point out that, without additional assumptions on f, points that satisfy the scaled gradient condition may not approach KKT points as ε decreases. Thus, [46, 65] provide alternative notions of approximate first- and second-order KKT conditions for the setting of non-negativity constraints. Inspired by [46], we define the corresponding notions for general cones. Our first-order conditions turn out to be stronger than those of [46, 65], and our second-order condition is equivalent to theirs in the particular case of non-negativity constraints (see Sections 4.4 and 5.3). The proof that our algorithms are guaranteed to find such approximate KKT points requires some fine analysis exploiting structural properties of logarithmically homogeneous barriers attached to the cone K̄ which, to the best of our knowledge, appear to be novel.

Notation
In what follows, E denotes a finite-dimensional real vector space and E* its dual space, formed by all linear functions on E. The value of s ∈ E* at x ∈ E is denoted by ⟨s, x⟩. In the particular case E = R^n, we identify E = E*. Important elements of the dual space are gradients of differentiable functions f : E → R. An operator H : E → E* is positive semi-definite if ⟨Hu, u⟩ ≥ 0 for all u ∈ E; if the inequality is strict for all non-zero u, then H is called positive definite. These attributes are denoted H ⪰ 0 and H ≻ 0, respectively. By fixing a positive definite self-adjoint operator H : E → E*, we can define the Euclidean norms ∥u∥_H = ⟨Hu, u⟩^{1/2} for u ∈ E and ∥s∥*_H = ⟨s, H^{-1}s⟩^{1/2} for s ∈ E*. If E = R^n, then H is usually taken to be the identity matrix H = I. The directional derivative of a function f is defined in the usual way: Df(x)[v] = lim_{t↓0} (f(x + tv) − f(x))/t. More generally, for v_1, …, v_p ∈ E, we denote by D^p f(x)[v_1, …, v_p] the p-th directional derivative of f at x along the directions v_i ∈ E. Finally, L_0 = {v ∈ E | Av = 0} denotes the tangent space associated with the linear subspace L ⊂ E.

Cones and their self-concordant barriers
Let K̄ ⊂ E be a regular cone: K̄ is closed, convex, solid and pointed (i.e., it contains no lines). We assume that K := int(K̄) ≠ ∅, where int(K̄) is the interior of K̄. Any such cone admits a self-concordant logarithmically homogeneous barrier h(x) with finite parameter value ν [59].
(a) h is a ν-self-concordant barrier for K̄, i.e., for all x ∈ K and u ∈ E,

|D³h(x)[u, u, u]| ≤ 2 (D²h(x)[u, u])^{3/2}  and  ⟨∇h(x), u⟩² ≤ ν D²h(x)[u, u];

(b) h is logarithmically homogeneous: h(tx) = h(x) − ν ln t for all x ∈ K and t > 0.

We denote the set of ν-logarithmically homogeneous barriers by H_ν(K).
Given h ∈ H_ν(K), from [61, Thm 5.1.3] we know that for any x ∈ bd(K̄), any sequence (x_k)_{k≥0} with x_k ∈ K and lim_{k→∞} x_k = x satisfies lim_{k→∞} h(x_k) = +∞. For a pointed cone K̄, we have ν ≥ 1 and the Hessian H(x) ≜ ∇²h(x) : E → E* is a positive definite linear operator defined by ⟨H(x)u, v⟩ ≜ D²h(x)[u, v] for all u, v ∈ E; see [61, Thm. 5.1.6]. The Hessian gives rise to a local norm ∥u∥_x ≜ ⟨H(x)u, u⟩^{1/2} for u ∈ E. We also define a dual norm on E* as ∥s∥*_x ≜ ⟨s, [H(x)]^{-1}s⟩^{1/2}. The Dikin ellipsoid is defined as the open set W(x; r) ≜ {u ∈ E | ∥u − x∥_x < r}, r > 0. The usage of the local norm adapts the unit ball to the local geometry of the set K̄. Indeed, the following classical results are key to the development of our methods.

Lemma 2.2 (Theorem 5.1.5 [61]). For all x ∈ K we have W(x; 1) ⊆ K.

Proposition 2.3 (Theorem 5.1.9 [61]). Let h ∈ H_ν(K), x ∈ dom h, and let d ∈ E be a fixed direction. For all t ∈ [0, 1/∥d∥_x), with the convention that 1/∥d∥_x = +∞ if ∥d∥_x = 0, we have

h(x + td) ≤ h(x) + t⟨∇h(x), d⟩ + ω(t∥d∥_x),  where  ω(t) ≜ −t − ln(1 − t).
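Lemma 2.2 can be checked numerically for the log-barrier on K_NN, where H(x) = diag(1/x_i²) and ∥u − x∥_x² = ∑_i ((u_i − x_i)/x_i)²; every point of the unit Dikin ellipsoid is then strictly positive. A sketch under these standard formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=5)            # a point in int(K_NN)

def local_norm(x, d):
    """||d||_x for h(x) = -sum(log x_i), i.e. H(x) = diag(1/x_i^2)."""
    return np.sqrt(np.sum((d / x) ** 2))

# Sample points with ||u - x||_x = 0.99 < 1 and verify they stay in the cone.
ok = True
for _ in range(1000):
    d = rng.normal(size=5)
    u = x + 0.99 * d / local_norm(x, d)      # a point of the Dikin ellipsoid W(x; 1)
    ok = ok and bool(np.all(u > 0))          # Lemma 2.2: W(x; 1) is contained in K
```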
We will also use the following inequality for the function ω(t) [61, Lemma 5.1.5]:

ω(t) ≤ t²/(2(1 − t)),  t ∈ [0, 1).  (2.6)

We close this section with important examples of conic domains to which our method can be applied.
Example 2.1 (The exponential cone). Consider the exponential cone studied by [26], defined as

K_exp = {(x₁, x₂, x₃) ∈ R³ | x₂ > 0, x₁ > x₂ e^{x₃/x₂}},

with closure K̄_exp = cl(K_exp). This set admits a 3-LHSCB. We remark that this cone is not self-dual (cf. Definition 2.4). There are many convex sets that can be represented using the exponential cone; we list some examples below, but refer to the PhD thesis [26] for further details.

Exploiting the structure of Symmetric Cones
Nesterov and Todd [60] introduced self-scaled barriers, which were later recognized as LHSCBs for symmetric cones. Such barriers are nowadays key to defining primal-dual interior-point methods for convex problems with potentially larger step sizes. Our method can also exploit the additional properties of self-scaled barriers, leading to potentially larger step sizes and faster convergence in our non-convex setting as well. For a given closed convex nonempty cone K̄, its dual cone is the closed convex and nonempty cone K̄* defined as K̄* ≜ {s ∈ E* | ⟨s, x⟩ ≥ 0 for all x ∈ K̄}.

Definition 2.4. An open convex cone K is said to be self-dual if K̄* = K̄. K is homogeneous if for all x, y ∈ K there exists a linear bijection G : E → E such that Gx = y and GK = K. An open convex cone K is called symmetric if it is self-dual and homogeneous.
The class of symmetric cones can be characterized in the language of Euclidean Jordan algebras [35-37, 67]. For optimization, the three symmetric cones of most relevance are K_NN, K_SOC and K_SDP.

Definition 2.5 ([60])
We emphasize that B_ν(K) ⊂ H_ν(K). [47] showed that every symmetric cone admits a ν-SSB for some ν ≥ 1, while a characterization of the barrier parameter ν has been obtained in [44]. The main advantage of working with SSBs instead of general LHSCBs is that we can make potentially longer steps in the interior of the cone K towards the direction of its boundary. Let x ∈ K and d ∈ E, and denote σ_x(d) ≜ 1/sup{t > 0 | x + td ∈ K̄}, with σ_x(d) = 0 if x + td ∈ K̄ for all t > 0. Then x + td ∈ K̄ for all t ∈ [0, 1/σ_x(d)). Hence, if the scalar quantity σ_x(d) can be computed efficiently, it allows us to make a larger step without violating feasibility.
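For K̄ = K_NN this boundary quantity is explicit: with σ_x(d) understood as the inverse of the largest feasible step along d, only the negative components of d constrain the step, so σ_x(d) = max(0, max_i(−d_i/x_i)). A minimal sketch of this instantiation (our illustration for the orthant, not code from [60]):

```python
import numpy as np

def sigma(x, d):
    """sigma_x(d) = 1 / sup{t > 0 : x + t*d stays in the closed orthant} (0 if unbounded)."""
    return max(0.0, float(np.max(-d / x)))

x = np.array([1.0, 2.0])
d = np.array([-0.5, 1.0])
s = sigma(x, d)                       # = 0.5, so x + t*d >= 0 for all t in [0, 2]
assert np.all(x + (1.0 / s) * d >= 0)
```

Here ∥d∥_x = ((0.5)² + (0.5)²)^{1/2} ≈ 0.707, so the feasible horizon 1/σ_x(d) = 2 exceeds the Dikin-safe horizon 1/∥d∥_x ≈ 1.41, illustrating the longer steps available when σ_x(d) is computable.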

Unified Notation
Our algorithms work on any conic domain on which we can efficiently evaluate a ν-LHSCB.
We formalize this in the following assumption.

Assumption 2. K̄ is a regular cone admitting an efficient barrier setup h ∈ H_ν(K). By this we mean that, at a given query point x ∈ K, we can construct an oracle that returns the values h(x), ∇h(x) and H(x) = ∇²h(x) with low computational effort.
Given the potential advantages of working on symmetric cones, it is useful to develop a unified notation handling both cases at the same time. Note that when h ∈ B_ν(K), we have the flexibility to treat h either as a member of H_ν(K) or of B_ν(K). To unify the presentation, we define

ζ(x, d) ≜ ∥d∥_x if h ∈ H_ν(K) \ B_ν(K),  and  ζ(x, d) ≜ σ_x(d) if h ∈ B_ν(K).  (2.10)

Note that ζ(x, d) ≤ ∥d∥_x, and that x + td ∈ K for all t ∈ [0, 1/ζ(x, d)). (2.12) Finally, denoting by D_h(y, x) ≜ h(y) − h(x) − ⟨∇h(x), y − x⟩ the Bregman divergence of h, Proposition 2.3, respectively Proposition 2.6, together with eq. (2.10), gives us the one-and-for-all Bregman bound

D_h(x + td, x) ≤ ω(tζ(x, d)) for all t ∈ [0, 1/ζ(x, d)).  (2.13)

Approximate optimality conditions
The next definition specifies our notion of an approximate first-order KKT point for problem (Opt).
To justify this definition, let x* be a local solution of problem (Opt). Then, for δ > 0 sufficiently small, the point x* is the unique global solution of the perturbed optimization problem with ball restriction B(x*; δ) ≜ {x ∈ E | ∥x − x*∥ ≤ δ}:

min{f(x) | Ax = b, x ∈ K̄, x ∈ B(x*; δ)}.  (3.4)

Next, using the barrier h ∈ H_ν(K), we absorb the constraint x ∈ K̄ into the penalty µ_k h(x), where µ_k > 0, µ_k ↓ 0, is a given sequence. This leads to the barrier formulation

min{f(x) + µ_k h(x) | Ax = b, x ∈ B(x*; δ)}.  (3.5)

From the classical theory of interior penalty methods [38], it is known that a global solution x^k of this problem exists for all k and that cluster points of (x^k) are global solutions of (3.4). Clearly, x^k ∈ X̊ ∩ B(x*; δ) for all k and x^k → x*. Setting s^k = −µ_k∇h(x^k), which belongs to K̄* by eq. (A.1), and exploiting the properties of the barrier function h ∈ H_ν(K), we see that ⟨s^k, x^k⟩ = µ_k ν → 0; moreover, since x^k → x*, the restriction x^k ∈ B(x*; δ) will automatically hold for k sufficiently large. By the full-rank assumption, the first-order optimality condition of problem (3.5) reads as

∇f(x^k) + µ_k∇h(x^k) + A^⊤y^k = 0,  i.e.,  ∇f(x^k) − s^k + A^⊤y^k = 0,  for some y^k ∈ R^m.

Assuming twice continuous differentiability of f on X̊, our notion of an approximate second-order KKT point for problem (Opt) is defined as follows.
The first three conditions are the same as for an ε-KKT point. The last one can be justified as follows: using the full-rank condition, the second-order necessary optimality condition for problem (3.5) requires ∇²f(x^k) + µ_k H(x^k) to be positive semi-definite on the tangent space ker(A), which is clearly implied by (3.9).
Remark 3.1. To compare our second-order condition with those previously formulated in the literature, we consider the particular case K̄ = K_NN, as in [46, 65], with the log-barrier setup giving H(x) = X^{-2}, X ≜ Diag(x). Within this setup, our second-order condition (3.9) becomes, after multiplication by [H(x)]^{-1/2} = X from left and right, equivalent to Proposition 2(c) in [46], modulo our use of √ε₂ instead of ε in [46], as well as to equation (1.6d) in [65], modulo our use of √ε₂ instead of ε_H in [65]. Moreover, when K is a symmetric cone with Jordan product •, there exists a constant a > 0 such that a tr(x • y) = ⟨x, y⟩ for all x, y ∈ K̄. In view of this relation, our complementarity notion can be specialized to the condition s • x ≤ ε, and hence our approximate KKT conditions reduce to the ones reported in [5]. In particular, for K̄ = K_NN we recover the standard complementary slackness condition s_i^k x_i^k → 0 as k → ∞ for all i, as in this case the Jordan product • gives rise to the Hadamard product. See [6] for more details.

On the relation to scaled critical points
In the absence of differentiability at the boundary, a popular formulation of necessary optimality conditions involves the definition of scaled critical points. Indeed, at a local minimizer x*, the scaled first-order optimality condition reads x*_i [∇f(x*)]_i = 0 for all i = 1, …, n, where the product is taken to be 0 when the derivative does not exist. Based on this characterization, one may call a point x ∈ K̄_NN with |x_i[∇f(x)]_i| ≤ ε for all i = 1, …, n an ε-scaled first-order point. Algorithms designed to produce ε-scaled first-order points, for some small ε > 0, have been introduced in [12] and [11]. As reported in [46], there are several problems associated with this weak definition of a critical point. First, when derivatives are available on K̄_NN, the standard definition of a critical point would additionally entail the inclusion −∇f(x*) ∈ N_X(x*), a condition that is absent in the definition of a scaled critical point. Second, scaled critical points come with no measure of strength, as the condition holds trivially when x = 0, regardless of the objective function. Third, there is a general gap between local minimizers and limits of ε-scaled first-order points as ε → 0⁺ (see [46]). Similar remarks apply to the scaled second-order condition considered in [11]. Our definition of approximate KKT points overcomes these issues. In fact, our definitions of approximate first- and second-order KKT points are continuous in ε, and therefore in the limit our approximate KKT points coincide with the classical first- and second-order KKT conditions at a local minimizer. This is achieved without assuming global differentiability of the objective function or performing an additional smoothing of the problem data as in [10, 11].

A first-order Hessian-Barrier Algorithm
In this section we introduce a first-order potential-reduction method for solving (Opt) that uses a barrier h ∈ H_ν(K) and the potential function (1.2). We assume that we are able to compute an approximate analytic center at low computational cost. Specifically, our algorithm relies on the availability of a ν-analytic center, i.e., a point x⁰ ∈ X̊ such that

h(x⁰) − inf_{x∈X̊} h(x) ≤ ν.  (4.1)

To obtain such a point x⁰, one can apply interior-point methods to the convex programming problem min_{x∈X} h(x). Moreover, since ν ≥ 1, we do not need to solve this problem to high precision, making computationally cheap first-order methods, such as [33], an appealing choice for this preprocessing step.

Local properties
Given x ∈ X̊, define the set of feasible directions as T_x ≜ {v ∈ E | Av = 0} = L_0. Hence, for x ∈ K, we can equivalently characterize the set T_x as the set of directions v ∈ E such that x + tv ∈ X̊ for all sufficiently small t > 0. Our complexity analysis relies on the ability to control the behavior of the objective function along the set of feasible directions and with respect to the local norm.
Assumption 3. There exists a constant M > 0 such that for all x ∈ X̊ and v ∈ T_x we have

f(x + v) ≤ f(x) + ⟨∇f(x), v⟩ + (M/2)∥v∥²_x.  (4.3)

Remark 4.1. If the set X is bounded, we have λ_min(H(x)) ≥ σ for some σ > 0. In this case, assuming f has an M-Lipschitz continuous gradient, the classical descent lemma [61] implies Assumption 3: indeed, ∥v∥² ≤ ∥v∥²_x/σ, so (4.3) holds with M replaced by M/σ.

Remark 4.2. We emphasize that the local Lipschitz-smoothness condition (4.3) does not require global differentiability. Consider the composite non-smooth and non-convex model (1.1) on K_NN with ϕ(s) = s for s ≥ 0. This means ∑_{i=1}^n ϕ(x_i^p) = ∥x∥_p^p for p ∈ (0, 1) and x ∈ K̄_NN. As a concrete example for the smooth part of the problem, consider the L₂ loss ℓ(x) = ½∥Nx − p∥². This gives rise to the L₂–L_p minimization problem, an important optimization formulation arising in phase retrieval, mathematical statistics, signal processing and image recovery [27, 39, 40, 55].
Since t ↦ t^p is concave for t > 0 and p ∈ (0, 1), we have (x_i + v_i)^p ≤ x_i^p + p x_i^{p−1} v_i for each i whenever x_i, x_i + v_i > 0. Adding these inequalities to the descent lemma for the smooth part ℓ, we immediately arrive at condition (4.3) in terms of the Euclidean norm. Over a bounded feasible set X, Remark 4.1 makes it clear that this implies Assumption 3. At the same time, f is not differentiable at zero.
We emphasize that in Assumption 3 the constant M is in general either unknown or a very conservative upper bound. Therefore, adaptive techniques should be used to estimate it; such techniques are also likely to improve the practical performance of the method.
For x ∈ X̊ and v ∈ T_x, combining eq. (4.3) with eq. (2.13) (with d = v and t = 1) reveals a suitable quadratic model, to be used in the design of our first-order algorithm.

Lemma 4.1 (Quadratic overestimation). For all x ∈ X̊, v ∈ T_x with ζ(x, v) < 1, and L ≥ M, we have

F_µ(x + v) ≤ F_µ(x) + ⟨∇F_µ(x), v⟩ + (L/2)∥v∥²_x + µ ω(ζ(x, v)).  (4.4)

Algorithm description and its complexity
Let x ∈ X̊ be given. Our first-order method employs a quadratic model Q^{(1)}_µ(x, v) to compute a search direction v_µ(x), given by

v_µ(x) = argmin_{v ∈ T_x} { ⟨∇F_µ(x), v⟩ + ½∥v∥²_x }.  (4.5)

For the above problem, we have the following system of optimality conditions involving the dual variable y_µ(x) ∈ R^m:

∇F_µ(x) + H(x)v_µ(x) + A^⊤y_µ(x) = 0,  (4.6)
A v_µ(x) = 0.  (4.7)

Since H(x) ≻ 0 for x ∈ X̊, any standard solution method [64] can be applied to this linear system. Moreover, the system can be solved explicitly. Indeed, since H(x) ≻ 0 for x ∈ X̊ and A has full row rank, the linear operator A[H(x)]^{-1}A^⊤ is invertible, and

v_µ(x) = −S_x∇F_µ(x),  where  S_x ≜ [H(x)]^{-1} − [H(x)]^{-1}A^⊤(A[H(x)]^{-1}A^⊤)^{-1}A[H(x)]^{-1}.

To give some intuition behind this expression, observe that we can give an alternative representation of S_x as S_x = [H(x)]^{-1/2} Π_x [H(x)]^{-1/2}, where Π_x is the orthogonal projection operator onto ker(A[H(x)]^{-1/2}). In particular, v_µ(x) ∈ L_0, and we can always find a scalar t > 0 such that tv_µ(x) ∈ L_0 and t∥v_µ(x)∥_x < 1. Any such scalar is a suitable candidate for a step size. To determine an acceptable step-size, consider a point x ∈ X̊; the search direction v_µ(x) gives rise to a family of parameterized arcs x⁺(t) ≜ x + tv_µ(x), t ≥ 0. Our aim is to choose this step-size to ensure feasibility of the iterates and decrease of the potential. By (2.12) and (4.7), we know that x⁺(t) ∈ X̊ for all t ∈ I_{x,µ} ≜ [0, 1/ζ(x, v_µ(x))). Multiplying (4.6) by v_µ(x) and using (4.7), we obtain ⟨∇F_µ(x), v_µ(x)⟩ = −∥v_µ(x)∥²_{x}. Choosing t ∈ I_{x,µ} with tζ(x, v_µ(x)) ≤ 1/2, we bound ω(tζ(x, v_µ(x))) ≤ t²ζ(x, v_µ(x))² ≤ t²∥v_µ(x)∥²_x via (2.6), and we readily see from (4.4) that

F_µ(x⁺(t)) ≤ F_µ(x) − η_x(t),  with  η_x(t) ≜ t∥v_µ(x)∥²_x (1 − (M + 2µ)t/2).

The function t ↦ η_x(t) is strictly concave with unique maximizer 1/(M + 2µ) and two real roots at t ∈ {0, 2/(M + 2µ)}. Thus, maximizing the per-iteration decrease η_x(t) under the restriction 0 ≤ t ≤ 1/(2ζ(x, v_µ(x))), we choose the step-size

t_µ(x) = min{ 1/(2ζ(x, v_µ(x))), 1/(M + 2µ) }.  (4.10)
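The optimality system (4.6)-(4.7) is a single symmetric linear system in the pair (v, y). The sketch below solves it for the log-barrier on K_NN and checks the defining properties; the objective and data are arbitrary illustrative choices, not the paper's reference implementation:

```python
import numpy as np

def search_direction(grad_F, H, A):
    """Solve  H v + A^T y = -grad_F,  A v = 0  for (v, y) via one KKT system."""
    n, m = H.shape[0], A.shape[0]
    KKT = np.block([[H, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(KKT, np.concatenate([-grad_F, np.zeros(m)]))
    return sol[:n], sol[n:]

x = np.array([0.5, 0.3, 0.2])          # interior point of the simplex
mu = 0.1
grad_F = x - mu / x                    # grad F_mu for f(x) = 0.5*||x||^2, h = -sum(log x_i)
H = np.diag(1.0 / x ** 2)              # barrier Hessian
A = np.ones((1, 3))                    # equality constraint 1^T x = 1

v, y = search_direction(grad_F, H, A)
assert abs((A @ v)[0]) < 1e-12                  # v lies in the tangent space ker(A)
assert np.allclose(H @ v + A.T @ y, -grad_F)    # stationarity condition (4.6)
```

Multiplying the first equation by v and using Av = 0 also recovers the identity ⟨∇F_µ(x), v⟩ = −∥v∥²_x used in the step-size analysis.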
This step-size rule, however, requires knowledge of the parameter M. To boost numerical performance, we employ a backtracking scheme in the spirit of [62] to estimate this constant at each iteration. The procedure generates a sequence of positive numbers (L_k)_{k≥0} for which the local Lipschitz-smoothness condition (4.3) holds. More specifically, suppose that x^k is the current position of the algorithm, with corresponding initial local Lipschitz estimate L_k, and let v^k = v_µ(x^k) be the corresponding search direction. To determine the next iterate x^{k+1}, we iteratively try step-sizes α_k of the form (4.10) with M replaced by 2^{i_k}L_k, i_k = 0, 1, 2, …, until the sufficient-decrease condition (4.11) is satisfied. This process must terminate in finitely many steps, since whenever 2^{i_k}L_k ≥ M, inequality (4.3) with M replaced by 2^{i_k}L_k, and hence (4.11), follows from Assumption 3.
Combining the search-direction-finding problem (4.5) with the backtracking strategy just outlined yields the Adaptive first-order Hessian-Barrier Algorithm (AHBA, Algorithm 1).
Our main result on the iteration complexity of Algorithm 1 is the following Theorem, whose proof is given in Section 4.3.
Remark 4.3. The line-search process of finding the appropriate i_k is cheap, since only z^k needs to be recalculated; repeatedly solving problem (4.9) is not required. Furthermore, the sequence of constants L_k is allowed to decrease along subsequent iterations, which is achieved by the division by the constant factor 2 in the final updating step of each iteration. This potentially leads to longer steps and a faster decrease of the potential.
Algorithm 1: Adaptive first-order Hessian-Barrier Algorithm — AHBA(µ, ε, L₀, x⁰)
  for k = 0, 1, 2, … do
    Compute v^k = v_µ(x^k) and the corresponding dual variable y^k = y_µ(x^k) as the solution to the system (4.6)-(4.7);
    if ∥v^k∥_{x^k} < ε/ν then return x^k;
    Find the smallest i_k ∈ {0, 1, 2, …} such that the step-size α_k of the form (4.10), with M replaced by 2^{i_k}L_k and ζ(·, ·) as in (2.10), satisfies the decrease condition (4.11);
    Set x^{k+1} = x^k + α_k v^k and L_{k+1} = 2^{i_k−1}L_k;
  end

Remark 4.4. Since ν ≥ 1, f(x⁰) − f_min(X) is expected to be larger than ε, and the constant M is potentially large, so the main term in the complexity bound (4.12) is O(ν²ε^{-2}), i.e., it has the same dependence on ε as the standard complexity bounds [18, 19, 53] of first-order methods for non-convex problems under the standard Lipschitz-gradient assumption, which on bounded sets is subsumed by our Assumption 3. Further, if the function f is quadratic, Assumption 3 holds with M = 0 and we can take L₀ = 0; in this case, the complexity bound (4.12) improves to O(νε^{-1}). Just like for classical interior-point methods, the iteration complexity of AHBA depends on the barrier parameter ν ≥ 1. For conic domains, the characterization of this barrier parameter has thus been an active research line. [44] demonstrated that for symmetric cones the barrier parameter is tied to algebraic properties of the cone and identified it with the rank of the cone (see [35] for a definition of the rank of a symmetric cone). This deep analysis gives an exact characterization of the optimal barrier parameter for the most important conic domains in optimization. For K_NN and K_SDP it is known that ν = n is optimal, whereas for K_SOC the optimal barrier parameter is ν = 2 (and therefore independent of the ambient dimension n).
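A compact Python sketch of the overall loop, under our reading of Algorithm 1 (direction from the KKT system, backtracking on L_k, step-size of the form (4.10)); the simplex instance and quadratic objective are illustrative, not the authors' reference implementation:

```python
import numpy as np

def ahba(f, grad_f, x0, A, mu, eps, L0=1.0, max_iter=500):
    """Sketch of the adaptive Hessian-barrier loop for K = K_NN, h(x) = -sum(log x_i).

    Direction: minimize <grad F_mu, v> + 0.5*||v||_x^2 over ker(A).
    Step-size: t = min(1/(2*||v||_x), 1/(2^i*L + 2*mu)), with backtracking on L.
    """
    x, L = x0.astype(float).copy(), L0
    n, m = len(x0), A.shape[0]
    nu = n                                           # barrier parameter of -sum(log)
    for _ in range(max_iter):
        gF = grad_f(x) - mu / x                      # gradient of F_mu = f + mu*h
        H = np.diag(1.0 / x ** 2)                    # Hessian of the log-barrier
        KKT = np.block([[H, A.T], [A, np.zeros((m, m))]])
        v = np.linalg.solve(KKT, np.concatenate([-gF, np.zeros(m)]))[:n]
        nv = np.sqrt(v @ (H @ v))                    # local norm ||v||_x
        if nv < eps / nu:                            # stopping criterion of Algorithm 1
            break
        i = 0
        while True:                                  # backtracking on the smoothness estimate
            t = min(1.0 / (2.0 * nv), 1.0 / (2 ** i * L + 2.0 * mu))
            x_new = x + t * v
            model = f(x) + t * (grad_f(x) @ v) + 0.5 * (2 ** i * L) * (t * nv) ** 2
            if np.all(x_new > 0) and f(x_new) <= model + 1e-12:
                break
            i += 1
        x, L = x_new, max(2.0 ** (i - 1) * L, 1e-8)  # L_{k+1} = 2^{i_k - 1} L_k
    return x

# Toy instance: minimize 0.5*||x||^2 over the unit simplex.
f = lambda z: 0.5 * float(z @ z)
grad_f = lambda z: z
A = np.ones((1, 3))
x0 = np.array([0.6, 0.3, 0.1])
x_star = ahba(f, grad_f, x0, A, mu=1e-3, eps=1e-2)
```

Because the step-size keeps t∥v∥_x ≤ 1/2, every trial point stays in the Dikin ellipsoid and hence strictly inside the cone, while directions in ker(A) preserve the linear constraint.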
Connection with interior-point flows on polytopes. Consider K̄ = K_NN and X = K̄_NN ∩ L. We are given a function f : X → R which is the restriction of a smooth function on R^n. The canonical barrier for this setting is h(x) = −∑_{i=1}^n ln x_i. Applying our first-order method on this domain gives the search direction

v_µ(x) = −S_x∇F_µ(x) = −X(I − XA^⊤(AX²A^⊤)^{-1}AX)X∇F_µ(x),  X ≜ Diag(x).

This explicit formula yields various interesting connections between our approach and classical methods. For A = 1_n^⊤, the feasible set X reduces to the relative interior of the (n − 1)-dimensional unit simplex. In this case, the vector field v_µ(·) simplifies further to

v_µ(x) = −X(I − xx^⊤/∥x∥²)X∇F_µ(x).

For a linear objective and µ = 0, we obtain from this formula the search direction employed in affine-scaling methods for linear programming [1, 7, 8, 69]. [16] partly motivated their algorithm as a discretization of the Hessian-Riemannian gradient flows introduced in [4] and [15]. Heuristically, we can therefore interpret AHBA as an Euler discretization (with non-monotone adaptive step-size policies) of the gradient-like flow ẋ(t) = −S_{x(t)}∇F_µ(x(t)), which closely resembles the class of dynamical systems introduced in [15]. This gives an immediate connection to a large class of interior-point flows on polytopes, heavily studied in control theory [48].
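The simplex simplification can be verified numerically: for A = 1ᵀ one has XAᵀ(AX²Aᵀ)⁻¹AX = xxᵀ/∥x∥², so the general and the simplified expressions for the search direction coincide. A quick check (random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 1.0, size=4)           # interior point of the orthant
g = rng.normal(size=4)                      # stands in for grad F_mu(x)
X = np.diag(x)
A = np.ones((1, 4))                         # simplex constraint 1^T x = const

general = -X @ (np.eye(4) - X @ A.T @ np.linalg.inv(A @ X @ X @ A.T) @ A @ X) @ X @ g
simplex = -X @ (np.eye(4) - np.outer(x, x) / (x @ x)) @ X @ g
assert np.allclose(general, simplex)
```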

Proof of Theorem 4.2
Our proof proceeds in several steps. First, we show that procedure AHBA(µ, ε, L₀, x⁰) produces points in X̊ and is thus indeed an interior-point method. Next, we show that the line-search process of finding the appropriate L_k in each iteration is finite, and we estimate the total number of trials in this process. Then we enter the core of our analysis, where we prove that if the stopping criterion does not hold at iteration k, i.e., ∥v^k∥_{x^k} ≥ ε/ν, then the objective f is decreased by a quantity of order O(ε²); since the objective is globally lower bounded, we conclude that the method stops after at most O(ε^{-2}) iterations. Finally, we show that when the stopping criterion holds, the method has generated an ε-KKT point.

Bounding the number of backtracking steps
Let us fix iteration k. The sequence 2^{i_k} L_k is increasing in i_k, and Assumption 3 holds, so once 2^{i_k} L_k ≥ max{M, L_k}, the line-search process stops for sure since inequality (4.11) holds. Hence, 2^{i_k} L_k ≤ 2 max{M, L_k} must be the case, and, consequently, L_{k+1} = 2^{i_k−1} L_k ≤ max{M, L_k}, which, by induction, gives L_{k+1} ≤ M̄ := max{M, L_0}. At the same time, by construction, i_k = 1 + log₂(L_{k+1}/L_k). Let N(k) denote the number of inner line-search iterations up to the k-th iteration of AHBA(µ, ε, L_0, x_0). Then, telescoping the previous identity, N(k) = ∑_{j=0}^{k}(i_j + 1) = 2(k+1) + log₂(L_{k+1}/L_0) ≤ 2(k+1) + log₂(M̄/L_0). This shows that on average the inner loop ends after two trials.
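As a sanity check of the doubling argument, here is a small Python sketch; the predicate `accept` is a hypothetical stand-in for verifying inequality (4.11), and the numbers are purely illustrative. It confirms that the total trial count stays within 2(k+1) plus a logarithmic term.

```python
import math

def line_search(L_k, accept):
    """Double the trial constant until accept() holds; next estimate is L_{k+1} = 2^{i_k-1} L_k."""
    i = 0
    while not accept((2 ** i) * L_k):
        i += 1
    return i + 1, (2 ** (i - 1)) * L_k            # (trials used, L_{k+1})

M, L, K = 10.0, 1.0, 20                           # illustrative modulus and initial guess
total_trials = 0
for k in range(K):
    trials, L = line_search(L, lambda t: t >= M)  # accept once the constant dominates M
    total_trials += trials

# on average two trials per iteration, up to an additive log_2 term
assert total_trials <= 2 * K + math.ceil(math.log2(2 * M))
```

In this run the first iteration pays for the doubling (5 trials), after which every iteration uses exactly 2 trials, matching the telescoping bound.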

Per-iteration analysis and a bound for the number of iterations
Let us fix iteration counter k. Since L_{k+1} = 2^{i_k−1} L_k, the step-size (4.10) can be rewritten accordingly, where we use that α_k ≤ 1/(2(L_{k+1}+µ)) in the last inequality. Substituting into (4.13) the two possible values of the step-size α_k in (4.10), and recalling L_{k+1} ≤ M̄ (see Section 4.3.2), we obtain a per-iteration decrease bound. Rearranging and summing these inequalities for k from 0 to K − 1 gives a telescoping bound, where we used that, by the assumptions of Theorem 4.2, x_0 is a ν-analytic center defined in (4.1) and µ = ε/ν, implying that h(x_0) − h(x_K) ≤ ν = ε/µ. Thus, up to passing to a subsequence, δ_k → 0, and consequently ‖v^k‖_{x^k} → 0 as k → ∞. This shows that the stopping criterion in Algorithm 1 is achievable.
Assume now that the stopping criterion ‖v^k‖_{x^k} < ε/ν does not hold for K iterations of AHBA. Then, for all k = 0, …, K − 1, it holds that δ_k ≥ min{ε/(4ν), ε²/(4ν²(M̄+µ))}. Together with the parameter coupling µ = ε/ν, it follows from (4.16) that K is bounded. Hence, recalling that M̄ = max{M, L_0}, the algorithm stops for sure after no more than this number of iterations. This, combined with the bound for the number of inner steps in Section 4.3.2, proves the first statement of Theorem 4.2.

Generating ε-KKT point
To finish the proof of Theorem 4.2, we now show that when Algorithm 1 stops for the first time, it returns a 2ε-KKT point of (Opt) according to Definition 3.1.
Let the stopping criterion hold at iteration k. By the optimality condition (4.6) and the definition of the potential (1.2), we obtain two stationarity identities for v^k. Denoting s̄^k := −µ∇h(x^k), multiplying both equations, and using the stopping criterion ‖v^k‖_{x^k} < ε/ν, we conclude that the residual is small. Whence, setting s^k := ∇f(x^k) − A^*y^k ∈ E^*, we get, by the definition of the dual norm, the corresponding dual bound, where in the last equality we used that, by the assumptions of Theorem 4.2, µ = ε/ν. Thus, since, by (A.1), s̄^k = −µ∇h(x^k) ∈ K^*, we get that s^k ∈ K^*. By construction, x^k ∈ K and Ax^k = b.
The last inequality uses ν ≥ 1. Hence, the complementarity condition (3.3) holds as well. This finishes the proof of Theorem 4.2.

Discussion
Strengthened KKT condition. For K = K_NN, [46] consider a first-order potential reduction method employing the standard log-barrier h(x) = −∑_{i=1}^n ln(x_i) and using a trust-region subproblem for obtaining the search direction. For x ∈ K_NN we have ∇h(x) = −X^{−1}1_n, and combining this with the stopping criterion of Algorithm 1 at iteration k, namely ‖v^k‖_{x^k} < ε/ν, together with µ = ε/n, yields complementarity bounds. By Remark 4.4, these inequalities are achieved after the stated number of iterations, and they are seen to be sharper by a factor of 1/n than the complementarity measure employed in [46]. Conversely, in order to attain an approximate KKT point with the same strength as in [46], the above calculations suggest that we can weaken our tolerance accordingly, yielding a complementarity measure ‖X^k s^k‖_∞ ≤ 2ε. Thus, in the particular case of non-negativity constraints, our general algorithm is able to obtain results similar to [46], but under weaker assumptions. At the same time, our algorithm ensures a stronger measure of complementarity: it guarantees 0 ≤ ⟨x^k, s^k⟩ ≤ 2ε, i.e., x^k, s^k ≥ 0 and approximate complementarity, which is stronger than max_{1≤i≤n} |x^k_i s^k_i| ≤ 2ε guaranteed by [46].
Moreover, to match our stronger guarantee, one has to change ε → ε/n in the complexity bound of [46], which leads to the same O(Mn²(f(x_0) − f_min(X))/ε²) complexity bound. Besides these important insights, our algorithm is designed for general cones, rather than only for K_NN. Therefore, we provide a unified approach for essentially all conic domains of relevance in optimization. Finally, our method does not rely on trust-region techniques as in [46], which may slow down convergence in practice since the radius of the trust region is no greater than O(ε), leading to short steps.
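The claim that the inner-product measure dominates the entrywise one is elementary; a quick numerical illustration (Python/NumPy, purely illustrative):

```python
import numpy as np

# For x, s >= 0, every product x_i s_i is one nonnegative summand of <x, s>,
# so max_i x_i s_i <= <x, s>: bounding the inner product is the stronger
# complementarity certificate.
rng = np.random.default_rng(0)
x, s = rng.random(8), rng.random(8)
assert (x * s).max() <= x @ s
```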
Exploiting problem structure. In (4.14) we can clearly observe the benefit of using a ν-SSB in our algorithm whenever K is a symmetric cone: when α_k = 1/(2ζ(x_k, v_k)), the per-iteration decrease of the potential improves accordingly.
The role of the potential function. Next, we discuss more explicitly how the algorithm and the complexity bounds depend on the parameter µ. The first observation is that, from (4.21), to guarantee that s^k ∈ K^*, we need the stopping criterion to be ‖v^k‖_{x^k} < µ, which by (4.22) leads to the error 2µν in the complementarity conditions. From the analysis following equation (4.16), and recalling that M̄ = max{M, L_0}, we see that after O(µ^{−2}) iterations the algorithm finds a (2µν)-KKT point; if µ → 0, we have convergence to a KKT point, but the complexity bound tends to infinity and becomes non-informative. At the same time, as seen from (4.9), when µ → 0 the algorithm itself converges to a preconditioned gradient method since F_µ(x) = f(x) + µh(x) → f(x). We also see from the above explicit expressions in terms of µ that the design of the algorithm requires a careful balance between the desired accuracy of the approximate KKT point (expressed mainly by the complementarity condition), the stopping criterion, and the complexity. Moreover, the step-size should also be chosen carefully to ensure the feasibility of the iterates; the step-size 1/M that is standard for first-order methods may not work.

Anytime convergence via restarting AHBA
The analysis of Algorithm 1 is based on an a priori fixed tolerance ε > 0 and the parameter coupling µ = ε/ν. This coupling allows us to embed Algorithm 1 within a restarting scheme featuring a decreasing sequence {µ_i}_{i≥0}, followed by restarts of AHBA. This restarting strategy frees Algorithm 1 from hard-coded parameters and connects it well to traditional barrier methods.
To describe this double-loop algorithm, we fix ε_0 > 0 and select the starting point x_0^0 as a ν-analytic center of X with respect to h ∈ H_ν(K). We let i ≥ 0 denote the counter for the restarting epochs, at the start of which the value µ_i is decreased. In epoch i, we generate a sequence {x_i^k}_{k=0}^{K_i} by calling AHBA(µ_i, ε_i, L_0^{(i)}, x_i^0) until the stopping condition is reached. This takes at most K_I(ε_i, x_i^0) iterations, specified in eq. (4.12). We store the last iterate x̄_i = x_i^{K_i} and the last estimate of the Lipschitz modulus M̄_i, and then restart the algorithm using these "warm starts". If ε is the target accuracy of the final solution, it suffices to perform ⌈log₂(ε_0/ε)⌉ + 1 restarts since, by construction, the tolerance is halved at each restart. Combining the analysis following equation (4.16) with the fact that µ_i is a decreasing sequence and (4.1), summing over all performed restarts i = 0, …, I − 1, and rearranging the terms, we obtain that the total number of iterations of the procedures AHBA(µ_i, ε_i, L_0^{(i)}, x_i^0), 0 ≤ i ≤ I − 1, needed to reach accuracy ε is at most the sum of the per-epoch bounds K_I(ε_i, x_i^0).
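The epoch structure above can be sketched in a few lines of Python; `ahba` is a hypothetical stand-in for a call to AHBA(µ_i, ε_i, L_0^{(i)}, x_i^0), and the halving schedule ε_{i+1} = ε_i/2 is the assumption behind the ⌈log₂(ε_0/ε)⌉ + 1 epoch count.

```python
import math

def restarted_ahba(ahba, x0, L0, eps0, eps_target, nu):
    """Restart wrapper: halve the tolerance each epoch and couple mu_i = eps_i / nu."""
    epochs = math.ceil(math.log2(eps0 / eps_target)) + 1
    x, L, eps = x0, L0, eps0
    for _ in range(epochs):
        mu = eps / nu                  # parameter coupling of Theorem 4.2
        x, L = ahba(mu, eps, L, x)     # warm-start from the previous epoch's output
        eps /= 2
    return x

# A trivial stand-in for AHBA that records the tolerances it was called with.
calls = []
def dummy_ahba(mu, eps, L, x):
    calls.append(eps)
    return x, L

restarted_ahba(dummy_ahba, x0=1.0, L0=1.0, eps0=1.0, eps_target=0.1, nu=2.0)
assert len(calls) == math.ceil(math.log2(10)) + 1   # 5 epochs for eps0/eps = 10
```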

A second-order Hessian-Barrier Algorithm
In this section we introduce a second-order potential reduction method for problem (Opt) under the assumption that the second-order Taylor expansion of f on the set of feasible directions T_x defined in (4.2) is sufficiently accurate in the geometry induced by h ∈ H_ν(K).
Assumption 4 (Local second-order smoothness). f : E → R ∪ {+∞} is twice continuously differentiable on X, and there exists a constant M > 0 such that the second-order Taylor-expansion error bound (5.1) holds for all x ∈ X and v ∈ T_x. A sufficient condition for (5.1) is a local counterpart of the global Lipschitz condition on the Hessian of f, stated in terms of ‖B‖_{op,x} := sup{‖Bu‖_x^* : ‖u‖_x ≤ 1}, the induced operator norm of a linear operator B : E → E^*; indeed, this condition implies (5.1). Further, (5.1) in turn implies another important estimate, valid for all x ∈ X and v ∈ T_x. Remark 5.1. Assumption 4 subsumes, when X is bounded, the standard Lipschitz-Hessian setting: if the Hessian of f is Lipschitz with modulus M with respect to the Euclidean norm, the claim follows by [62, Eq. (2.2)].
Since X is bounded, one can observe that λ_max([H(x)]^{−1})^{−1} = λ_min(H(x)) ≥ σ for some σ > 0, and (5.1) holds. Remark 5.2. As penalty function, we again consider the L_p regularizer with p ∈ (0, 1). Assuming that X is bounded, there exists a universal constant for which, combining with Remark 5.1, we obtain a cubic overestimation as in eq. (5.3). Importantly, f(x) is not differentiable for x ∈ {x : x_i = 0 for some i}.
We emphasize that in Assumption 4 the constant M is in general unknown, or may only be a conservative upper bound. Therefore, adaptive techniques should be used to estimate it, and these are likely to improve the practical performance of the method. Assumption 4 also implies, by (5.3) and (2.13) (with d = v and t = 1), the following upper bound on the potential function F_µ. Lemma 5.1 (Cubic Overestimation). For all x ∈ X, v ∈ T_x and L ≥ M, we have (5.4).

Algorithm description and its complexity theorem
Let x ∈ X be given. In order to find a search direction, we choose a parameter L > 0, construct a cubic-regularized model of the potential F_µ (1.2), and minimize it over the linear subspace L_0, where by Argmin we denote the set of global minimizers. The model consists of three parts: a linear approximation of h, a quadratic approximation of f, and a cubic regularizer with penalty parameter L > 0. Since this model and our algorithm use the second derivative of f, we call it a second-order method. Our further derivations rely on the first-order optimality conditions for problem (5.5), which say that there exists y_{µ,L}(x) ∈ R^m such that v_{µ,L}(x) satisfies the stationarity equation and Av_{µ,L}(x) = 0. (5.7) We also use the following extension of [62, Prop. 1] to our setting with the local norm induced by H(x).
Proposition 5.2. For all x ∈ X, the bound (5.8) holds. Proof. The proof follows the same strategy as Lemma 3.2 in [21]. Let {z_1, …, z_p} be an orthonormal basis of L_0, and let the linear operator Z : R^p → L_0 be defined by Zw = ∑_{i=1}^p z_i w_i for all w = (w_1, …, w_p)^⊤ ∈ R^p. With the help of this linear map, we can absorb the null-space restriction and formulate the search-direction finding problem (5.5) using the projected data (5.9). We then arrive at the cubic-regularized subproblem of finding u_L ∈ R^p such that
where ‖•‖_H is the norm induced by the operator H. From [62, Thm. 10] we deduce the corresponding properties of u_L. Denoting v_{µ,L}(x) = Zu_L, the claim follows since H(x) ≻ 0 over the null space L_0 = {v ∈ E : Av = 0}. The above proposition gives some ideas on how one could numerically solve problem (5.5) in practice. In a preprocessing step, we calculate the matrix Z once and use it during the whole execution of the algorithm. At each iteration we calculate the new data using (5.9), leaving us with a standard unconstrained cubic subproblem (5.10). [62] show how such problems can be transformed into a convex problem to which fast convex programming methods could in principle be applied. However, we can also solve it via recent efficient methods based on Lanczos' method [21,52]. Whatever numerical tool is employed, we recover our search direction v_{µ,L}(x) via the matrix-vector product Zu_L, where u_L denotes the solution obtained from this subroutine.
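A minimal Python/NumPy sketch of this preprocessing-plus-subproblem pipeline, with hypothetical helper names: we assume a reduced model of the form g_Z^⊤u + ½u^⊤H_Z u + (L/6)‖u‖³ (the exact coefficient in the paper's model is not reproduced here), and solve it by plain gradient descent as a stand-in for a Lanczos-based solver.

```python
import numpy as np

def null_basis(A):
    """Orthonormal basis Z of null(A) via SVD, computed once in preprocessing."""
    _, s, Vt = np.linalg.svd(A)
    rank = int((s > 1e-12).sum())
    return Vt[rank:].T                          # columns z_1, ..., z_p

def cubic_step(grad, hess, L, Z, iters=5000, lr=0.05):
    """Minimize gZ'u + 0.5 u'HZ u + (L/6)||u||^3 in u, then lift back via Z."""
    gZ, HZ = Z.T @ grad, Z.T @ hess @ Z         # projected data, cf. (5.9)
    u = np.zeros(len(gZ))
    for _ in range(iters):
        # gradient of the cubic model: gZ + HZ u + (L/2)||u|| u
        u -= lr * (gZ + HZ @ u + 0.5 * L * np.linalg.norm(u) * u)
    return Z @ u                                # search direction v = Z u_L

A = np.array([[1.0, 1.0, 1.0, 1.0]])            # toy constraint, as on the simplex
Z = null_basis(A)
v = cubic_step(np.array([1.0, -2.0, 0.5, 0.0]), np.eye(4), L=1.0, Z=Z)
assert np.abs(A @ v).max() < 1e-8               # direction lies in null(A)
```

Note that Z is computed once, while `cubic_step` would be called at every iteration with fresh gradient and Hessian data.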
Our next goal is to construct an admissible step-size policy, given the search direction v_{µ,L}(x). Let x ∈ X be the current position of the algorithm. Define the parameterized family of arcs x_+(t) := x + t v_{µ,L}(x), where t ≥ 0 is a step-size. By (2.12), and since v_{µ,L}(x) ∈ L_0 by (5.7), the iterate x_+(t) remains feasible for all t in an interval I_{x,µ,L}. For all such t, Lemma 5.1 yields (5.11). Since v_{µ,L}(x) ∈ L_0, multiplying (5.8) with v_{µ,L}(x) from the left and the right, and multiplying (5.6) by v_{µ,L}(x) and combining with (5.7), we obtain an estimate of the model decrease. Under the additional assumptions that t ≤ 2 and L ≥ M, substituting this into (5.11), we arrive at a decrease guarantee for all t ∈ I_{x,µ,L}. Therefore, if tζ(x, v_{µ,L}(x)) ≤ 1/2, we readily obtain the bound η_x(t). Maximizing the function η_x(t) and finding a lower bound for its optimal value is technically quite challenging. Instead, we adopt the step-size rule (5.17). Note that t_{µ,L}(x) ≤ 1 and t_{µ,L}(x)ζ(x, v_{µ,L}(x)) ≤ 1/2; thus, this choice of the step-size suffices to derive (5.14). Just like Algorithm 1, our second-order method employs a line-search procedure to estimate the Lipschitz constant M in (5.1), (5.3), in the spirit of [22,62]. More specifically, suppose that x_k ∈ X is the current position of the algorithm with the corresponding initial local Lipschitz estimate M_k. To determine the next iterate x_{k+1}, we solve problem (5.5) with L_k = 2^{i_k} M_k. Then we check whether the inequalities (5.1) and (5.3), with M replaced by L_k, i.e., (5.19) and (5.18), hold. If they do, we make a step to x_{k+1}. Otherwise, we increase i_k by 1 and repeat the procedure. Obviously, when L_k = 2^{i_k} M_k ≥ M, both inequalities (5.1) and (5.3) with M changed to L_k, i.e., (5.19) and (5.18), are satisfied, and the line-search procedure ends. For the next iteration we set M_{k+1} = max{L_k/2, L̲}, so that the estimate of the local Lipschitz constant can, on the one hand, decrease, allowing larger step-sizes, and, on the other hand, is bounded from below. The resulting procedure gives rise to a Second-order Adaptive Hessian-Barrier Algorithm (SAHBA, Algorithm 3). Our main result on the iteration complexity of Algorithm 3 is the following theorem, whose proof is given in Section 5.2. Theorem 5.3. Let Assumptions 1, 2, and 4 hold. Fix the error tolerance ε > 0, the regularization parameter µ = ε/(4ν), and some initial guess M_0 > 144ε for the Lipschitz constant. Let (x_k)_{k≥0} be the trajectory generated by SAHBA(µ, ε, M_0, x_0), where x_0 is a 4ν-analytic center satisfying (4.1). Then the algorithm stops in no more than K_II(ε, x_0) outer iterations, and the number of inner iterations is no more than 2(K_II(ε, x_0)+1) + 2 max{log₂(2M/M_0), 1}. Moreover, the output of SAHBA(µ, ε, M_0, x_0) constitutes an (ε, max{M, M_0}ε/(8ν))-2KKT point for problem (Opt) in the sense of Definition 3.2. Remark 5.3. Since f(x_0) − f_min(X) is expected to be larger than ε, and the constant M is potentially large, we see that the main term in the complexity bound (5.20) is of order ε^{−3/2}. This rate, which suffices to find an (ε_1, ε_2)-2KKT point for arbitrary ε_1, ε_2 > 0, is known to be optimal for unconstrained smooth non-convex optimization by second-order methods under the standard Lipschitz-Hessian assumption [18,19]. It can be easily obtained from our theorem by an appropriate choice of ε.

Proof of Theorem 5.3
The main steps of the proof are similar to the analysis of Algorithm 1. We start by showing the feasibility of the iterates and the correctness of the line-search process. Next, we analyze the per-iteration decrease of F_µ and f, and show that if the stopping criterion does not hold at iteration k, then the objective function is decreased by a value O(ε^{3/2}). From this, since the objective is globally lower bounded, we conclude that the algorithm stops in O(ε^{−3/2}) iterations. Finally, we show that when the stopping criterion holds, the primal-dual pair (x_k, y_{k−1}) resulting from solving the cubic subproblem (5.16) yields a dual slack variable s_k such that this triple constitutes a second-order KKT point.
[Algorithm 3: Second-order Adaptive Hessian-Barrier Algorithm, SAHBA(µ, ε, M_0, x_0), with ζ(·, ·) as in (2.10) and step-size rule (5.17).]

Interior point property of the iterates
By construction, x_0 ∈ X. Proceeding inductively, let x_k ∈ X be the k-th iterate of the algorithm, with search direction v_k ≡ v_{µ,L}(x_k). By eq. (5.17), the step-size α_k is admissible for all k ≥ 0, and using (2.12) as well as Av_k = 0, we have that x_{k+1} = x_k + α_k v_k ∈ X. By induction, it follows that x_k ∈ X for all k ≥ 0.

Bounding the number of backtracking steps
To bound the number of cycles involved in the line-search process for finding appropriate constants L_k, we proceed as in Section 4.3.2. Let us fix an iteration k. The sequence L_k = 2^{i_k} M_k is increasing in i_k, and Assumption 4 holds. This implies (5.3), and thus, when L_k = 2^{i_k} M_k ≥ max{M, M_k}, the line-search process stops for sure since inequalities (5.18) and (5.19) hold. Hence, L_k = 2^{i_k} M_k ≤ 2 max{M, M_k} must be the case, and, consequently, M_{k+1} = max{L_k/2, L̲} ≤ max{max{M, M_k}, L̲} = max{M, M_k}, which, by induction, gives M_k ≤ M̄ ≡ max{M, M_0} and L_k ≤ 2M̄. Let N(k) denote the number of inner line-search iterations up to iteration k of SAHBA. Then, telescoping as before and using that L_k ≤ 2M̄ = 2 max{M, M_0} in the last step, we see that on average the inner loop ends after two trials.

Per-iteration analysis and a bound for the number of iterations
Let us fix iteration counter k. The main assumption of this subsection is that the stopping criterion is not satisfied, i.e., either ‖v^k‖_{x^k} ≥ ∆_k or ‖v^{k−1}‖_{x^{k−1}} ≥ ∆_{k−1}. Without loss of generality, we assume that the first inequality holds, i.e., ‖v^k‖_{x^k} ≥ ∆_k, and consider iteration k. Otherwise, if the second inequality holds, the same derivations can be made considering iteration k − 1 and using the second inequality ‖v^{k−1}‖_{x^{k−1}} ≥ ∆_{k−1}. Since the step-size is chosen by (5.17) (cf. (5.15) and the remark after it), we can repeat the derivations of Section 5.1, changing (5.3) to (5.18). In this way we obtain the counterpart (5.22) of (5.14), where in the last inequality we used that α_k ≤ 1 by construction. Substituting µ = ε/(4ν), and using (5.21) together with the fact that, by construction, L_k = 2^{i_k} M_k ≥ L̲ = 144ε and that ν ≥ 1, we obtain from (5.22) the bound (5.23). Substituting into (5.23) the two possible values of the step-size α_k in (5.17) gives (5.24). Rearranging and summing these inequalities for k from 0 to K − 1, and using that L_k ≥ L̲, we obtain a telescoping bound, where we used that, by the assumptions of Theorem 5.3, x_0 is a 4ν-analytic center defined in (4.1) and µ = ε/(4ν), implying that h(x_0) − h(x_K) ≤ 4ν = ε/µ. Thus, up to passing to a subsequence, we have ‖v^k‖_{x^k} → 0 as k → ∞, which makes the stopping criterion in Algorithm 3 achievable.
Assume now that the stopping criterion does not hold for K iterations of SAHBA. Then, for all k = 0, …, K − 1, the per-iteration decrease bound above applies, and from (5.26), recalling that M̄ = max{M_0, M}, we conclude that K ≤ K_II(ε, x_0), i.e., the algorithm stops for sure after no more than this number of iterations. This, combined with the bound on the number of inner steps in Section 5.2.2, proves the first statement of Theorem 5.3.

Generating a (ε 1 , ε 2 )-2KKT point
In this section, to finish the proof of Theorem 5.3, we show that if the stopping criterion in Algorithm 3 holds, i.e., ‖v^{k−1}‖_{x^{k−1}} < ∆_{k−1} and ‖v^k‖_{x^k} < ∆_k, then the algorithm has generated an (ε_1, ε_2)-2KKT point of (Opt) according to Definition 3.2, with ε_1 = ε and ε_2 = max{M_0, M}ε/(8ν). Let the stopping criterion hold at iteration k. Using the first-order optimality condition (5.6) for the subproblem (5.16) solved at iteration k − 1, there exists a dual variable y^{k−1} ∈ R^m such that (5.6) holds. Now, expanding the definition of the potential (1.2) and adding ∇f(x^k) to both sides, we obtain two identities. Setting s^k := ∇f(x^k) − A^*y^{k−1} and multiplying both of the above equalities, we arrive at a bound involving s̄^{k−1} := −µ∇h(x^{k−1}); taking the square root and applying the triangle inequality then gives the dual estimate. Since the stopping criterion holds at iteration k − 1, and since, by construction, L_{k−1} ≥ L̲ = 144ε and ν ≥ 1, we have, by (5.17), that α_{k−1} = 1 and x^k = x^{k−1} + v^{k−1}. This, in turn, implies (5.29). Now we follow the analysis of the first-order method by noting that (5.30) holds. Thus, since, by (A.1), s̄^{k−1} = −µ∇h(x^{k−1}) ∈ K^*, we get that s^k ∈ K^*. By construction, x^k ∈ K and Ax^k = b. Thus, (3.6) holds. We also have that, by construction, ‖∇f(x^k) − A^*y^{k−1} − s^k‖ = 0 ≤ ε, meaning that (3.7) holds with ε_1 = ε. To finish the analysis of the first-order conditions, it remains to check the complementarity condition (3.8). We estimate each of the two terms on the right-hand side separately, using (5.29), (A.5), and (A.4). Summing up and using the stopping criterion, we conclude that (3.8) holds with ε_1 = ε. Finally, we show the second-order condition (3.9). By inequality (5.8) for the subproblem (5.16) solved at iteration k, we obtain the required lower bound on L_0, where we used the second part of the stopping criterion, i.e., ‖v^k‖_{x^k} < ∆_k, and that L_k ≤ 2M̄ = 2 max{M, M_0} (see Section 5.2.2). Thus, (3.9) holds with ε_2 = max{M, M_0}ε/(8ν), which finishes the proof of Theorem 5.3.

Discussion
Strengthened KKT condition. As in Section 4.4, our aim in this section is to compare our result with those available in the contemporary literature. We therefore consider the special case K = K_NN, endowed with the standard log-barrier h(x) = −∑_{i=1}^n ln(x_i). Recall that for this barrier setup we have ∇h(x) = −X^{−1}1_n. Assume that the stopping criterion applies at iteration k. Using the first-order optimality condition (5.6) for the subproblem (5.16) solved at iteration k − 1 and expanding the definition of the potential (1.2), there exists a dual variable y^{k−1} ∈ R^m such that (5.6) holds. Multiplying both sides by H(x^{k−1})^{−1/2} and using the stopping criterion, we obtain (5.33). Whence, since µ = ε/(4n), the bound (5.33) combined with the triangle inequality yields a dual feasibility estimate. Let V^{k−1} = diag(v^{k−1}). Using the fact that x^k = x^{k−1} + v^{k−1}, shown after (5.28), we decompose the complementarity residual into four terms I–IV. We estimate each of the terms I–IV using two technical facts, (B.1) and (B.2), proved in Appendix B; here we use x^k = z^{k−1} = x^{k−1} + v^{k−1} in bounding II, and the last bound for expression III uses ‖v^{k−1}‖_{x^{k−1}} < 1, which is implied by eq. (5.28). Summarizing, we arrive at the strengthened complementarity bound (5.35). Further, by Theorem 5.3 and Remark 5.3, these inequalities are achieved after the stated number of iterations. In contrast, the second-order algorithm of [46] requires the additional assumption that the level set of the objective f is bounded in the L_∞-norm, and gives a slightly worse guarantee (in terms of the L_∞ upper bound of the level set corresponding to x_0). We also repeat the remark from Section 4.4 that our measure of complementarity 0 ≤ ⟨s^k, x^k⟩ ≤ ε is stronger than the measure max_{1≤i≤n} |x^k_i s^k_i| used in [46,65]. Furthermore, our algorithm is applicable to general cones admitting an efficient barrier setup, rather than only to K_NN. For more general cones we cannot use the coupling H(x)^{−1/2} = X, which was seen to be very helpful in the derivations of the bound (5.35) above. Thus, to deal with general cones, we had to find and exploit
suitable properties of the barrier class H_ν(K) and develop a new analysis technique that works for general, potentially non-symmetric, cones. Finally, our method does not rely on trust-region techniques as in [46], which may slow down convergence in practice since the radius of the trust region is no greater than O(√ε), leading to short steps.
Exploiting problem structure. We note that in (5.24) we can clearly observe the benefit of using a ν-SSB in our algorithm: when α_k = 1/(2ζ(x_k, v_k)), the per-iteration decrease of the potential improves accordingly.
Dependence on parameters. Next, we discuss more explicitly how the algorithm and the complexity bounds depend on the parameter µ. The first observation is that, from (5.30), to guarantee that s^k ∈ K^*, we need the stopping criterion to be ‖v^{k−1}‖_{x^{k−1}} < ∆_{k−1} = µ/L_{k−1}, which by (5.31) leads to the error 4µν in the complementarity conditions and by (5.32) to the error µ/M̄ in the second-order condition. From the analysis following equation (5.24), we have that Kµ^{3/2}/(24√M̄) ≤ f(x_0) − f_min(X) + µν.
Whence, recalling that M̄ = max{M, M_0}, we see that after O(µ^{−3/2}) iterations the algorithm finds a (4µν, µ/M̄)-KKT point; if µ → 0, we have convergence to a KKT point, but the complexity bound tends to infinity and becomes non-informative. At the same time, as seen from (5.16), when µ → 0 the algorithm resembles a cubic-regularized Newton method, but with regularization by the cube of the local norm. We also see from the above explicit expressions in terms of µ that the design of the algorithm requires a careful balance between the desired accuracy of the approximate KKT point (expressed mainly by the complementarity conditions), the stopping criterion, and the complexity. Moreover, the step-size must be selected carefully to ensure the feasibility of the iterates.

Conclusion
We derived Hessian-barrier algorithms based on first- and second-order information on the objective f. We performed a detailed analysis of their worst-case iteration complexity for finding a suitably defined approximate KKT point. Under weak regularity assumptions and in the presence of general conic constraints, our Hessian-barrier algorithms share the best known complexity rates in the literature for first- and second-order approximate KKT points. Our methods are characterized by a decomposition approach for the feasible set which leads to numerically efficient subproblems at each iteration. Several open questions remain for the future. First, our iterations assume that the subproblems are solved exactly; for practical reasons this should be relaxed. Second, we mentioned that AHBA can be interpreted as a discretization of the Hessian-barrier gradient system [4], but the exact relationship has not been explored yet. This, however, could be an important step towards understanding acceleration techniques for AHBA, akin to accelerated methods for the cubic regularized Newton method. Furthermore, the cubic-regularized version has no corresponding continuous-time counterpart yet; it would be very interesting to investigate this question further. Additionally, the question of convergence of the trajectory (x_k)_{k≥0} generated by either scheme is open. Another interesting direction for future research would be to allow for higher-order Taylor expansions in the subproblems in order to boost the convergence speed further, similar to [25].