1 Introduction

Motivation At the end of the last century, Polynomial-Time interior-point methods (IPM) were the most popular tools for solving convex optimization problems. Starting from the famous papers by Karmarkar [7], Renegar [8], Gonzaga [5], and many others, these methods completely changed our ability to solve Linear Optimization Problems. Their further extension to nonlinear optimization problems was achieved by the theory of self-concordant functions [15]. Due to this theory, it became possible to develop linearly convergent methods for Linear Matrix Inequalities and other important classes of structured convex optimization problems (see [2, 12]).

However, the machinery underlying general IPM is not simple. The most efficient methods require rewriting the initial problem in the primal-dual conic form (see [10, 17]). Moreover, the solution process for these problems has to be divided into two or even three stages, which makes the practical implementation of the new methods a nontrivial task.

The algorithmic complexity of IPM and their high iteration cost were the main factors in switching the research priorities to much simpler gradient-type methods. The development of the smoothing technique [13], supported by the demands of Big Data, opened new possibilities of acceleration for these optimization schemes, which surpass the limits of the classical Complexity Theory [11]. In parallel, we have seen intensive development of new second-order methods [6, 9, 14, 16], which now benefit from a global complexity analysis even for nonconvex problems [1, 4].

However, an unavoidable drawback of the first- and second-order Black Box Schemes is their sublinear global rate of convergence on general classes of convex problems. Since in most practical applications the internal structure of the convex problem is quite visible, these methods have to compete with linearly convergent IPM, which use this structure for creating a powerful descriptor of the problem, the self-concordant barrier of the feasible set.

In this paper, we revisit some basic elements of the theory of self-concordant functions, having in mind a simplification of the existing IPM. This theory is based on two concepts, the notion of a self-concordant function and the notion of a self-concordant barrier. The first one is necessary for a proper description of the behavior of the Newton Method, and the second one is responsible for the global polynomial-time complexity of IPM. Both concepts are local in the sense that they assume certain relations between the directional derivatives of a convex function, computed at the same point.

In our presentation, we replace the notion of a self-concordant barrier by a new concept of a set-limited function, which requires boundedness of the variation of the gradient with respect to the current point. This condition is clearly global, and it is much easier to verify. Thus, we significantly increase the class of good barriers. On the other hand, using a new line of arguments, we show that the polynomial-time complexity of the corresponding schemes is preserved.

Our second development is the Greedy Path-Following Method. In the standard framework of IPM, it is necessary to follow the central path, which starts in a close neighborhood of the analytic center of the feasible set. Thus, a preliminary stage is usually needed for approaching this center (e.g. Section 4.2 in [12]). In our new method, we start moving towards the optimum immediately from the starting point, by following a deviated path. We present some simple characteristics of the starting point that ensure the polynomial-time complexity of this procedure.

As important application examples, we consider problems of unconstrained minimization where we know barriers for the epigraph of the objective function. We show how to choose the starting point in the epigraph, which ensures the global linear rate of convergence of the corresponding methods. Note that our scheme updates some objects in the epigraph of the objective function, similarly to the methods based on the overestimating technique (e.g. [3, 16]). However, in contrast to them, our new method benefits from a global linear rate of convergence even for nonsmooth functions.

Contents In Sect. 2, we introduce the notion of a proper set-limited function, which replaces in our presentation the notion of a self-concordant barrier. The class of such functions is much wider. However, we prove that the main properties of self-concordant barriers are preserved.

In Sect. 3, we describe the Greedy Path-Following Method, which follows a deviated path starting from an arbitrary feasible point. We highlight the conditions ensuring the polynomial-time complexity of this scheme. As a side result, we show that in the case of a sharp minimum, the deviated path asymptotically approaches the standard central path.

In the last Sect. 4, we discuss several applications of our results to problems of unconstrained optimization. In all cases, we show how to choose the starting point in the epigraph of the objective function in order to ensure the linear rate of convergence of the scheme. Note that this rate is achieved even if the objective function is nonsmooth.

Notation and Generalities In what follows, we denote by \(\mathbb {E}\) a finite-dimensional real vector space, and by \(\mathbb {E}^*\) its dual space, composed of linear functions on \(\mathbb {E}\). For such a function \(s \in \mathbb {E}^*\), we denote by \(\langle s, x \rangle \) its value at \(x \in \mathbb {E}\).

We measure distances in \(\mathbb {E}\) by an arbitrary norm \(\Vert \cdot \Vert _{\mathbb {E}}\), denoting by \(\mathcal{B}\) the corresponding unit ball. Then the dual norm is defined in the standard way:

$$\begin{aligned} \begin{array}{rcl} \Vert g \Vert ^*_{\mathbb {E}} = \max \limits _{x \in \mathcal{B}} \; \langle g,x \rangle , \quad g \in \mathbb {E}^*. \end{array} \end{aligned}$$

For two sets \(Q_1\) and \(Q_2\) in \(\mathbb {E}\), we say that \(Q_1 \subset Q_2\) if there exists some \(\epsilon >0\) such that \(Q_1 + \epsilon \, \mathcal{B} \subseteq Q_2\). Sometimes we measure distances in \(\mathbb {E}\) by a Euclidean norm \(\Vert \cdot \Vert \). It is defined by a self-adjoint positive-definite linear operator \(B: \mathbb {E}\rightarrow \mathbb {E}^*\) in the following way:

$$\begin{aligned} \Vert x \Vert = \langle B x, x \rangle ^{1/2},\quad x \in \mathbb {E}, \quad \Vert g \Vert ^* = \langle g, B^{-1} g \rangle ^{1/2}, \; g \in \mathbb {E}^*. \end{aligned}$$

For a smooth function \(f: \mathbb {E}\rightarrow \mathbb {R}\), denote by \(\nabla f(x)\) its gradient, and by \(\nabla ^2 f(x)\) its Hessian evaluated at point \(x \in \textrm{dom}\,f\). Note that

$$\begin{aligned} \nabla f(x) \in \mathbb {E}^*, \quad \nabla ^2 f(x) h \in \mathbb {E}^*, \quad h \in \mathbb {E}. \end{aligned}$$

Another possibility for measuring distances is given by the local norms defined by a self-concordant function (see Sect. 4.2 in [12]). Recall that a function \(f(\cdot )\) is called self-concordant if it is a closed convex function with open domain, which satisfies the following condition:

$$\begin{aligned} D^3f(x)[h]^3 \le 2 \langle \nabla ^2 f(x) h, h \rangle ^{3/2}, \quad x \in \textrm{dom}\,f, \; h \in \mathbb {E}, \end{aligned}$$

where the notation in the left-hand side corresponds to the third directional derivative of the function \(f(\cdot )\) along direction h. The Hessian of a self-concordant function provides us with the local Euclidean norms:

$$\begin{aligned} \Vert u \Vert _x = \langle \nabla ^2f(x) u, u \rangle ^{1/2}, \quad u \in \mathbb {E}, \quad \Vert g \Vert ^*_x = \langle g, [\nabla ^2f(x)]^{-1} g \rangle ^{1/2}, \quad g \in \mathbb {E}^*. \end{aligned}$$

Among many useful properties of self-concordant functions, we employ the following inequality (e.g. Theorem 4.1.7 in [12]):

$$\begin{aligned} \langle \nabla f(x) - \nabla f(y), x - y \rangle \ge {\Vert y - x \Vert ^2_x \over 1 + \Vert y - x \Vert _x}, \quad x, y \in \textrm{dom}\,f. \end{aligned}$$
(1)
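As a quick sanity check, inequality (1) can be verified numerically for the standard self-concordant function \(f(x) = -\ln x\), \(x > 0\), for which \(f''(x) = 1/x^2\) and hence \(\Vert h \Vert _x = |h|/x\). The following sketch is purely illustrative:

```python
import random

# Spot-check of inequality (1) for f(x) = -ln(x), x > 0, which is a
# standard self-concordant function: f'(x) = -1/x, f''(x) = 1/x^2,
# so the local norm is ||h||_x = |h| / x.
def slack(x, y):
    lhs = (-1.0 / x + 1.0 / y) * (x - y)   # <f'(x) - f'(y), x - y> = (x-y)^2/(xy)
    r = abs(y - x) / x                     # ||y - x||_x
    return lhs - r * r / (1.0 + r)         # should be nonnegative

random.seed(0)
for _ in range(10000):
    x, y = random.uniform(0.01, 10.0), random.uniform(0.01, 10.0)
    assert slack(x, y) >= -1e-9            # tolerance for rounding only
```

For this particular \(f\), the slack is in fact zero whenever \(y > x\), so inequality (1) is tight in one direction.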

In what follows, we often use different statements from Chapter 4 of [12]. We put the corresponding references in bold. Thus, reference T.1.2 means Theorem 4.1.2 from [12].

2 Set-Limited Functions

Let Q be a closed convex set in \(\mathbb {E}\) with nonempty interior. Denote by \(F(\cdot )\) a self-concordant function with \(\textrm{dom}\,F = \textrm{int}\,Q\). In this paper, we consider the following standard minimization problem:

$$\begin{aligned} c_* = \min \limits _{x \in Q} \; \langle c, x \rangle , \end{aligned}$$
(2)

where c is a linear functional from \(\mathbb {E}^*\).

In order to justify efficiency bounds for corresponding optimization methods, we need to introduce additional assumptions on \(F(\cdot )\). In the standard theory of Polynomial-Time Interior-Point Methods [15], we assume that \(F(\cdot )\) is a \(\nu \)-self-concordant barrier:

$$\begin{aligned} \langle \nabla F(x), h \rangle ^2 \le \nu \langle \nabla ^2F(x)h, h \rangle , \quad x \in \textrm{dom}\,F, \; h \in \mathbb {E}, \end{aligned}$$
(3)

where the barrier parameter \(\nu \ge 1\) is responsible for the complexity of problem (2). In this way, it is possible to justify polynomial-time solvability for many important classes of optimization problems (see [15]). However, the barrier property (3) is very fragile. Even adding a linear function to \(F(\cdot )\) destroys the value of the barrier parameter. The main goal of this paper is to replace (3) by a more robust definition, which still gives us the possibility to prove polynomial-time complexity of interior-point methods.

Note that condition (3) is local. It looks like an upper bound on the size of the gradient in the local norm defined by the Hessian of the barrier, and maybe this is the main reason for its fragility. In what follows, we replace it by a global condition, related to the size of the gradient with respect to the feasible set.

Definition 2.1

We call a convex function \(F(\cdot )\) \(\varkappa \)-set-limited with respect to a convex set \(Q \subseteq \textrm{dom}\,F\) if there exists a constant \(\varkappa \ge 0\) such that for any \(x, y \in Q\) we have

$$\begin{aligned} \langle \nabla F(x), y - x \rangle \le \varkappa . \end{aligned}$$
(4)

This definition has a geometric interpretation. Define the polar set at point \(x \in \textrm{int}\,Q\) as follows:

$$\begin{aligned} P_Q(x) = \Big \{ g \in \mathbb {E}^*: \langle g, y - x \rangle \le 1, \; y \in Q \Big \}. \end{aligned}$$

Then the condition (4) is equivalent to the following inclusion:

$$\begin{aligned} {1 \over \varkappa } \nabla F(x) \in P_Q(x), \quad x \in \textrm{int}\,Q. \end{aligned}$$
(5)

Note that inequality (4) with \(\varkappa =\nu \) is one of the main properties of a self-concordant barrier (see T.2.4(1)). By lifting it up to the status of a definition, we significantly increase the class of good barrier functions.

Let us mention some simple properties of set-limited functions, which need no proofs.

  • If \(F(\cdot )\) is \(\varkappa \)-set-limited, then for any \(\lambda >0\) function \(\lambda F(\cdot )\) is \((\lambda \varkappa )\)-set-limited.

  • If functions \(F_i(\cdot )\) are \(\varkappa _i\)-set-limited with respect to the sets \(Q_i\), \(i=1,2\), then function \(F(x) = F_1(x) + F_2(x)\) is \((\varkappa _1+\varkappa _2)\)-set-limited with respect to \(Q = Q_1 \bigcap Q_2\).

  • If \(F(\cdot )\) has bounded variation on the set Q, then it is set-limited with respect to Q with parameter \(\varkappa = \textrm{Var}\,_Q(F) {\mathop {=}\limits ^{\textrm{def}}}\sup _{x,y \in Q} [F(y) - F(x)]\).

The next property is also simple. However, we state it separately for future reference.

Lemma 2.1

Let the function \(F(\cdot )\) be set-limited, and let p be a recession direction of its domain. Then

$$\begin{aligned} \langle \nabla F(z), p \rangle \le 0, \quad z \in \textrm{dom}\,F. \end{aligned}$$
(6)

Proof

Indeed, since the inequality \(\langle \nabla F(z), (z + \tau p) - z \rangle {{\mathop {\le }\limits ^{(4)}}} \varkappa \) is valid for arbitrarily large \(\tau > 0\), we get (6). \(\square \)

The following result is sometimes useful.

Lemma 2.2

Let \(f(\cdot )\) be differentiable and concave on the set \(\Omega = \{ x \in \mathbb {E}: f(x) \ge 0 \}\) with \(\textrm{int}\,\Omega \ne \emptyset \). Then the function \(F(x) = - \ln f(x)\) is set-limited on \(\Omega \) with \(\varkappa = 1\).

Proof

Indeed, let \(f(x) > 0\) and \(f(y) \ge 0\). Then, since \(f(\cdot )\) is concave, we have

$$\begin{aligned} \langle \nabla F(x), y - x \rangle = {1 \over f(x)} \langle \nabla f(x), x - y \rangle \le {1 \over f(x)} (f(x) - f(y)) \le 1. \end{aligned}$$

\(\square \)
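Lemma 2.2 is easy to test numerically. The sketch below picks the concrete concave function \(f(x) = 1 - \Vert x \Vert ^2\) (so that \(\Omega \) is the unit ball and \(F(x) = -\ln (1 - \Vert x \Vert ^2)\)) and checks the bound (4) with \(\varkappa = 1\) on random pairs of points; this choice of \(f\) is ours, for illustration only:

```python
import numpy as np

# Check of Lemma 2.2 for f(x) = 1 - ||x||^2 (differentiable, concave),
# so Omega is the unit ball, F(x) = -ln(1 - ||x||^2), and
# grad F(x) = -grad f(x)/f(x) = 2x / (1 - ||x||^2).
rng = np.random.default_rng(0)

def rand_ball(n):
    """Random point in the unit ball (uniform direction, random radius)."""
    v = rng.standard_normal(n)
    return rng.uniform(0.0, 1.0) ** (1.0 / n) * v / np.linalg.norm(v)

def set_limited_value(x, y):
    return (2.0 * x / (1.0 - x @ x)) @ (y - x)   # <grad F(x), y - x>

for _ in range(10000):
    x, y = 0.999 * rand_ball(3), rand_ball(3)    # x strictly interior
    assert set_limited_value(x, y) <= 1.0 + 1e-9
```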

For the needs of interior-point methods, we specify an additional property of set-limited functions.

Definition 2.2

If a set-limited function is self-concordant on its domain, we call it a proper set-limited function.

Recall that self-concordant functions are closed and convex and have open domains. Let us prove that proper set-limited functions belong to the family of self-concordant barriers.

Lemma 2.3

Any proper \(\varkappa \)-set-limited function is also a \(\varkappa ^2\)-self-concordant barrier.

Proof

For \(x \in \textrm{dom}\,F\), let us choose an arbitrary direction \(h \in \mathbb {E}\). If necessary, we can multiply it by \(- 1\) to ensure the inequality \(\langle \nabla F(x), h \rangle \ge 0\). In view of T.1.5(1), the point \(y = x + h/\Vert h \Vert _x\) belongs to \(\textrm{Cl}\,(\textrm{dom}\,F)\). Since \(F(\cdot )\) is \(\varkappa \)-set-limited, we have

$$\begin{aligned} 0 \le \langle \nabla F(x), h \rangle = \langle \nabla F(x), y - x \rangle \Vert h \Vert _x \; {{\mathop {\le }\limits ^{(4)}}} \; \varkappa \Vert h \Vert _x. \end{aligned}$$

Thus, definition (3) is valid with \(\nu = \varkappa ^2\). \(\square \)

One of the consequences of this statement is that for proper set-limited functions we always have \(\varkappa \ge 1\) (see L.3.1).

Let us show that proper set-limited functions inherit one of the most important properties of self-concordant barriers. It is important that for this property we can use the parameter \(\varkappa \), not \(\varkappa ^2\).

Theorem 2.1

Let \(F(\cdot )\) be a proper set-limited function with parameter \(\varkappa \). Then for all \(x, y\in \textrm{dom}\,F\) with \(\langle \nabla F(x), y - x \rangle \ge 0\) we have

$$\begin{aligned} \Vert y - x \Vert _x \le \varkappa + 2 \sqrt{\varkappa }. \end{aligned}$$
(7)

Proof

Denote \(r = \Vert x - y \Vert _x\), and let \(r > \sqrt{\varkappa }\) (otherwise, (7) is trivial). Choosing \(\alpha = {1 \over r} \sqrt{\varkappa } < 1\) for \(y_{\alpha } = x + \alpha (y-x)\), we get

$$\begin{aligned} \omega&{\mathop {=}\limits ^{\textrm{def}}}\langle \nabla F(y_{\alpha }), y - x \rangle \ge \langle \nabla F(y_{\alpha }) - \nabla F(x), y - x \rangle \\&= {1 \over \alpha } \langle \nabla F(y_{\alpha }) - \nabla F(x), y_{\alpha } - x \rangle \; {{\mathop {\ge }\limits ^{(1)}}} \; {1 \over \alpha } \cdot { \Vert y_{\alpha } - x \Vert ^2_x \over 1 + \Vert y_{\alpha } - x \Vert _x}\\&= {\alpha \Vert y - x \Vert ^2_x \over 1 + \alpha \Vert y - x \Vert _x} = {r \sqrt{\varkappa } \over 1 + \sqrt{\varkappa }}. \end{aligned}$$

At the same time, \((1-\alpha )\omega = \langle \nabla F(y_{\alpha }), y - y_{\alpha } \rangle {{\mathop {\le }\limits ^{(4)}}} \varkappa \). Thus, \(\left( 1 - {1 \over r} \sqrt{\varkappa } \right) {r \sqrt{\varkappa } \over 1 + \sqrt{\varkappa }} \le \varkappa \), and this is exactly inequality (7). \(\square \)
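Theorem 2.1 can also be spot-checked. Taking again \(F(x) = -\ln (1 - \Vert x \Vert ^2)\) on the unit ball, which is self-concordant and set-limited with \(\varkappa = 1\) (a direct computation, used here as an assumption for illustration), inequality (7) predicts \(\Vert y - x \Vert _x \le 1 + 2\sqrt{1} = 3\) whenever \(\langle \nabla F(x), y - x \rangle \ge 0\):

```python
import numpy as np

# Spot-check of inequality (7) for the proper set-limited function
# F(x) = -ln(1 - ||x||^2) with varkappa = 1 (bound: 1 + 2*sqrt(1) = 3).
rng = np.random.default_rng(1)

def rand_ball(n):
    v = rng.standard_normal(n)
    return rng.uniform(0.0, 1.0) ** (1.0 / n) * v / np.linalg.norm(v)

def local_norm(x, h):
    # Hessian of F: (2/s) I + (4/s^2) x x^T,  with s = 1 - ||x||^2
    s = 1.0 - x @ x
    return np.sqrt(2.0 * (h @ h) / s + 4.0 * (x @ h) ** 2 / s ** 2)

checked = 0
while checked < 5000:
    x, y = 0.999 * rand_ball(2), 0.999 * rand_ball(2)
    if (2.0 * x / (1.0 - x @ x)) @ (y - x) >= 0.0:  # hypothesis of Theorem 2.1
        assert local_norm(x, y - x) <= 3.0 + 1e-8
        checked += 1
```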

The following result provides us with many examples of proper set-limited functions.

Lemma 2.4

Let \(F_1(\cdot )\) be a \(\nu \)-self-concordant barrier for the set \(Q {\mathop {=}\limits ^{\textrm{def}}}\textrm{dom}\,F_1\) and \(F_2(\cdot )\) be a self-concordant function with \(Q \subset \textrm{dom}\,F_2\). Then the function \(F(x) = F_1(x) + F_2(x)\) is proper \(\varkappa \)-set-limited with respect to Q with \(\varkappa = \nu + \textrm{Var}\,_Q (F_2)\).

Proof

Indeed, for any x and y from Q, we have

$$\begin{aligned} \langle \nabla F(x), y - x \rangle= & {} \langle \nabla F_1(x), y - x \rangle + \langle \nabla F_2(x), y - x \rangle \; {{\mathop {\le }\limits ^{(4)}}} \; \nu + F_2(y) - F_2(x)\\\le & {} \nu + \textrm{Var}\,_Q(F_2). \end{aligned}$$

\(\square \)

It is convenient to use Lemma 2.4 when \(\textrm{dom}\,F_2 = \mathbb {E}\). Therefore, let us recall the most important examples of such functions.

  • \(F_2(x) = {1 \over 2}\langle A x, x \rangle \), where the self-adjoint linear operator \(A: \mathbb {E}\rightarrow \mathbb {E}^*\) is positive semidefinite.

  • \(F_2(x) = \phi ^*(x) {\mathop {=}\limits ^{\textrm{def}}}\sup _{g \in \textrm{dom}\,\phi } [ \langle g, x \rangle - \phi (g)]\), where \(\phi (\cdot )\) is a self-concordant function defined on a bounded open convex set \(\textrm{dom}\,\phi \subset \mathbb {E}^*\) (see Theorem 2.4.1 in [15]).

3 Greedy Path-Following Method

Let \(F(\cdot )\) be a proper set-limited function with respect to the set Q with parameter \(\varkappa \). For solving optimization problem (2), we propose the following simple scheme.

$$\begin{aligned} \begin{array}{l} \textbf{Greedy Path-Following Method}\\ \text {Initialization: choose } x_0 \in \textrm{int}\,Q \text { and } \beta \in \left( 0, {1 \over 3}\right] ; \text { set } \gamma = {\sqrt{\beta } \over 1 + \sqrt{\beta }} - \beta \text { and } g_0 = - \nabla F(x_0).\\ \text {Iteration } k \ge 0{:} \quad g_{k+1} = g_k + {\gamma \over \Vert c \Vert ^*_{x_k}}\, c, \qquad x_{k+1} = x_k - [\nabla ^2 F(x_k)]^{-1} \big ( \nabla F(x_k) + g_{k+1} \big ). \end{array} \end{aligned}$$
(8)

We call this method greedy since it immediately attacks problem (2), without first finding the analytic center of the set Q, as advised by the standard theory (e.g. Section 4.2 in [12]). Note that the bounds on \(\beta \) ensure \(\gamma > 0\).
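To make the scheme concrete, here is a small numerical sketch of one plausible implementation, inferred from the surrounding analysis: \(g_0 = -\nabla F(x_0)\) (so that (9) holds at \(k = 0\)), \(g_{k+1} = g_k + \gamma c/\Vert c \Vert ^*_{x_k}\), followed by a Newton step for \(f_{k+1}\), with \(\gamma \) chosen so that \((\beta + \gamma )/(1 - \beta - \gamma ) = \sqrt{\beta }\), as suggested by the proof of Lemma 3.1. The test problem (minimizing \(\langle c, x \rangle \) over the unit ball with the barrier \(F(x) = -\ln (1 - \Vert x \Vert ^2)\)) is our own choice, for illustration only:

```python
import numpy as np

# Illustrative sketch of the Greedy Path-Following Method (8) for
# min <c, x> over the unit ball, with F(x) = -ln(1 - ||x||^2).
def grad_F(x):
    return 2.0 * x / (1.0 - x @ x)

def hess_F(x):
    s = 1.0 - x @ x
    return 2.0 * np.eye(len(x)) / s + 4.0 * np.outer(x, x) / s ** 2

def greedy_path_following(c, x0, beta=0.125, iters=250):
    # gamma solves (beta + gamma)/(1 - beta - gamma) = sqrt(beta)
    gamma = np.sqrt(beta) / (1.0 + np.sqrt(beta)) - beta
    x, g = x0.copy(), -grad_F(x0)                     # g_0 = -grad F(x_0)
    for _ in range(iters):
        H = hess_F(x)
        c_norm = np.sqrt(c @ np.linalg.solve(H, c))   # ||c||^*_{x_k}
        g = g + (gamma / c_norm) * c                  # g_{k+1}
        x = x - np.linalg.solve(H, grad_F(x) + g)     # Newton step for f_{k+1}
    return x

c = np.array([1.0, 0.0])
x = greedy_path_following(c, np.array([0.3, 0.4]))
# x approaches the minimizer (-1, 0) of <c, x> over the unit ball
```

Note how the scheme never needs the analytic center: it starts from the arbitrary interior point \(x_0 = (0.3, 0.4)\).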

Defining \(t_0 = 0\) and \(t_{k+1} = t_k + {\gamma \over \Vert c \Vert ^*_{x_k}}\) for \(k \ge 0\), we can see that \(g_k = g_0 + t_k c\). Let us prove that \(t_k \rightarrow \infty \) and method (8) follows approximately the sequence of minimizers of the auxiliary problems

$$\begin{aligned} \min \limits _{x \in Q} \Big \{ f_k(x) {\mathop {=}\limits ^{\textrm{def}}}\langle g_k, x \rangle + F(x) \Big \}, \quad k \ge 0. \end{aligned}$$

Lemma 3.1

Let \(\beta \in \left( 0, {1 \over 3} \right] \). Then, for any \(k \ge 0\), we have

$$\begin{aligned} \Vert \nabla F(x_k) + g_k \Vert ^*_{x_k} \le \beta . \end{aligned}$$
(9)

Proof

For \(k = 0\), the left-hand side of inequality (9) is zero, so it is satisfied.

Note that the main step at each iteration of method (8) is the Newton Step from the point \(x_k\) for the potential \(f_{k+1}(\cdot )\). Assume that (9) is valid for some \(k \ge 0\). Then

$$\begin{aligned} \Vert \nabla f_{k+1}(x_k) \Vert ^*_{x_k} = \left\| \nabla F(x_k) + g_{k} + {\gamma c \over \Vert c \Vert ^*_{x_k}} \right\| ^*_{x_k} \; {{\mathop {\le }\limits ^{(9)}}} \; \beta + \gamma . \end{aligned}$$

Therefore, in view of T.1.14, we have

$$\begin{aligned} \Vert \nabla f_{k+1}(x_{k+1}) \Vert ^*_{x_{k+1}} \le \left( \beta + \gamma \over 1 - \beta - \gamma \right) ^2 = \beta . \end{aligned}$$

Hence, inequality (9) is proved for all \(k \ge 0\). \(\square \)

Thus, we have proved that method (8) approximately follows a deviated path \(x_d(\cdot )\), defined by the equation

$$\begin{aligned} \nabla F(x_d(t)) + t c - \nabla F(x_0) = 0, \quad t \ge 0. \end{aligned}$$
(10)

It starts at an arbitrary point \(x_d(0) = x_0 \in \textrm{int}\,Q = \textrm{dom}\,F\), and we can expect that it approaches an optimal solution of problem (2) as \(t \rightarrow \infty \). In the standard theory of Polynomial-Time Interior-Point Methods, this property is ensured by two additional assumptions. First, we assume that \(F(\cdot )\) is also a \(\nu \)-self-concordant barrier. Second, we follow the central path, which starts from the analytic center \(x_F\) of the set Q, defined by the condition \(\nabla F(x_F) = 0\). In our analysis, we drop the latter assumption and study the behavior of \(x_d(\cdot )\) assuming that \(F(\cdot )\) is a proper set-limited function.

First of all, we need to prove that large values of t provide us with a good approximation of the optimal solution.

Theorem 3.1

Let \(F(\cdot )\) be a proper \(\varkappa \)-set-limited function. Then, for all \(t > 0\) we have

$$\begin{aligned} \langle c, x_d(t) \rangle - c_* \le {\kappa _0 \over t}, \quad \kappa _0 {\mathop {=}\limits ^{\textrm{def}}}2\varkappa + \langle \nabla F(x_0), x_0 - x^* \rangle . \end{aligned}$$
(11)

Moreover, if \(x_k\) satisfies the approximate centering condition (9), then

$$\begin{aligned} \langle c, x_k \rangle - c_* \le {(1 -\beta )\kappa _0 \over (1 - 2 \beta ) \; t_k}. \end{aligned}$$
(12)

Proof

Indeed,

$$\begin{aligned} t \langle c, x_d(t) - x^* \rangle&{{\mathop {=}\limits ^{(10)}}}&\langle - \nabla F(x_d(t)) +\nabla F(x_0), x_d(t) - x^* \rangle \\&{{\mathop {\le }\limits ^{(4)}}}&\varkappa + \langle \nabla F(x_0), x_d(t) - x^* \rangle \; {{\mathop {\le }\limits ^{(4)}}} \; 2 \varkappa + \langle \nabla F(x_0), x_0 - x^* \rangle . \end{aligned}$$

Note that by T.1.5(1), \(\Vert c \Vert ^*_{x_k} \le \langle c, x_k \rangle - c_*\). Therefore, in view of T.1.13, we have

$$\begin{aligned} \langle c, x_k - x_d(t_k) \rangle \le \Vert c \Vert ^*_{x_k} \Vert x_k - x_d(t_k) \Vert _{x_k} \; {{\mathop {\le }\limits ^{(9)}}} \; {\beta \over 1 - \beta } \Vert c \Vert ^*_{x_k} \le {\beta \over 1 - \beta }(\langle c, x_k \rangle - c_*). \end{aligned}$$

Thus, \(\langle c, x_k \rangle - c_* \le {1 - \beta \over 1 - 2 \beta } (\langle c, x_d(t_k) \rangle - c_*)\), and (12) follows from (11). \(\square \)

From the theory of self-concordant barriers, it is known that the path-following strategy along the central path has a linear rate of convergence, which depends only on the value of the barrier parameter \(\nu \) (e.g. [12]). Let us show that for deviated paths this role is taken by the constant \(\kappa _0\).

Theorem 3.2

Let \( \beta \in \left( 0, {2 - \sqrt{3} \over 2} \right) \). Then, for any \(k \ge 1\), we have

$$\begin{aligned} t_k \ge {t_1 \over 2} \cdot 2^{k/\hat{\kappa }_{\beta }}, \end{aligned}$$
(13)

where \({\hat{\kappa }}_{\beta } = 1 + \sqrt{\kappa _{\beta } \over \gamma (\gamma - \beta )}\) with \(\kappa _{\beta } = {1 - \beta \over 1 - 2 \beta } \kappa _0\).

Proof

Denote by \({\hat{k}}\) the smallest integer such that \({\hat{k}} \ge \sqrt{\kappa _{\beta } \over \gamma (\gamma -\beta )}\). Since the values \(t_k\) are monotonically increasing, for all k, \(1 \le k \le {\hat{k}}\), we have

$$\begin{aligned} t_k \ge t_1 \ge t_1 \cdot 2^{(k-{{\hat{k}}})/{{\hat{k}}}}. \end{aligned}$$
(14)

Let us assume now that inequality (14) is valid for all k, \(1 \le k \le n\), where \(n \ge {\hat{k}}\). Note that the upper bound for \(\beta \) ensures \(\gamma > \beta \). At the same time,

$$\begin{aligned} \langle c, x_{k+1} \rangle= & {} \langle c, x_k \rangle - \left\langle c, [\nabla ^2 F(x_k)]^{-1}\left( \nabla F(x_k) + g_k + {\gamma c \over \Vert c \Vert ^*_{x_k}}\right) \right\rangle \\&{{\mathop {\le }\limits ^{(9)}}}&\langle c, x_k \rangle + \beta \Vert c \Vert ^*_{x_k} - \gamma \Vert c \Vert ^*_{x_k}. \end{aligned}$$

Thus, \(0 < (\gamma - \beta ) \Vert c \Vert ^*_{x_k} \le \langle c, x_k \rangle - \langle c, x_{k+1} \rangle \), and we conclude that

$$\begin{aligned} t_{n+1} - t_{n+1-{\hat{k}}} = \sum \limits _{i=n+1-{\hat{k}}}^{n} { \gamma \over \Vert c \Vert ^*_{x_{i}}}&\ge \sum \limits _{i=n+1-{\hat{k}}}^n { \gamma (\gamma -\beta ) \over \langle c, x_{i} \rangle - \langle c, x_{i+1} \rangle }\\&\ge { \gamma (\gamma -\beta ) \over \langle c, x_{n+1-{\hat{k}}} \rangle - c_*} \cdot {\hat{k}}^2 \end{aligned}$$

since \(\sum _{i=n+1-{\hat{k}}}^n [\langle c, x_{i} \rangle - \langle c, x_{i+1} \rangle ] = \langle c, x_{n+1-{{\hat{k}}}} \rangle - \langle c, x_{n+1} \rangle \le \langle c, x_{n+1-{{\hat{k}}}} \rangle - c_*\). Note that

$$\begin{aligned} \langle c, x_{n+1-{\hat{k}}} \rangle - c_* {{\mathop {\le }\limits ^{(12)}}} {\kappa _{\beta } \over t_{n+1-{\hat{k}}}}. \end{aligned}$$

Therefore, \(t_{n+1} \ge t_{n+1-{\hat{k}}} \left( 1 + {\gamma (\gamma - \beta ) \over \kappa _{\beta }} {\hat{k}}^2 \right) \ge 2 t_{n+1-{\hat{k}}}\). Hence, in view of assumption (14),

$$\begin{aligned} t_{n+1} \ge 2 t_1 \cdot 2^{(n+1-2{\hat{k}})/{\hat{k}}} = t_1 \cdot 2^{(n+1-{\hat{k}})/{\hat{k}}}. \end{aligned}$$

Thus, inequality (14) is proved for all \(k \ge 1\). It remains to note that \({\hat{k}} \le \sqrt{\kappa _{\beta } \over \gamma (\gamma -\beta )}+1\). \(\square \)

Thus, if we start from a point \(x_0\) with \(\kappa _0 \le O(\varkappa )\), then the efficiency of Greedy Path-Following Method (8) remains on the level of a standard path-following scheme, equipped with a \(\varkappa \)-self-concordant barrier. Note that we keep the standard possibility to start from a neighborhood of the analytic center \(x_F\) since

$$\begin{aligned} \langle \nabla F(x_0), x_0 - x^* \rangle\le & {} \Vert \nabla F(x_0) \Vert ^*_{x_F} \left( \Vert x_0 - x_F \Vert _{x_F} + \Vert x^* - x_F \Vert _{x_F} \right) \\&{{\mathop {\le }\limits ^{(7)}}}&\left( \Vert x_0 - x_F \Vert _{x_F} + \varkappa + 2 \sqrt{\varkappa } \right) \Vert \nabla F(x_0) \Vert ^*_{x_F}. \end{aligned}$$

Thus, if \(\Vert \nabla F(x_0) \Vert ^*_{x_F}\) is smaller than an absolute constant, we have \(\kappa _0 \le O(\varkappa )\). However, sometimes we have other possibilities for achieving this relation (see Sect. 4).

In the opposite case, when \(\kappa _0\) is very large, we cannot guarantee good complexity bounds. However, in one particular situation, any deviated path nevertheless asymptotically approaches the central path.

Definition 3.1

We say that problem (2) has a sharp minimum if there exists a constant \(\rho > 0\) such that for all \(x \in Q\) we have

$$\begin{aligned} \langle c, x \rangle - c_* \ge \rho \Vert x - x^* \Vert . \end{aligned}$$
(15)

Our analysis is based on the following result, which was initially proved in [18] for the dual formulation of a conic optimization problem.

Lemma 3.2

Let the function \(F(\cdot )\) be self-concordant with \(\textrm{dom}\,F = \textrm{int}\,Q\), and let the minimum of problem (2) be sharp. Then for any \(x \in \textrm{int}\,Q\) we have

$$\begin{aligned} \nabla ^2F(x) \succeq {\rho ^2 \over 4 [\langle c, x \rangle - c_*]^2} B. \end{aligned}$$
(16)

Proof

Let \(x \in \textrm{int}\,Q\). By T.1.5(1), the ellipsoid \(W = \{ y \in \mathbb {E}: \Vert y - x \Vert _x \le 1 \}\) belongs to Q. Denote \(\tau = \langle c, x \rangle - c_*\). Then \(\langle c, x \rangle - \Vert c \Vert _x^* \ge c_*\). Thus, \(\Vert c \Vert _x^* \le \tau \) and we conclude that for any \(y \in W\) we have \(\langle c, y \rangle - c_* \le 2 \tau \). This means that \(\Vert y - x^* \Vert {{\mathop {\le }\limits ^{(15)}}}{2\tau \over \rho }\). Consequently, for any \(g \in \mathbb {E}^*\), we have

$$\begin{aligned} \max \limits _{y \in W} \; \langle g, y \rangle = \langle g, x \rangle + \Vert g \Vert ^*_x \le \langle g, x^* \rangle + {2 \tau \over \rho } \Vert g \Vert ^*. \end{aligned}$$

Without loss of generality, we assume that \(\langle g, x^* \rangle \le \langle g, x \rangle \) (otherwise, multiply g by \(-1\)). Thus, we have proved the inequality \(\Vert g \Vert ^*_x \le {2 \tau \over \rho } \Vert g \Vert ^*\). Since g is arbitrary, we get (16). \(\square \)
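A one-dimensional illustration of Lemma 3.2: take \(Q = [-1,1] \subset \mathbb {R}\), \(F(x) = -\ln (1 - x^2)\), and \(c = 1\), so that \(x^* = -1\), \(c_* = -1\), and the minimum is sharp with \(\rho = 1\) (indeed, \(\langle c, x \rangle - c_* = x + 1 = |x - x^*|\)). With B the identity, inequality (16) reads \(F''(x) \ge 1/(4(x+1)^2)\), which the following sketch confirms on a grid:

```python
import numpy as np

# Numeric check of (16) for Q = [-1, 1], F(x) = -ln(1 - x^2), c = 1:
# sharp minimum at x* = -1 with rho = 1, and B = 1 (identity on R),
# so (16) becomes F''(x) >= 1 / (4 (x + 1)^2).
xs = np.linspace(-0.999, 0.999, 20001)
hess = 2.0 * (1.0 + xs ** 2) / (1.0 - xs ** 2) ** 2   # F''(x)
bound = 1.0 / (4.0 * (xs + 1.0) ** 2)                 # rho^2 / (4 gap^2)
assert np.all(hess >= bound)
```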

In order to get the point \(x_k\) into a neighborhood of the central path, we need to ensure that the norm \(\Vert \nabla F(x_k) + t_k c \Vert ^*_{x_k}\) is smaller than an absolute constant. Note that

$$\begin{aligned} \Vert \nabla F(x_k) + t_k c \Vert ^*_{x_k}\le & {} \Vert \nabla F(x_k) + g_k \Vert ^*_{x_k} + \Vert \nabla F(x_0) \Vert ^*_{x_k}\\&{{\mathop {\le }\limits ^{(16)}}}&\beta + {2 \over \rho } [ \langle c, x_k \rangle - c_*] \Vert \nabla F(x_0) \Vert ^*. \end{aligned}$$

Hence, by inequality (12), we obtain an upper bound on the moment when this happens.

4 Second-Order Methods for Structured Unconstrained Minimization

Let us consider the following problem of unconstrained convex optimization:

$$\begin{aligned} \min \limits _{x \in \mathbb {E}} \; f(x), \end{aligned}$$
(17)

where the function \(f(\cdot )\) has an explicit structure, which allows us to point out a proper set-limited function F with respect to its epigraph \(\mathcal{E}_f = \{ z = (\tau ,x) \in \mathbb {R}\times \mathbb {E}: \tau \ge f(x) \}\). This gives us the possibility to solve problem (17) by the linearly convergent method (8) as applied to the initial optimization problem, rewritten in the standard form:

$$\begin{aligned} \min \limits _{z}\Big \{ \langle c, z \rangle \equiv \tau : z = (\tau ,x) \in \mathcal{E}_f \Big \}. \end{aligned}$$
(18)

Note that the special form of the feasible set in (18) gives us additional possibilities for keeping the value \(\langle \nabla F(z_0), z_0 - z^* \rangle \) small enough (see (11) and (13) for its role in the complexity bounds). Indeed, given an arbitrary starting point \(x_0 \in \mathbb {E}\), we are free to choose any starting value \(\tau _0 > f(x_0)\). If \(\tau _0\) is large enough, then we can expect the value of \(\langle \nabla F(z_0), z_0 - z^* \rangle \) to be small, or even negative. Note that the condition

$$\begin{aligned} \langle \nabla F(z_0), z_0 - z^* \rangle \le 0 \end{aligned}$$
(19)

allows us to estimate the distance from the starting point to the solution by Theorem 2.1. At the same time, it is much weaker than any of the "centering" conditions for the starting point. Let us look at the following example.

Example 4.1

Let the objective function in problem (17) be \(f(x) = | x |\), \(x \in \mathbb {R}\equiv \mathbb {E}\). We can rewrite this problem in the standard form as follows:

$$\begin{aligned} \min \limits _{z = (\tau ,x) \in \mathbb {R}^2} \Big \{ \tau : \tau \ge | x | \Big \}. \end{aligned}$$
(20)

Thus, a natural self-concordant barrier for the epigraph of the objective is \(F(z) = - \ln (\tau ^2 - x^2)\) with barrier parameter \(\nu = 2\). The central path z(t) of problem (20) is the trajectory of minimizers of the following function:

$$\begin{aligned} t \tau - \ln (\tau ^2 - x^2), \quad t > 0. \end{aligned}$$

Thus, \(z(t) = ({2 \over t}, 0)\). A narrow neighborhood \(\mathcal{N}_{\beta }\) of the central path, which is appropriate for the standard path-following scheme, can be found from the following condition:

$$\begin{aligned} z \in \mathcal{N}_{\beta } \Leftrightarrow \min \limits _{t > 0} \Vert z - z(t) \Vert _z \le \beta , \end{aligned}$$

where \(\beta \in (0,{1 \over 3})\) (see Sect. 4.2 in [12]). Simple but tedious computations tell us that

$$\begin{aligned} \min \limits _{t > 0} \Vert z - z(t) \Vert ^2_z = {2 x^2 \over x^2 + \tau ^2}. \end{aligned}$$

Hence, the neighborhood \(\mathcal{N}_{\beta }\) has the following representation:

$$\begin{aligned} \mathcal{N}_{\beta } = \left\{ (\tau ,x): \tau \ge \sqrt{{2 \over \beta ^2} - 1} \cdot | x | \right\} . \end{aligned}$$

Thus, if \(\beta \) is small, this is a tiny cone around the horizontal axis in \(\mathbb {R}^2\). The standard theory of Interior-Point Methods requires starting the path-following process exactly from this region. Let us look at what happens with condition (19). Since \(z^* = (0,0)\), for any \(z \in \textrm{dom}\,F\) we have:

$$\begin{aligned} \langle \nabla F(z), z - z^* \rangle = \langle \nabla F(z), z \rangle = - 2. \end{aligned}$$

Thus, any point in \(\textrm{dom}\,F\) is appropriate for starting the corresponding deviated path, which approaches the optimal solution of problem (20) at a linear rate. \(\square \)
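The "simple but tedious computations" behind the formula \(\min _{t > 0} \Vert z - z(t) \Vert ^2_z = 2x^2/(x^2 + \tau ^2)\) can be spot-checked numerically. The sketch below forms the Hessian of \(F(z) = -\ln (\tau ^2 - x^2)\) explicitly and searches over a grid in \(u = 2/t\):

```python
import numpy as np

def min_local_dist_sq(tau, x):
    # min over t > 0 of ||z - z(t)||_z^2, z = (tau, x), z(t) = (2/t, 0),
    # with the local norm induced by the Hessian of F(z) = -ln(tau^2 - x^2):
    # H = (2/D^2) [[tau^2+x^2, -2 tau x], [-2 tau x, tau^2+x^2]], D = tau^2 - x^2
    D = tau * tau - x * x
    H = (2.0 / D ** 2) * np.array([[tau * tau + x * x, -2.0 * tau * x],
                                   [-2.0 * tau * x, tau * tau + x * x]])
    u = np.linspace(1e-4, tau, 400001)           # u = 2/t; the minimizer lies in (0, tau)
    d = np.stack([tau - u, np.full_like(u, x)])  # columns are z - z(t)
    return np.einsum('in,ij,jn->n', d, H, d).min()

for tau, x in [(1.0, 0.3), (2.0, -1.5), (5.0, 4.9)]:
    assert abs(min_local_dist_sq(tau, x) - 2.0 * x * x / (x * x + tau * tau)) < 1e-6
```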

In the remaining part of this section, we show how to ensure condition (19) for different problem settings.

4.1 Lipschitz-Continuous Functions

Let us assume that the objective function of problem (17) satisfies the following condition:

$$\begin{aligned} f(x) - f^* \le L_f \Vert x - x^* \Vert _{\mathbb {E}}, \quad x \in \mathbb {E}, \end{aligned}$$
(21)

where \(\Vert \cdot \Vert _{\mathbb {E}}\) is an arbitrary norm on \(\mathbb {E}\). This means that all elements of the cone

$$\begin{aligned} \mathcal{R} = \Big \{ z = (\tau ,x) \in \mathbb {R}\times \mathbb {E}: \tau \ge L_f \Vert x \Vert _{\mathbb {E}} \Big \} \end{aligned}$$

are recession directions of the epigraph \(\mathcal{E}_f\).

Let us choose an arbitrary \(x_0 \in \mathbb {E}\) with a known bound on the distance to the optimum: \(\Vert x_0 - x^* \Vert _{\mathbb {E}} \le R\). Then we can take

$$\begin{aligned} z_0 = (f(x_0) + L_f R, x_0). \end{aligned}$$
(22)

In this case, for direction \(p_0 = z_0 - z^* = (f(x_0) - f^* + L_f R, x_0 - x^*)\), we have

$$\begin{aligned} L_f \Vert x_0 - x^* \Vert _{\mathbb {E}} \le L_fR \le f(x_0) - f^* + L_fR. \end{aligned}$$

Thus, \(p_0 \in \mathcal{R}\), and by Lemma 2.1 we conclude that

$$\begin{aligned} \varkappa _0 {{\mathop {=}\limits ^{(11)}}} 2 \varkappa + \langle \nabla F(z_0), z_0 - z^* \rangle {{\mathop {\le }\limits ^{(6)}}} 2 \varkappa . \end{aligned}$$

Hence, the choice (22) of the starting point ensures the following complexity of method (8) as applied to problem (17):

$$\begin{aligned} O \left( \sqrt{\varkappa } \ln { \Vert c \Vert ^*_{z_0} \over \epsilon } \right) \end{aligned}$$
(23)

iterations of the path-following scheme.
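The choice (22) is easy to instantiate. A minimal sketch under illustrative assumptions (we take \(f(x) = \Vert x - a \Vert _2\), which satisfies (21) with \(L_f = 1\) and has known minimizer \(x^* = a\); all names hypothetical), checking that \(p_0 = z_0 - z^*\) indeed lies in the recession cone \(\mathcal{R}\):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)                    # minimizer of f (illustrative)
f = lambda x: np.linalg.norm(x - a)           # Lipschitz objective, L_f = 1
L_f, x_star, f_star = 1.0, a, 0.0

x0 = rng.standard_normal(5)
R = np.linalg.norm(x0 - x_star) + 0.5         # any valid upper bound on ||x0 - x*|| works
tau0 = f(x0) + L_f * R                        # starting height, choice (22)

# p0 = z0 - z* = (tau0 - f*, x0 - x*) must lie in the recession cone R:
p0_tau = tau0 - f_star
assert p0_tau >= L_f * np.linalg.norm(x0 - x_star)
print("tau0 =", tau0)
```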

From the viewpoint of iteration complexity, method (8) is a second-order scheme. However, it can be applied to objective functions in (17) satisfying no specific smoothness assumptions beyond boundedness of the first-order derivatives (21). An alternative approach would be to apply to \(f(\cdot )\) a variant of the smoothing technique [13] and minimize the result by a second-order method. However, to the best of our knowledge, all existing strategies of this kind have a sublinear rate of convergence.

4.2 Max-Type Functions

Let \(f(x) = \max _{1 \le i \le m} f_i(x)\), where the functions \(f_i(\cdot )\) are closed and convex, with epigraphs

$$\begin{aligned} \mathcal{E}_i = \{ z = (\tau ,x): \tau \ge f_i(x)\} \end{aligned}$$

admitting \(\nu _i\)-self-concordant barriers in the form \(F_i(z) = - \ln (\tau - f_i(x))\), \(i = 1, \dots , m\). For an important example of quadratic \(f_i(\cdot )\), we have all \(\nu _i = 1\). Note that quadratic functions do not satisfy assumption (21) of Sect. 4.1.

For the standard barrier \(F(z) = - \sum _{i=1}^m \ln (\tau - f_i(x))\), we have \(\varkappa = \sum _{i=1}^m \nu _i\). At the same time,

$$\begin{aligned} \langle \nabla F(z_0), z_0 - z^* \rangle&= \sum \limits _{i=1}^m {1 \over \tau _0 - f_i(x_0)} \big [ \tau ^* - \tau _0 + \langle \nabla f_i(x_0), x_0 - x^* \rangle \big ]\\&\le \sum \limits _{i=1}^m {1 \over \tau _0 - f_i(x_0)} \big [ f(x_0) - \tau _0 + \langle \nabla f_i(x_0), x_0 - x^* \rangle \big ]\\&\le \sum \limits _{i=1}^m {f(x_0) - f_i(x_0) \over \tau _0 - f(x_0)} + \sum \limits _{i=1}^m {\Vert \nabla f_i(x_0) \Vert ^*_{\mathbb {E}} \over \tau _0 - f(x_0)} R - m, \end{aligned}$$

where \(R \ge \Vert x_0 - x^* \Vert _{\mathbb {E}}\). Thus, in order to have \(\langle \nabla F(z_0), z_0 - z^* \rangle \le 0\), it is enough to choose

$$\begin{aligned} \tau _0 = f(x_0) + {1 \over m} \sum \limits _{i=1}^m \Big \{ f(x_0) - f_i(x_0) + R\Vert \nabla f_i(x_0) \Vert ^*_{\mathbb {E}} \Big \}. \end{aligned}$$
(24)

In this case, the corresponding implementation of method (8) admits the polynomial-time complexity bound (23).
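The choice (24) can be verified on a small illustrative example with two quadratics (\(f_1(x) = x^2\), \(f_2(x) = (x-1)^2\), so that \(x^* = 1/2\)), computing \(\langle \nabla F(z_0), z_0 - z^* \rangle \) directly for the barrier \(F(z) = -\sum _i \ln (\tau - f_i(x))\):

```python
# Two convex quadratics; f(x) = max(f1, f2) has minimizer x* = 0.5.
fs  = [lambda x: x**2, lambda x: (x - 1.0)**2]
dfs = [lambda x: 2*x,  lambda x: 2*(x - 1.0)]
x_star, x0 = 0.5, 2.0
tau_star = max(f(x_star) for f in fs)          # tau* = f(x*) = 0.25
R = abs(x0 - x_star)                           # R = 1.5

fmax = max(f(x0) for f in fs)                  # f(x0) = 4
# Starting height tau0 from choice (24):
tau0 = fmax + sum(fmax - f(x0) + R * abs(df(x0))
                  for f, df in zip(fs, dfs)) / len(fs)

# <grad F(z0), z0 - z*> for F(z) = -sum_i ln(tau - f_i(x)):
ip = sum((tau_star - tau0 + df(x0) * (x0 - x_star)) / (tau0 - f(x0))
         for f, df in zip(fs, dfs))
assert ip <= 0.0
print("tau0 =", tau0, " inner product =", ip)   # -> tau0 = 10.0, ip = -1.375
```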

4.3 Average Function

Let \(f(x) = {1 \over m}\sum _{i=1}^m f_i(\langle a_i, x \rangle - b_i)\), where the univariate functions \(f_i(\cdot )\) are convex on \(\mathbb {R}\). We assume also that they are Lipschitz-continuous:

$$\begin{aligned} | f_i(s_1) - f_i(s_2) | \le L_i | s_1 - s_2 |, \quad s_1,s_2 \in \mathbb {R}, \; i = 1, \dots , m. \end{aligned}$$
(25)

Let the epigraphs \(\mathcal{E}_i = \{z=(\tau ,s) \in \mathbb {R}^2: \tau \ge f_i(s) \}\) admit self-concordant barriers \(F_i(\cdot ,\cdot )\) with parameters \(\nu _i\), \(i = 1, \dots , m\). Then we can use the following proper set-limited function

$$\begin{aligned} F(\tau ,x) = \sum \limits _{i=1}^m F_i(\tau ^{(i)}, \langle a_i, x \rangle - b_i), \quad \tau = (\tau ^{(1)}, \dots , \tau ^{(m)}) \in \mathbb {R}^m, \; x \in \mathbb {E}, \end{aligned}$$

with parameter \(\varkappa = \sum _{i=1}^m \nu _i\). Our objective function now is \({1 \over m}\sum _{i=1}^m \tau ^{(i)}\).

Let \(z = (\tau ,x)\). Our goal is to find the starting point \(z_0\) such that \(\langle \nabla F(z_0), z_0 - z^* \rangle \le 0\). In view of Lemma 2.1, for that we need to ensure the inclusion

$$\begin{aligned} z_0 - z^* \in \mathcal{R} = \Big \{(\tau ,x): \tau ^{(i)} \ge L_i |\langle a_i, x \rangle |, \; i = 1, \dots , m \Big \}. \end{aligned}$$

Thus, we can handle each coordinate independently. Note that for \(R \ge \Vert x_0 - x^* \Vert _{\mathbb {E}}\) we have

$$\begin{aligned} L_i|\langle a_i, x_0 - x^* \rangle | \le L_i \Vert a_i \Vert ^*_{\mathbb {E}} R {{\mathop {\le }\limits ^{(25)}}} 2 L_i \Vert a_i \Vert ^*_{\mathbb {E}} \cdot R + f_i(\langle a_i, x_0 \rangle - b_i) - f_i(\langle a_i, x^* \rangle - b_i). \end{aligned}$$

Therefore, we can take

$$\begin{aligned} \tau ^{(i)}_0 = f_i(\langle a_i, x_0 \rangle - b_i) + 2 L_i \Vert a_i \Vert ^*_{\mathbb {E}} \cdot R, \quad i = 1, \dots , m. \end{aligned}$$
(26)

In this case, the corresponding second-order scheme (8) has the complexity bound (23).
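A minimal sketch of the choice (26), checking the componentwise recession-cone inclusion \(z_0 - z^* \in \mathcal{R}\) for the illustrative case \(f_i(s) = |s|\) (so \(L_i = 1\)). The point `x_star` below is only a stand-in for the unknown minimizer: the derivation uses nothing about it beyond \(R \ge \Vert x_0 - x^* \Vert _{\mathbb {E}}\):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))               # rows a_i (illustrative data)
b = rng.standard_normal(m)
L = np.ones(m)                                # f_i(s) = |s| has L_i = 1
f_i = lambda s: np.abs(s)

x0 = rng.standard_normal(n)
x_star = rng.standard_normal(n)               # stand-in for the unknown minimizer
R = np.linalg.norm(x0 - x_star)               # valid bound on ||x0 - x*||

# Choice (26), with ||a_i||* the Euclidean norms of the rows:
tau0 = f_i(A @ x0 - b) + 2 * L * np.linalg.norm(A, axis=1) * R
tau_star = f_i(A @ x_star - b)

# Componentwise recession-cone inequality tau0_i - tau*_i >= L_i |<a_i, x0 - x*>|:
assert np.all(tau0 - tau_star >= L * np.abs(A @ (x0 - x_star)))
print("tau0 =", tau0)
```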

As an example, consider the following problem, arising in Machine Learning:

$$\begin{aligned} \min \limits _{x \in \mathbb {E}} \left\{ {1 \over m} \sum \limits _{i=1}^m \phi _0(\langle a_i, x \rangle - b_i) \right\} , \quad \phi _0(s) = \max \{0,s\}, \quad L_{\phi _0} = 1. \end{aligned}$$
(27)

This is a convex nonsmooth optimization problem, which admits only slowly convergent optimization schemes. Therefore, to accelerate the methods, the function \(\phi _0(\cdot )\) is very often replaced by its smooth approximation \(\phi _{\mu }(s) = \mu \ln \left( 1 + e^{s/\mu }\right) \) with \(\mu > 0\), sometimes augmenting the objective in (27) by a strongly convex regularization term.
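The quality of this smoothing is easy to quantify: \(0 \le \phi _{\mu }(s) - \phi _0(s) \le \mu \ln 2\) for all \(s \in \mathbb {R}\), with the maximal gap attained at \(s = 0\). A short numerical check (using the overflow-safe form \(\phi _{\mu }(s) = \max \{s,0\} + \mu \ln (1+e^{-|s|/\mu })\)):

```python
import numpy as np

mu = 0.1
phi0 = lambda s: np.maximum(s, 0.0)
# Numerically stable evaluation of mu * ln(1 + e^{s/mu}):
phi_mu = lambda s: np.maximum(s, 0.0) + mu * np.log1p(np.exp(-np.abs(s) / mu))

s = np.linspace(-5.0, 5.0, 1001)              # grid includes s = 0 exactly
gap = phi_mu(s) - phi0(s)
assert np.all(gap >= 0.0)
assert np.all(gap <= mu * np.log(2.0) + 1e-12)
print("max smoothing gap =", gap.max(), " bound mu*ln(2) =", mu * np.log(2.0))
```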

In our approach, we get a linearly convergent scheme directly for problem (27). For that, we need to endow the epigraph \(\mathcal{E}_{\phi _0} = \{ (\tau ,s) \in \mathbb {R}^2: \tau \ge \phi _0(s) \}\) with the standard 2-self-concordant barrier

$$\begin{aligned} F(\tau ,s) = - \ln (\tau - s) - \ln \tau , \quad \tau > 0, \; s \in \mathbb {R}, \end{aligned}$$

and use method (8) for the epigraph of the objective function.
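As a sanity check, the domain of this barrier coincides with the interior of \(\mathcal{E}_{\phi _0}\): the conditions \(\tau > s\) and \(\tau > 0\) together are equivalent to \(\tau > \max \{0,s\} = \phi _0(s)\). A minimal numerical confirmation:

```python
import random

phi0 = lambda s: max(0.0, s)

def in_dom_F(tau, s):
    # Domain of F(tau, s) = -ln(tau - s) - ln(tau)
    return tau - s > 0.0 and tau > 0.0

random.seed(2)
for _ in range(10000):
    tau = random.uniform(-5.0, 5.0)
    s = random.uniform(-5.0, 5.0)
    assert in_dom_F(tau, s) == (tau > phi0(s))
print("dom F = interior of the epigraph of phi0")
```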